<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Homepage Hannes Schulz</title>
 <link href="http://www.ais.uni-bonn.de/~schulz/atom.xml" rel="self"/>
 <link href="http://www.ais.uni-bonn.de/~schulz"/>
 <updated>2012-02-02T01:02:43+01:00</updated>
 <id></id>

 
 <author>
   <name>Hannes Schulz</name>
   <email>schulz at ais dot uni-bonn dot de</email>
 </author>
 
 
 
 <entry>
   <title>GPU convolutions for neural networks</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2012/01/19/gpu-convolutions-for-neural-networks.html"/>
   <updated>2012-01-19T00:00:00+01:00</updated>
   <id>/2012/01/19/gpu-convolutions-for-neural-networks</id>
   <content type="html">&lt;p&gt;
With all the popularity of deep learning, many researchers in the
field might wonder which framework is &quot;right&quot; to implement their
experiments. For plain neural networks, the main &quot;workhorse&quot; is the
matrix multiplication, which can be accelerated a lot using graphics
processing units (GPUs). For convolutional architectures, the matrix
multiplication is typically &quot;replaced&quot; by a convolution, and we would
like convolutions to be fast(er) on the GPU as well.
&lt;/p&gt;
&lt;p&gt;
Neural net convolutions are somewhat special, since their &lt;a href=&quot;http://deeplearning.net/software/theano/library/tensor/nnet/conv.html&quot;&gt;filters are 3D and pool over input layers&lt;/a&gt;. Also, since they are usually applied to
many small &quot;maps&quot; at once, common FFT acceleration techniques do not
apply.
&lt;/p&gt;
&lt;p&gt;
For my own implementations, I compared 3 convolution implementations:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;The convolutions that come with Theano (from git, 2011-1-14). This
  implementation is by far the most flexible, as we will see. It is
  based on the formerly separate, now Theano-integrated CudaNdarray
  library.
&lt;/li&gt;
&lt;li&gt;Alex Krizhevsky, a PhD student in Toronto, wrote &lt;a href=&quot;http://www.cs.toronto.edu/~kriz/&quot;&gt;two publicly available convolution routines&lt;/a&gt;. We already integrated the first
  version of his convolutions in CUV.
&lt;/li&gt;
&lt;li&gt;Alex' new convolutions created for the &lt;a href=&quot;http://code.google.com/p/cuda-convnet/&quot;&gt;cuda-convnet&lt;/a&gt; (svn, 2011-1-13)
  which are described as being &quot;several times faster&quot; than the first
  version.
&lt;/li&gt;
&lt;/ul&gt;


&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-1&quot;&gt;Constraints &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
The (main) constraints of the three versions are quite different:
&lt;/p&gt;
&lt;table class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;
&lt;caption&gt;&lt;/caption&gt;
&lt;colgroup&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Implementation&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Image Size&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Memory-Ordering (row-major)&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Other&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Theano&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;any&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;(nImages, nChannels, imageH, imageW)&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&amp;ndash;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Alex old&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;square only&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;(nChannels, nImages, imageH*imageW)&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;nFilters%2==0&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Alex new&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;square only&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;(nChannels, imageH*imageW, nImages)&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;nFilters%16==0&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
Regarding square images, one can argue that in arbitrary image
collections the shapes vary anyway, so for batch processing they have
to be brought to a common (square) shape in any case.
&lt;/p&gt;
&lt;p&gt;
The ordering is tricky. At first sight, Theano's ordering looks most
intuitive. With this ordering, however, operations which are functions
of all channels of a single pixel are a bit tricky to optimize. Alex'
old and new orderings can both use efficient matrix-row operations for
cross-channel functions. The &quot;Alex old&quot; convolution has the
disadvantage that images in one batch are in neither the columns &lt;i&gt;nor&lt;/i&gt; the
rows of a matrix, so that final &quot;full&quot; layers (for example in &lt;a href=&quot;http://yann.lecun.com/exdb/lenet/&quot;&gt;LeNet&lt;/a&gt;)
require reordering the matrix. The new convolutions have images in the
columns of a matrix, solving the reordering problem, even though this
ordering looks the least intuitive.
&lt;/p&gt;
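&lt;p&gt;
To make the three orderings concrete, here is a minimal sketch of how the
row-major offset of a single pixel would be computed under each layout
in the table above. The struct and function names are made up for this
illustration and are not part of Theano, CUV or cuda-convnet.
&lt;/p&gt;

&lt;pre class=&quot;src src-c++&quot;&gt;#include &amp;lt;cstddef&amp;gt;

// Dimensions of one batch (names chosen for this sketch only).
struct Dims {
  std::size_t nImages, nChannels, imageH, imageW;
};

// Theano: (nImages, nChannels, imageH, imageW)
inline std::size_t offset_theano(const Dims&amp;amp; d, std::size_t img, std::size_t ch,
                                 std::size_t y, std::size_t x) {
  return ((img * d.nChannels + ch) * d.imageH + y) * d.imageW + x;
}

// Alex old: (nChannels, nImages, imageH*imageW)
inline std::size_t offset_alex_old(const Dims&amp;amp; d, std::size_t img, std::size_t ch,
                                   std::size_t y, std::size_t x) {
  return (ch * d.nImages + img) * (d.imageH * d.imageW) + (y * d.imageW + x);
}

// Alex new: (nChannels, imageH*imageW, nImages)
inline std::size_t offset_alex_new(const Dims&amp;amp; d, std::size_t img, std::size_t ch,
                                   std::size_t y, std::size_t x) {
  return (ch * (d.imageH * d.imageW) + (y * d.imageW + x)) * d.nImages + img;
}
&lt;/pre&gt;

&lt;p&gt;
In the last layout the image index varies fastest. Viewed as a matrix
with nChannels*imageH*imageW rows, each image therefore occupies exactly
one column.
&lt;/p&gt;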
&lt;p&gt;
I should also mention the &quot;sparse&quot; filter option in Alex' code, which
allows convolving only certain maps with a filter. I'm not going into
detail since Theano does not have this feature and I want to compare
execution times.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-2&quot;&gt;Speed &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
In the following table, all operations were computed 10 times and the
(wall clock) times averaged. For Theano, I varied the 'version'
parameter, but found that the auto-selection (-1) selects the best
algorithm. I used a GTX480 and an Intel Xeon X5650 (2.67 GHz).
&lt;/p&gt;
&lt;table  class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;border&quot;&gt;
&lt;caption&gt;Execution speed of convolution packages&lt;/caption&gt;
&lt;colgroup&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;&lt;col class=&quot;right&quot; /&gt;&lt;col class=&quot;left&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Version&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Image Size&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Filter Size&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Type&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;right&quot;&gt;Time (ms)&lt;/th&gt;&lt;th scope=&quot;col&quot; class=&quot;left&quot;&gt;Comment&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Naive CPU&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,176,176&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,7,7&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;fwd&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;34200&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dimg&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;26800&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dflt&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;n/a&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Alex new&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,176,176&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,7,7&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;fwd&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;75&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dimg&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;90&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dflt&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;55&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;trn&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;0.3&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;transposing the whole input batch&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;total&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;220.3&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Alex old&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,176,176&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,7,7&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;fwd&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;101&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dimg&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;240&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;plus error padding (3 ms)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dflt&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;115&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;plus summing over batch (.8 ms)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;total&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;459&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;Theano&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,176,176&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;32,8,7,7&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;fwd&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;268&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dimg&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;451&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;dflt&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;281&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;total&lt;/td&gt;&lt;td class=&quot;right&quot;&gt;1000&lt;/td&gt;&lt;td class=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
&lt;b&gt;Key&lt;/b&gt;: 
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;Image Size&lt;/dt&gt;&lt;dd&gt;batch size, number of input maps, height, width
&lt;/dd&gt;
&lt;dt&gt;Filter Size&lt;/dt&gt;&lt;dd&gt;number of output maps, number of input maps, height, width
&lt;/dd&gt;
&lt;dt&gt;Type&lt;/dt&gt;&lt;dd&gt;&lt;i&gt;fwd&lt;/i&gt; is the &quot;forward pass&quot; convolution, &lt;i&gt;dimg&lt;/i&gt; is the
          derivative w.r.t. the inputs and &lt;i&gt;dflt&lt;/i&gt; is the derivative w.r.t. the
          filters.
&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;
&lt;b&gt;Discussion&lt;/b&gt;: I was quite surprised to see that Theano is comparatively slow.
It seems that Alex' new convolutions are indeed faster, albeit not
several times (for the tested case) (&lt;b&gt;Update&lt;/b&gt;: With patches for small
batch sizes kindly provided by Alex, speed nearly doubled!). The
overhead of a transpose (to comply with the &quot;weird&quot; memory layout) is
negligible compared to the overall advantages.  All GPU
implementations significantly outperform a naive CPU version (just
many nested for-loops).  Note, however, that Theano is able to generate
code for efficient CPU convolutions.
&lt;/p&gt;
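&lt;p&gt;
For reference, a naive CPU forward pass of the &quot;many nested for-loops&quot;
kind might look like the following sketch. It assumes the Theano
ordering, stride 1 and no padding, and it computes a cross-correlation
(the filter is not flipped). The benchmarked code may differ in these
details, and the function name is hypothetical.
&lt;/p&gt;

&lt;pre class=&quot;src src-c++&quot;&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;vector&amp;gt;

// img: (nImages, nChannels, H, W), flt: (nFilters, nChannels, fH, fW),
// result: (nImages, nFilters, H-fH+1, W-fW+1), all row-major.
std::vector&amp;lt;float&amp;gt; naive_conv_fwd(const std::vector&amp;lt;float&amp;gt;&amp;amp; img,
                                  const std::vector&amp;lt;float&amp;gt;&amp;amp; flt,
                                  std::size_t nImages, std::size_t nChannels,
                                  std::size_t H, std::size_t W,
                                  std::size_t nFilters, std::size_t fH, std::size_t fW) {
  const std::size_t oH = H - fH + 1, oW = W - fW + 1;
  std::vector&amp;lt;float&amp;gt; out(nImages * nFilters * oH * oW, 0.f);
  for (std::size_t n = 0; n &amp;lt; nImages; ++n)
    for (std::size_t f = 0; f &amp;lt; nFilters; ++f)
      for (std::size_t y = 0; y &amp;lt; oH; ++y)
        for (std::size_t x = 0; x &amp;lt; oW; ++x) {
          float sum = 0.f;
          // each filter is 3D: it sums over all input channels
          for (std::size_t c = 0; c &amp;lt; nChannels; ++c)
            for (std::size_t dy = 0; dy &amp;lt; fH; ++dy)
              for (std::size_t dx = 0; dx &amp;lt; fW; ++dx)
                sum += img[((n * nChannels + c) * H + y + dy) * W + x + dx]
                     * flt[((f * nChannels + c) * fH + dy) * fW + dx];
          out[((n * nFilters + f) * oH + y) * oW + x] = sum;
        }
  return out;
}
&lt;/pre&gt;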
&lt;p&gt;
&lt;b&gt;Combinations&lt;/b&gt;: Theano is quite flexible, but &quot;Alex new&quot; is 
&lt;i&gt;fast&lt;/i&gt;. How do we get the best of both worlds? It is interesting to
note that the memory layouts of both convolutions are transposes of
each other, and that for just 0.3 ms (in the above setting), we can
get from one to the other. So we &lt;i&gt;can&lt;/i&gt; get speed &lt;i&gt;or&lt;/i&gt; flexibility as
needed.
&lt;/p&gt;
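&lt;p&gt;
Viewed as matrices, a Theano batch has shape (nImages,
nChannels*imageH*imageW) and an &quot;Alex new&quot; batch has shape
(nChannels*imageH*imageW, nImages), so the conversion is a plain 2D
transpose. A CPU sketch with a hypothetical function name follows. On
the GPU, a dedicated transpose kernel would do the same job.
&lt;/p&gt;

&lt;pre class=&quot;src src-c++&quot;&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;vector&amp;gt;

// Transpose a row-major (rows x cols) matrix into a row-major (cols x rows) one.
std::vector&amp;lt;float&amp;gt; transpose_batch(const std::vector&amp;lt;float&amp;gt;&amp;amp; src,
                                   std::size_t rows, std::size_t cols) {
  std::vector&amp;lt;float&amp;gt; dst(src.size());
  for (std::size_t r = 0; r &amp;lt; rows; ++r)
    for (std::size_t c = 0; c &amp;lt; cols; ++c)
      dst[c * rows + r] = src[r * cols + c];
  return dst;
}
&lt;/pre&gt;

&lt;p&gt;
Calling it with rows = nImages and cols = nChannels*imageH*imageW turns
the Theano ordering into the &quot;Alex new&quot; ordering. Swapping the two
arguments converts back.
&lt;/p&gt;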
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-3&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-3&quot;&gt;Maintenance concerns &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-3&quot;&gt;


&lt;p&gt;
Neither implementation is particularly well documented, but both are
well tested.  At least for CudaNdarray, there is a &lt;a href=&quot;https://github.com/inducer/compyte&quot;&gt;successor on the way&lt;/a&gt;. It seems to me that optimized code at this level is mostly
&lt;a href=&quot;http://en.wikipedia.org/wiki/Write-only_language&quot;&gt;write-only&lt;/a&gt; anyway.
&lt;/p&gt;&lt;/div&gt;
&lt;/div&gt;

</content>



 </entry>
 
 <entry>
   <title>Easy Parallelization with C++0X lambda functions, Threading Building Blocks</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2011/03/25/c%2B%2B0x-lambda-functions-and-gcc-4.5.html"/>
   <updated>2011-03-25T00:00:00+01:00</updated>
   <id>/2011/03/25/c++0x-lambda-functions-and-gcc-4.5</id>
   <content type="html">&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;Lambda functors in C++0X &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
The relatively new &lt;a href=&quot;http://gcc.gnu.org/gcc-4.5/&quot;&gt;gcc-4.5&lt;/a&gt; release supports lambda expressions, 
which &amp;ndash; in contrast to &lt;a href=&quot;http://www.boost.org/doc/libs/1_46_0/doc/html/lambda.html&quot;&gt;boost.lambda&lt;/a&gt;, &lt;a href=&quot;http://www.boost.org/doc/libs/1_46_1/libs/bind/bind.html&quot;&gt;boost.bind&lt;/a&gt; and the like &amp;ndash; make it
easy to capture variables from the surrounding context in the lambda expression.
This finally makes the &lt;a href=&quot;http://www.cplusplus.com/reference/algorithm&quot;&gt;STL algorithms&lt;/a&gt; convenient to use, such as
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;struct&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;add_val&lt;/span&gt;{
  &lt;span style=&quot;color: #0000ff;&quot;&gt;add_val&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;):val(d){}
  &lt;span style=&quot;color: #228b22;&quot;&gt;void&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;operator&lt;/span&gt;&lt;span style=&quot;color: #0000ff;&quot;&gt;()&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;){
    d+=val;
  }
  &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;val&lt;/span&gt;;
};


&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;main&lt;/span&gt;(){
  &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;vector&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;v&lt;/span&gt;;
  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;... fill vector
&lt;/span&gt;
  &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;inc&lt;/span&gt; = 3.0;

  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;the old way of doing things
&lt;/span&gt;  &lt;span style=&quot;color: #228b22;&quot;&gt;add_val&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;av&lt;/span&gt;(inc);
  &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::for_each(v.begin(),v.end(),av);

  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;using a boost.lambda function
&lt;/span&gt;  &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::for_each(v.begin(),v.end(), _1+=inc);

  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;using a c++0x lambda function
&lt;/span&gt;  &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::for_each(v.begin(),v.end(), [=](&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;){d+=inc;});
}
&lt;/pre&gt;



&lt;p&gt;
The &quot;old way&quot; primarily has inconveniences for the programmer:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
We need to define a &lt;code&gt;struct&lt;/code&gt; outside the scope, even for tiny
functionality.
&lt;/li&gt;
&lt;li&gt;
We need to explicitly capture variables from the scope of the
surrounding code (here: &lt;code&gt;inc&lt;/code&gt;) in the &lt;code&gt;struct&lt;/code&gt;.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Boost.Lambda tried to resolve this problem, in an elegant way I
believe. However, there are still shortcomings of this approach:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Variables have unintuitive names (&lt;code&gt;_1&lt;/code&gt;, &lt;code&gt;_2&lt;/code&gt;)
&lt;/li&gt;
&lt;li&gt;
The code in the lambda expression is not really a &lt;i&gt;block of code&lt;/i&gt;. It is an expression, some parts of which may be evaluated at
surprising times.
&lt;/li&gt;
&lt;li&gt;
Also, since this is an expression, conditionals and loops must be
expressed in an awkward (that is, non-C++) way using &lt;code&gt;if_&lt;/code&gt;, &lt;code&gt;for_&lt;/code&gt;
and so on.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new C++0X lambda syntax gets rid of all these problems. While the
syntax looks a bit strange at first, it is much more readable than
boost.lambda constructs. The block of code stays in scope, and all
names of the surrounding code can be captured by copy (&lt;code&gt;[=]&lt;/code&gt;) or
by reference (&lt;code&gt;[&amp;amp;]&lt;/code&gt;).
&lt;/p&gt;
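&lt;p&gt;
The difference between the two capture modes shows up as soon as the
lambda has to write to a captured variable. A small sketch, with
function names invented for this illustration:
&lt;/p&gt;

&lt;pre class=&quot;src src-c++&quot;&gt;#include &amp;lt;algorithm&amp;gt;
#include &amp;lt;vector&amp;gt;

// [=] copies inc into the lambda; the elements themselves are modified
// through the reference parameter d.
void add_to_all(std::vector&amp;lt;double&amp;gt;&amp;amp; v, double inc) {
  std::for_each(v.begin(), v.end(), [=](double&amp;amp; d) { d += inc; });
}

// [&amp;amp;] refers to the local variable sum itself, so the lambda can
// accumulate into it.
double sum_all(const std::vector&amp;lt;double&amp;gt;&amp;amp; v) {
  double sum = 0.0;
  std::for_each(v.begin(), v.end(), [&amp;amp;](double d) { sum += d; });
  return sum;
}
&lt;/pre&gt;

&lt;p&gt;
With &lt;code&gt;[=]&lt;/code&gt;, the statement &lt;code&gt;sum += d&lt;/code&gt; would not compile, because
variables captured by copy are read-only unless the lambda is declared
&lt;code&gt;mutable&lt;/code&gt;.
&lt;/p&gt;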
&lt;p&gt;
You might wonder why we do not just use &lt;code&gt;boost.foreach&lt;/code&gt; and get away
with writing
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #7a378b;&quot;&gt;#include&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&amp;lt;boost/foreach.hpp&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#define&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;foreach&lt;/span&gt; BOOST_FOREACH
&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;...as before...
&lt;/span&gt;&lt;span style=&quot;color: #0000ff;&quot;&gt;foreach&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;, v){
  d+=inc;
}

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;in c++0x, not yet implemented, this will be
&lt;/span&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt; : v){
  d+=inc;
}
&lt;/pre&gt;



&lt;p&gt;
&amp;hellip; which is an idiom quite well-known in other languages.  The fun
part is that we cannot change what &lt;code&gt;for&lt;/code&gt; does, but we can change the
implementation of &lt;code&gt;for_each&lt;/code&gt;. This is one of the things that the &lt;a href=&quot;http://threadingbuildingblocks.org/&quot;&gt;Intel (R) Threading Building Blocks&lt;/a&gt; library does. 
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-2&quot;&gt;Parallelizing your for-loop by changing two lines &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
A nice trick now is to change your (side-effect-free) for-loops like
this:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #7a378b;&quot;&gt;#include&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&amp;lt;tbb/parallel_for_each.h&amp;gt;&lt;/span&gt;

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;before
&lt;/span&gt;&lt;span style=&quot;color: #0000ff;&quot;&gt;foreach&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;, v){
  d+=inc;
}

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;after
&lt;/span&gt;&lt;span style=&quot;color: #008b8b;&quot;&gt;tbb&lt;/span&gt;::parallel_for_each(v.begin(),v.end(),[=](&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;d&lt;/span&gt;){
  d+=inc;
});
&lt;/pre&gt;



&lt;p&gt;
&amp;hellip; and everything in this loop is automatically run in parallel.
&lt;/p&gt;
&lt;p&gt;
Similar things can be done with &lt;a href=&quot;http://openmp.org/wp/&quot;&gt;OpenMP&lt;/a&gt;:
&lt;/p&gt;


&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;end&lt;/span&gt; = v.size();
&lt;span style=&quot;color: #7a378b;&quot;&gt;#pragma&lt;/span&gt; omp parallel &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;
&lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;i&lt;/span&gt;=0;i&amp;lt;end;i++){
   v[i]+=inc;
}
&lt;/pre&gt;



&lt;p&gt;
but this is already much more intrusive. Furthermore, the loop index
&lt;i&gt;must&lt;/i&gt; be an &lt;code&gt;int&lt;/code&gt; and the last index &lt;i&gt;must&lt;/i&gt; be known.  While
variables may be passed as copy (using &lt;code&gt;firstprivate (inc)&lt;/code&gt;; a plain
&lt;code&gt;private (inc)&lt;/code&gt; copy starts out uninitialized), these
variables are private to the thread, not to the &quot;wrapped&quot; function. Of
course, OpenMP gives you much more fine-grained control over
parallelization as well.
&lt;/p&gt;&lt;/div&gt;
&lt;/div&gt;
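&lt;p&gt;
As a sketch of the OpenMP data-sharing clauses: &lt;code&gt;firstprivate(inc)&lt;/code&gt;
gives every thread its own copy of &lt;code&gt;inc&lt;/code&gt;, initialized with the value
from outside the loop. The function name is made up for this example.
&lt;/p&gt;

&lt;pre class=&quot;src src-c++&quot;&gt;#include &amp;lt;vector&amp;gt;

// Adds inc to every element. Compile with -fopenmp (gcc) to run the loop
// in parallel; without it the pragma is ignored and the loop runs serially.
void add_to_all_omp(std::vector&amp;lt;double&amp;gt;&amp;amp; v, double inc) {
  const int end = static_cast&amp;lt;int&amp;gt;(v.size());
#pragma omp parallel for firstprivate(inc)
  for (int i = 0; i &amp;lt; end; ++i) {
    v[i] += inc;
  }
}
&lt;/pre&gt;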
</content>



 </entry>
 
 <entry>
   <title>Profiling Boost Multi-Array</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2011/02/09/profiling-boost-multi-array.html"/>
   <updated>2011-02-09T00:00:00+01:00</updated>
   <id>/2011/02/09/profiling-boost-multi-array</id>
   <content type="html">&lt;p&gt;
Worried by this &lt;a href=&quot;http://stackoverflow.com/questions/446866/boostmulti-array-performance-question/446880&quot;&gt;StackOverflow thread&lt;/a&gt;, I did my own experiments on
profiling the access speed in &lt;a href=&quot;http://www.boost.org/doc/libs/1_41_0/libs/multi_array/doc/index.html&quot;&gt;Boost MultiArrays&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
First of all, the thread is correct, &lt;code&gt;BOOST_MA&lt;/code&gt; needs about 3 times as
much time to iterate over the array as Native.
&lt;/p&gt;
&lt;p&gt;
The additional cases I checked are quite interesting, though:
&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;BOOST_IT&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;
only has a 30% overhead over Native. Note that you can
get around the obfuscating pointer types by using the &lt;code&gt;auto&lt;/code&gt; keyword
and &lt;code&gt;-std=gnu++0x&lt;/code&gt; as a compiler argument (works for gcc version &amp;gt;= 4.4).

&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;RAW_POINTER&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;
is actually slower than Native. This is a typical
case of &quot;do not try to be smarter than your compiler&quot;. Note that this
is the same algorithm as the one you get when you use &lt;code&gt;std::fill&lt;/code&gt;.

&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;unsigned int&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;
index types are significantly slower (about 2x) in the
Native condition than &lt;code&gt;int&lt;/code&gt; index types. This is bad, because
intuitively, one would choose &lt;code&gt;unsigned int&lt;/code&gt; for array indexing.

&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;My code:
&lt;/p&gt;



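&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #7a378b;&quot;&gt;#include&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&amp;lt;cstdio&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#include&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&amp;lt;boost/date_time/posix_time/posix_time.hpp&amp;gt;&lt;/span&gt;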
&lt;span style=&quot;color: #7a378b;&quot;&gt;#define&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;_SCL_SECURE_NO_WARNINGS&lt;/span&gt;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#define&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;BOOST_DISABLE_ASSERTS&lt;/span&gt; 
&lt;span style=&quot;color: #7a378b;&quot;&gt;#include&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&amp;lt;boost/multi_array.hpp&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #a020f0;&quot;&gt;using&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;posix_time&lt;/span&gt;; 

&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;main&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;argc&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;argv&lt;/span&gt;[])
{
  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt;;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;X_SIZE&lt;/span&gt; = 400;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;Y_SIZE&lt;/span&gt; = 400;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;ITERATIONS&lt;/span&gt; = 5000;

  &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; *&lt;span style=&quot;color: #a0522d;&quot;&gt;nativeMatrix&lt;/span&gt; = &lt;span style=&quot;color: #a020f0;&quot;&gt;new&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; [X_SIZE * Y_SIZE];

  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;multi_array_ref&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;, 2&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;ImageArrayType&lt;/span&gt;;
  &lt;span style=&quot;color: #228b22;&quot;&gt;ImageArrayType&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;boostMatrix&lt;/span&gt;(nativeMatrix, &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;extents&lt;/span&gt;[X_SIZE][Y_SIZE]);    

  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;Native condition
&lt;/span&gt;  &lt;span style=&quot;color: #228b22;&quot;&gt;ptime&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;startTime&lt;/span&gt; = &lt;span style=&quot;color: #008b8b;&quot;&gt;microsec_clock&lt;/span&gt;::universal_time();
  &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;i&lt;/span&gt; = 0; i &amp;lt; ITERATIONS; ++i)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;y&lt;/span&gt; = 0; y &amp;lt; Y_SIZE; ++y)
          &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;x&lt;/span&gt; = 0; x &amp;lt; X_SIZE; ++x)
              nativeMatrix[x + (y * X_SIZE)] = 2.345;
  &lt;span style=&quot;color: #228b22;&quot;&gt;ptime&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;endTime&lt;/span&gt; = &lt;span style=&quot;color: #008b8b;&quot;&gt;microsec_clock&lt;/span&gt;::universal_time();
  printf(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;[Native]Elapsed time: %6.3f seconds\n&quot;&lt;/span&gt;, time_period(startTime, endTime).length().total_milliseconds()/1000.f);

  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;other conditions   
&lt;/span&gt;  startTime = &lt;span style=&quot;color: #008b8b;&quot;&gt;microsec_clock&lt;/span&gt;::universal_time();
  &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;i&lt;/span&gt; = 0; i &amp;lt; ITERATIONS; ++i)
    {
&lt;span style=&quot;color: #7a378b;&quot;&gt;#ifdef&lt;/span&gt; RAW_POINTER
      &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;end&lt;/span&gt; = boostMatrix.data() + X_SIZE*Y_SIZE;
      &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;begin&lt;/span&gt;=boostMatrix.data(); begin!=end; ++begin)
        *begin = 2.345;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#elif&lt;/span&gt; &lt;span style=&quot;color: #7a378b;&quot;&gt;defined&lt;/span&gt;(BOOST_IT)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;auto&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;it&lt;/span&gt;=boostMatrix.begin(); it!= boostMatrix.end(); ++it)
          &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;auto&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;it2&lt;/span&gt;=(*it).begin(); it2!=(*it).end(); ++it2)
              *it2 = 2.345;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#elif&lt;/span&gt; &lt;span style=&quot;color: #7a378b;&quot;&gt;defined&lt;/span&gt;(BOOST_MA)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;y&lt;/span&gt; = 0; y &amp;lt; Y_SIZE; ++y)
         &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; (&lt;span style=&quot;color: #228b22;&quot;&gt;idx&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;x&lt;/span&gt; = 0; x &amp;lt; X_SIZE; ++x)
             boostMatrix[y][x] = 2.345;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#endif&lt;/span&gt;
    }
  endTime = &lt;span style=&quot;color: #008b8b;&quot;&gt;microsec_clock&lt;/span&gt;::universal_time();
  printf(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;[Boost] Elapsed time: %6.3f seconds\n&quot;&lt;/span&gt;, time_period(startTime,endTime).length().total_milliseconds()/1000.f );

  &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; 0;
}

&lt;/pre&gt;



&lt;p&gt;
All compiled and executed using
&lt;/p&gt;


&lt;pre class=&quot;src src-bash&quot;&gt;g++-4.4 -O3 -g0 -DNDEBUG -march=native -mtune=native --fast-math -std=gnu++0x % &amp;amp;&amp;amp; ./a.out
&lt;/pre&gt;


</content>



 </entry>
 
 <entry>
   <title>Parallelization in Python using Weave and OpenMP</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2011/02/08/parallelization-with-weave-and-openmp.html"/>
   <updated>2011-02-08T00:00:00+01:00</updated>
   <id>/2011/02/08/parallelization-with-weave-and-openmp</id>
   <content type="html">&lt;p&gt;
If you use Python a lot, you have probably stumbled across &lt;a href=&quot;http://www.scipy.org/PerformancePython&quot;&gt;the performance python howto&lt;/a&gt; at some point. Since inner loops are so
expensive in Python, it makes sense to move them to C++, and
PerformancePython shows how to do that. When standard libraries like
numpy and weave.blitz are not expressive enough, your best bet becomes
&lt;a href=&quot;http://www.scipy.org/Weave&quot;&gt;SciPy Weave&lt;/a&gt;. In Weave you write your C++ code in a string and access
your numpy arrays via the blitz converters. The code is recompiled only when
it has changed and can make use of arbitrary libraries, as we shall see.
&lt;/p&gt;

&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;Parallelizing Weave-Code: Per-Pixel eigenvalues &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
Let's suppose we have a 3D image $D$, with the second order derivatives
in $x$, $y$ and $z$ direction given by \(Dxx, Dxy, Dyy, \ldots\). We
want to calculate the eigenvalues at each pixel. For efficiency, we
use &lt;a href=&quot;http://www.netlib.org/lapack/&quot;&gt;LAPACK&lt;/a&gt; and for clarity, we use &lt;a href=&quot;http://www.boost.org/doc/libs/1_45_0/libs/numeric/ublas/doc/index.htm&quot;&gt;Boost uBLAS&lt;/a&gt;. Let's first write down
the C++ code:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;ublas&lt;/span&gt; = &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;numeric&lt;/span&gt;::ublas;
&lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;lapack&lt;/span&gt; = &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;numeric&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;bindings&lt;/span&gt;::lapack;
&lt;span style=&quot;color: #a020f0;&quot;&gt;using&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;ublas&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;bounded_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;,3,3,&lt;span style=&quot;color: #008b8b;&quot;&gt;ublas&lt;/span&gt;::column_major&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;mat&lt;/span&gt;;
&lt;span style=&quot;color: #008b8b;&quot;&gt;ublas&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;bounded_vector&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;,3&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;lambda&lt;/span&gt;;
&lt;span style=&quot;color: #008b8b;&quot;&gt;ublas&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;bounded_vector&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;,34*3&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;work&lt;/span&gt;; &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;the 34*width is from syev.hpp
&lt;/span&gt;
&lt;span style=&quot;color: #228b22;&quot;&gt;mat&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;A&lt;/span&gt;;
&lt;span style=&quot;color: #228b22;&quot;&gt;int&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;i&lt;/span&gt;,&lt;span style=&quot;color: #a0522d;&quot;&gt;j&lt;/span&gt;,&lt;span style=&quot;color: #a0522d;&quot;&gt;k&lt;/span&gt;,&lt;span style=&quot;color: #a0522d;&quot;&gt;idx&lt;/span&gt;;
&lt;span style=&quot;color: #7a378b;&quot;&gt;#pragma&lt;/span&gt; omp parallel &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;private&lt;/span&gt;(i,j,k,A,idx,lambda,work)
&lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(i=0;i&amp;lt;nx;i++){
  &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(j=0;j&amp;lt;ny;j++){
    &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt;(k=0;k&amp;lt;nz;k++){
      A(0,0) = Dxx(i,j,k);  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;fill upper triangle of A
&lt;/span&gt;      A(1,1) = Dyy(i,j,k);
      A(2,2) = Dzz(i,j,k);
      A(0,1) = Dxy(i,j,k);
      A(0,2) = Dxz(i,j,k);
      A(1,2) = Dyz(i,j,k);

      &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;determine eigenvalues (N) of upper (U) triangle of A in lambda
&lt;/span&gt;      &lt;span style=&quot;color: #008b8b;&quot;&gt;lapack&lt;/span&gt;::syev(&lt;span style=&quot;color: #8b2252;&quot;&gt;'N'&lt;/span&gt;,&lt;span style=&quot;color: #8b2252;&quot;&gt;'U'&lt;/span&gt;,A,lambda,&lt;span style=&quot;color: #008b8b;&quot;&gt;lapack&lt;/span&gt;::workspace(work));
      lambda1(i,j,k) = lambda(0);  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;eigenvalues in ascending order
&lt;/span&gt;      lambda2(i,j,k) = lambda(1);
      lambda3(i,j,k) = lambda(2);
    }
  }
 }  
&lt;/pre&gt;



&lt;p&gt;
Note that we use &lt;a href=&quot;http://openmp.org/wp/&quot;&gt;OpenMP&lt;/a&gt; here to parallelize the outermost for-loop using
a &lt;code&gt;#pragma&lt;/code&gt; directive. It took me a while to figure out that one needs
to explicitly name all variables which must be private to a thread in
this for-loop; otherwise you will not notice any speedup and/or you will see weird
program behavior. I suppose further restrictions are that the size of
the variables needs to be known at compile time (here I use
&lt;code&gt;bounded_vector&lt;/code&gt; for that purpose). Finally, the loop index must be of
type &lt;code&gt;int&lt;/code&gt; for OpenMP.
&lt;/p&gt;
&lt;p&gt;
That is only the body of a function. We save this in a python string
called &lt;code&gt;codestring&lt;/code&gt; and execute it as follows:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;from&lt;/span&gt; scipy.weave &lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; inline, converters
variables = &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;nx ny nz lambda1 lambda2 lambda3 Dxx Dyy Dzz Dxy Dxz Dyz&quot;&lt;/span&gt;.split()
inline(codestring,
       variables,
       extra_compile_args=[&lt;span style=&quot;color: #8b2252;&quot;&gt;'-O3'&lt;/span&gt;, &lt;span style=&quot;color: #8b2252;&quot;&gt;'-fopenmp'&lt;/span&gt;],
       extra_link_args=[&lt;span style=&quot;color: #8b2252;&quot;&gt;'-lgomp'&lt;/span&gt;],
       headers=[
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/bindings/lapack/syev.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/bindings/traits/ublas_matrix.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/bindings/traits/ublas_vector.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/ublas/matrix.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/ublas/banded.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;boost/numeric/ublas/vector.hpp&amp;gt;'&lt;/span&gt;,
           &lt;span style=&quot;color: #8b2252;&quot;&gt;'&amp;lt;cmath&amp;gt;'&lt;/span&gt;],
       type_converters=converters.blitz,
       libraries=[&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;lapack&quot;&lt;/span&gt;])
&lt;/pre&gt;



&lt;p&gt;
A couple of things to point out here: since &lt;code&gt;codestring&lt;/code&gt; is only the
&lt;i&gt;body&lt;/i&gt; of a function, we need some other place to put include
files and declarations. Include files are listed in &lt;code&gt;headers&lt;/code&gt;;
additional code with declarations can be supplied using
&lt;code&gt;support_code&lt;/code&gt;.  To use OpenMP, we need the compiler argument
&lt;code&gt;-fopenmp&lt;/code&gt;, and the linker needs to link the &lt;code&gt;gomp&lt;/code&gt; library. The
&lt;code&gt;type_converters&lt;/code&gt; argument tells Weave to copy conversion
routines into the final program. These convert NumPy arrays into Blitz
arrays, which have nice accessor operators.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-2&quot;&gt;Second Order Parallelization &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
We can go even one step further and parallelize the resulting code
over multiple processors or even computers. In Python, a comparatively
new and exciting way to achieve this is &lt;a href=&quot;http://ipython.scipy.org/doc/rel-0.9.1/html/parallel/index.html&quot;&gt;Parallel IPython&lt;/a&gt;. I'll leave
the details on this to a later posting. Instead, I'll point to a
problem I found while running Weave code in parallel on multiple machines.
&lt;/p&gt;
&lt;p&gt;
When Weave compiles your code, it writes the generated source code to
&lt;code&gt;~/.python26_compiled&lt;/code&gt;. The file name is created from a hash code over
the contained source code. This way, Weave knows when to recompile and
when to reuse an old binary. 
&lt;/p&gt;
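&lt;p&gt;
To illustrate the idea of naming a cached file after a hash of its
source, here is a minimal sketch in plain Python. It is not Weave's
actual implementation: the function names, the &lt;code&gt;cached_&lt;/code&gt; prefix and
the choice of MD5 are assumptions made only for this example.
&lt;/p&gt;

&lt;pre class=&quot;src src-python&quot;&gt;
```python
import hashlib
import os


def cache_path(source, cache_dir):
    """Return a file name inside cache_dir derived from a hash of the source text."""
    # Identical source text always yields the identical file name.
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, "cached_" + digest + ".cpp")


def needs_compile(source, cache_dir):
    """True if no cached file exists yet for exactly this source text."""
    return not os.path.exists(cache_path(source, cache_dir))
```
&lt;/pre&gt;

&lt;p&gt;
Because the name depends only on the source text, two processes that
run the same code compute the same path, which is exactly why they can
collide when they both find the file missing.
&lt;/p&gt;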
&lt;p&gt;
Now to the downside: 
&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Multiple Cores&lt;/dt&gt;&lt;dd&gt;
You will get problems if you run multiple
instances at the same time: They try to write the same source code
file at the same time.
&lt;/dd&gt;
&lt;dt&gt;Multiple Computers&lt;/dt&gt;&lt;dd&gt;
You will get the same problem if your home
directory is on a file system shared between the machines (NFS or the like).

&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;My solution to problem number one is to compile once with a single
thread, then reuse the compiled binary from all threads. Problem two can be solved by
symlinking &lt;code&gt;~/.python26_compiled&lt;/code&gt; to &lt;code&gt;/tmp&lt;/code&gt;.
&lt;/p&gt;
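&lt;p&gt;
The symlink can also be set up from Python before any Weave code
runs. The following is only a sketch: &lt;code&gt;redirect_cache&lt;/code&gt; is a made-up
helper name, and it is written for a current Python 3, not for the
Python 2.6 used in the rest of this post.
&lt;/p&gt;

&lt;pre class=&quot;src src-python&quot;&gt;
```python
import os


def redirect_cache(cache_link, local_dir):
    """Make cache_link a symlink to local_dir, creating local_dir if needed.

    Returns True if the link was created, False if cache_link already exists.
    """
    if os.path.lexists(cache_link):
        return False  # do not touch an existing directory or link
    os.makedirs(local_dir, exist_ok=True)
    os.symlink(local_dir, cache_link)
    return True
```
&lt;/pre&gt;

&lt;p&gt;
A call could look like
&lt;code&gt;redirect_cache(os.path.expanduser(&quot;~/.python26_compiled&quot;), &quot;/tmp/python26_compiled&quot;)&lt;/code&gt;,
where the directory below &lt;code&gt;/tmp&lt;/code&gt; is a hypothetical name. Since &lt;code&gt;/tmp&lt;/code&gt; is
local to each machine, every computer then writes to its own cache.
&lt;/p&gt;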
&lt;/div&gt;
&lt;/div&gt;
</content>



 </entry>
 
 <entry>
   <title>Dijkstra on boost grid graph</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2011/01/27/dijkstra-on-boost-grid-graph.html"/>
   <updated>2011-01-27T00:00:00+01:00</updated>
   <id>/2011/01/27/dijkstra-on-boost-grid-graph</id>
   <content type="html">&lt;p&gt;
There is a whole science of finding tubular structures in
3D-images. It started sometime in the late nineties. One of the most
cited papers&lt;sup&gt;&lt;a class=&quot;footref&quot; name=&quot;fnr.1&quot; href=&quot;#fn.1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; in this area shows how to use eigenvalues of the
local second order derivatives to identify tubular structures
heuristically. It also shows how to combine the results on multiple scales.
&lt;/p&gt;


&lt;table style=&quot;width:auto;&quot;&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://picasaweb.google.com/lh/photo/XK2FSphF3ijyxWE1YfWgerCSrj332j42BwVaSPfkN6k?feat=embedwebsite&quot;&gt;&lt;img src=&quot;http://lh6.ggpht.com/_hkhVmTxHDxo/TUQ8CIB9zpI/AAAAAAAACBg/uNIQinz8ntw/s144/raw-4.jpg&quot; height=&quot;126&quot; width=&quot;144&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;font-family:arial,sans-serif; font-size:11px; text-align:right&quot;&gt; &lt;a href=&quot;http://picasaweb.google.com/boiling.complex.9253826/NMRRootProcessing?authkey=Gv1sRgCMyb567amYDAzwE&amp;amp;feat=embedwebsite&quot;&gt;Raw Data&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;table style=&quot;width:auto;&quot;&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://picasaweb.google.com/lh/photo/ypVatTBChSnGXAzCLOSuVrCSrj332j42BwVaSPfkN6k?feat=embedwebsite&quot;&gt;&lt;img src=&quot;http://lh3.ggpht.com/_hkhVmTxHDxo/TUQ8Dvhk_bI/AAAAAAAACBo/LZsn_5veUrc/s144/sato-4.jpg&quot; height=&quot;126&quot; width=&quot;144&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;font-family:arial,sans-serif; font-size:11px; text-align:right&quot;&gt; &lt;a href=&quot;http://picasaweb.google.com/boiling.complex.9253826/NMRRootProcessing?authkey=Gv1sRgCMyb567amYDAzwE&amp;amp;feat=embedwebsite&quot;&gt;After Pre-processing&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;

&lt;p&gt;
That is quite neat, but the approach has a major problem: It does not
assume any global structure. Therefore, where the data lacks local
evidence for a tubular structure, you still do not see it. Furthermore,
you need a method that allows you to say &quot;points
x and y lie on the same tube&quot;. This, and determining properties of the
tubes, are the problems which most of the follow-up research strives
to solve.
&lt;/p&gt;
&lt;p&gt;
I wanted to have a quick-and-dirty approach to the problem. For a
given seed point, I would like to know the connectivity of the tubular
structure to this seed point. I pre-processed the raw image using the
method of Sato et al.&lt;sup&gt;&lt;a class=&quot;footref&quot; name=&quot;fnr.1.2&quot; href=&quot;#fn.1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and determined the seed point. The graph is
defined implicitly over the 3D-image using voxel-neighborhoods, so we
could in theory just run Dijkstra on it. I was lazy, however, and did
not want to implement it all.
&lt;/p&gt;

&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;BGL Grid Graph &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
A quick Google search turned up &lt;code&gt;boost::grid_graph&lt;/code&gt;, a relatively new
member of the &lt;a href=&quot;http://www.boost.org/doc/libs/1_42_0/libs/graph/doc/table_of_contents.html&quot;&gt;Boost Graph Library (BGL)&lt;/a&gt;. It defines a graph on a grid
without &lt;i&gt;representing&lt;/i&gt; everything. Neat. The documentation is still a
bit short and sadly, only a 6-neighborhood (in 3D) is supported,
but who'd expect a free lunch?
&lt;/p&gt;
&lt;p&gt;
Usability is (at first) easy as pie:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;grid_graph&lt;/span&gt;&amp;lt;3&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_traits&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;Traits&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;Traits&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;edge_descriptor&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_descriptor&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;Traits&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;vertex_descriptor&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;vertex_descriptor&lt;/span&gt;;

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;...
&lt;/span&gt;
&lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;array&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;vidx_t&lt;/span&gt;, 3&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;lengths&lt;/span&gt; = { { 256, 256, 256 } };
&lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;graph&lt;/span&gt;(lengths, &lt;span style=&quot;color: #008b8b;&quot;&gt;false&lt;/span&gt;); &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;do not wrap any dimensions
&lt;/span&gt;&lt;/pre&gt;




&lt;/div&gt;

&lt;div id=&quot;outline-container-1_1&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-1_1&quot;&gt;Defining properties of edges in a grid graph &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-1_1&quot;&gt;


&lt;p&gt;
Unlike &lt;a href=&quot;http://www.cs.brown.edu/~jwicks/boost/libs/graph/doc/using_adjacency_list.html#sec:adjacency-list-properties&quot;&gt;graphs with internal properties&lt;/a&gt;, we have to define
&lt;i&gt;external&lt;/i&gt; properties for &lt;code&gt;graph_t&lt;/code&gt;. This is currently not documented
well, and it took me some time to figure out:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
In most BGL examples, nodes are indexed using an integer type. The
&lt;code&gt;grid_graph&lt;/code&gt;, however, uses n-tuples of type &lt;code&gt;boost::array&amp;lt;n&amp;gt;&lt;/code&gt;.
This means that most examples cannot be applied to &lt;code&gt;grid_graph&lt;/code&gt; in a
straightforward way: You cannot index an array by a tuple without
doing some tricks first.

&lt;/li&gt;
&lt;li&gt;
We would like to have an arbitrary function of the nodes as the
weight.  Most BGL examples simply use an array for whichever
property they put in their graph.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first problem can be solved using shared array property maps
(which do not seem to be well-documented either):
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;shared_array_property_map&lt;/span&gt;&amp;lt;vertex_descriptor,
                          property_map&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;vertex_index_t&lt;/span&gt;&amp;gt;::const_type&amp;gt;
                          &lt;span style=&quot;color: #0000ff;&quot;&gt;p_map&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;num_vertices&lt;/span&gt;(&lt;span style=&quot;color: #a0522d;&quot;&gt;graph&lt;/span&gt;), get(vertex_index, graph));
&lt;span style=&quot;color: #228b22;&quot;&gt;shared_array_property_map&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt;,
                          property_map&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;vertex_index_t&lt;/span&gt;&amp;gt;::const_type&amp;gt;
                          &lt;span style=&quot;color: #0000ff;&quot;&gt;d_map&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;num_vertices&lt;/span&gt;(&lt;span style=&quot;color: #a0522d;&quot;&gt;graph&lt;/span&gt;), get(vertex_index, graph));
&lt;/pre&gt;



&lt;p&gt;
We will use &lt;code&gt;p_map&lt;/code&gt; as our predecessor map, which maps a vertex index (integer)
to a vertex descriptor (n-tuple).
&lt;/p&gt;
&lt;p&gt;
The second map &lt;code&gt;d_map&lt;/code&gt; is a mapping from vertex index to double, which
will be our distance map.
&lt;/p&gt;
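&lt;p&gt;
As a rough illustration of the &quot;trick&quot; that the index map performs,
the following Python sketch turns an n-tuple into a position in a flat
array. The function name &lt;code&gt;vertex_index&lt;/code&gt; and the small grid size are
made up for this example, and the assumption that the first coordinate
varies fastest is mine; check the &lt;code&gt;grid_graph&lt;/code&gt; documentation for the
ordering BGL actually uses.
&lt;/p&gt;

&lt;pre class=&quot;src src-python&quot;&gt;
```python
def vertex_index(vertex, lengths):
    """Linear index of an n-tuple vertex in a grid with the given side lengths.

    Assumes the first coordinate varies fastest.
    """
    index = 0
    stride = 1
    for coord, length in zip(vertex, lengths):
        index += coord * stride
        stride *= length
    return index


lengths = (4, 5, 6)  # a small example grid
distances = [0.0] * (lengths[0] * lengths[1] * lengths[2])  # flat storage
# Store a value for the vertex (3, 4, 5) by way of its linear index.
distances[vertex_index((3, 4, 5), lengths)] = 1.5
```
&lt;/pre&gt;

&lt;p&gt;
A property map built on such an index function can keep one value per
vertex in an ordinary array, even though the vertices themselves are
tuples.
&lt;/p&gt;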
&lt;p&gt;
Finally, we need to create an &lt;code&gt;edge_weight&lt;/code&gt; for &lt;code&gt;graph_t&lt;/code&gt;, and it
should be a &lt;i&gt;function&lt;/i&gt;, not an array. For this, I peeked at W.P. McNeill's
&lt;a href=&quot;https://github.com/wpm/Boost-Implicit-Graph-Example&quot;&gt;Boost-Implicit-Graph Example&lt;/a&gt; on &lt;a href=&quot;https://github.com/&quot;&gt;GitHub&lt;/a&gt;.
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;struct&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt; {
  &lt;span style=&quot;color: #a020f0;&quot;&gt;template&lt;/span&gt;&amp;lt;&amp;gt;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;struct&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;property_map&lt;/span&gt;&amp;lt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_t&lt;/span&gt; &amp;gt; {
    &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;type&lt;/span&gt;;
    &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;const_type&lt;/span&gt;;
  };
}

&lt;span style=&quot;color: #b22222;&quot;&gt;/*&lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;
   Map from edges to weight values
*/&lt;/span&gt;
&lt;span style=&quot;color: #a020f0;&quot;&gt;struct&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map&lt;/span&gt; {
  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;value_type&lt;/span&gt;;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;value_type&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;reference&lt;/span&gt;;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;edge_descriptor&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;key_type&lt;/span&gt;;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;readable_property_map_tag&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;category&lt;/span&gt;;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;m_graph&lt;/span&gt;;
  &lt;span style=&quot;color: #0000ff;&quot;&gt;edge_weight_map&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;g&lt;/span&gt;)
  :m_graph(g) { }
  &lt;span style=&quot;color: #228b22;&quot;&gt;reference&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;operator&lt;/span&gt;&lt;span style=&quot;color: #0000ff;&quot;&gt;[]&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;key_type&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;e&lt;/span&gt;) &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt;; &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;implemented below
&lt;/span&gt;};

&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;property_map&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;, &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_t&lt;/span&gt;&amp;gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;const_type&lt;/span&gt;
        &lt;span style=&quot;color: #228b22;&quot;&gt;const_edge_weight_map&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;property_traits&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;const_edge_weight_map&lt;/span&gt;&amp;gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;reference&lt;/span&gt;
        &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_value_type&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;property_traits&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;const_edge_weight_map&lt;/span&gt;&amp;gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;key_type&lt;/span&gt;
        &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_key&lt;/span&gt;;

&lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;{
    &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;PropertyMap valid expressions
&lt;/span&gt;    &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_value_type&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;const_edge_weight_map&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;pmap&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_key&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;e&lt;/span&gt;) {
        &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; pmap[e]; }
    &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;ReadablePropertyGraph valid expressions
&lt;/span&gt;    &lt;span style=&quot;color: #228b22;&quot;&gt;const_edge_weight_map&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get&lt;/span&gt;(&lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::edge_weight_t, &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;&amp;amp;&lt;span style=&quot;color: #a0522d;&quot;&gt;g&lt;/span&gt;) {
        &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; const_edge_weight_map(g); }
    &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_value_type&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get&lt;/span&gt;(&lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_t&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;tag&lt;/span&gt;, &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;graph_t&lt;/span&gt;&amp;amp; &lt;span style=&quot;color: #a0522d;&quot;&gt;g&lt;/span&gt;, &lt;span style=&quot;color: #228b22;&quot;&gt;edge_weight_map_key&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;e&lt;/span&gt;) {
        &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; get(tag, g)[e]; }
}

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;whoa, lots of typedefs, but now we can write the function that we
&lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;wanted to write: Map edges to weights!
&lt;/span&gt;&lt;span style=&quot;color: #008b8b;&quot;&gt;edge_weight_map&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;reference&lt;/span&gt;
&lt;span style=&quot;color: #008b8b;&quot;&gt;edge_weight_map&lt;/span&gt;::&lt;span style=&quot;color: #a020f0;&quot;&gt;operator&lt;/span&gt;&lt;span style=&quot;color: #0000ff;&quot;&gt;[]&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;key_type&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;e&lt;/span&gt;) &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; {
    &lt;span style=&quot;color: #228b22;&quot;&gt;vertex_descriptor&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;t&lt;/span&gt; = target(e,m_graph);
    &lt;span style=&quot;color: #228b22;&quot;&gt;vertex_descriptor&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;s&lt;/span&gt; = source(e,m_graph);
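    &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;placeholder: compute any non-negative weight f(s,t) here
&lt;/span&gt;    &lt;span style=&quot;color: #228b22;&quot;&gt;double&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt; = 1.0;
    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; f;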
}
&lt;/pre&gt;



&lt;p&gt;
That was nastier than I expected, but it is still not &lt;i&gt;that&lt;/i&gt; much
code to write.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-1_2&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-1_2&quot;&gt;Representing the data in a Boost Multi-Array &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-1_2&quot;&gt;


&lt;p&gt;
Finally, we need to represent 3D-data. I hate dealing with raw memory,
so I always wrap it in some convenient abstracting structure the first
chance I get. 3D-arrays are definitely a case for &lt;a href=&quot;http://www.boost.org/doc/libs/1_45_0/libs/multi_array/doc/index.html&quot;&gt;Boost Multi-Array&lt;/a&gt;. 
It's not a big deal to use, but I want to advertise its simplicity here:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;multi_array_ref&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;unsigned&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;, 3&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;array_type&lt;/span&gt;;

&lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;ifstream&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;dat&lt;/span&gt;(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;raw.dat&quot;&lt;/span&gt;, &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;ios&lt;/span&gt;::in | &lt;span style=&quot;color: #008b8b;&quot;&gt;std&lt;/span&gt;::&lt;span style=&quot;color: #008b8b;&quot;&gt;ios&lt;/span&gt;::binary);
&lt;span style=&quot;color: #228b22;&quot;&gt;unsigned&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;data&lt;/span&gt; = &lt;span style=&quot;color: #a020f0;&quot;&gt;new&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;unsigned&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;[256*256*256];
dat.read((&lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;*)data,256*256*256);
&lt;span style=&quot;color: #228b22;&quot;&gt;array_type&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;A&lt;/span&gt;(data,&lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;extents&lt;/span&gt;[256][256][256]);
dat.close();
&lt;/pre&gt;



&lt;p&gt;
By the way, you can also change the base of this array as well as the
memory order, so if you need to port Matlab (TM) code to a programming
language (no adjective here on purpose), you can't get much simpler.
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-1_3&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;sec-1_3&quot;&gt;Finish line: Running Dijkstra &lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-1_3&quot;&gt;


&lt;p&gt;
This is trivial now and straight from the book:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #008b8b;&quot;&gt;boost&lt;/span&gt;::&lt;span style=&quot;color: #228b22;&quot;&gt;array&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;vidx_t&lt;/span&gt;, 3&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;start&lt;/span&gt;  = { { 109, 129,  24 } };
dijkstra_shortest_paths(graph, start
        ,predecessor_map(p_map)
        .distance_map(d_map)
        );
&lt;/pre&gt;



&lt;p&gt;
Note the &quot;.&quot; before &lt;code&gt;distance_map&lt;/code&gt;: it chains a named argument called
&lt;code&gt;distance_map&lt;/code&gt;, which is set to &lt;code&gt;d_map&lt;/code&gt;. Likewise, &lt;code&gt;predecessor_map&lt;/code&gt; is set to &lt;code&gt;p_map&lt;/code&gt;.
&lt;/p&gt;


&lt;table style=&quot;width:auto;&quot;&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://picasaweb.google.com/lh/photo/CX1qPd1r7IP_RrAuinhVKbCSrj332j42BwVaSPfkN6k?feat=embedwebsite&quot;&gt;&lt;img src=&quot;http://lh4.ggpht.com/_hkhVmTxHDxo/TUQ8BNA3NbI/AAAAAAAACBY/a0RrGF_qbSo/s144/L2_22aug_Ansicht2_Dijkstra.jpg&quot; height=&quot;126&quot; width=&quot;144&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;font-family:arial,sans-serif; font-size:11px; text-align:right&quot;&gt; &lt;a href=&quot;http://picasaweb.google.com/boiling.complex.9253826/NMRRootProcessing?authkey=Gv1sRgCMyb567amYDAzwE&amp;amp;feat=embedwebsite&quot;&gt;Thresholded Distances&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;

&lt;p&gt;
We can now trace paths starting at points which have a high value in
the raw image:
&lt;/p&gt;


&lt;table style=&quot;width:auto;&quot;&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://picasaweb.google.com/lh/photo/35H4UuBUDPzDCrkWSAcaSrCSrj332j42BwVaSPfkN6k?feat=embedwebsite&quot;&gt;&lt;img src=&quot;http://lh3.ggpht.com/_hkhVmTxHDxo/TUQ8BygjVLI/AAAAAAAACBc/s1rWGmNQ_7w/s144/L2_22aug_Ansicht2_tree.jpg&quot; height=&quot;126&quot; width=&quot;144&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;font-family:arial,sans-serif; font-size:11px; text-align:right&quot;&gt; &lt;a href=&quot;http://picasaweb.google.com/boiling.complex.9253826/NMRRootProcessing?authkey=Gv1sRgCMyb567amYDAzwE&amp;amp;feat=embedwebsite&quot;&gt;Traced paths starting at thresholded raw data&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
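
&lt;p&gt;
The tracing step itself does not depend on Boost. Once Dijkstra has filled a
predecessor map, a path is recovered by following predecessors from an end
point back to the start. Below is a minimal Python sketch of that idea, not the
code used for the images above. It assumes a hypothetical dictionary
&lt;code&gt;pred&lt;/code&gt; that maps each voxel index to its predecessor and maps the
start voxel to itself; the names &lt;code&gt;trace_path&lt;/code&gt; and &lt;code&gt;pred&lt;/code&gt;
and the toy coordinates are made up for illustration.
&lt;/p&gt;

```python
def trace_path(pred, end):
    """Follow predecessors from `end` until a vertex that is its own predecessor."""
    path = [end]
    while pred[path[-1]] != path[-1]:
        path.append(pred[path[-1]])
    path.reverse()  # start voxel first, end voxel last
    return path


# Toy predecessor map on voxel coordinates (hypothetical values).
start = (109, 129, 24)
pred = {
    start: start,                    # the start is its own predecessor
    (109, 129, 25): start,
    (110, 129, 25): (109, 129, 25),
}
print(trace_path(pred, (110, 129, 25)))
```

&lt;p&gt;
A voxel that is its own predecessor is either the start or was never reached,
so in practice one would also check its distance before accepting the path.
&lt;/p&gt;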




&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&quot;footnotes&quot;&gt;
&lt;h2 class=&quot;footnotes&quot;&gt;Footnotes: &lt;/h2&gt;
&lt;div id=&quot;text-footnotes&quot;&gt;
&lt;p class=&quot;footnote&quot;&gt;&lt;sup&gt;&lt;a class=&quot;footnum&quot; name=&quot;fn.1&quot; href=&quot;#fnr.1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; &quot;3D multi-scale line filter for segmentation and visualization
of curvilinear structures in medical images&quot; by Sato et al.,
CVRMed-MRCAS, 1997 (&lt;a href=&quot;http://www.image.med.osaka-u.ac.jp/member/yoshi/paper/linefilter.pdf&quot;&gt;PDF&lt;/a&gt;)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content>



 </entry>
 
 <entry>
   <title>Views on constant matrices in CUV</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/12/17/cuv_const_matrices.html"/>
   <updated>2010-12-17T00:00:00+01:00</updated>
   <id>/2010/12/17/cuv_const_matrices</id>
   <content type="html">&lt;p&gt;
Sometimes the strong typing of C++ is really annoying. For example,
when you have a class 
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;template&lt;/span&gt; &amp;lt;&lt;span style=&quot;color: #a020f0;&quot;&gt;class&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;T&lt;/span&gt;&amp;gt;
&lt;span style=&quot;color: #a020f0;&quot;&gt;struct&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;C&lt;/span&gt;{
   &lt;span style=&quot;color: #228b22;&quot;&gt;T&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;p&lt;/span&gt;;
   &lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;T&lt;/span&gt;* &lt;span style=&quot;color: #0000ff;&quot;&gt;getp&lt;/span&gt;()&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt;{ &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; p;}
   &lt;span style=&quot;color: #0000ff;&quot;&gt;C&lt;/span&gt;(&lt;span style=&quot;color: #228b22;&quot;&gt;T&lt;/span&gt;* &lt;span style=&quot;color: #a0522d;&quot;&gt;_p&lt;/span&gt;):p(_p){}
};
&lt;/pre&gt;



&lt;p&gt;
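and you want to create a const instance of class C, you would assume
that this is possible with a pointer to const data. However, C++ does not
allow this: the constructor takes a &lt;code&gt;T*&lt;/code&gt;, and the &lt;code&gt;const T*&lt;/code&gt; returned by
&lt;code&gt;getp()&lt;/code&gt; cannot be converted to it implicitly: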
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;C&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt;;

&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;=0.f;
&lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt;       &lt;span style=&quot;color: #228b22;&quot;&gt;a&lt;/span&gt;(&amp;amp;&lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;);        &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;ok
&lt;/span&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;b&lt;/span&gt;(&amp;amp;&lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;);        &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;ok
&lt;/span&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;c&lt;/span&gt;(b.getp());  &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;compiler error!
&lt;/span&gt;&lt;/pre&gt;



&lt;p&gt;
This use case occurred for us when we tried to create a column-major
view on a const row-major matrix and vice versa. Both matrices should
share the same memory, and the data should be immutable through either of them.
All of these conditions are met, but C++ complains nevertheless.
&lt;/p&gt;
&lt;p&gt;
The easiest way around this problem is to change the
value type of the matrix view we want to create:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;C&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;C&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;B&lt;/span&gt;;

&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;=0.f;
&lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt;       &lt;span style=&quot;color: #228b22;&quot;&gt;a&lt;/span&gt;(&amp;amp;&lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;);
&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;b&lt;/span&gt;(&amp;amp;&lt;span style=&quot;color: #a0522d;&quot;&gt;f&lt;/span&gt;);

&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;B&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;c&lt;/span&gt;(b.getp());
&lt;/pre&gt;



&lt;p&gt;
Now the member pointer is of type &lt;code&gt;const float*&lt;/code&gt;, which can be
initialized using the &lt;code&gt;const float*&lt;/code&gt; we receive from &lt;code&gt;b.getp()&lt;/code&gt;.
&lt;/p&gt;
&lt;p&gt;
In CUV this looks as follows:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;dense_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;,column_major,dev_memory_space&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt;;
&lt;span style=&quot;color: #a020f0;&quot;&gt;typedef&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;dense_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;,row_major,dev_memory_space&amp;gt; &lt;span style=&quot;color: #228b22;&quot;&gt;B&lt;/span&gt;;

&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;A&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;a&lt;/span&gt;(16,32);
&lt;span style=&quot;color: #a020f0;&quot;&gt;const&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;B&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;b&lt;/span&gt;(32,16,a.ptr(),&lt;span style=&quot;color: #008b8b;&quot;&gt;true&lt;/span&gt;);
&lt;/pre&gt;



&lt;p&gt;
With this, you can dispatch functions working on rows and columns on
column-major matrices so that they also work on columns and rows (!)
of row-major matrices.
&lt;/p&gt;</content>



 </entry>
 
 <entry>
   <title>Operators in CUV</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/12/15/cuv_operators.html"/>
   <updated>2010-12-15T00:00:00+01:00</updated>
   <id>/2010/12/15/cuv_operators</id>
   <content type="html">&lt;p&gt;
CUV now features operators for matrices in C++ and Python! 
&lt;/p&gt;
&lt;p&gt;
That means you can now write
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;using&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;cuv&lt;/span&gt;;

&lt;span style=&quot;color: #228b22;&quot;&gt;dense_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;, column_major&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;A&lt;/span&gt;(16,16);
&lt;span style=&quot;color: #228b22;&quot;&gt;dense_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;, column_major&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;B&lt;/span&gt;(16,16);

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;...
&lt;/span&gt;
&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;this involves copying a temporary matrix
&lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;and could be avoided using a strategy like 
&lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;unalias of boost.ublas (TBD)
&lt;/span&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;dense_matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;, column_major&amp;gt; &lt;span style=&quot;color: #a0522d;&quot;&gt;C&lt;/span&gt; = A-B;

C += B;   &lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;this is fast and equivalent to:
&lt;/span&gt;C.apply_binary_functor(C,B,BF_ADD);
&lt;/pre&gt;



&lt;p&gt;
In Python, it looks almost the same, and in fact, the operators are
&lt;a href=&quot;http://www.boost.org/doc/libs/1_45_0/libs/python/doc/tutorial/doc/html/python/exposing.html#python.class_operators_special_functions&quot;&gt;directly exported&lt;/a&gt; using Boost.Python!
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;A = dev_matrix_cmf(16,16)
B = dev_matrix_cmf(16,16)
C  = A-B
C += B     &lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;which again is equivalent to
&lt;/span&gt;apply_binary_functor(C,B,binary_functor.ADD)
&lt;/pre&gt;



&lt;p&gt;
In passing, we also fixed most of the weird const-cast related
problems. Now the only const casts left are the ones needed because
&lt;a href=&quot;http://code.google.com/p/thrust/&quot;&gt;thrust&lt;/a&gt; does not support const references (yet?).
&lt;/p&gt;</content>



 </entry>
 
 <entry>
   <title>Nearest Neighbor Classifier in CUV</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/12/15/cuv_1nn.html"/>
   <updated>2010-12-15T00:00:00+01:00</updated>
   <id>/2010/12/15/cuv_1nn</id>
   <content type="html">&lt;p&gt;
After reading this &lt;a href=&quot;http://blog.smola.org/post/969195661/in-praise-of-the-second-binomial-formula&quot;&gt;blog post by Alex Smola&lt;/a&gt; I implemented a
one-nearest-neighbor classifier in &lt;a href=&quot;https://github.com/deeplearningais/CUV&quot;&gt;CUV&lt;/a&gt;. Apart from a bug that popped
up, I was happy to see that no changes to the library were needed,
and the code is very short, so I'll share the implementation here.
&lt;/p&gt;
&lt;p&gt;
There is a small annoyance which I'll fix later:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
There is no &lt;code&gt;argmin_col&lt;/code&gt;, so for now we resort to multiplying by
&lt;code&gt;-1&lt;/code&gt; and then using &lt;code&gt;argmax_col&lt;/code&gt;. I'll clean this mess up
later, probably by adding an &lt;code&gt;argFunc_col&lt;/code&gt; functor.

&lt;/li&gt;
&lt;/ul&gt;




&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; cuv_python &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; cp
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; pyublas
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; numpy &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; np

&lt;span style=&quot;color: #a020f0;&quot;&gt;class&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;KNN&lt;/span&gt;:
    &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot; calculates the labels of a test set by determining the closest
    instance in a training set and returning the corresponding label.&quot;&quot;&quot;&lt;/span&gt;
    &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;__init__&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;, data, data_l):
        &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot;
        data:   a (number_instances x dimensionality) numpy matrix
        data_l: a number_instances numpy vector containing the labels
        &quot;&quot;&quot;&lt;/span&gt;
        &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data   = cp.push(data)
        &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data_l = data_l
        &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.dsq    = cp.dev_matrix_cmf(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data.h,1)
        cp.reduce_to_col(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.dsq.vec,&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data,cp.reduce_functor.ADD_SQUARED)
    &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;__get_distance_matrix&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;, test):
        &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot;
        test: a (number_test_instances x dimensionality) numpy matrix
        returns: a (number_instances x number_test_instances) CUV distance matrix
        &quot;&quot;&quot;&lt;/span&gt;
        t   = cp.push(test)
        &lt;span style=&quot;color: #a020f0;&quot;&gt;assert&lt;/span&gt; t.w == &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data.w
        tsq = cp.dev_matrix_cmf(t.h, 1)
        cp.reduce_to_col(tsq.vec,t,cp.reduce_functor.ADD_SQUARED)
        p   = cp.dev_matrix_cmf(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data.h, t.h)
        cp.prod(p, &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data, t, &lt;span style=&quot;color: #8b2252;&quot;&gt;'n'&lt;/span&gt;,&lt;span style=&quot;color: #8b2252;&quot;&gt;'t'&lt;/span&gt;,-2, 0)
        cp.matrix_plus_col(p,&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.dsq.vec)
        cp.matrix_plus_row(p,tsq.vec)
        &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; p
    &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;run&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;,test):
        &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot;
        test:    a (number_test_instances x dimensionality) numpy matrix
        returns: the estimated labels of the test instances
        &quot;&quot;&quot;&lt;/span&gt;
        p = &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.__get_distance_matrix(test)
        p *= -1.                &lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;no argmin supported yet, sorry
&lt;/span&gt;        idx = cp.dev_matrix_cmi(test.shape[0],1)
        cp.argmax_to_row(idx.vec, p)
        hidx  = idx.np.reshape(idx.h)
        &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data_l.reshape(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data.h)[hidx]
&lt;/pre&gt;



&lt;p&gt;
As suggested in the original post, this is a very fast method to
calculate the nearest neighbor. Instead of looping over all possible
pairs $x\in X$ and $z\in Z$ calculating the distance
&lt;/p&gt;


\[D_{ij} = \|x_i-z_j\|^2,\]

&lt;p&gt;
we first precalculate $\|x_i\|^2$ and $\|z_j\|^2$ and then use a fast
matrix multiplication and two additions to determine the distance
above (derived by expanding the second binomial formula):
&lt;/p&gt;


\[D_{ij} = \|x_i\|^2 + \|z_j\|^2 - 2 (XZ^T)_{ij}.\]
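
&lt;p&gt;
The same expansion can be checked on the CPU with plain NumPy. The following is
a small sketch, not part of the CUV code above: the function name
&lt;code&gt;squared_distances&lt;/code&gt; and the random toy data are assumptions made for
illustration. It computes the squared distance between every row $x_i$ of $X$
and every row $z_j$ of $Z$ with one matrix product and two broadcast additions,
and compares the result with a brute-force loop over all pairs.
&lt;/p&gt;

```python
import numpy as np


def squared_distances(X, Z):
    """Return D with D[i, j] = ||X[i] - Z[j]||^2 using one matrix product."""
    xsq = (X ** 2).sum(axis=1)[:, None]   # column vector of ||x_i||^2
    zsq = (Z ** 2).sum(axis=1)[None, :]   # row vector of ||z_j||^2
    return xsq + zsq - 2.0 * X.dot(Z.T)


rng = np.random.RandomState(0)
X = rng.rand(5, 3)   # 5 training instances, 3 dimensions (toy sizes)
Z = rng.rand(4, 3)   # 4 test instances

D = squared_distances(X, Z)

# Brute-force reference: loop over all pairs.
D_ref = np.array([[((x - z) ** 2).sum() for z in Z] for x in X])
print(np.allclose(D, D_ref))

# Index of the closest training instance for each test instance.
nearest = D.argmin(axis=0)
```

&lt;p&gt;
Here &lt;code&gt;argmin&lt;/code&gt; is available directly, so the multiplication by
&lt;code&gt;-1&lt;/code&gt; used in the CUV version is not needed.
&lt;/p&gt;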

&lt;p&gt;
We can measure the speed of the implementation by running the above
program on the GPU or the CPU. The CPU implementation uses cBLAS; for the
comparison, I'm using a 3.2 GHz CPU and a GTX580.
&lt;/p&gt;
&lt;table class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;
&lt;caption&gt;&lt;/caption&gt;
&lt;colgroup&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;right&quot; /&gt;&lt;col align=&quot;right&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt;Method&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;Time (s)&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;Relative time (GPU = 1)&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;CUV CPU (cBLAS)&lt;/td&gt;&lt;td&gt;58.35&lt;/td&gt;&lt;td&gt;34&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CUV GPU (cuBLAS)&lt;/td&gt;&lt;td&gt;1.71&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
To test this on &lt;a href=&quot;http://yann.lecun.com/exdb/mnist/&quot;&gt;MNIST&lt;/a&gt;, save the above as &lt;code&gt;knn.py&lt;/code&gt; and run the following source:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; os
&lt;span style=&quot;color: #a020f0;&quot;&gt;from&lt;/span&gt; knn &lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; KNN
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; numpy &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; np

&lt;span style=&quot;color: #a020f0;&quot;&gt;class&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;MNIST&lt;/span&gt;:
  &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot; This simply reads the MNIST files to main memory and converts them to float &quot;&quot;&quot;&lt;/span&gt;
  &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;__init__&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;,dir):
      &lt;span style=&quot;color: #a020f0;&quot;&gt;from&lt;/span&gt; scipy.io.numpyio &lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; fread
      fd = &lt;span style=&quot;color: #a020f0;&quot;&gt;open&lt;/span&gt;(dir+&lt;span style=&quot;color: #8b2252;&quot;&gt;'/train-labels.idx1-ubyte'&lt;/span&gt;)
      fread(fd,8,&lt;span style=&quot;color: #8b2252;&quot;&gt;'c'&lt;/span&gt;)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data_labels = np.fromfile(file=fd, dtype=np.uint8).reshape( 60000 )
      fd.close()

      fd = &lt;span style=&quot;color: #a020f0;&quot;&gt;open&lt;/span&gt;(dir+&lt;span style=&quot;color: #8b2252;&quot;&gt;'/train-images.idx3-ubyte'&lt;/span&gt;)
      fread(fd,16,&lt;span style=&quot;color: #8b2252;&quot;&gt;'c'&lt;/span&gt;)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data = np.fromfile(file=fd, dtype=np.uint8).reshape( (60000,784) )
      fd.close()

      fd = &lt;span style=&quot;color: #a020f0;&quot;&gt;open&lt;/span&gt;(dir+&lt;span style=&quot;color: #8b2252;&quot;&gt;'/t10k-images.idx3-ubyte'&lt;/span&gt;)
      fread(fd,16,&lt;span style=&quot;color: #8b2252;&quot;&gt;'c'&lt;/span&gt;)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.test = np.fromfile(file=fd, dtype=np.uint8).reshape( (10000,784) )
      fd.close()

      fd = &lt;span style=&quot;color: #a020f0;&quot;&gt;open&lt;/span&gt;(dir+&lt;span style=&quot;color: #8b2252;&quot;&gt;'/t10k-labels.idx1-ubyte'&lt;/span&gt;)
      fread(fd,8,&lt;span style=&quot;color: #8b2252;&quot;&gt;'c'&lt;/span&gt;)
      &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.test_labels = np.fromfile(file=fd, dtype=np.uint8).reshape( 10000 )
      fd.close()
  &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get_test&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;):
      v = &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.test.astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;'float32'&lt;/span&gt;).T.copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;)
      t = &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.test_labels
      &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; v,t
  &lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get&lt;/span&gt;(&lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;):
      v = &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data.astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;'float32'&lt;/span&gt;).T.copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;)
      t = &lt;span style=&quot;color: #a020f0;&quot;&gt;self&lt;/span&gt;.data_labels
      &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; v,t

pg = MNIST(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;/home/local/datasets/MNIST&quot;&lt;/span&gt;);

data, data_l  = pg.get()
test, test_l = pg.get_test()

knn = KNN(data.T.copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;),data_l)

off, err_cnt = 5000, 0
&lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; i &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;xrange&lt;/span&gt;(0,10000,off):
    pred = knn.run(test[:,i:i+off].T.copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;))
    err_cnt += (pred!=test_l[i:i+off]).sum()

&lt;span style=&quot;color: #a020f0;&quot;&gt;print&lt;/span&gt; &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;Errors: &quot;&lt;/span&gt;, err_cnt

&lt;/pre&gt;



</content>



 </entry>
 
 <entry>
   <title>Hurricane in the Server Room</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/11/26/hurricane-in-the-server-room.html"/>
   <updated>2010-11-26T00:00:00+01:00</updated>
   <id>/2010/11/26/hurricane-in-the-server-room</id>
   <content type="html">&lt;p&gt;
Our new toy arrived, with twelve cores, four GTX580 GPUs, a four-terabyte
hard disk and 24 gigabytes of RAM. The thing looks quite
impressive from the inside:
&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz/images/bigcuda3-off.jpg&quot; width=&quot;100%&quot; alt=&quot;http://www.ais.uni-bonn.de/~schulz/images/bigcuda3-off.jpg&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Note the four GPUs on the left-hand side and the vertical stack of
fans in the center. With these and the CPU fans above the GPUs running, it is
impossible to sit next to this machine and think clearly. It blows
your head off, or at least it sounds as if it would. Even when almost
idle, this machine produced so much heat that our server room could no
longer dissipate it. Amazing.
&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz/images/bigcuda3-on.jpg&quot; width=&quot;100%&quot; alt=&quot;http://www.ais.uni-bonn.de/~schulz/images/bigcuda3-on.jpg&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
If you ever wondered: A current Ubuntu Server version seems to have no
problems with this specification.
&lt;/p&gt;</content>



 </entry>
 
 <entry>
   <title>Benchmarking GPUs using the CUV library</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/11/12/gpu_benchmark.html"/>
   <updated>2010-11-12T00:00:00+01:00</updated>
   <id>/2010/11/12/gpu_benchmark</id>
   <content type="html">&lt;p&gt;
Since I started programming GPUs, a few generations of these cards
have been released, most recently the &lt;a href=&quot;http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_580/&quot;&gt;GTX580&lt;/a&gt;. I always wondered how
good these actually are and how programs written for one GPU scale to
the next generation. 
&lt;/p&gt;
&lt;p&gt;
We have a test suite available here, which is at least of importance
to us: The &lt;a href=&quot;http://www.ais.uni-bonn.de/deep_learning/downloads.html&quot;&gt;CUV library&lt;/a&gt;. Apart from unit-tests checking correctness of
the implementation, the library also has a few &quot;tests&quot; which measure
execution speed on GPU and CPU for comparison.
&lt;/p&gt;
&lt;p&gt;
Instead of inventing new tests for the benchmark, we simply reuse the
speed tests which come with CUV, as they are probably relevant
use cases that the programmers optimized for, anyway. A small Perl
script that resides in the &lt;code&gt;scripts/&lt;/code&gt; directory of CUV
identifies speed tests by their name (&lt;code&gt;*_speed&lt;/code&gt;), runs them and parses
their output. We simply collect all values in a defined order and save
them to a file. A larger number of these files can then be analyzed
using a Python script included in the same directory, which uses the
superb &lt;a href=&quot;http://matplotlib.sourceforge.net/&quot;&gt;matplotlib&lt;/a&gt; library to draw a bar chart. We use a reference GPU
to compare relative timings.
&lt;/p&gt;
&lt;p&gt;
To cut a long story short, here are the results comparing GTX285,
GTX295, GX2-9800, GTX480 and GTX580:
&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz//images/speed-relative.png&quot; width=&quot;100%&quot; alt=&quot;http://www.ais.uni-bonn.de/~schulz//images/speed-relative.png&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;http://www.ais.uni-bonn.de/~schulz//images/speed-relative.png&quot;&gt;Direct link to image&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
As expected, &lt;a href=&quot;http://www.nvidia.com/object/fermi_architecture.html&quot;&gt;Fermi&lt;/a&gt; generation cards perform a lot better than the
older generation, and the &lt;a href=&quot;http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_580/&quot;&gt;GTX580&lt;/a&gt; also improves on the GTX480. Some operations,
which are apparently not implemented well, perform worse on newer
generation cards. The real work horses of our library have definitely
improved a lot over generations, even though we did not spend time on
optimizing them for the later cards.
&lt;/p&gt;
&lt;p&gt;
We're not the first ones to compare these cards of course.  &lt;a href=&quot;http://www.legitreviews.com/news/9429/&quot;&gt;Legit Reviews&lt;/a&gt; compares the frame rate rendered by the GTX480 and GTX580,
finding only marginal improvements.  &lt;a href=&quot;http://www.brightsideofnews.com/news/2010/11/9/nvidias-gtx580-hits-the-streets-and-leaves-an-impact.aspx?pageid=2&quot;&gt;Brightsideofnews&lt;/a&gt; runs many
rendering related benchmarks as well (3DMark Vantage, Unigine Heaven,
Pripyat), with mixed results. However, we're more interested in
general purpose computing (&lt;a href=&quot;http://en.wikipedia.org/wiki/GPGPU&quot;&gt;GPGPU&lt;/a&gt;) and in writing algorithms for GPU
hardware which then scale with the new hardware, as it becomes
available.
&lt;/p&gt;</content>



 </entry>
 
 <entry>
   <title>Directory Color Embedding</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/11/08/directory-color-embedding.html"/>
   <updated>2010-11-08T00:00:00+01:00</updated>
   <id>/2010/11/08/directory-color-embedding</id>
   <content type="html">&lt;p&gt;
Do you also save all your experiment data in folder names which
represent the settings? I do. If you have &lt;i&gt;many&lt;/i&gt; experiments, however,
this strategy will defy comparison of the experiments, as it becomes
hard to put everything in one plot. The default settings in plotting
programs such as &lt;a href=&quot;http://www.gnuplot.info/&quot;&gt;gnuplot&lt;/a&gt; or &lt;a href=&quot;http://matplotlib.sourceforge.net/&quot;&gt;matplotlib&lt;/a&gt; do not have a large enough
range of colors, and if they do, it becomes hard to assign the colors
in a meaningful way for exploratory data analysis. The script here
might come to the rescue.
&lt;/p&gt;

&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;Reading and pre-processing directory names &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
We first transform directory names such that &lt;code&gt;test-paramA_0.1&lt;/code&gt; becomes
&lt;code&gt;test-paramA_0000.1&lt;/code&gt;, allowing easier string comparison.
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; Levenshtein &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; L
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; numpy &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; np
&lt;span style=&quot;color: #a020f0;&quot;&gt;from&lt;/span&gt; colormath.color_objects &lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; LabColor
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; os, re

&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;fillfunc&lt;/span&gt;(mo):
    &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot; returns the matched number, with padded zeros to the front &quot;&quot;&quot;&lt;/span&gt;
    s = mo.group(1)
    &lt;span style=&quot;color: #a020f0;&quot;&gt;while&lt;/span&gt;(len(s)&amp;lt;6): s = &lt;span style=&quot;color: #8b2252;&quot;&gt;'0'&lt;/span&gt;+s
    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; s
&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get_unified_names&lt;/span&gt;(fns):
    n = len(fns)
    strs = [x &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; x &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; fns]
    &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; i &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; xrange(len(strs)):
        strs[i] = re.sub(r&lt;span style=&quot;color: #8b2252;&quot;&gt;'(?&amp;lt;!\d)([\d.]+)(?!\d)'&lt;/span&gt;,fillfunc,strs[i])
    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; strs
&lt;/pre&gt;
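
&lt;p&gt;
For readers on Python 3, the padding step can be sketched with
&lt;code&gt;str.zfill&lt;/code&gt;. This is a minimal sketch and not the script above:
the name &lt;code&gt;pad_numbers&lt;/code&gt; and the &lt;code&gt;width&lt;/code&gt; parameter are
made up for illustration, and the pattern leaves out the lookarounds, which a
greedy &lt;code&gt;[\d.]+&lt;/code&gt; should not need.
&lt;/p&gt;

```python
import re


def pad_numbers(name, width=6):
    """Left-pad every run of digits and dots in `name` with zeros."""
    # zfill pads on the left with "0" until the match has `width` characters;
    # matches that are already longer are returned unchanged.
    return re.sub(r"[\d.]+", lambda mo: mo.group(0).zfill(width), name)


def get_unified_names(names, width=6):
    """Apply pad_numbers to each directory name."""
    return [pad_numbers(name, width) for name in names]


print(get_unified_names(["test-paramA_0.1", "test-paramA_10"]))
# prints ['test-paramA_0000.1', 'test-paramA_000010']
```

&lt;p&gt;
With a fixed width, numbers of different magnitude line up character by
character, which is what makes the string comparison in the next step easier.
&lt;/p&gt;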



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-2&quot;&gt;Creating the distance matrix based on Levenshtein string distance &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
The &lt;a href=&quot;http://en.wikipedia.org/wiki/Levenshtein_distance&quot;&gt;Levenshtein string distance&lt;/a&gt; measures how many
single-character substitutions, insertions, and deletions one needs to
transform one string into the other. With this, we automatically derive a
dissimilarity measure for the unified directory strings:
&lt;/p&gt;



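&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;get_distance_matrix&lt;/span&gt;(strs):
    &lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;&quot;&quot; strs: the unified names returned by get_unified_names &quot;&quot;&quot;&lt;/span&gt;
    n = len(strs)
    mat = np.zeros((n,n))
    &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; i &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; xrange(n):
        &lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; j &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; xrange(n):
            &lt;span style=&quot;color: #a020f0;&quot;&gt;if&lt;/span&gt; j&amp;lt;=i: &lt;span style=&quot;color: #a020f0;&quot;&gt;continue&lt;/span&gt;
            mat[i,j] = L.distance(strs[i],strs[j])
            mat[j,i] = mat[i,j]
    &lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;exp(-d) maps each distance d to a value in (0, 1]
&lt;/span&gt;    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; np.exp(-mat)
&lt;/pre&gt;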
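
&lt;p&gt;
If the &lt;code&gt;Levenshtein&lt;/code&gt; module is not at hand, the distance itself is
easy to write down. The following is a plain-Python sketch of the textbook
dynamic program, not the library's implementation; the function names are
chosen for illustration only, and it will be much slower than the C extension.
&lt;/p&gt;

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def distance_matrix(names):
    """Symmetric list-of-lists matrix of pairwise edit distances."""
    n = len(names)
    mat = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j] = mat[j][i] = levenshtein(names[i], names[j])
    return mat


print(distance_matrix(["a_0000.1", "a_0000.2", "b_0000.1"]))
# prints [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```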



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-3&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-3&quot;&gt;Embedding the dissimilarity matrix using Multidimensional Scaling (MDS) in R &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-3&quot;&gt;


&lt;p&gt;
The distance matrix does not necessarily directly map to coordinates
in 3D, so we need an embedding algorithm which deals with distance
matrices only.
&lt;/p&gt;
&lt;p&gt;
For simplicity, we call R from Python with a saved dissimilarity matrix.
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;emb&lt;/span&gt;(mat):
    np.savetxt(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;file.dat&quot;&lt;/span&gt;, mat)
    os.system(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;./analyse/emb.R&quot;&lt;/span&gt;)
    res = np.loadtxt(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;mds.dat&quot;&lt;/span&gt;)
    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; res
&lt;/pre&gt;



&lt;p&gt;
Now to the part written in R, which only does the embedding and saves
the result in a text file:
&lt;/p&gt;



&lt;pre class=&quot;src src-R&quot;&gt;#!/usr/bin/Rscript
library(MASS)
library(vegan)
tab &amp;lt;- read.table(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;file.dat&quot;&lt;/span&gt;, header = FALSE, sep=&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot; &quot;&lt;/span&gt;)
data.m    &amp;lt;- as.matrix(tab)
data.mds &amp;lt;- vegan::metaMDS(data.m, k=3, trymax=50)
write.table(data.mds$points, file=&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;mds.dat&quot;&lt;/span&gt;, quote=F, row.names=F, col.names=F)
&lt;/pre&gt;



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-4&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-4&quot;&gt;Transforming embedded coordinates into colors &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-4&quot;&gt;


&lt;p&gt;
The coordinates are in some &amp;ndash; seemingly arbitrary &amp;ndash; range and
therefore need to be mapped to colors. We want directory names with a
small edit distance to get similar colors. We therefore map our
coordinates directly to &lt;a href=&quot;http://en.wikipedia.org/wiki/Lab_color_space&quot;&gt;Lab Color Space&lt;/a&gt;, in which distances roughly
correspond to perceived color differences, and then transform the Lab
color coordinates to RGB:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;getcolor&lt;/span&gt;(embedded,i):
    &lt;span style=&quot;color: #b22222;&quot;&gt;#&lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;lab = LabColor(0.8,*embedded[i,:])
&lt;/span&gt;    lab = LabColor(*embedded[i,:])
    rgb = lab.convert_to(&lt;span style=&quot;color: #8b2252;&quot;&gt;'RGB'&lt;/span&gt;, debug=&lt;span style=&quot;color: #a020f0;&quot;&gt;False&lt;/span&gt;).get_rgb_hex()
    &lt;span style=&quot;color: #a020f0;&quot;&gt;return&lt;/span&gt; rgb
&lt;/pre&gt;



&lt;p&gt;
The resulting string can be used directly in the &lt;code&gt;color&lt;/code&gt; specification
of a matplotlib plot and is then best combined with &lt;a href=&quot;http://matplotlib.sourceforge.net/users/event_handling.html&quot;&gt;picking&lt;/a&gt; to find
out more about a particularly interesting plotline.
&lt;/p&gt;
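
&lt;p&gt;
To see what the Lab-to-RGB step involves, here is a dependency-free sketch of
the conversion. It is not the code inside &lt;code&gt;colormath&lt;/code&gt;; I assume the
D65 reference white and the commonly published sRGB matrix, and colors outside
the sRGB gamut are simply clipped. The name &lt;code&gt;lab_to_hex&lt;/code&gt; is
hypothetical.
&lt;/p&gt;

```python
def lab_to_hex(L, a, b):
    """Convert a CIE L*a*b* color (assumed D65 white point) to an sRGB hex string."""
    # L*a*b* -> XYZ
    fy = (L + 16.0) / 116.0
    fx = fy + a / 500.0
    fz = fy - b / 200.0

    def f_inv(t):
        delta = 6.0 / 29.0
        return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4.0 / 29.0)

    x = 0.95047 * f_inv(fx)
    y = 1.00000 * f_inv(fy)
    z = 1.08883 * f_inv(fz)

    # XYZ -> linear sRGB
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def to_byte(c):
        c = min(max(c, 0.0), 1.0)  # clip colors outside the sRGB gamut
        # sRGB gamma encoding
        c = 1.055 * c ** (1 / 2.4) - 0.055 if c > 0.0031308 else 12.92 * c
        return int(round(c * 255))

    return "#%02x%02x%02x" % tuple(to_byte(c) for c in linear)


print(lab_to_hex(100.0, 0.0, 0.0))  # white
print(lab_to_hex(0.0, 0.0, 0.0))    # black
```

&lt;p&gt;
Because of the clipping, embedded points far from the gray axis can end up with
the same RGB color, so rescaling the coordinates before the conversion may be
worth a try.
&lt;/p&gt;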
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-5&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-5&quot;&gt;Example Embedding of directory names &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-5&quot;&gt;



&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz/images/embedded_directories.png&quot; width=&quot;100%&quot; alt=&quot;http://www.ais.uni-bonn.de/~schulz/images/embedded_directories.png&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;http://www.ais.uni-bonn.de/~schulz/images/embedded_directories.png&quot;&gt;Direct link to image&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;
&lt;/div&gt;
</content>



 </entry>
 
 <entry>
   <title>Tutorial CUV and MLPs released</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/10/29/cuv_mlp_example.html"/>
   <updated>2010-10-29T00:00:00+02:00</updated>
   <id>/2010/10/29/cuv_mlp_example</id>
   <content type="html">&lt;p&gt;
&lt;a href=&quot;http://en.wikipedia.org/wiki/Multilayer_perceptron&quot;&gt;Multilayer Perceptrons&lt;/a&gt; (MLPs) have a long history and are still often
used in e.g. speech processing and image recognition. They are generic
function approximators and are well-understood.
&lt;/p&gt;
&lt;p&gt;
The main workhorse of the MLP is the matrix multiplication, a fact
which is not entirely obvious from typical MLP introductions
(refer to &lt;a href=&quot;http://www.willamette.edu/~gorr/classes/cs449/backprop.html&quot;&gt;this site&lt;/a&gt; for a detailed explanation).  Matrix
multiplications can be computed efficiently on the GPU, with speedups
of about 20 times compared to e.g. &lt;a href=&quot;http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Library.html&quot;&gt;CBLAS&lt;/a&gt;.  In this post, I show
how to use our &lt;a href=&quot;http://www.ais.uni-bonn.de/deep_learning/downloads.html&quot;&gt;CUV-Library&lt;/a&gt; to implement an MLP in &lt;a href=&quot;http://www.python.org/&quot;&gt;Python&lt;/a&gt; that runs
entirely on the GPU. CUV wraps the gory details of representing
matrices and vectors, and of the &lt;a href=&quot;http://www.nvidia.com/object/cuda_home_new.html&quot;&gt;NVIDIA CUDA&lt;/a&gt; implementation of
their operations, in a clean way and exports the functionality to Python.
&lt;/p&gt;
&lt;p&gt;
In addition to the &lt;a href=&quot;http://www.ais.uni-bonn.de/~schulz/cuv_examples/cuv_mlp.tar.gz&quot;&gt;complete source code&lt;/a&gt;, you also need to download the
&lt;a href=&quot;http://yann.lecun.com/exdb/mnist/&quot;&gt;MNIST Database&lt;/a&gt; of handwritten digits. Simply drop all files in the same folder.
&lt;/p&gt;
&lt;p&gt;
Here, I will only highlight some main points of the implementation to
look out for.
&lt;/p&gt;

&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;Setting up CUV and creating an MLP &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
After &lt;a href=&quot;http://github.com/deeplearningais/CUV&quot;&gt;compiling CUV&lt;/a&gt;, you can start using it right away. We start by
initializing everything:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;essential! Otherwise numpy-matrices cannot be used with cp!
&lt;/span&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; pyublas            
&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; cuv_python &lt;span style=&quot;color: #a020f0;&quot;&gt;as&lt;/span&gt; cp

&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;initialize cuv to run on device 0
&lt;/span&gt;cp.initCUDA(0)

&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;initialize random number generator with seed 0
&lt;/span&gt;cp.initialize_mersenne_twister_seeds(0)

&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;YOUR CODE HERE
&lt;/span&gt;
cp.exitCUDA()
&lt;/pre&gt;



&lt;p&gt;
Now we load the MNIST database, which is done entirely in Python and &lt;a href=&quot;https://numpy.scipy.org&quot;&gt;Numpy&lt;/a&gt;.
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;sys.argv[1] should be the path to your downloaded MNIST files
&lt;/span&gt;mnist = MNIST_data(sys.argv[1]) 
train_data, train_labels = mnist.get_train_data()
test_data,  test_labels  = mnist.get_test_data()
&lt;/pre&gt;



&lt;p&gt;
Now we construct an MLP with a hidden layer of 128 neurons, a
batchsize of 96 and start training for 100 epochs:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;sizes = [train_data.shape[0], 128, train_labels.shape[0]]
mlp = MLP(sizes, 96)
mlp.train(train_data, train_labels, 100)
&lt;/pre&gt;



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-2&quot;&gt;Forward Pass &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
For training, we first select a minibatch and push it to the device:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;self.neuron_layer[0].activations = \
  cp.push(input_matrix[:,index_begin:index_end]. \
  astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;'float32'&lt;/span&gt;).copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;'F'&lt;/span&gt;))
&lt;/pre&gt;



&lt;p&gt;
Note that the matrix is a four-byte float (&quot;float32&quot;) and column major
(&quot;F&quot;). CUV supports both column-major and row-major matrices; you
just need to know which one you want to use.
&lt;/p&gt;
&lt;p&gt;
We can now propagate the information ahead in the MLP:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;Forward-Pass
&lt;/span&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;for&lt;/span&gt; i &lt;span style=&quot;color: #a020f0;&quot;&gt;in&lt;/span&gt; xrange(self.number_of_layers):
    self.weight_layer[i].forward()
&lt;/pre&gt;



&lt;p&gt;
The forward pass is simply a matrix multiplication, adding the bias
and applying the non-linearity:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;cp.prod(self.target.activations, self.weight,self.source.activations)
cp.matrix_plus_col(self.target.activations, self.bias.vec)
cp.apply_scalar_functor(self.target.activations, cp.scalar_functor.TANH)
&lt;/pre&gt;
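
&lt;p&gt;
To make the three calls concrete, here is what they compute, written as a
plain-Python sketch without CUV. Matrices are lists of rows with one column per
sample, as in the minibatch above; the function name &lt;code&gt;forward&lt;/code&gt; and
its signature are chosen for illustration and are not part of CUV.
&lt;/p&gt;

```python
import math


def forward(weight, bias, source):
    """Compute tanh(weight * source + bias) for a whole minibatch.

    weight: list of rows, one row per target unit
    bias:   one value per target unit
    source: list of rows, one row per source unit, one column per sample
    """
    n_samples = len(source[0])
    target = []
    for w_row, b in zip(weight, bias):
        row = []
        for s in range(n_samples):
            # matrix product plus bias for target unit / sample s
            net = sum(w * src_row[s] for w, src_row in zip(w_row, source)) + b
            row.append(math.tanh(net))
        target.append(row)
    return target


# one target unit, two source units, two samples
print(forward([[2.0, 0.0]], [0.5], [[1.0, -1.0], [3.0, 3.0]]))
```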



&lt;p&gt;
That's it for the forward pass! Now we need to determine the error at the output:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;deltas = teacher
&lt;/span&gt;cp.copy(self.neuron_layer[-1].deltas, teachbatch)
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;deltas -= activations
&lt;/span&gt;cp.apply_binary_functor(self.neuron_layer[-1].deltas,
                        self.neuron_layer[-1].activations,
                        cp.binary_functor.SUBTRACT)
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;squared_errors = deltas
&lt;/span&gt;cp.copy(squared_errors, self.neuron_layer[-1].deltas)
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;mse = sum(squared_errors**2)
&lt;/span&gt;cp.apply_scalar_functor(squared_errors, cp.scalar_functor.SQUARE)
mse += cp.sum(squared_errors)
&lt;/pre&gt;



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-3&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-3&quot;&gt;&lt;a name=&quot;ID-d4be0caa-2746-407e-b89a-92c56f1bca1c&quot; id=&quot;ID-d4be0caa-2746-407e-b89a-92c56f1bca1c&quot;&gt;&lt;/a&gt;Backward Pass &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-3&quot;&gt;


&lt;p&gt;
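What is left is the backward pass, which propagates the deltas through the transposed weights and multiplies them element-wise with the derivative of the non-linearity: \[ \delta_l = W_{l+1}^T \delta_{l+1} \odot f'(net_l) \]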
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;cp.prod(self.source.deltas, self.weight,
        self.target.deltas, &lt;span style=&quot;color: #8b2252;&quot;&gt;'t'&lt;/span&gt;,  &lt;span style=&quot;color: #8b2252;&quot;&gt;'n'&lt;/span&gt;)
h = cp.dev_matrix_cmf(self.source.activations.h,
                      self.source.activations.w)
cp.copy(h,  self.source.activations)
self.source.d_nonlinearity(h)
cp.apply_binary_functor(self.source.deltas, h,
                        cp.binary_functor.MULT)
&lt;/pre&gt;
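
&lt;p&gt;
The same step in plain Python may help to check the indices. This sketch
assumes the tanh non-linearity from the forward pass, whose derivative can be
written in terms of the activations as 1 - a&amp;sup2;; the function name and
signature are hypothetical and not part of CUV.
&lt;/p&gt;

```python
def backward(weight, target_deltas, source_activations):
    """Back-propagate deltas through one tanh layer.

    Returns (W^T * target_deltas) multiplied element-wise by the tanh
    derivative 1 - a**2, where a are the source activations.
    Matrices are lists of rows with one column per sample.
    """
    n_source = len(source_activations)
    n_samples = len(source_activations[0])
    source_deltas = []
    for k in range(n_source):
        row = []
        for s in range(n_samples):
            # column k of weight is row k of the transposed weight matrix
            back = sum(w_row[k] * d_row[s]
                       for w_row, d_row in zip(weight, target_deltas))
            a = source_activations[k][s]
            row.append(back * (1.0 - a * a))
        source_deltas.append(row)
    return source_deltas


print(backward([[1.0, 2.0], [3.0, 4.0]], [[1.0], [1.0]], [[0.0], [0.5]]))
# prints [[4.0], [4.5]]
```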



&lt;p&gt;
And finally, we need to adjust the weights according to the backpropagated error:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;batch_size = self.source.activations.w

&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;weights update
&lt;/span&gt;h          = cp.dev_matrix_cmf(self.weight.h, self.weight.w)
cp.prod(h, self.target.deltas, self.source.activations, &lt;span style=&quot;color: #8b2252;&quot;&gt;'n'&lt;/span&gt;, &lt;span style=&quot;color: #8b2252;&quot;&gt;'t'&lt;/span&gt;)
cp.learn_step_weight_decay(self.weight, h,
                           learnrate/batch_size, decay)

&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;bias update
&lt;/span&gt;h          = cp.get_filled_matrix(self.target.activations.h, 1, 0)
cp.reduce_to_col(h.vec, self.target.deltas)
cp.learn_step_weight_decay(self.bias, h,
                           learnrate/batch_size, decay)
&lt;/pre&gt;
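
&lt;p&gt;
For reference, the weight update can be sketched in plain Python as well. Note
that the exact formula inside &lt;code&gt;learn_step_weight_decay&lt;/code&gt; is not
shown in this post; the rule used below, a gradient step with the decay term
subtracted, is an assumption, as are the function name and signature.
&lt;/p&gt;

```python
def weight_update(weight, target_deltas, source_activations, learnrate, decay):
    """Update `weight` in place from one minibatch and return it.

    Matrices are lists of rows with one column per sample.
    """
    batch_size = len(source_activations[0])
    rate = learnrate / batch_size
    for j, d_row in enumerate(target_deltas):
        for k, a_row in enumerate(source_activations):
            # entry (j, k) of target_deltas * source_activations^T
            grad = sum(d * a for d, a in zip(d_row, a_row))
            # ASSUMPTION: step along grad, shrink weights by decay
            weight[j][k] += rate * (grad - decay * weight[j][k])
    return weight


print(weight_update([[0.0]], [[1.0, 1.0]], [[2.0, 2.0]], 0.5, 0.0))
```

&lt;p&gt;
Since the deltas were defined as teacher minus activations, the step is added
to the weights rather than subtracted.
&lt;/p&gt;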



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-4&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-4&quot;&gt;&lt;a name=&quot;ID-cb7971ee-5c19-4be0-9450-3ca2ca254c2e&quot; id=&quot;ID-cb7971ee-5c19-4be0-9450-3ca2ca254c2e&quot;&gt;&lt;/a&gt;Performance &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-4&quot;&gt;


&lt;p&gt;
We can now measure the performance of our implementation.  To run
everything on CPU, we do a few substitutions, such that all matrices
are host-matrices. All function calls will be to CPU functions
automatically.
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;switchtohost&lt;/span&gt;():
    cp.dev_matrix_cmf_orig = cp.dev_matrix_cmf
    cp.dev_matrix_cmf      = cp.host_matrix_cmf
    cp.push                = cp.push_host
    cp.pull                = cp.pull_host
&lt;/pre&gt;



&lt;p&gt;
We can simply use the standard python module &lt;a href=&quot;http://docs.python.org/library/profile.html&quot;&gt;cProfile&lt;/a&gt; to determine
speed and which functions cost most in terms of time:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;import&lt;/span&gt; cProfile, os, pstats
cProfile.runctx(&lt;span style=&quot;color: #8b2252;&quot;&gt;'mlp.train(train_data, train_labels, 1)'&lt;/span&gt;,globals(),locals(), &lt;span style=&quot;color: #8b2252;&quot;&gt;'/tmp/%s_mlp_profile'&lt;/span&gt;%os.getenv(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;USER&quot;&lt;/span&gt;))
p = pstats.Stats(&lt;span style=&quot;color: #8b2252;&quot;&gt;'/tmp/%s_mlp_profile'&lt;/span&gt;%os.getenv(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;USER&quot;&lt;/span&gt;))
p.sort_stats(&lt;span style=&quot;color: #8b2252;&quot;&gt;'time'&lt;/span&gt;).print_stats(15)
&lt;/pre&gt;
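
&lt;p&gt;
If you would rather not write a profile file to &lt;code&gt;/tmp&lt;/code&gt;, the same
measurement can be done in memory. This is a small self-contained sketch using
only the standard library; &lt;code&gt;profile_call&lt;/code&gt; is a hypothetical helper
and &lt;code&gt;busy_work&lt;/code&gt; merely stands in for &lt;code&gt;mlp.train&lt;/code&gt;.
&lt;/p&gt;

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # sort by time spent inside each function, show the top 15 entries
    stats.sort_stats("time").print_stats(15)
    return result, stream.getvalue()


def busy_work(n):
    # stand-in for mlp.train(train_data, train_labels, 1)
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 100000)
print(report)
```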



&lt;p&gt;
My results for
Batch-Learning are as follows: 
&lt;/p&gt;
&lt;table class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;
&lt;caption&gt;&lt;/caption&gt;
&lt;colgroup&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;right&quot; /&gt;&lt;col align=&quot;right&quot; /&gt;&lt;col align=&quot;right&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt;Function&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;Host (sec)&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;Device (sec)&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;Speedup&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;weight_update&lt;/code&gt;&lt;/td&gt;&lt;td&gt;4.5&lt;/td&gt;&lt;td&gt;0.07&lt;/td&gt;&lt;td&gt;64.3&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;forward&lt;/code&gt;&lt;/td&gt;&lt;td&gt;1.8&lt;/td&gt;&lt;td&gt;0.04&lt;/td&gt;&lt;td&gt;45.0&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;backward&lt;/code&gt;&lt;/td&gt;&lt;td&gt;0.9&lt;/td&gt;&lt;td&gt;0.09&lt;/td&gt;&lt;td&gt;10.0&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nonlinearity&lt;/code&gt;&lt;/td&gt;&lt;td&gt;0.3&lt;/td&gt;&lt;td&gt;0.001&lt;/td&gt;&lt;td&gt;300.0&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt;7.5&lt;/td&gt;&lt;td&gt;0.201&lt;/td&gt;&lt;td&gt;37.3&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz//images/mlp-speedups.png&quot; width=&quot;100%&quot; alt=&quot;http://www.ais.uni-bonn.de/~schulz//images/mlp-speedups.png&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;http://www.ais.uni-bonn.de/~schulz//images/mlp-speedups.png&quot;&gt;Direct link to image&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
The test was performed on an &lt;b&gt;NVIDIA GTX 295&lt;/b&gt;; the CPU was an otherwise
idle &lt;b&gt;Intel Core i7 at 3.2 GHz&lt;/b&gt;. The matrix multiplication on CPU
uses the &lt;a href=&quot;http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Library.html&quot;&gt;GNU CBLAS&lt;/a&gt; library for maximum speed.
&lt;/p&gt;
&lt;p&gt;
That's it, I hope you find this tutorial helpful. If you have
problems, please send me an &lt;a href=&quot;http://www.ais.uni-bonn.de/~schulz/contact.html&quot;&gt;email&lt;/a&gt; or use the comments below.
&lt;/p&gt;
&lt;p&gt;
Hannes
&lt;/p&gt;&lt;/div&gt;
&lt;/div&gt;
</content>



 </entry>
 
 <entry>
   <title>CUV Data Types</title>
   <link href="http://www.ais.uni-bonn.de/~schulz/2010/10/29/cuv_data_types.html"/>
   <updated>2010-10-29T00:00:00+02:00</updated>
   <id>/2010/10/29/cuv_data_types</id>
   <content type="html">&lt;div id=&quot;outline-container-1&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-1&quot;&gt;Overview &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-1&quot;&gt;


&lt;p&gt;
CUV has few but versatile types. The basic structures are &lt;code&gt;vector&lt;/code&gt; and
&lt;code&gt;matrix&lt;/code&gt;. The &lt;code&gt;matrix&lt;/code&gt; is never used by itself, instead you typically
use derived classes such as &lt;code&gt;dense_matrix&lt;/code&gt; or &lt;code&gt;dia_matrix&lt;/code&gt;.
&lt;/p&gt;
&lt;p&gt;
A rough sketch of what we'll see in this tutorial is shown in the
following graph:
&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;http://www.ais.uni-bonn.de/~schulz/images/blue.png&quot;  alt=&quot;http://www.ais.uni-bonn.de/~schulz/images/blue.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-2&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-2&quot;&gt;Vectors &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-2&quot;&gt;


&lt;p&gt;
Vectors can have a type and a space where they live, e.g.
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #a020f0;&quot;&gt;using&lt;/span&gt; &lt;span style=&quot;color: #a020f0;&quot;&gt;namespace&lt;/span&gt; &lt;span style=&quot;color: #008b8b;&quot;&gt;cuv&lt;/span&gt;;

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;a 256-float vector on the GPU
&lt;/span&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;vector&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;, dev_memory_space&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;v&lt;/span&gt;(256);

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;a 1000-element unsigned char vector on CPU
&lt;/span&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;vector&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;unsigned&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;, host_memory_space&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;w&lt;/span&gt;(1000);
&lt;/pre&gt;



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-3&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-3&quot;&gt;Matrices &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-3&quot;&gt;


&lt;p&gt;
With (dense) matrices you get the additional option of accessing them
in a column-major or in a row-major way:
&lt;/p&gt;



&lt;pre class=&quot;src src-c++&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;a 16x32 float matrix on GPU, column-major
&lt;/span&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;float&lt;/span&gt;,column_major,dev_memory_space&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;M&lt;/span&gt;(16,32);

&lt;span style=&quot;color: #b22222;&quot;&gt;// &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;a 16x32 unsigned char matrix on CPU, row-major
&lt;/span&gt;&lt;span style=&quot;color: #228b22;&quot;&gt;matrix&lt;/span&gt;&amp;lt;&lt;span style=&quot;color: #228b22;&quot;&gt;unsigned&lt;/span&gt; &lt;span style=&quot;color: #228b22;&quot;&gt;char&lt;/span&gt;,row_major,host_memory_space&amp;gt; &lt;span style=&quot;color: #0000ff;&quot;&gt;N&lt;/span&gt;(16,32);
&lt;/pre&gt;



&lt;/div&gt;

&lt;/div&gt;

&lt;div id=&quot;outline-container-4&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;sec-4&quot;&gt;Differences to Python Bindings &lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-4&quot;&gt;


&lt;p&gt;
As Python does not support templates, we have to resort to a naming
scheme. For vectors, the naming scheme is as follows:
&lt;/p&gt;
&lt;table class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;
&lt;caption&gt;&lt;/caption&gt;
&lt;colgroup&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt;&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;float&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;int&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;unsigned char&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;device&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_vector_f&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_vector_i&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_vector_uc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;host&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_vector_f&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_vector_i&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_vector_uc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;



&lt;p&gt;
For matrices, we need the column-major/row-major information
additionally, so we get:
&lt;/p&gt;
&lt;table class=&quot;orgmode&quot; border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;
&lt;caption&gt;&lt;/caption&gt;
&lt;colgroup&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;&lt;col align=&quot;left&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt;&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;float&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;int&lt;/th&gt;&lt;th scope=&quot;col&quot;&gt;unsigned char&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;device&lt;/td&gt;&lt;td&gt;column-major&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_cmf&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_cmi&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_cmuc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;row-major&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_rmf&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_rmi&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;dev_matrix_rmuc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;host&lt;/td&gt;&lt;td&gt;column-major&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_cmf&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_cmi&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_cmuc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;row-major&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_rmf&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_rmi&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;host_matrix_rmuc&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
Easy, isn't it?
&lt;/p&gt;
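
&lt;p&gt;
The scheme in the two tables is regular enough to be generated. The helper
below is only an illustration of the naming rule and is not part of
&lt;code&gt;cuv_python&lt;/code&gt;; the function names and dictionaries are made up.
&lt;/p&gt;

```python
# C++ template arguments mapped to the pieces of the Python type name
PREFIX = {"dev_memory_space": "dev", "host_memory_space": "host"}
LAYOUT = {"column_major": "cm", "row_major": "rm"}
VALUE = {"float": "f", "int": "i", "unsigned char": "uc"}


def vector_name(space, value_type):
    """Python name of vector with the given value type and memory space."""
    return "%s_vector_%s" % (PREFIX[space], VALUE[value_type])


def matrix_name(space, layout, value_type):
    """Python name of a dense matrix with the given properties."""
    return "%s_matrix_%s%s" % (PREFIX[space], LAYOUT[layout], VALUE[value_type])


print(vector_name("dev_memory_space", "float"))
# prints dev_vector_f
print(matrix_name("host_memory_space", "row_major", "int"))
# prints host_matrix_rmi
```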
&lt;p&gt;
Of course, we also have to specify how we deal with &lt;a href=&quot;http://numpy.scipy.org/&quot;&gt;numpy&lt;/a&gt; matrices. The
main work here is done by the wonderful &lt;a href=&quot;http://mathema.tician.de/software/pyublas&quot;&gt;PyUBLAS bindings&lt;/a&gt; by Andreas
Kloeckner.  Basically, whatever properties your Numpy matrix has, the
CUV matrix will match. For example:
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;push a row-major numpy matrix of type float32 to the device
&lt;/span&gt;cp.push(np.ones((16,32)).astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;float32&quot;&lt;/span&gt;))
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;&amp;lt;cuv_python._cuv_python.dev_matrix_rmf object at 0x7f04cb260e50&amp;gt;
&lt;/span&gt;
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;push a column-major numpy matrix of type uint8 to the device
&lt;/span&gt;cp.push(np.ones((16,32)).astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;uint8&quot;&lt;/span&gt;).copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;))
&lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;&amp;lt;cuv_python._cuv_python.dev_matrix_cmuc object at 0x7f04cb260d70&amp;gt;
&lt;/span&gt;&lt;/pre&gt;



&lt;p&gt;
Finally, a nice observation if you want to save some of the expensive
&lt;code&gt;copy(&quot;F&quot;)&lt;/code&gt; calls:
&lt;/p&gt;
&lt;p&gt;
Every column-major matrix can also be interpreted as a transposed
version of the same row-major matrix (think about it!):
&lt;/p&gt;



&lt;pre class=&quot;src src-python&quot;&gt;cp.push   (np.ones((16,32)).astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;uint8&quot;&lt;/span&gt;).copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;)).shape &lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;(16, 32)
&lt;/span&gt;cp.push_rm(np.ones((16,32)).astype(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;uint8&quot;&lt;/span&gt;).copy(&lt;span style=&quot;color: #8b2252;&quot;&gt;&quot;F&quot;&lt;/span&gt;)).shape &lt;span style=&quot;color: #b22222;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #b22222;&quot;&gt;(32, 16)
&lt;/span&gt;&lt;/pre&gt;
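
&lt;p&gt;
If you do not want to think about it, the claim can be checked with a few lines
of plain Python. This sketch uses nested lists instead of numpy or CUV
matrices, and the helper names are made up: laying out a matrix column by
column gives the same memory contents as laying out its transpose row by row.
&lt;/p&gt;

```python
def flatten_row_major(rows):
    """Memory layout when rows are stored one after another."""
    return [value for row in rows for value in row]


def flatten_column_major(rows):
    """Memory layout when columns are stored one after another."""
    return [rows[i][j] for j in range(len(rows[0])) for i in range(len(rows))]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


m = [[1, 2, 3],
     [4, 5, 6]]

# the 2x3 matrix in column-major order and the 3x2 transpose in row-major
# order occupy memory identically
assert flatten_column_major(m) == flatten_row_major(transpose(m))
print(flatten_column_major(m))
# prints [1, 4, 2, 5, 3, 6]
```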



&lt;p&gt;
Hope this helps, feel free to post some comments! 
&lt;/p&gt;
&lt;p&gt;
Hannes
&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
</content>



 </entry>
 
 
</feed>


