CUV  0.9.201304091348
CUV Documentation


CUV is a C++ template and Python library which makes it easy to use NVIDIA(tm) CUDA.


Supported Platforms:

Supported GPUs:


Python Integration

Implemented Functionality





For C++ libs, you will need:

For Python Integration, you additionally have to install

Optionally, install dependent libraries

Obtaining CUV

You should check out the git repository

$ git clone git://

Installation Procedure

Building CUV:

$ sudo apt-get install cmake cmake-curses-gui libblas-dev libboost-all-dev doxygen python-nose python-dev cimg-dev
$ # download and install pyublas if you want python-bindings
$ cd cuv-version-source
$ mkdir -p build/release
$ cd build/release
$ cmake -DCMAKE_BUILD_TYPE=Release ../../
$ ccmake . # adjust paths to your system (cuda, thrust, pyublas, ...)!
# turn on/off optional libraries (CImg, ...)
$ make -j
$ ctest # run tests to see if it went well
$ sudo make install
$ export PYTHONPATH=`pwd`/src # only if you want python bindings

On Debian/Ubuntu systems, you can skip the sudo make install step and instead build and install a package:

$ cpack -G DEB
$ sudo dpkg -i cuv-VERSION.deb
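
Whichever installation route was used, it can help to confirm that the Python bindings are visible before running the examples. The sketch below only checks whether a module named cuv_python (the name imported in the Python samples) can be located on the module search path. The helper name bindings_available is hypothetical and not part of CUV.

```python
import importlib.util


def bindings_available(module_name="cuv_python"):
    """Return True if `module_name` can be found on the module search path.

    Hypothetical helper, not part of CUV. It locates the module without
    importing it, so no GPU is touched by this check.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if bindings_available():
        print("cuv_python found")
    else:
        print("cuv_python not found; check PYTHONPATH or the installed package")
```

If the module is not found after a source build, the PYTHONPATH export from the build steps is the first thing to re-check.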

Building the documentation

$ cd build/debug # change to the build directory
$ make doc

Sample Code

We show two brief examples. For further inspiration, please take a look at the test cases implemented in the src/tests directory.

Pushing and pulling of memory

C++ Code:

#include <cassert>
#include <cuv.hpp>
using namespace cuv;
int main(void){
    tensor<float,host_memory_space> h(extents[8][5]); // reserves space in host memory
    tensor<float,dev_memory_space> d(extents[8][5]); // reserves space in device memory
    h = 0; // set all values to 0
    d=h; // push to device
    sequence(d); // fill device vector with a sequence
    h=d; // pull to host
    // compare element by element using linear indexing
    for(int i=0;i<h.size();i++) {
        assert(d[i] == h[i]);
    }
    // compare again using two-dimensional indexing
    for(int i=0;i<h.shape(0);i++)
        for(int j=0;j<h.shape(1);j++) {
            assert(d(i,j) == h(i,j));
        }
    return 0;
}

Python Code:

import cuv_python as cp
import numpy as np
h = np.zeros((1,256)) # create numpy matrix
d = cp.dev_tensor_float(h) # constructs by copying numpy_array
h2 = np.zeros((1,256)).copy("F") # create numpy matrix
d2 = cp.dev_tensor_float_cm(h2) # creates dev_tensor_float_cm (column-major float) object
cp.fill(d,1) # terse form
cp.apply_nullary_functor(d,cp.nullary_functor.FILL,1) # verbose form
h = # pull and convert to numpy
assert(np.sum(h) == 256)
d.dealloc() # explicitly deallocate memory (optional)
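
The row-major and column-major tensor types used above differ only in how elements are ordered in memory. The following NumPy-only sketch (it makes no CUV calls) illustrates what .copy("F") changes on the host side. The 2x3 shape and the variable names are chosen purely for illustration.

```python
import numpy as np

# A small 2x3 matrix in NumPy's default row-major (C) order.
row_major = np.arange(6, dtype=np.float32).reshape(2, 3)

# The same values, stored in column-major (Fortran) order.
col_major = row_major.copy("F")

# The logical contents are identical ...
same_values = np.array_equal(row_major, col_major)

# ... but the order of the elements in memory differs.
# order="K" reads the elements in the order they occur in memory.
row_major_memory = row_major.ravel(order="K").tolist()
col_major_memory = col_major.ravel(order="K").tolist()

print(row_major.flags["C_CONTIGUOUS"], col_major.flags["F_CONTIGUOUS"])
print(row_major_memory)  # rows stored one after another
print(col_major_memory)  # columns stored one after another
```

Indexing with (i, j) gives the same value for both arrays; only the underlying storage order differs.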

Simple Matrix operations

C++ Code:

#include <cuv.hpp>
using namespace cuv;
int main(void){
    tensor<float,dev_memory_space,column_major> C(2048,2048),A(2048,2048),B(2048,2048);
    sequence(A); // fill A, B with consecutive integers
    sequence(B);
    apply_binary_functor(A,B,BF_MULT); // elementwise multiplication
    A *= B; // operators also work (elementwise)
    prod(C,A,B, 'n','t'); // matrix multiplication
    return 0;
}

Python Code:

import cuv_python as cp
import numpy as np
C = cp.dev_tensor_float_cm([2048,2048]) # column major tensor
A = cp.dev_tensor_float_cm([2048,2048])
B = cp.dev_tensor_float_cm([2048,2048])
cp.apply_binary_functor(B,A,cp.binary_functor.MULT) # elementwise multiplication
B *= A # operators also work (elementwise)
cp.prod(C,A,B,'n','t') # matrix multiplication
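
To make the difference between the elementwise operations and the matrix product concrete, here is a NumPy-only sketch on small host arrays. It assumes that the 'n','t' flags follow the BLAS convention (first operand as-is, second operand transposed), so that the product is A times the transpose of B. That reading is an assumption and is not confirmed by the text above.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# Elementwise multiplication: each entry of A times the matching entry of B.
elementwise = A * B

# Matrix product with the second operand transposed
# (the assumed meaning of the 'n','t' flags).
C = A @ B.T

print(elementwise)
print(C)
```

The two results differ in every entry here, which makes the small case a quick sanity check when porting a computation between the elementwise and matrix forms.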

The examples can be found in the "examples/" folder, in the "python" and "cpp" subdirectories.