StarPU Handbook
Some libraries need to be initialized once for each concurrent instance that may run on the machine. A typical example is a C++ computation class which is not thread-safe by itself, but for which several instantiated objects can be used concurrently. Such a library can be used with StarPU by initializing one object per worker. For instance, the libstarpufft example does the following to be able to use FFTW on CPUs.
A global array stores the instantiated objects:
At initialisation time of libstarpu, the objects are initialized:
And in the codelet body, they are used:
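The three steps above (global array, initialisation, use in the codelet body) can be sketched as follows. This is a minimal illustration under stated assumptions, not the libstarpufft source: `struct solver`, `solver_of_worker`, `solvers_init()`, `solver_cpu_func()` and `solver_calls()` are hypothetical names, and `struct solver` merely stands in for the FFTW state kept per worker. So that the sketch compiles without StarPU installed, the StarPU calls are replaced by trivial stand-ins unless the (hypothetical) macro `USE_STARPU` is defined.

```c
#include <assert.h>

#ifdef USE_STARPU                 /* opt-in: build with -DUSE_STARPU and link StarPU */
#include <starpu.h>
#else
/* Stand-ins (assumptions) so the sketch compiles without StarPU:
 * four CPU workers, and the caller is always worker 0. */
#define STARPU_NMAXWORKERS 4
enum starpu_worker_archtype { STARPU_CPU_WORKER, STARPU_CUDA_WORKER };
static unsigned starpu_worker_get_count(void) { return STARPU_NMAXWORKERS; }
static enum starpu_worker_archtype starpu_worker_get_type(int id) { (void)id; return STARPU_CPU_WORKER; }
static int starpu_worker_get_id(void) { return 0; }
#endif

/* Hypothetical non-thread-safe object: one instance must not be shared. */
struct solver {
    int initialized;
    long calls;
};

/* Global array: one object per worker, indexed by worker id. */
static struct solver solver_of_worker[STARPU_NMAXWORKERS];

/* Initialisation time (after starpu_init()): set up the object of each CPU worker. */
void solvers_init(void)
{
    unsigned nworkers = starpu_worker_get_count();
    for (unsigned w = 0; w < nworkers; w++) {
        if (starpu_worker_get_type((int)w) == STARPU_CPU_WORKER) {
            solver_of_worker[w].initialized = 1;
            solver_of_worker[w].calls = 0;
        }
    }
}

/* Codelet body: only touches the object that belongs to the calling worker. */
void solver_cpu_func(void *buffers[], void *cl_arg)
{
    (void)buffers;
    (void)cl_arg;
    int workerid = starpu_worker_get_id();
    assert(workerid >= 0 && workerid < STARPU_NMAXWORKERS);
    struct solver *s = &solver_of_worker[workerid];
    assert(s->initialized);
    s->calls++;   /* no lock needed: no other worker uses this slot */
}

/* Number of times the object of a given worker has been used. */
long solver_calls(int workerid)
{
    return solver_of_worker[workerid].calls;
}
```

In a real application, `solver_cpu_func` would be listed in the `cpu_funcs` field of a `struct starpu_codelet` and run by StarPU on a worker thread, never called directly from the main thread.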
This, however, is not sufficient for FFT on CUDA: initialization has to be done from the workers themselves, which can be achieved with starpu_execute_on_each_worker(). For instance, libstarpufft does the following.
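A minimal sketch of initialization performed from the workers themselves could look as follows. It is an illustration under stated assumptions rather than the libstarpufft code: `struct worker_state`, `init_on_worker()` and `init_all_workers()` are hypothetical names, the per-worker state is a plain struct instead of a CUDA FFT plan, and the sketch targets CPU workers with `STARPU_CPU` so that it does not require a GPU (the CUDA case would pass `STARPU_CUDA` instead). Unless the hypothetical macro `USE_STARPU` is defined, the StarPU calls are replaced by stand-ins that visit four fake workers sequentially.

```c
#include <assert.h>

#ifdef USE_STARPU                 /* opt-in: build with -DUSE_STARPU and link StarPU */
#include <starpu.h>
#else
/* Stand-ins (assumptions): four workers, visited one after the other. */
#define STARPU_NMAXWORKERS 4
#define STARPU_CPU (1u << 1)
static int current_worker = -1;
static int starpu_worker_get_id(void) { return current_worker; }
static void starpu_execute_on_each_worker(void (*func)(void *), void *arg, unsigned where)
{
    (void)where;
    for (current_worker = 0; current_worker < STARPU_NMAXWORKERS; current_worker++)
        func(arg);
    current_worker = -1;
}
#endif

/* Hypothetical per-worker state that has to be created by the worker itself. */
struct worker_state {
    int ready;
    int size;
};

static struct worker_state state_of_worker[STARPU_NMAXWORKERS];

/* Executed by each selected worker; every worker writes only its own slot. */
static void init_on_worker(void *arg)
{
    const int *size = arg;        /* the same argument is passed to every worker */
    int workerid = starpu_worker_get_id();
    assert(workerid >= 0 && workerid < STARPU_NMAXWORKERS);
    state_of_worker[workerid].size = *size;
    state_of_worker[workerid].ready = 1;
}

/* Call after starpu_init(); returns the number of initialized workers. */
int init_all_workers(int size)
{
    int nready = 0;
    /* Blocks until init_on_worker() has run on every matching worker. */
    starpu_execute_on_each_worker(init_on_worker, &size, STARPU_CPU);
    for (int w = 0; w < STARPU_NMAXWORKERS; w++)
        nready += state_of_worker[w].ready;
    return nready;
}
```

Each worker writes only to its own slot of the array, so no locking is needed even though the real StarPU workers run the function concurrently.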
TODO
Talk about STARPU_LIMIT_CUDA_devid_MEM, STARPU_LIMIT_CUDA_MEM, STARPU_LIMIT_OPENCL_devid_MEM, STARPU_LIMIT_OPENCL_MEM and STARPU_LIMIT_CPU_MEM
When using StarPU on a NetBSD machine, if the topology discovery library hwloc is used, thread binding will fail. To prevent the problem, you should use at least version 1.7 of hwloc, and also issue the following call:
$ sysctl -w security.models.extensions.user_set_cpu_affinity=1
Alternatively, add the following line to the file /etc/sysctl.conf:
security.models.extensions.user_set_cpu_affinity=1
Some users had issues with MKL 11 and StarPU (versions 1.1rc1 and 1.0.5) on Linux when using one thread for MKL and doing all the parallelism with StarPU (no multithreaded tasks), that is, setting the environment variable MKL_NUM_THREADS to 1 and using the threaded MKL library with iomp5.
Using this configuration, StarPU uses only 1 core, no matter the value of STARPU_NCPU. The problem is actually a thread pinning issue with MKL.
The solution is to set the environment variable KMP_AFFINITY to disabled (http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm).
If your application only partially uses StarPU, and you do not want to call starpu_init() / starpu_shutdown() at the beginning/end of each section, StarPU workers will poll for work between the sections. To avoid this behavior, you can "pause" StarPU with the starpu_pause() function. This will prevent the StarPU workers from accepting new work (tasks that are already in progress will not be frozen), and stop them from polling for more work.
Note that this does not prevent you from submitting new tasks, but they will not execute until starpu_resume() is called. Also note that StarPU must not be paused when you call starpu_shutdown(), and that this function pair works in a push/pull manner, i.e. each call to starpu_pause() must be matched by a call to starpu_resume() to clear its effect.
One way to use these functions could be: