Parallelization and Performance Improvement of Dynare with Specialized Hardware
In addition to the Ispra team's FP7 grant project to parallelise Dynare's Markov chains in Matlab, there are several other options:
a) As suggested by Sébastien Villemot in his comments on the Ispra project, one option is to rewrite part of the Dynare code, namely the Kalman filter and the likelihood calculation, in C/C++ and to run multiple independent MC chains over the network using standard OS utilities.
b) Another option is to use the model block decomposition already implemented by Ferhat Mihoubi; solving the individual blocks could then also be parallelised once it is rewritten in C/C++ (e.g. using k_order_pert).
c) Once the code is rewritten in C/C++, the multiple block solvers and chains can also be run on special parallel processors: graphics processors (GPUs) or other specialized hardware platforms. In addition, single-thread calculations could be faster on hardware specialized for matrix computation.
• c.1) NVIDIA provides, free of charge, a parallelizing SDK extension for C called CUDA, which allows developers to parallelise code and run lightweight threads on its proprietary multi-processor GPU platforms. These range from add-on home-user GeForce and professional Quadro graphics cards to purpose-built computational platforms (Tesla). CUDA comes with BLAS and FFT libraries already ported to its parallel hardware. It supports Linux 32/64-bit, Windows XP 32/64-bit, and Mac operating systems. See: http://www.nvidia.co.uk/object/cuda_what_is_uk.html
• c.2) RapidMind provides chargeable parallelization metaprogramming extensions to C/C++ that are portable across several CPU/GPU platforms. They allow parallel programs to be deployed on ATI/AMD or NVIDIA GPUs, Intel/AMD CPUs, IBM Cell processors and a few other processors, and on a variety of operating systems, including Linux and Windows. See: http://www.rapidmind.net/product.php
Both parallelising solutions require changes to standard C/C++ programs.
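To make option a) more concrete, the following is a minimal C++ sketch of several independent MC chains running side by side. It is an illustration only, under stated assumptions: the chains run as threads on one machine rather than over the network, the sampler is a plain random-walk Metropolis step, and log_target, run_chain and run_chains are hypothetical names, with log_target standing in for the model likelihood that Dynare would compute with the Kalman filter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for the model log-likelihood (a standard normal
// kernel). In Dynare this would be the Kalman filter likelihood.
double log_target(double theta) { return -0.5 * theta * theta; }

// Run one random-walk Metropolis chain and return all of its draws.
std::vector<double> run_chain(unsigned seed, std::size_t n_draws, double step) {
    std::mt19937 rng(seed);                                // per-chain generator
    std::normal_distribution<double> proposal(0.0, step);  // random-walk increment
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> draws;
    draws.reserve(n_draws);
    double theta = 0.0;
    double lp = log_target(theta);
    for (std::size_t i = 0; i < n_draws; ++i) {
        const double cand = theta + proposal(rng);
        const double lp_cand = log_target(cand);
        // Accept the candidate with probability min(1, exp(lp_cand - lp)).
        if (std::log(unif(rng)) < lp_cand - lp) {
            theta = cand;
            lp = lp_cand;
        }
        draws.push_back(theta);
    }
    return draws;
}

// Run n_chains chains in parallel. Each thread has its own seed and writes
// only to its own slot of the result vector, so no locking is needed.
std::vector<std::vector<double>> run_chains(std::size_t n_chains,
                                            std::size_t n_draws,
                                            double step) {
    assert(n_chains > 0);
    std::vector<std::vector<double>> results(n_chains);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < n_chains; ++c) {
        workers.emplace_back([&results, c, n_draws, step] {
            results[c] = run_chain(1000u + static_cast<unsigned>(c), n_draws, step);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

A call such as run_chains(4, 2000, 1.0) returns four vectors of 2000 draws each. Because the chains share no state, the same structure carries over to separate processes or machines, where each worker would write its draws to a file instead of a shared vector.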
The advantages of NVIDIA processors and the CUDA platform over many other proprietary parallel hardware platforms with proprietary SDKs are:
1) its processors would generally be faster at vector multiplication and at the single-thread recursive Kalman filter estimation for the initial ML calculation than a general-purpose CPU such as Intel's, even when the latter is in a multi-processor farm,
2) to start with, its add-on graphics cards are cheaper and widely available,
3) its parallelizing software extension is free of charge, and
4) it comes with parallel BLAS and FFT libraries already ported.
The main advantage of RapidMind, although it is chargeable, would be that it is portable across platforms, including NVIDIA GPUs.
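The recursive Kalman filter likelihood referred to above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not Dynare's implementation: the model is a scalar state-space model with one state and one observable, and kalman_loglik and all of its parameters are hypothetical names. It shows why the filter is a single-thread calculation along the time dimension: each step needs the state mean and variance from the previous step, so any speed-up has to come from the linear algebra inside a step (matrix products in the multivariate case) or from running several likelihood evaluations at once.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gaussian log-likelihood of the observations y under the scalar model
//   x_t = phi * x_{t-1} + w_t,   w_t ~ N(0, q)   (state equation)
//   y_t = x_t + v_t,             v_t ~ N(0, r)   (observation equation)
// with initial state mean x0 and initial state variance p0.
double kalman_loglik(const std::vector<double>& y, double phi, double q,
                     double r, double x0, double p0) {
    assert(q >= 0.0 && r > 0.0 && p0 >= 0.0);
    const double log_2pi = std::log(2.0 * std::acos(-1.0));
    double x = x0;        // filtered state mean
    double p = p0;        // filtered state variance
    double loglik = 0.0;
    for (double obs : y) {
        // Prediction step: depends on the previous filtered state.
        const double x_pred = phi * x;
        const double p_pred = phi * phi * p + q;
        // Innovation and its variance.
        const double v = obs - x_pred;
        const double f = p_pred + r;
        loglik += -0.5 * (log_2pi + std::log(f) + v * v / f);
        // Update step: feeds the next iteration.
        const double k = p_pred / f;  // Kalman gain
        x = x_pred + k * v;
        p = (1.0 - k) * p_pred;
    }
    return loglik;
}
```

In the multivariate case the scalar products and the division by f become matrix products and a linear solve, which is where ported BLAS routines would be used.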