This page documents the parallelization system developed by Marco Ratto for Dynare.
The idea is to provide a general framework in Dynare for parallelizing tasks which require very little inter-process communication.
The implementation runs several MATLAB or Octave processes, either on the local machine or on remote machines. Communication between master and slave processes is done through SMB on Windows and SSH on UNIX. Input and output data, and also some short status messages, are exchanged through network filesystems.
Currently the system works only with homogeneous grids: only Windows or only Unix machines.
Routines currently parallelized:
- the Metropolis-Hastings algorithm (implemented in random_walk_metropolis_hastings.m)
- the independent Metropolis-Hastings algorithm (implemented in independent_metropolis_hastings.m)
- the Metropolis-Hastings diagnostics (implemented in McMCDiagnostics.m)
- pm3.m (plotting routine)
- Posterior_IRF.m
1. Requirements
1.1. For a Windows grid
- a standard Windows network (SMB) must be in place
- PsTools must be installed in the path of the master Windows machine
1.2. For a UNIX grid
- MATLAB executable must be in the path of the slave machines
- SSH must be installed on the master and on the slave machines
- SSH keys must be installed so that the SSH connections from the slaves to the master can be made without passwords, or using an SSH agent (see SshKeysHowto)
- SSHFS must be installed on the slave machines
2. Usage
The parallelization mechanism is triggered by options_.parallel. By default this option is equal to zero, which means that no parallelization is used.
To trigger the parallelization, this option must be filled with a vector of structures. Each structure represents a slave machine (possibly using several CPU cores on the machine).
The fields are:
- Local: equal to 0 or 1. Use 1 if this slave is the local machine, 0 if it is a remote machine
- PcName: for a remote slave, name of the machine. Use the NETBIOS name under Windows, or the DNS name under Unix
- NumCPU: a vector of integers representing the CPU cores to be used on that slave machine. The first core has number 0. So, on a quad-core, use [0:3] here to use the four cores
- user: for a remote slave, username to be used. On Windows, the group also needs to be specified here, like DEPT\JohnSmith, i.e. user JohnSmith in Windows group DEPT
- passwd: for a remote slave, password associated with the username
- RemoteDrive: for a remote Windows slave, letter of the remote drive (C, D, ...) where the computations will take place
- RemoteFolder: for a remote slave, path of the directory on the remote drive where the computations will take place
There is currently no interface in the preprocessor to construct this option structure vector; this has to be done by hand by the user in the MOD file.
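Since the structure is written by hand, a quick check before launching a run can catch a misspelled or forgotten field. The following is a minimal sketch in plain MATLAB/Octave and is not part of Dynare; the variable names cfg, required, missing and n_workers are illustrative assumptions.

```matlab
% Field names as listed above.
required = {'Local', 'PcName', 'NumCPU', 'user', 'passwd', 'RemoteDrive', 'RemoteFolder'};

% One entry describing the local machine, using its four cores.
cfg = struct('Local', 1, 'PcName', '', 'NumCPU', [0:3], 'user', '', ...
             'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');

% Required field names that are absent from cfg (empty if all are present).
missing = required(~isfield(cfg, required));

% Total number of CPU cores requested over all entries of the vector.
n_workers = sum(arrayfun(@(s) numel(s.NumCPU), cfg));
```

The same two lines work unchanged when cfg is a vector of several structures, because isfield inspects the field names shared by the whole vector and arrayfun visits each entry.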
2.1. Example syntax for Windows and Unix, for local parallel runs (assuming quad-core)
All fields are empty, except Local and NumCPU:
options_.parallel = struct('Local', 1, 'PcName','', 'NumCPU', [1:3], 'user','','passwd','', 'RemoteDrive', '', 'RemoteFolder','');
2.2. Example Windows syntax for remote runs
- the Windows passwd has to be typed explicitly!
- RemoteDrive has to be typed explicitly!
- for user, the group ALSO has to be specified, like DEPT\JohnSmith, i.e. user JohnSmith in Windows group DEPT
- PcName is the name of the computer in the Windows network, i.e. the output of hostname, or the full IP address
options_.parallel = struct('Local', 0, 'PcName','RemotePCName','NumCPU', [4:6], 'user', 'DEPT\JohnSmith','passwd','****', 'RemoteDrive', 'C', 'RemoteFolder','dynare_calcs\Remote');
2.2.1. Example using several remote PCs to build a grid
A vector of parallel structures has to be built:
options_.parallel = struct('Local', 0, 'PcName','RemotePCName1','NumCPU', [0:3], 'user', 'DEPT\JohnSmith', 'passwd','****', 'RemoteDrive', 'C', 'RemoteFolder','dynare_calcs\Remote');
options_.parallel(2) = struct('Local', 0, 'PcName','RemotePCName2','NumCPU', [0:3], 'user', 'DEPT\JohnSmith','passwd','****', 'RemoteDrive', 'D', 'RemoteFolder','dynare_calcs\Remote');
options_.parallel(3) = struct('Local', 0, 'PcName','RemotePCName3','NumCPU', [0:1], 'user','DEPT\JohnSmith','passwd','****', 'RemoteDrive', 'C', 'RemoteFolder','dynare_calcs\Remote');
options_.parallel(4) = struct('Local', 0, 'PcName','RemotePCName4','NumCPU', [0:3], 'user','DEPT\JohnSmith','passwd','****', 'RemoteDrive', 'C', 'RemoteFolder','dynare_calcs\Remote');
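The same four-machine vector can also be built in a single call. This relies on standard struct() behaviour: a cell array argument is expanded into one element per cell, while non-cell values are copied to every element. The sketch below is an alternative way of writing the example, not a Dynare requirement; pc_names, num_cpu and drives are illustrative variable names.

```matlab
% Per-machine settings: one cell per remote PC.
pc_names = {'RemotePCName1', 'RemotePCName2', 'RemotePCName3', 'RemotePCName4'};
num_cpu  = {0:3, 0:3, 0:1, 0:3};
drives   = {'C', 'D', 'C', 'C'};

% Shared settings (user, passwd, RemoteFolder) are plain values and are
% replicated across the four entries of the resulting structure vector.
options_.parallel = struct('Local', 0, 'PcName', pc_names, 'NumCPU', num_cpu, ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', ...
    'RemoteDrive', drives, 'RemoteFolder', 'dynare_calcs\Remote');
```

All cell array arguments must have the same size, otherwise struct() raises an error.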
2.2.2. Example of combining local and remote runs
options_.parallel=struct('Local', 1, 'PcName','','NumCPU', [0:3], 'user','','passwd','','RemoteDrive', '', 'RemoteFolder','');
options_.parallel(2)=struct('Local', 0, 'PcName','RemotePCName','NumCPU', [0:1], 'user','DEPT\JohnSmith','passwd','****', 'RemoteDrive', 'C', 'RemoteFolder','dynare_calcs\Remote');
2.3. Example Unix syntax for remote runs
- no passwd and RemoteDrive needed!
- PcName: full IP address or DNS name
2.3.1. Example with only one remote slave
options_.parallel=struct('Local', 0, 'PcName','name.domain.org','NumCPU', [0:3], 'user','JohnSmith','passwd','', 'RemoteDrive', '', 'RemoteFolder','/home/rattoma/Remote');
2.3.2. Example of combining local and remote runs (on unix):
options_.parallel=struct('Local', 1, 'PcName','','NumCPU', [0:3], 'user','','passwd','','RemoteDrive', '', 'RemoteFolder','');
options_.parallel(2)=struct('Local', 0, 'PcName','name.domain.org','NumCPU', [0:3], 'user','JohnSmith','passwd','', 'RemoteDrive', '', 'RemoteFolder','/home/rattoma/Remote');
3. Information for the Dynare developers
3.1. General architecture of the system
The generic parallelization system is organized around two routines: masterParallel and fParallel.
- masterParallel is the entry point to the parallelization system. It is called from the master computer, at the point where the parallelization system should be activated. Its main arguments are the name of the function containing the task to be run on every slave computer, the inputs to that function stored in two structures (one for local and the other for global variables), and the configuration of the cluster. This function exits when the task has finished on all computers of the cluster, and returns the output in a structure vector (one entry per slave).
- fParallel is the top-level function to be run on every slave. Its main arguments are the name of the function to be run (containing the computing task), and some information identifying the slave. The function retrieves the inputs from the filesystem, calls the computing task, and transmits the output back to the master computer.
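The division of labour between the two routines can be illustrated with a toy exchange through files. This is a hedged sketch of the general pattern only, not the code of masterParallel or fParallel: the file names, the variable names and the use of cumsum as the task are assumptions, and the "slaves" run in the same process so that the example stays self-contained.

```matlab
work_dir  = tempname();          % stand-in for the shared folder
mkdir(work_dir);
task_name = 'cumsum';            % name of the function containing the task
n_slaves  = 2;

% Master side: write one input file per slave.
for j = 1:n_slaves
    input_data = struct('x', (1:3) * j);
    save(fullfile(work_dir, sprintf('input_%d.mat', j)), 'input_data');
end

% Slave side: retrieve the inputs, call the task by name, write the output.
for j = 1:n_slaves
    in = load(fullfile(work_dir, sprintf('input_%d.mat', j)));
    output_data = feval(task_name, in.input_data.x);
    save(fullfile(work_dir, sprintf('output_%d.mat', j)), 'output_data');
end

% Master side: collect the outputs in a structure vector, one entry per slave.
for j = 1:n_slaves
    out = load(fullfile(work_dir, sprintf('output_%d.mat', j)));
    results(j).y = out.output_data;
end

% Remove the temporary files and folder.
delete(fullfile(work_dir, '*.mat'));
rmdir(work_dir);
```

Passing the task as a function name and calling it with feval is what lets a single slave-side routine serve every parallelized task.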
3.2. Improvements to be made
- Under MATLAB, add a new option to explicitly set the number of threads on the slaves, defaulting to one thread if the option is not declared. Note that multithreading has only existed since MATLAB 7.4 (and not in Octave), and that the function maxNumCompThreads() should not be used since it will be deprecated in a future release of MATLAB (see MatlabVersionsCompatibility)
- Design and implement a syntax in the preprocessor (probably using a dedicated config file and an option to the dynare command)
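For the threading item, one possible route is the MATLAB startup option -singleCompThread, which limits a MATLAB session to a single computational thread. This is a suggestion and an assumption, not a design recorded on this page, and it only covers the one-thread case. The sketch below merely builds the command string that could start a slave; single_thread stands for the proposed option and slave_call is a placeholder for the real slave entry point.

```matlab
single_thread = true;              % hypothetical option, defaulting to one thread
slave_call    = 'disp(version)';   % placeholder for the real slave call

% Assemble the command line used to start MATLAB on a slave.
cmd = 'matlab -nosplash -nodesktop';
if single_thread
    cmd = [cmd ' -singleCompThread'];
end
cmd = sprintf('%s -r "%s; exit"', cmd, slave_call);
```

Setting the thread count at startup avoids calling maxNumCompThreads() inside the session.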