Differences between revisions 3 and 4 (revision 3 as of 2009-11-10 15:30:55, size 6059; revision 4 as of 2009-11-17 14:52:40, size 7012)
This page documents the parallelization system developed by Marco Ratto for Dynare.

The idea is to provide a general framework in Dynare for parallelizing tasks which require no inter-process communication.

The implementation runs several MATLAB or Octave processes, on local or remote machines. Communication between master and slave processes is done through SMB on Windows and SSH on UNIX. Input and output data, as well as some short status messages, are exchanged through network filesystems.
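To make the file-based exchange more concrete, here is a minimal MATLAB/Octave sketch. It is not Dynare's actual protocol: the file names (slave_1.done, ...), the temporary folder standing in for the network share, the timeout and the polling interval are all invented for illustration. Each slave signals completion by writing a small status file, and the master polls the shared folder until every file exists or the timeout expires.

```matlab
% Hypothetical sketch, NOT Dynare's protocol: file names, folder and timing
% are invented. A temporary local folder stands in for the network share.
shared  = tempname;
mkdir(shared);
nSlaves = 3;

% Slave side (simulated here in the same process): each slave writes a
% short status message once its job is finished.
for k = 1:nSlaves
    fid = fopen(fullfile(shared, sprintf('slave_%d.done', k)), 'w');
    fprintf(fid, 'finished\n');
    fclose(fid);
end

% Master side: poll the shared folder until every slave has reported,
% or until the timeout (in seconds) expires.
timeout = 10;
t0      = tic;
done    = false(1, nSlaves);
while ~all(done) && toc(t0) < timeout
    for k = find(~done)
        done(k) = exist(fullfile(shared, sprintf('slave_%d.done', k)), 'file') == 2;
    end
    if ~all(done)
        pause(0.5);
    end
end
fprintf('%d of %d slaves finished\n', sum(done), nSlaves);

% Remove the simulated share.
delete(fullfile(shared, '*.done'));
rmdir(shared);
```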

Currently the system works only with homogeneous grids: only Windows machines or only UNIX machines.

Two routines are currently parallelized:

  • the Metropolis-Hastings algorithm (implemented in Random_walkMetropolis_hastings.m)

  • the independent Metropolis-Hastings algorithm (implemented in independent_metropolis_hastings.m)

  • the Metropolis-Hastings diagnostics (implemented in McMCDiagnostics.m)
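Since these tasks need no inter-process communication, the work can be cut into independent pieces. As an illustration only, assuming the Metropolis-Hastings work is split by chains, this MATLAB/Octave fragment assigns nChains chains to the cores listed in a NumCPU vector as contiguous, nearly equal blocks. It is a sketch, not the distribution logic used inside the routines listed above; nChains, blocks and the other variable names are hypothetical.

```matlab
% Hypothetical illustration: spread nChains independent MCMC chains over the
% cores listed in a NumCPU vector. Each core gets a contiguous block of
% chains, and no data has to be exchanged between blocks while they run.
nChains = 5;          % number of chains (example value)
NumCPU  = 0:1;        % two cores, numbered as in the NumCPU field
nCPU    = numel(NumCPU);
base    = floor(nChains / nCPU);
extra   = mod(nChains, nCPU);
blocks  = zeros(nCPU, 2);   % row k: [first chain, last chain] for core NumCPU(k)
first   = 1;
for k = 1:nCPU
    n = base + (k <= extra);   % the first EXTRA cores take one extra chain
    blocks(k, :) = [first, first + n - 1];
    first = first + n;
    fprintf('core %d: chains %d to %d\n', NumCPU(k), blocks(k, 1), blocks(k, 2));
end
```

With more cores than chains, some cores receive an empty block (last chain smaller than first chain).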

1. Requirements

1.1. For a Windows grid

  1. a standard Windows network (SMB) must be in place
  2. PsTools must be installed in the path of the master Windows machine
  3. the Windows user on the master machine must also be a user on every slave machine in the cluster; that user account will be used for the remote computations

1.2. For a UNIX grid

  1. the MATLAB executable must be in the path of the slave machines
  2. SSH must be installed on the master and on the slave machines
  3. the UNIX user on the master machine must also be a user on every slave machine in the cluster; that user account will be used for the remote computations
  4. SSH keys must be installed so that the SSH connections from the slaves to the master can be made without passwords (see SshKeysHowto)
  5. SSHFS must be installed on the slave machines

2. Usage

The parallelization mechanism is triggered by options_.parallel. By default, this option is equal to zero, which means that no parallelization is used.

To trigger the parallelization, this option must be set to a vector of structures. Each structure represents a slave machine (possibly using several CPU cores on the machine).

The fields are:

  • Local: equal to 0 or 1. Use 1 if this slave is the local machine, 0 if it is a remote machine
  • PcName: for a remote slave, name of the machine. Use the NetBIOS name under Windows, or the DNS name under UNIX
  • NumCPU: a vector of integers representing the CPU cores to be used on that slave machine. The first core has number 0, so on a quad-core machine use [0:3] to use all four cores
  • user: for a remote slave, username to be used. On Windows, the group must also be specified here, like DEPT\JohnSmith, i.e. user JohnSmith in Windows group DEPT
  • passwd: for a remote slave, password associated with the username
  • RemoteDrive: for a remote Windows slave, letter of the remote drive (C, D, ...) where the computations will take place
  • RemoteFolder: for a remote slave, path of the directory on the remote drive where the computations will take place
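Because the vector is typed by hand, a quick consistency check can catch mistakes before a long run. The MATLAB/Octave sketch below is hypothetical and not part of Dynare: it only encodes the field descriptions above, and it assumes the grid is homogeneous, so the master's operating system (ispc) is taken to be that of every slave. The second entry is deliberately broken (empty PcName) to show a report; the names required, problems and mandatory are illustrative.

```matlab
% Hypothetical sanity check, NOT part of Dynare: it only encodes the field
% descriptions above. The grid is assumed homogeneous, so the master's OS
% (ispc) is taken to be the OS of every slave.
options_.parallel = struct('Local', 1, 'PcName', '', 'NumCPU', 0:3, ...
    'user', '', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');
% Deliberately broken remote entry: PcName is empty.
options_.parallel(2) = struct('Local', 0, 'PcName', '', 'NumCPU', 0:1, ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', ...
    'RemoteFolder', 'dynare_calcs\Remote');

required = {'Local', 'PcName', 'NumCPU', 'user', 'passwd', 'RemoteDrive', 'RemoteFolder'};
problems = {};
for i = 1:numel(options_.parallel)
    s = options_.parallel(i);
    present = isfield(s, required);
    if ~all(present)
        problems{end+1} = sprintf('slave %d: missing field(s): %s', i, ...
                                  strtrim(sprintf('%s ', required{~present})));
        continue
    end
    if ~(isequal(s.Local, 0) || isequal(s.Local, 1))
        problems{end+1} = sprintf('slave %d: Local must be 0 or 1', i);
    end
    if isempty(s.NumCPU) || ~isnumeric(s.NumCPU) || any(s.NumCPU < 0) ...
            || any(s.NumCPU ~= round(s.NumCPU))
        problems{end+1} = sprintf('slave %d: NumCPU must list core numbers (starting at 0)', i);
    end
    if isequal(s.Local, 0)
        mandatory = {'PcName', 'user', 'RemoteFolder'};           % any remote slave
        if ispc()
            mandatory = [mandatory, {'passwd', 'RemoteDrive'}];   % remote Windows slave
        end
        for f = mandatory
            if isempty(s.(f{1}))
                problems{end+1} = sprintf('slave %d: %s is empty', i, f{1});
            end
        end
    end
end
if isempty(problems)
    disp('options_.parallel looks consistent');
else
    fprintf('%s\n', problems{:});
end
```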

There is currently no interface in the preprocessor to construct this option structure vector; this has to be done by hand by the user in the MOD file.
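One way to shorten this hand-written vector, using a plain MATLAB/Octave idiom rather than any Dynare feature, is to start from a template whose fields are all empty and overwrite only what each slave needs. The sketch below rebuilds the Windows local-plus-remote example of section 2.2.2; the variable names template, local and remote are illustrative.

```matlab
% Template with every field present; Local = 1 and empty strings, as in the
% local example. The names template/local/remote are only illustrative.
template = struct('Local', 1, 'PcName', '', 'NumCPU', [], 'user', '', ...
                  'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');

% Local slave: only NumCPU differs from the template.
local = template;
local.NumCPU = 0:3;

% Remote Windows slave: overwrite the fields that matter for it.
remote = template;
remote.Local        = 0;
remote.PcName       = 'RemotePCName';
remote.NumCPU       = 0:1;
remote.user         = 'DEPT\JohnSmith';
remote.passwd       = '****';
remote.RemoteDrive  = 'C';
remote.RemoteFolder = 'dynare_calcs\Remote';

% Both copies share the template's field names and order, so they can be
% concatenated into the vector of structures expected by options_.parallel.
options_.parallel = [local, remote];
```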

2.1. Example syntax for Windows and UNIX, for local parallel runs (assuming a quad-core machine)

All fields are empty, except Local and NumCPU:

options_.parallel = struct('Local', 1, 'PcName', '', 'NumCPU', [1:3], 'user', '', 'passwd', '', ...
    'RemoteDrive', '', 'RemoteFolder', '');

2.2. Example Windows syntax for remote runs

  • the Windows password (passwd) has to be typed explicitly!
  • RemoteDrive has to be typed explicitly!
  • for user, ALSO the group has to be specified, like DEPT\JohnSmith, i.e. user JohnSmith in Windows group DEPT
  • PcName is the name of the computer in the Windows network, i.e. the output of hostname, or the full IP address

options_.parallel = struct('Local', 0, 'PcName', 'RemotePCName', 'NumCPU', [4:6], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', 'RemoteFolder', 'dynare_calcs\Remote');

2.2.1. Example using several remote PCs to build a grid

A vector of parallel structures has to be built:

options_.parallel = struct('Local', 0, 'PcName', 'RemotePCName1', 'NumCPU', [0:3], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', 'RemoteFolder', 'dynare_calcs\Remote');

options_.parallel(2) = struct('Local', 0, 'PcName', 'RemotePCName2', 'NumCPU', [0:3], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'D', 'RemoteFolder', 'dynare_calcs\Remote');

options_.parallel(3) = struct('Local', 0, 'PcName', 'RemotePCName3', 'NumCPU', [0:1], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', 'RemoteFolder', 'dynare_calcs\Remote');

options_.parallel(4) = struct('Local', 0, 'PcName', 'RemotePCName4', 'NumCPU', [0:3], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', 'RemoteFolder', 'dynare_calcs\Remote');
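The same four-machine grid can also be written as a single struct() call. This relies on standard MATLAB/Octave behaviour, not on a Dynare feature: when some values are cell arrays of equal size, struct() builds a structure array with one element per cell, and non-cell values are copied into every element. The fragment then counts the cores the grid requests; nCores is an illustrative name.

```matlab
% One struct() call: cell-array values give one array element per cell,
% scalar and char values are copied into every element.
options_.parallel = struct('Local', 0, ...
    'PcName', {'RemotePCName1', 'RemotePCName2', 'RemotePCName3', 'RemotePCName4'}, ...
    'NumCPU', {0:3, 0:3, 0:1, 0:3}, ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', ...
    'RemoteDrive', {'C', 'D', 'C', 'C'}, ...
    'RemoteFolder', 'dynare_calcs\Remote');

% Number of cores requested on each machine, and in total.
nCores = arrayfun(@(s) numel(s.NumCPU), options_.parallel);
for i = 1:numel(options_.parallel)
    fprintf('%s: %d core(s)\n', options_.parallel(i).PcName, nCores(i));
end
fprintf('total: %d core(s)\n', sum(nCores));   % 4 + 4 + 2 + 4 = 14
```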

2.2.2. Example of combining local and remote runs

options_.parallel = struct('Local', 1, 'PcName', '', 'NumCPU', [0:3], ...
    'user', '', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');

options_.parallel(2) = struct('Local', 0, 'PcName', 'RemotePCName', 'NumCPU', [0:1], ...
    'user', 'DEPT\JohnSmith', 'passwd', '****', 'RemoteDrive', 'C', 'RemoteFolder', 'dynare_calcs\Remote');

2.3. Example Unix syntax for remote runs

  • passwd and RemoteDrive are not needed (leave them empty)!
  • PcName: full IP address or DNS name

2.3.1. Example with only one remote slave

options_.parallel = struct('Local', 0, 'PcName', 'name.domain.org', 'NumCPU', [0:3], ...
    'user', 'JohnSmith', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '/home/rattoma/Remote');

2.3.2. Example of combining local and remote runs (on UNIX)

options_.parallel = struct('Local', 1, 'PcName', '', 'NumCPU', [0:3], ...
    'user', '', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');

options_.parallel(2) = struct('Local', 0, 'PcName', 'name.domain.org', 'NumCPU', [0:3], ...
    'user', 'JohnSmith', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '/home/rattoma/Remote');
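The password-less SSH requirement for UNIX grids can be checked before a long run. The MATLAB/Octave sketch below assumes the OpenSSH client and only builds one test command per remote entry of options_.parallel, i.e. it covers the master-to-slave direction; the option -o BatchMode=yes makes ssh fail instead of prompting for a password. For the slave-to-master direction listed in the requirements, the same kind of command can be run on each slave with the master's user and address. The variable cmds is illustrative.

```matlab
% Hypothetical helper, NOT part of Dynare: build one OpenSSH command per
% remote slave to test key-based (password-less) login from the master.
% BatchMode=yes makes ssh fail instead of prompting for a password.
options_.parallel = struct('Local', 1, 'PcName', '', 'NumCPU', 0:3, ...
    'user', '', 'passwd', '', 'RemoteDrive', '', 'RemoteFolder', '');
options_.parallel(2) = struct('Local', 0, 'PcName', 'name.domain.org', 'NumCPU', 0:3, ...
    'user', 'JohnSmith', 'passwd', '', 'RemoteDrive', '', ...
    'RemoteFolder', '/home/rattoma/Remote');

cmds = {};
for i = find([options_.parallel.Local] == 0)   % remote entries only
    s = options_.parallel(i);
    cmds{end+1} = sprintf('ssh -o BatchMode=yes %s@%s true', s.user, s.PcName);
end
fprintf('%s\n', cmds{:});
```

Running [status, out] = system(cmds{1}) on the master should return status 0 when key-based login works; ssh exits with 255 when the connection or authentication fails.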

3. Information for the Dynare developers

3.1. General architecture of the system

3.2. Improvements to be made

  • Under UNIX, drop the use of SSHFS (apparently no longer needed)
  • Support MacOS (use isunix() || ismac() in OS tests, since MacOS is basically a UNIX variant)
  • Under UNIX, don't use ifconfig to determine local and remote IPs, since this is not a portable technique; rather, add a new option to explicitly declare the local IP, and use the existing PcName option to determine the remote IP
  • Under MATLAB, add a new option to explicitly set maxNumCompThreads() on the slaves, and default to one thread if the option is not declared; we need to determine in which version of MATLAB the function maxNumCompThreads() appeared: test it with matlab_ver_less_than()
  • Design and implement a syntax in the preprocessor
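A minimal sketch of the second and fourth items, with hypothetical variable names (transport, nThreads). The OS test picks the communication channel described at the top of this page. The thread setting is guarded with exist() instead of the matlab_ver_less_than() version test suggested above, because maxNumCompThreads() may be missing (e.g. under Octave); this swap is a simplification made for the sketch, not the proposed implementation.

```matlab
% Second item: OS test treating MacOS as a UNIX variant. (In current MATLAB
% releases isunix() is already true on MacOS; ismac() makes the intent explicit.)
if ispc()
    transport = 'SMB';   % Windows grid
elseif isunix() || ismac()
    transport = 'SSH';   % UNIX grid, MacOS included
else
    error('Unsupported operating system');
end
fprintf('communication through %s\n', transport);

% Fourth item: default to one computational thread on a slave when the
% (future) option is not declared. exist() stands in for a version test,
% since maxNumCompThreads() may not be available.
nThreads = 1;   % hypothetical default value
if exist('maxNumCompThreads') > 0
    maxNumCompThreads(nThreads);
end
```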

DynareWiki: ParallelDynare (last edited 2012-05-09 10:05:10 by HoutanBastani)