VLF Group compute cluster – Matlab

This information is out-of-date. This was written for the distributed computing server, which we decided not to purchase. If you have a need for it, contact nansen’s administrator.

Matlab job submission

We are evaluating the distributed computing server, which allows certain MATLAB constructs to be parallelized across the cluster nodes. Even without the distributed computing server, you can submit multiple MATLAB jobs and have them execute on the compute nodes, but they cannot communicate with each other unless you specifically write MPI code to do so.

With the distributed computing server, MATLAB provides additional constructs for parallelizing programs. MATLAB supports many constructs and styles; we discuss two of the most common below: the embarrassingly parallel style and the parfor style.

Embarrassingly parallel jobs

“Embarrassingly parallel” refers to jobs that have absolutely no dependence on each other and can run in any order. A common task might be to run a script for a variety of different parameters.

The first piece required by Matlab is a top-level script that configures the scheduler and submits a job. The entirety is included here, with comments:


  % Get scheduler object
  sched = findResource('scheduler', 'type', 'torque');
   
  % Define various characteristics of the cluster.
  % These values will be the same for every job.
  set(sched, 'ClusterSize', 32);
  set(sched, 'ClusterOsType', 'unix');
  set(sched, 'RcpCommand', 'scp');
  set(sched, 'RshCommand', 'ssh');
  
  % The MATLAB root directory.
  % Make sure this matches the version of MATLAB selected with the
  % module command.
  set(sched, 'ClusterMatlabRoot', '/usr/local/MATLAB/R2010b_DCS');

  % SET THIS TO YOUR OWN DIRECTORY, THE ONE CONTAINING THESE SCRIPTS.
  SCRIPTDIR = '/shared/users/username/scriptdir';
  set(sched, 'DataLocation', SCRIPTDIR);

The above commands configure the scheduler and are required for every parallel job. The next step is to create a job. In this case, we are using the createJob command, which creates a job object to run many instances of a script as separately scheduled jobs:

 

  % Create a simple job.  PathDependencies adds to the Matlab PATH so 
  % it can find the worker script 'test1worker'.
  pjob = createJob(sched, 'PathDependencies', {SCRIPTDIR});

The next step is to create a task. This configures the function to call. In this case, the function to run on all nodes is called test1worker. The third argument is the number of output arguments expected from this function. Be sure to configure this correctly, or your jobs will hang. The fourth argument specifies the input arguments for each instance of this script, as a cell array of cell arrays:


  % Define a task for this job.  Arguments for the workers are set as
  % a cell array of cell arrays.  This will start 4 workers, with input 
  % arguments 1, 2, 3, and 4, respectively.
  task = createTask(pjob, ...
                    @test1worker, ...   % function to call
                    1,  ...             % number of output arguments
                    {{1},{2},{3},{4}}); % Input arguments
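
For a larger parameter sweep, the cell array of cell arrays can be built in a loop rather than typed by hand. The following is a minimal sketch; the variable names params and taskArgs are illustrative and do not appear in the original scripts.

```matlab
% Hypothetical parameter values for the sweep (illustrative only).
params = [1, 2, 3, 4];

% Each task needs its own cell array of input arguments, so wrap
% every parameter value in a 1x1 cell.
taskArgs = cell(1, numel(params));
for k = 1:numel(params)
    taskArgs{k} = {params(k)};
end

% taskArgs now has the same structure as {{1},{2},{3},{4}}.
fprintf('Prepared argument lists for %d tasks\n', numel(taskArgs));
```

taskArgs could then be passed as the fourth argument to createTask in place of the literal cell array.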

Finally, submit the job and wait for completion:


  % Submit as a batch job.
  disp('Submitting job');
  submit(pjob);

  disp('Waiting for completion');
  waitForState(pjob);

The output from each worker will be returned as an entry in a cell array:


  out = getAllOutputArguments(pjob);
  
  disp('Printing results');
  celldisp(out)

For reference, here is a sample function:


  function x=test1worker( inputarg )
    x = rand(inputarg);
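
Before submitting, it can help to run the worker serially on the same inputs to check that it returns what you expect. The sketch below assumes the job output is arranged with one row per task and one column per output argument. It uses an anonymous function as a stand-in for test1worker so that it is self-contained; the names worker, inputs, and expected are illustrative.

```matlab
% Stand-in for test1worker (same body as the sample function).
worker = @(n) rand(n);

% The same inputs that the job passes to its four tasks.
inputs = {{1}, {2}, {3}, {4}};

% Collect results with one row per task and one column per output.
expected = cell(numel(inputs), 1);
for k = 1:numel(inputs)
    expected{k, 1} = worker(inputs{k}{:});
end

% Task k should yield a k-by-k matrix of random numbers.
for k = 1:numel(expected)
    fprintf('task %d: %d-by-%d matrix\n', k, ...
            size(expected{k}, 1), size(expected{k}, 2));
end
```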

Distributed parfor

Another useful MATLAB parallel construct is the parfor loop. This operates like a for loop, but with some special restrictions (for example, no side effects that one iteration relies on from another) so that the loop iterations can happen in any order.
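
As a rough illustration of these restrictions, the sketch below shows two loop patterns that parfor generally accepts: writing to a distinct element of an output array on each iteration, and accumulating a sum. A loop whose iterations read results from earlier iterations is shown only as a comment, because it cannot be reordered. The variable names are illustrative; without parallel workers available, the loops simply run serially.

```matlab
n = 8;

% Each iteration writes only to its own element of squares.
squares = zeros(1, n);
parfor ii = 1:n
    squares(ii) = ii^2;
end

% A running sum is allowed as a reduction: the result does not
% depend on the order in which the iterations are added.
total = 0;
parfor ii = 1:n
    total = total + ii;
end

% Not suitable for parfor, because iteration ii needs iteration ii-1:
%   for ii = 2:n
%       x(ii) = x(ii-1) + 1;
%   end

fprintf('sum of 1..%d is %d\n', n, total);
```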

Again, we set up a job submission script as above, configuring information about MATLAB and the scheduler. But here, instead of using createJob, we use createMatlabPoolJob. Note that a MATLAB pool job has semantics very similar to those of the matlabpool command, but unlike matlabpool, it can span multiple compute nodes and is submitted through a job engine; that is, it is not interactive.


  set(sched, 'ResourceTemplate', '-l nodes=1:ppn=4');

  % Create matlabpool job and add directories to the MATLAB search path.
  pjob = createMatlabPoolJob(sched, 'PathDependencies', {SCRIPTDIR});
   
  % Specify the required number of workers.
  set(pjob, 'MinimumNumberOfWorkers', 4);
  set(pjob, 'MaximumNumberOfWorkers', 4);

Note that we have also specified how we want this job to be distributed. We are requesting 4 workers total, with 1 node (nodes=1) and 4 processes per node (ppn=4). Finally, we create the task and submit the job. Note that only 1 input argument is allowed with this parallel programming model. In this case, we are simply passing the value 20 to the script, and expecting back 2 output arguments.


  % Define a task for this job.
  % @test2worker is the function to run in parallel.
  % 2 is the number of output arguments.
  % 20 is the input argument (there is only 1).
  task = createTask(pjob, @test2worker, 2, {20}); 
   
  % Submit as a batch job.
  disp('Submitting job');
  submit(pjob);

Then we wait for completion and print out the output arguments. Notice that the output is still a cell array, with one entry for each output argument:


  disp('Waiting for completion');
  waitForState(pjob);
  
  out = getAllOutputArguments(pjob);
  
  out1 = out{1};
  out2 = out{2};
  for ii=1:length(out{1})
    hostname = out2{ii};
    fprintf('host: %s, pid: %d\n', hostname(1:end-1), out1(ii));
  end
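
To see how the iterations were spread across the cluster, the hostnames can be tallied. The sketch below uses made-up sample values in place of the real job output (the host names node1 and node2 are hypothetical), and strips the trailing newline with strtrim instead of indexing.

```matlab
% Hypothetical stand-in for out{2}; real values come from the job.
out2 = {sprintf('node1\n'), sprintf('node2\n'), ...
        sprintf('node1\n'), sprintf('node1\n')};

% Remove the trailing newline that /bin/hostname prints.
hosts = strtrim(out2);

% Count how many iterations each host handled.
[uniqueHosts, ~, idx] = unique(hosts);
counts = accumarray(idx(:), 1);

for k = 1:numel(uniqueHosts)
    fprintf('host: %s, iterations: %d\n', uniqueHosts{k}, counts(k));
end
```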

And last, here is the worker function actually run by the above example:

  function [pids,hostname]=test2worker( inputarg )
    % Preallocate the outputs so their shape does not depend on the
    % order in which the iterations run.
    pids = zeros(1, inputarg);
    hostname = cell(1, inputarg);
    parfor ii = 1:inputarg
      % This will fill the output arrays with the process id and hostname of
      % the worker currently handling this part of the parfor.
      pids(ii) = feature('getpid');
      [~,tmp] = system('/bin/hostname');
      hostname{ii} = tmp;
    end