Apr
30

MATLAB: Easily perform large batch jobs on multiple machines (without MATLAB Distributed Computing Server)

Author: Robert Coop

The Problem: Too Many Test Runs, Too Little Time

I’ve often run into a problem where I need to test many different parameter settings in an experiment.  Typically, I need to test several sets of possible parameters, and I need to run many random repetitions of each parameter set in order to establish statistical significance.  (You are testing for statistical significance in all of your experiments, right?)
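To make the size of the problem concrete, here is a minimal sketch of how the full list of test runs can be enumerated.  The parameter names and values (learningRates, hiddenUnits, numReps) are made up for illustration; substitute whatever your own experiment sweeps over.

```matlab
% Hypothetical parameter values; replace these with your experiment's own.
learningRates = [0.01 0.1 1];
hiddenUnits   = [10 50];
numReps       = 5;    % random repetitions per parameter set

% Enumerate every (learningRate, hiddenUnits, repetition) combination.
[lrGrid, huGrid, repGrid] = ndgrid(learningRates, hiddenUnits, 1:numReps);
runs = [lrGrid(:), huGrid(:), repGrid(:)];    % one row per test run

numRuns = size(runs, 1);
fprintf('%d runs to execute\n', numRuns);
```

Even this toy sweep produces 3 x 2 x 5 = 30 independent runs, which is why spreading them over many cores pays off so quickly.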

In my lab, we have access to a decent amount of computing resources.  We have 14 dedicated machines with 80 cores spread amongst them.  However, we have neither a batch job dispatcher nor a copy of MATLAB’s Distributed Computing Server.  So, I have to SSH into each machine, start multiple copies of MATLAB, set up the parameters, run the test, and collate the results.

Until now…

Coordinating MATLAB Workers Across Machines

Here’s how to pull this off.  You’ll need a pool of machines with access to a common file system.

The MATLAB Batch Job

The MATLAB batch command queues a job for execution by a MATLAB parallel worker.  It is very useful for running copies of a script in parallel (locally), but it takes a bit of finagling to use it to coordinate workers between machines.  An important detail about the batch command is that it uses the local scheduler, and the local scheduler (by default) stores job information in the .matlab directory in your home directory.  If your home directory is shared across machines, then this behavior is not what we want.  It will cause conflicts among
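As a rough illustration of the storage issue described above, the sketch below points the local scheduler at a per-machine directory before submitting a batch job.  This is one possible workaround under my own assumptions, not necessarily the approach this post goes on to use: it assumes the Parallel Computing Toolbox with the parcluster interface (older releases used a different API), that tempdir is on local rather than shared storage, and that the profile is named 'local' (newer releases call it 'Processes').  The call to sum is just a stand-in for a real experiment function.

```matlab
% Build a job storage directory that is unique to this machine.
% Assumption: tempdir lives on local, non-shared storage.
[~, hostName] = system('hostname');
hostName = strtrim(hostName);
jobDir = fullfile(tempdir, ['matlab_jobs_' hostName]);
if ~exist(jobDir, 'dir')
    mkdir(jobDir);
end

% Point the local scheduler at that directory instead of ~/.matlab.
c = parcluster('local');          % profile name may differ by release
c.JobStorageLocation = jobDir;

% Submit a function as a batch job: 1 output, arguments in a cell array.
% @sum is a placeholder for your own experiment function.
job = batch(c, @sum, 1, {[1 2 3]});
wait(job);                        % block until the worker finishes
out = fetchOutputs(job);          % cell array of the job's outputs
delete(job);                      % remove the job's files from jobDir
```

With the storage location keyed on the host name, two machines that share a home directory no longer write their job files to the same place.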
