Cluster
The Bistromath cluster consists of 8 Apple Xserve G5s. Each has dual 2 or 2.3 GHz PowerPC processors and 4 GB of RAM. The machines are bistromath[0-7].csail.mit.edu (also aliased as bm[0-7].csail.mit.edu for convenience). The machines have a status page which shows the load of the complete cluster and who is using what. The page is updated every 5 minutes, so it may not be perfectly accurate.
Accounts
The cluster has a separate account and home directory structure from the normal CSAIL accounts. The home directories are on an NFS share provided by TIG and are backed up. To get an account, talk to Nick Matsakis <matsakis·mit.edu> or Sarah Finney <sjf·csail.mit.edu>, who will give you a temporary password (usernames and user IDs are the same as on CSAIL).
Access to the machines is primarily via SSH. If you want to log in from a trusted machine without a password, there is a tip in the Q & A section below on how to do this. If you need graphical access, VNC is an option but there are some subtleties to getting it running. Talk to Nick Matsakis <matsakis·mit.edu> for details.
When you log in for the first time, please change your password using the 'passwd' program. If you do not want to use bash as your shell, use the 'chsh' program. For example, 'chsh -s /bin/tcsh' changes your shell to tcsh. You may also want to copy over your .bash_profile or .cshrc file from another system.
Kerberos and AFS
The machines have AFS installed. For full access to your CSAIL AFS directory, you need to get CSAIL Kerberos tickets and then run aklog. We've installed a script called 'csail' that does both and prints the path to your CSAIL AFS home directory (you'll be prompted for your password, of course). If you use AFS frequently, you'll probably want a symlink to that path from your cluster home directory. Note: the longjob and longsession scripts are not supported on the new cluster configuration; writing to your NFS home directory is the preferred way to store the output of software runs.
If you need to run a job for more than eight hours, and that job needs to access files in your AFS home directory (or any other AFS files), you will need to learn more, because your Kerberos tickets and AFS tokens expire and the job will lose access to AFS.
Bistromath load page
There is a web page that lists the CPU and memory usage for both the bistromath and sep machines here:
http://lis.csail.mit.edu/bistromath
A script runs on each machine every five minutes and writes that machine's load information to the place where the web page reads it. On the Linux machines, this is done with a cron job defined in /etc/cron.d/loadlister_works. If cron needs to be restarted, you can do that with 'sudo /etc/init.d/cron restart'.
Perhaps Nick will write something useful about how this process gets started on the bistromath machines...
The scripts that are actually run by the cron jobs (or whatever it is on the bistromaths) are in /bistromath/software/scripts. They are Python scripts.
UNIX Programs
There are two primary sources of unix software installed on the machines: the built-in Mac OS X software and the Darwinports software packages. The built-in Mac software is of the standard unix variety, though it comes from a BSD heritage and so may differ subtly from the GNU versions typically found in linux distributions (for example, in supported command-line arguments).
The other source of software for the machines is Darwinports, which is a packaging system for Mac OS X. The Darwinports programs are added to your path automatically when you log in. Chances are good that if you want open source software, it is available through Darwinports. If some software you'd like to use is missing, check whether a port is available and then ask a cluster administrator to install the package you need. If what you need isn't in Darwinports, we can probably still install it, provided it runs on Mac OS X.
Many of the UNIX programs installed on these machines use X for their GUI, so it is possible to use their GUI over the network, just as you would on a Linux machine (Emacs is one example). The ssh server on each computer is set up to forward X connections. Java is a notable exception since Mac OS X Java uses the native graphics interface. See below for advice about using Java.
Java Programs
As mentioned above, Java GUIs cannot be displayed over a remote login to the cluster, because Java uses the native Mac OS X graphics technology rather than X. If your code doesn't require a GUI, it should run fine. Note, however, that some Java classes (BufferedImage, for example) try to initialize the GUI even if they don't actually use it. Normally, this causes an exception to be thrown and your program to stop. You can prevent this behavior with the "-Djava.awt.headless=true" flag, which lets you do anything that doesn't require actually displaying something on the screen (if you do try to display something, a HeadlessException is thrown).
The big secrets of Java on the cluster:
- Just start all of your programs with "-Djava.awt.headless=true" - this prevents the JVM from trying to access the Mac GUI (see above) and will let you run any program, so long as you don't try to execute GUI code (in which case an exception is thrown)
- You can use Java debuggers, because the Java debugging protocol (JDWP) works over the network. To use JSwat, for example, start it on your local workstation and type "listen <port number>" at the prompt at the bottom of the GUI, then run your Java program on the cluster with the options "-Xdebug -Xrunjdwp:transport=dt_socket,address=<your workstation>:<port number>". This causes the JVM to connect to the debugger, and you can then debug your program as if it were running on your local machine.
- JProfiler also works remotely, in much the same way as a debugger. Create a remote session in the JProfiler GUI on your workstation, then run the JVM on the cluster with the following options (assuming you've unpacked the JProfiler tarball in the directory $JPROFILER): "java -Xrunjprofiler -XX:-UseSharedSpaces -Xbootclasspath/a:$JPROFILER/bin/agent.jar <your other options and your startup class here>". For this to work, you'll need to set the DYLD_LIBRARY_PATH variable so that it includes the directory containing the JNI libraries JProfiler requires ($JPROFILER/bin/macos).
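The headless behavior described above can be illustrated with a small Java sketch. The class name HeadlessDemo and its helper methods are hypothetical, chosen for illustration only. It draws into an off-screen BufferedImage (which works without a display) and then tries to create an on-screen window (which headless mode rejects with a HeadlessException). Setting the property from code only has the same effect as the command-line flag if it happens before any AWT class initializes the graphics environment, so on the cluster passing "-Djava.awt.headless=true" on the command line remains the safer choice.

```java
import java.awt.Color;
import java.awt.Frame;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.HeadlessException;
import java.awt.image.BufferedImage;

public class HeadlessDemo {
    // Draws into an off-screen image and returns the RGB value of its center pixel.
    // Off-screen rendering needs no display, so this works in headless mode.
    static int renderCenterPixel() {
        BufferedImage img = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 16, 16);
        g.dispose();
        return img.getRGB(8, 8) & 0xFFFFFF; // drop the alpha byte
    }

    // Returns true if creating an on-screen window is rejected, which is what
    // headless mode does to code that really needs a display.
    static boolean windowRejected() {
        try {
            new Frame("never shown");
            return false;
        } catch (HeadlessException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical in-code equivalent of "java -Djava.awt.headless=true HeadlessDemo";
        // it must run before anything touches AWT.
        System.setProperty("java.awt.headless", "true");
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.printf("center pixel: %06x%n", renderCenterPixel());
        System.out.println("window rejected: " + windowRejected());
    }
}
```

To try it, compile with "javac HeadlessDemo.java" and run "java -Djava.awt.headless=true HeadlessDemo" on one of the cluster machines.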
C/C++ programs
The C/C++ compiler on Mac OS X is gcc/g++. The default version is 4.0; version 3.3 is available as well. C++ code is especially sensitive to compiler version: g++ 3.x and later conform much more closely to the ANSI/ISO C++ standard than version 2.95 did, so some code that g++ 2.95 accepted as valid won't compile with the newer versions.
Note that Mac OS X runs the PowerPC processors in big-endian mode, while Intel x86 processors are little-endian. On the PowerPC, the bytes in a multi-byte integer are stored from most significant to least significant, while Intel machines store them from least significant to most significant. This is usually not an issue unless you are reading and writing binary data or doing low-level bit-fiddling. For binary data, you can either use the macros that convert to and from "network byte order" (which is big-endian), such as htonl and ntohl, or write your data in ASCII format.
Note that this is usually not a problem with high level languages such as Java (which uses big-endian representations no matter what platform you run on, by the way).
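The byte-order issue can be seen directly in a short Java sketch using java.nio.ByteBuffer, which is big-endian by default on every platform. The class name ByteOrderDemo and its methods are hypothetical. It encodes the same int in both byte orders, and shows what happens when little-endian bytes (for example, raw integers written by a C program on an x86 machine, an assumed scenario) are read back as big-endian: the bytes come out reversed. In C code, the htonl/ntohl macros play the same role as choosing an explicit order here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderDemo {
    // Serializes a 32-bit int using the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Reads a 32-bit int back, interpreting the bytes in the requested order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;

        // A new ByteBuffer is big-endian regardless of the CPU;
        // nativeOrder() reports the CPU's own order (big-endian on a G5).
        System.out.println("default buffer order: " + ByteBuffer.allocate(4).order());
        System.out.println("native CPU order:     " + ByteOrder.nativeOrder());
        System.out.println("big-endian bytes:    " + Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian bytes: " + Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN)));

        // Little-endian data misread as big-endian comes back byte-swapped;
        // reading it with the matching order recovers the original value.
        byte[] fromX86 = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("misread:   0x%08x%n", decode(fromX86, ByteOrder.BIG_ENDIAN));
        System.out.printf("corrected: 0x%08x%n", decode(fromX86, ByteOrder.LITTLE_ENDIAN));
    }
}
```

The design choice to state the byte order explicitly on both the writing and reading side is what keeps binary files portable between the cluster and x86 machines.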
Matlab
The bistromath machines have Matlab version R2006b installed. Although the Matlab GUI uses Java (and therefore won't work over remote logins; see the notes on Java), Matlab on Mac OS X uses X11 for figures and output, which works just fine over a remote login. Running 'matlab' at the command line executes a script which runs matlab with the '-nojvm' option to disable the GUI. To run matlab directly, you'll have to add /bistromath/software/matlab/2006b/ppc/bin to your PATH.
To run matlab on an sep machine, you must be sure that you have a ".software" file in your home directory that includes matlab.
Perl, Python, Tcl/Tk, ML, Lisp, etc.
These languages either are installed by default (Perl, Python) or can be easily added if people want them. Using them on Mac OS X shouldn't be significantly different from using them in Linux.
64 Bit
These machines have 64-bit processors, which means you can compile and run 64-bit-clean code if you want. Take a look at Apple's Resources for more information. To check your C++ code for 64-bit issues, you can use the Viva64 migration tool.
Q & A
- Can I create big files/how does space allocation work?
It's not a problem to create big files, but please delete them when you're no longer using them. Deleted files stick around for about one more week in the snapshot backups, so the space used by a set of huge files is not freed until up to a week after you delete them; if you wait to delete them, other users may run out of room to save data in the meantime. Run "df" to check the current disk usage.
- Hey! Why is my X11 emacs crashing?
Chances are good you're logging into the cluster from a Mac running OS X 10.4 and using 'ssh -X' for X11 forwarding. Because of changes to ssh in 10.4, this doesn't work well with emacs. Use 'ssh -Y' (trusted X11 forwarding) instead.
- Hey, why can't I use my Kerberos password at login, automatically get tickets on login, etc.?
We have tried to do this many times, with varying degrees of success. It's really not worth it.
- Hey, do I have to keep typing my password?
Well, no. If you want, you can set up a public/private key pair to authenticate yourself to the machines. The private key stays on a machine you trust (e.g. your laptop), and the public key goes in your bistromath home directory; since that directory is shared over NFS, one copy covers all eight machines. If your private key is ever compromised (e.g. someone steals your laptop), log into the bistromath machines and remove your public key from the authorized keys list. Type the following on your trusted machine to generate a key pair and install it on the bistromath machines. Just accept the defaults (unless you have already set this up with other key pairs, in which case you're on your own).
% ssh-keygen -t rsa
% cat ~/.ssh/id_rsa.pub | ssh username@bm0.csail.mit.edu 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys2'
- Hey, what is Darwin?
When you log in to the servers, you'll notice that the login message welcomes you to Darwin. Darwin is the open source UNIX system underlying Mac OS X (essentially, the Mac OS X GUI and Apple's GUI applications are proprietary, while everything else is open source). Darwin is basically a port of the Mach microkernel and FreeBSD 4.x/5.x to the PowerPC processor.
- Hey, why does [command x] work differently on Linux and on Mac OS X?
In many cases, a command-line tool will work a little differently, or have slightly different options, on Mac OS X and Linux. Usually the reason is that Linux distributions tend to include the GNU versions of tools (such as ls, top, etc.) while Mac OS X usually uses the BSD versions. Although they provide roughly the same functionality, these programs were usually developed independently and have small differences. For example, on Linux, top displays a list of the running processes sorted by CPU usage by default. On Mac OS X, top sorts the running processes by process ID (highest first). To sort by CPU usage, use the "-o cpu" option.
- Hey, how can the software be updated?
Talk to a cluster administrator to update the software.