Friday, May 18, 2007

Matlab Licenses renewed on Opportunity cluster

Just a quick FYI,

The Matlab licenses on the Opportunity cluster have been renewed by the Provost office!

I would like to thank everyone involved!


Leo

Monday, March 26, 2007

New Cluster 101 slides for the Opportunity cluster

Below is a link to some introductory training slides for the Opportunity cluster. If you are new to clusters... or a seasoned expert, you might find these slides beneficial.

http://opportunity.neu.edu/blog/cluster101-Opportunity.pdf


Leo

Tuesday, January 23, 2007

Xilinx's ISE ver. 9.1i Installed on all compute nodes

In the digital world, there are two types of electronic chips: memory and logic. Memory chips are used to store information. Logic chips are used to manipulate, or interface with, the information contained in memory.
Programmable Logic Devices (PLDs) are "off the shelf" logic chips that the customer, rather than the chip manufacturer, programs to perform a specific function. With the ability to program their own chips, customers realize two key benefits: product design flexibility and faster time to market. Given today's shorter product life cycles, both of these factors can be critical determinants of a product's ultimate success. Electronic equipment manufacturers rely upon PLDs to make fast design changes, accommodate uncertain production volumes, and accelerate the introduction of their products to the marketplace.

To access Xilinx's ISE, follow these four steps:

1) Log in to the cluster via an X terminal of your choice
2) SSH into a compute node
3) Type 'source /usr/local/Xilinx91i/settings.sh' into your shell
4) Type 'ise' to launch the Xilinx graphical interface.
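
Steps 3 and 4 can be wrapped in a small shell function so that a missing settings file is reported rather than silently ignored. This is only a sketch: the function name launch_ise and its optional arguments are illustrative and not part of the Xilinx installation; the default settings path is the one given in step 3.

```shell
# Sketch only: launch_ise is a hypothetical helper, not part of the Xilinx install.
launch_ise() {
    # Default to the settings file named in step 3; both arguments are optional.
    settings="${1:-/usr/local/Xilinx91i/settings.sh}"
    tool="${2:-ise}"
    if [ ! -r "$settings" ]; then
        echo "Cannot read $settings (are you on a compute node?)" >&2
        return 1
    fi
    # Source the Xilinx environment into the current shell, then start the tool.
    . "$settings"
    "$tool"
}
```

Once the function is defined in your shell on a compute node, typing launch_ise with no arguments would perform steps 3 and 4 in one go.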

If you have any questions about Xilinx, please let me know.

Leo

Tuesday, January 02, 2007

Where users can store information/data outside of their home directory on the Opportunity Cluster.

There are two file systems without quotas that all Opportunity users and the cluster's compute nodes have access to: /scratch_local and /scratch_global.

*** Please note these areas are NOT backed up *****

The system has been configured to have both local and global scratch disk locations for the temporary storage of non-critical data. The global scratch space /scratch_global is an NFS-mounted file share from the front end to the rest of the cluster. The local scratch space /scratch_local is unique disk space residing on the individual compute node (hence the term local).

The policy regarding the longevity of files on these file systems is 30 days.

If you have a need for temporary use of storage, please feel free to use either of these two spaces.
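
Because files in the scratch areas are subject to the 30-day policy and are not backed up, it can help to see which of your files are getting old. A minimal sketch, assuming age is judged by modification time (the policy above does not say how it is measured); the function name old_scratch_files and the 25-day default are illustrative only:

```shell
# Sketch only: list your own files under a directory whose modification time
# is more than N days old. Whether the 30-day policy uses modification time
# is an assumption; old_scratch_files is a hypothetical name.
old_scratch_files() {
    dir="$1"
    days="${2:-25}"
    # -mtime +N matches files last modified more than N whole days ago.
    find "$dir" -type f -user "$(id -u)" -mtime +"$days" -print
}
```

For example, old_scratch_files /scratch_global 25 would list your files in the global scratch space that have not been modified for more than 25 days.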

Leo

Wednesday, October 04, 2006

Gaussian 03 installed on Opportunity

The Gaussian 03 software package has been installed on the Opportunity cluster and is available for use. If you would like to use this software product, please contact l.hill@neu.edu for further information.

Approved Policy Exception Request

Opportunity Cluster
Policy Exception Request

Title/description of the research project:

My work is in the area of large search space enumeration. Typically, the search spaces we deal with do not fit in the distributed memory of a cluster, and our algorithms take advantage of disk both to speed up the discovery process and to make the discovery possible. We have a disk-based technique for exploring one of these search spaces (one with applications in the area of computational group theory) and are looking to run that computation on Opportunity. The search space has 11 billion nodes, each of which requires 50 bytes of storage (compressed to 12 bytes), along with a small amount of additional storage for the path to reach that node.

Is this project in support of an externally funded grant or award: Yes

National Science Foundation grant number ACIR-0342555,
"Collaborative Research: Tuning Libraries to Effectively
Exploit the Memory Hierarchy", co-principal investigator
(joint with David Kaeli (PI, Northeastern U.) and Misha Kilmer
(PI, Tufts U.)),
2004-2007


Estimated system requirements:

# Nodes/CPU: 50 (only one of the dual processors)
Amount of Disk (local or global): 9GB per machine
Memory Usage: 100 Megabytes of RAM per node
Network : At least Gigabit Ethernet
Running time: 4 days
Start Date: ASAP (Thurs. Evening – Mon. Morning acceptable)

Expected Research Findings:

We aim to discover a property of the group we are working on (Brauer Tree structure). To do this, we will be running a separate computation over the results obtained here. This, however, can be done externally after the data is transferred from the cluster (approximately 500 gigabytes).
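
As a rough check on the storage figures quoted in the request above, the arithmetic can be sketched in shell. Decimal gigabytes (10^9 bytes) are an assumption, and the per-node path storage mentioned in the request is left out because no size is given for it.

```shell
# Back-of-the-envelope storage figures from the numbers in the request.
# Assumes decimal gigabytes; the extra path storage is ignored.
search_nodes=11000000000      # 11 billion search-space nodes
machines=50                   # compute nodes requested
disk_per_machine_gb=9         # disk requested per machine

raw_gb=$(( search_nodes * 50 / 1000000000 ))         # 50 bytes per node
compressed_gb=$(( search_nodes * 12 / 1000000000 ))  # 12 bytes per node
requested_gb=$(( machines * disk_per_machine_gb ))

echo "uncompressed: ${raw_gb} GB"
echo "compressed:   ${compressed_gb} GB"
echo "requested:    ${requested_gb} GB"
```

Under these assumptions the search space comes to 550 GB uncompressed and 132 GB compressed, against 450 GB of disk requested across the 50 machines.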

Monday, June 12, 2006

pMatlab is available on the Opportunity Cluster

pMatlab has been installed on the Opportunity Cluster. To use pMatlab, copy the
following startup.m file into the directory from which you will launch your Matlab jobs:

cp /scratch_global/pMatlab/startup.m ~/.


To validate that you have access to the pMatlab tools, enter "help pMatlab" at the
Matlab command line.
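
If the directory you launch Matlab from already contains a startup.m, the plain cp above would overwrite it. A minimal sketch of a copy that keeps a backup first; install_pmatlab_startup is a hypothetical helper, and the default source path is the one given above:

```shell
# Sketch only: copy the pMatlab startup.m into a directory, keeping a backup
# of any startup.m already there. install_pmatlab_startup is a hypothetical name.
install_pmatlab_startup() {
    src="${1:-/scratch_global/pMatlab/startup.m}"
    dest="${2:-$HOME}"
    if [ -e "$dest/startup.m" ]; then
        # Preserve the existing file before it is replaced.
        cp "$dest/startup.m" "$dest/startup.m.bak" || return 1
    fi
    cp "$src" "$dest/startup.m"
}
```

Assuming the matlab command is on your PATH, running matlab -nodisplay -r "help pMatlab; exit" from that directory afterwards is a non-interactive way to perform the check described above.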

Leo

Thursday, June 08, 2006

The Provost office has renewed the Matlab license on the Opportunity Cluster

Please join me in extending a THANK YOU to the Provost Office
for sponsoring the renewal of the Matlab license on the Opportunity
Cluster. The new license has been installed and is valid through
June 2007.

Leo

Monday, May 08, 2006

How to Run NAMD on the Opportunity Cluster

NAMD is a parallel molecular dynamics application.

  • Copy the files NAMD_2.6b1_Linux-amd64-TCP.tar.gz (NAMD binary) and apoa1.tar.gz (sample NAMD simulation) from the /scratch_global/library/ folder and unzip/untar them in your home directory with the following commands:

    cd
    cp /scratch_global/library/apoa1.tar.gz .
    cp /scratch_global/library/NAMD_2.6b1_Linux-amd64-TCP.tar.gz .
    tar -xzf apoa1.tar.gz
    tar -xzf NAMD_2.6b1_Linux-amd64-TCP.tar.gz

  • Change into the NAMD directory

    cd NAMD_2.6b1_Linux-amd64-TCP

  • Use a text editor to create the file nodelist containing:

    group main
    host opportunity.neu.edu
    host compute-1-1
    host compute-1-2
    host compute-1-3

    The nodelist file tells NAMD what nodes to run on. When we run under the queueing system below we'll use a script to create this file. NAMD does all of its I/O on the first node, so by including the master node in the calculation we can access fileservers or disks that are only available to the master.
  • Start NAMD on all four machines with:

./charmrun ++remote-shell ssh ++nodelist nodelist +p4 ./namd2 ~/apoa1/apoa1.namd

If you have problems, or want to see what's going on in the launch process, add ++verbose to the charmrun command line.

  • When NAMD reaches the line that says "TIMING 20 ..." kill it with Control-C and jot down the wallclock s/step number.
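
The nodelist file described above can also be generated from a list of host names rather than typed by hand. A minimal sketch; write_nodelist is a hypothetical helper, not something shipped with NAMD:

```shell
# Sketch only: write a charmrun nodelist ("group main" followed by one
# "host <name>" line per argument). write_nodelist is a hypothetical name.
write_nodelist() {
    out="$1"
    shift
    {
        echo 'group main'
        for host in "$@"; do
            echo "host $host"
        done
    } > "$out"
}
```

Running write_nodelist nodelist opportunity.neu.edu compute-1-1 compute-1-2 compute-1-3 would produce the same four-host file as the hand-written example.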

Below is an example of an LSF script of a NAMD job. To run it do the following:


* Place the contents below into a file "batchjob" in your NAMD directory
* The job is currently set up for 10 processors (note the 10's throughout the script)

################################################################
# hello_cpu10.lsf

# LSF demo NAMD job script.
#
# Use:
# bsub < batchjob
#
#
################################################################
# Define the working directory (RUNDIR), the program to run (PROG),
# the number of CPUs (NPROC and -n), and the output (-o) and error (-e) files:

RUNDIR=$HOME/NAMD_2.6b1_Linux-amd64-TCP
PROG=namd2
NPROC=10
#BSUB -n 10
#BSUB -o hello_n10.out
#BSUB -e hello_n10.err

################################################################

# Nothing to edit below this line.

# Run from the NAMD directory so that ./charmrun and ./$PROG are found.
cd "$RUNDIR" || exit 1

# Build the charmrun nodelist from the hosts assigned to this job.
rm -f "$RUNDIR/nodelist"
echo 'group main' >> "$RUNDIR/nodelist"
for host in $LSB_HOSTS
do
    echo "host $host" >> "$RUNDIR/nodelist"
done

# Launch NAMD on the assigned hosts.
./charmrun ++remote-shell ssh ++nodelist "$RUNDIR/nodelist" +p$NPROC ./$PROG ~/apoa1/apoa1.namd


################################################################


Tuesday, April 11, 2006

MPI error when Semaphore table fills

What does the following error mean?
p4_error: semget failed for setnum: 0


This means that the maximum number of allowed semaphores on the master node has been created, and the program you are trying to run cannot allocate a new semaphore for inter-process communication. This can happen when somebody has been testing software that does not exit properly, leaving semaphores and shared memory segments allocated.

If the leftover semaphores are owned by you, it can be fixed by running the following two commands:

/opt/mpich/gnu/sbin/cleanipcs
cluster-fork /opt/mpich/gnu/sbin/cleanipcs

(In this case, using the Intel or GNU version doesn't matter. The scripts are identical.)
It is possible that other users may have filled up the semaphore table. In this case, either they or root will need to clean the tables.

To find out who else may be using semaphores, you can execute the following commands:

ipcs (on opportunity)
cluster-fork ipcs (on opportunity)
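
To pick out one user's semaphores from that listing, the output of ipcs -s can be filtered by owner. A minimal sketch, assuming the usual Linux column order (key, semid, owner, perms, nsems); sems_owned_by is a hypothetical helper:

```shell
# Sketch only: print the IDs of semaphore arrays owned by a given user,
# reading "ipcs -s" output on stdin. Assumes the Linux column order
# key, semid, owner, perms, nsems. sems_owned_by is a hypothetical name.
sems_owned_by() {
    # Header lines are skipped because their second field is not numeric.
    awk -v owner="$1" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

For example, ipcs -s | sems_owned_by "$(id -un)" would list the IDs of semaphore arrays you own on the current node.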

Lava Queue redesign



Working with Platform support, we have now officially disabled batch jobs from going to compute-1-1 and compute-1-2. These nodes have been slated for interactive testing.

Friday, March 17, 2006

New Lava Job Queue Limits


It has been brought to my attention that users frequently need to submit jobs that consume more than 2 GB of memory (2 GB is the total memory divided by the number of job slots/processors on a compute node).

When such a job (for example, one requiring 3.5 GB or more) is released to run, it and other jobs can become starved of memory if two jobs are running on the same node.

To help relieve this issue, we have set a couple of conditions that must be met before a new job can be released into a node’s free job slot. The new conditions are as follows:

CPU load (1-minute / 15-minute) <= 0.6 / 0.8
Memory available >= 2 GB (not including swap)
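
The two conditions can be expressed as a simple check. This is only an illustration of the arithmetic, not how the scheduler itself is configured. It assumes that 0.6 applies to the 1-minute load and 0.8 to the 15-minute load, and that 2 GB means 2 GiB (2097152 kB); slot_can_accept_job is a hypothetical name.

```shell
# Sketch only: succeed when the release conditions hold.
# Arguments: 1-minute load, 15-minute load, free memory in kB.
# Assumptions: 0.6 pairs with the 1-minute load, 0.8 with the 15-minute
# load, and the memory limit is 2 GiB = 2 * 1024 * 1024 kB.
slot_can_accept_job() {
    awk -v l1="$1" -v l15="$2" -v free_kb="$3" 'BEGIN {
        ok = (l1 <= 0.6 && l15 <= 0.8 && free_kb >= 2 * 1024 * 1024)
        exit (ok ? 0 : 1)
    }'
}
```

On a Linux node the three inputs could be read from the first and third fields of /proc/loadavg and from the MemFree line of /proc/meminfo, for example slot_can_accept_job 0.45 0.70 3000000 && echo "slot open".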


If you have any questions, please send them to l.hill@neu.edu

Thursday, February 16, 2006

New GNU MPI programming examples

Here is a link to some new information/examples regarding running GNU compiled MPI code on the Opportunity Cluster.

If you have any questions feel free to contact me.

http://opportunity.neu.edu/docs/mpi-gnu-opportunity.htm


Leo