Directions for using CUDA on the NMSU CS Bigdat cluster
written by Jonathan Cook, May 29, 2014, joncook@nmsu.edu

These are the basic, scripted directions. I am neither a CUDA expert nor even a regular CUDA user, so I have no insight beyond knowing that this scripted example works. If you are going to use CUDA regularly, you will need to become an expert yourself. I am sure that there are many capabilities that will help you!

This example just steps through building and executing one of the sample programs that come with the NVidia CUDA library. Also note that the NVidia user commands are installed in /usr/bin, so they are available (e.g., "nvidia-smi" will print stats of the local board), BUT the head node does not have a GPU board! These commands must therefore be used in a "qsub -I" interactive session on one of the GPU nodes.

1. Make a directory for the example:

>  mkdir matrixMultiply
>  cd matrixMultiply/

2. Copy the NVidia example into your directory:

>  cp /usr/local/cuda-6.0/samples/0_Simple/matrixMul/* .

3. Change line 134 (approx) of Makefile from a relative include path to absolute:

>  vi Makefile
< INCLUDES  := -I../../common/inc
---
> INCLUDES  := -I/usr/local/cuda-6.0/samples/common/inc

The Makefile near the bottom also includes actions to copy/install the binary into a relative path "../../bin" -- you can disable or change this if you desire.
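
If you would rather not edit the Makefile by hand, the include-path change in step 3 can probably be scripted with sed. This is only a sketch: it assumes the INCLUDES line in your copy contains exactly "-I../../common/inc", and the function name "fix_includes" is made up for this example. It keeps a backup copy named Makefile.orig so the change is easy to undo.

```shell
# Hypothetical helper (not part of the CUDA samples): replace the relative
# samples include path with the absolute one used on this cluster.
# Assumes the Makefile contains "-I../../common/inc".
fix_includes() {
    makefile="${1:-Makefile}"
    # Keep a backup first, then write the edited copy over the original.
    cp "$makefile" "$makefile.orig" &&
    sed 's|-I\.\./\.\./common/inc|-I/usr/local/cuda-6.0/samples/common/inc|' \
        "$makefile.orig" > "$makefile"
}
```

After pasting the function into your shell, run it in the example directory and check the result by eye:

>  fix_includes Makefile
>  grep '^INCLUDES' Makefile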

4. Build the executable:

>  make

This will use "nvcc" to compile your program. I THINK it will pick up whatever "gcc" is in your path; note that the system "gcc" or "cc" is an older version, but you can do a "module load gcc-4.7.2" to get a newer version.
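
To see which compilers are actually first in your path before and after a module load, something like the following may help. It is a small sketch, and the function name "show_compilers" is made up for this example; it only reports what it finds and changes nothing.

```shell
# Hypothetical helper: report where gcc and nvcc resolve in the current PATH
# and print the first line of each tool's version banner.
show_compilers() {
    for tool in gcc nvcc; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
            "$tool" --version | head -n 1
        else
            echo "$tool: not in PATH"
        fi
    done
}
```

After pasting the function into your shell:

>  module load gcc-4.7.2
>  show_compilers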

5. Create/grab a Torque submission script:

>  cp /nmsu/examples/matrixMultiply/sub-matmul.sh .

The command above just puts the script in your source directory; I normally create a "run" directory and run my submissions from there. You can do what you want.
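
If you want to write your own submission script, a Torque script for this sample might look roughly like the sketch below. This is NOT the contents of sub-matmul.sh, which may differ. The job name, the walltime, and especially the "gpus=1" resource request are assumptions about how this cluster is configured, and the function name "run_matmul" is made up for this example.

```shell
#!/bin/bash
# Hypothetical Torque script; the real sub-matmul.sh may differ.
# The resource request below is an assumption about the cluster setup.
#PBS -N matmul
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=00:05:00

# Run the sample from the given directory, failing clearly if it was not built.
run_matmul() {
    cd "$1" || return 1
    if [ ! -x ./matrixMul ]; then
        echo "matrixMul not found in $1; run make first" >&2
        return 1
    fi
    ./matrixMul
}

# Torque sets PBS_O_WORKDIR to the directory that qsub was run from.
# Outside of a Torque job the variable is unset and nothing is run.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    run_matmul "$PBS_O_WORKDIR"
fi
```

The "cd" matters because Torque normally starts a job in your home directory, not in the directory you submitted from.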

6. Run the app by submitting the script as a Torque job:

>  qsub sub-matmul.sh 

7. Check to see if it is running (this example is fast, about 10 seconds, so it may be done by the time you do this; do it fast!):

>  qstat

8. Look at the output. You will have two files, named "sub-matmul.sh.o###" and "sub-matmul.sh.e###", where "###" is the job number that Torque assigned to this job. The first is stdout and the second is stderr. The second should be empty, and the first has the normal output for this sample app, which just prints some info about the run and whether it passed.
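
The check in step 8 can be scripted if you run the example often. The sketch below finds the newest stdout file for a given script name, makes sure the matching stderr file is empty, and looks for a pass line. It assumes the sample prints a line containing "Result = PASS" on success (verify this against your own output), and the function name "check_job_output" is made up for this example.

```shell
# Hypothetical helper: report on the newest Torque output pair for a script.
# Assumes the sample prints a line containing "Result = PASS" on success.
check_job_output() {
    script="${1:-sub-matmul.sh}"
    # Newest stdout file; the job number is whatever follows ".o".
    out=$(ls -t "$script".o* 2>/dev/null | head -n 1)
    if [ -z "$out" ]; then
        echo "no output file yet for $script"
        return 2
    fi
    jobnum="${out##*.o}"
    err="$script.e$jobnum"
    # A non-empty stderr file is worth reading before anything else.
    if [ -s "$err" ]; then
        echo "stderr is not empty: $err"
        return 1
    fi
    if grep -q "Result = PASS" "$out"; then
        echo "job $jobnum passed"
    else
        echo "job $jobnum did not report PASS; see $out"
        return 1
    fi
}
```

After pasting the function into your shell, run it from the directory you submitted from:

>  check_job_output sub-matmul.sh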