") provides advanced cyberinfrastructure for computational research and education at NDSU and beyond. CCAST develops, manages, and operates high-performance (HPC), cloud, and interactive computing resources, and educates researchers on proper and efficient use of the resources and on other topics of interest to the computational science and engineering community. We use UNIX/Linux primarily. The basic level of services is FREE of charge to
NDSU faculty, staff, and students as well as certain external collaborators (upon approval of CCAST's Executive Director). Additional services are available for a fee.
1.1 Acknowledging CCAST
Users are required to include the following statement (or a close variant) in all research outputs (papers,
presentations, theses, etc.) that have used CCAST resources: "This work used resources of the Center for
Computationally Assisted Science and Technology (CCAST) at North Dakota State University."
The wording is subject to change (e.g., when we need to acknowledge specific funding sources that support certain CCAST resources). Please check the welcome message that appears when you log in to CCAST systems for the most accurate acknowledgement statement.
1.2 Reporting requirements
Users, usually through their Principal Investigators (PIs; i.e., sponsors of their CCAST accounts), are required to
report any research outputs and activities that have been enabled by the use of CCAST resources.
Reporting items often include publications, presentations, grant applications, patents, theses, etc.
1.3 CCAST usage policies
Users are required to read carefully and comply with the CCAST Usage Policies.
1.4 How can you get help?
Read this User Guide carefully and check the CCAST website and related Knowledge Base articles before
contacting us. If you still cannot find answers to your questions, send an e-mail to firstname.lastname@example.org.
In the e-mail, describe the issues, state your questions, and provide a copy of the error messages and
job submission script, the IDs of your failed jobs, and any other information that may help debug the issues.
Please do not contact individual CCAST staff members directly for technical support: doing so bypasses
our tracking system, which we use to make sure that no request is dropped.
1.5 About this document
This document will be updated often, since hardware specifications, system administration practices,
usage policies, etc. are subject to change.
2. Getting Started
2.1 Applying for an account
To access Thunder, an HPC cluster at CCAST, you need an active CCAST account.
Please apply for a CCAST account if you have not already done so. A link to the online application form is available on the CCAST website.
2.2 Connecting to the Thunder cluster
From a Windows computer: use PuTTY, a free SSH and telnet client. Download and install it, then double-click to open the application. In the "Host Name (or IP
address)" field, enter the hostname thunder.ccast.ndsu.edu. Select (or leave) 22 for "Port" and SSH for
"Connection type". Click "Open"; you will be asked to enter your username and password.
Between a Mac/Linux computer and Thunder: To transfer files from Thunder to your computer, run
scp username@hostname:source-file destination. Example: scp username@thunder.ccast.ndsu.edu:/gpfs1/home/username/myfile.txt /home/mycomputer/myfile.txt
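The same scp command works in the other direction, and ssh opens a login session from a Mac/Linux terminal. As a minimal sketch, assuming a bash shell on your own computer, these can be wrapped in small helper functions. The function names and the THUNDER_USER variable are hypothetical conveniences, not CCAST-provided commands; set THUNDER_USER to your CCAST username.

```shell
# Hypothetical helpers for a Mac/Linux terminal (not CCAST tools).
# Set THUNDER_USER to your CCAST username before using them.
THUNDER_USER="${THUNDER_USER:-username}"
THUNDER_HOST="thunder.ccast.ndsu.edu"

# Open an SSH login session on Thunder (SSH uses port 22 by default).
thunder_login() {
  ssh "${THUNDER_USER}@${THUNDER_HOST}"
}

# Copy a local file or directory TO Thunder: thunder_put LOCAL_PATH REMOTE_PATH
# (-r lets the same helper copy directories; it is harmless for single files)
thunder_put() {
  scp -r "$1" "${THUNDER_USER}@${THUNDER_HOST}:$2"
}

# Copy a file or directory FROM Thunder to your computer: thunder_get REMOTE_PATH LOCAL_PATH
thunder_get() {
  scp -r "${THUNDER_USER}@${THUNDER_HOST}:$1" "$2"
}
```

For example, thunder_put input.dat /gpfs1/scratch/username/ copies a local file to your SCRATCH directory, and thunder_get /gpfs1/home/username/myfile.txt ~/myfile.txt performs the transfer in the example above.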
2.4 Learning UNIX/Linux and HPC
We strongly recommend that users attend the CCAST Advanced Research Computing Training Program,
offered every Fall and Spring semester, as well as any user group meetings and other special training events.
Specialized training for individual researchers/research groups is also available. Contact CCAST for more information.
Many free training materials are also available on the Internet.
3. Research Computing Resources
Thunder has over 130 compute nodes (more than 4,000 CPU cores in total), each with 20, 40, 44, or 128 cores. There are also several big-memory nodes and nodes with general-purpose graphics processing unit (GPU) cards.
To check which nodes are currently free or partially free on Thunder, execute the command freenodes.
The information will help you make the right choice when you request computing resources for your jobs.
Many software programs are installed on Thunder. Most are available to all CCAST users; some
(e.g., ANSYS, Gaussian, and VASP) are available only to users who hold valid licenses or are otherwise
authorized. Software is usually organized as modules; to list the available modules, execute module avail.
You can also install software for yourself. Contact CCAST at firstname.lastname@example.org if you need help.
3.3 Storage space
Once logged in, you are in your HOME directory (/gpfs1/home/username). Data in HOME is backed up
periodically to tape, so it is a reliable storage area. However, do not
use your HOME directory for data or job input/output. Running jobs out of HOME is prohibited,
as it degrades interactive use and other important tasks.
Each research group usually has a PROJECTS directory; the full path is /gpfs1/projects/PI-username, where
PI-username is the username of the Principal Investigator (PI). This area has a larger storage space and is
backed up periodically to tape. All researchers working under the PI can store and share data in this space.
Backup practice: CCAST currently runs incremental backups of HOME and PROJECTS data on weekdays
(only new and changed files are backed up to tape) and full backups (everything) on every other weekend.
Each regular user has a SCRATCH directory (/gpfs1/scratch/username), intended for the working
directories of jobs. Please submit your jobs from this directory. Note that SCRATCH data is NOT backed up,
and the systems are currently set up to automatically DELETE files in SCRATCH that are 60 days old.
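To see which of your SCRATCH files are approaching the 60-day limit, the standard find command can list files by age. This is a minimal sketch, assuming a bash shell and assuming (not confirmed here) that the purge is based on each file's modification time; the helper name purge_candidates and its default threshold of 50 days are hypothetical choices, meant to leave time to move anything you still need to your PROJECTS directory.

```shell
# Hypothetical helper: list files under DIR old enough to be near the SCRATCH purge.
# Assumption: age is judged by modification time; CCAST's actual rule may differ.
# Usage: purge_candidates [DIR] [DAYS]
purge_candidates() {
  local dir="${1:-/gpfs1/scratch/$USER}"
  local days="${2:-50}"
  # -mtime +N is true when the file's age, rounded down to whole days, is greater than N.
  find "$dir" -type f -mtime +"$days"
}
```

For example, purge_candidates /gpfs1/scratch/$USER 50 prints one path per line for every file last modified more than 50 days ago.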
Contact CCAST if your research group really needs more storage space beyond the basic level.
3.4 Compute Condominium
Researchers can purchase condo nodes using equipment purchase funds from their grants or other available
funds. These PI-owned compute nodes are attached to CCAST’s Thunder cluster to take advantage of the
existing infrastructure. Contact CCAST if you have questions regarding the condominium model.
4. Running Jobs
Once you have logged in to CCAST's Thunder, you are on one of its login nodes. Login nodes have limited
resources and are intended only for basic tasks such as transferring data, managing files, compiling
software, editing scripts, and checking on or managing jobs. DO NOT run your jobs on the login nodes!
Jobs must be submitted, using a job script, to a queue system managed by a job scheduler. The
job scheduler currently used on the Thunder cluster is PBS Professional (PBS Pro). The scheduler handles
job submission requests and assigns jobs to specific compute nodes that are available at the time.
To be able to run your jobs and run them efficiently, you need to have some basic knowledge of the
application you are using. This includes whether the application is serial (i.e., runs on only one CPU core) or
parallel (i.e., can run on multiple CPU cores). If it is parallel, what is the underlying parallel programming
model: shared-memory (e.g., using OpenMP, Pthreads, etc.), distributed-memory (e.g., using MPI), or
hybrid? You need such information to determine how you would like to request resources for your jobs.
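As an illustration of how the programming model maps onto a resource request, a PBS Pro select statement can describe each case with the built-in chunk resources ncpus (cores), mpiprocs (MPI processes), and ompthreads (OpenMP threads, which PBS Pro uses to set OMP_NUM_THREADS). This is a sketch only: the core counts and memory values are hypothetical, and a real job script should contain just one select line.

```shell
# Choose ONE select line per job script; the numbers are illustrative only.

# Shared-memory (e.g., OpenMP): one chunk with 8 cores running 8 threads.
#PBS -l select=1:ncpus=8:ompthreads=8:mem=16gb

# Distributed-memory (MPI): 2 chunks, each with 20 cores running 20 MPI processes.
#PBS -l select=2:ncpus=20:mpiprocs=20:mem=40gb

# Hybrid MPI + OpenMP: 2 chunks, each with 4 MPI processes x 5 threads = 20 cores.
#PBS -l select=2:ncpus=20:mpiprocs=4:ompthreads=5:mem=40gb

# Optional: ask PBS Pro to place each chunk on a different node.
#PBS -l place=scatter
```

A chunk is not necessarily a whole node: without a place directive, PBS Pro may put several chunks on the same node. The mem values here assume 2 GB per core.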
4.1 Sample input files and job scripts
If you are new to running jobs on the Thunder cluster or if it has been a while since the last time you ran
an application, it is highly recommended that you first run some sample jobs we provide before running
your own jobs. On Thunder, users can copy sample input files and job scripts for various applications
from /gpfs1/projects/ccastest/training/examples. Examples for more applications will be added as they
become available, so please check this directory frequently for the latest versions of the job scripts.
A job submission script (also referred to as a "PBS job script" or "PBS script") to run a serial job is given below as an example:
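#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=1gb:ncpus=1
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-prjname
# Run in the directory from which the job was submitted
cd $PBS_O_WORKDIR
# Replace ./myprogram (a placeholder) with the command that runs your application
./myprogram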
For any job script, you need to replace prjname with your project group name. If you do not know your
prjname, execute the command id or groups on Thunder and look for the name x-ccast-prj-... Also, if
you are not sure what value to use for mem, set it to M*ncpus, where M is 1, 2, or 3 GB (e.g., mem=8gb for ncpus=4 and M = 2 GB).
Keep in mind that CCAST resources are shared among many users. Only request what you actually need.
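Both lookups described above can be scripted. The sketch below assumes a bash shell on Thunder; the function names ccast_project_groups and suggest_mem, and the default of M = 2 GB, are hypothetical choices for illustration, not CCAST tools.

```shell
# Hypothetical helper: print your project group name(s), i.e. the x-ccast-prj-...
# values to use in "#PBS -W group_list=...". Relies only on the standard id command.
ccast_project_groups() {
  id -Gn | tr ' ' '\n' | grep '^x-ccast-prj-'
}

# Hypothetical helper: suggest a mem= value from the rule of thumb mem = M * ncpus,
# where M is 1, 2, or 3 GB (default 2 here).
# Usage: suggest_mem NCPUS [M]     e.g. suggest_mem 4 2 prints 8gb
suggest_mem() {
  local ncpus="$1" m="${2:-2}"
  echo "$(( ncpus * m ))gb"
}
```

With ncpus=4 and M = 2, the resulting request line would be #PBS -l select=1:mem=8gb:ncpus=4.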
A PBS job script is simply a text file in your working directory. The easiest way to create the file is to
copy an appropriate sample PBS job script from /gpfs1/projects/ccastest/training/examples
and then modify it as needed using a text editor such as nano (for novice Linux users) or emacs
(for more experienced users). See also the PBS Pro Cheat Sheet.
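Because jobs should be submitted from SCRATCH, a convenient workflow is to copy a sample into a new directory there. A minimal sketch, assuming a bash shell on Thunder: copy_example is a hypothetical helper, and CCAST_EXAMPLES is a hypothetical environment variable that only overrides the examples path (useful for testing). Run ls /gpfs1/projects/ccastest/training/examples first to see which application directories actually exist.

```shell
# Hypothetical helper: copy a sample job directory into a working directory in SCRATCH.
# Usage: copy_example APP [DEST]   (APP = a directory name listed in the examples directory)
copy_example() {
  local app="$1"
  # CCAST_EXAMPLES is a hypothetical override of the examples location.
  local base="${CCAST_EXAMPLES:-/gpfs1/projects/ccastest/training/examples}"
  local dest="${2:-/gpfs1/scratch/$USER/$app}"
  if [ -z "$app" ] || [ ! -d "$base/$app" ]; then
    echo "No example named '$app' in $base; run: ls $base" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Copy the directory's contents (including hidden files) into DEST.
  cp -r "$base/$app/." "$dest/"
  echo "$dest"
}
```

After copy_example prints the new directory, cd into it, edit the job script (at least the group_list line), and submit it with qsub.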
4.2 Queue policies on Thunder
Thunder has several types of queues, e.g., default, condo01, condo02, etc. Users can find information about the queues by executing qstat -q.
4.3 Launching and monitoring jobs
After preparing a suitable job script (with the filename job.pbs, for instance; see Sec. 4.1), you can submit
the job by typing: qsub job.pbs. This will place your job in the queue. Depending on the available
resources, it may or may not start immediately. To check the status of your job(s), type qstat -u $USER.
If you want to kill a job, use the command qdel jobID, where jobID is the ID of the job you want
to kill. For more useful PBS Pro commands and options, see the PBS Pro Cheat Sheet.
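qsub prints the ID of the newly created job, so a script can capture it and reuse it with qstat and qdel. A minimal sketch, assuming a bash shell on a Thunder login node; submit_job is a hypothetical helper name, and job.pbs is only an example file name.

```shell
# Hypothetical helper: submit a PBS job script and print the job ID that qsub returns.
submit_job() {
  local script="${1:-job.pbs}"
  local jobid
  # On success, qsub writes the new job's ID to standard output.
  jobid=$(qsub "$script") || return 1
  echo "$jobid"
}

# Typical session on a Thunder login node:
#   JOBID=$(submit_job job.pbs)
#   qstat -u "$USER"     # status of all your jobs
#   qstat -f "$JOBID"    # full details of this one job
#   qdel "$JOBID"        # kill the job if needed
```

Keeping the ID in a variable also makes it easy to include the IDs of failed jobs when you report a problem to CCAST.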
4.4 How to get your work done faster?
If you use software packages developed by others, be mindful of the parameters used in your input files.
Small adjustments to these parameters can significantly improve computational efficiency. If you write and run
your own code, see if it can be optimized to make it run faster or parallelize it if it is not yet parallel.
When running parallel jobs, a question arises: How many cores/nodes should you request for the jobs?
Note: the requested resources in the sample PBS job scripts we provide are not optimized for your jobs!
Also note that, if you want to get your jobs done faster, simply adding a lot more cores/nodes is rarely the
answer! You should do some scaling tests to identify the optimal number of cores/nodes for your jobs.
When you have many similar parallel jobs, we recommend that you run the first few jobs with different
numbers of CPU cores. By looking at the computing time needed to finish the jobs vs. the number of cores, you will have a good idea of how many cores to choose for the remaining jobs.
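A minimal sketch of such an analysis, assuming you have recorded each test run as one "cores seconds" pair per line (the helper name scaling_report and the input layout are hypothetical). It reports the speedup relative to the smallest core count tested, S = t0/t, and the parallel efficiency E = (t0*c0)/(t*c); when efficiency drops well below 100%, the extra cores are mostly being wasted.

```shell
# Hypothetical helper: summarize a scaling test.
# Input (file or stdin): one "cores seconds" pair per line, in any order.
scaling_report() {
  sort -n -k1,1 "${1:--}" | awk '
    NF < 2 { next }                           # skip blank or malformed lines
    !seen {                                   # first (smallest) core count is the baseline
      c0 = $1; t0 = $2; seen = 1
      printf "%6s %10s %8s %11s\n", "cores", "time(s)", "speedup", "efficiency"
    }
    {
      # speedup = t0 / t ; efficiency = (t0 * c0) / (t * c), shown as a percentage
      printf "%6d %10.1f %8.2f %10.1f%%\n", $1, $2, t0 / $2, 100 * t0 * c0 / ($2 * $1)
    }'
}
```

For example, with runs of 1000 s on 1 core, 520 s on 2 cores, and 280 s on 4 cores, the report shows speedups of 1.00, 1.92, and 3.57 and efficiencies of 100.0%, 96.2%, and 89.3%.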
Contact CCAST for help with improving your job efficiency and speeding up your research process.
5. Utilization Monitoring
We use XDMoD for data collection and monitoring of HPC resource utilization. The tool allows CCAST
staff, PIs, and users to view data about their CCAST usage. It includes metrics like total CPU hours,
number of jobs submitted, average walltime per job, and much more. Information is updated daily for all
jobs completed at the time of update.