") provides advanced cyberinfrastructure for computational research and education at NDSU and beyond. CCAST develops, manages, brokers, and operates high-performance (HPC), cloud, and interactive computing resources, and educates researchers on proper and efficient use of the resources and on other topics of interest to the computational science and engineering community. We use UNIX/Linux primarily. The basic level of services is FREE of charge to
NDSU faculty, staff, and students as well as certain external collaborators (upon approval of CCAST's Executive Director). Additional services are available for a fee.
The wording is subject to change; e.g., when we need to acknowledge specific funding sources that support certain CCAST resources. Please check the welcome message that appears when you log in to CCAST systems for the most accurate acknowledgement statement.
1.2 Reporting requirements
Users, usually through their Principal Investigators (PIs; i.e., sponsors of their CCAST accounts), are required to
report any research outputs and activities that have been enabled by the use of CCAST resources.
Reporting items often include publications, presentations, grant applications, patents, theses, etc.
1.3 CCAST usage policies
Users are required to carefully read and comply with the CCAST Usage Policies.
1.4 How can you get help?
Read this User Guide carefully and check the CCAST website and related Knowledge Base articles before
contacting us. If you still cannot find answers to your questions, send an e-mail to email@example.com.
In the e-mail, describe the issues, state your questions, and provide a copy of the error messages and
job submission script, the IDs of your failed jobs, and any other information that may help debug the
issues. Please do not contact individual CCAST staff members directly for technical support, as this
bypasses the tracking system we use to avoid dropped requests.
1.5 About this document
This document will be updated often since hardware specifications, system administration practices,
usage policies, etc. are subject to change.
2. Getting Started
2.1 Applying for an account
To access "Thunder" and "Thunder Prime", the two HPC clusters at CCAST, you need an active account with us.
Please apply for a CCAST account if you have not already done so. A link to the online application form is available on the CCAST website.
2.2 Connecting to CCAST's HPC clusters
From a Windows computer: PuTTY, a free SSH and telnet client, should be used. Download and install it, then double-click to open the application. In the "Host Name (or IP address)" field, enter the hostname: thunder.ccast.ndsu.edu (for Thunder) or prime.ccast.ndsu.edu (for Thunder Prime). Select (or leave) 22 for "Port" and SSH for "Connection type". Click "Open"; you will then be asked to enter your username and password.
Between a Mac/Linux computer and Thunder: To transfer files from Thunder/Thunder Prime to your computer:
scp [[username@hostname]:[source-file]] [[destination]]. Example (for Thunder Prime): scp firstname.lastname@example.org:/mmfs1/home/username/myfile.txt /home/mycomputer/myfile.txt
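As a minimal sketch of the scp usage above, assuming a Mac/Linux terminal with OpenSSH installed: the script below only builds and prints the commands (a dry run) so that you can check them before running them yourself. The username, file names, and local path are hypothetical placeholders; the hostname is the Thunder Prime hostname given in Sec. 2.2.

```shell
#!/bin/bash
# Placeholders (assumptions): replace with your own username and file names.
CCAST_USER="username"
CCAST_HOST="prime.ccast.ndsu.edu"   # Thunder Prime; use thunder.ccast.ndsu.edu for Thunder
REMOTE_FILE="/mmfs1/home/${CCAST_USER}/myfile.txt"
LOCAL_FILE="./myfile.txt"

# Log in from a Mac/Linux terminal.
login_cmd="ssh ${CCAST_USER}@${CCAST_HOST}"
# Copy a file from the cluster to your computer.
download_cmd="scp ${CCAST_USER}@${CCAST_HOST}:${REMOTE_FILE} ${LOCAL_FILE}"
# Copy a file from your computer to the cluster (source and destination swapped).
upload_cmd="scp ${LOCAL_FILE} ${CCAST_USER}@${CCAST_HOST}:${REMOTE_FILE}"

# Dry run: print the commands instead of executing them.
printf '%s\n' "$login_cmd" "$download_cmd" "$upload_cmd"
```

Once the printed commands look right, copy and run them one at a time; scp will prompt for your CCAST password.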
2.4 Learning UNIX/Linux and HPC
Users are strongly encouraged to attend the CCAST Advanced Research Computing Training Program,
offered every Fall and Spring semester, as well as user group meetings and other special training events.
Specialized training for individual researchers/research groups is also available. Contact CCAST for more information.
Many free training materials are also available on the Internet. We recommend the following:
3. Research Computing Resources
Thunder has over 130 compute nodes (with over 4,000 Intel CPU cores and 22 Nvidia GPU cards in total), each with 20, 40, 44, or 128 cores. Thunder Prime currently has 39 compute nodes (with over 3,600 AMD EPYC 7662 CPU cores and 10 Nvidia A100 GPU cards in total), each with 64 or 128 cores. There are several big-memory nodes on each cluster. To check which nodes are currently free or partially free on Thunder or Thunder Prime, execute the command freenodes.
The information will help you make the right choice when you request computing resources for your jobs.
There are many software programs installed on Thunder and Thunder Prime. Most are available to all CCAST
users; some (e.g., ANSYS, Gaussian, and VASP) are available only to those who have valid licenses and
other authorized users. Software is usually organized as modules; to check available modules, execute
module avail. You can also install software for yourself. Contact CCAST at email@example.com if you
need help.
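A minimal sketch of working with modules, assuming the standard module command (avail, load, list) is set up in your login shell on the clusters. The function names are illustrative, and any module name you pass (e.g., "gcc") is hypothetical; check the output of module avail for the names actually installed.

```shell
#!/bin/bash
# Sketch only: "module" is provided by the cluster's environment-modules setup,
# so these helpers first check that it is defined in the current shell.
have_module_cmd() {
    type module >/dev/null 2>&1
}

# List all available modules.
show_modules() {
    if ! have_module_cmd; then
        echo "module command not found (are you on a CCAST login node?)" >&2
        return 1
    fi
    module avail 2>&1
}

# Load one module by name, then show what is currently loaded.
# $1: module name (hypothetical example: gcc)
use_module() {
    if ! have_module_cmd; then
        echo "module command not found (are you on a CCAST login node?)" >&2
        return 1
    fi
    module load "$1" && module list 2>&1
}
```

On a login node, show_modules prints the list and use_module followed by a name loads that module for the current session only.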
3.3 Storage space
Once logged in, you are in your HOME directory: /mmfs1/thunder/home/username (Thunder) or
/mmfs1/home/username (Thunder Prime). Data in HOME is backed up periodically to tape, so it is a
reliable storage area. Do not use your HOME directory for data or job input/output. Running jobs out
of HOME is prohibited as it affects interactive use and other important tasks.
Each research group usually has a PROJECTS directory; the full path is /mmfs1/thunder/projects/PI-username (Thunder) or /mmfs1/projects/PI-username (Thunder Prime) where
PI-username is the username of the Principal Investigator (PI). This area has a larger storage space and is
backed up periodically to tape. All researchers working under the PI can store and share data in this space.
Backup practice: CCAST runs backups of HOME and PROJECTS data regularly. Contact CCAST for more details.
Each regular user has a SCRATCH directory: /mmfs1/thunder/scratch/username (Thunder) or /mmfs1/scratch/username (Thunder Prime). It is designed as a place for working
directories for jobs. Please submit your jobs from this directory. Note that SCRATCH data is NOT backed up,
and the systems are currently set up to automatically DELETE files in SCRATCH that are 60 days old.
Contact CCAST if your research group really needs more storage space beyond the basic level.
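Because files in SCRATCH are deleted automatically once they are 60 days old, it can help to check which files are getting close to that limit. The sketch below uses the standard find command; it assumes that file age is measured by last modification time, which may not be the exact criterion used by the cleanup on CCAST systems. The function name and the 50-day threshold are illustrative choices.

```shell
#!/bin/bash
# list_old_files DIR DAYS
# Print regular files under DIR that were last modified more than DAYS days ago.
# Assumption: age is judged by modification time; the actual cleanup criterion
# on CCAST systems may differ.
list_old_files() {
    local dir="$1" days="$2"
    find "$dir" -type f -mtime +"$days" -print
}

# Example (hypothetical path): files not modified for more than 50 days,
# i.e., within about 10 days of the 60-day limit.
# list_old_files /mmfs1/scratch/username 50
```

Anything listed that you still need should be copied to your PROJECTS directory, which is backed up.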
3.4 Compute Condominium
Researchers can purchase condo nodes using equipment purchase funds from their grants or other available
funds. These PI-owned compute nodes are attached to CCAST’s Thunder Prime cluster to take advantage of the
existing infrastructure. Contact CCAST if you have questions regarding the condominium model.
4. Running Jobs
Once you have logged in to a CCAST HPC cluster, you are on one of its login nodes. Login nodes have limited
resources and are intended only for basic tasks such as transferring data, managing files, compiling
software, editing scripts, and checking on or managing jobs. DO NOT run your jobs on the login nodes!
Jobs must be submitted to a queue system, which is monitored by a job scheduler, using a job script. The
job scheduler currently used on the Thunder and Thunder Prime clusters is OpenPBS. The scheduler handles
job submission requests and assigns jobs to specific compute nodes available at the time.
To be able to run your jobs and run them efficiently, you need to have some basic knowledge of the
application you are using. This includes whether the application is serial (i.e., runs on only one CPU core) or
parallel (i.e., can run on multiple CPU cores). If it is parallel, what is the underlying parallel programming
model: shared-memory (e.g., using OpenMP, Pthreads, etc.), distributed-memory (e.g., using MPI), or
hybrid? You need such information to determine how you would like to request resources for your jobs.
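To illustrate how the programming model affects the resource request, here is a hedged sketch that prints a PBS select line for each case. It relies on the standard PBS chunk resources ncpus, mpiprocs, and ompthreads; the function name is hypothetical, and the recommended settings on CCAST systems may differ, so compare against the sample job scripts for your application.

```shell
#!/bin/bash
# select_line MODEL [NCORES] [NODES]
# Print a "#PBS -l select=..." line for a serial, shared-memory (OpenMP), or
# distributed-memory (MPI) job. NCORES is the number of cores per node.
# Memory is omitted here; add mem=... to the chunk as needed.
select_line() {
    local model="$1" ncores="${2:-1}" nodes="${3:-1}"
    case "$model" in
        serial) echo "#PBS -l select=1:ncpus=1" ;;
        openmp) echo "#PBS -l select=1:ncpus=${ncores}:ompthreads=${ncores}" ;;
        mpi)    echo "#PBS -l select=${nodes}:ncpus=${ncores}:mpiprocs=${ncores}" ;;
        *)      echo "unknown model: ${model}" >&2; return 1 ;;
    esac
}

select_line serial        # one core on one node
select_line openmp 8      # 8 threads sharing memory on one node
select_line mpi 16 2      # 16 MPI ranks on each of 2 nodes
```

A shared-memory job cannot span nodes, so only the MPI case takes a node count. A hybrid job would set both mpiprocs and ompthreads in the same chunk.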
4.1 Sample input files and job scripts
If you are new to running jobs on Thunder and/or Thunder Prime or if it has been a while since the last time you ran
an application, it is highly recommended that you first run some sample jobs we provide before running
your own jobs. Users can copy sample input files and job scripts for various applications
from /mmfs1/thunder/projects/ccastest/examples (Thunder) or /mmfs1/projects/ccastest/examples (Thunder Prime). More job examples for more applications will be added
as they become available. Please check this directory frequently for the latest version of the job scripts.
A job submission script (also referred to as a "PBS job script" or "PBS script") to run a serial job is given below as an example:
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=1gb:ncpus=1
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-prjname
For any job script, you need to replace prjname with your project group name. If you do not know your
prjname, execute the command id or groups on Thunder or Thunder Prime and look for the group name of the
form x-ccast-prj-... in the output. Also, if you are not sure how to select a value for mem, set it to
M*ncpus, where M = 1, 2, or 3 GB (e.g., mem=8gb for ncpus=4 and M = 2 GB).
Keep in mind that CCAST resources are shared among many users. Only request what you actually need.
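The sketch below puts the pieces above together: it applies the mem = M*ncpus rule of thumb and writes a complete job script to a file. The directives are those of the sample serial script; the shebang line, the cd to $PBS_O_WORKDIR (the directory from which the job was submitted, set by PBS), and the command ./my_program are assumptions added for illustration, and the function names are hypothetical.

```shell
#!/bin/bash
# mem_request NCPUS [M_GB]: apply the rule of thumb mem = M * ncpus (in GB).
mem_request() {
    local ncpus="$1" m_gb="${2:-1}"
    echo "$(( ncpus * m_gb ))gb"
}

# write_job_script FILE PRJNAME NCPUS M_GB
# Write a single-node PBS job script to FILE. "./my_program" is a hypothetical
# placeholder for your application's command line.
write_job_script() {
    local file="$1" prj="$2" ncpus="$3" m_gb="$4"
    cat > "$file" <<EOF
#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=$(mem_request "$ncpus" "$m_gb"):ncpus=${ncpus}
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-${prj}

# PBS starts the job in your home directory; move to the submission directory.
cd "\$PBS_O_WORKDIR"
./my_program > output.log
EOF
}

# Serial job: 1 core, M = 1 GB. Replace prjname with your project group name.
write_job_script job.pbs prjname 1 1
cat job.pbs
```

With ncpus=1 and M = 1 GB this reproduces the directives of the sample serial script; for example, mem_request 4 2 gives 8gb for a 4-core job with M = 2 GB.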
A PBS job script is simply a text file in your working directory. The easiest way to create the file is to
copy an appropriate sample PBS job script from /mmfs1/thunder/projects/ccastest/examples (Thunder) or
/mmfs1/projects/ccastest/examples (Thunder Prime) and then modify it as needed using a text editor such
as nano (for novice Linux users) or emacs (for more experienced users). See also the PBS Cheat Sheet.
4.2 Queue policies on Thunder and Thunder Prime
Users can find information about the queues on Thunder or Thunder Prime, including the condo queues (condo01, condo02, etc.), by executing qstat -q.
4.3 Launching and monitoring jobs
After preparing a suitable job script (with the filename job.pbs, for instance; see Sec. 4.1), you can
submit the job by typing: qsub job.pbs. This will assign your job to the queue. Depending on the
available resources, it may or may not start immediately. To check the status of your job(s), type
qstat -u $USER. If you want to kill the job, use the command qdel <job_ID>, where <job_ID> is the ID
of the job you want to kill. For more useful PBS commands and options, see the PBS Cheat Sheet.
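A short sketch of the submit-and-check workflow described above, meant to be run on a login node. It uses only qsub, qstat -u, and qdel as given in the text; the function name is hypothetical, and the guard simply reports an error on machines where the PBS commands are not installed.

```shell
#!/bin/bash
# submit_and_watch SCRIPT: submit a PBS job script and show your queue status.
submit_and_watch() {
    local script="$1" job_id
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found: run this on a CCAST login node" >&2
        return 1
    fi
    job_id=$(qsub "$script") || return 1   # qsub prints the ID of the new job
    echo "Submitted job: ${job_id}"
    qstat -u "$USER"                       # status of all your jobs
    echo "To kill this job: qdel ${job_id}"
}

# Usage (on a login node, from your working directory):
# submit_and_watch job.pbs
```

Capturing the job ID printed by qsub saves you from looking it up later when you need to kill the job or report a failure.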
4.4 How to get your work done faster?
If you use software packages developed by others, be mindful of the parameters used in your input files.
A small tuning of the parameters can significantly improve computational efficiency. If you write and run
your own code, see if it can be optimized to make it run faster or parallelize it if it is not yet parallel.
When running parallel jobs, a question arises: How many cores/nodes should you request for the jobs?
Note: the requested resources in the sample PBS job scripts we provide are not optimized for your jobs!
Also note that, if you want to get your jobs done faster, simply adding a lot more cores/nodes is rarely the
answer! You should do some scaling tests to identify the optimal number of cores/nodes for your jobs.
When you have many similar parallel jobs, we recommend that you run the first few jobs with different
numbers of CPU cores. By looking at the computing time needed to finish the jobs vs. the number of cores, you'll have a pretty good idea of how many cores you should choose for the remaining jobs.
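One way to read the results of such a scaling test is to compute the speedup and parallel efficiency for each run. The sketch below does this with awk; the function name and the timings are made up for illustration and are not measurements from CCAST systems.

```shell
#!/bin/bash
# scaling_table: read "cores seconds" pairs from stdin (first line = baseline)
# and print: cores, time, speedup, parallel efficiency.
#   speedup    = baseline_time / time
#   efficiency = speedup * baseline_cores / cores
scaling_table() {
    awk 'NR == 1 { c0 = $1; t0 = $2 }
         NF == 2 { s = t0 / $2
                   printf "%d %s %.2f %.2f\n", $1, $2, s, s * c0 / $1 }'
}

# Hypothetical timings (seconds) for the same job on 1, 2, 4, and 8 cores.
printf '%s\n' "1 1000" "2 520" "4 280" "8 250" | scaling_table
```

In this made-up data set the efficiency falls from 0.89 on 4 cores to 0.50 on 8 cores, so 4 cores would be the more reasonable request for the remaining jobs.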
Contact CCAST for help with improving your job efficiency and speeding up your research process.
5. Utilization Monitoring
We use XDMoD for data collection and monitoring of HPC resource utilization. The tool allows CCAST
staff, PIs, and users to view data about their CCAST usage. It includes metrics like total CPU hours,
number of jobs submitted, average walltime per job, and much more. Information is updated daily for all
jobs completed at the time of update.