CCAST User Guide

This User Guide provides essential information about advanced research computing resources at CCAST/NDSU and how to use them. A must-read document for all CCAST users.

  1. Introduction, Context, and Qualifications
  2. Getting Started
  3. Research Computing Resources
  4. Running Jobs
  5. Utilization Monitoring

1. Introduction, Context, and Qualifications

The Center for Computationally Assisted Science and Technology (CCAST; pronounced "c-cast") provides advanced cyberinfrastructure for computational research and education at NDSU and beyond. CCAST develops, manages, and operates high-performance (HPC), cloud, and interactive computing resources, and educates researchers on proper and efficient use of the resources and on other topics of interest to the computational science and engineering community. We use UNIX/Linux primarily. The basic level of services is FREE of charge to NDSU faculty, staff, and students as well as certain external collaborators (upon approval of CCAST's Executive Director). Additional services are available for a fee. 

1.1 Acknowledging CCAST

Users are required to include the following statement (or a close variant) in all research outputs (papers, presentations, theses, etc.) that have used CCAST resources: "This work used resources of the Center for Computationally Assisted Science and Technology (CCAST) at North Dakota State University." The wording is subject to change; e.g., when we need to acknowledge specific funding sources that support certain CCAST resources. Please check the welcome message that appears when you log in to CCAST systems for the most accurate acknowledgement statement.   

1.2 Reporting requirements

Users, usually through their Principal Investigators (PIs; i.e., sponsors of their CCAST accounts), are required to report any research outputs and activities that have been enabled by the use of CCAST resources. Reporting items often include publications, presentations, grant applications, patents, theses, etc.

1.3 CCAST usage policies

Users are required to carefully read and comply with CCAST Usage Policies.

1.4 How can you get help?

Read this User Guide carefully and check the CCAST website and related Knowledge Base articles before contacting us. If you still cannot find answers to your questions, send an e-mail to ndsu.ccast.support@ndsu.edu. In the e-mail, describe the issue, state your questions clearly, and include the error messages, your job submission script, the IDs of your failed jobs, and any other information that may help with debugging. Please do not contact individual CCAST staff members directly for technical support: doing so bypasses our tracking system, which exists to make sure no request is dropped.

1.5 About this document

This document is updated often because hardware specifications, system administration practices, and usage policies are subject to change.

2. Getting Started

2.1 Applying for an account

To access Thunder, an HPC cluster at CCAST, you need an active CCAST account. Please apply for one if you have not already done so. A link to the online application form is available on the CCAST website.

2.2 Connecting to the Thunder cluster

From a Windows computer: Use PuTTY, a free SSH and telnet client. Download and install it, then open the application. In the "Host Name (or IP address)" field, enter the hostname: thunder.ccast.ndsu.edu. Select (or leave) 22 for "Port" and SSH for "Connection type". Click "Open"; you will then be asked to enter your username and password.

From a Mac/Linux computer: Open a terminal and execute the following command to access Thunder: ssh thunder.ccast.ndsu.edu -l username (replace username with your CCAST username). You will be prompted to enter your password.

For more detailed instructions, see Logging into CCAST using SSH.

2.3 Transferring files

Between a Windows computer and Thunder: Use the WinSCP client. Download (for free) and install it, then open the application. In the "WinSCP Login" window, enter the hostname thunder.ccast.ndsu.edu as well as your username and password, then click on "Login". Once logged in, you will see a screen with two panels: the left shows files on your computer and the right shows your files on Thunder (usually your HOME directory, but you can double-click on the address bar and change the location). You can then drag and drop files between your computer and Thunder.

Between a Mac/Linux computer and Thunder: Use scp from a terminal on your computer.
To transfer files from Thunder to your computer: scp username@hostname:source-file destination. Example: scp username@thunder.ccast.ndsu.edu:/gpfs1/home/username/myfile.txt /home/mycomputer/myfile.txt
To transfer files from your computer to Thunder: scp source-file username@hostname:destination. Example: scp myfile.txt username@thunder.ccast.ndsu.edu:/gpfs1/home/username
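
To copy a whole directory instead of a single file, scp accepts the -r (recursive) option. The wrappers below are a minimal sketch of the two transfer directions described above; the names to_thunder, from_thunder, and THUNDER_HOST are illustrative only and are not part of any CCAST tooling.

```shell
# Hypothetical helpers (names are illustrative, not CCAST-provided).
THUNDER_HOST="thunder.ccast.ndsu.edu"

# Copy a local file or directory to a path on Thunder.
# Usage: to_thunder USERNAME LOCAL_PATH REMOTE_PATH
to_thunder() {
    local user="$1" src="$2" dest="$3"
    scp -r "$src" "${user}@${THUNDER_HOST}:${dest}"
}

# Copy a file or directory from Thunder to the local machine.
# Usage: from_thunder USERNAME REMOTE_PATH LOCAL_PATH
from_thunder() {
    local user="$1" src="$2" dest="$3"
    scp -r "${user}@${THUNDER_HOST}:${src}" "$dest"
}
```

For example, to_thunder username results /gpfs1/scratch/username would upload a local directory named results, assuming that directory exists on your computer.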

2.4 Learning UNIX/Linux and HPC

Users are strongly encouraged to attend the CCAST Advanced Research Computing Training Program, offered every Fall and Spring semester, as well as user group meetings and other special training events when they are held. Specialized training for individual researchers and research groups is also available. Contact CCAST for more information.

There are also lots of free training materials out there on the Internet. We recommend the following: 

See also the CCAST Reference Card for a list of the most useful Linux commands and tricks. Tutorials for certain applications on Thunder can be found in our Knowledge Base articles.

3. Research Computing Resources

3.1 Hardware

CCAST’s Thunder has over 130 compute nodes (>4,000 CPU cores), each with 20, 40, 44, or 128 cores. There are several big-memory nodes and nodes with general-purpose graphics processing unit (GPU) cards. To check which nodes are currently free or partially free on Thunder, execute the command freenodes. The information will help you make the right choice when you request computing resources for your jobs.

3.2 Software

Many software packages are installed on Thunder. Most are available to all CCAST users; some (e.g., ANSYS, Gaussian, and VASP) are available only to users who hold valid licenses and to other authorized users. Software is usually organized as modules; to check available modules, execute module avail. You can also install software for yourself. Contact CCAST at ndsu.ccast.support@ndsu.edu if you need help.
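
The output of module avail can be long, so it often helps to filter it by keyword. The function below is a minimal sketch; find_module is a hypothetical helper name, and the redirection assumes that the module system prints its listing to standard error, which is common but not confirmed here for Thunder.

```shell
# Search the list of available modules for a keyword (case-insensitive).
# Assumption: "module avail" writes its listing to stderr, so stderr is
# merged into stdout before filtering.
# Usage: find_module KEYWORD
find_module() {
    module avail 2>&1 | grep -i -- "$1"
}
```

After finding a module name this way, module load followed by that name makes the software available in your current session or job script.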

3.3 Storage space

Once logged in, you are in your HOME directory (/gpfs1/home/username). Data in HOME is backed up periodically to tape, so it is a reliable storage area. However, do not use your HOME directory for data or job input/output. Running jobs out of HOME is prohibited because it degrades interactive use and other important tasks.

Each research group usually has a PROJECTS directory; the full path is /gpfs1/projects/PI-username, where PI-username is the username of the Principal Investigator (PI). This area has a larger storage space and is backed up periodically to tape. All researchers working under the PI can store and share data in this space. 

Backup practice: CCAST currently runs incremental backups of HOME and PROJECTS data on weekdays (only new and changed files are backed up to tape) and full backups (everything) on every other weekend. 

Each regular user has a SCRATCH directory (/gpfs1/scratch/username). It is designed as a place for the working directories of jobs. Please submit your jobs from this directory. Note that SCRATCH data is NOT backed up, and the systems are currently set up to automatically DELETE files in SCRATCH that are 60 days old.
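
Because files in SCRATCH are deleted automatically, it is worth checking now and then which files are getting old. The function below is a minimal sketch using find; list_old_files is a hypothetical helper name, and it assumes that file age is judged by modification time, which may not match the timestamp the purge on Thunder actually uses.

```shell
# List regular files under DIRECTORY that were last modified more than
# DAYS full days ago.
# Assumption: age is measured by modification time (find -mtime).
# Usage: list_old_files DIRECTORY DAYS
list_old_files() {
    local dir="$1" days="$2"
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, list_old_files /gpfs1/scratch/username 45 would show files that are approaching the 60-day limit, so that anything worth keeping can be copied to PROJECTS first.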

Contact CCAST if your research group really needs more storage space beyond the basic level.

3.4 Compute Condominium

Researchers can purchase condo nodes using equipment purchase funds from their grants or other available funds. These PI-owned compute nodes are attached to CCAST’s Thunder cluster to take advantage of the existing infrastructure. Contact CCAST if you have questions regarding the condominium model.

4. Running Jobs

Once you have logged in to CCAST's Thunder, you are on one of its login nodes. Login nodes have limited resources and are intended only for basic tasks such as transferring data, managing files, compiling software, editing scripts, and checking on or managing jobs. DO NOT run your jobs on the login nodes!

Jobs must be submitted to a queue system, which is monitored by a job scheduler, using a job script. The job scheduler currently used on the Thunder cluster is PBS Professional (PBS Pro). The scheduler handles job submission requests and assigns jobs to specific compute nodes available at the time. 

To be able to run your jobs and run them efficiently, you need to have some basic knowledge of the application you are using. This includes whether the application is serial (i.e., runs on only one CPU core) or parallel (i.e., can run on multiple CPU cores). If it is parallel, what is the underlying parallel programming model: shared-memory (e.g., using OpenMP, Pthreads, etc.), distributed-memory (e.g., using MPI), or hybrid? You need such information to determine how you would like to request resources for your jobs.
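
The programming model determines the shape of the resource request. The function below is a minimal sketch of that mapping using the PBS Pro chunk resources ncpus, mpiprocs, and ompthreads; pbs_select is a hypothetical helper name, the node and core counts are placeholders, and the memory request is left out and must be added separately.

```shell
# Print a PBS Pro "select" request for a given programming model.
# NODES = number of chunks (nodes), CORES = cores requested per chunk.
# Usage: pbs_select MODEL [NODES] [CORES] [RANKS_PER_NODE]
pbs_select() {
    local model="$1" nodes="${2:-1}" cores="${3:-1}" ranks="${4:-1}"
    case "$model" in
        serial)
            # One process on one core.
            echo "select=1:ncpus=1" ;;
        shared)
            # OpenMP/Pthreads: all threads must share a single node.
            echo "select=1:ncpus=${cores}:ompthreads=${cores}" ;;
        distributed)
            # MPI: one rank per requested core, possibly across nodes.
            echo "select=${nodes}:ncpus=${cores}:mpiprocs=${cores}" ;;
        hybrid)
            # MPI + threads: RANKS_PER_NODE ranks per node, each with
            # CORES / RANKS_PER_NODE threads (integer division).
            echo "select=${nodes}:ncpus=${cores}:mpiprocs=${ranks}:ompthreads=$((cores / ranks))" ;;
        *)
            echo "unknown model: $model" >&2
            return 1 ;;
    esac
}
```

The printed string is meant to follow "#PBS -l" in a job script. A shared-memory program cannot use cores on more than one node, which is why that case always requests a single chunk.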

4.1 Sample input files and job scripts

If you are new to running jobs on the Thunder cluster or if it has been a while since the last time you ran an application, it is highly recommended that you first run some sample jobs we provide before running your own jobs. On Thunder, users can copy sample input files and job scripts for various applications from /gpfs1/projects/ccastest/training/examples. More job examples for more applications will be added as they become available. Please check this directory frequently for the latest version of the job scripts. 
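
Copying a sample into SCRATCH before running it can be sketched as follows. This assumes that each example lives in its own subdirectory of the examples directory, which is not confirmed here; copy_example, EXAMPLES_DIR, and SCRATCH_DIR are hypothetical names, with defaults taken from the paths in this guide.

```shell
# Default locations follow the paths named in this guide; both variables
# can be overridden before calling the function.
EXAMPLES_DIR="${EXAMPLES_DIR:-/gpfs1/projects/ccastest/training/examples}"
SCRATCH_DIR="${SCRATCH_DIR:-/gpfs1/scratch/$(id -un)}"

# Copy one example subdirectory into your scratch space.
# Assumption: NAME is a subdirectory shown by "ls $EXAMPLES_DIR".
# Usage: copy_example NAME
copy_example() {
    local name="$1"
    if [ ! -d "$EXAMPLES_DIR/$name" ]; then
        echo "no such example: $name" >&2
        return 1
    fi
    cp -r "$EXAMPLES_DIR/$name" "$SCRATCH_DIR/"
}
```

List the examples directory first to see which names actually exist, then change into the copied directory under SCRATCH and submit the job from there.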

A job submission script (also referred to as a "PBS job script" or "PBS script") to run a serial job is given below as an example:

#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=1gb:ncpus=1
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-prjname
cd $PBS_O_WORKDIR
./my-serial-program

For any job script, you need to replace prjname with your project group name. If you do not know your prjname, on Thunder, execute the command id or groups and look for the name x-ccast-prj-... Also, if you are not sure how to select a value for mem, set it to the value of M*ncpus, where M = 1, 2, or 3gb. Keep in mind that CCAST resources are shared among many users. Only request what you actually need.
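
The two lookups described above, finding your project group and sizing mem, can be sketched as small shell functions. Both function names are hypothetical; the group filter reads a space-separated list such as the output of id -Gn, and the memory helper simply applies the M*ncpus rule of thumb.

```shell
# Keep only CCAST project groups from a space-separated group list read
# on standard input (for example: id -Gn | filter_project_groups).
filter_project_groups() {
    tr ' ' '\n' | grep '^x-ccast-prj-'
}

# Print a mem= request following the M*ncpus rule of thumb.
# Usage: mem_request NCPUS [GB_PER_CORE]   (GB_PER_CORE defaults to 1)
mem_request() {
    local ncpus="$1" gb_per_core="${2:-1}"
    echo "mem=$((ncpus * gb_per_core))gb"
}
```

The rule of thumb is only a starting point; if a job is killed for exceeding its memory, or finishes using far less than requested, adjust the value accordingly.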

A PBS job script is simply a text file in your working directory. The easiest way to create the file is to copy an appropriate sample PBS job script from /gpfs1/projects/ccastest/training/examples on Thunder and then modify it as needed using some text editor such as nano (for novice Linux users), emacs, or vi (for more experienced users). See also the PBS Pro Cheat Sheet.

4.2 Queue policies on Thunder

Different types of queues are given below. Users can also find info about the queues by executing qstat -q.

Route Queue             Execution Queue   Walltime (hours)   Authorized Group
default                 def-short         24                 all users
                        def-medium        72
                        def-long          168
                        def-devel         8
                        preemptible       -
bigmem                  bm-short          24
                        bm-long           168
condo01, condo02, etc.                    -                  condo owners

4.3 Launching and monitoring jobs

After preparing a suitable job script (named job.pbs, for instance; see Sec. 4.1), you can submit the job by typing: qsub job.pbs. This places your job in the queue. Depending on the available resources, it may or may not start immediately. To check the status of your job(s), type qstat -u $USER. If you want to kill a job, use the command qdel job_id, where job_id is the ID of the job you want to kill. For more useful PBS Pro commands and options, see the PBS Pro Cheat Sheet.
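
The submit-and-check cycle can be scripted. The function below is a minimal sketch: it captures the job ID that qsub prints and then polls qstat until the job is gone. run_and_wait and POLL_SECONDS are hypothetical names, and the loop assumes that qstat returns a non-zero exit status once the job has left the queue.

```shell
# Submit a job script, then block until the scheduler no longer lists it.
# Assumption: "qstat JOB_ID" fails (non-zero status) for finished jobs.
# POLL_SECONDS is a hypothetical setting; the default of 60 keeps the
# polling load on the scheduler low.
# Usage: run_and_wait SCRIPT
run_and_wait() {
    local job_id
    job_id=$(qsub "$1") || return 1
    echo "submitted $job_id"
    while qstat "$job_id" >/dev/null 2>&1; do
        sleep "${POLL_SECONDS:-60}"
    done
    echo "finished $job_id"
}
```

Run something like this only for lightweight bookkeeping; the job itself still executes on the compute nodes, not in the shell that is waiting.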

4.4 How to get your work done faster?

If you use software packages developed by others, be mindful of the parameters used in your input files. A small tuning of the parameters can significantly improve computational efficiency. If you write and run your own code, see if it can be optimized to make it run faster or parallelize it if it is not yet parallel.  

When running parallel jobs, a question arises: How many cores/nodes should you request for the jobs? Note: the requested resources in the sample PBS job scripts we provide are not optimized for your jobs! Also note that, if you want to get your jobs done faster, simply adding a lot more cores/nodes is rarely the answer! You should do some scaling tests to identify the optimal number of cores/nodes for your jobs. 
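
A scaling test amounts to submitting the same job several times with different core counts. The function below is a minimal sketch that overrides the job name and resource request on the qsub command line. It assumes a single-node (shared-memory) program, 1 GB of memory per core, and a job script that adapts its run to the cores it is given; submit_scaling_runs is a hypothetical helper name.

```shell
# Submit SCRIPT once per core count, overriding -N and -l on the qsub
# command line (command-line options take precedence over the #PBS
# directives inside the script).
# Assumptions: one node per run and 1 GB of memory per core.
# Usage: submit_scaling_runs SCRIPT CORES...
submit_scaling_runs() {
    local script="$1" n
    shift
    for n in "$@"; do
        qsub -N "scale_${n}" -l "select=1:ncpus=${n}:mem=${n}gb" "$script"
    done
}
```

For example, submit_scaling_runs job.pbs 1 2 4 8 would queue four runs. Keep the test problem small enough that the one-core run finishes within the walltime limit.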

When you have many similar parallel jobs, we recommend that you run the first few jobs with different numbers of CPU cores. By looking at the computing time needed to finish the jobs vs. the number of cores, you'll have a pretty good idea of how many cores you should choose for the remaining jobs. Contact CCAST for help with improving your job efficiency and speeding up your research process.
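
Comparing time against core count is easier with two derived numbers: speedup (baseline time divided by measured time) and parallel efficiency (speedup divided by the increase in core count). The function below is a minimal sketch using awk; scaling_report is a hypothetical helper name, and the input format of one "cores seconds" pair per line is an assumption for illustration.

```shell
# Read "cores seconds" pairs on standard input, one pair per line, with
# the baseline run (smallest core count) on the first line. For each run,
# print: cores, speedup, and parallel efficiency relative to the baseline.
#   speedup    = t_base / t
#   efficiency = (t_base * cores_base) / (t * cores)
scaling_report() {
    awk 'NF >= 2 && !seen { c0 = $1; t0 = $2; seen = 1 }
         NF >= 2 { printf "%d %.2f %.2f\n", $1, t0 / $2, (t0 * c0) / ($2 * $1) }'
}
```

Efficiency close to 1 means the extra cores are being used well. Once it drops noticeably, adding more cores mostly wastes shared resources, and the last core count with acceptable efficiency is a reasonable choice for the remaining jobs.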

5. Utilization Monitoring

We use XDMoD for data collection and monitoring of HPC resource utilization. The tool allows CCAST staff, PIs, and users to view data about their CCAST usage. It includes metrics like total CPU hours, number of jobs submitted, average walltime per job, and much more. Information is updated daily for all jobs completed at the time of update.

Doc ID: 107680
Owner: Conner R.
Group: IT Knowledge Base
Created: 2020-12-08 13:03 CST
Updated: 2020-12-15 22:55 CST
CleanURL: https://kb.ndsu.edu/ccast-user-guide