Running LAMMPS on CCAST Clusters

Basic usage of LAMMPS, Large-scale Atomic/Molecular Massively Parallel Simulator, on CCAST Clusters

    Introduction

    This document describes basic usage of LAMMPS on CCAST clusters. Far more information, including the LAMMPS manual, is available on the project website, and additional help is available on the official LAMMPS forum.

    LAMMPS is the Large-scale Atomic/Molecular Massively Parallel Simulator. It is a classical molecular dynamics code that focuses on materials modeling, including coarse-grained and reactive force field calculations, and it is free and open source software.

    This document is not intended to be a comprehensive guide to LAMMPS, but rather a quick reference for using the software on CCAST clusters. For more information, see the resources provided by the LAMMPS developers, such as the manual and forum mentioned above.

    Creating Input Files

    For LAMMPS to run a simulation, it needs an input file. This file contains all the information about the simulation, including the type of simulation, the potential(s), the initial configuration, and the style of output. The input file is a text file, and can be created using any text editor. The LAMMPS manual has a section on how to create input files.
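
    As a rough illustration of what such a file contains, the sketch below writes a small Lennard-Jones input script modeled on the melt example distributed with LAMMPS. The file name in.lj and all numerical values are illustrative assumptions, not settings taken from the CCAST examples.

```shell
# Write a minimal LAMMPS input script to in.lj (hypothetical file name).
cat > in.lj <<'EOF'
# 3d Lennard-Jones melt (illustrative values only)
units           lj
atom_style      atomic

# initial configuration: fcc lattice filling a 10x10x10 box
lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            1 1.0
velocity        all create 3.0 87287

# potential
pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

# type of simulation: constant-energy (NVE) dynamics
fix             1 all nve

# output: thermodynamic data every 100 steps
thermo          100
run             1000
EOF
```

    The commented groups in the script correspond to the items listed above: the initial configuration, the potential, the type of simulation, and the style of output.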

    Many people use pre-processing tools to build initial input files. Packmol, VMD, and moltemplate are commonly used to create initial geometries, and other tools generate input files for specific types of simulations. An extensive list of these tools is maintained on the LAMMPS website.

    Simulations also require potential parameters. For atomistic systems, these are set in the input file; they are derived either from experiment, using techniques such as NMR, or from structures optimized with quantum chemical methods. The Automated Topology Builder provides a way to generate potential parameters for new molecular structures.

    Running LAMMPS on CCAST

    LAMMPS is available as a module on both the Thunder and Prime clusters, in builds optimized for CPU or GPU execution. To load LAMMPS on Thunder, use the following commands in your PBS scripts (the leading $ is the shell prompt and is not part of the command):

    $ module load intel/2018.2.046
    $ module load lammps/20180222-gcc
    

    Installed packages for the Thunder version of LAMMPS are:

    ASPHERE BODY CLASS2 COLLOID DIPOLE GRANULAR KSPACE MANYBODY MC MISC MOLECULE
    PERI REPLICA RIGID SHOCK SNAP SRD OPT CORESHELL QEQ

    On Prime, both CPU and GPU versions are available. The CPU version is loaded with the following:

    $ module load lammps/02Aug2023-cpu
    # show the help message for the executable
    $ lmp -h
    

    whereas the GPU version, which requires a job in the gpus queue, is loaded with the following:

    $ module load lammps/02Aug2023-gpu
    # show the help message for the executable
    $ lmp -h
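
    Once a module is loaded, a simulation is started by passing an input file to lmp. The helper below only prints the command it would run, so it can be tried without access to a cluster. The -in, -sf, and -pk options are standard LAMMPS command-line switches; the mpirun launcher, the use of the GPU package rather than KOKKOS, the function name lammps_cmd, and the file name in.example are assumptions, so check the job.pbs in the CCAST examples for the form actually used.

```shell
# Print (do not execute) a LAMMPS command line.
#   $1 = input file, $2 = number of MPI ranks, $3 = number of GPUs (default 0)
lammps_cmd() {
    input=$1
    ranks=$2
    ngpus=${3:-0}
    if [ "$ngpus" -gt 0 ]; then
        # -sf gpu applies the gpu suffix to supported styles;
        # -pk gpu N sets the number of GPUs per node
        echo "mpirun -np $ranks lmp -sf gpu -pk gpu $ngpus -in $input"
    else
        echo "mpirun -np $ranks lmp -in $input"
    fi
}

lammps_cmd in.example 4      # CPU run on 4 ranks
lammps_cmd in.example 4 1    # same input using 1 GPU
```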

    Both the GPU and CPU versions of LAMMPS on Prime are compiled with the following packages:

    AMOEBA ASPHERE BOCS BODY BPM BROWNIAN CG-DNA CG-SPICA CLASS2 COLLOID COLVARS
    COMPRESS CORESHELL DIELECTRIC DIFFRACTION DIPOLE DPD-BASIC DPD-MESO DPD-REACT
    DPD-SMOOTH DRUDE EFF ELECTRODE EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX
    EXTRA-MOLECULE EXTRA-PAIR FEP GPU GRANULAR INTERLAYER KOKKOS KSPACE LEPTON
    MACHDYN MANYBODY MC MEAM MESONT MISC ML-IAP ML-POD ML-SNAP MOFFF MOLECULE
    OPENMP OPT ORIENT PERI PHONON PLUGIN POEMS QEQ REACTION REAXFF REPLICA RIGID
    SHOCK SPH SPIN SRD TALLY UEF YAFF

    Running LAMMPS on Prime

    CPU example

    Example simulation files are available in the /mmfs1/projects/ccastest/examples directory. These include a number of LAMMPS input examples.

    For example, to run LAMMPS_example_1, you would do the following:

    $ cp -R /mmfs1/projects/ccastest/examples/LAMMPS_example_1 .
    $ cd LAMMPS_example_1

    This will copy the example directory to your current working directory. The file job.pbs contains a PBS script. Before it will run, you need to remove the # preceding the module load line:

    ## load LAMMPS version 02Aug2023 CPU version
    # module load lammps/02Aug2023-cpu

    You will also have to modify the job.pbs file to include your group name in the #PBS -W group_list= line.
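
    Both edits can also be made from the command line. The sketch below applies them with sed to a stand-in file, job_demo.pbs, whose contents are a hypothetical fragment written only for this illustration; the real job.pbs may differ, and my_group is a placeholder for your own group name.

```shell
# Stand-in for the relevant lines of job.pbs (hypothetical fragment).
cat > job_demo.pbs <<'EOF'
#PBS -W group_list=
## load LAMMPS version 02Aug2023 CPU version
# module load lammps/02Aug2023-cpu
EOF

# First expression: fill in the group name (my_group is a placeholder).
# Second expression: strip the leading "# " from the module load line.
# -i.bak edits in place and keeps the original as job_demo.pbs.bak.
sed -i.bak \
    -e 's|^#PBS -W group_list=.*|#PBS -W group_list=my_group|' \
    -e 's|^# *\(module load lammps/02Aug2023-cpu\)|\1|' \
    job_demo.pbs

cat job_demo.pbs
```

    The descriptive comment line beginning with ## is left untouched, because the second expression only matches a single # followed by the module load command.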

    Once this is done, you can submit the job to the queue using the qsub command:

    $ qsub job.pbs

    GPU example

    For the GPU version, you will need to modify the job.pbs file, changing the queue to gpus:

    #PBS -q gpus

    In addition, you will need to set the number of GPUs you want to use with the ngpus= value in the #PBS -l select= line. For example, to use 1 GPU, you would change the line to:

    #PBS -l select=1:mem=5gb:ncpus=4:mpiprocs=4:ompthreads=1:ngpus=1

    Finally, you will need to uncomment the line in job.pbs that loads the GPU version of LAMMPS, so that it reads:

    ## load LAMMPS version 02Aug2023 GPU version
    module load lammps/02Aug2023-gpu

    Once this is done, you can submit the job to the queue using the qsub command:

    $ qsub job.pbs
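
    The three GPU edits can be scripted with sed as well. This sketch works on a hypothetical stand-in file, job_gpu_demo.pbs; the starting queue name and select line are assumptions made for illustration, while the resulting values are the ones given above.

```shell
# Stand-in for the relevant lines of job.pbs (hypothetical fragment).
cat > job_gpu_demo.pbs <<'EOF'
#PBS -q default
#PBS -l select=1:mem=5gb:ncpus=4:mpiprocs=4:ompthreads=1
## load LAMMPS version 02Aug2023 GPU version
# module load lammps/02Aug2023-gpu
EOF

# 1) switch the queue to gpus
# 2) append ngpus=1 to the select line (assumes ngpus is not already set)
# 3) uncomment the GPU module load line
sed -i.bak \
    -e 's|^#PBS -q .*|#PBS -q gpus|' \
    -e '/^#PBS -l select=/ s|$|:ngpus=1|' \
    -e 's|^# *\(module load lammps/02Aug2023-gpu\)|\1|' \
    job_gpu_demo.pbs

cat job_gpu_demo.pbs
```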

    Working with output files

    LAMMPS produces a number of output files. The most important of these is the log file, which records information about the simulation, including the number of steps, the energy, and the temperature. The name of the log file can be set in the input file with the log command (the default is log.lammps). LAMMPS output can be analyzed with a number of free and open source tools, including VMD, which is available on CCAST OnDemand.
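
    For quick checks it is often enough to pull the thermodynamic table out of the log as plain text. The function below is a minimal sketch that assumes the default thermo output, in which each block starts with a header line beginning with Step and is followed by a line beginning with Loop time; the function name extract_thermo and the output file name thermo.txt are illustrative.

```shell
# Print the thermo header and data rows from a LAMMPS log file ($1).
extract_thermo() {
    awk '
        /^ *Step /   { in_block = 1 }   # header line opens a thermo block
        /^Loop time/ { in_block = 0 }   # summary line closes it
        in_block     { print }
    ' "$1"
}

# Usage: save the table for plotting or inspection.
# extract_thermo log.lammps > thermo.txt
```

    If the input file contains several run commands, each block is printed with its own header, and any warnings LAMMPS wrote between thermo rows are passed through unchanged.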

    Parallel Scaling Performance

    LAMMPS is a parallel code and can be run on multiple cores and nodes. Its performance depends on the number of cores and nodes used and on the type of simulation. Here, the same workload, consisting of 19,652 simulated particles, was tested 10 times using different amounts and types of hardware on the Prime cluster to show how performance scales on CPUs and GPUs.

    CPU Scaling

    The following table shows the results of running the same simulation on CPUs only:

    CPU scaling with number of cores
    Cores   Timesteps/s   Speedup   Efficiency
    1             36.77      1.00         1.00
    4            127.80      3.47         0.87
    6            166.22      4.52         0.75
    8            200.17      5.44         0.68
    16           319.20      8.68         0.54
    32           352.24      9.58         0.30
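
    The Speedup and Efficiency columns follow from the measured rates: speedup is the rate on N cores divided by the single-core rate, and efficiency is speedup divided by N. The awk sketch below reproduces that arithmetic for two rows of the table; the function name scaling is illustrative, and the last digit of a computed value can differ from the table depending on how the published values were rounded.

```shell
# Read "cores timesteps_per_second" pairs on stdin; $1 is the 1-core rate.
# Prints: cores speedup efficiency
scaling() {
    awk -v base="$1" '{
        speedup = $2 / base
        printf "%d %.2f %.2f\n", $1, speedup, speedup / $1
    }'
}

printf '16 319.20\n32 352.24\n' | scaling 36.77
# prints:
# 16 8.68 0.54
# 32 9.58 0.30
```

    Efficiency falling from 0.54 on 16 cores to 0.30 on 32 cores shows that, for this workload, doubling the core count beyond 16 adds little throughput.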

    GPU Scaling

    For comparison, the number of timesteps per second on GPU depended largely on the card used for the calculation. The following runs used 4 CPU cores and 1 GPU card, with speedup given relative to the CPU-only run on 4 cores:

    GPU scaling with card type
    GPU     Timesteps/s   Speedup
    a10         1384.12     10.83
    a40         1672.25     13.08
    a100        3656.17     28.84


    Keywords: LAMMPS, Molecular Dynamics, MD, Simulation, How-to, Tutorial, CCAST, HPC, Physics, Chemistry
    Doc ID: 132064
    Owned by: Stephen S. in NDSU IT Knowledge Base
    Created: 2023-10-12
    Updated: 2024-06-05
    Sites: NDSU IT Knowledge Base