Build libraries with Spack

Here are some up-to-date instructions on installing a software stack for GEOS-Chem Classic or HEMCO with Spack.

Note

If you will be using GCHP, please see gchp.readthedocs.io for instructions on how to download required libraries with Spack.

Initial Spack setup

Install Spack in your home directory

Spack can be installed with Git, as follows:

$ cd ~
$ git clone git@github.com:spack/spack.git
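
If you have not set up SSH keys for GitHub, you can clone Spack over HTTPS instead; either method downloads the same repository:

$ cd ~
$ git clone https://github.com/spack/spack.git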

Initialize Spack

To initialize Spack, type these commands:

$ export SPACK_ROOT=${HOME}/spack
$ source ${SPACK_ROOT}/share/spack/setup-env.sh
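
To verify that Spack is working, you can print its version and the architecture it has detected (the exact output will vary with your Spack version and system):

$ spack --version
$ spack arch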

Make sure the default compiler is in compilers.yaml

Tell Spack to search for compilers:

$ spack compiler find

You can confirm that the default compiler was found by inspecting the compilers.yaml file with your favorite editor, e.g.:

$ emacs ~/.spack/linux/compilers.yaml

For example, the default compiler on my cloud instance was the GNU Compiler Collection 7.4.0. This collection contains C (gcc), C++ (g++), and Fortran (gfortran) compilers. These are specified in the compilers.yaml file as:

compilers:
- compiler:
    spec: gcc@7.4.0
    paths:
      cc: /usr/bin/gcc-7
      cxx: /usr/bin/g++-7
      f77: /usr/bin/gfortran-7
      fc: /usr/bin/gfortran-7
    flags: {}
    operating_system: ubuntu18.04
    target: x86_64
    modules: []
    environment: {}
    extra_rpaths: []

As you can see, the default compiler executables are located in the /usr/bin folder. This is where many of the system-supplied executable files are located.
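
If you prefer not to open an editor, you can also print the compiler configuration directly at the command line (the exact layout may vary with your Spack version):

$ spack config get compilers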

Build the GCC 10.2.0 compilers

Let’s build a newer compiler version with Spack. In this case we’ll build the GNU Compiler Collection 10.2.0 using the default compilers.

$ spack install gcc@10.2.0 target=x86_64 %gcc@7.4.0
$ spack load gcc@10.2.0
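
After loading the new compiler, you can confirm that it is the version now on your search path; the paths reported should point into your Spack installation tree rather than /usr/bin:

$ which gcc gfortran
$ gcc --version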

Update compilers.yaml

In order for Spack to use this new compiler to build other packages, the compilers.yaml file must be updated using these commands:

$ spack load gcc@10.2.0
$ spack compiler find
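
You can confirm that the new compiler was registered by listing the compilers known to Spack; gcc@10.2.0 should now appear alongside the default compiler:

$ spack compilers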

Install required libraries for GEOS-Chem

Now that we have installed the GNU Compiler Collection 10.2.0, we can use it to build the required libraries for GEOS-Chem Classic and HEMCO.

HDF5

Now we can start installing libraries. First, let’s install HDF5, which is a dependency of netCDF.

$ spack install hdf5%gcc@10.2.0 target=x86_64 +cxx+fortran+hl+pic+shared+threadsafe
$ spack load hdf5%gcc@10.2.0

The +cxx+fortran+hl+pic+shared+threadsafe string specifies the variants (build options) that are needed for building HDF5.
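
If you would like to see which variants are available for HDF5 (and their default settings) before installing, you can query the package recipe:

$ spack info hdf5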

netCDF-Fortran and netCDF-C

Now that we have installed HDF5, we may proceed to installing netCDF-Fortran (which will install netCDF-C as a dependency).

$ spack install netcdf-fortran%gcc@10.2.0 target=x86_64 ^hdf5+cxx+fortran+hl+pic+shared+threadsafe
$ spack load netcdf-fortran%gcc@10.2.0
$ spack load netcdf-c%gcc@10.2.0

We tell Spack to use the same version of HDF5 that we just built by appending ^hdf5+cxx+fortran+hl+pic+shared+threadsafe to the spack install command. Otherwise, Spack will try to build a new version of HDF5 with default options (which is not what we want).
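
If you want to preview how Spack will resolve the dependency tree before starting the build, you can concretize the spec first (this does not install anything):

$ spack spec netcdf-fortran%gcc@10.2.0 ^hdf5+cxx+fortran+hl+pic+shared+threadsafe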

ncview

Ncview is a convenient viewer for browsing netCDF files. Install it with:

$ spack install ncview%gcc@10.2.0 target=x86_64 ^hdf5+cxx+fortran+hl+pic+shared+threadsafe
$ spack load ncview%gcc@10.2.0

nco (The netCDF Operators)

The netCDF operators (nco) are useful programs for manipulating netCDF files and attributes. Install nco with:

$ spack install nco%gcc@10.2.0 target=x86_64 ^hdf5+cxx+fortran+hl+pic+shared+threadsafe
$ spack load nco%gcc@10.2.0
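
As a quick check of the nco installation, you can print the metadata of any netCDF file (here file.nc is a placeholder for a file of your own):

$ ncks -M file.nc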

cdo (The Climate Data Operators)

The Climate Data Operators (cdo) are utilities for processing data in netCDF files. Install cdo with:

$ spack install cdo%gcc@10.2.0 target=x86_64 ^hdf5+cxx+fortran+hl+pic+shared+threadsafe
$ spack load cdo%gcc@10.2.0
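
You can likewise test the cdo installation by printing its version, or a short summary of a netCDF file of your own (file.nc is again a placeholder):

$ cdo --version
$ cdo sinfon file.nc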

flex

The flex library is a lexical analyzer generator. It is a dependency of the Kinetic PreProcessor (KPP).

$ spack install flex%gcc@10.2.0 target=x86_64
$ spack load flex%gcc@10.2.0

gdb and cgdb

Gdb is the GNU Debugger. Cgdb is a visual, user-friendly interface for gdb.

$ spack install gdb@9.1%gcc@10.2.0 target=x86_64
$ spack load gdb%gcc@10.2.0

$ spack install cgdb%gcc@10.2.0 target=x86_64
$ spack load cgdb%gcc@10.2.0

cmake and gmake

CMake (cmake) and GNU Make (gmake) are used to build source code into executables.

$ spack install cmake%gcc@10.2.0 target=x86_64
$ spack load cmake%gcc@10.2.0

$ spack install gmake%gcc@10.2.0 target=x86_64
$ spack load gmake%gcc@10.2.0
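
Once these are loaded, you can check that the Spack-built tools (rather than any older system copies) are the ones found first on your search path:

$ cmake --version
$ which cmake make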

Installing optional packages

These packages are useful but not strictly necessary for GEOS-Chem.

OpenJDK (Java)

Some programs might need the openjdk Java Runtime Environment:

$ spack install openjdk%gcc@10.2.0
$ spack load openjdk%gcc@10.2.0

TAU performance profiler

The Tuning and Analysis Utilities (tau) let you profile GEOS-Chem and HEMCO in order to locate computational bottlenecks:

$ spack install tau%gcc@10.2.0 +pthread+openmp~otf2
$ spack load tau%gcc@10.2.0

Loading Spack packages at startup

Creating an environment file for Spack

Once you have finished installing libraries with Spack, you can create an environment file to load the Spack libraries whenever you start a new Unix shell. Here is a sample environment file that can be used (or modified) to load the Spack libraries described above.

#==============================================================================
# %%%%% Clear existing environment variables %%%%%
#==============================================================================
unset CC
unset CXX
unset EMACS_HOME
unset FC
unset F77
unset F90
unset NETCDF_HOME
unset NETCDF_INCLUDE
unset NETCDF_LIB
unset NETCDF_FORTRAN_HOME
unset NETCDF_FORTRAN_INCLUDE
unset NETCDF_FORTRAN_LIB
unset OMP_NUM_THREADS
unset OMP_STACKSIZE
unset PERL_HOME

#==============================================================================
# %%%%% Load Spack packages %%%%%
#==============================================================================
echo "Loading gfortran 10.2.0 and related libraries ..."

# Initialize Spack
# In the examples above /path/to/spack was ${HOME}/spack
export SPACK_ROOT=/path/to/spack
source $SPACK_ROOT/share/spack/setup-env.sh

# List each Spack package that you want to load
# (add the backslash after each new package that you add)
pkgs=(                      \
  gcc@10.2.0                \
  cmake%gcc@10.2.0          \
  openmpi%gcc@10.2.0        \
  netcdf-fortran%gcc@10.2.0 \
  netcdf-c%gcc@10.2.0       \
  hdf5%gcc@10.2.0           \
  gdb%gcc@10.2.0            \
  flex%gcc@10.2.0           \
  openjdk%gcc@10.2.0        \
  cdo%gcc@10.2.0            \
  nco%gcc@10.2.0            \
  ncview%gcc@10.2.0         \
  perl@5.30.3%gcc@10.2.0    \
  tau%gcc@10.2.0            \
)

# Load each Spack package
for f in "${pkgs[@]}"; do
    echo "Loading $f"
    spack load $f
done

#==============================================================================
# %%%%% Settings for OpenMP parallelization %%%%%
#==============================================================================

# Max out the stack memory for OpenMP
# Asking for a huge number will just give you the max available
export OMP_STACKSIZE=500m

# By default, set the number of threads for OpenMP parallelization to 1
export OMP_NUM_THREADS=1

# Redefine the number of threads for OpenMP parallelization
# (a) If in a SLURM partition, set OMP_NUM_THREADS = SLURM_CPUS_PER_TASK
# (b) Or, set OMP_NUM_THREADS to the optional first argument that is passed
if [[ -n "${SLURM_CPUS_PER_TASK+1}" ]]; then
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
elif [[ "$#" -eq 1 ]]; then
  if [[ "x$1" != "xignoreeof" ]]; then
    export OMP_NUM_THREADS=${1}
  fi
fi
echo "Number of OpenMP threads: $OMP_NUM_THREADS"

#==============================================================================
# %%%%% Define relevant environment variables %%%%%
#==============================================================================

# Compiler environment variables
export FC=gfortran
export F90=gfortran
export F77=gfortran
export CC=gcc
export CXX=g++

# Machine architecture
export ARCH=`uname -s`

# netCDF paths
export NETCDF_HOME=`spack location -i netcdf-c%gcc@10.2.0`
export NETCDF_INCLUDE=${NETCDF_HOME}/include
export NETCDF_LIB=${NETCDF_HOME}/lib

# netCDF-Fortran paths
export NETCDF_FORTRAN_HOME=`spack location -i netcdf-fortran%gcc@10.2.0`
export NETCDF_FORTRAN_INCLUDE=${NETCDF_FORTRAN_HOME}/include
export NETCDF_FORTRAN_LIB=${NETCDF_FORTRAN_HOME}/lib

# Other important paths
export GCC_HOME=`spack location -i gcc@10.2.0`
export MPI_HOME=`spack location -i openmpi%gcc@10.2.0`
export TAU_HOME=`spack location -i tau%gcc@10.2.0`

#==============================================================================
# %%%%% Echo relevant environment variables %%%%%
#==============================================================================
echo
echo "Important environment variables:"
echo "CC  (C compiler)       : $CC"
echo "CXX (C++ compiler)     : $CXX"
echo "FC  (Fortran compiler) : $FC"
echo "NETCDF_HOME            : $NETCDF_HOME"
echo "NETCDF_INCLUDE         : $NETCDF_INCLUDE"
echo "NETCDF_LIB             : $NETCDF_LIB"
echo "NETCDF_FORTRAN_HOME    : $NETCDF_FORTRAN_HOME"
echo "NETCDF_FORTRAN_INCLUDE : $NETCDF_FORTRAN_INCLUDE"
echo "NETCDF_FORTRAN_LIB     : $NETCDF_FORTRAN_LIB"

Save this to your home folder with a name such as ~/.spack.env. The . in front of the name will make it a hidden file, like your .bashrc or .bash_aliases.

Loading Spack-built libraries

Whenever you start a new Unix session (either by opening a terminal window or running a new job), your .bashrc and .bash_aliases files will be sourced and the commands contained within them applied. You can then load the Spack packages by typing this at the terminal prompt:

$ source ~/.spack.env

You can also add some code to your .bash_aliases so that this will be done automatically:

if [[ -f ~/.spack.env ]]; then
    source ~/.spack.env
fi

In either case, this will load the packages for you. You should see output similar to:

Loading gfortran 10.2.0 and related libraries ...
Loading gcc@10.2.0
Loading cmake%gcc@10.2.0
Loading openmpi%gcc@10.2.0
Loading netcdf-fortran%gcc@10.2.0
Loading netcdf-c%gcc@10.2.0
Loading hdf5%gcc@10.2.0
Loading gdb%gcc@10.2.0
Loading flex%gcc@10.2.0
Loading openjdk%gcc@10.2.0
Loading cdo%gcc@10.2.0
Loading nco%gcc@10.2.0
Loading ncview%gcc@10.2.0
Loading perl@5.30.3%gcc@10.2.0
Loading tau%gcc@10.2.0
Number of OpenMP threads: 1

Important environment variables:
CC  (C compiler)       : gcc
CXX (C++ compiler)     : g++
FC  (Fortran compiler) : gfortran
NETCDF_HOME            : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-c-4.7.4-22bkbtqledcaipqc2zrgun4qes7kkm5q
NETCDF_INCLUDE         : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-c-4.7.4-22bkbtqledcaipqc2zrgun4qes7kkm5q/include
NETCDF_LIB             : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-c-4.7.4-22bkbtqledcaipqc2zrgun4qes7kkm5q/lib
NETCDF_FORTRAN_HOME    : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-fortran-4.5.3-mtuoejjcl3ozbvd6prgqm44k5jre3hne
NETCDF_FORTRAN_INCLUDE : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-fortran-4.5.3-mtuoejjcl3ozbvd6prgqm44k5jre3hne/include
NETCDF_FORTRAN_LIB     : /net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-10.2.0/netcdf-fortran-4.5.3-mtuoejjcl3ozbvd6prgqm44k5jre3hne/lib

Once you see this output, you can then start using programs that rely on these Spack-built libraries.

Setting the number of cores for OpenMP

If you type:

$ source ~/.spack.env

by itself, this will set the OMP_NUM_THREADS variable to 1. This variable sets the number of computational cores that OpenMP should use.

You can change this by passing the number of cores as an argument, e.g.:

$ source ~/.spack.env 6

which will set OMP_NUM_THREADS to 6. In this case, GEOS-Chem Classic (and other programs that use OpenMP parallelization) will parallelize with 6 cores.

If you are using the SLURM scheduler and you source ~/.spack.env in your job script, then OMP_NUM_THREADS will be set automatically to SLURM_CPUS_PER_TASK, which is the number of cores that you requested. If you are not using SLURM, then you should instead add e.g.

export OMP_NUM_THREADS=6

(or however many cores you wish to use) to your job script or interactive session.
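
For reference, here is a minimal sketch of a SLURM job script that takes advantage of this behavior. The partition name, time limit, run directory, and executable invocation are placeholders that you should adapt to your own cluster and GEOS-Chem version:

#!/bin/bash
#SBATCH -c 6                  # Request 6 cores; SLURM sets SLURM_CPUS_PER_TASK=6
#SBATCH -N 1                  # Run on a single node
#SBATCH -t 0-02:00            # Wall-time limit (placeholder)
#SBATCH -p your_partition     # Partition name (placeholder)

# Load the Spack-built libraries; OMP_NUM_THREADS will be set
# automatically from SLURM_CPUS_PER_TASK (6 in this example)
source ~/.spack.env

# Run GEOS-Chem Classic from your run directory (placeholder path)
cd ~/gc_run_directory
./gcclassic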