GEOS-Chem Classic
This site provides instructions for GEOS-Chem Classic, the single-node mode of operation of the GEOS-Chem model. We provide instructions for downloading and compiling GEOS-Chem Classic, plus its required software libraries.
Note
If you would like to run GEOS-Chem on more than one node of a computing system, consider using GEOS-Chem High Performance (GCHP).
GEOS-Chem is a global 3-D model of atmospheric composition driven by assimilated meteorological observations from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling and Assimilation Office. It is applied by research groups around the world to a wide range of atmospheric composition problems.
Cloning and building from source code ensures you will have direct access to the latest available versions of GEOS-Chem Classic, provides additional compile-time options, and allows you to make your own modifications to GEOS-Chem Classic source code.
Quickstart Guide
This quickstart guide is for quick reference on how to download, build, and run GEOS-Chem Classic, which is the single-node instance of GEOS-Chem.
Tip
Please also see our GCHP Quickstart Guide if you would like to run GEOS-Chem using more than one computational node.
This guide assumes that your environment satisfies GEOS-Chem Classic hardware and software requirements. This means you should load a compute environment such that programs like cmake are available before continuing. If you do not have some of the required software dependencies, you can find instructions for installing external dependencies in our Spack instructions.
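Before continuing, it can help to confirm that the required programs are actually on your PATH. The sketch below is one way to do that. The check_tools function name and the tool list (git, cmake, make, gfortran) are illustrative assumptions, not part of GEOS-Chem; adjust the list to match your compiler and environment.

```shell
#!/usr/bin/env bash
# Report which of the given commands can or cannot be found on the PATH.
# NOTE: check_tools is a hypothetical helper, not part of GEOS-Chem.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found
    return "$missing"
}

# Example tool list (an assumption; use ifort instead of gfortran for Intel)
check_tools git cmake make gfortran || echo "Load your environment before continuing."
```

If any tool is reported as missing, load your environment file (or install the dependency) and run the check again.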
For simplicity we will also refer to GEOS-Chem Classic as simply GEOS-Chem on this page. More detailed instructions on downloading, compiling, and running GEOS-Chem can be found in the User Guide elsewhere on this site.
1. Clone GEOS-Chem Classic
Download the source code:
$ git clone --recurse-submodules https://github.com/geoschem/GCClassic.git GCClassic
$ cd GCClassic
Tip
If you wish, you may choose a different name for the source code folder, e.g.
$ git clone --recurse-submodules https://github.com/geoschem/GCClassic.git my_code_dir
$ cd my_code_dir
Upon download you will have the most recently released version. You can check which version this is by printing the last commit in the Git log and looking for the tag in the output.
$ git log -n 1
Tip
To use an older GEOS-Chem Classic version (e.g. 14.0.0), follow these additional steps:
$ git checkout tags/14.0.0 # Points HEAD to the tag "14.0.0"
$ git branch version_14.0.0 # Creates a new branch at tag "14.0.0"
$ git checkout version_14.0.0 # Checks out the version_14.0.0 branch
$ git submodule update --init --recursive # Reverts submodules to the "14.0.0" tag
You can do this for any tag in the version history. For a list of all tags, type:
$ git tag
If you have any unsaved changes, make sure you commit those to a branch prior to updating versions.
2. Create a run directory
Navigate to the run/ subdirectory. To create a run directory, run the script ./createRunDir.sh:
$ cd run/
$ ./createRunDir.sh
Creating a run directory is interactive, meaning you will
be asked multiple questions to set up the simulation. For example,
running createRunDir.sh will prompt you for
configurable settings such as simulation type, grid resolution,
meteorology source, and number of vertical levels. It will also ask
you where you want to store your run directory and what you wish to
name it, including whether you want to use the default name,
e.g. gc_4x5_merra2_fullchem. We recommend storing run
directories in a place that has a large storage capacity. It does
not need to be in the same location as your source code. When
creating a run directory you can quit and start from scratch at any
time.
For demonstration purposes, we will use a full chemistry simulation
run directory with the default name (gc_4x5_merra2_fullchem).
The steps to set up and run other types of GEOS-Chem simulations follow
the same pattern as the examples shown below.
Attention
The first time you create a run directory, you will be asked to provide registration information. Please answer all of the questions, as it will help us to keep track of GEOS-Chem usage worldwide. We will also add your information to the GEOS-Chem People and Projects web page.
3. Load your Environment
Prior to building GEOS-Chem, always make sure that all libraries and environment variables are loaded. An easy way to do this is to write an environment file and load that file every time you work with GEOS-Chem. For convenience, you can create a symbolic link to your environment file within your run directory. For example, do the following in your new run directory to have a handy link to the environment you plan on using.
$ cd /path/to/gc_4x5_merra2_fullchem # Skip if you are already here
$ ln -s ~/envs/gcc.gfortran10.env gcc.env
Then every time you start up a session to work with GEOS-Chem in your run directory you can easily load your environment.
$ source gcc.env
4. Configure your build
You may build GEOS-Chem from within the run directory or from anywhere
else on your system. But we recommend that you always build GEOS-Chem
from within the run directory. This is convenient because it keeps
all build files in close proximity to where you will run the model.
For this purpose the GEOS-Chem run directory includes a build
directory called build/.
First, navigate to the build/ folder of your run directory:
$ cd /path/to/gc_4x5_merra2_fullchem # Skip if you are already here
$ cd build
The next step is to configure your build. These
are persistent settings that are saved to your build directory. A
useful configuration option is -DRUNDIR. This option lets you
specify one or more run directories that GEOS-Chem is “installed” to;
that is, where the executable is copied when you do
make install.
Configure your build so it installs GEOS-Chem to the run directory you
created in Step 2. The run directory is one directory level higher
than the build directory. Also located one level higher than
the build directory is the CodeDir symbolic link to the
top-level GEOS-Chem source code directory. Use the following command to
configure your build:
$ cmake ../CodeDir -DRUNDIR=..
GEOS-Chem has a number of additional configuration options you can add here. For example, to compile with RRTMG after running the above command:
$ cmake . -DRRTMG=y
Note
The . in the cmake command above is
important. It tells CMake that your current working directory
(i.e., .) is your build directory.
A useful configuration option is to build in debug mode. Doing this is a good idea if you encountered an error (such as a segmentation fault) in a previous run and need more information about where the error happened and why.
$ cmake . -DCMAKE_BUILD_TYPE=Debug
See the GEOS-Chem documentation for more information on configuration options.
5. Compile and install
Compiling GEOS-Chem Classic should take about a
minute, but it can vary depending on your system, your compiler, and
your configuration options. To maximize build speed you should compile
GEOS-Chem in parallel using as many cores as are available. Do this
with the -j flag from the build/ directory:
$ cd /path/to/gc_4x5_merra2_fullchem/build # Skip if you are already here
$ make -j
Upon successful compilation, install the compiled executable to your run directory:
$ make install
This copies the executable build/bin/gcclassic and supplemental
files to your run directory.
Note
You can update build settings at any time:
Navigate to your build directory.
Update your build settings with cmake (only needed if they have changed since you last ran cmake).
Recompile with make -j. Note that the build system automatically figures out what (if any) files need to be recompiled.
Install the rebuilt executable with make install.
If you do not install the executable to your run directory you can always get the executable from the directory build/bin.
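Steps 4 and 5 can be collected into a small helper script. The following is a minimal sketch, assuming the run directory layout described above (a build/ folder and a CodeDir link inside the run directory). The names build_gcclassic, run_cmd, and DRY_RUN are hypothetical and are not provided by GEOS-Chem; the cmake and make commands are the ones shown in this guide.

```shell
#!/usr/bin/env bash
# Configure, compile, and install GEOS-Chem Classic from a run directory.
# NOTE: run_cmd, build_gcclassic, and DRY_RUN are hypothetical names.

# Run a command, or just print it when DRY_RUN=1
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_gcclassic /path/to/rundir
# The function body is a subshell, so the "cd" does not affect the caller.
build_gcclassic() (
    run_cmd cd "$1/build" &&
    run_cmd cmake ../CodeDir -DRUNDIR=.. &&
    run_cmd make -j &&
    run_cmd make install
)

# Preview the commands without running them
DRY_RUN=1 build_gcclassic /path/to/gc_4x5_merra2_fullchem
```

Calling build_gcclassic without DRY_RUN=1 runs the four commands in order and stops at the first one that fails.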
6. Configure your run directory
Now, navigate to your run directory:
$ cd /path/to/gc_4x5_merra2_fullchem
You should review these files before starting a simulation:
- geoschem_config.yml
Controls several frequently-updated simulation settings (e.g. start and end time, which operations to turn on/off, etc.)
- HISTORY.rc
Controls GEOS-Chem diagnostic settings.
- HEMCO_Diagn.rc
Controls emissions diagnostic settings via HEMCO.
- HEMCO_Config.rc
Controls which emissions inventories and other non-emissions data will be read from disk (via HEMCO).
Please see our Customize simulations with research options Supplemental Guide to learn how to activate alternate science options in your simulations.
Once you are satisfied that your simulation settings are correct, you may proceed to run GEOS-Chem.
7. Run GEOS-Chem Classic
If you used an environment file to load software libraries prior to building GEOS-Chem then you should load that file prior to running. To run GEOS-Chem Classic, type at the command line:
$ ./gcclassic
If you wish to send output to a log file, use:
$ ./gcclassic > GC.log 2>&1
We recommend running GEOS-Chem Classic as a batch job, although you can also do short runs interactively. Running GEOS-Chem as a batch job means that you write a script (usually bash) and then you submit that script to your local job scheduler (SLURM, LSF, etc.). If you write a batch script you can include sourcing your environment file within the script to ensure you always use the intended environment. Submitting GEOS-Chem as a batch job is slightly different depending on your scheduler. If you aren’t familiar with scheduling jobs on your system, ask your system administrator for guidance.
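As an illustration of the batch-job approach, the sketch below writes a minimal SLURM run script into the current directory. It is only an example under stated assumptions: the file name gcclassic.run and every resource request (cores, wall time, memory) are placeholders to adapt to your cluster, and the script assumes the gcc.env link from Step 3 exists in the run directory. Other schedulers use different directives.

```shell
#!/usr/bin/env bash
# Write a minimal SLURM run script into the current (run) directory.
# NOTE: the file name and all resource requests below are assumptions;
# check your cluster's documentation for appropriate values.
cat > gcclassic.run << 'EOF'
#!/bin/bash
#SBATCH -N 1              # GEOS-Chem Classic runs on a single node
#SBATCH -c 8              # number of cores (OpenMP threads)
#SBATCH -t 0-12:00        # wall time (D-HH:MM)
#SBATCH --mem=15000       # memory in MB

# Load the same environment that was used to build GEOS-Chem
source ./gcc.env

# Match the number of OpenMP threads to the requested cores
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
export OMP_STACKSIZE=500m

# Run GEOS-Chem Classic and send all output to a log file
./gcclassic > GC.log 2>&1
EOF

echo "Wrote gcclassic.run; submit it with: sbatch gcclassic.run"
```

Sourcing the environment file inside the run script means the job always uses the intended libraries, regardless of what is loaded in your login shell.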
Those are the basics of using GEOS-Chem Classic! See this user guide, step-by-step guides, and reference pages for more detailed instructions.
Meet all hardware requirements
If you are a first-time GEOS-Chem Classic user, please take a moment to make sure that your computer system meets certain hardware and software requirements. These are described in the following chapters.
Computer system
You will need to have access to one (or both) of these types of computational resources in order to use GEOS-Chem Classic:
A Unix-like computer system
GEOS-Chem Classic can only be used on computers with operating systems that are Unix-like. This includes all flavors of Linux (e.g. Ubuntu, Fedora, Red Hat, Rocky Linux, Alma Linux, etc.) and BSD Unix (including macOS, which is a BSD derivative).
If your institution has computational resources (e.g. a shared computer cluster with many cores, sufficient disk storage and memory), then you can run GEOS-Chem Classic there. Contact your sysadmin or IT support staff for assistance.
An account on the Amazon Web Services cloud
If your institution lacks computational resources (or if you need additional computational resources), then you should consider signing up for access to the Amazon Web Services cloud. Using the cloud has the following advantages:
You can run GEOS-Chem without having to invest in local hardware and maintenance personnel.
You won’t have to download any meteorological fields or emissions data. All of the necessary data input for GEOS-Chem will be available on the cloud.
You can initialize your computational environment with all of the required software (e.g. compilers, libraries, utilities) that you need for GEOS-Chem.
Your GEOS-Chem runs will be 100% reproducible, because you will initialize your computational environment the same way every time.
You will avoid GEOS-Chem compilation errors due to library incompatibilities.
Note, however, that you will be charged for the computational time that you use, and for any data that you download off the cloud.
You can learn more about how to use GEOS-Chem on the cloud by visiting this tutorial (cloud.geos-chem.org).
Memory requirements
If you plan to run GEOS-Chem on a local computer system, please make sure that your system has sufficient memory to run your simulations.
Sufficient memory to run GEOS-Chem
For the \(4^{\circ}{\times}5^{\circ}\) “standard” simulation:
8-15 GB RAM
For the \(2^{\circ}{\times} 2.5^{\circ}\) “standard” simulation:
30-40 GB RAM
20 GB memory (MaxRSS)
26 GB virtual memory (MaxVMSize)
Our standard GEOS-Chem Classic 1-month full-chemistry benchmark simulations use a little under 14 GB of system memory. This is mostly due to the fact that the benchmark simulations archive the “kitchen sink”—that is, most diagnostic outputs are requested so that the benchmark simulation can be properly evaluated. But a typical GEOS-Chem Classic production simulation would not require all of these diagnostic outputs, and thus would use much less memory than the benchmark simulations.
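To compare these figures against a local machine, you could read the installed memory and test it against a threshold. The following is a rough sketch for Linux systems only (it reads /proc/meminfo); the mem_sufficient helper is a hypothetical name, and the 15 GB threshold is simply the upper end of the range quoted above for the 4 x 5 simulation.

```shell
#!/usr/bin/env bash
# Compare installed memory against a requirement given in GB.
# NOTE: mem_sufficient is a hypothetical helper; /proc/meminfo is Linux-only.

# Usage: mem_sufficient <required_gb> <total_kb>
mem_sufficient() {
    local required_kb=$(( $1 * 1024 * 1024 ))
    [ "$2" -ge "$required_kb" ]
}

if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if mem_sufficient 15 "$total_kb"; then
        echo "OK: at least 15 GB RAM installed"
    else
        echo "Warning: less than 15 GB RAM installed"
    fi
fi
```

On a shared cluster, the memory available to your job is set by the scheduler request rather than by the total installed on the node.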
Extra memory for special simulations
You may want to consider at least 30 GB RAM if you plan on doing any of the following:
Running high-resolution (e.g. \(1^{\circ}{\times}1.25^{\circ}\) or higher resolution) global simulations
Running high-resolution (e.g. \(0.25^{\circ}{\times}0.3125^{\circ}\) or \(0.5^{\circ}{\times}0.625^{\circ}\)) nested simulations
Running \(2^{\circ}{\times}2.5^{\circ}\) simulations and generating a lot of diagnostic output (the more diagnostics you turn on, the more memory GEOS-Chem Classic will require).
Disk space requirements
The following sections will help you assess how much disk space you will need on your server to store GEOS-Chem Classic input data and output data.
Space for GEOS-Chem Classic input data
The data format used by GEOS-Chem Classic is COARDS-compliant netCDF. This is a standard file format used for Earth Science applications. See our netCDF guide for more information.
Emissions input fields
Please see our Emissions input data section for more information.
Meteorology fields
The amount of disk space that you will need depends on two things:
Which type of met data you will use, and
How many years of met data you will download
MERRA-2:
Resolution | Type | Size (GB/yr) |
---|---|---|
\(1^{\circ}{\times}1.25^{\circ}\) | Global | ~30 |
\(2^{\circ}{\times}2.5^{\circ}\) | Global | ~110 |
\(0.5^{\circ}{\times}0.625^{\circ}\) | Nested Asia (aka AS) | ~115 |
\(0.5^{\circ}{\times}0.625^{\circ}\) | Nested Europe (aka EU) | ~58 |
\(0.5^{\circ}{\times}0.625^{\circ}\) | Nested North America (aka NA) | ~110 |
GEOS-FP:
Resolution | Type | Size (GB/yr) |
---|---|---|
\(1^{\circ}{\times}1.25^{\circ}\) | Global | ~30 |
\(2^{\circ}{\times}2.5^{\circ}\) | Global | ~120 |
\(0.25^{\circ}{\times}0.3125^{\circ}\) | Nested Asia (aka AS) | ~175 |
\(0.25^{\circ}{\times}0.3125^{\circ}\) | Nested Europe (aka EU) | ~175 |
\(0.25^{\circ}{\times}0.3125^{\circ}\) | Nested North America (aka NA) | ~175 |
GCAP 2.0: to be added
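Because the sizes above are quoted per year, a rough disk-space estimate is just the per-year size multiplied by the number of years you plan to download. The short sketch below shows the arithmetic; met_disk_gb is a hypothetical helper name, and the example uses the ~110 GB/yr figure listed above for 2 x 2.5 global fields.

```shell
#!/usr/bin/env bash
# Estimate disk space for met fields as (GB per year) x (number of years).
# NOTE: met_disk_gb is a hypothetical helper, not part of GEOS-Chem.

# Usage: met_disk_gb <gb_per_year> <n_years>
met_disk_gb() {
    echo $(( $1 * $2 ))
}

# Example: 3 years of 2 x 2.5 global met fields at ~110 GB/yr
echo "Approximate disk space: $(met_disk_gb 110 3) GB"
```

Treat the result as approximate, since the per-year sizes are themselves rounded estimates.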
Obtaining emissions data and met fields
There are several ways to obtain the input data required for GEOS-Chem Classic. These are described in more detail in the following sections.
Perform a GEOS-Chem dry-run simulation;
Download and manage data with the bashdatacatalog tool;
Transfer data with Globus from the GEOS-Chem data (WashU) endpoint.
Also see Input data for GEOS-Chem Classic for more data download options.
Space for data generated by GEOS-Chem Classic
Monthly-mean output
We can look to the GEOS-Chem Classic full-chemistry benchmark simulations for a rough upper limit of how much disk space is needed for diagnostic output. The GEOS-Chem 13.0.0 vs. 12.9.0 1-month benchmark simulation generated approximately 837 MB/month of output. Of this amount, diagnostic output files accounted for ~646 MB and restart files accounted for ~191 MB.
We say that this is an upper limit, because benchmark simulations archive the “kitchen sink”–all species concentrations, various aerosol diagnostics, convective fluxes, dry dep fluxes and velocities, J-values, various chemical and meteorological quantities, transport fluxes, wet deposition diagnostics, and emissions diagnostics. Most GEOS-Chem users would probably not need to archive this much output.
GEOS-Chem Classic specialty simulations–simulations for species with first-order loss by prescribed oxidant fields (i.e. Hg, CH4, CO2, CO)–will produce much less output than the benchmark simulations. This is because these simulations typically only have a few species.
Reducing output file sizes
You may subset the horizontal and vertical size of the diagnostic output files in order to save space. For more information, please see our section on GEOS-Chem History diagnostics.
Furthermore, since GEOS-Chem 13.0.0, we have modified the diagnostic code so that diagnostic arrays are only dimensioned with enough elements necessary to save out the required output. For example, if you only wish to output the SpeciesConc_O3 diagnostic, GEOS-Chem will dimension the relevant array with (NX,NY,NZ,1) elements (1 because we are only archiving 1 species). This can drastically reduce the amount of memory that your simulation will require.
Timeseries output
Archiving hourly or daily timeseries output would require much more disk space than the monthly-mean output. The disk space actually used will depend on how many quantities are archived and what the archival frequency is.
Meet all software requirements
If you are a first-time GEOS-Chem Classic user, please take a moment to make sure that your computer system meets certain hardware and software requirements. These are described in the following chapters.
Supported compilers
GEOS-Chem is written in the Fortran programming language. However, you will also need C and C++ compilers to install certain libraries (like netCDF) on your system.
Intel
The Intel Compiler Suite is our recommended proprietary compiler suite.
Intel compilers produce well-optimized code that runs extremely efficiently on machines with Intel CPUs. Many universities and institutions will have an Intel site license that allows you to use these compilers.
The GCST has tested GEOS-Chem Classic with these versions (but others may work as well):
23.0.0
19.0.5.281
19.0.4
18.0.5
17.0.4
15.0.0
13.0.079
11.1.069
Best way to install: Direct from Intel (may require purchase of a site license or a student license)
Tip
Intel 2021 may be obtained for free, or installed with a package manager such as Spack.
GNU
The GNU Compiler Collection (or GCC for short) is our recommended open-source compiler suite.
Because the GNU Compiler Collection is free and open source, this is a good choice if your institution lacks an Intel site license, or if you are running GEOS-Chem on the Amazon EC2 cloud environment.
The GCST has tested GEOS-Chem Classic with these versions (but others may work as well):
12.2.0
11.2.0
11.1.0
10.2.0
9.3.0
9.2.0
8.2.0
7.4.0
7.3.0
7.1.0
6.2.0
Best way to install: With Spack.
Required software packages
Git
Git is the de facto software industry standard package for source code management. A version of Git ships with most Linux OS builds.
The GEOS-Chem source code can be downloaded using the Git source code management system. GEOS-Chem software repositories are stored at the https://github.com/geoschem organization page.
Best way to install: git-scm.com/downloads. But first check if you have a version of Git pre-installed.
CMake
CMake is software that directs how the GEOS-Chem source code is compiled into an executable. You will need CMake version 3.13 or later to build GEOS-Chem Classic.
Best way to install: With Spack.
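To check whether the CMake on your PATH meets the 3.13 minimum, you can compare version strings. This is a sketch only: version_ge is a hypothetical helper, and it relies on the version-sort option of sort (sort -V), which is assumed to be available on your system (it is part of GNU coreutils).

```shell
#!/usr/bin/env bash
# Return success if version $1 is greater than or equal to version $2.
# NOTE: version_ge is a hypothetical helper; it assumes "sort -V" exists.
version_ge() {
    # If the smaller of the two versions is $2, then $1 >= $2
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake > /dev/null 2>&1; then
    # The first line of output looks like "cmake version 3.25.2"
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" 3.13; then
        echo "CMake $found is new enough"
    else
        echo "CMake $found is too old (need 3.13 or later)"
    fi
else
    echo "cmake not found on PATH"
fi
```

A plain string comparison would rank 3.9 above 3.13, which is why a version-aware sort is used here.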
GNU Make
GNU Make is software that can build executables from Makefiles that are created by CMake.
While GNU Make is not required for GEOS-Chem 13.0.0 and later, some external libraries that you might need to build will require GNU Make. Therefore it is best to download GNU Make along with CMake.
Best way to install: With Spack.
netCDF
GEOS-Chem input and output data files use the netCDF file format (cf. netCDF). NetCDF is a self-describing file format that allows metadata (descriptive text) to be stored alongside data values.
Best way to install: With Spack.
Optional but recommended software packages
GCPy
GCPy is our recommended Python companion software to GEOS-Chem.
While GCPy is not a general-purpose plotting package, it does contain many useful functions for creating zonal mean and horizontal plots from GEOS-Chem output. It also contains scripts to generate plots and tables from GEOS-Chem benchmark simulations.
Best way to install: With Mamba or Conda (see gcpy.readthedocs.io)
gdb and cgdb
The GNU debugger (gdb) and its graphical interface (cgdb) are very useful tools for tracking down the source of GEOS-Chem errors, such as segmentation faults, out-of-bounds errors, etc.
Best way to install: With Spack.
ncview
The ncview program is a netCDF file viewer. While it does not produce publication-quality output, ncview can let you easily examine the contents of a netCDF data file (such as those which are input and output by GEOS-Chem). Ncview is very useful for debugging and development.
nco
The netCDF operators (nco) are powerful command-line tools for editing and manipulating data in netCDF format.
Best way to install: With Spack.
cdo
The Climate Data Operators (cdo) are powerful command-line utilities for editing and manipulating data in netCDF format.
Best way to install: With Spack.
KPP
The Kinetic PreProcessor (KPP) translates a chemical mechanism specification from user-configurable input files to Fortran-90 source code. You will need to use KPP if you plan on updating any of the chemical mechanisms that ship with GEOS-Chem.
Best way to install: Clone from github.com/KineticPreProcessor/KPP and build the KPP executable from source.
flex and bison
Flex is the Fast Lexical Analyzer, and bison is a general purpose parser-generator. KPP uses both flex and bison to parse chemical mechanism definition files. Depending on your setup, these packages might have already been installed for you.
Best way to install: With Spack.
Customize your login environment
Tip
You may skip ahead if you will be using GEOS-Chem Classic on an Amazon EC2 cloud instance. When you initialize the EC2 instance with one of the pre-configured Amazon Machine Images (AMIs) all of the required software libraries will be automatically loaded.
Each time you log in to your computer system, you’ll need to load the software libraries needed by GEOS-Chem into your environment. You can do this with a script known as an environment file, as described in the following chapters:
Environment files
An environment file is a script that:
Loads software libraries into your login environment. This is often done with a module manager such as lmod, spack, or environment-modules.
Stores settings for GEOS-Chem and its dependent libraries in shell variables called environment variables.
You will source the environment file each time you log in with a command such as:
$ . ~/my-environment-file # or whatever you name it
Tip
Keep a separate environment file for each combination of modules that you will use. Example environment files for GNU and Intel compilers and related software are provided in the following sections.
For general information about how libraries are loaded, see Load software into your environment.
Sample environment file for GNU 10.2.0 compilers
Below is a sample environment file (based on an environment file for the Harvard Cannon computer cluster). This file will load software libraries built with the GNU 10.2.0 compilers.
Save the code below (with any appropriate modifications for your own
computer system) to a file named ~/gcclassic.gnu10.env.
#==============================================================================
# Load software packages (EDIT AS NEEDED)
#==============================================================================
# Unload all modules first
module purge
# Load modules
module load gcc/10.2.0-fasrc01 # gcc / g++ / gfortran
module load openmpi/4.1.0-fasrc01 # MPI
module load netcdf-c/4.8.0-fasrc01 # netcdf-c
module load netcdf-fortran/4.5.3-fasrc01 # netcdf-fortran
module load flex/2.6.4-fasrc01 # Flex lexer (needed for KPP)
module load cmake/3.25.2-fasrc01 # CMake (needed to compile)
#==============================================================================
# Environment variables and related settings
# (NOTE: Lmod will define <module>_HOME variables for each loaded module)
#==============================================================================
# Make all files world-readable by default
umask 022
# Set number of threads for OpenMP. If running in a SLURM environment,
# use the number of requested cores. Otherwise use 8 cores for OpenMP.
if [[ "x${SLURM_CPUS_PER_TASK}" == "x" ]]; then
export OMP_NUM_THREADS=8
else
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
fi
# Max out the stacksize memory limit
export OMP_STACKSIZE="500m"
# Compilers
export CC="gcc"
export CXX="g++"
export FC="gfortran"
export F77="${FC}"
# netCDF
if [[ "x${NETCDF_HOME}" == "x" ]]; then
export NETCDF_HOME="${NETCDF_C_HOME}"
fi
export NETCDF_C_ROOT="${NETCDF_HOME}"
export NETCDF_FORTRAN_ROOT="${NETCDF_FORTRAN_HOME}"
# KPP 3.0.0+
export KPP_FLEX_LIB_DIR="${FLEX_HOME}/lib64"
#==============================================================================
# Set limits
#==============================================================================
ulimit -c unlimited # coredumpsize
ulimit -u 50000 # maxproc
ulimit -v unlimited # vmemoryuse
ulimit -s unlimited # stacksize
#==============================================================================
# Print information
#==============================================================================
module list
Tip
Ask your sysadmin how to load software libraries. If you are using your institution’s computer cluster, then chances are there will be a software module system installed, with commands similar to those listed above.
Then you can activate these settings from the command line by typing:
$ . ~/gcclassic.gnu10.env
You may also place the above command within your GEOS-Chem run script, which will be discussed in a subsequent chapter.
Sample environment file for Intel 2023 compilers
Below is a sample environment file (based on an environment file for the Harvard Cannon computer cluster). This file will load software libraries built with the Intel 2023 compilers.
Add the code below (with the appropriate modifications for your
system) into a file named ~/gcclassic.intel23.env.
#==============================================================================
# Load software packages (EDIT AS NEEDED)
#==============================================================================
# Unload all modules first
module purge
# Load modules
module load intel/23.0.0-fasrc01 # icx / icpx / ifort
module load intelmpi/2021.8.0-fasrc01 # MPI
module load netcdf-fortran/4.6.0-fasrc03 # netCDF-Fortran
module load flex/2.6.4-fasrc01 # Flex lexer (needed for KPP)
module load cmake/3.25.2-fasrc01 # CMake (needed to compile)
#==============================================================================
# Environment variables and related settings
# (NOTE: Lmod will define <module>_HOME variables for each loaded module)
#==============================================================================
# Make all files world-readable by default
umask 022
# Set number of threads for OpenMP. If running in a SLURM environment,
# use the number of requested cores. Otherwise use 8 cores for OpenMP.
if [[ "x${SLURM_CPUS_PER_TASK}" == "x" ]]; then
export OMP_NUM_THREADS=8
else
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
fi
# Max out the stacksize memory limit
export OMP_STACKSIZE="500m"
# Compilers
export CC="icx"
export CXX="icx"
export FC="ifort"
export F77="${FC}"
# netCDF
if [[ "x${NETCDF_HOME}" == "x" ]]; then
export NETCDF_HOME="${NETCDF_C_HOME}"
fi
export NETCDF_C_ROOT="${NETCDF_HOME}"
export NETCDF_FORTRAN_ROOT="${NETCDF_FORTRAN_HOME}"
# KPP 3.0.0+
export KPP_FLEX_LIB_DIR="${FLEX_HOME}/lib64"
#==============================================================================
# Set limits
#==============================================================================
ulimit -c unlimited # coredumpsize
ulimit -u 50000 # maxproc
ulimit -v unlimited # vmemoryuse
ulimit -s unlimited # stacksize
#==============================================================================
# Print information
#==============================================================================
module list
Tip
Ask your sysadmin how to load software libraries. If you are using your institution’s computer cluster, then chances are there will be a software module system installed, with commands similar to those listed above.
Then you can activate these settings from the command line by typing:
$ . ~/gcclassic.intel23.env
You may also place the above command within your GEOS-Chem run script, which will be discussed in a subsequent chapter.
Set environment variables for compilers
The sample GNU and Intel environment files set the environment variables listed below in order to select the desired C, C++, and Fortran compilers:
Variable | Specifies the: | GNU name | Intel name |
---|---|---|---|
CC | C compiler | gcc | icx |
CXX | C++ compiler | g++ | icx |
FC | Fortran compiler | gfortran | ifort |
Note
GEOS-Chem Classic only requires the Fortran compiler. But you will also need the C and C++ compilers if you plan to build other software packages (such as KPP) or install libraries manually.
Also, older Intel compiler versions used icc as the name
for the C compiler and icpc as the name of the C++ compiler.
These names have been deprecated in Intel 2023 and will be removed
from future Intel compiler releases.
The commands used to define CC, CXX, and FC are:
# for GNU
export CC=gcc
export CXX=g++
export FC=gfortran
or
# for Intel
export CC=icx
export CXX=icx
export FC=ifort
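After sourcing your environment file, you may want to confirm that these variables are set and that each one points to a compiler that can actually be found. The sketch below is one possible check; check_compilers is a hypothetical helper name and is not part of GEOS-Chem.

```shell
#!/usr/bin/env bash
# Check that CC, CXX, and FC are set and point to commands on the PATH.
# NOTE: check_compilers is a hypothetical helper, not part of GEOS-Chem.
check_compilers() {
    local status=0
    local pair name value
    for pair in "CC=${CC:-}" "CXX=${CXX:-}" "FC=${FC:-}"; do
        name="${pair%%=*}"     # text before the first "="
        value="${pair#*=}"     # text after the first "="
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif ! command -v "$value" > /dev/null 2>&1; then
            echo "$name=$value was not found on the PATH"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return "$status"
}

check_compilers || echo "Source your environment file and try again."
```

The function returns a non-zero status if any of the three variables is unset or does not resolve to a command.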
Set environment variables for parallelization
GEOS-Chem Classic uses OpenMP parallelization, which is an implementation of shared-memory parallelization.
Important
OpenMP-parallelized programs (such as GEOS-Chem Classic) cannot execute on more than 1 computational node. Most modern computational nodes contain between 16 and 64 cores. Therefore, GEOS-Chem Classic simulations cannot use more cores than a single node provides.
We recommend that you consider using GCHP for more computationally-intensive simulations.
In the sample environment files for GNU and Intel, we define the following environment variables for OpenMP parallelization:
- OMP_NUM_THREADS
The OMP_NUM_THREADS environment variable sets the number of computational cores (aka threads) that you would like GEOS-Chem Classic to use. For example, the command below will tell GEOS-Chem Classic to use 8 cores within parallel sections of code:
$ export OMP_NUM_THREADS=8
We recommend that you define OMP_NUM_THREADS not only in your environment file, but also in your GEOS-Chem run script.
- OMP_STACKSIZE
In order to use GEOS-Chem Classic with OpenMP parallelization, you must request the maximum amount of stack memory in your Unix environment. (The stack memory is where local automatic variables and temporary !$OMP PRIVATE variables will be created.) Add the following lines to your system startup file (e.g. .bashrc) and to your GEOS-Chem run scripts:
ulimit -s unlimited
export OMP_STACKSIZE=500m
The ulimit -s unlimited command will tell the bash shell to use the maximum amount of stack memory that is available.
The environment variable OMP_STACKSIZE must also be set to a very large number. In this example, we are nominally requesting 500 MB of memory. But in practice, this will tell the GNU Fortran compiler to use the maximum amount of stack memory available on your system. The value 500m is a good round number that is larger than the amount of stack memory on most computer clusters, but you can increase this if you wish.
Errors caused by incorrect environment variable settings
Be on the lookout for these errors:
If OMP_NUM_THREADS is set to 1, then your simulation will execute using only one computational core. This will make your simulation take much longer than necessary.
If the OMP_STACKSIZE environment variable is not included in your environment file (or if it is set to a very low value), you might encounter a segmentation fault error after the TPCORE transport module is initialized. In this case, GEOS-Chem Classic “thinks” that it does not have enough memory to perform the simulation, even though sufficient memory may be present.
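A quick pre-run check can catch both of these situations before a simulation starts. The following is a minimal sketch; check_openmp_env is a hypothetical helper name, and the messages are illustrative rather than output from GEOS-Chem itself.

```shell
#!/usr/bin/env bash
# Warn about OpenMP settings that commonly cause slow runs or crashes.
# NOTE: check_openmp_env is a hypothetical helper, not part of GEOS-Chem.
check_openmp_env() {
    local status=0
    if [ -z "${OMP_NUM_THREADS:-}" ]; then
        echo "OMP_NUM_THREADS is not set"
        status=1
    elif [ "$OMP_NUM_THREADS" = "1" ]; then
        echo "OMP_NUM_THREADS=1: the simulation will use a single core"
        status=1
    fi
    if [ -z "${OMP_STACKSIZE:-}" ]; then
        echo "OMP_STACKSIZE is not set: a segmentation fault is possible"
        status=1
    fi
    return "$status"
}

if check_openmp_env; then
    echo "OpenMP settings look reasonable"
fi
```

You could call this function at the top of a run script, after sourcing your environment file.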
Key references
Bey et al. [2001] is the first reference to GEOS-Chem that includes a detailed model description. It is suitable as an original reference for the model. It only describes a model for gas-phase tropospheric oxidant chemistry. Subsequent original references for major additional model features are:
Park et al. [2004] for aerosol chemistry;
Y.X. Wang et al. [2004] for the nested model;
Henze et al. [2007] for the model adjoint;
Selin et al. [2007] for the mercury simulation;
Trivitayanurak et al. [2008] for TOMAS aerosol microphysics;
Yu and Luo [2009] for APM aerosol microphysics;
Eastham et al. [2014] for stratospheric chemistry;
Long et al. [2015] for the grid-independent GEOS-Chem;
Eastham et al. [2018] for the high-performance GEOS-Chem (GCHP);
Hu et al. [2018] for GEOS-Chem within the GEOS ESM (GEOS-GC);
Lin et al. [2020] for GEOS-Chem within WRF (WRF-GC);
Zhuang et al. [2019] and Zhuang et al. [2020] for implementations of GEOS-Chem Classic and GCHP on the cloud;
Bindle et al. [2021] for the stretched-grid capability in GCHP;
Murray et al. [2021] for GEOS-Chem driven by GISS GCM fields (GCAP 2.0);
Bukosa et al. [2023] for the carbon simulation;
Lin et al. [2023] for KPP 3.0.0 with adaptive auto-reduction solver.
References
Bey, I., Jacob, D. J., Yantosca, R. M., Logan, J. A., Field, B. D., Fiore, A. M., Li, Q., Liu, H. Y., Mickley, L. J., and Schultz, M. G. Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res., 106(D19):23073–23095, Oct 2001. doi:10.1029/2001JD000807.
Bindle, L., Martin, R. V., Cooper, M. J., Lundgren, E. W., Eastham, S. D., Auer, B. M., Clune, T. L., Weng, H., Lin, J., Murray, L. T., Meng, J., Keller, C. A., Putman, W. M., Pawson, S., and Jacob, D. J. Grid-stretching capability for the GEOS-Chem 13.0.0 atmospheric chemistry model. Geosci. Model Dev., 14(10):5977–5997, 2021. doi:10.5194/gmd-14-5977-2021.
Bukosa, B., Fisher, J., Deutscher, N., and Jones, D. A Coupled CH4, CO and CO2 Simulation for Improved Chemical Source Modelling. Atmosphere, 14:764, 2023. doi:10.3390/atmos14050764.
Eastham, S.D., Weisenstein, D.K., and Barrett, S.R.H. Development and evaluation of the unified tropospheric-stratospheric chemistry extension (UCX) for the global chemistry-transport model GEOS-Chem. Atmos. Env., 89:52–63, 2014. doi:10.1016/j.atmosenv.2014.02.001.
Eastham, S. D., Long, M. S., Keller, C. A., Lundgren, E., Yantosca, R. M., Zhuang, J., Li, C., Lee, C. J., Yannetti, M., Auer, B. M., Clune, T. L., Kouatchou, J., Putman, W. M., Thompson, M. A., Trayanov, A. L., Molod, A. M., Martin, R. V., and Jacob, D. J. GEOS-Chem High Performance (GCHP v11-02c): a next-generation implementation of the GEOS-Chem chemical transport model for massively parallel applications. Geoscientific Model Development, 11(7):2941–2953, July 2018. doi:10.5194/gmd-11-2941-2018.
Henze, D.K., Hakami, A., and Seinfeld, J.H. Development of the adjoint of GEOS-Chem. Atmos. Chem. Phys., 7:2413–2433, 2007. doi:10.5194/acp-7-2413-2007.
Hu, L., Keller, C.A., Long, M.S., Sherwen, T., Auer, B., Da Silva, A., Nielsen, J.E., Pawson, S., Thompson, M.A., Trayanov, A.L., Travis, K.R., Grange, S.K., Evans, M.J., and Jacob, D.J. Global simulation of tropospheric chemistry at 12.5 km resolution: performance and evaluation of the GEOS-Chem chemical module (v10-1) within the NASA GEOS Earth System Model (GEOS-5 ESM). Geosci. Model Dev., 11:4603–4620, 2018. doi:10.5194/gmd-11-4603-2018.
Keller, C.A., Long, M.S., Yantosca, R.M., da Silva, A.M., Pawson, S., and Jacob, D.J. HEMCO v1.0: a versatile, ESMF-compliant component for calculating emissions in atmospheric models. Geosci. Model Dev., 7(4):1409–1417, July 2014. doi:10.5194/gmd-7-1409-2014.
Lin, H., Feng, X., Fu, T.-M., Tian, H., Ma, Y., Zhang, L., Jacob, D. J., Yantosca, R. M., Sulprizio, M. P., Lundgren, E. W., Zhuang, J., Zhang, Q., Lu, X., Zhang, L., Shen, L., Guo, J., Eastham, S. D., and Keller, C. A. WRF-GC (v1.0): online coupling of WRF (v3.9.1.1) and GEOS-Chem (v12.2.1) for regional atmospheric chemistry modeling – Part 1: Description of the one-way model. Geosci. Model. Dev., 13:3241–3265, 2020. doi:10.5194/gmd-13-3241-2020.
Lin, H., Long, M. S., Sander, R., Sandu, A., Yantosca, R. M., Estrada, L. A., Shen, L., and Jacob, D. J. An adaptive auto-reduction solver for speeding up integration of chemical kinetics in atmospheric chemistry models: implementation and evaluation within the Kinetic Pre-Processor (KPP) version 3.0.0. J. Adv. Model. Earth Syst., pages 2022MS003293, 2023. doi:10.1029/2022MS003293.
Lin, H., Jacob, D. J., Lundgren, E. W., Sulprizio, M. P., Keller, C. A., Fritz, T. M., Eastham, S. D., Emmons, L. K., Campbell, P. C., Baker, B., Saylor, R. D., and Montuoro, R. Harmonized Emissions Component (HEMCO) 3.0 as a versatile emissions component for atmospheric models: application in the GEOS-Chem, NASA GEOS, WRF-GC, CESM2, NOAA GEFS-Aerosol, and NOAA UFS models. Geosci. Model. Dev., 14:5487–5506, 2021. doi:10.5194/gmd-14-5487-2021.
Long, M.S., Yantosca, R., Nielsen, J.E., Keller, C.A., da Silva, A., Sulprizio, M.P., Pawson, S., and Jacob, D.J. Development of a grid-independent GEOS-Chem chemical transport model (v9-02) as an atmospheric chemistry module for Earth system models. Geosci. Model Dev., 8(3):595–602, March 2015. doi:10.5194/gmd-8-595-2015.
Luo, G., Yu, F., and Moch, J. Further improvement of wet process treatments in GEOS-Chem v12.6.0: impact on global distributions of aerosols and aerosol precursors. Geosci. Model. Dev., 13:2879–2903, 2020. doi:10.5194/gmd-13-2879-2020.
Murray, L.T., Leibensperger, E.M., Orbe, C., Mickley, L.J., and Sulprizio, M. GCAP 2.0: a global 3-D chemical-transport model framework for past, present, and future climate scenarios. Geosci. Model Dev., 14:5789–5823, 2021. doi:10.5194/gmd-14-5789-2021.
Park, R.J., Jacob, D.J., Field, B.D., Yantosca, R.M., and Chin, M. Natural and transboundary pollution influences on sulfate-nitrate-ammonium aerosols in the United States: implications for policy. J. Geophys. Res., 109(D15):204ff, 2004. doi:10.1029/2003JD004473.
Philip, S., Martin, R. V., and Keller, C. A. Sensitivity of chemistry-transport model simulations to the duration of chemical and transport operators: a case study with GEOS-Chem v10-01. Geosci. Model Dev., 9:1683–1695, 2016. doi:10.5194/gmd-9-1683-2016.
Selin, N.E., Jacob, D.J., Park, R.J., Yantosca, R.M., Strode, S., Jaeglé, L., and Jaffe, D. Chemical cycling and deposition of atmospheric mercury: Global constraints from observations. J. Geophys. Res., 112(D02308), 2007. doi:10.1029/2006JD007450.
Trivitayanurak, W., Adams, P., Spracklen, D., and Carslaw, K. Tropospheric aerosol microphysics simulation with assimilated meteorology: model description and intermodel comparison. Atmos. Chem. Phys., 8:3149–3168, 2008.
Wang, Y.X., McElroy, M.B., Jacob, D.J., and Yantosca, R.M. A Nested Grid Formulation for Chemical Transport over Asia: Applications to CO. J. Geophys. Res., 109(D22):307ff, 2004. doi:10.1029/2004jd005237.
Yu, F. and Luo, G. Simulation of particle size distribution with a global aerosol model: Contribution of nucleation to aerosol and CCN number concentrations. Atmos. Chem. Phys., 9(7):7691–7710, 2009.
Zhuang, J., Jacob, D.J., Flo-Gaya, J., Yantosca, R.M., Lundgren, E.W., Sulprizio, M.P., and Eastham, S.D. Enabling immediate access to Earth science models through cloud computing: application to the GEOS-Chem model. Bull. Amer. Met. Soc., pages 1943–1960, October 2019. doi:10.1175/BAMS-D-18-0243.1.
Zhuang, J., Jacob, D. J., Lin, H., Lundgren, E. W., Yantosca, R. M., Gaya, J. F., Sulprizio, M. P., and Eastham, S. D. Enabling High-Performance Cloud Computing for Earth Science Modeling on Over a Thousand Cores: Application to the GEOS-Chem Atmospheric Chemistry Model. Journal of Advances in Modeling Earth Systems, May 2020. doi:10.1029/2020MS002064.
Download source code
In the following chapters, you will learn how to download the GEOS-Chem source code from GitHub.
Source code repositories
The GEOS-Chem Classic source code is distributed across three GitHub repositories, as described below. This setup allows the GEOS-Chem core science routines to be easily integrated into several modeling contexts, such as:
GEOS-Chem Classic
GCHP
GEOS-Chem within the NASA/GEOS ESM
GEOS-Chem within CESM
GEOS-Chem within WRF (aka WRF-GC)
This repository setup also aligns with our GEOS-Chem Vision and Mission statements.
GEOS-Chem Science Codebase
The GEOS-Chem “Science” Codebase repository (https://github.com/geoschem/geos-chem) contains the GEOS-Chem science routines, plus:
Scripts to create GEOS-Chem run directories, plus template configuration files for all simulations;
Scripts to create GEOS-Chem integration tests;
Interfaces (i.e. the driver programs) for GEOS-Chem Classic, GCHP, etc.
HEMCO
The HEMCO repository (https://github.com/geoschem/HEMCO) contains the source code for the Harmonized Emissions Component, which is used to read and regrid emissions, met fields, and other inputs to GEOS-Chem.
GCClassic
The GCClassic repository (https://github.com/geoschem/GCClassic) is a lightweight wrapper that encompasses GEOS-Chem and HEMCO. We say that GCClassic is the superproject (i.e. top-level source code folder), and that GEOS-Chem (science codebase) and HEMCO are submodules.
Download instructions
Follow these directions to download the GEOS-Chem Classic source code.
Clone GCClassic and fetch submodules
To download the latest stable GEOS-Chem Classic version, type:
$ git clone --recurse-submodules https://github.com/geoschem/GCClassic.git
This command does the following:
Clones the GCClassic repo from GitHub to a local folder named GCClassic;
Clones the GEOS-Chem Science Codebase repo from GitHub to GCClassic/src/GEOS-Chem; and
Clones the HEMCO repo from GitHub to GCClassic/src/HEMCO.
Tip
To download GEOS-Chem Classic source code into a folder named something other than GCClassic, supply the name of the folder at the end of the git clone command. For example:
$ git clone --recurse-submodules https://github.com/geoschem/GCClassic.git my-code-dir
will download the GEOS-Chem Classic source code into my-code-dir instead of GCClassic.
Once the git clone process starts, you should see output similar to this:
Cloning into 'GCClassic'...
remote: Enumerating objects: 2680, done.
remote: Counting objects: 100% (1146/1146), done.
remote: Compressing objects: 100% (312/312), done.
remote: Total 2680 (delta 858), reused 1099 (delta 825), pack-reused 1534
Receiving objects: 100% (2680/2680), 1.74 MiB | 13.16 MiB/s, done.
Resolving deltas: 100% (1411/1411), done.
Submodule 'docs/source/geos-chem-shared-docs' (https://github.com/geoschem/geos-chem-shared-docs.git) registered for path 'docs/source/geos-chem-shared-docs'
Submodule 'src/GEOS-Chem' (https://github.com/geoschem/geos-chem.git) registered for path 'src/GEOS-Chem'
Submodule 'src/HEMCO' (https://github.com/geoschem/hemco.git) registered for path 'src/HEMCO'
Cloning into '/local/ryantosca/GC/rundirs/epa-kpp/tmp/GCClassic/docs/source/geos-chem-shared-docs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 148 (delta 77), reused 116 (delta 45), pack-reused 0
Receiving objects: 100% (148/148), 162.29 KiB | 2.90 MiB/s, done.
Resolving deltas: 100% (77/77), done.
Cloning into '/local/ryantosca/GC/rundirs/epa-kpp/tmp/GCClassic/src/GEOS-Chem'...
remote: Enumerating objects: 75574, done.
remote: Counting objects: 100% (410/410), done.
remote: Compressing objects: 100% (187/187), done.
remote: Total 75574 (delta 238), reused 364 (delta 216), pack-reused 75164
Receiving objects: 100% (75574/75574), 85.23 MiB | 30.59 MiB/s, done.
Resolving deltas: 100% (62327/62327), done.
Cloning into '/local/ryantosca/GC/rundirs/epa-kpp/tmp/GCClassic/src/HEMCO'...
remote: Enumerating objects: 3178, done.
remote: Counting objects: 100% (638/638), done.
remote: Compressing objects: 100% (195/195), done.
remote: Total 3178 (delta 476), reused 585 (delta 438), pack-reused 2540
Receiving objects: 100% (3178/3178), 2.24 MiB | 11.87 MiB/s, done.
Resolving deltas: 100% (2270/2270), done.
Submodule path 'docs/source/geos-chem-shared-docs': checked out '228507857eb53740dacf4055ce9268aa8ccf520d'
Submodule path 'src/GEOS-Chem': checked out '7e51a0674aba638c8322fef493ac9251095e8cf4'
Submodule path 'src/HEMCO': checked out '4a66bae48f33e6dc22cda5ec9d4633192dee2f73'
Submodule 'docs/source/geos-chem-shared-docs' (https://github.com/geoschem/geos-chem-shared-docs.git) registered for path 'src/HEMCO/docs/source/geos-chem-shared-docs'
Cloning into '/local/ryantosca/GC/rundirs/epa-kpp/tmp/GCClassic/src/HEMCO/docs/source/geos-chem-shared-docs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 148 (delta 77), reused 116 (delta 45), pack-reused 0
Receiving objects: 100% (148/148), 162.29 KiB | 3.00 MiB/s, done.
Resolving deltas: 100% (77/77), done.
Submodule path 'src/HEMCO/docs/source/geos-chem-shared-docs': checked out '645401baa35b6a6838b9bedede309a01a311517f'
When the git clone process has finished, navigate into the GCClassic folder and get a directory listing:
$ cd GCClassic
$ ls -CF src/*
and you will see output similar to this:
src/CMakeLists.txt src/gc_classic_version.H@ src/main.F90@
src/GEOS-Chem:
APM/ CMakeScripts/ GeosUtil/ History/ lib/ ObsPack/ run/
AUTHORS.txt doc/ GTMM/ Interfaces/ LICENSE.txt PKUCPL/
bin/ GeosCore/ Headers/ ISORROPIA/ mod/ README.md
CMakeLists.txt GeosRad/ help/ KPP/ NcdfUtil/ REVISIONS
src/HEMCO:
AUTHORS.txt CMakeLists.txt CMakeScripts/ LICENSE.txt README.md run/ src/
This confirms that the GCClassic/src/GEOS-Chem and GCClassic/src/HEMCO folders have been populated with source code from the GEOS-Chem Science Codebase and HEMCO GitHub repositories.
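If you would rather script this check than inspect the listing by eye, the following sketch may help. The function name verify_submodules is hypothetical (it is not part of GEOS-Chem), and the check is deliberately simple: it only confirms that each submodule folder exists and is not empty.

```shell
#!/bin/bash
# verify_submodules (hypothetical helper): check that the GEOS-Chem and
# HEMCO submodule folders under the given GCClassic folder exist and
# are not empty. Returns 0 if both are populated, 1 otherwise.
verify_submodules() {
    local top="${1:-GCClassic}"
    local sub status=0
    for sub in src/GEOS-Chem src/HEMCO; do
        if [ -d "${top}/${sub}" ] && [ -n "$(ls -A "${top}/${sub}")" ]; then
            echo "OK:      ${top}/${sub}"
        else
            echo "MISSING: ${top}/${sub}"
            echo "         (try: git submodule update --init --recursive)"
            status=1
        fi
    done
    return "${status}"
}
```

For example, calling verify_submodules GCClassic from the folder where you ran git clone should report OK for both submodules.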
Tip
To use an older GEOS-Chem Classic version (e.g. 14.0.0), follow these additional steps:
$ git checkout tags/14.0.0 # Points HEAD to the tag "14.0.0"
$ git branch version_14.0.0 # Creates a new branch at tag "14.0.0"
$ git checkout version_14.0.0 # Checks out the version_14.0.0 branch
$ git submodule update --init --recursive # Reverts submodules to the "14.0.0" tag
You can do this for any tag in the version history. For a list of all tags, type:
$ git tag
If you have any unsaved changes, make sure you commit those to a branch prior to updating versions.
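The first three commands in the tip above can also be combined into one, because git checkout -b creates a branch at a given starting point and checks it out in a single step. The sketch below wraps this in a hypothetical function, checkout_version; the branch name version_<tag> follows the naming used in the tip and is only a suggestion.

```shell
#!/bin/bash
# checkout_version (hypothetical helper): create and check out a branch
# named version_<tag> at the given tag, then bring the submodules to the
# commits recorded at that tag.
# Run it from the top-level GCClassic folder.
checkout_version() {
    local tag="$1"
    git checkout -b "version_${tag}" "tags/${tag}" || return 1
    git submodule update --init --recursive
}

# Example (not executed here):
#   checkout_version 14.0.0
```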
Create a branch in src/GEOS-Chem for your work
When the git clone command described above finishes, the GEOS-Chem Science Codebase submodule code (in folder GCClassic/src/GEOS-Chem) and the HEMCO submodule code (in folder GCClassic/src/HEMCO) will be in detached HEAD state. In other words, the code is checked out but a branch is not created. Adding new code to a detached HEAD state is very dangerous and should be avoided. You should instead make a branch at the same point as the detached HEAD, and then add your own modifications into that branch.
Navigate from GCClassic to GCClassic/src/GEOS-Chem:
$ cd src/GEOS-Chem
and then type:
$ git branch
You will see output similar to this:
* (HEAD detached at xxxxxxxx)
main
where xxxxxxxx denotes the hash of the commit at which the code has been checked out.
At this point, you may now create a branch in which to store your own modifications to the GEOS-Chem science codebase. Type:
$ git branch feature/my-git-updates
$ git checkout feature/my-git-updates
Note
This naming convention adheres to the GitHub Flow conventions (i.e. new feature branches start with feature/, bug fix branches start with bugfix/, etc.).
Instead of feature/my-git-updates, you may choose a name that reflects the nature of your updates (e.g. feature/new_reactions, etc.). If you now type:
$ git branch
You will see that you are now checked out onto the branch that you just created and are no longer in detached HEAD state:
* feature/my-git-updates
main
At this point, you may proceed to add your modifications into the GEOS-Chem Science Codebase.
Note
If you also need to modify HEMCO source code, repeat the process above to create your own working branch in GCClassic/src/HEMCO.
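As a compact alternative to the separate git branch and git checkout commands, git checkout -b creates a branch at the current commit and checks it out in one step. The sketch below wraps this in a hypothetical function, create_work_branch, so that the same command can be applied to both submodules; the branch name is only an example.

```shell
#!/bin/bash
# create_work_branch (hypothetical helper): create and check out a
# branch at the current commit (e.g. the detached HEAD) of the Git
# repository located in the given folder.
create_work_branch() {
    local repo_dir="$1"
    local branch="$2"
    git -C "${repo_dir}" checkout -b "${branch}"
}

# Example (not executed here), run from the GCClassic folder:
#   create_work_branch src/GEOS-Chem feature/my-git-updates
#   create_work_branch src/HEMCO     feature/my-git-updates
```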
See additional resources
For more information about downloading the GEOS-Chem source code, please see the following YouTube video tutorials:
Getting started with GEOS-Chem 13 (by Melissa Sulprizio)
Managing branches between superproject and submodules (by Bob Yantosca)
Create a run directory
We have greatly simplified run directory creation in GEOS-Chem Classic 13.0.0 and later versions. You no longer need to download the separate GEOS-Chem Unit Tester repository, but can create run directories from a script in the GEOS-Chem source code itself.
Please see the following sections for more information on how to create run directories for GEOS-Chem Classic simulations:
First-time user registration
We have introduced an online user registration system starting with GEOS-Chem Classic 14.0.0. The first time that you create a run directory, you will be prompted to provide contact information and a summary of how you plan to use GEOS-Chem. This information will be kept on a secure cloud-based server.
Even if you are a long-time GEOS-Chem user, we ask that you answer all of the questions. Your responses will help us to keep an accurate count of GEOS-Chem users and to keep the list of GEOS-Chem users current.
The user registration dialog (and where you will type in your responses) is shown below.
===========================================================
GEOS-CHEM RUN DIRECTORY CREATION
===========================================================
Initiating User Registration:
You will only need to fill this information out once.
Please respond to all questions.
-----------------------------------------------------------
What is your name?
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
What is your email address?
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
What is the name of your research institution?
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
What is the name of your principal invesigator?
(Enter 'self' if you are the principal investigator.)
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
Please provide the web site for your institution
(group website, company website, etc.)?
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
Please provide your github username (if any) so that we
can recognize you in submitted issues and pull requests.
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
Where do you plan to run GEOS-Chem?
(e.g. local compute cluster, AWS, other supercomputer)?
-----------------------------------------------------------
>>> type your response and hit ENTER
-----------------------------------------------------------
Please briefly describe how you plan on using GEOS-Chem
so that we can add you to 'GEOS-Chem People and Projects'
(https://geoschem.github.io/geos-chem-people-projects-map/)
-----------------------------------------------------------
>>> type your response and hit ENTER
Successful Registration
If you do not see the Successful Registration message, check your internet connection and try again. If the problem persists, open a new GitHub issue.
Example: Create a full-chemistry simulation run directory
Let us walk through the process of creating a run directory for a global GEOS-Chem full-chemistry simulation.
Navigate to the GCClassic superproject folder and get a directory listing:
$ cd /path/to/your/GCClassic
$ ls -CF
You should see this output:
AUTHORS.txt     CMakeScripts/    LICENSE.txt  SUPPORT.md  run@  test@
CMakeLists.txt  CONTRIBUTING.md  README.md    docs/       src/
As mentioned previously, run@ is a symbolic link. It actually points to the src/GEOS-Chem/run/GCClassic folder. This folder contains several scripts and template files for run directory creation.
Navigate to the run folder and get a directory listing:
$ cd run
$ ls -CF
and you should see this output:
HEMCO_Config.rc.templates/  geoschem_config.yml.templates/
HEMCO_Diagn.rc.templates/   getRunInfo*
HISTORY.rc.templates/       gitignore
README                      init_rd.sh*
archiveRun.sh*              runScriptSamples/
createRunDir.sh*
You can see several folders (highlighted in the directory display with /) and a few executable scripts (highlighted with *). The script we are interested in is createRunDir.sh.
Run the createRunDir.sh script. Type:
$ ./createRunDir.sh
You will then be prompted to supply information about the run directory that you wish to create:
===========================================================
GEOS-CHEM RUN DIRECTORY CREATION
===========================================================
-----------------------------------------------------------
Choose simulation type:
-----------------------------------------------------------
1. Full chemistry
2. Aerosols only
3. CH4
4. CO2
5. Hg
6. POPs
7. Tagged CH4
8. Tagged CO
9. Tagged O3
10. TransportTracers
11. Trace metals
12. Carbon
>>>
To create a run directory for the full-chemistry simulation, type 1 followed by the ENTER key.
Tip
To exit the run directory creation process, type Ctrl-C at any prompt.
You will then be asked to specify any additional options for the full-chemistry simulation (such as adding the RRTMG radiative transfer model, APM or TOMAS microphysics, etc.):
-----------------------------------------------------------
Choose additional simulation option:
-----------------------------------------------------------
1. Standard
2. Benchmark
3. Complex SOA
4. Marine POA
5. Acid uptake on dust
6. TOMAS
7. APM
8. RRTMG
>>>
For the standard full-chemistry simulation, type 1 followed by ENTER.
To add an option to the full-chemistry simulation, type a number between 2 and 8 and press ENTER.
You will then be asked to specify the meteorology type for the simulation (GEOS-FP, MERRA-2, or GCAP 2.0):
-----------------------------------------------------------
Choose meteorology source:
-----------------------------------------------------------
1. MERRA-2 (Recommended)
2. GEOS-FP
3. GISS ModelE2.1 (GCAP 2.0)
>>>
You should use the recommended option (MERRA-2) if possible. Type 1 followed by ENTER.
The next menu will prompt you for the horizontal resolution that you wish to use:
-----------------------------------------------------------
Choose horizontal resolution:
-----------------------------------------------------------
1. 4.0 x 5.0
2. 2.0 x 2.5
3. 0.5 x 0.625
>>>
If you wish to set up a global simulation, type either 1 or 2 followed by ENTER.
If you wish to set up a nested-grid simulation, type 3 and hit ENTER. You will then be presented with a nested-grid menu:
-----------------------------------------------------------
Choose horizontal grid domain:
-----------------------------------------------------------
1. Global
2. Asia
3. Europe
4. North America
5. Custom
>>>
Select your preferred horizontal domain, followed by ENTER.
You will then be prompted for the vertical dimension of the grid.
-----------------------------------------------------------
Choose number of levels:
-----------------------------------------------------------
1. 72 (native)
2. 47 (reduced)
>>>
For most simulations, you will want to use 72 levels. Type 1 followed by ENTER.
For some memory-intensive simulations (such as nested-grid simulations), you can use 47 levels. Type 2 followed by ENTER.
You will then be prompted for the folder in which you wish to create the run directory.
-----------------------------------------------------------
Enter path where the run directory will be created:
-----------------------------------------------------------
>>>
You may enter an absolute path (such as $HOME/myusername/) followed by ENTER.
You may also enter a relative path (such as ~/rundirs) followed by ENTER. In this case you will see that the ./createRunDir.sh script will expand the path to:
Expanding to: /n/home09/myusername/rundirs
The next menu will prompt you for the run directory name.
-----------------------------------------------------------
Enter run directory name, or press return to use default:
NOTE: This will be a subfolder of the path you entered above.
-----------------------------------------------------------
>>>
You should use the default run directory name whenever possible. Type ENTER to select the default.
The script will display the following output:
-- Using default directory name gc_4x5_merra2_fullchem
or if you are creating a nested grid simulation:
-- Using default directory name gc_05x0625_merra2_fullchem
and then:
-- This run directory has been set up for 20190701 - 20190801.
   You may modify these settings in geoschem_config.yml.
-- The default frequency and duration of diagnostics is set to monthly.
   You may modify these settings in HISTORY.rc and HEMCO_Config.rc.
The last menu will prompt you with:
-----------------------------------------------------------
Do you want to track run directory changes with git? (y/n)
-----------------------------------------------------------
Type y and then ENTER. Then you will be able to track changes that you make to GEOS-Chem configuration files with Git. This can be a lifesaver when debugging – you can revert to an earlier state and then start fresh.
The script will display the full path to the run directory. You can navigate there and then start editing the GEOS-Chem configuration files.
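If you create run directories often, you may prefer to supply the menu answers from a script instead of typing them. The sketch below is offered under several assumptions that you should verify for your GEOS-Chem version: that createRunDir.sh reads its answers from standard input, that the menus appear in exactly the order shown in this walkthrough (4.0 x 5.0 global grid, so no nested-grid menu), and that first-time user registration has already been completed. The function name fullchem_answers is hypothetical.

```shell
#!/bin/bash
# fullchem_answers (hypothetical helper): print one answer per line, in
# the order of the menus shown in the walkthrough above:
#   1  -> Full chemistry
#   1  -> Standard
#   1  -> MERRA-2
#   1  -> 4.0 x 5.0
#   1  -> 72 levels
#   <path where the run directory will be created>
#   <empty line: accept the default run directory name>
#   y  -> track run directory changes with git
fullchem_answers() {
    local rundir_root="$1"
    printf '%s\n' 1 1 1 1 1 "${rundir_root}" "" y
}

# Example (not executed here), run from the GCClassic/run folder:
#   fullchem_answers "$HOME/rundirs" | ./createRunDir.sh
```

Because the menu order may differ between versions and simulation types, it is worth creating one run directory interactively first and comparing the prompts against the answer list.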
Example: Create a CH4 simulation run directory
The process of creating run directories for the GEOS-Chem specialty simulations is similar to that shown in the full-chemistry example above. However, the number of menus that you need to select from will likely be fewer than for the full-chemistry simulation. We’ll use the methane simulation as an example.
Navigate to the GCClassic superproject folder and get a directory listing:
$ cd /path/to/your/GCClassic
$ ls -CF
You should see this output:
AUTHORS.txt     CMakeScripts/    LICENSE.txt  SUPPORT.md  run@  test@
CMakeLists.txt  CONTRIBUTING.md  README.md    docs/       src/
As mentioned previously, run@ is a symbolic link. It actually points to the src/GEOS-Chem/run/GCClassic folder. This folder contains several scripts and template files for run directory creation.
Navigate to the run folder and get a directory listing:
$ cd run
$ ls -CF
and you should see this output:
HEMCO_Config.rc.templates/  geoschem_config.yml.templates/
HEMCO_Diagn.rc.templates/   getRunInfo*
HISTORY.rc.templates/       gitignore
README                      init_rd.sh*
archiveRun.sh*              runScriptSamples/
createRunDir.sh*
You can see several folders (highlighted in the directory display with /) and a few executable scripts (highlighted with *). The script we are interested in is createRunDir.sh.
Run the createRunDir.sh script. Type:
$ ./createRunDir.sh
You will then be prompted to supply information about the run directory that you wish to create:
===========================================================
GEOS-CHEM RUN DIRECTORY CREATION
===========================================================
-----------------------------------------------------------
Choose simulation type:
-----------------------------------------------------------
1. Full chemistry
2. Aerosols only
3. CH4
4. CO2
5. Hg
6. POPs
7. Tagged CH4
8. Tagged CO
9. Tagged O3
10. TransportTracers
11. Trace metals
12. Carbon
>>>
To select the GEOS-Chem methane specialty simulation, type 3 followed by ENTER.
Tip
To exit the run directory creation process, type Ctrl-C at any prompt.
You will then be asked to specify the meteorology type for the simulation (GEOS-FP, MERRA-2, or GCAP 2.0):
-----------------------------------------------------------
Choose meteorology source:
-----------------------------------------------------------
1. MERRA-2 (Recommended)
2. GEOS-FP
3. GISS ModelE2.1 (GCAP 2.0)
>>>
To accept the recommended meteorology (MERRA-2), type 1 followed by ENTER.
The next menu will prompt you for the horizontal resolution that you wish to use:
-----------------------------------------------------------
Choose horizontal resolution:
-----------------------------------------------------------
1. 4.0 x 5.0
2. 2.0 x 2.5
3. 0.5 x 0.625
>>>
If you wish to set up a global simulation, type either 1 or 2 followed by ENTER.
If you wish to set up a nested-grid simulation, type 3 and hit ENTER. You will then be presented with a nested-grid menu:
-----------------------------------------------------------
Choose horizontal grid domain:
-----------------------------------------------------------
1. Global
2. Asia
3. Europe
4. North America
5. Custom
>>>
Type the number of your preferred option and then hit ENTER.
You will then be prompted for the vertical dimension of the grid.
-----------------------------------------------------------
Choose number of levels:
-----------------------------------------------------------
1. 72 (native)
2. 47 (reduced)
>>>
For most simulations, you will want to use 72 levels. Type 1 followed by ENTER.
For some memory-intensive simulations (such as nested-grid simulations), you can use 47 levels. Type 2 followed by ENTER.
You will then be prompted for the folder in which you wish to create the run directory.
-----------------------------------------------------------
Enter path where the run directory will be created:
-----------------------------------------------------------
>>>
You may enter an absolute path (such as $HOME/myusername/) followed by ENTER.
You may also enter a relative path (such as ~/rundirs) followed by ENTER. In this case you will see that the ./createRunDir.sh script will expand the path to:
Expanding to: /n/home09/myusername/rundirs
The next menu will prompt you for the run directory name.
-----------------------------------------------------------
Enter run directory name, or press return to use default:
NOTE: This will be a subfolder of the path you entered above.
-----------------------------------------------------------
>>>
You should use the default run directory name whenever possible. Type ENTER. The script will display the following output:
-- Using default directory name gc_4x5_merra2_CH4
or if you are creating a nested grid simulation:
-- Using default directory name gc_05x0625_merra2_CH4
and then:
-- This run directory has been set up for 20190701 - 20190801.
   You may modify these settings in geoschem_config.yml.
-- The default frequency and duration of diagnostics is set to monthly.
   You may modify these settings in HISTORY.rc and HEMCO_Config.rc.
The last menu will prompt you with:
-----------------------------------------------------------
Do you want to track run directory changes with git? (y/n)
-----------------------------------------------------------
>>>
Type y and then ENTER. Then you will be able to track changes that you make to GEOS-Chem configuration files with Git. This can be a lifesaver when debugging – you can revert to an earlier state and then start fresh.
The script will display the full path to the run directory. You can navigate there and then start editing the GEOS-Chem configuration files.
The procedure to set up run directories for other GEOS-Chem Classic simulations is similar to that shown above.
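If you answered y to the Git tracking prompt, ordinary Git commands can be used to record and undo changes to the configuration files. The sketch below defines two hypothetical helpers (they are not part of GEOS-Chem) and assumes that the file you edit is already tracked by Git in the run directory.

```shell
#!/bin/bash
# save_config (hypothetical helper): commit the current state of every
# tracked file in the run directory, with a short descriptive message.
save_config() {
    local rundir="$1"
    local message="$2"
    git -C "${rundir}" commit -q -a -m "${message}"
}

# restore_config (hypothetical helper): discard uncommitted edits to one
# file, returning it to its last committed state.
restore_config() {
    local rundir="$1"
    local file="$2"
    git -C "${rundir}" checkout -- "${file}"
}

# Example (not executed here):
#   save_config    /path/to/rundir "Shorten simulation to one week"
#   restore_config /path/to/rundir geoschem_config.yml
```

Committing before each experiment gives you a known state to return to if an edit to a configuration file causes the simulation to fail.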
Run directory files and folders
Each GEOS-Chem Classic run directory that you create will contain the files and folders listed below. The GEOS-Chem and HEMCO configuration files in the run directory will be appropriate to the type of simulation that you have selected.
- archiveRun.sh
This script can be used to create an archive of the run directory. Run this script with:
$ ./archiveRun.sh directory-name
where directory-name is the name of the archive folder. This can be either a relative path or an absolute path.
- build/
This is a blank directory where you can direct CMake to configure and build the GEOS-Chem source code.
- build_info/
This folder is created when you compile GEOS-Chem. It contains information about the options that were passed to CMake during the configuration and build process.
- cleanRunDir.sh
Typing
$ ./cleanRunDir.sh
will remove log files and diagnostic output files left over from a previous GEOS-Chem simulation.
- CodeDir
Symbolic link to the top-level source code folder (i.e. the
GCClassic
superproject folder).
- CreateRunDirLogs/rundir_vars.txt
Log file containing environment variable settings used in run directory creation. Running the
init_rd.sh
script on this file will create a duplicate run directory.
- download_data.py
Use this Python script to download data from one of the GEOS-Chem data portals to your disk space. See our Download data with a dry-run simulation chapter for more information.
- download_data.yml
Configuration file for
download_data.py
.
- geoschem_config.yml
The main GEOS-Chem configuration file (see Configure your simulation).
- getRunInfo
This file is now deprecated and will be removed in a future version.
- HEMCO_Config.rc
The main HEMCO configuration file (see Configure your simulation).
- HEMCO_Config.rc.gmao_metfields
HEMCO configuration file snippet containing entries for reading the GMAO meteorological fields. This file will only be present if you are using GEOS-FP or MERRA-2 meteorology to drive your GEOS-Chem simulation.
- HEMCO_Config.rc.gcap2_metfields
HEMCO configuration file snippet containing entries for reading the GCAP2 meteorological fields. This file will only be present if you are using GCAP2 meteorology to drive your GEOS-Chem simulation.
- HEMCO_Diagn.rc
Configuration file for HEMCO diagnostics (see Configure your simulation).
- HISTORY.rc
Configuration file for GEOS-Chem History diagnostics (see Configure your simulation).
- metrics.py
This Python script can be used to print the OH metrics for a full-chemistry simulation. Typing:
$ ./metrics.py
will generate output such as:
==============================================================================
GEOS-Chem FULL-CHEMISTRY SIMULATION METRICS
Simulation start : 2019-07-01 00:00:00z
Simulation end   : 2019-07-01 01:00:00z
==============================================================================
Mass-weighted mean OH concentration    = 10.04682154969 x 10^5 molec cm-3
CH3CCl3 lifetime w/r/t tropospheric OH = 6.3189 years
CH4 lifetime w/r/t tropospheric OH     = 10.6590 years
- OutputDir/
Blank directory where GEOS-Chem diagnostic output files will be created.
- README.md
README file (in Markdown format) containing links to information about GEOS-Chem.
- Restarts/
Directory where GEOS-Chem restart files will be created.
- Restarts/GEOSChem.Restart.YYYYMMDD_hhmmz.nc4
Restart file containing initial conditions for the GEOS-Chem simulation.
Attention
The restart file that is created when you generate a run directory should not be used to start a production simulation. We recommend that you “spin up” your simulation for at least 6 months to a year in order to remove the signature of the initial conditions.
- runScriptSamples
Symbolic link to the folder in the GEOS-Chem “Science Codebase” repository that contains sample scripts for running GEOS-Chem.
- species_database.yml
YAML file containing metadata (e.g. molecular weight, Henry’s law constants, wetdep and drydep parameters, etc.) for each species used in the various GEOS-Chem simulations. You should not have to edit this file unless you are adding new species to your GEOS-Chem simulation. The species_database.yml file will be discussed in more detail in a following section.
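As a quick sanity check before compiling, you could verify that a freshly created run directory contains the entries described above. The following is a minimal sketch only: the helper name `missing_entries` and the choice of which entries to require are assumptions made for illustration, and simulation-specific files (such as the `HEMCO_Config.rc.*_metfields` snippets) are deliberately left out.

```python
from pathlib import Path

# A subset of the entries listed above. This selection is an assumption made
# for illustration; optional and simulation-specific files are not included.
EXPECTED_ENTRIES = [
    "geoschem_config.yml",
    "HEMCO_Config.rc",
    "HEMCO_Diagn.rc",
    "HISTORY.rc",
    "species_database.yml",
    "OutputDir",
    "Restarts",
    "build",
]


def missing_entries(rundir):
    """Return the expected files/folders that are absent from rundir."""
    root = Path(rundir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Report on the current working directory (hypothetical usage).
print("missing:", missing_entries("."))
```

An empty list suggests that the run directory was created completely; any names that are printed point to files or folders that may need to be regenerated.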
Compile the source code
In this chapter, we will describe how you can compile GEOS-Chem Classic. Compiling creates an executable file that you can run on your computer system.
The compilation process involves the following steps:
Configure with CMake
You should think of CMake as an interactive tool for configuring GEOS-Chem Classic’s build. For example, compile-time options like disabling multithreading and turning on components (e.g. APM, RRTMG) are all configured with CMake commands.
Besides configuring GEOS-Chem’s build, CMake also performs checks on your build environment to detect problems that would cause the build to fail. If it identifies a problem, like a missing dependency or mismatched run directory and source code version numbers, CMake will print an error message that describes the problem.
If you are new to CMake and would like a rundown of how to use the cmake command, check out Liam Bindle’s Cmake Tutorial. This tutorial is not necessary, but it will make you more familiar with using CMake and help you better understand what is going on.
Below are the steps for building GEOS-Chem with CMake.
Initialize the build directory
Next, we need to initialize the build directory. From within the build folder of your run directory, type:
$ cmake ../CodeDir -DRUNDIR=..
where ../CodeDir is the symbolic link from our run directory to the GEOS-Chem source code directory. CMake will generate output similar to this:
-- The Fortran compiler identification is GNU 11.2.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /n/home09/ryantosca/spack/var/spack/environments/gc-classic/.spack-env/view/bin/gfortran - skipped
-- Checking whether /n/home09/ryantosca/spack/var/spack/environments/gc-classic/.spack-env/view/bin/gfortran supports Fortran 90
-- Checking whether /n/home09/ryantosca/spack/var/spack/environments/gc-classic/.spack-env/view/bin/gfortran supports Fortran 90 - yes
=================================================================
GCClassic X.Y.Z (superproject wrapper)
Current status: X.Y.Z
=================================================================
-- Found NetCDF: /n/home09/ryantosca/spack/opt/spack/linux-centos7-x86_64/gcc-8.3.0/netcdf-fortran-4.5.3-tb3oqspkitgcbkcyp623tdq2al6gxmom/lib/libnetcdff.so
-- Useful CMake variables:
+ CMAKE_PREFIX_PATH: /path/to/netcdf-c /path/to/netcdf-fortran
+ CMAKE_BUILD_TYPE: Release
-- Run directory setup:
+ RUNDIR: /n/holyscratch01/jacob_lab/ryantosca/tests/test/test_cc
-- Threading:
* OMP: **ON** OFF
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- General settings:
* MECH: **fullchem** carbon Hg custom
* BPCH_DIAG: **ON** OFF
* USE_REAL8: **ON** OFF
* SANITIZE: ON **OFF**
-- Components:
* TOMAS: ON **OFF**
* TOMAS_BINS: **NA** 15 40
* APM: ON **OFF**
* RRTMG: ON **OFF**
* GTMM: ON **OFF**
* HCOSA: ON **OFF**
* LUO_WETDEP: ON **OFF**
* FASTJX: ON **OFF**
=================================================================
HEMCO A.B.C
Current status: A.B.C
=================================================================
=================================================================
GEOS-Chem T.U.V (science codebase)
Current status: T.U.V
=================================================================
Creating /n/holyscratch01/jacob_lab/ryantosca/tests/test/test_cc/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/gc_classic_version.H
-- Configuring done
-- Generating done
-- Build files have been written to: /n/holyscratch01/jacob_lab/ryantosca/tests/test/test_cc/build
Your CMake command’s output contains important information about your build’s configuration.
Note
The text X.Y.Z, A.B.C, and T.U.V refer to the version numbers (in semantic versioning style) of the GCClassic, HEMCO, and GEOS-Chem “science codebase” repositories.
Configure your build with extra options
Your build directory is now configured to compile GEOS-Chem using all default options. If you do not wish to change anything further, you may skip ahead to the next section.
However, if you wish to modify your build’s configuration, simply invoke CMake once more with optional parameters. Use this format:
$ cmake . -DOPTION=value
Note that the . argument is necessary. It tells CMake that your current working directory (i.e. .) is your build directory. The output of cmake tells you about your build’s configuration. Options are prefixed by a + or * in the output, and their values are displayed or highlighted.
Tip
If you are colorblind or if you are using a terminal that does not support colors, refer to the CMake FAQ for instructions on disabling colorized output. For a detailed explanation of CMake output, see the next section.
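If you save the CMake output to a log file, the option lines can also be read programmatically. The sketch below assumes the plain-text form shown in the sample listing earlier, where the selected value of a `*` option is wrapped in `**` markers; a real terminal highlights the value with color instead, so treat this as an illustration. The function name `parse_option_line` is hypothetical.

```python
import re


def parse_option_line(line):
    """Parse one option line from the sample CMake listing.

    Returns (name, value), or None if the line is not an option line.
    """
    match = re.match(r"\s*([+*])\s+(\w+):\s*(.*)$", line)
    if match is None:
        return None
    marker, name, rest = match.groups()
    if marker == "*":
        # Choice option: the selected value is the one wrapped in ** markers.
        chosen = re.search(r"\*\*(.+?)\*\*", rest)
        return (name, chosen.group(1) if chosen else None)
    # "+" option: the remainder of the line is the value itself.
    return (name, rest.strip())


sample = [
    "  * OMP:          **ON**  OFF",
    "  * MECH:         **fullchem**  carbon  Hg  custom",
    "  + CMAKE_BUILD_TYPE: Release",
    "-- Threading:",
]
for line in sample:
    print(parse_option_line(line))
```

Lines that do not start with `+` or `*` (such as the `-- Threading:` header) are skipped by returning `None`.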
The table below contains the list of GEOS-Chem build options that you can pass to CMake. GEOS-Chem will be compiled with the default build options, unless you explicitly specify otherwise.
- RUNDIR
Defines the path to the run directory.
In this example, our build directory is a subfolder of the run directory, so we can use
-DRUNDIR=..
. If your build directory is somewhere else, then specify the path to the run directory as an absolute path.
- CMAKE_BUILD_TYPE
Specifies the type of build. Accepted values are:
- Release
Tells CMake to configure GEOS-Chem in Release mode. This means that all optimizations will be applied and all debugging options will be disabled. (Default option).
- Debug
Turns on several runtime error checks. This will make it easier to find errors but will adversely impact performance. Only use this option if you are actively debugging.
- MECH
Specifies the chemical mechanism that you wish to use:
- fullchem
Activates the fullchem mechanism. The source code files that define this mechanism are stored in
KPP/fullchem
. (Default option)
- Hg
Activates the Hg mechanism. The source code files that define this mechanism are stored in
KPP/Hg
.
- carbon
Activates the carbon mechanism (CH4-CO-CO2-OCS). The source code files that define this mechanism are stored in
KPP/carbon
.
- custom
Activates a custom mechanism defined by the user. The source code files that define this mechanism are stored in
KPP/custom
.
- OMP
Determines if GEOS-Chem Classic will activate OpenMP parallelization. Accepted values are:
- y
Activates OpenMP parallelization. (Default option)
GEOS-Chem Classic will execute on as many computational cores as are specified with the OMP_NUM_THREADS environment variable.
- n
Deactivates OpenMP parallelization. GEOS-Chem Classic will execute on a single computational core. Useful for debugging.
- TOMAS
Configure GEOS-Chem with the TOMAS aerosol microphysics package. Accepted values are:
- y
Activate TOMAS microphysics.
- n
Deactivate TOMAS microphysics (Default option)
- TOMAS_BINS
Specifies the number of size-resolved bins for TOMAS. Accepted values are:
- 15
Use 15 size-resolved bins with TOMAS simulations.
- 40
Use 40 size-resolved bins with TOMAS simulations.
- BPCH_DIAG
Toggles the legacy binary punch diagnostics on.
Attention
This option is deprecated and will be removed soon. Most binary-punch format diagnostics have been replaced by netCDF-based History diagnostics.
Accepted values are:
- y
Activate legacy binary-punch diagnostics.
- n
Deactivate legacy binary-punch diagnostics. (Default option)
- APM
Configures GEOS-Chem to use the APM microphysics package. Accepted values are:
- y
Activate APM microphysics.
- n
Deactivate APM microphysics. (Default option)
- RRTMG
Configures GEOS-Chem to use the RRTMG radiative transfer model. Accepted values are:
- y
Activates the RRTMG radiative transfer model.
- n
Deactivates the RRTMG radiative transfer model. (Default option)
- LUO_WETDEP
Configures GEOS-Chem to use the Luo et al., 2020 wet deposition scheme.
Note
The Luo et al., 2020 wet deposition scheme will eventually become the default wet deposition scheme in GEOS-Chem. We have made it an option for the time being while further evaluation is being done.
Accepted values are:
- y
Activates the Luo et al., 2020 wet deposition scheme.
- n
Deactivates the Luo et al., 2020 wet deposition scheme. (Default option)
- FASTJX
Configures GEOS-Chem to use the legacy FAST-JX v7.0 photolysis mechanism instead of its successor Cloud-J.
Note
We recommend using FAST-JX for the mercury simulation instead of Cloud-J. Further work is needed to make the mercury simulation compatible with Cloud-J. Once that work is completed the legacy FAST-JX option will be deleted from the model.
Accepted values are:
- y
Uses the legacy FAST-JX v7.0 photolysis scheme rather than Cloud-J.
- n
Uses the Cloud-J photolysis scheme rather than legacy FAST-JX. (Default option)
- SANITIZE
Activates the AddressSanitizer/LeakSanitizer functionality in GNU Fortran to identify memory leaks. Accepted values are:
- y
Activates AddressSanitizer/LeakSanitizer
- n
Deactivates AddressSanitizer/LeakSanitizer (Default option).
If you plan to use make -j install (recommended) to copy your executable to your run directory, you must configure CMake with the -DRUNDIR=/path/to/run/dir option. Multiple run directories can be specified as a semicolon-separated list. A warning is issued if one of these directories does not look like a run directory. These paths can be relative or absolute. Relative paths are interpreted as relative to your build directory. For example:
$ cmake . -DRUNDIR=/path/to/run/dir
Similarly, if you wanted to build GEOS-Chem with all debugging flags on, you would type:
$ cmake . -DCMAKE_BUILD_TYPE=Debug
or if you wanted to turn off OpenMP parallelization (so that GEOS-Chem executes only on one computational core), you would type:
$ cmake . -DOMP=n
etc.
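When several options need to be changed at once, it can help to assemble the CMake command from a dictionary of settings. The following is a minimal sketch: the helper `cmake_command` is hypothetical, and it only builds the argument list in the `cmake . -DOPTION=value` format shown above without running anything.

```python
def cmake_command(options, build_dir="."):
    """Return the argument list for `cmake <build_dir> -DKEY=value ...`."""
    argv = ["cmake", build_dir]
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            # Several run directories are joined into a semicolon-separated list.
            value = ";".join(str(item) for item in value)
        argv.append(f"-D{key}={value}")
    return argv


# Debug build without OpenMP, as in the two examples above.
print(" ".join(cmake_command({"CMAKE_BUILD_TYPE": "Debug", "OMP": "n"})))
```

Passing the returned list to `subprocess.run` (rather than pasting it into a shell) avoids having to quote the semicolons in a multi-directory RUNDIR value.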
Understand CMake output
As you can see from the example CMake output listed above, GEOS-Chem Classic contains code from 3 independent repositories:
=================================================================
GCClassic X.Y.Z (superproject wrapper)
Current status: X.Y.Z
=================================================================
where X.Y.Z specifies the GEOS-Chem Classic “major”, “minor”, and “patch” version numbers.
Note
If you are cloning GEOS-Chem Classic between official releases, you may see the Current status reported like this:
X.Y.Z-alpha.n-C-gabcd1234.dirty or
X.Y.Z-rc.n-C-gabcd1234.dirty
We will explain these formats below.
=================================================================
HEMCO A.B.C
Current status: A.B.C
=================================================================
where A.B.C specifies the HEMCO “major”, “minor”, and “patch” version numbers. The HEMCO version number differs from GEOS-Chem because it is kept in a separate repository, and is considered a separate package.
=================================================================
GEOS-Chem X.Y.Z (science codebase)
Current status: X.Y.Z
=================================================================
The GEOS-Chem science codebase and GEOS-Chem Classic wrapper will always share the same version number.
During the build configuration stage, CMake will display the version number (e.g. X.Y.Z) as well as the current status of the Git repository (e.g. TAG-C-gabcd1234.dirty) for GCClassic, GEOS-Chem, and HEMCO.
Let’s take the Git repository status of GCClassic as our example. The status string uses the same format as the git describe --tags command, namely:
TAG-C-gabcd1234.dirty
where
- TAG
Indicates the most recent tag in the GCClassic superproject repository.
Tags may use the following notations:
X.Y.Z: Denotes an official release
X.Y.Z-rc.n: Denotes a release candidate
X.Y.Z-alpha.n: Denotes an internal “alpha” benchmark
where n is the number of the release candidate or alpha benchmark (starting from 0).
- C
Indicates the number of commits that have been made on top of TAG (as reported by git describe).
- g
Indicates that the version control system is Git.
- abcd1234
Indicates the Git commit hash. This is an alphanumeric string that denotes the commit at the
HEAD
of the GCClassic repository.
- .dirty
If present, indicates that there are uncommitted updates atop the
abcd1234
commit in the GCClassic repository.
Under each header are printed the various options that have been selected.
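The status string can be split into its parts with a regular expression. This is a sketch under stated assumptions: it follows the TAG-C-gabcd1234.dirty layout described above, treats C as the commit count that `git describe --tags` reports, and accepts a bare tag for an official release. The name `parse_status` and the version number `14.2.0` in the example are illustrative only.

```python
import re

STATUS_RE = re.compile(
    r"^(?P<tag>\d+\.\d+\.\d+(?:-(?:rc|alpha)\.\d+)?)"   # X.Y.Z, X.Y.Z-rc.n, X.Y.Z-alpha.n
    r"(?:-(?P<commits>\d+)-g(?P<hash>[0-9a-f]+))?"      # -C-gabcd1234 (optional)
    r"(?P<dirty>\.dirty)?$"                             # .dirty (optional)
)


def parse_status(status):
    """Split a status string such as 'X.Y.Z-rc.n-C-gabcd1234.dirty' into parts."""
    match = STATUS_RE.match(status)
    if match is None:
        raise ValueError(f"unrecognized status string: {status!r}")
    return {
        "tag": match.group("tag"),
        "commits": int(match.group("commits")) if match.group("commits") else 0,
        "hash": match.group("hash"),
        "dirty": match.group("dirty") is not None,
    }


print(parse_status("14.2.0-rc.1-5-gabcd1234.dirty"))
```

A result with `dirty` set to `True` is a reminder that the build includes uncommitted changes, which is worth recording alongside any simulation output.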
Compile with Make
Now that CMake has created the Makefiles that are needed to compile GEOS-Chem, you may proceed as follows:
Build the GEOS-Chem Classic executable
Use the make command to build the GEOS-Chem executable. Type:
$ make -j
You will see output similar to this:
Scanning dependencies of target HeadersHco
Scanning dependencies of target Isorropia
Scanning dependencies of target KPP_FirstPass
[ 1%] Building Fortran object src/HEMCO/src/Shared/Headers/CMakeFiles/HeadersHco.dir/hco_inquireMod.F90.o
[ 1%] Building Fortran object src/HEMCO/src/Shared/Headers/CMakeFiles/HeadersHco.dir/hco_precision_mod.F90.o
[ 1%] Building Fortran object src/HEMCO/src/Shared/Headers/CMakeFiles/HeadersHco.dir/hco_charpak_mod.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/KPP/fullchem/CMakeFiles/KPP_FirstPass.dir/gckpp_Monitor.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/KPP/fullchem/CMakeFiles/KPP_FirstPass.dir/gckpp_Precision.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/KPP/fullchem/CMakeFiles/KPP_FirstPass.dir/gckpp_Parameters.F90.o
[ 3%] Linking Fortran static library libKPP_FirstPass.a
[ 3%] Built target KPP_FirstPass
Scanning dependencies of target Headers
[ 3%] Building Fortran object src/GEOS-Chem/ISORROPIA/CMakeFiles/Isorropia.dir/isorropiaII_main_mod.F.o
[ 3%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/charpak_mod.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/dictionary_m.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/CMN_SIZE_mod.F90.o
[ 3%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/qfyaml_mod.F90.o
[ 4%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/CMN_O3_mod.F90.o
[ 6%] Building Fortran object src/GEOS-Chem/Headers/CMakeFiles/Headers.dir/inquireMod.F90.o
... etc ...
[ 93%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/sulfate_mod.F90.o
[ 93%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/fullchem_mod.F90.o
[ 93%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/mixing_mod.F90.o
[ 93%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/carbon_mod.F90.o
[ 95%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/chemistry_mod.F90.o
[ 95%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/gc_environment_mod.F90.o
[ 96%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/emissions_mod.F90.o
[ 96%] Building Fortran object src/GEOS-Chem/GeosCore/CMakeFiles/GeosCore.dir/cleanup.F90.o
[ 98%] Linking Fortran static library libGeosCore.a
[ 98%] Built target GeosCore
Scanning dependencies of target gcclassic
[ 98%] Building Fortran object src/CMakeFiles/gcclassic.dir/GEOS-Chem/Interfaces/GCClassic/main.F90.o
[100%] Linking Fortran executable ../bin/gcclassic
[100%] Built target gcclassic
Tip
The -j argument tells make that it can execute as many jobs as it wants simultaneously. For example, if you have 8 cores, then the build process may attempt to compile 8 files at a time.
If you want to restrict the number of simultaneous jobs (e.g. you are compiling on a machine with limited memory), you can use e.g. make -j4, which should only try to compile 4 files at a time.
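If you drive the build from a script, one way to pick the job count is to cap it at the number of available cores. The helper below is a hypothetical sketch; it only constructs the `make -jN` argument list and does not start a build. Note that it always emits an explicit number, unlike a bare `make -j`, which places no limit on the number of jobs.

```python
import os


def make_command(max_jobs=None):
    """Return the argument list for a parallel `make` invocation."""
    cores = os.cpu_count() or 1          # cpu_count() may return None
    if max_jobs is None:
        jobs = cores
    else:
        jobs = max(1, min(cores, max_jobs))
    return ["make", f"-j{jobs}"]


# Limit the build to at most 4 simultaneous jobs.
print(make_command(4))
```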
Install the executable in your run directory
Now that the gcclassic executable is built, install it to your run directory with make install. For this to work properly, you must tell CMake where to find your run directory by configuring CMake with -DRUNDIR=/path/to/run/directory as described above. Type:
$ make install
and you will see output similar to this:
[ 1%] Built target HeadersHco
[ 3%] Built target KPP_FirstPass
[ 3%] Built target Isorropia
[ 4%] Built target JulDayHco
[ 13%] Built target Headers
[ 18%] Built target NcdfUtilHco
[ 19%] Built target JulDay
[ 19%] Built target GeosUtilHco
[ 25%] Built target NcdfUtil
[ 40%] Built target HCO
[ 46%] Built target GeosUtil
[ 56%] Built target HCOX
[ 59%] Built target Transport
[ 62%] Built target History
[ 63%] Built target ObsPack
[ 71%] Built target KPP
[ 71%] Built target HCOI_Shared
[ 98%] Built target GeosCore
[100%] Built target gcclassic
Install the project...
-- Install configuration: "Release"
-- Up-to-date: /home/ubuntu/gc_merra2_fullchem/build_info/CMakeCache.txt
-- Up-to-date: /home/ubuntu/gc_merra2_fullchem/build_info/summarize_build
-- Up-to-date: /home/ubuntu/gc_merra2_fullchem/gcclassic
Let’s now navigate back to the run directory and get a directory listing:
$ cd ..
$ ls
CodeDir@ cleanRunDir.sh*
GEOSChem.Restart.20190701_0000z.nc4 download_data.py*
HEMCO_Config.rc download_data.yml
HEMCO_Config.rc.gmao_metfields gcclassic*
HEMCO_Diagn.rc geoschem_config.yml
HISTORY.rc getRunInfo*
OutputDir/ metrics.py*
README runScriptSamples@
archiveRun.sh* rundirConfig/
build/ species_database.yml
build_info/
You should now see the gcclassic executable and a build_info
directory there. GEOS-Chem has now been configured, compiled, and
installed in your run directory.
Please see the Run directory files and folders section for more information about the contents of the run directory.
You are now ready to run a GEOS-Chem simulation!
Remove compiler-generated files when no longer needed
In older versions of GEOS-Chem, you could use a GNU Make command such as make clean or make realclean to remove all object (.o), library (.a), and module (.mod) files, as well as the previously-built executable file from the GEOS-Chem source code folder.
All of the files created by CMake during the configuration and compilation stages are placed in your build directory (by default, the build/ folder in your run directory). Therefore, if you wish to build the GEOS-Chem Classic executable from scratch, all you have to do is remove all of the files from the build folder. It’s as simple as that!
You can also create a new build folder with these commands:
$ mv build was.build
$ mkdir build
and then later on, you can remove the old build folder:
$ rm -rf was.build
This avoids the temptation to use rm -rf *, which can potentially wipe out all of your files if used incorrectly.
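The same "move aside, then recreate" approach can be scripted. The sketch below mirrors the `mv build was.build` and `mkdir build` commands shown above; the function name `rotate_build_dir` is hypothetical, and it refuses to overwrite an existing backup folder rather than deleting anything.

```python
from pathlib import Path


def rotate_build_dir(rundir, backup_name="was.build"):
    """Move build/ aside and create a fresh, empty build/ folder.

    Returns the path of the backup folder, which can be removed later.
    """
    root = Path(rundir)
    build = root / "build"
    backup = root / backup_name
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; remove it first")
    if build.exists():
        build.rename(backup)      # equivalent to: mv build was.build
    build.mkdir()                 # equivalent to: mkdir build
    return backup
```

Once the new build has succeeded, the backup folder can be deleted, for example with `shutil.rmtree`.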
Get a summary of compilation options
The compilation process will create a folder in your run directory named build_info. Navigate into this folder and get a directory listing:
$ cd build_info
$ ls -CF
CMakeCache.txt summarize_build*
CMakeCache.txt contains the CMake cache, which is a complete listing of all compilation settings. summarize_build is a script that will print the most important of these CMake cache settings.
If you run summarize_build:
$ ./summarize_build
You will get output similar to this:
## Compiler Info
# Family: GNU
# Version: 11.2.0
# Which: /path/to/gfortran
## Compiler Options (global)
-DCMAKE_Fortran_FLAGS=""
-DCMAKE_Fortran_FLAGS_DEBUG="-g"
## Compiler Options (GEOS-Chem)
-DGEOSChem_Fortran_FLAGS_GNU="-g;-cpp;-w;-std=legacy;-fautomatic;-fno-align-commons;-fconvert=big-endian;-fno-range-check;-mcmodel=medium;-fbacktrace;-g;-DLINUX_GFORTRAN;-ffree-line-length-none"
-DGEOSChem_Fortran_FLAGS_DEBUG_GNU="-O0;-Wall;-Wextra;-Wconversion;-Warray-temporaries;-fcheck=array-temps;-ffpe-trap=invalid,zero,overflow;-finit-real=snan;-fcheck=bounds;-fcheck=pointer"
## Compiler Options (HEMCO)
-DHEMCO_Fortran_FLAGS_GNU="-cpp;-w;-std=legacy;-fautomatic;-fno-align-commons;-fconvert=big-endian;-fno-range-check;-mcmodel=medium;-fbacktrace;-g;-DLINUX_GFORTRAN;-ffree-line-length-none"
-DHEMCO_Fortran_FLAGS_DEBUG_GNU="-g;-gdwarf-2;-gstrict-dwarf;-O0;-Wall;-Wextra;-Wconversion;-Warray-temporaries;-fcheck=array-temps;-ffpe-trap=invalid,zero,overflow;-finit-real=snan;-fcheck=bounds;-fcheck=pointer;-fcheck=no-recursion"
## GEOS-Chem Components Settings
-DTOMAS="OFF"
-DTOMAS_BINS="NA"
-DAPM="OFF"
-DRRTMG="OFF"
-DGTMM="OFF"
-DHCOSA="OFF"
-DLUO_WETDEP="OFF"
-DFASTJX="OFF"
Here you can see the compiler flags that were used as well as the options that were selected.
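If you want to compare the build settings of two run directories, the summary can be read into a dictionary. This is a minimal sketch that assumes the `-DNAME="value"` line format shown in the sample output above; the function name `parse_build_summary` is hypothetical, and comment lines beginning with `#` are ignored.

```python
import re


def parse_build_summary(text):
    """Collect -DNAME="value" lines from summarize_build output into a dict."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'\s*-D(\w+)="(.*)"\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


sample = '''## GEOS-Chem Components Settings
-DTOMAS="OFF"
-DTOMAS_BINS="NA"
-DRRTMG="OFF"
-DCMAKE_Fortran_FLAGS_DEBUG="-g"
'''
print(parse_build_summary(sample))
```

Compiler flag values are semicolon-separated in the summary, so calling `.split(";")` on such a value yields the individual flags.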
Configure your simulation
Note
We recommend that you configure your simulation before downloading data files. You can use the configuration settings with a dry-run simulation to download only the data that you will need.
You will need to edit various configuration files in order to specify options for your GEOS-Chem Classic simulation. These are described below.
Commonly-updated configuration files
When starting a new GEOS-Chem Classic simulation, you will usually edit most (if not all) of these configuration files:
geoschem_config.yml
Starting with GEOS-Chem 14.0.0, the input.geos configuration file (plain text) has been replaced by the geoschem_config.yml file. This file is in YAML format, which is a text-based markup syntax used for representing dictionary-like data structures.
Note
The geoschem_config.yml file contains several sections. Only the sections relevant to a given type of simulation are present. For example, fullchem simulation options (such as aerosol settings and photolysis settings) are omitted from the geoschem_config.yml file for the CH4 simulation.
Simulation settings
#============================================================================
# Simulation settings
#============================================================================
simulation:
  name: fullchem
  start_date: [20190701, 000000]
  end_date: [20190801, 000000]
  root_data_dir: /path/to/ExtData
  met_field: MERRA2
  species_database_file: ./species_database.yml
  species_metadata_output_file: OutputDir/geoschem_species_metadata.yml
  verbose:
    activate: false
    on_cores: root       # Allowed values: root all
  use_gcclassic_timers: false
The simulation section contains general simulation options:
- name
Specifies the type of GEOS-Chem simulation. Accepted values are
- fullchem
Full-chemistry simulation.
- aerosol
- carbon
Coupled carbon gases simulation (CH4-CO-CO2-OCS), implemented as a KPP mechanism (cf Bukosa et al. [2023]).
You must configure your build with
-DMECH=carbon
in order to use this simulation.
- Hg
Mercury simulation.
You must configure your build with
-DMECH=Hg
in order to use this simulation.
- POPs
Persistent organic pollutants (aka POPs) simulation.
Attention
The POPs simulation is currently stale. We look to members of the GEOS-Chem user community to take the lead on updating this simulation.
- tagCH4
Methane simulation with species tagged by geographic region or other criteria.
This simulation will eventually be superseded by the
carbon
simulation.
- tagCO
Carbon monoxide simulation, with species tagged by geographic region and other criteria.
This simulation will eventually be superseded by the
carbon
simulation.
- tagO3
Ozone simulation (using specified production and loss rates), with species tagged by geographical region.
- TransportTracers
Transport Tracers simulation, with both radionuclide and passive species. Useful for evaluating model transport.
- metals
Trace metals simulation
- start_date
Specifies the starting date and time of the simulation in list notation
[YYYYMMDD, hhmmss]
.
- end_date
Specifies the ending date and time of the simulation in list notation
[YYYYMMDD, hhmmss]
.
- root_data_dir
Path to the root data directory. All of the data that GEOS-Chem Classic reads must be located in subfolders of this directory.
- met_field
Name of the meteorology product that will be used to drive GEOS-Chem Classic. Accepted values are:
- MERRA2
The MERRA-2 meteorology product from NASA/GMAO. MERRA-2 is a stable reanalysis product, and extends from approximately 1980 to present. (Recommended option)
- GEOS-FP
The GEOS-FP meteorology product from NASA/GMAO. GEOS-FP is an operational data product and, unlike MERRA-2, periodically receives science updates.
- GCAP2
The GCAP-2 meteorology product, archived from the GISS-2 GCM. GCAP-2 has hundreds of years of data available, making it useful for simulations of historical climate.
- species_database_file
Path to the GEOS-Chem Species Database file. This is stored in the run directory file
./species_database.yml
. You should not have to edit this setting.
- species_metadata_output_file
Path to the geoschem_species_metadata.yml file. This file echoes back information from species_database.yml, but only for species that are defined in this simulation (instead of all possible species). This facilitates interfacing GEOS-Chem with external models such as CESM.
- verbose:
Menu controlling verbose printout. Starting with GEOS-Chem 14.2.0 and HEMCO 3.7.0, most informational printouts are now deactivated by default. You may choose to activate them (e.g. for debugging and/or testing) with the options below:
- activate
Activates (true) or deactivates (false) extra informational printout to the screen and/or log file.
- on_cores:
Specify on which computational cores informational printout should be done.
- root
Print extra informational output only on the root core. Use this setting for GEOS-Chem Classic.
- all
Print extra informational output on all cores. Consider using this when using GEOS-Chem as GCHP, or in MPI-based external models (NASA GEOS, CESM, etc.).
- use_gcclassic_timers
Activates (true) or deactivates (false) the GEOS-Chem Classic timers. If activated, information about how long each component of GEOS-Chem took to execute will be printed to the screen and/or GEOS-Chem log file. The same information will also be written in JSON format to a file named gcclassic_timers.json.
You can set this option to false unless you are running benchmark or timing simulations.
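The `[YYYYMMDD, hhmmss]` list notation used by `start_date` and `end_date` maps naturally onto a date-time value. The sketch below is illustrative only: it assumes the two list elements are available as integers (however you choose to read the YAML file), and the helper name `to_datetime` is hypothetical.

```python
from datetime import datetime


def to_datetime(yyyymmdd, hhmmss):
    """Convert a [YYYYMMDD, hhmmss] pair of integers into a datetime."""
    # Zero-pad so that e.g. hhmmss=0 becomes "000000" before parsing.
    stamp = f"{int(yyyymmdd):08d}{int(hhmmss):06d}"
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


start = to_datetime(20190701, 0)   # start_date: [20190701, 000000]
end = to_datetime(20190801, 0)     # end_date:   [20190801, 000000]
if end <= start:
    raise ValueError("end_date must be later than start_date")
print("simulation length in days:", (end - start).days)
```

A check like this catches swapped or malformed dates before a simulation is submitted.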
Grid settings
#============================================================================
# Grid settings
#============================================================================
grid:
  resolution: 4.0x5.0
  number_of_levels: 72
  longitude:
    range: [-180.0, 180.0]
    center_at_180: true
  latitude:
    range: [-90.0, 90.0]
    half_size_polar_boxes: true
  nested_grid_simulation:
    activate: true
    buffer_zone_NSEW: [0, 0, 0, 0]
The grid section contains settings that define the grid used by GEOS-Chem Classic:
- resolution
Specifies the horizontal resolution of the grid. Accepted values are:
- 4.0x5.0
The global \(4^{\circ}{\times}5^{\circ}\) GEOS-Chem Classic grid.
- 2.0x2.5
The global \(2.0^{\circ}{\times}2.5^{\circ}\) GEOS-Chem Classic grid.
- number_of_levels
Number of vertical levels to use in the simulation. Accepted values are:
- longitude
Settings that define the longitude dimension of the grid. There are two sub-options:
- range
The minimum and maximum longitude values (grid box centers), specified in list format.
- center_at_180
If true, then the westernmost grid boxes are centered at \(-180^{\circ}\) longitude (the International Date Line). This is true for both MERRA2 and GEOS-FP.
If false, then the westernmost grid boxes have their westernmost edges at \(-180^{\circ}\) longitude. This is true for the GCAP2 grid.
- latitude
Settings to define the latitude dimension of the grid. There are two sub-options:
- range
The minimum and maximum latitude values (grid box centers), specified in list format.
- nested_grid_simulation
Settings for nested-grid simulations. There are two sub-options:
- activate
If true, this indicates that the simulation will use a sub-window of the horizontal grid.
If false, this indicates that the simulation will use the entire global grid extent.
- buffer_zone_NSEW
Specifies the nested-grid offsets (# of grid boxes) in list format [N-offset, S-offset, E-offset, W-offset]. These offsets are used to define an inner window region in which transport is actually done (aka the “transport window”). This “transport window” is always smaller than the actual size of the nested grid region in order to properly account for the boundary conditions.
For global simulations, use: [0, 0, 0, 0].
For nested-grid simulations, we recommend using: [3, 3, 3, 3].
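To see how `buffer_zone_NSEW` shrinks the region in which transport is done, the arithmetic can be sketched as follows. The nested-grid dimensions used here (100 latitude boxes by 120 longitude boxes) are made-up numbers for illustration, and `transport_window` is a hypothetical helper, not part of GEOS-Chem.

```python
def transport_window(n_lat, n_lon, buffer_zone_nsew):
    """Return (n_lat, n_lon) of the inner transport window.

    buffer_zone_nsew is [N-offset, S-offset, E-offset, W-offset] in grid boxes.
    """
    north, south, east, west = buffer_zone_nsew
    inner_lat = n_lat - north - south   # N and S offsets trim the latitude extent
    inner_lon = n_lon - east - west     # E and W offsets trim the longitude extent
    if inner_lat <= 0 or inner_lon <= 0:
        raise ValueError("buffer zone leaves no transport window")
    return inner_lat, inner_lon


# Recommended nested-grid buffer zone applied to a hypothetical 100 x 120 window.
print(transport_window(100, 120, [3, 3, 3, 3]))
```

With all offsets set to zero, as recommended for global simulations, the transport window equals the full grid.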
Timesteps settings
#============================================================================
# Timesteps settings
#============================================================================
timesteps:
transport_timestep_in_s: 600
chemistry_timestep_in_s: 1200
radiation_timestep_in_s: 10800
The timesteps section specifies the frequency at which various GEOS-Chem operations occur:
- transport_timestep_in_s
Specifies the “heartbeat” timestep of GEOS-Chem. This is the frequency at which transport, cloud convection, PBL mixing, and wet deposition will be done.
Recommended value for global simulations: 600
Recommended value for nested simulations: 300 or smaller
- chemistry_timestep_in_s
Specifies the frequency at which chemistry and emissions will be done.
Recommended value for global simulations: 1200
Recommended value for nested simulations: 600 or smaller
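The recommendations above can be turned into a quick sanity check. The Python sketch below is illustrative only. Treating the recommended values as upper bounds, and expecting the chemistry timestep to be a whole multiple of the transport timestep, are assumptions of this sketch and not rules stated in this guide. The function name is hypothetical.

```python
# Recommended (transport, chemistry) timesteps in seconds, from the text above.
# Treating them as upper bounds is an assumption of this sketch.
RECOMMENDED = {"global": (600, 1200), "nested": (300, 600)}


def check_timesteps(transport_s, chemistry_s, nested=False):
    """Return a list of warnings (empty if nothing stands out)."""
    if transport_s <= 0 or chemistry_s <= 0:
        raise ValueError("timesteps must be positive")
    max_transport, max_chemistry = RECOMMENDED["nested" if nested else "global"]
    warnings = []
    if transport_s > max_transport:
        warnings.append(f"transport timestep {transport_s} s exceeds {max_transport} s")
    if chemistry_s > max_chemistry:
        warnings.append(f"chemistry timestep {chemistry_s} s exceeds {max_chemistry} s")
    # Assumption: chemistry should run on a whole multiple of the heartbeat.
    if chemistry_s % transport_s != 0:
        warnings.append("chemistry timestep is not a multiple of the transport timestep")
    return warnings


print(check_timesteps(600, 1200))               # []
print(check_timesteps(600, 1200, nested=True))  # two warnings
print(check_timesteps(300, 600, nested=True))   # []
```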
Operations settings
This section of geoschem_config.yml is included for all simulations. However, some of the options listed below will be omitted for simulations that do not require them.
There are several sub-sections under operations:
Chemistry
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  chemistry:
    activate: true
    linear_chemistry_aloft:
      activate: true
      use_linoz_for_O3: true
    active_strat_H2O:
      activate: true
      use_static_bnd_cond: true
    gamma_HO2: 0.2
    autoreduce_solver:
      activate: false
      use_target_threshold:
        activate: true
        oh_tuning_factor: 0.00005
        no2_tuning_factor: 0.0001
      use_absolute_threshold:
        scale_by_pressure: true
        absolute_threshold: 100.0
      keep_halogens_active: false
      append_in_internal_timestep: false
  # ... following sub-sections omitted ...
The operations:chemistry section contains settings for chemistry:
- activate
Activates (true) or deactivates (false) chemistry in GEOS-Chem.
- linear_chemistry_aloft
Determines how linearized chemistry will be applied in the stratosphere and/or mesosphere. (Only valid for fullchem simulations.)
There are two sub-options:
- activate
Activates (true) or deactivates (false) linearized chemistry in the stratosphere and/or mesosphere.
- use_linoz_for_O3
If true, Linoz stratospheric ozone chemistry will be used.
If false, Synoz (i.e. a synthetic flux of ozone across the tropopause) will be used instead of Linoz.
- active_strat_H2O
Determines if water vapor as modeled by GEOS-Chem will be allowed to influence humidity fields. (Only valid for fullchem simulations.)
There are two sub-options:
- activate
Allows (true) or disallows (false) the H2O species in GEOS-Chem to influence specific humidity and relative humidity.
- use_static_bnd_cond
Allows (true) or disallows (false) a static boundary condition.
TODO: Clarify this.
- gamma_HO2
Specifies \(\gamma\), the uptake coefficient for \(HO_2\) heterogeneous chemistry.
Recommended value: 0.2.
- autoreduce_solver
Menu for controlling the adaptive mechanism auto-reduction feature, which is available in KPP 3.0.0 and later versions. See Lin et al. [2023] for details.
- activate
If true, the mechanism will be integrated using the Rosenbrock method with the adaptive auto-reduction feature.
If false, the mechanism will be integrated using the traditional Rosenbrock method.
Default value: false.
- use_target_threshold
Contains options for defining \(\partial\) (the partitioning threshold between “fast” and “slow” species) by considering the production and loss of key species (OH for daytime, NO2 for nighttime).
- activate
Activates (true) or deactivates (false) using OH and NO2 to determine \(\partial\).
Default value: true.
- oh_tuning_factor
Specifies \({\alpha}_{OH}\), which is used to compute \(\partial\).
- no2_tuning_factor
Specifies \({\alpha}_{NO2}\), which is used to compute \(\partial\).
- use_absolute_threshold
Contains options for setting an absolute threshold \(\partial\) that may be weighted by pressure.
- scale_by_pressure
Activates (true) or deactivates (false) using a pressure-dependent method to determine \(\partial\).
- absolute_threshold
The absolute partitioning threshold \(\partial\).
If scale_by_pressure is true and use_target_threshold:activate is false, the value for \(\partial\) specified here will be scaled by the ratio \(P / P_{sfc}\), where \(P\) is the grid box pressure and \(P_{sfc}\) is the surface pressure for the column.
- keep_halogens_active
If true, then all halogen species will be considered “fast”. This may be necessary in order to obtain realistic results for ozone and other important species.
If false, then halogen species will be determined as “slow” or “fast” depending on the partitioning threshold \(\partial\).
Default value: true
- append_in_internal_timestep
If true, any “slow” species that later become “fast” will be appended to the list of “fast” species.
If false, any “slow” species that later become “fast” will NOT be appended to the list of “fast” species.
Default value: false
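The pressure scaling described under absolute_threshold can be written out in a few lines. The Python sketch below is not the KPP or GEOS-Chem implementation. It only illustrates the stated rule that the threshold is multiplied by \(P / P_{sfc}\) when scale_by_pressure is true and use_target_threshold:activate is false. The function name is hypothetical. When that condition is not met, the sketch returns the input value unchanged, which is an assumption made for illustration only.

```python
def scaled_threshold(absolute_threshold, pressure, surface_pressure,
                     scale_by_pressure=True, use_target_threshold=False):
    """Scale the absolute partitioning threshold by P / P_sfc when applicable.

    pressure and surface_pressure must share the same units (e.g. hPa).
    """
    if surface_pressure <= 0:
        raise ValueError("surface pressure must be positive")
    if scale_by_pressure and not use_target_threshold:
        return absolute_threshold * pressure / surface_pressure
    # Assumption of this sketch: otherwise return the value unchanged.
    return absolute_threshold


# Grid box at 500 hPa above a 1000 hPa surface, absolute_threshold = 100.0
print(scaled_threshold(100.0, 500.0, 1000.0))                           # 50.0
print(scaled_threshold(100.0, 500.0, 1000.0, scale_by_pressure=False))  # 100.0
```

In this picture the threshold shrinks with altitude, since \(P / P_{sfc}\) decreases from 1 at the surface toward 0 aloft.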
Convection
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  convection:
    activate: true
  # ... following sub-sections omitted ...
The operations:convection section contains settings for cloud convection:
- activate
Activates (true) or deactivates (false) cloud convection in GEOS-Chem.
Dry deposition
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  dry_deposition:
    activate: true
    CO2_effect:
      activate: false
      CO2_level: 600.0
      reference_CO2_level: 380.0
    diag_alt_above_sfc_in_m: 10
  # ... following sub-sections omitted ...
The operations:dry_deposition section contains settings for dry deposition:
- activate
Activates (true) or deactivates (false) dry deposition.
- CO2_effect
This sub-section contains options for applying the simple parameterization for the CO2 effect on stomatal resistance.
- activate
Activates (true) or deactivates (false) the CO2 effect on stomatal resistance in dry deposition.
Default value: false.
- CO2_level
Specifies the CO2 level (in ppmv).
- reference_CO2_level
Specifies the reference CO2 level (in ppmv).
- diag_alt_above_sfc_in_m
Specifies the altitude above the surface (in m) to use with the ConcAboveSfc diagnostic collection.
PBL mixing
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  pbl_mixing:
    activate: true
    use_non_local_pbl: true
  # ... following sub-sections omitted ...
The operations:pbl_mixing section contains settings for planetary boundary layer (PBL) mixing:
- activate
Activates (true) or deactivates (false) planetary boundary layer mixing in GEOS-Chem Classic.
- use_non_local_pbl
If true, then the non-local PBL mixing scheme (VDIFF) will be used. (Default option)
If false, then the full PBL mixing scheme (TURBDAY) will be used.
Photolysis
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  photolysis:
    activate: true
    input_directories:
      fastjx_input_dir: /path/to/ExtData/CHEM_INPUTS/FAST_JX/v2021-10/
      cloudj_input_dir: /path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2023-05/
    overhead_O3:
      use_online_O3_from_model: true
      use_column_O3_from_met: true
      use_TOMS_SBUV_O3: false
    photolyze_nitrate_aerosol:
      activate: true
      NITs_Jscale_JHNO3: 0.0
      NIT_Jscale_JHNO2: 0.0
      percent_channel_A_HONO: 66.667
      percent_channel_B_NO2: 33.333
  # ... following sub-sections omitted ...
The operations:photolysis section contains settings for photolysis. This section only applies to fullchem, Hg, and aerosol-only simulations.
- activate
Activates (true) or deactivates (false) photolysis.
Attention
You should always keep photolysis turned on in your simulations. Disabling photolysis should only be done when debugging.
- input_directories
Specifies the location of directories containing photolysis configuration files.
- fastjx_input_dir
Specifies the path to the legacy FAST-JX configuration files containing information about species cross sections and quantum yields. Note that FAST-JX is off by default and Cloud-J is used instead. You can use legacy FAST-JX instead of Cloud-J by configuring with -DFASTJX=y during build.
- cloudj_input_dir
Specifies the path to the Cloud-J configuration files containing information about species cross sections and quantum yields.
- overhead_O3
This section contains settings that control which overhead ozone sources are used for photolysis.
- use_online_O3_from_model
Activates (true) or deactivates (false) using online O3 from GEOS-Chem in the extinction calculations for photolysis.
Recommended value: true.
- use_column_O3_from_met
Activates (true) or deactivates (false) using ozone columns (e.g. TO3) from the meteorology fields.
Recommended value: true.
- use_TOMS_SBUV_O3
Activates (true) or deactivates (false) using ozone columns from the TOMS-SBUV archive.
Recommended value: false.
- photolyze_nitrate_aerosol
This section contains settings that control options for nitrate aerosol photolysis.
- activate
Activates (true) or deactivates (false) nitrate aerosol photolysis.
Recommended value: true.
- NITs_Jscale_JHNO3
Scale factor (percent) for JHNO3 that photolyzes NITs aerosol.
- NIT_Jscale_JHNO2
Scale factor (percent) for JHNO2 that photolyzes NIT aerosol.
- percent_channel_A_HONO
Percentage of JNITs/JNIT in channel A (HNO2) for NITs photolysis.
- percent_channel_B_NO2
Percentage of JNITs/JNIT in channel B (NO2) for NITs photolysis.
RRTMG radiative transfer model
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  rrtmg_rad_transfer_model:
    activate: false
    aod_wavelengths_in_nm:
      - 550
    longwave_fluxes: false
    shortwave_fluxes: false
    clear_sky_flux: false
    all_sky_flux: false
  # .. following sub-sections omitted ...
The operations:rrtmg_rad_transfer_model section contains settings for the RRTMG radiative transfer model. This section only applies to fullchem simulations.
- activate
Activates (true) or deactivates (false) the RRTMG radiative transfer model.
Default value: false.
- aod_wavelengths_in_nm
Specifies the wavelength(s) for the aerosol optical properties in nm (in YAML sequence format). Up to three wavelengths can be selected. The specified wavelengths are used for the photolysis mechanism (either legacy FAST-JX or Cloud-J) regardless of whether the RRTMG radiative transfer model is used.
- longwave_fluxes
Activates (true) or deactivates (false) RRTMG longwave flux calculations.
Default value: false.
- shortwave_fluxes
Activates (true) or deactivates (false) RRTMG shortwave flux calculations.
Default value: false.
- clear_sky_flux
Activates (true) or deactivates (false) RRTMG clear-sky flux calculations.
Default value: false.
- all_sky_flux
Activates (true) or deactivates (false) RRTMG all-sky flux calculations.
Default value: false.
Transport
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  transport:
    gcclassic_tpcore:                 # GEOS-Chem Classic only
      activate: true                  # GEOS-Chem Classic only
      fill_negative_values: true      # GEOS-Chem Classic only
      iord_jord_kord: [3, 3, 7]       # GEOS-Chem Classic only
    transported_species:
      - ACET
      - ACTA
      - AERI
      # ... etc more transported species ...
  # .. following sub-sections omitted ...
The operations:transport section contains settings for species transport:
- gcclassic_tpcore
Contains options that control species transport in GEOS-Chem Classic with the TPCORE advection scheme:
- activate
Activates (true) or deactivates (false) species transport in GEOS-Chem Classic.
Default value: true.
- fill_negative_values
If true, negative species concentrations will be replaced with zeros.
If false, no change will be made to species concentrations.
Default value: true.
- iord_jord_kord
Specifies advection options (in list format) for TPCORE in the longitude, latitude, and vertical dimensions. The options are listed below:
1. 1st order upstream scheme (use for debugging only)
2. 2nd order van Leer (full monotonicity constraint)
3. Monotonic PPM
4. Semi-monotonic PPM (same as 3, but overshoots are allowed)
5. Positive-definite PPM
6. Un-constrained PPM (use only when the fields and winds are very smooth)
7. Huynh/Van Leer/Lin full monotonicity constraint (KORD only)
Default (and recommended) value: [3, 3, 7]
- transported_species
A list of species names (in YAML sequence format) that will be transported by the TPCORE advection scheme.
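As a reading aid for iord_jord_kord, the Python sketch below decodes a three-element setting into the scheme names listed above. It assumes the options are numbered 1 through 7 in the order in which they are listed, and that option 7 is valid only for the vertical (KORD) entry. The lookup table and function name are hypothetical helpers, not part of GEOS-Chem.

```python
# Assumed numbering: options 1-7 in the order listed in the text above.
TPCORE_SCHEMES = {
    1: "1st order upstream scheme (debugging only)",
    2: "2nd order van Leer (full monotonicity constraint)",
    3: "Monotonic PPM",
    4: "Semi-monotonic PPM",
    5: "Positive-definite PPM",
    6: "Un-constrained PPM",
    7: "Huynh/Van Leer/Lin full monotonicity constraint",
}


def describe_iord_jord_kord(options):
    """Map [IORD, JORD, KORD] to scheme names, rejecting invalid choices."""
    if len(options) != 3:
        raise ValueError("expected three values: [IORD, JORD, KORD]")
    described = {}
    for name, value in zip(("IORD", "JORD", "KORD"), options):
        if value not in TPCORE_SCHEMES:
            raise ValueError(f"{name}={value} is not a listed option")
        if value == 7 and name != "KORD":
            raise ValueError(f"option 7 is listed as KORD only, got {name}=7")
        described[name] = TPCORE_SCHEMES[value]
    return described


for name, scheme in describe_iord_jord_kord([3, 3, 7]).items():
    print(f"{name}: {scheme}")
```

For the default [3, 3, 7], this reports monotonic PPM in both horizontal dimensions and the Huynh/Van Leer/Lin constraint in the vertical.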
Wet deposition
#============================================================================
# Settings for GEOS-Chem operations
#============================================================================
operations:
  # .. preceding sub-sections omitted ...
  wet_deposition:
    activate: true
The operations:wet_deposition section contains settings for wet deposition.
- activate
Activates (true) or deactivates (false) wet deposition in GEOS-Chem Classic.
Aerosols settings
This section of geoschem_config.yml is included for fullchem and aerosol simulations.
There are several sub-sections under aerosols:
Carbon aerosols
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  carbon:
    activate: true
    brown_carbon: false
    enhance_black_carbon_absorption:
      activate: true
      hydrophilic: 1.5
      hydrophobic: 1.0
  # .. following sub-sections omitted ...
The aerosols:carbon section contains settings for carbon aerosols:
- activate
Activates (true) or deactivates (false) carbon aerosols in GEOS-Chem.
Default value: true.
- brown_carbon
Activates (true) or deactivates (false) brown carbon aerosols in GEOS-Chem.
Default value: false.
- enhance_black_carbon_absorption
Options for enhancing the absorption of black carbon aerosols due to external coating.
- activate
Activates (true) or deactivates (false) black carbon absorption enhancement.
Default value: true.
- hydrophilic
Absorption enhancement factor for hydrophilic black carbon aerosol (species name BCPI).
Default value: 1.5
- hydrophobic
Absorption enhancement factor for hydrophobic black carbon aerosol (species name BCPO).
Default value: 1.0
Complex SOA
The aerosols:complex_SOA section contains settings for the complex SOA scheme used in GEOS-Chem.
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  # ... preceding sub-sections omitted ...
  complex_SOA:
    activate: true
    semivolatile_POA: false
  # ... following sub-sections omitted ...
- activate
Activates (true) or deactivates (false) the complex SOA scheme.
Default value:
- semivolatile_POA
Activates (true) or deactivates (false) the semi-volatile primary organic aerosol (POA) option.
Default value: false
Mineral dust aerosols
The aerosols:dust section contains settings for mineral dust aerosols.
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  # ... preceding sub-sections omitted ...
  dust:
    activate: true
    acid_uptake_on_dust: false
  # ... following sub-sections omitted ...
- activate
Activates (true) or deactivates (false) mineral dust aerosols in GEOS-Chem.
Default value: true
- acid_uptake_on_dust
Activates (true) or deactivates (false) the acid uptake on dust option, which includes 12 additional species.
Default value: false
Sea salt aerosols
The aerosols:sea_salt section contains settings for sea salt aerosols:
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  # ... preceding sub-sections omitted ...
  sea_salt:
    activate: true
    SALA_radius_bin_in_um: [0.01, 0.5]
    SALC_radius_bin_in_um: [0.5, 8.0]
    marine_organic_aerosols: false
  # ... following sub-sections omitted ...
- activate
Activates (true) or deactivates (false) sea salt aerosols.
Default value: true
- SALA_radius_bin_in_um
Specifies the lower and upper radius boundaries (in µm) for accumulation-mode sea salt aerosol (aka SALA).
Default value: 0.01 µm - 0.5 µm
- SALC_radius_bin_in_um
Specifies the lower and upper radius boundaries (in µm) for coarse-mode sea salt aerosol (aka SALC).
Default value: 0.5 µm - 8.0 µm
- marine_organic_aerosols
Activates (true) or deactivates (false) emission of marine primary organic aerosols. This option includes two extra species (MOPO and MOPI).
Default value: false
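The two radius bins partition sea salt aerosol by size. The Python sketch below shows how a particle radius would be assigned to SALA or SALC using the default bin edges, which are taken to be in micrometers as the _in_um suffix of the setting names indicates. Treating each bin as inclusive at its lower edge and exclusive at its upper edge is an assumption of this sketch, and the function name is hypothetical.

```python
# Default bin edges in micrometers, as in the configuration above.
SALA_RADIUS_BIN_UM = (0.01, 0.5)
SALC_RADIUS_BIN_UM = (0.5, 8.0)


def sea_salt_mode(radius_um):
    """Return 'SALA', 'SALC', or None if the radius falls outside both bins.

    Assumption: lower edge inclusive, upper edge exclusive.
    """
    bins = (("SALA", SALA_RADIUS_BIN_UM), ("SALC", SALC_RADIUS_BIN_UM))
    for name, (lower, upper) in bins:
        if lower <= radius_um < upper:
            return name
    return None


print(sea_salt_mode(0.1))   # SALA
print(sea_salt_mode(2.0))   # SALC
print(sea_salt_mode(10.0))  # None
```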
Stratospheric aerosols
The aerosols:stratosphere section contains settings for stratospheric aerosols.
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  # ... preceding sub-sections omitted ...
  stratosphere:
    settle_strat_aerosol: true
    polar_strat_clouds:
      activate: true
      het_chem: true
    allow_homogeneous_NAT: false
    NAT_supercooling_req_in_K: 3.0
    supersat_factor_req_for_ice_nucl: 1.2
    calc_strat_aod: true
  # ... following sub-sections omitted ...
- settle_strat_aerosol
Activates (true) or deactivates (false) gravitational settling of stratospheric solid particulate aerosols (SPA, trapezoidal scheme) and stratospheric liquid aerosols (SLA, corrected Stokes’ Law).
Default value: true
- polar_strat_clouds
Contains settings for how aerosols are handled in polar stratospheric clouds (PSC):
- activate
Activates (true) or deactivates (false) formation of polar stratospheric clouds.
Default value: true
- het_chem
Activates (true) or deactivates (false) heterogeneous chemistry within polar stratospheric clouds.
Default value: true
- allow_homogeneous_NAT
Activates (true) or deactivates (false) homogeneous formation of NAT from freezing of HNO3.
Default value: false
- NAT_supercooling_req_in_K
Specifies the cooling (in K) required for homogeneous NAT nucleation.
Default value: 3.0
- supersat_factor_req_for_ice_nucl
Specifies the supersaturation factor required for ice nucleation.
Recommended values: 1.2 for coarse grids; 1.5 for fine grids.
- calc_strat_aod
Includes (true) or excludes (false) online stratospheric aerosols in extinction calculations for photolysis.
Default value: true
Sulfate aerosols
The aerosols:sulfate section contains settings for sulfate aerosols:
#============================================================================
# Settings for GEOS-Chem aerosols
#============================================================================
aerosols:
  # ... preceding sub-sections omitted ...
  sulfate:
    activate: true
    metal_cat_SO2_oxidation: true
- activate
Activates (true) or deactivates (false) sulfate aerosols.
Default value: true
- metal_cat_SO2_oxidation
Activates (true) or deactivates (false) the metal catalyzed oxidation of SO2.
Default value: true
Extra diagnostics
The extra_diagnostics section contains settings for GEOS-Chem Classic diagnostics that are not archived by History or HEMCO:
Obspack diagnostic
The extra_diagnostics:obspack section contains settings for the ObsPack diagnostic:
#============================================================================
# Settings for diagnostics (other than HISTORY and HEMCO)
#============================================================================
extra_diagnostics:
  obspack:
    activate: false
    quiet_logfile_output: false
    input_file: ./obspack_co2_1_OCO2MIP_2018-11-28.YYYYMMDD.nc
    output_file: ./OutputDir/GEOSChem.ObsPack.YYYYMMDD_hhmmz.nc4
    output_species:
      - CO
      - 'NO'
      - O3
  # ... following sub-sections omitted ...
- activate
Activates (true) or deactivates (false) ObsPack diagnostic output.
Default value: true
- quiet_logfile_output
Deactivates (true) or activates (false) printing informational output to stdout (i.e. the screen or log file).
Default value: false
- input_file
Specifies the path to an ObsPack data file (in netCDF format).
- output_file
Specifies the path to the ObsPack diagnostic output file. This will be a file that contains data at the same locations as specified in input_file.
- output_species
A list of GEOS-Chem species (as a YAML sequence) to archive to the output file.
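The input_file and output_file paths shown above contain date and time tokens (YYYYMMDD and hhmm). The Python sketch below illustrates one plausible reading, namely that the tokens are replaced by the corresponding date and time. This is an assumption made for illustration, not a description of how GEOS-Chem performs the substitution. The function name and the example date are hypothetical.

```python
from datetime import datetime


def expand_date_tokens(template, when):
    """Replace YYYYMMDD and hhmm tokens in a path template (assumed semantics)."""
    return (
        template
        .replace("YYYYMMDD", when.strftime("%Y%m%d"))
        .replace("hhmm", when.strftime("%H%M"))
    )


when = datetime(2019, 7, 1, 0, 0)  # hypothetical simulation date
print(expand_date_tokens("./OutputDir/GEOSChem.ObsPack.YYYYMMDD_hhmmz.nc4", when))
# ./OutputDir/GEOSChem.ObsPack.20190701_0000z.nc4
```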
Planeflight diagnostic
The extra_diagnostics:planeflight section contains settings for the GEOS-Chem planeflight diagnostic:
#============================================================================
# Settings for diagnostics (other than HISTORY and HEMCO)
#============================================================================
extra_diagnostics:
  # ... preceding sub-sections omitted ...
  planeflight:
    activate: false
    flight_track_file: Planeflight.dat.YYYYMMDD
    output_file: plane.log.YYYYMMDD
  # ... following sub-sections omitted ...
- activate
Activates (true) or deactivates (false) the Planeflight diagnostic output.
Default value: false
- flight_track_file
Specifies the path to a flight track file. This file contains the coordinates of the plane as a function of time, as well as the requested quantities to archive.
- output_file
Specifies the path to the Planeflight output file. Requested quantities will be archived from GEOS-Chem along the flight track specified in flight_track_file.
Hg simulation options
This section of geoschem_config.yml is included for the mercury (Hg) simulation:
Hg sources
The Hg_simulation_options:sources section contains settings for various mercury sources.
#============================================================================
# Settings specific to the Hg simulation
#============================================================================
Hg_simulation_options:
  sources:
    use_dynamic_ocean_Hg: false
    use_preindustrial_Hg: false
    use_arctic_river_Hg: true
  # ... following sub-sections omitted ...
- use_dynamic_ocean_Hg
Activates (true) or deactivates (false) the online slab ocean mercury model.
Default value: false
- use_preindustrial_Hg
Activates (true) or deactivates (false) the preindustrial mercury simulation. This will turn off all anthropogenic emissions.
Default value: false
- use_arctic_river_Hg
Activates (true) or deactivates (false) the source of mercury from Arctic rivers.
Default value: true
Hg chemistry
The Hg_simulation_options:chemistry section contains settings for mercury chemistry:
#============================================================================
# Settings specific to the Hg simulation
#============================================================================
Hg_simulation_options:
  # ... preceding sub-sections omitted ...
  chemistry:
    tie_HgIIaq_reduction_to_UVB: true
  # ... following sub-sections omitted ...
- tie_HgIIaq_reduction_to_UVB
Activates (true) or deactivates (false) linking the reduction of aqueous oxidized mercury to UVB radiation. A lifetime of -1 seconds indicates the species has an infinite lifetime.
Default value: true
Options for simulations with carbon gases
These sections of geoschem_config.yml are included for simulations with carbon gases (carbon, CH4, CO2, tagCO, tagCH4).
CH4 observational operators
The CH4_simulation_options:use_observational_operators section contains options for using satellite observational operators for CH4:
#============================================================================
# Settings specific to the CH4 simulation / Integrated Methane Inversion
#============================================================================
CH4_simulation_options:
  use_observational_operators:
    AIRS: false
    GOSAT: false
    TCCON: false
  # ... following sub-sections omitted ...
- AIRS
Activates (true) or deactivates (false) the AIRS observational operator.
Default value: false
- GOSAT
Activates (true) or deactivates (false) the GOSAT observational operator.
Default value: false
- TCCON
Activates (true) or deactivates (false) the TCCON observational operator.
Default value: false
CH4 analytical inversion options
The CH4_simulation_options:analytical_inversion section contains options for analytical inversions with the Integrated Methane Inversion workflow (aka IMI). The IMI will automatically modify several of these options based on the inversion parameters that you specify.
#============================================================================
# Settings specific to the CH4 simulation / Integrated Methane Inversion
#============================================================================
CH4_simulation_options:
  # ... preceding sub-sections omitted ...
  analytical_inversion:
    activate: false
    emission_perturbation: 1.0
    state_vector_element_number: 0
    use_emission_scale_factor: false
    use_OH_scale_factors: false
    perturb_CH4_boundary_conditions: false
    CH4_boundary_condition_ppb_increase_NSEW: [0.0, 0.0, 0.0, 0.0]
- activate
Activates (true) or deactivates (false) the analytical inversion.
Default value: false
- emission_perturbation
Specifies a unitless factor by which emissions for this state vector element will be perturbed.
Default value: 1.0 (no perturbation)
- state_vector_element_number
Specifies the element of the state vector corresponding to this simulation.
Default value: 0
- use_emission_scale_factor
Activates (true) or deactivates (false) scaling methane emissions by a fixed factor. This scale factor is specified in the HEMCO_Config.rc file.
Default value: false
- use_OH_scale_factors
Activates (true) or deactivates (false) perturbation of OH in analytical inversions of methane. The OH scale factors are specified in the OH_SF entry of the HEMCO_Config.rc file.
Default value: false
- perturb_CH4_boundary_conditions
Activates (true) or deactivates (false) perturbation of CH4 nested-grid boundary conditions in analytical inversions.
Default value: false
- CH4_boundary_condition_ppb_increase_NSEW
Specifies the perturbation amount (in ppbv) to apply to the north, south, east and west CH4 nested-grid boundary conditions. Used in conjunction with the perturb_CH4_boundary_conditions option.
Default value: [0.0, 0.0, 0.0, 0.0] (no perturbation)
CO2 Sources
The CO2_simulation_options:sources section contains toggles for activating sources of \(CO_2\):
#============================================================================
# Settings specific to the CO2 simulation
#============================================================================
CO2_simulation_options:
  sources:
    fossil_fuel_emissions: true
    ocean_exchange: true
    balanced_biosphere_exchange: true
    net_terrestrial_exchange: true
    ship_emissions: true
    aviation_emissions: true
    3D_chemical_oxidation_source: true
  # ... following sub-sections omitted ...
- fossil_fuel_emissions
Activates (true) or deactivates (false) using \(CO_2\) fossil fuel emissions as computed by HEMCO.
Default value: true
- ocean_exchange
Activates (true) or deactivates (false) \(CO_2\) ocean-air exchange.
Default value: true
- balanced_biosphere_exchange
Activates (true) or deactivates (false) \(CO_2\) balanced-biosphere exchange.
Default value: true
- net_terrestrial_exchange
Activates (true) or deactivates (false) \(CO_2\) net terrestrial exchange.
Default value: true
- ship_emissions
Activates (true) or deactivates (false) \(CO_2\) ship emissions as computed by HEMCO.
Default value: true
- aviation_emissions
Activates (true) or deactivates (false) \(CO_2\) aviation emissions as computed by HEMCO.
Default value: true
- 3D_chemical_oxidation_source
Activates (true) or deactivates (false) \(CO_2\) production by archived chemical oxidation, as read by HEMCO.
Default value: true
CO2 tagged species
The CO2_simulation_options:tagged_species section contains toggles for activating tagged \(CO_2\) species:
Attention
Tagged \(CO_2\) tracers should be customized by each user. The present configuration will not work for resolutions other than \(2.0^{\circ} {\times} 2.5^{\circ}\).
#============================================================================
# Settings specific to the CO2 simulation
#============================================================================
CO2_simulation_options:
  # ... preceding sub-sections omitted ...
  tagged_species:
    save_fossil_fuel_in_background: false
    tag_bio_and_ocean_CO2: false
    tag_land_fossil_fuel_CO2:
    tag_global_ship_CO2: false
    tag_global_aircraft_CO2: false
- save_fossil_fuel_in_background
Activates (true) or deactivates (false) saving the \(CO_2\) background.
Default value: false
- tag_bio_and_ocean_CO2
Activates (true) or deactivates (false) tagging of biosphere regions (28), ocean regions (11), and the rest of the world (ROW) as specified in the Regions_land.dat and Regions_ocean.dat files.
CO chemical sources
The tagged_CO_simulation_options section contains settings for the carbon simulation and tagged CO simulation.
#============================================================================
# Settings specific to the tagged CO simulation
#============================================================================
tagged_CO_simulation_options:
  use_fullchem_PCO_from_CH4: true
  use_fullchem_PCO_from_NMVOC: true
HEMCO_Config.rc
GEOS-Chem Classic relies on the Harmonized Emissions Component (aka HEMCO) for file I/O, regridding, and computing emissions fluxes. Settings for HEMCO can be updated in the HEMCO configuration file, which is named HEMCO_Config.rc.
The HEMCO online manual at hemco.readthedocs.io contains detailed instructions about the structure and contents of HEMCO_Config.rc, so we will not replicate that content in this Guide. Instead, we will provide a short summary with links to the relevant documentation.
General HEMCO settings
Define general simulation parameters in the Settings section of HEMCO_Config.rc. This includes data paths, global diagnostic options, and verbose output options.
###############################################################################
### BEGIN SECTION SETTINGS
###############################################################################
ROOT: /path/to/hemco/data/dir
METDIR: /path/to/hemco/met/dir
GCAPSCENARIO: not_used
GCAPVERTRES: 47
Logfile: *
DiagnFile: HEMCO_Diagn.rc
DiagnPrefix: ./OutputDir/HEMCO_diagnostics
DiagnFreq: 00000000 010000
Wildcard: *
Separator: /
Unit tolerance: 1
Negative values: 0
Only unitless scale factors: false
Verbose: false
VerboseOnCores: root # Accepted values: root all
### END SECTION SETTINGS ###
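The Settings section is a flat list of "Name: value" pairs, which makes it easy to inspect from a script. The Python sketch below is a minimal reader for such a block. It is not HEMCO's own parser, and it assumes that everything after a # character is a comment and that names never contain a colon. The function name is hypothetical.

```python
def parse_settings(text):
    """Parse 'Name: value' lines into a dict, ignoring '#' comments."""
    settings = {}
    for raw in text.splitlines():
        # Drop full-line comments and trailing comments alike.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


sample = """\
ROOT:                        /path/to/hemco/data/dir
Logfile:                     *
DiagnFreq:                   00000000 010000
Unit tolerance:              1
VerboseOnCores:              root       # Accepted values:   root all
### END SECTION SETTINGS ###
"""

settings = parse_settings(sample)
print(settings["ROOT"])            # /path/to/hemco/data/dir
print(settings["DiagnFreq"])       # 00000000 010000
print(settings["VerboseOnCores"])  # root
```

All values are kept as strings. Converting entries such as Unit tolerance to numbers is left to the caller.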
Extension switches
Turn individual emissions inventories on/off in the Extension Switches section of HEMCO_Config.rc. Emission inventories are specified as either Base Emissions (i.e. read from files on disk) or Extensions (i.e. computed using meteorological inputs).
###############################################################################
### BEGIN SECTION EXTENSION SWITCHES
###############################################################################
# ExtNr ExtName on/off Species Years avail.
0 Base : on *
# ----- MAIN SWITCHES ---------------------------------------------------------
--> EMISSIONS : true
--> METEOROLOGY : true
--> CHEMISTRY_INPUT : true
# ----- RESTART FIELDS --------------------------------------------------------
--> GC_RESTART : true
--> HEMCO_RESTART : true
# ----- NESTED GRID FIELDS ----------------------------------------------------
--> GC_BCs : false
# ----- REGIONAL INVENTORIES --------------------------------------------------
--> APEI : false # 1989-2014
--> NEI2016_MONMEAN : false # 2002-2020
--> DICE_Africa : false # 2013
# ----- GLOBAL INVENTORIES ----------------------------------------------------
--> CEDSv2 : true # 1750-2019
--> CEDS_GBDMAPS : false # 1970-2017
--> CEDS_GBDMAPS_byFuelType: false # 1970-2017
... etc ...
# -----------------------------------------------------------------------------
100 Custom : off -
101 SeaFlux : on DMS/ACET/ALD2/MENO3/ETNO3/MOH
102 ParaNOx : on NO/NO2/O3/HNO3
--> LUT data format : nc
--> LUT source dir : $ROOT/PARANOX/v2015-02
103 LightNOx : on NO
--> CDF table : $ROOT/LIGHTNOX/v2014-07/light_dist.ott2010.dat
104 SoilNOx : on NO
--> Use fertilizer NOx : true
... etc ...
### END SECTION EXTENSION SWITCHES ###
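To get a quick overview of which inventories are switched on, you could scan the "-->" lines of this section. The following is a minimal sketch that keeps only the true/false switches and ignores other "-->" options (such as file paths). The function name parse_switches and the sample text are assumptions made for illustration.

```python
def parse_switches(text):
    """Return {name: bool} for every '--> NAME : true/false' line."""
    switches = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("-->"):
            continue
        # Remove the arrow and any trailing "# ..." comment
        body = line[3:].split("#", 1)[0]
        name, sep, value = body.partition(":")
        value = value.strip().lower()
        # Keep only boolean switches; skip options such as paths or formats
        if sep and value in ("true", "false"):
            switches[name.strip()] = (value == "true")
    return switches


# Hypothetical excerpt modeled on the listing above
sample = """\
    --> APEI                   :       false    # 1989-2014
    --> CEDSv2                 :       true     # 1750-2019
    --> LUT data format        :       nc
"""

switches = parse_switches(sample)
enabled = sorted(name for name, is_on in switches.items() if is_on)
print(enabled)
```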
Base emissions
Note
You do not have to edit this section if you just wish to run GEOS-Chem Classic with its default emissions configuration.
Specify how emissions and other data sets will be read from disk in
the Base Emissions section of HEMCO_Config.rc.
###############################################################################
### BEGIN SECTION BASE EMISSIONS
###############################################################################
# ExtNrName sourceFile sourceVar sourceTime C/R/E SrcDim SrcUnit Species ScalIDs Cat Hier
(((EMISSIONS
#==============================================================================
# --- APEI (Canada) ---
#==============================================================================
(((APEI
0 APEI_NO $ROOT/APEI/v2016-11/APEI.0.1x0.1.nc NOx 1989-2014/1/1/0 RF xy kg/m2/s NO 25/1002/115 1 30
0 APEI_CO $ROOT/APEI/v2016-11/APEI.0.1x0.1.nc CO 1989-2014/1/1/0 RF xy kg/m2/s CO 26/52/1002 1 30
0 APEI_SOAP - - - - - - SOAP 26/52/1002/280 1 30
0 APEI_SO2 $ROOT/APEI/v2016-11/APEI.0.1x0.1.nc SOx 1989-2014/1/1/0 RF xy kg/m2/s SO2 60/1002 1 30
0 APEI_SO4 - - - - - - SO4 60/65/1002 1 30
0 APEI_pFe -
... etc ...
### END SECTION BASE EMISSIONS ###
Scale factors
Define scale factors for emissions inventories and other data sets in
the Scale Factors section of HEMCO_Config.rc.
#==============================================================================
# --- Scale factors used for species conversions ---
#==============================================================================
# Units carbon to species conversions
# Factor = MW species / (# carbon atoms * MW carbon)
40 CtoACET MATH:58.09/(3.0*12.0) - - - xy unitless 1
41 CtoALD2 MATH:44.06/(2.0*12.0) - - - xy unitless 1
42 CtoALK4 MATH:58.12/(4.3*12.0) - - - xy unitless 1
... etc ...
# VOC speciations
(((RCP_3PD.or.RCP_45.or.RCP_60.or.RCP_85
50 KET2MEK 0.25 - - - xy unitless 1
51 KET2ACET 0.75 - - - xy unitless 1
)))RCP_3PD.or.RCP_45.or.RCP_60.or.RCP_85
... etc ...
### END SECTION SCALE FACTORS ###
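The MATH: expressions in the carbon-to-species entries divide the molecular weight of the species by the number of carbon atoms times the molecular weight of carbon (12.0). As a worked example, the sketch below reproduces that arithmetic in Python using the values from the listing. The function name carbon_to_species is an assumption used only for illustration.

```python
MW_CARBON = 12.0  # g/mol, the value used in the MATH: expressions above


def carbon_to_species(mw_species, n_carbon):
    """Scale factor that converts a mass of carbon to a mass of the species."""
    return mw_species / (n_carbon * MW_CARBON)


# (scale factor name, species MW in g/mol, carbon atoms per molecule)
entries = [
    ("CtoACET", 58.09, 3.0),
    ("CtoALD2", 44.06, 2.0),
    ("CtoALK4", 58.12, 4.3),
]

for name, mw, n_carbon in entries:
    print(f"{name}: {carbon_to_species(mw, n_carbon):.4f}")
```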
Masks
Define masks for emissions and other data sets in the Masks section
of HEMCO_Config.rc.
###############################################################################
### BEGIN SECTION MASKS
###############################################################################
# ScalID Name sourceFile sourceVar sourceTime C/R/E SrcDim SrcUnit Oper Lon1/Lat1/Lon2/Lat2
(((EMISSIONS
#==============================================================================
# Country/region masks
#==============================================================================
(((APEI
1002 CANADA_MASK $ROOT/MASKS/v2018-09/Canada_mask.geos.1x1.nc MASK 2000/1/1/0 C xy 1 1 -141/40/-52/85
)))APEI
(((NEI2016_MONMEAN
1007 CONUS_MASK $ROOT/MASKS/v2018-09/CONUS_Mask.01x01.nc MASK 2000/1/1/0 C xy 1 1 -140/20/-50/60
)))NEI2016_MONMEAN
... etc ...
)))EMISSIONS
### END SECTION MASKS ###
### END OF HEMCO INPUT FILE ###
HEMCO_Diagn.rc
In your run directory, you will find a copy of the HEMCO diagnostic configuration file
(named HEMCO_Diagn.rc) corresponding to the
HEMCO_Config.rc file. You will only need to edit
this file if you wish to change the default diagnostic output configuration.
A snippet of the HEMCO_Diagn.rc file for the fullchem
simulation is shown below:
###############################################################################
##### ALD2 emissions #####
###############################################################################
EmisALD2_Total ALD2 -1 -1 -1 3 kg/m2/s ALD2_emission_flux_from_all_sectors
EmisALD2_Anthro ALD2 0 1 -1 3 kg/m2/s ALD2_emission_flux_from_anthropogenic
EmisALD2_BioBurn ALD2 111 -1 -1 2 kg/m2/s ALD2_emission_flux_from_biomass_burning
EmisALD2_Biogenic ALD2 0 4 -1 2 kg/m2/s ALD2_emission_flux_from_biogenic_sources
EmisALD2_Ocean ALD2 101 -1 -1 2 kg/m2/s ALD2_emission_flux_from_ocean
EmisALD2_PlantDecay ALD2 0 3 -1 2 kg/m2/s ALD2_emission_flux_from_decaying_plants
EmisALD2_Ship ALD2 0 10 -1 2 kg/m2/s ALD2_emission_flux_from_ships
Columns (from left to right):
netCDF variable name for the requested diagnostic quantity
Species name
Extension number (-1 means sum over all extensions)
Category (-1 means sum over all categories)
Hierarchy (-1 means sum over all hierarchies)
Dimension of data (1: scalar, 2: lon-lat, 3: lon-lat-lev)
Units
Value for the long_name netCDF variable attribute
The prefix (e.g. OutputDir/HEMCO_diagnostics) for HEMCO diagnostics
output files is specified in the
Settings section of the HEMCO_Config.rc file.
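Because each diagnostic entry consists of eight whitespace-separated columns, a line can be split into named fields quite easily. The sketch below is one possible way to do this. The DiagnEntry tuple and the parse_diagn_line function are hypothetical names, and the code assumes that no column contains embedded spaces.

```python
from collections import namedtuple

# Field names follow the column descriptions given above
DiagnEntry = namedtuple(
    "DiagnEntry",
    "name species ext_nr category hierarchy dim units long_name",
)


def parse_diagn_line(line):
    """Split one HEMCO_Diagn.rc entry into its eight columns."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    name, species, ext_nr, category, hierarchy, dim, units, long_name = fields
    return DiagnEntry(
        name, species, int(ext_nr), int(category), int(hierarchy),
        int(dim), units, long_name,
    )


entry = parse_diagn_line(
    "EmisALD2_Ocean ALD2 101 -1 -1 2 kg/m2/s ALD2_emission_flux_from_ocean"
)
print(entry.name, entry.ext_nr, entry.dim)
```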
HISTORY.rc
You can specify which GEOS-Chem Classic diagnostic outputs you would
like to archive with the HISTORY.rc configuration file.
Sample HISTORY.rc diagnostic input file
A simplified HISTORY.rc file is shown below.
#============================================================================
# EXPID allows you to specify the beginning of the file path corresponding
# to each diagnostic collection. For example:
#
# EXPID: ./GEOSChem
# Will create netCDF files whose names begin "GEOSChem",
# in this run directory.
#
# EXPID: ./OutputDir/GEOSChem
# Will create netCDF files whose names begin with "GEOSChem"
# in the OutputDir sub-folder of this run directory.
#
#============================================================================
EXPID: ./OutputDir/GEOSChem
#==============================================================================
# %%%%% COLLECTION NAME DECLARATIONS %%%%%
#
# To disable a collection, place a "#" character in front of its name
#
# NOTE: These are the "default" collections for GEOS-Chem.
# But you can create your own custom diagnostic collections as well.
#==============================================================================
COLLECTIONS: 'SpeciesConc' ,
'SpeciesConcSubset' ,
'ConcAfterChem' ,
::
#==============================================================================
# %%%%% THE SpeciesConc COLLECTION %%%%%
#
# GEOS-Chem species concentrations (default = advected species)
#
# Available for all simulations
#==============================================================================
SpeciesConc.template: '%y4%m2%d2_%h2%n2z.nc4' ,
SpeciesConc.frequency: 00000000 060000 ,
SpeciesConc.duration: 00000001 000000 ,
SpeciesConc.mode: 'instantaneous' ,
SpeciesConc.fields: 'SpeciesConcVV_?ADV?'
'SpeciesConcMND_?ADV?'
::
#==============================================================================
# %%%%% THE SpeciesConcSubset COLLECTION %%%%%
#
# Same as the SpeciesConc collection, but will subset data in the horizontal
# and vertical dimensions so that the netCDF diagnostic files will cover
# a smaller region of the globe. This can save disk space and memory.
#
# NOTE: This capability will be available in GEOS-Chem "Classic" 12.5.0
# and later versions.
#
# Available for all simulations
#==============================================================================
SpeciesConcSubset.template: '%y4%m2%d2_%h2%n2z.nc4',
SpeciesConcSubset.frequency: 060000,
SpeciesConcSubset.duration: 00000001 000000,
SpeciesConcSubset.mode: 'instantaneous',
SpeciesConcSubset.LON_RANGE: -40.0 60.0,
SpeciesConcSubset.LAT_RANGE: -10.0 50.0,
SpeciesConcSubset.levels: 1 2 3 4 5,
SpeciesConcSubset.fields: 'SpeciesConcVV_?ADV?',
'SpeciesConcMND_?ADV?',
::
#==============================================================================
# %%%%% THE ConcAfterChem COLLECTION %%%%%
#
# Concentrations of OH, HO2, O1D, O3P immediately after exiting the KPP solver
# or OH after the CH4 specialty-simulation chemistry routine.
#
# OH: Available for all full-chemistry simulations and CH4 specialty sim
# HO2: Available for all full-chemistry simulations
# O1D, O3P: Available for full-chemistry simulations using UCX mechanism
#==============================================================================
ConcAfterChem.template: '%y4%m2%d2_%h2%n2z.nc4',
ConcAfterChem.frequency: 00000100 000000,
ConcAfterChem.duration: 00000100 000000,
ConcAfterChem.mode: 'time-averaged',
ConcAfterChem.fields: 'OHconcAfterChem',
'HO2concAfterChem',
'O1DconcAfterChem',
'O3PconcAfterChem',
::
In this HISTORY.rc file, we are requesting three collections
(SpeciesConc, SpeciesConcSubset, and ConcAfterChem). Each collection
represents a set of netCDF files that will contain the same diagnostic fields.
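If you would like to check from a script which collections are currently active, the sketch below reads the COLLECTIONS block and skips any names that have been commented out. It assumes one collection name per line, as in the sample file above. The function name active_collections is an illustrative assumption.

```python
import re


def active_collections(text):
    """Return the collection names that are not commented out."""
    names = []
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("COLLECTIONS:"):
            in_block = True
            line = line[len("COLLECTIONS:"):].strip()
        if not in_block:
            continue
        if line.startswith("::"):
            break  # end of the COLLECTIONS block
        if line.startswith("#"):
            continue  # this collection has been disabled
        names.extend(re.findall(r"'([^']+)'", line))
    return names


# Hypothetical example with one collection disabled
sample = """\
COLLECTIONS: 'SpeciesConc',
             #'SpeciesConcSubset',
             'ConcAfterChem',
::
"""

collections = active_collections(sample)
print(collections)
```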
Legend
- COLLECTIONS:
List of diagnostic collections in the HISTORY.rc file.
To turn off a collection, place a comment character (#) before its name. For example:
COLLECTIONS: #'SpeciesConc',
             'SpeciesConcSubset',
             'ConcAfterChem',
turns off the SpeciesConc collection.
- <collection-name>.template
Determines the date and time format of the netCDF file names that will be created for diagnostic collection <collection-name>.
The string %y4%m2%d2_%h2%n2z.nc4 will append YYYYMMDD_hhmmz.nc4 to the end of each netCDF filename, where:
YYYYMMDD is the date in year/month/day format.
hhmm is the time in hour:minutes format.
z denotes “Zulu”, which is an abbreviation for UTC time.
.nc4 denotes that the data file is in the netCDF-4 format.
- <collection-name>.frequency
Determines how often the diagnostic fields belonging to collection <collection-name> will be written to a netCDF file. For example:
010000 schedules diagnostic output each hour.
00000100 000000 schedules diagnostic output each month.
00000001 000000 (or 240000) schedules diagnostic output each day.
… etc. …
- <collection-name>.duration
Determines how often a new netCDF file will be created for collection <collection-name>. For example:
010000 creates a new netCDF file each hour.
00000100 000000 creates a new netCDF file each month.
00000001 000000 (or 240000) creates a new netCDF file each day.
- <collection-name>.mode
Determines the averaging method for collection <collection-name>. Accepted values are:
- instantaneous
Data will be archived as instantaneous “snapshots” at the frequency specified in <collection-name>.frequency.
- time-averaged
Data will be time-averaged over the interval specified in <collection-name>.frequency.
- <collection-name>.fields
A list of the diagnostic fields that will be included in collection <collection-name>.
A single underscore (_) denotes a species-based diagnostic field. To request output for a single species, list the species name immediately after the underscore, such as:
SpeciesConc.fields: 'SpeciesConcVV_NO'
                    'SpeciesConcVV_O3'
                    'SpeciesConcVV_CO'
                    ... etc ...
::
You may also use a wildcard such as ?ADV?, which requests all advected species:
SpeciesConc.fields: 'SpeciesConcVV_?ADV?'
                    ... etc ...
::
The complete wildcard list is shown below. Wildcards are case-insensitive.
?ADV?: Only the advected species
?AER?: Only the aerosol species
?ALL?: All GEOS-Chem species
?DRYALT?: Only the dry-deposited species whose concentrations we wish to archive at a given altitude above the surface. (In practice these are only O3 and HNO3.)
?DRY?: Only the dry-deposited species
?FIX?: Only the inactive (aka “fixed”) species in the KPP chemical mechanism
?GAS?: Only the gas-phase species
?HYG?: Only aerosols that undergo hygroscopic growth (sulfate, BC, OC, SALA, SALC)
?LOS?: Only chemical loss species or families
?KPP?: Only the KPP species
?PHO?: Only the photolyzed species
?VAR?: Only the active (aka “variable”) species in the KPP chemical mechanism
?WET?: Only the wet-deposited species
?PRD?: Only chemical production species or families
?DUSTBIN?: Only the dust bin number
?PHOTOBIN?: Number of a given wavelength bin for photolysis
To include fields from the State_Chm object in collection <collection-name>, precede the field name with Chem_:
'Chem_phCloud',
... etc ...
To include fields from the State_Met object in collection <collection-name>, precede the field name with Met_:
'Met_T',
'Met_PS',
'Met_SPHU',
... etc ...
Both Chem_ and Met_ specifiers are case-insensitive.
- <collection-name>.LON_RANGE
Optional. Restrict data fields of collection <collection-name> to the range min_lon, max_lon.
- <collection-name>.LAT_RANGE
Optional. Restrict data fields of collection <collection-name> to the range min_lat, max_lat.
- <collection-name>.levels
Optional. Restrict data fields of collection <collection-name> to the specified levels (e.g., 1,2,3,4,5 or 1-5).
- ::
Signifies the end of the definition section for collection <collection-name>.
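To illustrate the template, frequency, and duration entries in the legend above, the sketch below expands the %y4%m2%d2_%h2%n2z.nc4 template for a given date and splits a "YYYYMMDD hhmmss" interval into its components. The function names expand_template and parse_interval are assumptions made for this example; GEOS-Chem performs these steps internally in Fortran, so treat this only as a way to reason about the settings.

```python
from datetime import datetime


def expand_template(template, when):
    """Replace the date/time tokens of a HISTORY.rc template string."""
    values = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


def parse_interval(text):
    """Split 'YYYYMMDD hhmmss' (or just 'hhmmss') into named components."""
    parts = text.split()
    if len(parts) == 1:
        date, time = "00000000", parts[0]  # only the time part was given
    else:
        date, time = parts
    return {
        "years": int(date[0:4]),
        "months": int(date[4:6]),
        "days": int(date[6:8]),
        "hours": int(time[0:2]),
        "minutes": int(time[2:4]),
        "seconds": int(time[4:6]),
    }


suffix = expand_template("%y4%m2%d2_%h2%n2z.nc4", datetime(2019, 7, 1, 0, 0))
print(suffix)
print(parse_interval("00000100 000000"))
print(parse_interval("010000"))
```

Remember to strip the trailing comma from a HISTORY.rc value before passing it to a helper such as parse_interval.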
All of the above-mentioned files are included in your GEOS-Chem Classic run directory.
Please see our Customize simulations with research options Supplemental Guide to learn how you can customize your simulation by activating alternate science options.
Less-commonly updated configuration files
If you need to add or delete species, or to change the default photolysis and/or chemistry mechanism settings in your simulation, you’ll need to edit these configuration files:
species_database.yml
Note
You will only need to edit species_database.yml if you are
adding new species to a GEOS-Chem simulation.
The GEOS-Chem Species Database is a YAML file that contains a listing of metadata for each
species used by GEOS-Chem. The Species Database is included in your
run directory as file species_database.yml, a snippet of which
is shown below.
# GEOS-Chem Species Database
# Core species only (neglecting microphysics)
# NOTE: Anchors must be defined before any variables that reference them.
A3O2:
Formula: CH3CH2CH2OO
FullName: Primary peroxy radical from C3H8
Is_Gas: true
MW_g: 75.10
ACET:
DD_F0: 1.0
DD_Hstar: 1.0e+5
Formula: CH3C(O)CH3
FullName: Acetone
Henry_CR: 5500.0
Henry_K0: 2.74e+1
Is_Advected: true
Is_DryDep: true
Is_Gas: true
Is_Photolysis: true
MW_g: 58.09
... etc ...
AERI:
DD_DvzAerSnow: 0.03
DD_DvzMinVal: [0.01, 0.01]
DD_F0: 0.0
DD_Hstar: 0.0
Formula: I
FullName: Iodine on aerosol
Is_Advected: true
Is_Aerosol: true
Is_DryDep: true
Is_WetDep: true
MW_g: 126.90
WD_AerScavEff: 1.0
WD_KcScaleFac: [1.0, 0.5, 1.0]
WD_RainoutEff: [1.0, 0.0, 1.0]
WD_RainoutEff_Luo: [0.4, 0.0, 1.0]
... etc ...
Important
Species NO (nitric oxide) must be listed in
species_database.yml as 'NO':. This will prevent
YAML readers from misinterpreting it as no (meaning false).
Each species name begins in the first column of the file, followed by
a colon (:). Underneath the species name follows an indented block of
species properties in Property: Value format.
Some properties listed above are only applicable to gas-phase species, and others to aerosol species. But at the very least, each species should have the following properties defined:
Formula
FullName
MW_g
Either Is_Gas or Is_Aerosol
For more information about species properties, please see View GEOS-Chem species properties in the Supplemental Guides section.
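When adding a new species, it can be helpful to confirm that the minimum set of properties listed above is present. The sketch below performs that check on a species entry that has already been loaded into a Python dictionary (for example, by a YAML reader of your choice). The function name missing_properties and the second sample entry are hypothetical.

```python
REQUIRED = ("Formula", "FullName", "MW_g")


def missing_properties(props):
    """Return the minimum required properties that a species entry lacks."""
    missing = [key for key in REQUIRED if key not in props]
    # At least one of the two phase flags must be set to true
    if not (props.get("Is_Gas") or props.get("Is_Aerosol")):
        missing.append("Is_Gas or Is_Aerosol")
    return missing


# Entry copied from the snippet above
a3o2 = {
    "Formula": "CH3CH2CH2OO",
    "FullName": "Primary peroxy radical from C3H8",
    "Is_Gas": True,
    "MW_g": 75.10,
}

# Deliberately incomplete entry (hypothetical)
incomplete = {"Formula": "I"}

print(missing_properties(a3o2))
print(missing_properties(incomplete))
```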
Photolysis and chemistry configuration files
Note
You won’t need to edit these configuration files unless you are changing the photolysis and/or chemistry mechanisms used in your GEOS-Chem Classic simulation.
Photolysis configuration files
These are found in the ExtData/CHEM_INPUTS/CLOUD-J/ directory
structure. Cloud-J is used by default in GEOS-Chem to compute photolysis
rates. If you are using legacy FAST-JX instead, the configuration files are located
in the ExtData/CHEM_INPUTS/FAST-JX/ directory. See the configuration
file geoschem_config.yml for which subdirectory within these folders
you are configured to use. See the README in the data directory for information
about these files.
Chemical mechanism configuration files
GEOS-Chem Classic simulations use source code generated by The Kinetic PreProcessor. If you need to update the default chemistry mechanism, you will need to do the following steps:
Modify the relevant KPP configuration files (described below);
Run KPP to generate updated source code for GEOS-Chem Classic;
Compile GEOS-Chem Classic to create a new executable;
Start your GEOS-Chem Classic simulation.
Chemical mechanism configuration files are located in these folders:
- KPP/fullchem
Contains configuration files for the default “full-chemistry” mechanism (NOx + Ox + aerosols + Br + Cl + I).
fullchem.kpp: Main configuration file for the fullchem mechanism.
fullchem.eqn: List of species and reactions for the fullchem mechanism.
- KPP/carbon
Contains configuration files for the carbon gases mechanism (CH4-CO2-CO-OCS):
carbon.kpp: Main configuration file for the carbon mechanism.
carbon.eqn: List of species and reactions for the carbon mechanism.
- KPP/custom
Contains configuration files that you can edit if you need to create a custom mechanism. We recommend that you create the custom mechanism in this folder and leave KPP/fullchem and KPP/Hg untouched.
custom.kpp: Copy of fullchem.kpp.
custom.eqn: Copy of fullchem.eqn.
- KPP/Hg
Contains configuration files for the mercury chemistry mechanism:
Hg.kpp: Main configuration file for the Hg mechanism.
Hg.eqn: List of species and reactions for the Hg mechanism.
Please see the following references for more information about KPP:
The KPP user manual (kpp.readthedocs.io)
Supplemental Guide: Update chemical mechanisms with KPP
Download input data
In the following chapters, you will learn how to download input data for your GEOS-Chem simulation:
Note
If you are located at an institution with several GEOS-Chem users, the input data for GEOS-Chem Classic may have already been downloaded to a shared data directory on your system. If this is the case for you, feel free to skip ahead to the Run your simulation chapter.
Input data for GEOS-Chem Classic
GEOS-Chem Classic reads (via HEMCO) several data files from disk during a simulation. These can be grouped into the following categories:
Data portals
Input data files for GEOS-Chem can be downloaded from one of the following portals:
- WashU
The primary data portal for GEOS-Chem, geoschemdata.wustl.edu
Note
The WashU data portal may be unavailable at times due to regularly-scheduled maintenance periods. Please check the WashU IT status page for more information.
- Amazon
GEOS-Chem data on the Amazon cloud, s3://gcgrid
- Rochester
Portal for the GCAP 2.0 meteorological data files, atmos.earth.rochester.edu
Initial conditions input data
Initial conditions include:
Initial species concentrations (aka Restart files) used to start a GEOS-Chem simulation.
Download method | From portals |
---|---|
Run bashdatacatalog on the InitialConditions.csv file \(^1\) | |
Direct data download (FTP or wget) | |
Globus, use endpoint GEOS-Chem data (WashU) | |
\(^1\) We provide InitialConditions.csv files (for each
GEOS-Chem version since 13.0.0) at our input-data-catalogs GitHub repository.
Chemistry input data
Chemistry input data includes:
Quantum yields and cross sections for photolysis using either Cloud-J or legacy FAST-JX
Climatology data for Linoz
Boundary conditions for UCX stratospheric chemistry routines
Download method | From portals |
---|---|
Run bashdatacatalog on the ChemistryInputs.csv file \(^2\) | |
Direct data download (FTP or wget) | |
Globus, use endpoint GEOS-Chem data (WashU) | |
\(^2\) We provide ChemistryInputs.csv files (for each
GEOS-Chem version since 13.0.0) at our input-data-catalogs GitHub repository.
Emissions input data
Emissions input data includes the following data:
Emissions inventories
Input data for HEMCO Extensions
Input data for GEOS-Chem specialty simulations
Scale factors
Mask definitions
Surface boundary conditions
Leaf area indices
Land cover map
Download method | From portals |
---|---|
Run bashdatacatalog on the EmissionsInputs.csv file \(^3\) | |
Direct data download (FTP or wget) | |
Globus, use endpoint GEOS-Chem data (WashU) | |
\(^3\) We provide EmissionsInputs.csv files (for each
GEOS-Chem version since 13.0.0) at our input-data-catalogs GitHub repository.
Meteorology input data
As described previously, GEOS-Chem Classic can be driven by the following meteorology products:
Download method | From portals |
---|---|
Run bashdatacatalog on the MeteorologyInputs.csv file \(^4\) | |
Direct data download (FTP or wget) | |
Globus, use endpoint GEOS-Chem data (WashU) | |
\(^4\) We provide a MeteorologyInputs.csv file at our
input-data-catalogs GitHub repository.
Restart files
In the following chapters, you will learn about restart files and how they are used.
What is a restart file?
Restart files contain the initial conditions (cf. Initial conditions input data) for a GEOS-Chem simulation. GEOS-Chem simulations need two restart files.
- GEOSChem.Restart.YYYYMMDD_hhmmz.nc4
Format: netCDF
Description: The GEOS-Chem Classic restart file. Contains species concentrations that are read at simulation startup.
GEOS-Chem writes a restart file at the end of each simulation. This allows a long simulation to be split into several individual run stages. For example, the restart file that was created at 00:00 UTC on August 1, 2019 is named GEOSChem.Restart.20190801_0000z.nc4. The z character indicates “Zulu” time (aka UTC).
GEOS-Chem restart files are created in the Restarts/ folder of your GEOS-Chem run directory.
- HEMCO_restart.YYYYMMDDhhmm.nc
Format: netCDF
Description: The HEMCO restart file. HEMCO archives certain quantities (mostly pertaining to soil NOx and biogenic emissions) in order to facilitate long GEOS-Chem simulations with several run stages.
HEMCO restart files are created in the Restarts/ folder of your GEOS-Chem run directory.
When you run a GEOS-Chem simulation, it will write new GEOS-Chem restart files at the intervals you specify in HISTORY.rc. New HEMCO restart files are written at the frequency configured in HEMCO_Config.rc.
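The two naming patterns described above can be reproduced with a few lines of Python, which may be handy when scripting multi-stage runs. This is only a sketch; the function names gc_restart_name and hemco_restart_name are assumptions made for illustration.

```python
from datetime import datetime


def gc_restart_name(when):
    """Build a name of the form GEOSChem.Restart.YYYYMMDD_hhmmz.nc4."""
    return f"GEOSChem.Restart.{when:%Y%m%d_%H%M}z.nc4"


def hemco_restart_name(when):
    """Build a name of the form HEMCO_restart.YYYYMMDDhhmm.nc."""
    return f"HEMCO_restart.{when:%Y%m%d%H%M}.nc"


stamp = datetime(2019, 8, 1, 0, 0)
print(gc_restart_name(stamp))
print(hemco_restart_name(stamp))
```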
Viewing and manipulating restart files
Please see the following sections of our Supplemental Guide: Work with netCDF files for more information on how you can view and manipulate data in restart files:
GEOS-Chem restart files
How are restart files read into GEOS-Chem?
GEOS-Chem restart files are read via HEMCO. The entries listed below have been
added to HEMCO_Config.rc (and may vary slightly for different
simulation types). These fields are obtained from HEMCO and copied to
the appropriate State_Chm and State_Met fields in
routine Get_GC_Restart (located in GeosCore/hcoi_gc_main_mod.F90).
#==============================================================================
# --- GEOS-Chem restart file ---
#==============================================================================
(((GC_RESTART
* SPC_ ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 SpeciesRst_?ALL? $YYYY/$MM/$DD/$HH EFYO xyz 1 * - 1 1
* DELPDRY ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Met_DELPDRY $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* KPP_HVALUE ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_KPPHvalue $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* WETDEP_N ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_WetDepNitrogen $YYYY/$MM/$DD/$HH EY xy 1 * - 1 1
* DRYDEP_N ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_DryDepNitrogen $YYYY/$MM/$DD/$HH EY xy 1 * - 1 1
* SO2_AFTERCHEM ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_SO2AfterChem $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* H2O2_AFTERCHEM ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_H2O2AfterChem $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* AEROH2O_SNA ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_AeroH2OSNA $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* ORVCSESQ ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_ORVCSESQ $YYYY/$MM/$DD/$HH EY xyz 1 * - 1 1
* JOH ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_JOH $YYYY/$MM/$DD/$HH EY xy 1 * - 1 1
* JNO2 ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_JNO2 $YYYY/$MM/$DD/$HH EY xy 1 * - 1 1
* STATE_PSC ./Restarts/GEOSChem.Restart.$YYYY$MM$DD_$HH$MNz.nc4 Chem_StatePSC $YYYY/$MM/$DD/$HH EY xyz count * - 1 1
)))GC_RESTART
GEOS-Chem species (the SPC_ entry) use HEMCO time cycle flag
EFYO by default. Other restart file fields use the time cycle flag
EY. These are explained below.
- E
Exact: Stops with an error if the date of the simulation is different than the file timestamp.
- F
Forced: Stops with an error if the file isn’t found.
- Y
Simulation Year: Only reads the data for the simulation year but not for other years.
- O
Once: Does not keep cycling in time but only reads the file once.
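As a reading aid, the four letters defined above can be looked up one character at a time. The sketch below does this for a flag string such as EFYO. The table and the function name describe_flags are illustrative assumptions, and only the four letters defined above are covered.

```python
# Meanings taken from the definitions above
FLAG_MEANINGS = {
    "E": "Exact: stop if the simulation date differs from the file timestamp",
    "F": "Forced: stop if the file is not found",
    "Y": "Simulation Year: read data for the simulation year only",
    "O": "Once: read the file once instead of cycling in time",
}


def describe_flags(flags):
    """Return one description per letter of a time cycle flag string."""
    unknown = [letter for letter in flags if letter not in FLAG_MEANINGS]
    if unknown:
        raise ValueError(f"letters not covered by this sketch: {unknown}")
    return [FLAG_MEANINGS[letter] for letter in flags]


for description in describe_flags("EFYO"):
    print(description)
```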
When reading the species concentrations (time cycle flag: EFYO)
from the restart file, HEMCO will cause your simulation
to stop with an error if:
The restart file is missing, or;
Any species is not found in the restart file, or;
The date in the restart file (which is usually 20190101 or 20190701, depending on your simulation) differs from the start date listed in geoschem_config.yml.
When reading other restart file fields (time cycle flag: EY), HEMCO will
cause your simulation to stop with an error if:
The restart file is missing, or
The date in the restart file (which is usually 20190101 or 20190701, depending on your simulation) differs from the start date listed in geoschem_config.yml.
Attention
If you wish to spin up a GEOS-Chem simulation with a restart file that has (1) missing species or (2) a timestamp that does not match the start date in geoschem_config.yml, simply change the time cycle flag from
* SPC_ ... $YYYY/$MM/$DD/$HH EFYO xyz 1 * - 1 1
to
* SPC_ ... $YYYY/$MM/$DD/$HH CYS xyz 1 * - 1 1
This will direct HEMCO to read the closest date
available (C), to use the simulation year (Y), and to skip
any species (S) not found in the restart file.
Each skipped species will be assigned the initial concentration
(units: \(mol\ mol^{-1}\) w/r/t dry air) specified by its
BackgroundVV entry in species_database.yml. If the species does not have a
BackgroundVV value specified, then its initial
concentration will be set to \(1.0{\times}10^{-20}\) instead.
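The fallback order for a skipped species (restart file value, then BackgroundVV, then \(1.0{\times}10^{-20}\)) can be summarized in a few lines. The sketch below uses plain dictionaries in place of the restart file and the species database. The function name initial_concentration and all sample values are hypothetical.

```python
DEFAULT_VV = 1.0e-20  # mol/mol dry air, used when no BackgroundVV is given


def initial_concentration(species, restart, species_db):
    """Pick the initial mixing ratio following the fallback order above."""
    if species in restart:
        return restart[species]
    # Species skipped in the restart file: use BackgroundVV if it is defined
    return species_db.get(species, {}).get("BackgroundVV", DEFAULT_VV)


# Hypothetical values, for illustration only
restart = {"O3": 3.0e-8}
species_db = {"CO2": {"BackgroundVV": 3.55e-4}, "pFe": {}}

for name in ("O3", "CO2", "pFe"):
    print(name, initial_concentration(name, restart, species_db))
```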
How can I determine the date of a restart file?
To determine the date of a netCDF restart file, you may use ncdump. For example:
ncdump -v time -t GEOSChem.Restart.YYYYMMDD_hhmmz.nc4
The -t option will return the time values as human-readable
date-time strings rather than numerical values in units such as "hours
since 1985-1-1 00:00:0.0".
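If you only have the numerical time value, you can convert it yourself. The sketch below assumes that the units are "hours since 1985-1-1 00:00:0.0", as in the example above; check the units attribute of the time variable in your own file, since it may differ. The function name decode_hours_since is an assumption.

```python
from datetime import datetime, timedelta


def decode_hours_since(value, reference=datetime(1985, 1, 1)):
    """Convert 'hours since <reference>' into a datetime object."""
    return reference + timedelta(hours=value)


# Round trip: encode a known date as hours, then decode it again
target = datetime(2019, 7, 1)
hours = (target - datetime(1985, 1, 1)).total_seconds() / 3600.0
print(hours, decode_hours_since(hours))
```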
Where can I get a restart file for my simulation?
GEOS-Chem Classic run directories are configured to use sample GEOS-Chem restart files in netCDF format. These files are available for download at: http://geoschemdata.wustl.edu/ExtData/GEOSCHEM_RESTARTS/.
Tip
We recommend that you download restart files to your disk space with either a dry-run simulation or with the bashdatacatalog. This will ensure that the proper files will be downloaded.
If you have the ExtData/GEOSCHEM_RESTARTS folder in your
GEOS-Chem data paths, then a sample restart file will be copied to
your run directory when you generate a new GEOS-Chem Classic run
directory.
Monthly GEOS-Chem restart files from the GEOS-Chem 14.0.0 10-year benchmark may be found at http://ftp.as.harvard.edu/gcgrid/geos-chem/10yr_benchmarks/14.0.0/GCClassic/restarts.
Attention
The sample restart files do not reflect the actual atmospheric state and should only be used to “spin up” the model. In other words, they should be used as initial values in an initialization simulation to generate more accurate initial conditions for your production runs.
For how long should I spin up before starting a production simulation?
Doing a 6-month spin up is usually sufficient for full-chemistry simulations. We recommend ten years for ozone, carbon dioxide, and methane simulations, and four years for radon-lead-beryllium simulations. If you are in doubt about how long your spin up should be for your simulation, we recommend contacting the GEOS-Chem Working Group that specializes in your area of research.
You may spin up the model starting at any year for which there is met data, but you should always start your simulations at the month and day corresponding to the restart file to more accurately capture seasonal variation. If you want to start your production run at a specific date, we recommend doing a spin up for the appropriate number of years plus the number of days needed to reach your ultimate start date. For example, if you want to do a production simulation starting on 2019/12/01, you could spin up the model for one year using the initial GEOS-FP restart file dated 2019/07/01 and then use the new restart file to spin up the model for five additional months, from 2019/07/01 to 2019/12/01.
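The date arithmetic in the example above can be sketched as follows. The code assumes that the production start date falls later in the same calendar year as the restart file date, and that the restart date is not February 29. The function name spinup_stages is hypothetical.

```python
from datetime import date


def spinup_stages(restart_date, production_start, years):
    """Return (start, end) pairs for the whole-year and partial-year stages."""
    # Stage 1: run a whole number of years from the restart file date
    stage1_end = restart_date.replace(year=restart_date.year + years)
    # Stage 2: reuse the restart month/day, then run up to the production start
    return [(restart_date, stage1_end), (restart_date, production_start)]


stages = spinup_stages(date(2019, 7, 1), date(2019, 12, 1), years=1)
for start, end in stages:
    print(start, "->", end, f"({(end - start).days} days)")
```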
See also this discussion on our Github page for further guidance: https://github.com/geoschem/geos-chem/discussions/911.
How do I check my initial conditions?
To ensure you are using the expected initial conditions for your simulation, please check the GEOS-Chem log file. You should see something like:
HEMCO: Opening ./Restarts/GEOSChem.Restart.20190701_0000z.nc4
- Found all CN met fields for 2011/01/01 00:00
- Found all A1 met fields for 2019/07/01 00:30
- Found all A3cld met fields for 2019/07/01 01:30
- Found all A3dyn met fields for 2019/07/01 01:30
- Found all A3mstC met fields for 2019/07/01 01:30
- Found all A3mstE met fields for 2019/07/01 01:30
- Found all I3 met fields for 2019/07/01 00:00
Initialize DELP_DRY from restart file
- Found all I3 met fields for 2019/07/01 03:00
===============================================================================
R E S T A R T F I L E I N P U T
Min and Max of each species in restart file [mol/mol]:
Species 1, ACET: Min = 1.000458833E-22 Max = 6.680149323E-09
Species 2, ACTA: Min = 6.574137699E-23 Max = 6.108235029E-10
Species 3, AERI: Min = 4.122849756E-16 Max = 1.213838925E-11
Species 4, ALD2: Min = 4.186668786E-23 Max = 4.571487633E-09
...
If a species is not found in the restart file, you may see something like:
Species 178, pFe: Use background = 9.999999683E-21
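To list every species that fell back to a background value, you could search the log file for lines of this form. The sketch below uses a regular expression modeled on the sample lines shown above; the exact log format may differ between GEOS-Chem versions, so treat the pattern as an assumption. The function name species_using_background is hypothetical.

```python
import re

# Pattern modeled on: "Species 178, pFe: Use background = 9.999999683E-21"
BACKGROUND = re.compile(
    r"Species\s+\d+,\s*(\S+):\s*Use background\s*=\s*(\S+)"
)


def species_using_background(log_text):
    """Map species name to background value for species missing from the restart file."""
    return {
        match.group(1): float(match.group(2))
        for match in BACKGROUND.finditer(log_text)
    }


sample_log = """\
Species   1,     ACET: Min = 1.000458833E-22  Max = 6.680149323E-09
Species 178,      pFe: Use background = 9.999999683E-21
"""

found = species_using_background(sample_log)
print(found)
```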
How are GEOS-Chem restart files written?
GEOS-Chem restart files are now saved via the History component. A Restart collection has been defined in HISTORY.rc and fields saved out to the restart file can be modified in that file.
For more information, please see our documentation about the Restart collection in GEOS-Chem History diagnostics. This documentation is currently on the GEOS-Chem wiki, but will be ported to ReadTheDocs in the near future.
HEMCO restart files
In this chapter, you will learn more about HEMCO restart files.
Do I need a HEMCO restart file for my initial spin-up run?
Using a HEMCO restart file for your initial spin up run is
optional. The HEMCO restart file contains fields for initializing
variables required for Soil NOx emissions, MEGAN biogenic emissions,
and the UCX chemistry quantities. The HEMCO restart file that comes
with a run directory may only be used for the date and time indicated
in the filename. HEMCO will automatically recognize when a restart
file is not available for the date and time required, and in that case
HEMCO will use default values to initialize those fields. You can also
force HEMCO to use the default initialization values by setting
HEMCO_RESTART to false in HEMCO_Config.rc.
For more information
Please see the HEMCO diagnostics (at hemco.readthedocs.io) for more information about restart files and other diagnostic outputs from HEMCO.
Download data with a dry-run simulation
Tip
If you are located at an institution with many other GEOS-Chem users, then the necessary input data might have already been downloaded and stored in a common directory on your system. Ask your sysadmin or IT support staff.
Another way to download and manage GEOS-Chem input data is with the bashdatacatalog tool.
A “dry-run” is a GEOS-Chem Classic simulation that steps through time, but does not perform computations or read data files from disk. Instead, a dry-run simulation prints a list of all data files that a regular GEOS-Chem simulation would have read. The dry-run output also denotes whether each data file was found on disk, or if it is missing. This output can be fed to a script which will download the missing data files to your computer system.
You may generate dry-run output for any of the GEOS-Chem Classic
simulation types (fullchem
, CH4
,
TransportTracers
, etc.)
In the following chapters, you will learn how to you can download data using the output from a dry-run simulation:
Execute a dry-run simulation
Follow the steps below to perform a GEOS-Chem Classic dry-run simulation:
Tip
Also be sure to watch our video tutorial Using the updated dry-run capability in GEOS-Chem 13.2.1 and later versions at our GEOS-Chem Youtube Channel, which will guide you through these steps.
Complete preliminary setup
Make sure that you have done the following steps;
Then double-check these settings in the following configuration files:
- geoschem_config.yml
start_date: Set the start date and time for your simulation.
end_date: Set the end date and time for your simulation.
met_field: Check if the meteorology setting (GEOS-FP, MERRA2, GCAP2) is correct for your simulation.
root_data_dir: Make sure that the path to ExtData is correct.
- HISTORY.rc
Set the frequency and duration for the HISTORY diagnostic collections to be consistent with the settings in geoschem_config.yml.
- HEMCO_Config.rc
Check the Settings section to make sure that the diagnostic frequency DiagnFreq is set to the interval that you wish (e.g. Monthly, Daily, YYYYMMDD hhmmss, etc.).
Check the Extension Settings section to make sure all of the required emissions inventories and data sets for your simulation have been switched on.
Tip
You can reduce the amount of data that needs to be downloaded for your simulation by turning off inventories that you don’t need.
Run the executable with the --dryrun flag
Run the GEOS-Chem Classic executable file at the command line with the --dryrun command-line argument as shown below:
$ ./gcclassic --dryrun | tee log.dryrun
The tee command will send the output of the dry-run to the screen as well as to a file named log.dryrun.
The log.dryrun file will look somewhat like a regular GEOS-Chem log file but will also contain a list of data files and whether each file was found on disk or not. This information will be used by the download_data.py script in the next step.
You may use whatever name you like for the dry-run output log file (but we prefer log.dryrun). You will need this file to download data (see the next chapter).
Download data from dry-run output
Once you have successfully executed a GEOS-Chem dry-run, you can use the output from the dry-run (contained in the log.dryrun file) to download the data files that GEOS-Chem will need to perform the corresponding “production” simulation. You may download from one of several locations, which are described in the following sections.
Important
Before you use the download_data.py script, make sure to initialize a Mamba or Conda environment with the relevant command shown below:
$ mamba activate ENV-NAME # If using Mamba
$ conda activate ENV-NAME # If using Conda
Here ENV-NAME is the name of your environment.
Also make sure that you have installed the PyYAML module to your Mamba or Conda environment. PyYAML will allow the download_data.py script to read certain configurable settings from a YAML file in your run directory.
The Python environment for GCPy has all of the proper packages that you need to download data from a dry-run simulation. For more information, please see gcpy.readthedocs.io.
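One quick way to confirm that PyYAML is visible from the environment you just activated is to ask Python whether the module can be found. This is a minimal sketch using only the standard library; the helper name module_available is hypothetical. Note that PyYAML is imported under the module name yaml.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named top-level module can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyYAML is imported under the module name "yaml"
    if module_available("yaml"):
        print("PyYAML found")
    else:
        print("PyYAML missing: install it into your active environment")
```

Run this with the same python executable that will run download_data.py, so that the check reflects the correct environment.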
Choose a data portal
You can download input data from one of the following locations:
The geoschemdata.wustl.edu site (aka WashU)
If you are using GEOS-Chem on your institutional computer cluster, we recommend that you download data from the WashU (Washington University in St. Louis) site (http://geoschemdata.wustl.edu). This site, which is maintained by Randall Martin’s group at WashU, is the main data site for GEOS-Chem.
Tip
We have also set up a Globus endpoint named GEOS-Chem data (WashU) on the WashU site. If you need to download many years of data, it may be faster to use Globus (particularly if your home institution supports it).
The s3://gcgrid bucket (aka Amazon)
If you are running GEOS-Chem Classic on the Amazon Web Services cloud, you can quickly download the necessary data for your GEOS-Chem simulation from the s3://gcgrid bucket to the Elastic Block Store (EBS) volume attached to your cloud instance.
Navigate to your GEOS-Chem Classic run directory and type:
$ ./download_data.py log.dryrun amazon
This will start the data download process using the aws s3 cp commands, which should execute much more quickly than if you were to download the data from another location. It will also produce a log of unique data files.
Important
Copying data from s3://gcgrid to the EBS volume of an Amazon EC2 cloud instance is always free. But if you download data from s3://gcgrid to your own computer system, you will incur an egress fee. PROCEED WITH CAUTION!
The atmos.earth.rochester.edu site (aka Rochester)
The U. Rochester site (which is maintained by Lee Murray’s research group there) contains the GCAP 2.0 met field data. This met field data is useful if you wish to perform simulations stretching back into the preindustrial period, or running into the future.
To download data from the Rochester site, type:
$ ./download_data.py log.dryrun rochester
Run the download_data.py script on the dryrun log file
Navigate to your GEOS-Chem run directory where you executed the dry-run and type:
$ ./download_data.py log.dryrun washu
The download_data.py Python program is included in the GEOS-Chem run directory that you created. This Python program creates and executes a temporary bash script containing the appropriate wget commands to download the data files. (We have found that this is the fastest method.)
The download_data.py program will also generate a log of unique data files (i.e. with all duplicate listings removed), which looks similar to this:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! LIST OF (UNIQUE) FILES REQUIRED FOR THE SIMULATION
!!! Start Date : 20160701 000000
!!! End Date : 20160701 010000
!!! Simulation : standard
!!! Meteorology : GEOSFP
!!! Grid Resolution : 4.0x5.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
./GEOSChem.Restart.20160701_0000z.nc4 --> /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/GEOSCHEM_RESTARTS/v2018-11/initial_GEOSChem_rst.4x5_standard.nc
./HEMCO_Config.rc
./HEMCO_Diagn.rc
./HEMCO_restart.201607010000.nc
./HISTORY.rc
./input.geos
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/FJX_j2j.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/FJX_spec.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/dust.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/h2so4.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/jv_spec_mie.dat
... etc ...
The name of this “unique” log file will be the same as the log file with dry-run output, with .unique appended. In our above example, we passed log.dryrun to download_data.py, so the “unique” log file will be named log.dryrun.unique. This “unique” log file can be very useful for documentation purposes.
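If you want to use the “unique” log for documentation, a short script can sort its entries into groups. The sketch below assumes the layout of the sample listing above: banner lines start with !!!, run-directory files start with ./, and a --> separates a local name from the source file it is taken from. The function name parse_unique_log and the group labels are hypothetical, and the real file may contain entry styles not shown in the sample.

```python
from typing import Dict, List


def parse_unique_log(text: str) -> Dict[str, List[str]]:
    """Sort the entries of a log.dryrun.unique listing into three groups."""
    groups: Dict[str, List[str]] = {"run_dir": [], "data_dir": [], "linked": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "!!!" banner, and any "... etc ..." marker
        if not line or line.startswith("!!!") or line.startswith("..."):
            continue
        if "-->" in line:
            # "local name --> source file": keep the source path
            groups["linked"].append(line.split("-->", 1)[1].strip())
        elif line.startswith("./"):
            groups["run_dir"].append(line)
        else:
            groups["data_dir"].append(line)
    return groups


if __name__ == "__main__":
    with open("log.dryrun.unique") as handle:
        for name, paths in parse_unique_log(handle.read()).items():
            print(f"{name}: {len(paths)} file(s)")
```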
Skip download, but create log of unique files
If you wish to only produce the log of unique data files without downloading any data, then type the following command from within your GEOS-Chem run directory:
$ ./download_data.py log.dryrun --skip-download
or for short:
$ ./download_data.py log.dryrun --skip
This can be useful if you already have the necessary data downloaded to your system but wish to create the log of unique files for documentation purposes (such as for benchmark simulations, etc.)
Run your simulation
In the following chapters, you will learn how to run your GEOS-Chem Classic simulation on your computer system.
Create a run script
We recommend that you create a run script for your GEOS-Chem simulation. This is a bash script containing the commands to run GEOS-Chem.
A sample GEOS-Chem run script is provided for you in the GEOS-Chem Classic run directory. You can edit this script as necessary for your own computational system.
Navigate to your run directory. Then copy the runScriptSamples/geoschem.run sample run script into the run directory:
cp ./runScriptSamples/geoschem.run .
The geoschem.run script looks like this:
#!/bin/bash
#SBATCH -c 8
#SBATCH -N 1
#SBATCH -t 0-12:00
#SBATCH -p MYQUEUE
#SBATCH --mem=15000
#SBATCH --mail-type=END
###############################################################################
### Sample GEOS-Chem run script for SLURM
### You can increase the number of cores with -c and memory with --mem,
### particularly if you are running at very fine resolution (e.g. nested-grid)
###############################################################################
# Set the proper # of threads for OpenMP
# SLURM_CPUS_PER_TASK ensures this matches the number you set with -c above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Set the stacksize memory to the highest possible limit
ulimit -s unlimited
export OMP_STACKSIZE=500m
# Run GEOS-Chem. The "time" command will return CPU and wall times.
# Stdout and stderr will be directed to the "GC.log" log file
# (you can change the log file name below if you wish)
srun -c $OMP_NUM_THREADS time -p ./gcclassic > GC.log 2>&1
# Exit normally
exit 0
The sample run script contains commands for the SLURM scheduler, which is used on many HPC systems.
Note
If your computer system uses a different scheduler (such as LSF or PBS), then you can replace the SLURM-specific commands with commands for your scheduler. Ask your IT staff for more information.
Important commands in the run script are listed below:
- #SBATCH -c 8
Tells SLURM to request 8 computational cores.
- #SBATCH -N 1
Tells SLURM to request 1 computational node.
Important
GEOS-Chem Classic uses OpenMP, which is a shared-memory parallelization model. Using OpenMP limits GEOS-Chem Classic to one computational node.
- #SBATCH -t 0-12:00
Tells SLURM to request 12 hours of computational time. The format is D-hh:mm (days-hours:minutes).
- #SBATCH -p MYQUEUE
Tells SLURM to run GEOS-Chem Classic in the computational partition named MYQUEUE. Ask your IT staff for a list of the available partitions on your system.
- #SBATCH --mem=15000
Tells SLURM to reserve 15000 MB (15 GB) of memory for the simulation.
- #SBATCH --mail-type=END
Tells SLURM to send an email upon completion (successful or unsuccessful) of the simulation.
- export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Specifies how many computational cores GEOS-Chem Classic should use. The environment variable SLURM_CPUS_PER_TASK will fill in the number of cores requested (in this example, we used #SBATCH -c 8, which requests 8 cores).
- ulimit -s unlimited
Tells the bash shell to remove any restrictions on stack memory. This is the place in GEOS-Chem’s memory where temporary variables (including PRIVATE variables for OpenMP parallel loops) get created.
- export OMP_STACKSIZE=500m
Sets the stack size of each OpenMP thread to 500 MB, so that the GEOS-Chem executable has ample memory for allocating PRIVATE variables in OpenMP parallel loops.
- srun -c $OMP_NUM_THREADS
Tells SLURM to run the GEOS-Chem Classic executable using the number of cores specified in OMP_NUM_THREADS.
- time -p ./gcclassic > GC.log 2>&1
Runs the GEOS-Chem Classic executable and redirects the output (both stdout and stderr streams) to a file named GC.log. The time -p command will print the amount of time (both CPU time and wall time) that the simulation took to complete to the end of GC.log.
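When adapting the time request for your own runs, it can be handy to convert the D-hh:mm string into a duration, for example to compare it with the wall time reported at the end of GC.log. The following is a minimal sketch with a hypothetical helper name. It handles only the D-hh:mm form described above, not the other time formats that SLURM accepts.

```python
from datetime import timedelta


def parse_slurm_time(value: str) -> timedelta:
    """Convert a D-hh:mm string (as used with #SBATCH -t) to a timedelta."""
    days, clock = value.split("-", 1)
    hours, minutes = clock.split(":", 1)
    return timedelta(days=int(days), hours=int(hours), minutes=int(minutes))


if __name__ == "__main__":
    limit = parse_slurm_time("0-12:00")
    print(f"Requested wall time: {limit.total_seconds() / 3600:.1f} hours")
```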
Complete this pre-run checklist
Now that you have created a run script for GEOS-Chem Classic, take a moment to make sure that you have completed all required setup steps before running your simulation.
First-time setup
Each-time setup
Make sure that you have properly configured your login environment (i.e. load necessary software modules after login, etc.)
Create a GEOS-Chem Classic run directory, and make sure that it is correct for the simulation you wish to perform.
Attention
The initial restart file that is included with your run directory does not reflect the actual atmospheric state and should only be used to “spin-up” the model. We recommend a spin-up period of 6 months to 1 year (depending on the type of simulation you are using).
Edit configuration files to specify the runtime behavior of GEOS-Chem Classic.
Attention
Prior to running with GEOS-FP met fields, please be aware of several caveats regarding that data stream (cf. the GEOS-FP wiki page).
Configure and build the source code into an executable file.
Copy a sample GEOS-Chem Classic run script to your run directory and edit it for the particulars of your simulation and computer system.
Make sure that your run script contains the proper settings for OpenMP parallelization.
Be aware of ways in which you can speed up your GEOS-Chem Classic simulations.
Submit your run script to a scheduler
Many shared computer systems use a scheduler to determine in which order submitted jobs will run.
If your computer system uses the SLURM scheduler, then you can use the following command to submit your GEOS-Chem Classic run script to a computational queue:
sbatch geoschem.run
The SLURM scheduler will then decide when your job starts based on parameters such as current load on the system and past cluster usage (sometimes known as fairshare). If there is high demand on the cluster, your job may remain in pending state for a few hours (or sometimes days!) before it starts.
If your computer system uses a different scheduler (e.g. LSF, PBS, etc.) then ask your sysadmin or IT staff about the commands that are needed to submit jobs.
Or run the script from the command line
If your computer system does not use a scheduler, or if you are logged into an Amazon Web Services (AWS) cloud instance, then you can run GEOS-Chem Classic directly from the terminal command line.
Here is a sample run script for interactive use (geoschem-int.sh). It is similar to the run script shown previously, with a few edits:
#!/bin/bash
###############################################################################
### Sample GEOS-Chem run script for interactive use
###############################################################################
# Set the proper # of threads for OpenMP
export OMP_NUM_THREADS=8
# Max out stack memory available to GEOS-Chem
ulimit -s unlimited
export OMP_STACKSIZE=500m
# Run GEOS-Chem. The "time" command will return CPU and wall times.
# Stdout and stderr will be directed to the "GC.log" log file
# (you can change the log file name below if you wish)
time -p ./gcclassic > GC.log 2>&1
# Exit normally
exit 0
The modifications entail:
Removing the SLURM-specific commands (i.e. #SBATCH, $SLURM_CPUS_PER_TASK, and srun).
Manually specifying the number of cores that you wish GEOS-Chem to use (export OMP_NUM_THREADS=8).
Note
If you are logged into an AWS cloud instance, you can add
export OMP_NUM_THREADS=`ncpus`
to the run script. This will automatically set OMP_NUM_THREADS to the available number of cores.
To run GEOS-Chem interactively, type:
$ ./geoschem.run > GC.log 2>&1 &
This will run the job in the background. To monitor the progress of the job you can type:
$ tail -f GC.log
which will show the contents of the log file as they are being written.
Another way to view output from GEOS-Chem in real time is to use the tee command. This will print output to the screen and also send the same output to a log file. Type:
$ ./geoschem.run | tee GC.log
Verify a successful simulation
There are several ways to verify that your GEOS-Chem Classic run was successful:
The following output can be found at the end of the log file:
************** E N D O F G E O S -- C H E M **************
NetCDF files (e.g. OutputDir/GEOSChem*.nc4 and OutputDir/HEMCO*.nc) are present.
Restart files (e.g. GEOSChem.Restart.YYYYMMDD_hhmmz.nc4 and HEMCO_restart.YYYYMMDDhh.nc) for ending date YYYYMMDD hhmm are present.
Your scheduler log file (e.g. slurm-xxxxx.out, where xxxxx is the job id) is free of errors.
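The first two checks above can be automated with a short script. This is a minimal sketch with hypothetical function names: it looks for the end banner near the bottom of the log (ignoring asterisks and letter spacing, since the exact spacing may differ from what is shown here) and for at least one GEOSChem*.nc4 file in OutputDir. It assumes the log file is named GC.log, and it does not inspect restart files or the scheduler log.

```python
from pathlib import Path

# The banner text with asterisks and all whitespace removed
END_BANNER = "ENDOFGEOS--CHEM"


def log_has_end_banner(log_text: str) -> bool:
    """Return True if one of the last 20 log lines is the end banner."""
    for line in log_text.splitlines()[-20:]:
        # Drop asterisks and all whitespace so letter spacing does not matter
        squeezed = "".join(line.replace("*", "").split())
        if squeezed == END_BANNER:
            return True
    return False


def run_looks_successful(run_dir: str, log_name: str = "GC.log") -> bool:
    """Apply the banner check and the output-file check to a run directory."""
    root = Path(run_dir)
    log_file = root / log_name
    if not log_file.is_file():
        return False
    has_banner = log_has_end_banner(log_file.read_text(errors="replace"))
    has_output = any((root / "OutputDir").glob("GEOSChem*.nc4"))
    return has_banner and has_output


if __name__ == "__main__":
    print("Run looks successful:", run_looks_successful("."))
```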
If your run stopped with an error, please see the following resources:
Minimize differences in multi-stage runs
If you need to split up a very long simulation (e.g. 1 model year or more) into multiple stages, keep these guidelines in mind:
Make sure the GC_RESTART and HEMCO_RESTART options are set to true in HEMCO_Config.rc.
To ensure your restart files are read and species concentrations are properly initialized, check your GEOS-Chem log file for the following output:
===============================================================================
R E S T A R T F I L E I N P U T
Min and Max of each species in restart file [mol/mol]:
Species 1, NO: Min = 1.000000003E-30 Max = 1.560991691E-08
Species 2, O3: Min = 3.135925075E-09 Max = 9.816152669E-06
Species 3, PAN: Min = 3.435056848E-25 Max = 1.222619450E-09
...
Actual values may differ. If you see Use background = ... for most or all species, that suggests your restart file was not found. To avoid using the wrong restart file, make sure to use time cycle flag EY in HEMCO_Config.rc (cf. How are restart files read into GEOS-Chem?).
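When planning a multi-stage run, each stage should start at exactly the date and time where the previous stage ended, so that the restart files written by one stage are the ones read by the next. The sketch below generates such a chain of monthly stages. The function name monthly_stages is hypothetical, and the YYYYMMDD hhmmss date format is an assumption modeled on the sample log output in this guide. You would still need to copy each pair of dates into your configuration files yourself.

```python
from datetime import datetime
from typing import List, Tuple


def monthly_stages(start: datetime, end: datetime) -> List[Tuple[str, str]]:
    """Split [start, end) into stages that each stop at a month boundary."""
    stages = []
    current = start
    while current < end:
        # First instant of the month after "current"
        year = current.year + (current.month == 12)
        month = current.month % 12 + 1
        boundary = datetime(year, month, 1)
        stop = min(boundary, end)
        stages.append((f"{current:%Y%m%d %H%M%S}", f"{stop:%Y%m%d %H%M%S}"))
        # The next stage begins where this one stopped
        current = stop
    return stages


if __name__ == "__main__":
    for begin, finish in monthly_stages(datetime(2019, 7, 1), datetime(2020, 7, 1)):
        print(begin, "->", finish)
```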
Speed up a slow simulation
GEOS-Chem Classic performance is continuously monitored by the GEOS-Chem Support Team by means of benchmark simulations and ad-hoc timing tests. It has been shown that running GEOS-Chem with the recommended timesteps from Philip et al. [2016] can increase run times by approximately a factor of 2. To speed up GEOS-Chem Classic simulations, users may choose to use any of the following options.
Use coarser timesteps
As discussed previously, the default timesteps for GEOS-Chem Classic are 600 seconds for dynamics, and 1200 seconds for chemistry and emissions. You can experiment with using coarser timesteps (such as 1800 seconds for dynamics and 3600 seconds for emissions & chemistry).
Attention
For nested-grid simulations, you might not be able to use coarser timesteps, or else the Courant limit in transport will be violated.
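To get a feel for how much work coarser timesteps save, you can count how many times each operation runs per simulated day. The sketch below does this arithmetic for the two timestep pairs quoted above. The helper name is hypothetical, and the check that a timestep divides one day evenly is a sanity check of this sketch, not a documented GEOS-Chem rule.

```python
SECONDS_PER_DAY = 86400


def steps_per_day(timestep_s: int) -> int:
    """Number of timesteps needed to cover one simulated day."""
    # Sanity check used by this sketch only
    if SECONDS_PER_DAY % timestep_s != 0:
        raise ValueError("timestep should divide one day evenly")
    return SECONDS_PER_DAY // timestep_s


if __name__ == "__main__":
    # Timestep pairs quoted in the text: (dynamics, chemistry and emissions)
    settings = {"default": (600, 1200), "coarse": (1800, 3600)}
    for label, (dyn, chem) in settings.items():
        print(label, steps_per_day(dyn), "dynamics steps,",
              steps_per_day(chem), "chemistry steps per day")
```

With the coarser pair, both the dynamics and the chemistry run one third as often per simulated day. The actual speedup will be smaller than a factor of 3, because file I/O and other fixed costs do not scale with the timestep.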
Turn off unwanted diagnostics
Several diagnostics are turned on by default in the HISTORY.rc configuration file. The more diagnostics that are turned on, the more I/O operations need to be done, resulting in longer simulation execution times. Disabling diagnostics that you do not wish to archive can result in a faster simulation.
Disable debugging options
If you previously configured GEOS-Chem with the CMAKE_BUILD_TYPE option set to Debug, then several run-time debugging checks will be activated. These include:
Checking for array-out-of-bounds errors
Checking for floating-point math exceptions (e.g. div-by-zero)
Disabling compiler optimizations
These options can be useful in detecting errors in your GEOS-Chem Classic simulation, but result in a much slower simulation. If you plan on running a long GEOS-Chem Classic simulation, make sure that you configure and build GEOS-Chem Classic so that CMAKE_BUILD_TYPE is set to Release.
View output files
In this chapter, you will learn more about the output files that are generated by GEOS-Chem Classic simulations.
Log files
Log files capture the output of Fortran PRINT* or WRITE statements. You can check the log files for an “echo-back” of simulation options, as well as error messages.
GEOS-Chem and HEMCO log file
File name: GC.log (or similar)
Contains an “echo-back” of input options that were specified in geoschem_config.yml and HISTORY.rc, as well as information about what is happening at each GEOS-Chem timestep. If your GEOS-Chem Classic simulation dies with an error, a detailed error message will be printed in this log file.
In GEOS-Chem 14.1.0 and later versions, information about emissions, met fields, and other relevant data that are read from disk and processed by HEMCO is sent to this log file (instead of to HEMCO.log).
GEOS-Chem log file with dry-run output
File name: log.dryrun (or similar)
Contains the full path names of all input files (configuration files, meteorology files, emissions files) that are read by GEOS-Chem. This will allow users to download only those files that their GEOS-Chem simulation requires, thus speeding up the data downloading process.
For more information, please see the dry run chapter.
GEOS-Chem species metadata log
File name: OutputDir/geoschem_species_metadata.yml
Contains metadata (taken from the GEOS-Chem species database) in YAML format for only those species that are used in the simulation. This facilitates coupling GEOS-Chem to other Earth System Models.
Timers log file
File name: gcclassic_timers.json (in JSON format).
The timers log file is created when you set use_gcclassic_timers: true in the Simulation Settings section of geoschem_config.yml. It contains “wall-clock” times that measure how long each component of GEOS-Chem took to execute. This information is used by the GEOS-Chem benchmarking scripts that execute on the Amazon cloud computing platform.
Scheduler log file
File name: Specific to each scheduler.
If you used a batch scheduler such as SLURM, PBS, LSF, etc. to submit your GEOS-Chem Classic simulation, then output from the Unix stdout and/or stderr streams may be printed to this file. This file may contain important error messages.
History diagnostics output
GEOS-Chem History diagnostics are comprised of several diagnostic collections. Each diagnostic collection contains a series of diagnostic fields that may be archived from a GEOS-Chem Classic simulation.
In the HISTORY.rc configuration file (which is located in your GEOS-Chem Classic run directory), you will find a list of default diagnostic collections. These are collections that have been predefined for you. You may edit the HISTORY.rc configuration file to select which diagnostic collections you wish to archive from your GEOS-Chem Classic simulation. You may also define your own custom diagnostic collections.
The filenames listed below correspond to the default diagnostic collections in the HISTORY.rc file.
History output file |
Diagnostic collection |
Used in simulations |
---|---|---|
|
||
|
||
|
||
|
Nested-grid simulations |
|
|
||
|
All simulations |
|
|
||
|
||
|
All simulations with dry- depositing species |
|
|
||
|
||
|
All simulations |
|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
All simulations |
|
|
||
|
All simulations |
|
|
All simulations |
|
|
All simulations |
|
|
All simulations |
|
|
All simulations with wet-depositing species |
|
|
All simulations with wet-depositing species |
Other diagnostic output files
HEMCO diagnostic output
HEMCO diagnostics generate netCDF-format files in the OutputDir/ subdirectory of your GEOS-Chem run directory. You may change this filepath by editing the HEMCO_Config.rc configuration file.
HEMCO diagnostic files use the naming convention HEMCO_diagnostics.YYYYMMDDhhmm.nc, where YYYYMMDD and hhmm refer to the model date and time at which each file was created.
For more information, please see our HEMCO user manual at hemco.readthedocs.io.
Planeflight diagnostic output
The GEOS-Chem plane-following diagnostic generates text files in the top-level of your GEOS-Chem Classic run directory. You may change this filepath by editing the planeflight section of geoschem_config.yml.
Planeflight diagnostic files use the naming convention plane.log.YYYYMMDDhhmm, where YYYYMMDD refers to the model date at which each diagnostic file is created.
ObsPack diagnostic output
The GEOS-Chem ObsPack diagnostic generates netCDF-format files in the top-level of your GEOS-Chem Classic run directory. You may change this filepath by editing the ObsPack section of geoschem_config.yml.
ObsPack diagnostic files use the naming convention GEOS-Chem.ObsPack.YYYYMMDD_hhmmz.nc4, where YYYYMMDD and hhmm refer to the model date and time at which each diagnostic file is created, and z refers to UTC (aka Zulu time).
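If you post-process output with scripts, the naming conventions above translate directly into date format strings. The sketch below builds the expected HEMCO diagnostic and ObsPack file names for a given model date and time. The function names are hypothetical, and the patterns are taken from the conventions stated above; check them against the files in your own run directory.

```python
from datetime import datetime


def hemco_diagnostics_name(when: datetime) -> str:
    """File name following HEMCO_diagnostics.YYYYMMDDhhmm.nc."""
    return f"HEMCO_diagnostics.{when:%Y%m%d%H%M}.nc"


def obspack_name(when: datetime) -> str:
    """File name following GEOS-Chem.ObsPack.YYYYMMDD_hhmmz.nc4."""
    return f"GEOS-Chem.ObsPack.{when:%Y%m%d_%H%M}z.nc4"


if __name__ == "__main__":
    stamp = datetime(2019, 7, 1, 0, 0)
    print(hemco_diagnostics_name(stamp))
    print(obspack_name(stamp))
```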
GEOS-Chem Classic and HEMCO also create restart files, which have been discussed previously.
Diagnostics reference
In the following chapters, you will learn about the types of diagnostic outputs you can generate with GEOS-Chem Classic.
GEOS-Chem History diagnostics
At present, our Guide to GEOS-Chem History diagnostics is still located on the GEOS-Chem wiki. We will be migrating this information over to ReadTheDocs very soon.
HEMCO diagnostics
Information about diagnostic output from HEMCO (the Harmonized Emissions Component) may be found on our HEMCO ReadTheDocs site.
Planeflight diagnostic
On this page we provide information about the GEOS-Chem planeflight diagnostic, which allows you to save certain diagnostic quantities along flight tracks or at the position of ground observations. This can be more efficient in terms of storage than saving out 3-D data files via the GEOS-Chem History diagnostics.
Attention
Several diagnostic quantities were disabled when the SMVGEAR chemistry solver was replaced with the FlexChem implementation of KPP (cf: Update chemical mechanisms with KPP). Therefore, you may find that functionality is not currently working. We look to GEOS-Chem community members to help us maintain the planeflight diagnostic.
The Planeflight.dat.YYYYMMDD configuration file
The Planeflight.dat.YYYYMMDD files allow you to specify the diagnostic quantities (species, reaction rates, met fields) that you want to print out for a specific longitude, latitude, altitude, and time. A sample Planeflight.dat.YYYYMMDD file is given below. Of course, if you have lots of flight track data points, your file will be much longer.
If the plane flight following diagnostic is switched on, then it will look for a new Planeflight.dat.YYYYMMDD file each day. If a Planeflight.dat.YYYYMMDD file is found for a given day, then GEOS-Chem will save out diagnostic quantities along the flight track(s) to the plane.log.YYYYMMDD file.
Format
Planeflight.dat -- Input file for planeflight diagnostic
GCST
July 2018
-----------------------------------------------------------
9 <-- # of variables to be output (listed below)
-----------------------------------------------------------
TRA_001
TRA_002
TRA_003
GMAO_TEMP
GMAO_ABSH
GMAO_RELH
GMAO_IIEV
GMAO_JJEV
GMAO_LLEV
-----------------------------------------------------------
Now give the times and locations of the flight
-----------------------------------------------------------
Point Type DD-MM-YYYY HH:MM LAT LON ALT/PRE OBS
1 Scrz 30-06-2012 13:53 -46.43 51.85 202.00 1765.030
2 Scrz 30-06-2012 13:53 -46.43 51.85 202.00 1765.060
3 Sush 30-06-2012 16:25 -54.85 -68.31 32.00 1764.750
4 Sush 30-06-2012 16:25 -54.85 -68.31 32.00 1765.610
5 Sllb 30-06-2012 17:13 54.95 -112.45 588.00 1891.200
6 Sllb 30-06-2012 17:13 54.95 -112.45 588.00 1891.310
99999 END 00-00-0000 00:00 0.00 0.00 0.00 0.000
The data in this text file can be read and plotted using GAMAP routines CTM_READPLANEFLIGHT and PLANE_PLOT.
Requesting diagnostic quantities
The first part of the Planeflight.dat.YYYYMMDD file allows you to request several diagnostic quantities that you would like to be archived along the plane’s flight track. These are listed in the table below.
You must make sure that you have specified the number of requested quantities properly, or you will get an input error.
Important
Several planeflight diagnostic quantities had to be disabled when the SMVGEAR chemical solver was replaced by the FlexChem implementation of the KPP chemical solver. Therefore, you may find that not all of the planeflight diagnostic quantities listed below are still functional. Please report any issues to the GEOS-Chem Support Team by opening a new Github issue.
| Quantity | Description | Units |
|---|---|---|
|  | Species concentration (nnn = species index) | v/v dry |
|  | Species concentration | molec/cm3 |
|  | Concentration of RO2 family | v/v dry |
|  | Concentration of AN family | v/v dry |
|  | Concentration of NOy family | v/v dry |
|  | Temperature | K |
|  | Absolute humidity | unitless |
|  | Aerosol surface area | cm2/cm3 |
|  | Surface pressure | hPa |
|  | Zonal winds | m/s |
|  | Meridional winds | m/s |
|  | Longitude index | unitless |
|  | Latitude index | unitless |
|  | Level index | unitless |
|  | Relative humidity | % |
|  | Sea level pressure | hPa |
|  | Water vapor mixing ratio | v/v |
|  | Potential temperature | K |
|  | Pressure at center of grid box | hPa |
|  | SEAICEnn fields | unitless |
|  | Column AOD, sulfate | unitless |
|  | Column AOD, black carbon | unitless |
|  | Column AOD, organic carbon | unitless |
|  | Column AOD, fine sea salt | unitless |
|  | Column AOD, coarse sea salt | unitless |
|  | Column AOD, dust | unitless |
|  | Column AOD, sulfate (below aircraft) | unitless |
|  | Column AOD, black carbon (below aircraft) | unitless |
|  | Column AOD, organic carbon (below aircraft) | unitless |
|  | Column AOD, fine sea salt (below aircraft) | unitless |
|  | Column AOD, coarse sea salt (below aircraft) | unitless |
|  | Column AOD, dust (below aircraft) | unitless |
|  | Nucleation rates (TOMAS) |  |
|  | Frac of Hg(II) in gas phase | unitless |
|  | Frac of Hg(II) in particle phase | unitless |
|  | ISORROPIA H+ | M |
|  | ISORROPIA pH | unitless |
|  | ISORROPIA aerosol water | ug/m3 air |
|  | ISORROPIA bisulfate | M |
|  | Local time | hours |
|  | Aqueous aerosol radius | cm |
|  | Aqueous aerosol surface area | cm2/cm3 |
|  | Production rates (needs updating) | molec/cm3/s |
|  | Reaction rates (needs updating) | molec/cm3/s |
Specifying the flight track
The next section of the Planeflight.dat.YYYYMMDD file is where you will specify the points that make up the flight track.
| Quantity | Description |
|---|---|
|  | A sequential index of flight track points. |
|  | Identifier for the plane (or station) |
|  | Day of the observation |
|  | Month of the observation |
|  | Year of the observation |
|  | Hour of the observation (UTC) |
|  | Minute of the observation (UTC) |
|  | Latitude (deg), range -90 to +90 |
|  | Longitude (deg), range -180 to +180 |
|  | Altitude [m] or Pressure [hPa] of the observation |
|  | Value of the observation (if known), used to compare to model output |
Important
The TYPE column can be used to specify the aircraft type and flight number to distinguish between multiple plane flight tracks. The planeflight diagnostic will automatically set L=1 if it does not recognize TYPE. When using a new flight track, make sure to add your TYPE to this IF statement if you do not wish to use L=1 for that type value.
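Before handing a flight track to GEOS-Chem, it can be useful to read the track rows back in and confirm that the dates and coordinates parse as expected. The sketch below is a minimal illustration that assumes whitespace-separated rows laid out like the sample file above (POINT, TYPE, DD-MM-YYYY, HH:MM, LAT, LON, ALT/PRE, OBS) and a closing row whose type is END. The function name and dictionary keys are hypothetical, and the fixed column widths that GEOS-Chem itself expects are not checked here.

```python
from datetime import datetime
from typing import Dict, List


def parse_track_points(lines: List[str]) -> List[Dict[str, object]]:
    """Parse flight-track rows laid out like the sample Planeflight.dat file."""
    points = []
    for line in lines:
        fields = line.split()
        # Expect: POINT TYPE DD-MM-YYYY HH:MM LAT LON ALT/PRE OBS
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # header, separator, or other non-data line
        if fields[1] == "END":
            break  # the sample file closes the track with a 99999 END row
        points.append({
            "point": int(fields[0]),
            "type": fields[1],
            "time": datetime.strptime(f"{fields[2]} {fields[3]}", "%d-%m-%Y %H:%M"),
            "lat": float(fields[4]),
            "lon": float(fields[5]),
            "alt_or_pres": float(fields[6]),
            "obs": float(fields[7]),
        })
    return points


if __name__ == "__main__":
    with open("Planeflight.dat.20120630") as handle:
        for point in parse_track_points(handle.readlines()):
            print(point["point"], point["type"], point["time"], point["lat"], point["lon"])
```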
The plane.log.YYYYMMDD output file
The plane.log.YYYYMMDD file contains output from the planeflight diagnostic.
Format
POINT TYPE YYYYMMDD HHMM LAT LON PRESS OBS T-IND P-I I-IND J-IND TRA_001 GMAO_TEMP ...
1 Scrz 20120630 1353 -46.43 51.85 981.74 1765.030 000061277 002 00047 00012 1.785E-006 2.780E+002 ...
2 Scrz 20120630 1353 -46.43 51.85 981.74 1765.060 000061277 002 00047 00012 1.785E-006 2.780E+002 ...
3 Sush 20120630 1625 -54.85 -68.31 949.77 1764.750 000061281 002 00023 00010 1.784E-006 2.746E+002 ...
4 Sush 20120630 1625 -54.85 -68.31 949.77 1765.610 000061281 002 00023 00010 1.784E-006 2.746E+002 ...
5 Sllb 20120630 1713 54.95 -112.45 876.13 1891.200 000061283 005 00015 00037 1.906E-006 2.942E+002 ...
6 Sllb 20120630 1713 54.95 -112.45 876.13 1891.310 000061283 005 00015 00037 1.906E-006 2.942E+002 ...
Columns
| Column | Description |
|---|---|
|  | Flight track data point number (for reference) |
|  | Aircraft/flight number ID or ground station ID |
|  | Year, month, and day (UTC) for each flight track point |
|  | Hour and minute (UTC) for each flight track point |
|  | Latitude (-90 to 90 degrees) for each flight track point |
|  | Longitude (-180 to 180 degrees) for each flight track point |
|  | Pressure in hPa for each flight track point |
|  | Observation value from the flight campaign |
|  | Time index |
|  | GEOS-Chem level index |
|  | GEOS-Chem longitude index |
|  | GEOS-Chem latitude index |
|  | Diagnostic quantities requested in |
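Since the output file is whitespace-delimited text with a header row, it can be read with a few lines of Python instead of GAMAP. The sketch below assumes a layout like the sample above: one header line whose names match the data columns, with every column after J-IND holding a numeric diagnostic value. The function name is hypothetical, and the trailing ... tokens (used in the sample for brevity) are discarded if present.

```python
from datetime import datetime
from typing import Dict, List


def parse_plane_log(text: str) -> List[Dict[str, object]]:
    """Parse plane.log-style text into one dictionary per flight track point."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Drop "..." placeholders such as those in the abbreviated sample
    rows = [[token for token in row if token != "..."] for row in rows]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    first_diag = header.index("J-IND") + 1
    parsed = []
    for row in records:
        record: Dict[str, object] = dict(zip(header, row))
        # Combine the date and time columns into a single datetime (UTC)
        record["time"] = datetime.strptime(row[2] + row[3], "%Y%m%d%H%M")
        # Convert every requested diagnostic quantity to a float
        for name in header[first_diag:]:
            record[name] = float(record[name])
        parsed.append(record)
    return parsed


if __name__ == "__main__":
    with open("plane.log.20120630") as handle:
        for record in parse_plane_log(handle.read()):
            print(record["POINT"], record["TYPE"], record["time"])
```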
ObsPack diagnostic
On this page we provide information about the ObsPack diagnostic in GEOS-Chem, which is intended to sample GEOS-Chem data at specified coordinates and times (e.g. corresponding to surface or flight track measurements). This feature was written by Andy Jacobson of NOAA and Andrew Schuh of Colorado State University and implemented in GEOS-Chem 12.2.0.
Specifying ObsPack diagnostic options
The ObsPack Menu section of input.geos is where you define the following settings:
Turning ObsPack on or off
Specifying which GEOS-Chem species you would like to archive.
At present, you can archive individual species, or all advected species.
Specifying the names of input files
These are the files from which coordinate data will be read.
Specifying the names of output files
These are the files that will contain GEOS-Chem data sampled by the ObsPack diagnostic.
Input file format
The ObsPack diagnostic reads input information (such as coordinates, sampling method, and observation IDs) from netCDF files having the format shown below. You will need to prepare an input file for each YYYY/MM/DD date on which you would like to obtain ObsPack diagnostic output from GEOS-Chem. (The ObsPack diagnostic will skip over days on which it cannot find an input file.)
ObsPack input files can be downloaded from NOAA (see https://www.esrl.noaa.gov/gmd/ccgg/obspack/).
Attention
Starting in ObsPack v6, time_components indicates the start time of the sampling interval, not the center time. For the center time, we need to read the time variable. The time variable represents the center of the averaging window in all ObsPack data versions.
ObsPack file metadata
Here is the metadata from an ObsPack data file. We have only displayed the variables that the ObsPack module needs to read.
netcdf obspack_co2_1_GLOBALVIEWplus_v6.0_2020-09-11.20190408 {
dimensions:
obs = UNLIMITED ; // (3111 currently)
calendar_components = 6 ;
string_of_200chars = 200 ;
string_of_500chars = 500 ;
variables:
int obs(obs) ;
obs:long_name = "obs" ;
obs:_Storage = "chunked" ;
obs:_ChunkSizes = 1024 ;
obs:_Endianness = "little" ;
int time(obs) ;
time:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
time:_FillValue = -999999999 ;
time:long_name = "Seconds since 1970-01-01 00:00:00 UTC" ;
time:_Storage = "chunked" ;
time:_ChunkSizes = 778 ;
time:_DeflateLevel = 5 ;
time:_Endianness = "little" ;
...
float latitude(obs) ;
latitude:units = "degrees_north" ;
latitude:_FillValue = -1.e+34f ;
latitude:long_name = "Sample latitude" ;
latitude:_Storage = "chunked" ;
latitude:_ChunkSizes = 778 ;
latitude:_DeflateLevel = 5 ;
latitude:_Endianness = "little" ;
float longitude(obs) ;
longitude:units = "degrees_east" ;
longitude:_FillValue = -1.e+34f ;
longitude:long_name = "Sample longitude" ;
longitude:_Storage = "chunked" ;
longitude:_ChunkSizes = 778 ;
longitude:_DeflateLevel = 5 ;
longitude:_Endianness = "little" ;
float altitude(obs) ;
altitude:units = "meters" ;
altitude:_FillValue = -1.e+34f ;
altitude:long_name = "sample altitude in meters above sea level" ;
altitude:comment = "Altitude is surface elevation plus sample intake height in meters above sea level." ;
altitude:_Storage = "chunked" ;
altitude:_ChunkSizes = 778 ;
altitude:_DeflateLevel = 5 ;
...
char obspack_id(obs, string_of_200chars) ;
obspack_id:long_name = "Unique ObsPack observation id" ;
obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
obspack_id:_Storage = "chunked" ;
obspack_id:_ChunkSizes = 1, 200 ;
obspack_id:_DeflateLevel = 5 ;
...
int CT_sampling_strategy(obs) ;
CT_sampling_strategy:_FillValue = -9 ;
CT_sampling_strategy:long_name = "model sampling strategy" ;
CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;
CT_sampling_strategy:_Storage = "chunked" ;
CT_sampling_strategy:_ChunkSizes = 778 ;
CT_sampling_strategy:_DeflateLevel = 5 ;
CT_sampling_strategy:_Endianness = "little" ;
... omitting global attributes etc. ...
Notes
The ObsPack ID string should be 200 characters long.
If you have coordinate data in another format (e.g. a text-based Planeflight.dat file) then you’ll need to create a netCDF file using the format shown above, or else ObsPack will not be able to read it.
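To make the relationship between the time and CT_sampling_strategy variables concrete, here is a minimal Python sketch that turns one observation into a start and end time. The window lengths come from the CT_sampling_strategy attribute shown above. The assumption that the window is symmetric about the time value, and the function name averaging_window, are ours and are not taken from the ObsPack module itself.

```python
from datetime import datetime, timedelta, timezone

# Window lengths from the CT_sampling_strategy "values" attribute above:
# 1 = 4-hour avg, 2 = 1-hour avg, 3 = 90-min avg, 4 = instantaneous.
WINDOW_BY_STRATEGY = {
    1: timedelta(hours=4),
    2: timedelta(hours=1),
    3: timedelta(minutes=90),
    4: timedelta(0),
}

# The time variable is in "Seconds since 1970-01-01 00:00:00 UTC".
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def averaging_window(center_seconds, strategy):
    """Return (start, end) UTC datetimes for one observation.

    Assumption: the window is centered on the `time` value.
    """
    center = EPOCH + timedelta(seconds=center_seconds)
    half = WINDOW_BY_STRATEGY[strategy] / 2
    return center - half, center + half


# Hypothetical observation: 2019-04-08 12:00 UTC with a 1-hour average.
start, end = averaging_window(1554724800, 2)
print(start.isoformat(), end.isoformat())
```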
Output file format
The ObsPack diagnostic will produce a file called GEOSChem.ObsPack.YYYYMMDD_hhmmz.nc4 for each day where an input file has been specified. (You can change the output file name in the ObsPack Menu in input.geos.)
The metadata of an ObsPack output file for the GEOS-Chem methane simulation is shown below. If you are using the ObsPack diagnostic with other GEOS-Chem simulations, your output files will look similar to this, except for the species names.
netcdf GEOSChem.ObsPack.20180926_0000z.nc4 {
dimensions:
obs = UNLIMITED ; // (662 currently)
species = 1 ;
char_len_obs = 200 ;
variables:
char obspack_id(obs, char_len_obs) ;
obspack_id:long_name = "obspack_id" ;
obspack_id:units = "1" ;
int nsamples(obs) ;
nsamples:long_name = "no. of model samples" ;
nsamples:units = "1" ;
nsamples:comment = "Number of discrete model samples in average" ;
int averaging_interval(obs) ;
averaging_interval:long_name = "Amount of model time over which this observation is averaged" ;
averaging_interval:units = "seconds" ;
int averaging_interval_start(obs) ;
averaging_interval_start:long_name = "Start of averaging interval" ;
averaging_interval_start:units = "seconds since 1970-01-01 00:00:00 UTC" ;
averaging_interval_start:calendar = "standard" ;
int averaging_interval_end(obs) ;
averaging_interval_end:long_name = "End of averaging interval" ;
averaging_interval_end:units = "seconds since 1970-01-01 00:00:00 UTC" ;
averaging_interval_end:calendar = "standard" ;
float lon(obs) ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
float lat(obs) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
float u(obs) ;
u:long_name = "Zonal component of wind" ;
u:units = "m s^-1" ;
float v(obs) ;
v:long_name = "Meridional component of wind" ;
v:units = "m s^-1" ;
float blh(obs) ;
blh:long_name = "Boundary layer height" ;
blh:units = "m" ;
float q(obs) ;
q:long_name = "mass_fraction_of_water_inair" ;
q:units = "kg water (kg air)^-1" ;
float pressure(obs) ;
pressure:long_name = "pressure" ;
pressure:units = "Pa" ;
float temperature(obs) ;
temperature:long_name = "temperature" ;
temperature:units = "K" ;
float CH4(obs) ;
CH4:long_name = "Methane" ;
CH4:units = "mol mol-1" ;
CH4:_FillValue = -1.e+34f ;
// global attributes:
:history = "GEOS-Chem simulation at 2019/01/11 14:54" ;
:conventions = "CF-1.4" ;
:references = "www.geos-chem.org; wiki.geos-chem.org" ;
:model_start_date = "2018/09/26 00:00:00 UTC" ;
:model_end_date = "2018/09/27 00:00:00 UTC" ;
}
You can use several different types of netCDF-reading software to read and plot data from ObsPack diagnostic output files. We recommend using either Python scripts or Jupyter notebooks.
Known issues
Unit conversions are currently done for all species
In routine ObsPack_Sample (located in module ObsPack/obspack_mod.F90), the following algorithm is used:
! Ensure that units of species are "v/v dry", which is dry
! air mole fraction. Capture the InUnit value, this is
! what the units are prior to this call. After we sample
! the species, we'll call this again requesting that the
! species are converted back to the InUnit values.
... THEN DO THE DATA SAMPLING ...............................................
... i.e. determine which GEOS-Chem grid boxes to include in the averaging ...
! Return State_Chm%SPECIES to whatever units they had
! coming into this routine
call Convert_Spc_Units( am_I_root, Input_Opt, State_Met, &
The routine Convert_Spc_Units performs unit conversions for all of the species in the State_Chm%Species array, regardless of whether they are being archived with ObsPack or not. This can lead to a bottleneck in performance, as ObsPack_Sample is called on every GEOS-Chem “heartbeat” timestep.
It would be more efficient to do the unit conversion only for those species that are being archived by ObsPack. A typical full-chemistry simulation includes about 200 species. If we are only using ObsPack to archive 10 of these species, GEOS-Chem would execute much faster if it did unit conversions for only the 10 archived species instead of all 200 species.
This issue is currently unresolved.
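The idea can be illustrated with a toy Python sketch (this is not GEOS-Chem code, which is written in Fortran). All names here are hypothetical, and the "unit conversion" is a stand-in that simply scales values by a constant. It shows the conversion being applied only to the archived subset, so the work scales with 10 species rather than 200.

```python
def convert_units(values, factor):
    """Stand-in for a unit conversion: scale every value by a constant factor."""
    return [v * factor for v in values]


def convert_selected(species, names, factor):
    """Convert only the species listed in `names`; leave the rest untouched.

    Returns the number of conversions performed.
    """
    for name in names:
        species[name] = convert_units(species[name], factor)
    return len(names)


# Hypothetical state: 200 species, each with a few grid-box values.
species = {f"SPC{i:03d}": [1.0, 2.0, 3.0] for i in range(200)}
archived = [f"SPC{i:03d}" for i in range(10)]   # the 10 species sent to ObsPack

n_all = len(species)
n_done = convert_selected(species, archived, 2.0)
print(f"converted {n_done} of {n_all} species")
```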
GEOS-Chem Classic folder tree
The tables below list the folders in which various components of GEOS-Chem and HEMCO reside.
GEOS-Chem folder tree
- GCClassic/src/GEOS-Chem
Root folder for the GEOS-Chem “science codebase”.
- GCClassic/src/GEOS-Chem/GTMM
Contains the GTMM (Global Terrestrial Mercury Model) source code. (NOTE: This option has fallen into disuse.)
- GCClassic/src/GEOS-Chem/GeosCore
Contains most GEOS-Chem modules & routines
- GCClassic/src/GEOS-Chem/GeosRad
Contains the RRTMG radiative transfer model source code.
- GCClassic/src/GEOS-Chem/GeosUtil
Contains GEOS-Chem utility modules & routines (for error handling, string handling, etc.)
- GCClassic/src/GEOS-Chem/Headers
Contains modules with derived-type definitions for state objects, fixed parameter settings, etc.
- GCClassic/src/GEOS-Chem/History
Contains modules & routines for archiving GEOS-Chem diagnostics to netCDF-format output.
- GCClassic/src/GEOS-Chem/ISORROPIA
Contains the ISORROPIA II source code, which is used for aerosol thermodynamical equilibrium computations.
- GCClassic/src/GEOS-Chem/KPP
Main folder for chemical mechanisms built with KPP-for-GEOS-Chem.
- GCClassic/src/GEOS-Chem/NcdfUtil
Contains modules & routines for netCDF file I/O.
- GCClassic/src/GEOS-Chem/ObsPack
Contains modules & routines for generating GEOS-Chem diagnostic output at the same locations of NOAA ObsPack observational stations.
- GCClassic/src/GEOS-Chem/PKUCPL
Contains the coupler code for the PKU 2-way nesting algorithm. (This option has fallen into disuse.)
- GCClassic/src/GEOS-Chem/Interfaces/GCClassic
Contains the GCClassic driver program (main.F90).
HEMCO folder tree
- GCClassic/src/HEMCO/src/Core
Contains modules for reading, storing, and updating data.
- GCClassic/src/HEMCO/src/Extensions
Contains modules for calculating emissions that depend on meteorological variables or parameterizations.
- GCClassic/src/HEMCO/src/Interfaces
Contains modules and routines for linking HEMCO to GEOS-Chem Classic and other external models.
Contains various modules with utility routines (such as for netCDF I/O, regridding, string handling, etc.)
Sample GEOS-Chem run scripts
Here are some sample run scripts that you can adapt for your own purposes.
For clusters using the Slurm scheduler
Here is a sample GEOS-Chem run script for computational clusters that use the SLURM scheduler to control jobs:
Run script for Slurm
Save this code to a file named geoschem.run.slurm:
#!/bin/bash
#SBATCH -c 24
#SBATCH -N 1
#SBATCH -t 0-12:00
#SBATCH -p my-queue_name
#SBATCH --mem=30000
#SBATCH --mail-type=END
###############################################################################
### Sample GEOS-Chem run script for SLURM
### You can increase the number of cores with -c and memory with --mem,
### particularly if you are running at very fine resolution (e.g. nested-grid)
###############################################################################
# Load your bash-shell customizations
source ~/.bashrc
# Load software modules
# (this example is for GNU 10.2.0 compilers)
source ~/gcclassic.gnu10.env
# Set the proper # of threads for OpenMP
# SLURM_CPUS_PER_TASK ensures this matches the number you set with -c above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# NOTE: If the environment file does not max out the available
# stack memory for GEOS-Chem, you may uncomment these lines here:
#ulimit -s unlimited
#export OMP_STACKSIZE=500m
# Run GEOS-Chem. The "time" command will return CPU and wall times.
# Stdout and stderr will be directed to the "GC.log" log file
# (you can change the log file name below if you wish)
srun -c $OMP_NUM_THREADS time -p ./gcclassic >> GC.log 2>&1
Important
If you forget to define OMP_NUM_THREADS in your run script, then GEOS-Chem Classic will execute using one core. This can cause your simulations to take much longer than is necessary!
Then make geoschem.run.slurm executable:
$ chmod 755 geoschem.run.slurm
For more information about how Slurm is set up on your particular cluster, ask your sysadmin.
Submitting jobs with Slurm
To schedule a GEOS-Chem Classic job with Slurm, use this command:
$ sbatch geoschem.run.slurm
For Amazon Web Services EC2 cloud instances
When you log into an Amazon Web Services EC2 instance, you will receive an entire node with as many vCPUs as you have requested. Each vCPU acts as one computational core. Most cloud instances have twice as many vCPUs as physical cores (i.e. each physical core provides 2 vCPUs).
Tip
To find out how many vCPUs are available in your instance, you can use the nproc command.
Run script for Amazon EC2
Save the code below to a file named geoschem.run.aws:
#!/bin/bash
###############################################################################
### Sample GEOS-Chem run script for Amazon Web Services EC2 instances
###############################################################################
# Load your bash-shell customizations
source ~/.bashrc
### NOTE: We do not have to load an environment file
### because all libraries are contained in the Amazon
### Machine Image (AMI) used to initialize the instance.
# In an AWS cloud instance, you own the entire node, so there is no need
# to use a scheduler like SLURM. You can just use the `nproc` command
# to specify the number of cores that GEOS-Chem should use.
export OMP_NUM_THREADS=$(nproc)
# NOTE: If your ~/.bashrc file does not max out the available
# stack memory for GEOS-Chem, you may uncomment these lines here:
#ulimit -s unlimited
#export OMP_STACKSIZE=500m
# Run GEOS-Chem. The "time" command will return CPU and wall times.
# Stdout and stderr will be directed to the "GC.log" log file
# (you can change the log file name below if you wish)
time -p ./gcclassic >> GC.log 2>&1
And then make the geoschem.run.aws file executable:
$ chmod 755 geoschem.run.aws
Running jobs on AWS
When you are on an AWS EC2 instance, you own the entire node, so it is not necessary to use a scheduler. You can run your GEOS-Chem job with this command:
$ ./geoschem.run.aws &
This will run your job in the background and send all output (i.e. program output and error output) to the log file (GC.log in the run script above).
Load software into your environment
This supplemental guide describes how to load the required software dependencies for GEOS-Chem and HEMCO into your computational environment.
On the Amazon Web Services Cloud
All of the required software dependencies for GEOS-Chem and HEMCO will be included in the Amazon Machine Image (AMI) that you use to initialize your Amazon Elastic Compute Cloud (EC2) instance. For more information, please see our GEOS-Chem cloud computing tutorial.
Build required software with Spack
This page has instructions for building dependencies for GEOS-Chem Classic, GCHP, and HEMCO. These are the software libraries that are needed to compile and execute these programs.
Before proceeding, please also check if the dependencies for GEOS-Chem, GCHP, and HEMCO are already found on your computational cluster or cloud environment. If this is the case, you may use the pre-installed versions of these software libraries and won’t have to install your own versions.
For more information about software dependencies, see:
Introduction
In the sections below, we will show you how to build a single software environment containing all software dependencies for GEOS-Chem Classic, GCHP, and HEMCO. This will be especially of use for those users working on a computational cluster where these dependencies have not yet been installed.
We will be using the Spack package manager to download and build all required software dependencies for GEOS-Chem Classic, GCHP and HEMCO.
Note
Spack is not the only way to build the dependencies. It is possible to download and compile the source code for each library manually. Spack automates this process, thus it is the recommended method.
You will be using this workflow:
Install Spack and do first-time setup
Decide where you want to install Spack (aka the Spack root directory). A few details you should consider are:
The Spack root directory will be ~5-10 GB. Keep in mind that some computational clusters restrict the size of your home directory (aka ${HOME}) to a few GB.
This Spack root directory cannot be moved. Instead, you will have to reinstall Spack to a different directory location (and rebuild all software packages).
The Spack root directory should be placed in a shared drive if several users need to access it.
Once you have chosen a location for the Spack root directory, you may continue with the Spack download and setup process.
Important
Execute all commands in this tutorial from the same directory. This is typically one directory level higher than the Spack root directory.
For example, if you install Spack as a subdirectory of ${HOME}, then you will issue all commands from ${HOME}.
Use the commands listed below to install Spack and perform first-time setup. You can copy-paste these commands, but look out for lines marked with a # (modifiable) ... comment, as they might require modification.
$ cd ${HOME} # (modifiable) cd to the install location you chose
$ git clone -c feature.manyFiles=true https://github.com/spack/spack.git # download Spack
$ source spack/share/spack/setup-env.sh # Load Spack
$ spack external find # Tell Spack to look for existing software
$ spack compiler find # Tell Spack to look for existing compilers
Note
If you should encounter this error:
$ spack external find
==> Error: 'name'
then Spack could not find any external software on your system.
Spack searches for executables that are located within your search
path (i.e. the list of directories contained in your $PATH
environment variable), but not within software modules. Because of
this, you might have to load a software package into your
environment before Spack can detect it. Ask your
sysadmin or IT staff for more information about your system’s
specific setup.
After the first-time setup has been completed, an environment variable named SPACK_ROOT will be created in your Unix/Linux environment. This contains the absolute path of the Spack root directory. Use this command to view the value of SPACK_ROOT:
$ echo ${SPACK_ROOT}
/path/to/home/spack # Path to Spack root, assumes installation to a subdir of ${HOME}
Clone a copy of GCClassic, GCHP, or HEMCO
The GCClassic, GCHP, and HEMCO repositories each contain a spack/ subdirectory with customized Spack configuration files modules.yaml and packages.yaml. We have updated these YAML files with the proper settings in order to ensure a smooth software build process with Spack.
First, define the model, scope_dir, and scope_args environment variables as shown below. Set model to only one of the three values listed.
$ model=GCClassic # Use this if you will be working with GEOS-Chem Classic
$ model=GCHP # Use this if you will be working with GCHP
$ model=HEMCO # Use this if you will be working with HEMCO standalone
$ scope_dir="${model}/spack" # Folder where customized YAML files are stored
$ scope_args="-C ${scope_dir}" # Tell Spack to look for custom YAML files in scope_dir
You will use these environment variables in the steps below.
When you have completed this step, download the source code for your preferred model (e.g. GEOS-Chem Classic, GCHP, or HEMCO standalone):
$ git clone --recurse-submodules https://github.com/geoschem/${model}.git
Install the recommended compiler
Next, install the recommended compiler, gcc (aka the GNU Compiler Collection). Use the scope_args environment variable that you defined in the previous step.
$ spack ${scope_args} install gcc # Install GNU Compiler Collection
Note
Requested version numbers for software packages (including the compiler) are listed in the ${scope_dir}/packages.yaml file. We have selected software package versions that have been proven to work together. You should not have to change any of the settings in ${scope_dir}/packages.yaml.
As of this writing, the default compiler is gcc 10.2.0 (includes C, C++, and Fortran compilers). We will upgrade to newer compiler and software package versions as necessary.
The compiler installation should take several minutes (or longer if you have a slow internet connection).
Register the compiler with Spack after it has been installed. This will allow Spack to use this compiler to build other software packages. Use this command:
$ spack compiler add $(spack location -i gcc) # Register GNU Compiler Collection
You will then see output similar to this:
==> Added 1 new compiler to /path/to/home/.spack/linux/compilers.yaml
gcc@X.Y.Z
==> Compilers are defined in the following files:
/path/to/home/.spack/linux/compilers.yaml
where
/path/to/home indicates the absolute path of your home directory (aka ${HOME})
X.Y.Z indicates the version of the GCC compiler that you just built with Spack.
Tip
Use this command to view the list of compilers that have been registered with Spack:
$ spack compiler list
Use this command to view the installation location for a Spack-built software package:
$ spack location -i <package-name>
Build GEOS-Chem dependencies and useful tools
Once the compiler has been built and registered, you may proceed to building the software dependencies for GEOS-Chem Classic, GCHP, and HEMCO.
The Spack installation commands that you will use take the form:
$ spack ${scope_args} install <package-name>%gcc^openmpi
where
${scope_args} is the environment variable that you defined above;
<package-name> is a placeholder for the name of the software package that you wish to install;
%gcc tells Spack that it should use the GNU Compiler Collection version that you just built;
^openmpi tells Spack to use OpenMPI when building software packages. You may omit this setting for packages that do not require it.
Spack will download and build <package-name> plus all of its dependencies that have not already been installed.
Note
Use this command to find out what other packages will be built along with <package-name>:
$ spack spec <package-name>
This step is not required, but may be useful for informational purposes.
Use the following commands to build dependencies for GEOS-Chem Classic, GCHP, and HEMCO, as well as some useful tools for working with GEOS-Chem data:
Build the esmf (Earth System Model Framework), hdf5, netcdf-c, netcdf-fortran, and openmpi packages:
$ spack ${scope_args} install esmf%gcc^openmpi
The above command will build all of the above-mentioned packages in a single step.
Note
GEOS-Chem Classic does not require esmf. However, we recommend that you build ESMF anyway so that it will already be installed in case you decide to use GCHP in the future.
Build the cdo (Climate Data Operators) and nco (netCDF operators) packages. These are command-line tools for editing and manipulating data contained in netCDF files.
$ spack ${scope_args} install cdo%gcc^openmpi
$ spack ${scope_args} install nco%gcc^openmpi
Build the ncview package, which is a quick-and-dirty netCDF file viewer.
$ spack ${scope_args} install ncview%gcc^openmpi
Build the flex (Fast Lexical Analyzer) package. This is a dependency of the Kinetic PreProcessor (KPP), with which you can update GEOS-Chem chemical mechanisms.
$ spack ${scope_args} install flex%gcc
Note
The flex package does not use OpenMPI. Therefore, we can omit ^openmpi from the above command.
At any time, you may see a list of installed packages by using this command:
$ spack find
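If you prefer to script the installation steps above, the following Python sketch assembles the same spack install commands as argument lists. It assumes that model was set to GCClassic (so that the scope directory is GCClassic/spack); the function name spack_install_command is hypothetical.

```python
# Packages named in this section; True means the package is built with ^openmpi.
PACKAGES = [
    ("esmf", True),
    ("cdo", True),
    ("nco", True),
    ("ncview", True),
    ("flex", False),   # flex does not use OpenMPI
]


def spack_install_command(package, use_mpi, scope_dir="GCClassic/spack"):
    """Build one `spack -C <scope_dir> install <spec>` command as a list of arguments."""
    spec = f"{package}%gcc" + ("^openmpi" if use_mpi else "")
    return ["spack", "-C", scope_dir, "install", spec]


commands = [spack_install_command(name, mpi) for name, mpi in PACKAGES]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Spack has been loaded into your environment. Printing the commands first lets you review them before anything is built.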
Add spack load commands to your environment file
We recommend “sourcing” the load_script that you created in the previous section from within an environment file. This is a file that not only loads the required modules but also defines settings that you need to run GEOS-Chem Classic, GCHP, or the HEMCO standalone.
Please see the following links for sample environment files.
Copy and paste the code below into a file named ${model}.env (using the ${model} environment variable that you defined above). Then replace any existing module load commands with the following code:
#=========================================================================
# Load Spack-built modules
#=========================================================================
# Setup Spack if it hasn't already been done
# ${SPACK_ROOT} will be blank if the setup-env.sh script hasn't been called.
# (modifiable) Replace "/path/to/spack" with the path to your Spack root directory
if [[ "x${SPACK_ROOT}" == "x" ]]; then
    source /path/to/spack/share/spack/setup-env.sh
fi
# Load esmf, hdf5, netcdf-c, netcdf-fortran, openmpi
spack load esmf%gcc^openmpi
# Load netCDF packages (cdo, nco, ncview)
spack load cdo%gcc^openmpi
spack load nco%gcc^openmpi
spack load ncview
# Load flex
spack load flex
#=========================================================================
# Set environment variables for compilers
#=========================================================================
export CC=gcc
export CXX=g++
export FC=gfortran
export F77=gfortran
#=========================================================================
# Set environment variables for Spack-built modules
#=========================================================================
# openmpi (needed for GCHP)
export MPI_ROOT=$(spack location -i openmpi%gcc)
# esmf (needed for GCHP)
export ESMF_DIR=$(spack location -i esmf%gcc^openmpi)
export ESMF_LIB=${ESMF_DIR}/lib
export ESMF_COMPILER=gfortran
export ESMF_COMM=openmpi
export ESMF_INSTALL_PREFIX=${ESMF_DIR}/INSTALL_gfortran10_openmpi4
# netcdf-c
export NETCDF_HOME=$(spack location -i netcdf-c%gcc^openmpi)
export NETCDF_LIB=$NETCDF_HOME/lib
# netcdf-fortran
export NETCDF_FORTRAN_HOME=$(spack location -i netcdf-fortran%gcc^openmpi)
export NETCDF_FORTRAN_LIB=$NETCDF_FORTRAN_HOME/lib
# flex
export FLEX_HOME=$(spack location -i flex%gcc)
export FLEX_LIB=$FLEX_HOME/lib
export KPP_FLEX_LIB_DIR=${FLEX_LIB} # OPTIONAL: Needed for KPP
To apply these settings to your login environment, type:
$ source ${model}.env # One of GCClassic.env, GCHP.env, HEMCO.env
To test if the modules have been loaded properly, type:
$ nf-config --help # netcdf-fortran configuration utility
If you see a screen similar to this, you know that the modules have been installed properly.
Usage: nf-config [OPTION]
Available values for OPTION include:
--help display this help message and exit
--all display all options
--cc C compiler
--fc Fortran compiler
--cflags pre-processor and compiler flags
--fflags flags needed to compile a Fortran program
--has-dap whether OPeNDAP is enabled in this build
--has-nc2 whether NetCDF-2 API is enabled
--has-nc4 whether NetCDF-4/HDF-5 is enabled in this build
--has-f90 whether Fortran 90 API is enabled in this build
--has-f03 whether Fortran 2003 API is enabled in this build
--flibs libraries needed to link a Fortran program
--prefix Install prefix
--includedir Include directory
--version Library version
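As a complementary check, this short Python sketch reports which of the settings from the sample environment file are still missing. The list of variables to check is our own selection from that file, and the function name missing_settings is hypothetical.

```python
import os
import shutil

# A selection of the variables exported by the sample environment file above.
REQUIRED_VARS = ["CC", "CXX", "FC", "NETCDF_HOME", "NETCDF_FORTRAN_HOME"]


def missing_settings(env, find_program=shutil.which):
    """Return the names of unset variables, plus 'nf-config' if it is not on the search path."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if find_program("nf-config") is None:
        missing.append("nf-config")
    return missing


# An empty list means every checked setting was found in the current environment.
print(missing_settings(os.environ))
```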
Clean up
At this point, you can remove the ${model} directory as it is not needed. (Unless you would like to keep it to build the executable for your research with GEOS-Chem Classic, GCHP, or HEMCO.)
The spack directory needs to remain. As mentioned above, this directory cannot be moved.
You can clean up any Spack temporary build stage information with:
$ spack clean -m
==> Removing cached information on repositories
That’s it!
Customize simulations with research options
Most of the time you will want to use the “out-of-the-box” settings in your GEOS-Chem simulations, as these are the recommended settings that have been evaluated with benchmark simulations. But depending on your research needs, you may wish to use alternate simulation options. In this Guide we will show you how you can select these research options by editing the various GEOS-Chem and HEMCO configuration files.
Aerosols
Aerosol microphysics
GEOS-Chem incorporates two different aerosol microphysics schemes: APM (Yu and Luo [2009]) and TOMAS (Trivitayanurak et al. [2008]) as compile-time options for the full-chemistry simulation. Both APM and TOMAS are deactivated by default due to the extra computational overhead that these microphysics schemes require.
Follow the steps below to activate either APM or TOMAS microphysics in your full-chemistry simulation.
APM
Create a run directory for the Full Chemistry simulation with APM as the extra simulation option.
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DAPM=y
$ make -j
$ make install
TOMAS
Create a run directory for the Full Chemistry simulation with TOMAS as the extra simulation option.
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DTOMAS=y -DTOMAS_BINS=15 -DBPCH_DIAG=y
$ make -j
$ make install
This will create a GEOS-Chem executable for the TOMAS15 (15 size bins) simulation. To generate an executable for the TOMAS40 (40 size bins) simulation, replace -DTOMAS_BINS=15 with -DTOMAS_BINS=40 in the cmake step above.
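The configuration choices above can be summarized in a small Python sketch that returns the cmake arguments for each microphysics scheme. The function name microphysics_cmake_args is hypothetical, and restricting the bin count to 15 or 40 reflects only the two cases described in this section.

```python
def microphysics_cmake_args(scheme, tomas_bins=15):
    """Return the cmake command (as a list of arguments) for APM or TOMAS."""
    if scheme == "APM":
        return ["cmake", "..", "-DAPM=y"]
    if scheme == "TOMAS":
        # Only the TOMAS15 and TOMAS40 configurations are described above.
        if tomas_bins not in (15, 40):
            raise ValueError("tomas_bins must be 15 or 40")
        return ["cmake", "..", "-DTOMAS=y", f"-DTOMAS_BINS={tomas_bins}", "-DBPCH_DIAG=y"]
    raise ValueError(f"unknown microphysics scheme: {scheme}")


print(" ".join(microphysics_cmake_args("TOMAS", tomas_bins=40)))
```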
Chemistry
Adaptive Rosenbrock solver with mechanism auto-reduction
In Lin et al. [2023], the authors introduce an adaptive Rosenbrock solver with on-the-fly mechanism reduction in The Kinetic PreProcessor (KPP) version 3.0.0 and later. While this adaptive solver is available for all GEOS-Chem simulations that use the fullchem mechanism, it is disabled by default.
To activate the adaptive Rosenbrock solver with mechanism auto-reduction, edit the line of geoschem_config.yml indicated below:
chemistry:
  activate: true
  # ... Previous sub-sections omitted
  autoreduce_solver:
    activate: false # <== true activates the adaptive Rosenbrock solver
    use_target_threshold:
      activate: true
      oh_tuning_factor: 0.00005
      no2_tuning_factor: 0.0001
    use_absolute_threshold:
      scale_by_pressure: true
      absolute_threshold: 100.0
    keep_halogens_active: false
    append_in_internal_timestep: false
Please see the Lin et al. [2023] reference for a detailed explanation of the other adaptive Rosenbrock solver options.
Alternate chemistry mechanisms
GEOS-Chem is compiled “out-of-the-box” with KPP-generated solver code for the fullchem mechanism. But you must manually specify the mechanism name at configuration time for the following instances:
Carbon mechanism
Follow these steps to build an executable with the carbon mechanism:
Create a run directory for the Carbon simulation
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DMECH=carbon
$ make -j
$ make install
Custom full-chemistry mechanism
We recommend that you use the custom mechanism instead of directly modifying the fullchem mechanism. The custom mechanism is a copy of fullchem, but the KPP solver code will be generated in the KPP/custom folder instead of in KPP/fullchem. This lets you keep the fullchem folder untouched.
Follow these steps:
Create a run directory for the full-chemistry simulation (whichever configuration you need).
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DMECH=custom
$ make -j
$ make install
Hg mechanism
Follow these steps to build an executable with the Hg (mercury) mechanism:
Create a run directory for the Hg simulation.
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DMECH=Hg
$ make -j
$ make install
HO2 heterogeneous chemistry reaction probability
You may update the value of \(\gamma_{HO2}\) (reaction probability for uptake of HO2 in heterogeneous chemistry) used in your simulations. Edit the line of geoschem_config.yml indicated below:
chemistry:
  activate: true
  # ... Preceding sections omitted ...
  gamma_HO2: 0.2 # <=== add new value here
TransportTracers
In GEOS-Chem 14.2.0 and later versions, species belonging to the TransportTracers simulation (radionuclides and passive species) have their properties defined in the species_database.yml file. For example:
CH3I:
  Background_VV: 1.0e-20
  Formula: CH3I
  FullName: Methyl iodide
  Henry_CR: 3.6e+3
  Henry_K0: 0.20265
  Is_Advected: true
  Is_Gas: true
  Is_Photolysis: true
  Is_Tracer: true
  Snk_Horiz: all
  Snk_Mode: efolding
  Snk_Period: 5
  Snk_Vert: all
  Src_Add: true
  Src_Mode: HEMCO
  MW_g: 141.94
where:
Is_Tracer: true indicates a TransportTracer species
Snk_* define species sink properties
Src_* define species source properties
Units specifies the default units for species (added mainly for age of air species at this time, which are in days)
For TransportTracers species that have a source term in HEMCO, there will be corresponding entries in HEMCO_Config.rc:
--> OCEAN_CH3I : true
# ... etc ...
#==============================================================================
# CH3I emitted over the oceans at rate of 1 molec/cm2/s
#==============================================================================
(((OCEAN_CH3I
0 SRC_2D_CH3I 1.0 - - - xy molec/cm2/s CH3I 1000 1 1
)))OCEAN_CH3I
Sources and sinks for TransportTracers are now applied in the new source code module GeosCore/tracer_mod.F90.
Note
Sources and sinks for radionuclide species (Rn, Pb, Be isotopes) are currently not applied in GeosCore/tracer_mod.F90 (but may be in the future). Emissions for radionuclide species are computed by the HEMCO GC-Rn-Pb-Be extension and chemistry is done in GeosCore/RnPbBe_mod.F90.
TransportTracer properties for radionuclide species have been added to species_database.yml but are currently commented out.
Diagnostics
GEOS-Chem and HEMCO diagnostics
Please see our Diagnostics reference chapter for an overview of how to archive diagnostics from GEOS-Chem and HEMCO.
RRTMG radiative transfer diagnostics
You can use the RRTMG radiative transfer model to archive radiative forcing fluxes to the GeosRad History diagnostic collection. RRTMG is implemented as a compile-time option due to the extra computational overhead that it incurs.
To activate RRTMG, follow these steps:
Create a run directory for the Full Chemistry simulation, with extra option RRTMG.
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DRRTMG=y
$ make -j
$ make install
Then also make sure to request the radiative forcing flux diagnostics that you wish to archive in the HISTORY.rc file.
Emissions
Offline vs. online emissions
Emission inventories sometimes include dynamic source types and nonlinear scale factors that have functional dependencies on local environmental variables such as wind speed or temperature, which are best calculated online during execution of the model. HEMCO includes a suite of additional modules (aka HEMCO extensions) that perform online emissions calculations for a variety of sources.
Some types of emissions are highly sensitive to meteorological variables such as wind speed and temperature. Because the meteorological inputs are regridded from their native resolution to the GEOS-Chem or HEMCO simulation grid, emissions computed with fine-resolution meteorology can significantly differ from emissions computed with coarse-resolution meteorology. This can make it difficult to compare the output of GEOS-Chem and HEMCO simulations that use different horizontal resolutions.
In order to provide more consistency in the computed emissions, we now make offline emissions available for download. These offline emissions are pre-computed with HEMCO standalone simulations using meteorological inputs at their native horizontal resolution. When these emissions are regridded within GEOS-Chem and HEMCO, the total mass emitted will be conserved regardless of the horizontal resolution of the simulation grid.
You should use offline emissions:
For all GCHP simulations
For full-chemistry simulations (except benchmark)
You should use online emissions:
For benchmark simulations
If you wish to assess the impact of changing/updating the meteorological inputs on emissions.
You may toggle offline emissions on (true) or off (false) in this section of HEMCO_Config.rc:
# ----- OFFLINE EMISSIONS -----------------------------------------------------
# To use online emissions instead set the offline emissions to 'false' and the
# corresponding HEMCO extension to 'on':
# OFFLINE_DUST - DustDead or DustGinoux
# OFFLINE_BIOGENICVOC - MEGAN
# OFFLINE_SEASALT - SeaSalt
# OFFLINE_SOILNOX - SoilNOx
#
# NOTE: When switching between offline and online emissions, make sure to also
# update ExtNr and Cat in HEMCO_Diagn.rc to properly save out emissions for
# any affected species.
#------------------------------------------------------------------------------
--> OFFLINE_DUST : true # 1980-2019
--> OFFLINE_BIOGENICVOC : true # 1980-2020
--> OFFLINE_SEASALT : true # 1980-2019
--> CalcBrSeasalt : true
--> OFFLINE_SOILNOX : true # 1980-2020
As stated in the comments, if you switch between offline and online emissions, you will need to activate the corresponding HEMCO extension:
Offline base emission | Extension # | Corresponding HEMCO extension | Extension # |
---|---|---|---|
OFFLINE_DUST | 0 | DustDead | 105 |
OFFLINE_BIOGENICVOC | 0 | MEGAN | 108 |
OFFLINE_SEASALT | 0 | SeaSalt | 107 |
OFFLINE_SOILNOX | 0 | SoilNOx | 104 |
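Before changing any of these switches, it can help to see how they are currently set. The sketch below lists every OFFLINE_* switch and its value from a HEMCO_Config.rc-style file. It uses a temporary stand-in file copied from the excerpt above, and list_offline_switches is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the offline-emission switches found in a HEMCO_Config.rc-style
# file. The sample text is copied from the excerpt in this section.
rc=$(mktemp)
cat > "$rc" <<'EOF'
    --> OFFLINE_DUST           :       true     # 1980-2019
    --> OFFLINE_BIOGENICVOC    :       true     # 1980-2020
    --> OFFLINE_SEASALT        :       true     # 1980-2019
    --> CalcBrSeasalt          :       true
    --> OFFLINE_SOILNOX        :       true     # 1980-2020
EOF

# Print "name value" for every OFFLINE_* switch (hypothetical helper).
list_offline_switches() {
    awk '$1 == "-->" && $2 ~ /^OFFLINE_/ { print $2, $4 }' "$1"
}

list_offline_switches "$rc"
```

In a real run directory you would pass the path to your own HEMCO_Config.rc instead of the stand-in file.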
Example: Disabling offline dust emissions
Change the OFFLINE_DUST setting from true to false in HEMCO_Config.rc:
--> OFFLINE_DUST : false # 1980-2019
Change the DustDead extension setting from off to on in HEMCO_Config.rc:
105 DustDead : on DST1/DST2/DST3/DST4
Change the extension number for all dust emission diagnostics from 0 (the extension number for base emissions) to 105 (the extension number for DustDead) in HEMCO_Diagn.rc:
###############################################################################
##### Dust emissions #####
###############################################################################
EmisDST1_Total    DST1  -1   -1  -1  2  kg/m2/s  DST1_emission_flux_from_all_sectors
EmisDST1_Anthro   DST1  105  1   -1  2  kg/m2/s  DST1_emission_flux_from_anthropogenic
EmisDST1_Natural  DST1  105  3   -1  2  kg/m2/s  DST1_emission_flux_from_natural_sources
EmisDST2_Natural  DST2  105  3   -1  2  kg/m2/s  DST2_emission_flux_from_natural_sources
EmisDST3_Natural  DST3  105  3   -1  2  kg/m2/s  DST3_emission_flux_from_natural_sources
EmisDST4_Natural  DST4  105  3   -1  2  kg/m2/s  DST4_emission_flux_from_natural_sources
To enable online emissions again, do the inverse of the steps listed above.
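The three edits in this example can also be scripted. The following is a minimal sketch that applies them to temporary stand-in files (two lines of HEMCO_Config.rc and two lines of HEMCO_Diagn.rc, with column spacing assumed for illustration). It is not a tested tool for real run directories, so check the result by eye before running a simulation.

```shell
#!/bin/sh
# Sketch of the three edits on stand-in files (not a real run directory).
cfg=$(mktemp); diagn=$(mktemp)
cat > "$cfg" <<'EOF'
105     DustDead               : off   DST1/DST2/DST3/DST4
    --> OFFLINE_DUST           :       true     # 1980-2019
EOF
cat > "$diagn" <<'EOF'
EmisDST1_Total     DST1   -1     -1   -1   2   kg/m2/s  DST1_emission_flux_from_all_sectors
EmisDST1_Natural   DST1   0      3    -1   2   kg/m2/s  DST1_emission_flux_from_natural_sources
EOF

# Steps 1 and 2: turn the offline switch off and the DustDead extension on.
sed -E \
    -e 's/^([[:space:]]*-->[[:space:]]*OFFLINE_DUST[[:space:]]*:[[:space:]]*)true/\1false/' \
    -e 's/^(105[[:space:]]+DustDead[[:space:]]*:[[:space:]]*)off/\1on /' \
    "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

# Step 3: for dust diagnostics, change extension number 0 to 105 (column 3).
# Note: awk rewrites each modified line with single spaces between columns.
awk '$1 ~ /^EmisDST/ && $3 == "0" { $3 = "105" } { print }' \
    "$diagn" > "$diagn.new" && mv "$diagn.new" "$diagn"

cat "$cfg" "$diagn"
```

Entries whose extension number is -1 (all extensions) are left unchanged, which matches the EmisDST1_Total line in the example above.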
Sea salt debromination
In Zhu et al. [2018], the authors present a mechanistic description of sea salt aerosol debromination. This option was originally enabled by default in GEOS-Chem 13.4.0, but was then changed to be an option (disabled by default) due to the impact it had on ozone concentrations.
Further chemistry updates to GEOS-Chem have allowed us to re-activate sea salt debromination as the default option in GEOS-Chem 14.2.0 and later versions. If you wish to disable sea salt debromination in your simulations, edit the line in HEMCO_Config.rc indicated below.
107 SeaSalt : on SALA/SALC/SALACL/SALCCL/SALAAL/SALCAL/BrSALA/BrSALC/MOPO/MOPI
# ... Preceding options omitted ...
--> Model sea salt Br- : true # <== false deactivates sea salt debromination
--> Br- mass ratio : 2.11e-3
Photolysis
Particulate nitrate photolysis
A study by Shah et al. [2023] showed that particulate nitrate photolysis increases GEOS-Chem modeled ozone concentrations by up to 5 ppbv in the free troposphere in northern extratropical regions. This helps to correct a low bias with respect to observations.
Particulate nitrate photolysis is turned on by default in GEOS-Chem 14.2.0 and later versions. You may disable this option by editing the line in geoschem_config.yml indicated below:
photolysis:
  activate: true
  # .. preceding sub-sections omitted ...
  photolyze_nitrate_aerosol:
    activate: true # <=== false deactivates nitrate photolysis
    NITs_Jscale_JHNO3: 100.0
    NIT_Jscale_JHNO2: 100.0
    percent_channel_A_HONO: 66.667
    percent_channel_B_NO2: 33.333
You can also edit the other nitrate photolysis parameters by changing the appropriate lines above. See the Shah et al. [2023] reference for more information.
Wet deposition
Luo et al 2020 wetdep parameterization
In Luo et al. [2020], the authors introduced an updated wet deposition parameterization, which is now incorporated into GEOS-Chem as a compile-time option. Follow these steps to activate the Luo et al 2020 wetdep scheme in your GEOS-Chem simulations.
Create a run directory for the type of simulation that you wish to use.
CAVEAT: Make sure your simulation uses at least one species that can be wet-scavenged.
Navigate to the build folder within the run directory. Then type the following:
$ cmake .. -DLUO_WETDEP=y
$ make -j
$ make install
Understand what error messages mean
In this Guide we provide information about the different types of errors that your GEOS-Chem simulation might encounter.
Important
Know the difference between warnings and errors.
Warnings are non-fatal informational messages. Usually you do not have to take any action when encountering a warning. Nevertheless, you should always try to investigate why the warning was generated in the first place.
Errors are fatal and will halt GEOS-Chem compilation or execution. Looking at the error message will give you some clues as to why the error occurred.
We strongly encourage you to try to debug the issue using the information both in this Guide and in our Debug GEOS-Chem and HEMCO errors Guide. Please see our Support Guidelines for more information.
Where does error output get printed?
GEOS-Chem Classic, GCHP, and HEMCO, like all Linux-based programs, send output to two streams: stdout and stderr.
Most output will go to the stdout stream, which takes I/O from the Fortran WRITE and PRINT commands. If you run e.g. GEOS-Chem Classic by just typing the executable name at the Unix prompt:
$ ./gcclassic
then the stdout stream will be printed to the terminal window. You can also redirect the stdout stream to a log file with the redirect command:
$ ./gcclassic > GC.log 2>&1
The 2>&1 tells the shell to send the stderr stream (denoted by 2) to the same destination as the stdout stream (denoted by 1). This will make sure that any error output also shows up in the log file.
You can also use the Linux tee command, which will send output both to a log file as well as to the terminal window:
$ ./gcclassic 2>&1 | tee GC.log
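The difference between capturing only stdout and capturing both streams can be seen with a stand-in command. In the sketch below, fake_model is a hypothetical shell function that takes the place of ./gcclassic and writes one line to each stream.

```shell
#!/bin/sh
# Stand-in for ./gcclassic: writes one line to stdout and one to stderr.
fake_model() {
    echo "normal output"
    echo "error output" >&2
}

log=$(mktemp)

# Only stdout is captured; the stderr line still goes to the terminal.
fake_model > "$log"
wc -l < "$log"

# Both streams are captured, because 2>&1 follows the file redirection.
fake_model > "$log" 2>&1
wc -l < "$log"

# Both streams go to the terminal and to the log file.
fake_model 2>&1 | tee "$log"
```

Note that in the tee form the 2>&1 must come before the pipe; placed after tee it would redirect the stderr of tee rather than that of the model.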
Note
Please note the following:
We have combined HEMCO and GEOS-Chem informational printouts as of GEOS-Chem 14.2.0 and HEMCO 3.7.0. In previous versions, HEMCO informational printouts would have been sent to a separate HEMCO.log file.
We have disabled most GEOS-Chem and HEMCO informational printouts by default, starting in GEOS-Chem 14.2.0 and HEMCO 3.7.0. These printouts may be restored (e.g. for debugging) by enabling verbose output in both geoschem_config.yml and HEMCO_Config.rc.
GCHP sends output to several log files as well as to the stdout and stderr streams. Please see gchp.readthedocs.io for more information.
Compile-time errors
In this section we discuss some compilation errors that you may encounter when building GEOS-Chem.
Cannot open include file netcdf.inc
error #5102: Cannot open include file 'netcdf.inc'
Problem: The netcdf-fortran library cannot be found.
Solution: Make sure that all software dependencies have been installed and loaded into your Linux environment.
KPP error: Cannot find -lfl
/usr/bin/ld: cannot find -lfl
error: ld returned exit 1 status
Problem: The Kinetic PreProcessor (KPP) cannot find the flex library, which is one of its dependencies.
Solution: Make sure that all software dependencies have been installed and loaded into your Linux environment.
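A quick way to confirm that your environment is loaded is to check whether the relevant tools are on your PATH. The sketch below uses a hypothetical helper, missing_tools, and the list of tool names passed to it is an assumption that you should adjust for your system. Finding flex on the PATH does not by itself guarantee that the linker can find the flex library.

```shell
#!/bin/sh
# Sketch: report which build tools are missing from PATH.
# The tool list passed below is an assumption; adjust it for your system.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools cmake make flex nf-config)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All listed tools were found."
fi
```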
GNU Fortran internal compiler error
f951: internal compiler error: in ___ at ___
Problem: Compilation halted due to a compiler issue. These types of errors can indicate:
An undiagnosed bug in the compiler itself.
The inability of the compiler to parse source code adhering to the most recent Fortran language standard.
Solution: Try switching to a newer compiler:
For GCHP: Use GNU Compiler Collection 9.3 and later.
For GEOS-Chem Classic and HEMCO: Use GNU Compiler Collection 7.0 and later
Run-time errors
Floating invalid or floating-point exception error
forrtl: error (65): floating invalid # Error message from Intel Fortran Compiler
Floating point exception (core dumped) # Error message from GNU Fortran compiler
Problem: An illegal floating-point math operation has occurred. This error can be generated if one of the following conditions has been encountered:
Division by zero
Underflow or overflow
Square root of a negative number
Logarithm of a negative number
Negative or Positive Infinity
Undefined value(s) used in an equation
Solution: Re-configure GEOS-Chem (or the HEMCO standalone) with the -DCMAKE_BUILD_TYPE=Debug CMake option. This will build in additional error checking that should alert you to where the error is occurring. Once you find the location of the error, you can take the appropriate steps, such as making sure that the denominator of an expression never goes to zero, etc.
Forced exit from Rosenbrock
Forced exit from Rosenbrock due to the following error:
--> Step size too small: T + 10*H = T or H < Roundoff
T= 3044.21151383269 and H= 1.281206877135470E-012
### INTEGRATE RETURNED ERROR AT: 40 68 1
Forced exit from Rosenbrock due to the following error:
--> Step size too small: T + 10*H = T or H < Roundoff
T= 3044.21151383269 and H= 1.281206877135470E-012
### INTEGRATE FAILED TWICE ###
###############################################################################
### KPP DEBUG OUTPUT
### Species concentrations at problem box 40 68 1
###############################################################################
... printout of species concentrations ...
###############################################################################
### KPP DEBUG OUTPUT
### Species concentrations at problem box 40 68 1
###############################################################################
... printout of reaction rates ...
Problem: The KPP Rosenbrock integrator could not converge to a solution at a particular grid box. This can happen when:
The absolute (ATOL) and/or relative (RTOL) error tolerances need to be refined.
A particular species has numerically underflowed or overflowed.
A division by zero occurred in the reaction rate computations.
A species has been set to a very low value in another operation (e.g. wet scavenging), thus causing the non-convergence.
The initial conditions of the simulation may be non-physical.
A data file (meteorology or emissions) may be corrupted.
If the non-convergence only happens once, then GEOS-Chem will revert to prior concentrations and reset the saved KPP internal timestep (Hnew) to zero before calling the Rosenbrock integrator again. In many instances, this is sufficient for the chemistry to converge to a solution.
In the case that the Rosenbrock integrator fails to converge to a solution twice in a row, all of the concentrations and reaction rates at the grid box will be printed to stdout and the simulation will terminate.
Solution: Look at the error printout. You will likely notice species concentrations or reaction rates that are extremely high or low compared to the others. This will give you a clue as to where in GEOS-Chem the error may have occurred.
Try performing some short test simulations, turning each operation (e.g. transport, PBL mixing, convection, etc.) off one at a time. This should isolate the location of the error. Make sure to turn on verbose output in both geoschem_config.yml and HEMCO_Config.rc; this will send additional printout to the stdout stream. The clue to finding the error may become obvious by looking at this output.
Check your restart file to make sure that the initial concentrations make sense. For certain simulations, using initial conditions from a simulation that has been sufficiently spun-up makes a difference.
Use a netCDF file viewer like ncview to open the meteorology files on the day that the error occurred. If a file does not open properly, it is probably corrupted. If you suspect that the file may have been corrupted during download, then download the file again from its original source. If this still does not fix the error, then the file may have been corrupted at its source. Please open a new GitHub issue to alert the GEOS-Chem Support Team.
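When a log file contains many integration failures, it can be useful to see whether they cluster at a few grid boxes. The sketch below counts failures per box in a temporary stand-in log whose lines mimic the message format shown above. The helper name count_integrate_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: count KPP integration failures per grid box in a log file.
# The sample lines mimic the message format shown in this section.
log=$(mktemp)
cat > "$log" <<'EOF'
 ### INTEGRATE RETURNED ERROR AT:          40          68           1
 ### INTEGRATE RETURNED ERROR AT:          12          30           1
 ### INTEGRATE RETURNED ERROR AT:          40          68           1
EOF

# Print "count idx1 idx2 idx3", most frequent box first (hypothetical helper).
count_integrate_errors() {
    grep 'INTEGRATE RETURNED ERROR AT' "$1" \
        | awk '{ print $(NF-2), $(NF-1), $NF }' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2, $3, $4 }'
}

count_integrate_errors "$log"
```

A box that appears repeatedly is a natural place to inspect the restart file and meteorology fields first.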
More about KPP error tolerances
The error tolerances are set in the following locations:
fullchem mechanism: In routine Do_FlexChem (located in GeosCore/fullchem_mod.F90).
Hg mechanism: In routine ChemMercury (located in GeosCore/mercury_mod.F90).
For example, in the fullchem mechanism, ATOL and RTOL are defined as:
!%%%%% CONVERGENCE CRITERIA %%%%%
! Absolute tolerance
ATOL = 1e-2_dp
! Relative tolerance
! Changed to 0.5e-3 to avoid integrate errors by halogen chemistry
! -- Becky Alexander & Bob Yantosca (24 Jan 2023)
RTOL = 0.5e-3_dp
Convergence errors can occur because the system arrives at a state too far from the truth to be able to converge. By tightening (i.e. decreasing) the tolerances, you ensure that the system stays closer to the truth at every time step. Then, the problematic time steps will start the chemistry with a system closer to the true state, enabling the chemistry to converge.
CAVEAT: If the first time step of chemistry cannot converge, tightening the tolerances would not work, but loosening them would. So you might have to experiment a little bit in order to find the proper settings for ATOL and RTOL for your specific mechanism.
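As a rough illustration of how ATOL and RTOL enter the step-size control, Rosenbrock integrators of the kind generated by KPP typically accept an internal step only when a scaled error norm is at most 1. The symbols \(N\) (number of species), \(e_i\) (estimated local error for species \(i\)), \(y_i\) and \(y_i^{new}\) (concentrations before and after the step) are introduced here for illustration and do not appear in the GEOS-Chem source excerpt above:

\[
\mathrm{Err} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{e_i}{\mathrm{ATOL} + \mathrm{RTOL}\,\max\left(|y_i|,\,|y_i^{new}|\right)}\right)^{2}} \le 1
\]

Under this assumption, decreasing ATOL or RTOL makes the denominator smaller, so the same error estimate produces a larger Err and the integrator must take smaller internal steps. ATOL dominates for species with very low concentrations, while RTOL dominates for abundant species.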
HEMCO Error: Cannot find field
HEMCO Error: Cannot find field ___. Please check the name in the config file.
Problem: A GEOS-Chem Classic or HEMCO standalone simulation halts because HEMCO cannot find a certain input field.
Solution: Most of the time, this error indicates that a species is missing from the GEOS-Chem restart file.
By default, the GEOS-Chem restart file (entry SPC_ in HEMCO_Config.rc) uses time cycle flag EFYO. This setting tells HEMCO to halt if a species does not have an initial condition field contained in the GEOS-Chem restart file. Changing this time cycle flag to CYS will allow the simulation to proceed. In this case, species will be given a default background initial concentration.
HEMCO Error: Cannot find file for current simulation time
HEMCO ERROR: Cannot find file for current simulation time:
./Restarts/GEOSChem.Restart.17120701_0000z.nc4 - Cannot get field SPC_NO.
Please check file name and time (incl. time range flag) in the config. file
Problem: HEMCO tried to read data from a file but could not find the time slice requested in HEMCO_Config.rc.
Solution: Make sure that the file is at the path specified in HEMCO_Config.rc. HEMCO will try to look back in time starting with the current year and going all the way back to the year 1712 or 1713. So if you see 1712 or 1713 in the error message, that is a tip-off that the file is missing.
HEMCO Run Error
===============================================================================
GEOS-CHEM ERROR: HCO_RUN
HEMCO ERROR: Please check the HEMCO log file for error messages!
STOP at HCOI_GC_RUN (hcoi_gc_main_mod.F90)
===============================================================================
Problem: A GEOS-Chem simulation stopped in the HCOI_GC_RUN routine with an error message similar to that shown above.
Solution: Look at the output that was written to the stdout and stderr streams. Error messages containing HCO originate in HEMCO.
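If the output was redirected to a log file, a search for HCO and HEMCO is a quick way to pull out the relevant lines. The sketch below runs that search on a temporary stand-in log built from the messages shown above. The helper name hemco_messages is hypothetical.

```shell
#!/bin/sh
# Sketch: list log lines that mention HEMCO or HCO, with line numbers.
# The stand-in log reuses the messages quoted in this section.
log=$(mktemp)
cat > "$log" <<'EOF'
some informational line
GEOS-CHEM ERROR: HCO_RUN
HEMCO ERROR: Please check the HEMCO log file for error messages!
STOP at HCOI_GC_RUN (hcoi_gc_main_mod.F90)
EOF

# Hypothetical helper: print matching lines prefixed by their line number.
hemco_messages() {
    grep -n -E 'HCO|HEMCO' "$1"
}

hemco_messages "$log"
```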
HEMCO time stamps may be wrong
HEMCO WARNING: ncdf reference year is prior to 1901 - time stamps may be wrong!
--> LOCATION: GET_TIMEIDX (hco_read_std_mod.F90)
Problem: HEMCO reads the files but gives zero emissions and shows the warning listed above.
Solution: Do the following:
Reset the reference datetime in the netCDF file so that it is after 1901.
Make sure that the time:calendar string is either standard or gregorian. GEOS-Chem Classic, GCHP, and HEMCO can only read data placed on calendars with leap years.
GCST member Lizzie Lundgren writes:
This HEMCO error occurs if the reference time for the netCDF file time dimension is prior to 1901. If you do ncdump -c filename you will be able to see the metadata for the time dimension as well as the time variable values. The time units should include the reference date.
You can get around this issue by changing the reference time within the file. You can do this with cdo (Climate Data Operators) using the setreftime command.
Here is a bash script example by GCST member Melissa Sulprizio that updates the calendar and reference time for all files ending in *.nc within a directory. This script was made for a user who ran into the same issue. In that case the first file was for Jan 1, 1950, so that was made the new reference time. I would recommend doing the same for your dataset so that the first time variable value would be 0. This script also compresses the file, which we recommend doing.
#!/bin/bash

for file in *.nc; do
    echo "Processing $file"

    # Make sure the calendar is "standard" and not e.g. 360 days
    cdo setcalendar,standard "$file" tmp.nc
    mv tmp.nc "$file"

    # Set file reference time to 1950-01-01 at 0z
    cdo setreftime,1950-01-01,0 "$file" tmp.nc
    mv tmp.nc "$file"

    # Compress the file
    nccopy -d1 -c "time/1" "$file" tmp.nc
    mv tmp.nc "$file"
done
After you update the file you can then again do ncdump -c filename to check the time dimension. For the case above it looks like this after processing:
double time(time) ;
    time:standard_name = "time" ;
    time:long_name = "time" ;
    time:bounds = "time_bnds" ;
    time:units = "days since 1950-01-01 00:00:00" ;
    time:calendar = "standard" ;
. . .
time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 396, 424,
    455, 485, 516, 546, 577, 608, 638, 669, 699, 730, 761, 790, 821, 851,
    882, 912, 943, 974, 1004, 1035, 1065, 1096, 1127, 1155, 1186, 1216, 1247
. . .
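To check many files quickly, the reference year can be pulled out of the time:units attribute and compared against 1901. The sketch below parses a stand-in header line rather than calling ncdump, so that it runs anywhere. The helper name ref_year is hypothetical, and in practice you would pipe the output of ncdump -c filename into it.

```shell
#!/bin/sh
# Sketch: extract the reference year from a netCDF "time:units" header line
# and warn if it is earlier than 1901. The header text is a stand-in for the
# output of "ncdump -c filename".
ref_year() {
    # Reads header text on stdin; prints the 4-digit year after "since".
    sed -n -E 's/.*time:units = ".* since ([0-9]{4})-.*/\1/p' | head -n 1
}

header='time:units = "days since 1950-01-01 00:00:00" ;'
year=$(printf '%s\n' "$header" | ref_year)

if [ "$year" -lt 1901 ]; then
    echo "Reference year $year is before 1901: reset it"
else
    echo "Reference year $year is OK"
fi
```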
Negative tracer found in WETDEP
WETDEP: ERROR at 40 67 1 for species 2 in area WASHOUT: at surface
LS : T
PDOWN : 0.0000000000000000
QQ : 0.0000000000000000
ALPHA : 0.0000000000000000
ALPHA2 : 0.0000000000000000
RAINFRAC : 0.0000000000000000
WASHFRAC : 0.0000000000000000
MASS_WASH : 0.0000000000000000
MASS_NOWASH : 0.0000000000000000
WETLOSS : NaN
GAINED : 0.0000000000000000
LOST : 0.0000000000000000
DSpc(NW,:) : NaN 6.0358243778561746E-013 6.5871997362336500E-013 7.2710915872550685E-013 8.0185772698102585E-013 8.7883682997147595E-013 9.6396466805517407E-013 1.0574719517340253E-012 1.1617302070198606E-012 1.2976219851862141E-012 1.4347568254382824E-012 1.5772212240871896E-012 1.7071657565802178E-012 1.8443377617027378E-012 1.9982208320328261E-012 2.1567932874822908E-012 2.2591568422224307E-012 2.2208301198704935E-012 1.8475974519883714E-012 1.7716069173018996E-013 1.7714395985520433E-013 1.7633649101242403E-013 1.6668529114369137E-013 1.3548045738669223E-013 5.1061710020314286E-014 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
Spc(I,J,:N) : NaN 3.5108056785061143E-009 3.8363969256742307E-009 3.6615166033026556E-009 3.6780394914242783E-009 4.1462343168230006E-009 4.7319942271993657E-009 5.1961472823088513E-009 5.4030830279477525E-009 5.5736845790195336E-009 5.7139596145766606E-009 5.8629212873139874E-009 7.9742789235773213E-009 1.0334311421916619E-008 1.0816150360971255E-008 1.1168715310744298E-008 1.1534959217017146E-008 1.1809950282570185E-008 1.7969626885629474E-008 1.7430760762446019E-008 1.7477810715818748E-008 1.7967321756900857E-008 1.8683742574601477E-008 1.9309929368816065E-008 2.0262386892450682E-008 2.0489969814921647E-008 1.9961590106306151E-008 2.2859284477873924E-008 1.3161046290246557E-008 6.5857053651000387E-009 2.7535806161296159E-009 1.2708780077337107E-009 3.6557775667039418E-010 6.1984105316417057E-011 2.6665694620973736E-011 8.7599157145440813E-012 4.8009375158768866E-012 1.0086435318729046E-012 1.3493529625353547E-013 1.6403790023674963E-014 2.7417226109948757E-015 4.2031825835582592E-014 2.3778709382809943E-013 8.3223532851684382E-013 4.5695049346098890E-012 6.9911523125704209E-012 2.5076669266356582E-012
===============================================================================
===============================================================================
GEOS-Chem ERROR: Error encountered in wet deposition!
-> at SAFETY (in module GeosCore/wetscav_mod.F90)
===============================================================================
===============================================================================
GEOS-Chem ERROR: Error encountered in "Safety"!
-> at Do_Washout_at_Sfc (in module GeosCore/wetscav_mod.F90)
===============================================================================
===============================================================================
GEOS-Chem ERROR:
-> at WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================
===============================================================================
GEOS-Chem ERROR: Error encountered in "Wetdep"!
-> at Do_WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================
===============================================================================
GEOS-CHEM ERROR: Error encountered in "Do_WetDep"!
STOP at -> at GEOS-Chem (in GeosCore/main.F90)
===============================================================================
- CLEANUP: deallocating arrays now...
Problem: A GEOS-Chem simulation has encountered either negative or NaN (not-a-number) concentrations in the wet deposition module. This can indicate the following:
The wet deposition routines have removed too much soluble species from within a grid box.
Another operation (e.g. transport, convection, etc.) has removed too much soluble species from within a grid box.
A corrupted or incorrect meteorological input has caused too much rainout or washout to occur within a grid box (which leads to conditions 1 and/or 2 above).
An array-out-of-bounds error has corrupted a variable that is used in wet deposition.
For nested-grid simulations, the transport timestep may be too large, thus resulting in grid boxes with zero or negative concentrations.
Solution: Re-configure GEOS-Chem and/or HEMCO with the -DCMAKE_BUILD_TYPE=Debug CMake option. This adds in additional error checks that may help you find where the error occurs.
Also try adding some PRINT* statements before and after the call to DO_WETDEP to check the concentrations entering and leaving the wetdep module. That might give you an idea of where the concentrations are going negative.
Permission denied error
geoschem.run: Permission denied
Problem: The script geoschem.run is not executable.
Solution: Change the permission of the script with:
$ chmod 755 geoschem.run
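The same fix can be applied conditionally, so that the permission is only changed when it is actually missing. The sketch below works on a temporary stand-in file instead of a real geoschem.run script.

```shell
#!/bin/sh
# Sketch: make a run script executable only if it is not already.
# "$script" points at a temporary stand-in for geoschem.run.
script=$(mktemp)
printf '#!/bin/sh\necho "running"\n' > "$script"
chmod 644 "$script"   # start without any executable bit

if [ ! -x "$script" ]; then
    chmod 755 "$script"
fi
ls -l "$script"
```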
Excessive fall velocity error
GEOS-CHEM ERROR: Excessive fall velocity?
STOP at CALC_FALLVEL, UCX_mod
Problem: The fall velocity (in stratospheric chemistry routine Calc_FallVel in module GeosCore/ucx_mod.F90) exceeds 10 m/s. This error will most often occur in GEOS-Chem Classic nested-grid simulations.
Solution: Reduce the default timestep settings in geoschem_config.yml. You may need to use 300 seconds (transport) and 600 seconds (chemistry) or even smaller depending on the horizontal resolution of your simulation.
File I/O errors
List-directed I/O syntax error
# Error message from GNU Fortran
At line NNNN of file filename.F90
Fortran runtime error: Bad real number|integer number|character in item X of list input
# Error message from Intel Fortran
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Problem: This error indicates that the wrong type of data was read from a text file. This can happen when:
Numeric input is expected but character input was read from disk (or vice-versa);
A READ statement in your code has been omitted or deleted.
Solution: Check configuration files (geoschem_config.yml, HEMCO_Config.rc, HEMCO_Diagn.rc, etc.) for syntax errors and omissions that could be causing this error.
Nf_Def_Var: can not define variable
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Nf_Def_var: can not define variable: ____
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Code stopped from DO_ERR_OUT (in module NcdfUtil/m_do_err_out.F90)
This is an error that was encountered in one of the netCDF I/O modules,
which indicates an error in writing to or reading from a netCDF file!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Problem: GEOS-Chem or HEMCO could not write a variable to a netCDF file. This error may be caused by:
The netCDF file is write-protected and cannot be overwritten.
The path to the netCDF file is incorrect (e.g. directory does not exist).
The netCDF file already contains a variable with the same name.
Solution: Try the following:
If GEOS-Chem or HEMCO will be overwriting any existing netCDF files (which can often happen during testing & development), make sure that the file and containing directory are not write-protected.
Make sure that the path where you intend to write the netCDF file exists.
Check your HISTORY.rc and HEMCO_Diagn.rc diagnostic configuration files to make sure that you are not writing more than one diagnostic variable with the same name.
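For HEMCO_Diagn.rc, where the diagnostic name is the first column of each entry, duplicate names can be found with a short pipeline. The sketch below runs on a temporary stand-in file and uses a hypothetical helper, duplicate_names. It does not cover HISTORY.rc, which uses a different layout.

```shell
#!/bin/sh
# Sketch: report diagnostic names that appear more than once in a
# HEMCO_Diagn.rc-style file (first column, comment lines skipped).
diagn=$(mktemp)
cat > "$diagn" <<'EOF'
# sample entries for illustration only
EmisDST1_Natural   DST1  105    3   -1   2   kg/m2/s  DST1_emission_flux_from_natural_sources
EmisDST2_Natural   DST2  105    3   -1   2   kg/m2/s  DST2_emission_flux_from_natural_sources
EmisDST1_Natural   DST1  105    3   -1   2   kg/m2/s  DST1_emission_flux_from_natural_sources
EOF

# Hypothetical helper: print each name that occurs at least twice.
duplicate_names() {
    awk '!/^[[:space:]]*#/ && NF > 0 { print $1 }' "$1" | sort | uniq -d
}

duplicate_names "$diagn"
```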
NetCDF: HDF Error
NetCDF: HDF error
Problem: The netCDF library routines in GEOS-Chem or HEMCO cannot read a netCDF file. The error is occurring in the HDF5 library (upon which netCDF depends). This may indicate a corrupted or incomplete netCDF file.
Solution: Try re-downloading the file from the WashU data portal. Downloading a fresh copy of the file is often sufficient to fix this type of issue. If the error persists, please open a new GitHub issue to alert the GEOS-Chem Support Team, as the corruption may have occurred at the original source of the data.
Segmentation faults and similar errors
SIGSEGV, segmentation fault occurred
Problem: GEOS-Chem or HEMCO tried to access an invalid memory location.
Solution: See the sections below for ways to debug segmentation fault errors.
Array-out-of-bounds error
Subscript #N of the array THISARRAY has value X which is less than the lower bound of Y
or
Subscript #N of the array THISARRAY has value A which is greater than the upper bound of B
Problem: An array index variable refers to an element that lies outside of the array boundaries.
Solution: Reconfigure GEOS-Chem with the following options:
$ cd /path/to/build # Your GEOS-Chem or HEMCO build directory
$ cmake . -DCMAKE_BUILD_TYPE=Debug
This will enable several debugging options, including checks for array indices that go out of bounds. You will get an error message similar to those shown above.
Use the grep command to search for all instances of the array (in this example, THISARRAY) in each source code folder:
grep -i THISARRAY *.F90 # -i means ignore uppercase/lowercase distinction
This should let you quickly locate the issue. Depending on the compiler that is used, you might also get a routine name and line number from the error output.
Segmentation fault encountered after TPCORE initialization
NASA-GSFC Tracer Transport Module successfully initialized
Problem: A GEOS-Chem simulation dies right after you see this text.
Note
Starting in GEOS-Chem Classic 14.1.0, the text above will only be printed if you have activated verbose output in the geoschem_config.yml configuration file.
Solution: Increase the amount of stack memory available to GEOS-Chem and HEMCO. Please follow this link for detailed instructions.
Invalid memory access
severe (174): SIGSEGV, segmentation fault occurred
This message indicates that the program attempted an invalid memory reference.
Check the program for possible errors.
Problem: GEOS-Chem or HEMCO code tried to read data from an invalid memory location. This can happen when data is being read from a file into an array, but the array is too small to hold all the data.
Solution: Use a debugger (like gdb) to try to diagnose the situation. Also try increasing the dimensions of the array that you suspect might be too small.
Stack overflow
severe (174): SIGSEGV, possible program stack overflow occurred
Program requirements exceed current stacksize resource limit.
Problem: GEOS-Chem and/or HEMCO is using more stack memory than is currently available to the system. Stack memory is a reserved portion of the memory structure where short-lived variables are stored, such as:
Variables that are local to a given subroutine
Variables that are NOT globally saved
Variables that are NOT declared as an ALLOCATABLE array
Variables that are NOT declared as a POINTER variable or array
Variables that are included in an !$OMP PRIVATE or !$OMP THREADPRIVATE statement
Solution: Max out the amount of stack memory that is available to GEOS-Chem and HEMCO. See this section for instructions.
Less common errors
The errors listed below, which occur infrequently, are related to invalid memory operations. These can especially occur with POINTER-based variables.
Bus Error
Problem: GEOS-Chem or HEMCO is trying to reference memory that cannot possibly be there. The website StackOverflow.com has a definition of bus error and how it differs from a segmentation fault.
Solution: A bus error may occur when you call a subroutine with too many arguments. Check subroutine definitions and subroutine calls to make sure the correct number of arguments are passed.
Double free or corruption
*** glibc detected *** PROGRAM_NAME: double free or corruption (out): ____ ***
Problem: This error is not common, but can occur under some circumstances. Usually it means one of the following has occurred:
You are deallocating the same variable more than once.
You are deallocating a variable that wasn’t allocated, or that has already been deallocated.
Please see this link for more details.
Solution: Try setting all deleted pointers to NULL().
You can also use a debugger like gdb, which will show you a backtrace from your crash. The backtrace tells you the routine and line number where the code crashed, and which other routines were called before the crash happened.
Remember these three basic rules when working with POINTER-based variables:
Set pointer to NULL after free.
Check for NULL before freeing.
Initialize pointer to NULL at the start.
Using these rules helps to prevent this type of error.
Also note that you may see this error when a software library required by GEOS-Chem and/or HEMCO (e.g. netcdf or netcdf-fortran) has not been installed. GEOS-Chem and/or HEMCO may be making calls to the missing library, which results in the error. If this is the case, the solution is to install all required libraries.
Dwarf subprogram entry error
Dwarf subprogram entry L_ROUTINE-NAME__LINE-NUMBER__par_loop2_2_576 has high_pc < low_pc.
This warning will not be repeated for other occurrences.
Problem: GEOS-Chem or HEMCO code tried to use a POINTER-based variable that is unassociated (i.e. not pointing to any other variable or memory) from within an OpenMP parallel loop.
This error can happen when a POINTER-based variable is set to NULL() where it is declared:
TYPE(Species), POINTER :: ThisSpc => NULL()
The above declaration causes the pointer variable ThisSpc to be implicitly declared with the SAVE attribute. This causes a segmentation fault, because all pointers used within an OpenMP parallel region must be associated and nullified on the same thread.
Solution: Make sure that any POINTER-based variables (such as ThisSpc in this example) point to their target and are nullified within the same OpenMP parallel loop.
TYPE(Species), POINTER :: ThisSpc ! Do not set to NULL() here!!!
... etc ...
!$OMP PARALLEL DO &
!$OMP DEFAULT( SHARED ) &
!$OMP PRIVATE( I, J, L, N, ThisSpc, ... )
DO N = 1, nSpecies
DO L = 1, NZ
DO J = 1, NY
DO I = 1, NX
... etc ...
! Point to species database entry
ThisSpc => State_Chm%Species(N)%Info
... etc ...
! Free pointer at end of loop
ThisSpc => NULL()
ENDDO
ENDDO
ENDDO
ENDDO
Note that you must also add POINTER-based variables (such as ThisSpc) to the !$OMP PRIVATE clause for the parallel loop.
For more information about this type of error, please see this article.
Free: invalid size
Error in PROGRAM_NAME free(): invalid size: 0x00000000 0662e090
Problem: This error is not common. It can happen when:
You are trying to free a pointer that wasn’t allocated.
You are trying to delete an object that wasn’t created.
You may be trying to nullify or deallocate an object more than once.
You may be overflowing a buffer.
You may be writing to memory that you shouldn’t be writing to.
Solution: Any number of programming errors can cause this problem. You need to use a debugger (such as gdb), get a backtrace, and see what your program is doing when the error occurs. If that fails and you determine you have corrupted the memory at some previous point in time, you may be in for some painful debugging (it may not be too painful if the project is small enough that you can tackle it piece by piece).
See this post on Stack Overflow for more information.
Munmap_chunk: invalid pointer
*** glibc detected *** PROGRAM_NAME: munmap_chunk(): invalid pointer: 0x00000000059aac30 ***
Problem: This is not a common error, but can happen if you deallocate or nullify a POINTER-based variable that has already been deallocated or modified.
Solution: Use a debugger (like gdb) to see where in GEOS-Chem or HEMCO the error occurs. You will likely have to remove a duplicate DEALLOCATE or => NULL() statement. See this link for more information.
Out of memory asking for NNNNN
Fatal compilation error: Out of memory asking for 36864.
Problem: This error may be caused by the datasize limit not being maxed out in your Linux login environment. See this link for more information.
Solution: Use this command to check the status of the datasize limit:
$ ulimit -d
unlimited
If the result of this command is not unlimited, then set it to unlimited with this command:
$ ulimit -d unlimited
Note
The two most important limits for GEOS-Chem and HEMCO are datasize and stacksize. These should both be set to unlimited.
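Both limits can also be inspected from a script, for example at the top of a job wrapper. The sketch below uses Python's standard resource module on Linux; the function name check_memory_limits is hypothetical. It reports the soft limits of the Python process itself, which are inherited from the shell that launched it, so run it from the same environment you use to start GEOS-Chem or HEMCO.

```python
import resource


def check_memory_limits():
    """Return {limit name: True if the soft limit is unlimited}."""
    limits = {
        "stacksize": resource.RLIMIT_STACK,  # what "ulimit -s" reports
        "datasize": resource.RLIMIT_DATA,    # what "ulimit -d" reports
    }
    status = {}
    for name, which in limits.items():
        soft, _hard = resource.getrlimit(which)
        status[name] = (soft == resource.RLIM_INFINITY)
    return status


if __name__ == "__main__":
    for name, unlimited in check_memory_limits().items():
        print(f"{name}: {'unlimited' if unlimited else 'LIMITED'}")
```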
Debug GEOS-Chem and HEMCO errors
If your GEOS-Chem or HEMCO simulation dies unexpectedly with an error or takes much longer to execute than it should, the most important thing is to try to isolate the source of the error or bottleneck right away. Below are some debugging tips that you can use.
Check if a solution has been posted to GitHub
We have migrated support requests from the GEOS-Chem wiki to GitHub issues. A quick search of GitHub issues (both open and closed) might reveal the answer to your question or provide a solution to your problem.
You should also feel free to open a new issue at one of these GitHub links:
If you are new to GitHub, we recommend viewing our GitHub tutorial videos at our GEOS-Chem YouTube site.
Check if your computational environment is configured properly
Many GEOS-Chem and HEMCO errors occur due to improper configuration settings (i.e. missing libraries, incorrectly-specified environment variables, etc.) in your computational environment. Take a moment and refer back to these manual pages (on ReadTheDocs) for information on configuring your environment:
Check any code modifications that you have added
If you have made modifications to a “fresh out-of-the-box” GEOS-Chem or HEMCO version, look over your code edits to search for sources of potential error.
You can also use Git to revert to the last stable version, which is always in the main branch.
Check if your runs exceeded time or memory limits
If you are running GEOS-Chem or HEMCO on a shared computer system, you will probably have to use a job scheduler (such as SLURM) to submit your jobs to a computational queue. You should be aware of the run time and memory limits for each of the queues on your system.
If your job uses more memory or run time than the computational queue allows, it can be cancelled by the scheduler. You will usually get an error message printed out to the stderr stream, and maybe also an email stating that the run was terminated. Be sure to check all of the log files created by your jobs for such error messages.
To solve this issue, try submitting your GEOS-Chem or HEMCO simulations to a queue with larger run-time and memory limits. You can also try splitting up your long simulations into several smaller stages (e.g. monthly) that take less time to run to completion.
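As an illustration of splitting a long run into monthly stages, the sketch below generates the start and end dates for each stage in the YYYYMMDD form used by start_date and end_date in geoschem_config.yml. The function name monthly_stages is hypothetical, and the sketch only computes the dates; how you submit each stage, and how you carry the restart file from one stage to the next, depends on your scheduler and run directory setup.

```python
from datetime import date


def monthly_stages(start, end):
    """Split [start, end) into consecutive stages that break at month boundaries."""
    stages = []
    current = start
    while current < end:
        # First day of the month that follows "current"
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Never run past the requested end date
        stop = min(next_month, end)
        stages.append((current.strftime("%Y%m%d"), stop.strftime("%Y%m%d")))
        current = stop
    return stages


if __name__ == "__main__":
    for begin, finish in monthly_stages(date(2019, 7, 1), date(2019, 10, 1)):
        print(f"start_date: [{begin}, 000000]   end_date: [{finish}, 000000]")
```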
Send debug printout to the log files
If your GEOS-Chem simulation stopped with an error, but you cannot tell where, turn on the debug_printout option. This is found in the Simulation Settings section of geoschem_config.yml:
#============================================================================
# Simulation settings
#============================================================================
simulation:
  name: fullchem
  start_date: [20190701, 000000]
  end_date: [20190801, 000000]
  root_data_dir: /path/to/ExtData
  met_field: MERRA2
  species_database_file: ./species_database.yml
  debug_printout: false      # <---- set this to true
  use_gcclassic_timers: false
This will send additional output to the GEOS-Chem log file, which may help you to determine where the simulation stopped.
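If you switch this option on and off often, the edit can be scripted. The sketch below is one possible approach, not a GEOS-Chem utility: the function name set_debug_printout is hypothetical, and it uses a plain text substitution rather than a YAML parser so that comments and layout in the file are left untouched.

```python
import re


def set_debug_printout(config_text, enabled=True):
    """Return config_text with the value of debug_printout replaced."""
    value = "true" if enabled else "false"
    # Keep the key and its indentation (group 1); replace only the value
    pattern = re.compile(r"^(\s*debug_printout:\s*)(true|false)\b", re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{value}", config_text)
    if count == 0:
        raise ValueError("debug_printout entry not found")
    return new_text


if __name__ == "__main__":
    sample = "simulation:\n  name: fullchem\n  debug_printout: false\n"
    print(set_debug_printout(sample, enabled=True))
```

To apply it to a real run directory you would read geoschem_config.yml, pass its text to the function, and write the result back; keeping a backup copy of the original file is advisable.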
If your HEMCO simulation stopped with an error, turn on debug printout by editing the Verbose and Warnings settings at the top of the HEMCO_Config.rc configuration file:
###############################################################################
### BEGIN SECTION SETTINGS
###############################################################################
ROOT: /path/to/ExtData/HEMCO
METDIR: MERRA2
GCAP2SCENARIO: none
GCAP2VERTRES: none
Logfile: HEMCO.log
DiagnFile: HEMCO_Diagn.rc
DiagnPrefix: ./OutputDir/HEMCO_diagnostics
DiagnFreq: Monthly
Wildcard: *
Separator: /
Unit tolerance: 1
Negative values: 0
Only unitless scale factors: false
Verbose: 0 # <---- set this to 3
Warnings: 1 # <---- set this to 3
Both Verbose and Warnings settings can have values from 0 to 3. The higher the number, the more information will be printed out to the HEMCO.log file. A value of 0 disables debug printout.
Having this extra debug printout in your log file output may provide insight as to where your simulation is halting.
Look at the traceback output
An error traceback will be printed out whenever a GEOS-Chem or HEMCO simulation halts with an error. This is a list of routines that were called when the error occurred.
A sample error traceback is shown here:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
gcclassic 0000000000C82023 Unknown Unknown Unknown
libpthread-2.17.s 00002AACE8015630 Unknown Unknown Unknown
gcclassic 000000000095935E error_mod_mp_erro 437 error_mod.F90
gcclassic 000000000040ABB7 MAIN__ 422 main.F90
gcclassic 0000000000406B92 Unknown Unknown Unknown
libc-2.17.so 00002AACE8244555 __libc_start_main Unknown Unknown
gcclassic 0000000000406AA9 Unknown Unknown Unknown
The top line with a valid routine name and line number printed is the routine that exited with an error (error_mod.F90, line 437). You might also have to look at the other listed files to get more information about the error (e.g. main.F90, line 422).
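When a log file contains a long traceback, the first informative frame can be picked out automatically. The sketch below assumes the five-column layout shown in the sample above (image, PC, routine, line, source); other compilers format tracebacks differently, and the function name first_known_frame is hypothetical.

```python
def first_known_frame(traceback_text):
    """Return (routine, line, source) for the first frame with a line number."""
    for row in traceback_text.splitlines():
        fields = row.split()
        # Frame rows have five columns: image, PC, routine, line, source.
        # Frames without debug information show "Unknown" instead of a number.
        if len(fields) == 5 and fields[3].isdigit():
            return fields[2], int(fields[3]), fields[4]
    return None


if __name__ == "__main__":
    sample = """forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
gcclassic 0000000000C82023 Unknown Unknown Unknown
gcclassic 000000000095935E error_mod_mp_erro 437 error_mod.F90
gcclassic 000000000040ABB7 MAIN__ 422 main.F90
"""
    print(first_known_frame(sample))
```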
Identify whether the error happens consistently
If your GEOS-Chem or HEMCO error always happens at the same model date and time, this could indicate corrupted meteorology or emissions input data files. In this case, you may be able to fix the issue simply by re-downloading the files to your disk space.
If the error happened only once, it could be caused by a network problem or other such transient condition.
Isolate the error to a particular operation
If you are not sure where a GEOS-Chem error is occurring, turn off operations (such as transport, chemistry, dry deposition, etc.) one at a time in the geoschem_config.yml configuration file, and rerun your simulation.
Similarly, if you are debugging a HEMCO error, turn off different emissions inventories and extensions one at a time in the HEMCO_Config.rc file, and rerun your simulation.
Repeating this process should eventually lead you to the source of the error.
Compile with debugging options
You can compile GEOS-Chem or HEMCO in debug mode. This will activate several additional run-time error checks (such as looking for assignments that go outside of array bounds or floating point math errors) that can give you more insight as to where your simulation is dying.
Configure your code for debug mode with the -DCMAKE_BUILD_TYPE=Debug option. From your run directory, type these commands:
cd build
cmake ../CodeDir -DCMAKE_BUILD_TYPE=Debug -DRUNDIR=..
make -j
make -j install
cd ..
Attention
Compiling in debug mode will add a significant amount of computational overhead to your simulation. Therefore, we recommend activating these additional error checks only in short simulations and not in long production runs.
Use a debugger
You can save yourself a lot of time and hassle by using a debugger such as gdb (the GNU debugger). With a debugger you can:
Examine data when a program stops
Navigate the stack when a program stops
Set break points
To run GEOS-Chem or HEMCO in the gdb debugger, you should first compile in debug mode. This will turn on the -g compiler flag (which tells the compiler to generate symbolic information for debugging) and the -O0 compiler flag (which shuts off all optimizations). Once the executable has been created, type one of the following commands, which will start gdb:
$ gdb gcclassic # for GEOS-Chem Classic
$ gdb gchp # for GCHP
$ gdb hemco_standalone # for HEMCO standalone
At the gdb prompt, type one of these commands:
(gdb) run # for GEOS-Chem Classic or GCHP
(gdb) run HEMCO_sa_Config.rc # for HEMCO standalone
With gdb, you can also go directly to the point of the error without having to re-run GEOS-Chem or HEMCO. When your GEOS-Chem or HEMCO simulation dies, it will create a corefile such as core.12345. The 12345 refers to the process ID assigned to your executable by the operating system; this number is different for each running process on your system.
Typing one of these commands:
$ gdb gcclassic core.12345 # for GEOS-Chem Classic
$ gdb gchp core.12345 # for GCHP
$ gdb hemco_standalone core.12345 # for HEMCO standalone
will open gdb and bring you immediately to the point of the error. If you then type at the (gdb) prompt:
(gdb) where
you will get a traceback listing.
To exit gdb, type quit.
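The same traceback can be collected without an interactive session, which is convenient inside batch jobs. The sketch below assumes gdb is on your PATH and relies on its -batch and -ex options to run the where command and exit; the function names are hypothetical.

```python
import subprocess


def gdb_backtrace_command(executable, corefile):
    """Build a gdb command line that prints a backtrace and then exits."""
    return ["gdb", "-batch", "-ex", "where", executable, corefile]


def get_backtrace(executable, corefile):
    """Run gdb non-interactively and return whatever it printed."""
    result = subprocess.run(
        gdb_backtrace_command(executable, corefile),
        capture_output=True,
        text=True,
        check=False,  # gdb may return non-zero if the corefile is unreadable
    )
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be run for the example corefile
    print(" ".join(gdb_backtrace_command("gcclassic", "core.12345")))
```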
Print it out if you are in doubt!
Add print*, statements to write values of variables in the area of the code where you suspect the error is occurring. Also add the call flush(6) statement to flush the output to the screen and/or log file immediately after printing. Maybe you will see something wrong in the output.
You can often detect numerical errors by adding debugging print statements into your source code:
Use MINVAL and MAXVAL functions to get the minimum and maximum values of an array:
PRINT*, '### Min, Max: ', MINVAL( ARRAY ), MAXVAL( ARRAY )
CALL FLUSH( 6 )
Use the SUM function to check the sum of an array:
PRINT*, '### Sum of X : ', SUM( ARRAY )
CALL FLUSH( 6 )
Use the brute-force method when all else fails
If the bug is difficult to locate, then comment out a large section of code and run your GEOS-Chem or HEMCO simulation again. If the error does not occur, then uncomment some more code and run again. Repeat the process until you find the location of the error. The brute force method may be tedious, but it will usually lead you to the source of the problem.
Identify poorly-performing code with a profiler
If you think your GEOS-Chem or HEMCO simulation is taking too long to run, consider using profiling tools to generate a list of the time that is spent in each routine. This can help you identify badly written and/or poorly-parallelized code. For more information, please see our Profiling GEOS-Chem wiki page.
Manage a data archive with bashdatacatalog
If you need to download a large amount of input data for GEOS-Chem or HEMCO (e.g. in support of a large user group at your institution) you may find bashdatacatalog helpful.
What is bashdatacatalog?
The bashdatacatalog is a command-line tool (written by Liam Bindle) that facilitates synchronizing local data collections with a remote data source. With the bashdatacatalog, you can run queries on your local data collections to answer questions like “What files am I missing?” or “What files aren’t bitwise identical to remote data?”. Queries can include a date range, in which case collections with temporal assets are filtered-out accordingly. The bashdatacatalog can format the results of queries as: a URL download list, a Globus transfer list, an rsync transfer list, or simply a file list.
The bashdatacatalog was written to facilitate downloading input data for users of the GEOS-Chem atmospheric chemistry model. The canonical GEOS-Chem input data repository has >1 M files and >100 TB of data, and the input data required for a simulation depends on the model version and simulation parameters such as start and end date.
Usage instructions
For detailed instructions on using bashdatacatalog, please see the bashdatacatalog wiki on GitHub.
Also see our input-data-catalogs GitHub repository for comma-separated input lists of GEOS-Chem data, separated by model version.
Work with netCDF files
On this page we provide some useful information about working with data files in netCDF format.
Useful tools
There are many free and open-source software packages readily available for visualizing and manipulating netCDF files.
- cdo
Climate Data Operators: Highly-optimized command-line tools for manipulating and analyzing netCDF files. Contains features that are especially useful for Earth Science applications.
- GCPy
GEOS-Chem Python toolkit: Python package for visualizing and analyzing GEOS-Chem output. Used for creating the GEOS-Chem benchmark plots. Also contains some useful routines for creating single-panel plots and multi-panel difference plots, as well as file regridding utilities.
- ncdump
Generates a text representation of netCDF data and can be used to quickly view the variables contained in a netCDF file. ncdump is installed to the bin/ folder of your netCDF library distribution. See: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/utilities/Ncdump.html
- nco
netCDF operators: Highly-optimized command-line tools for manipulating and analyzing netCDF files.
- ncview
Visualization package for netCDF files. Ncview has limited features, but is great for a quick look at the contents of netCDF files.
- netcdf-scripts
Our repository of useful netCDF utility scripts for GEOS-Chem.
- Panoply
Java-based data viewer for netCDF files. This package offers an alternative to ncview. From our experience, Panoply works nicely when installed on the desktop, but is slow to respond in the Linux environment.
- xarray
Python package that lets you read the contents of a netCDF file into a data structure. The data can then be further manipulated or converted to numpy or dask arrays for further processing.
Some of the tools listed above, such as ncdump and ncview, may come pre-installed on your system. Others may need to be installed or loaded (e.g. via the module load command). Check with your system administrator or IT staff to see what is available on your system.
Examine the contents of a netCDF file
An easy way to examine the contents of a netCDF file is to use ncdump as follows:
$ ncdump -ct GEOSChem.SpeciesConc.20190701_0000z.nc4
You will see output similar to this:
netcdf GEOSChem.SpeciesConc.20190701_0000z {
dimensions:
time = UNLIMITED ; // (1 currently)
lev = 72 ;
ilev = 73 ;
lat = 46 ;
lon = 72 ;
nb = 2 ;
variables:
double time(time) ;
time:long_name = "Time" ;
time:units = "minutes since 2019-07-01 00:00:00" ;
time:calendar = "gregorian" ;
time:axis = "T" ;
double lev(lev) ;
lev:long_name = "hybrid level at midpoints ((A/P0)+B)" ;
lev:units = "level" ;
lev:axis = "Z" ;
lev:positive = "up" ;
lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
lev:formula_terms = "a: hyam b: hybm p0: P0 ps: PS" ;
double ilev(ilev) ;
ilev:long_name = "hybrid level at interfaces ((A/P0)+B)" ;
ilev:units = "level" ;
ilev:positive = "up" ;
ilev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
ilev:formula_terms = "a: hyai b: hybi p0: P0 ps: PS" ;
double lat_bnds(lat, nb) ;
lat_bnds:long_name = "Latitude bounds (CF-compliant)" ;
lat_bnds:units = "degrees_north" ;
double lat(lat) ;
lat:long_name = "Latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
lat:bounds = "lat_bnds" ;
double lon_bnds(lon, nb) ;
lon_bnds:long_name = "Longitude bounds (CF-compliant)" ;
lon_bnds:units = "degrees_east" ;
double lon(lon) ;
lon:long_name = "Longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
lon:bounds = "lon_bnds" ;
double hyam(lev) ;
hyam:long_name = "hybrid A coefficient at layer midpoints" ;
hyam:units = "hPa" ;
double hybm(lev) ;
hybm:long_name = "hybrid B coefficient at layer midpoints" ;
hybm:units = "1" ;
double hyai(ilev) ;
hyai:long_name = "hybrid A coefficient at layer interfaces" ;
hyai:units = "hPa" ;
double hybi(ilev) ;
hybi:long_name = "hybrid B coefficient at layer interfaces" ;
hybi:units = "1" ;
double P0 ;
P0:long_name = "reference pressure" ;
P0:units = "hPa" ;
float AREA(lat, lon) ;
AREA:long_name = "Surface area" ;
AREA:units = "m2" ;
float SpeciesConcVV_RCOOH(time, lev, lat, lon) ;
SpeciesConcVV_RCOOH:long_name = "Dry mixing ratio of species RCOOH" ;
SpeciesConcVV_RCOOH:units = "mol mol-1 dry" ;
SpeciesConcVV_RCOOH:averaging_method = "time-averaged" ;
float SpeciesConcVV_O2(time, lev, lat, lon) ;
SpeciesConcVV_O2:long_name = "Dry mixing ratio of species O2" ;
SpeciesConcVV_O2:units = "mol mol-1 dry" ;
SpeciesConcVV_O2:averaging_method = "time-averaged" ;
float SpeciesConcVV_N2(time, lev, lat, lon) ;
SpeciesConcVV_N2:long_name = "Dry mixing ratio of species N2" ;
SpeciesConcVV_N2:units = "mol mol-1 dry" ;
SpeciesConcVV_N2:averaging_method = "time-averaged" ;
float SpeciesConcVV_H2(time, lev, lat, lon) ;
SpeciesConcVV_H2:long_name = "Dry mixing ratio of species H2" ;
SpeciesConcVV_H2:units = "mol mol-1 dry" ;
SpeciesConcVV_H2:averaging_method = "time-averaged" ;
float SpeciesConcVV_O(time, lev, lat, lon) ;
SpeciesConcVV_O:long_name = "Dry mixing ratio of species O" ;
SpeciesConcVV_O:units = "mol mol-1 dry" ;
... etc ...
// global attributes:
:title = "GEOS-Chem diagnostic collection: SpeciesConc" ;
:history = "" ;
:format = "not found" ;
:conventions = "COARDS" ;
:ProdDateTime = "" ;
:reference = "www.geos-chem.org; wiki.geos-chem.org" ;
:contact = "GEOS-Chem Support Team (geos-chem-support@g.harvard.edu)" ;
:simulation_start_date_and_time = "2019-07-01 00:00:00z" ;
:simulation_end_date_and_time = "2019-07-01 01:00:00z" ;
data:
time = "2019-07-01 00:30" ;
lev = 0.99250002413, 0.97749990013, 0.962499776, 0.947499955, 0.93250006,
0.91749991, 0.90249991, 0.88749996, 0.87249996, 0.85750006, 0.842500125,
0.82750016, 0.8100002, 0.78750002, 0.762499965, 0.737500105, 0.7125001,
0.6875001, 0.65625015, 0.6187502, 0.58125015, 0.5437501, 0.5062501,
0.4687501, 0.4312501, 0.3937501, 0.3562501, 0.31279158, 0.26647905,
0.2265135325, 0.192541016587707, 0.163661504087706, 0.139115, 0.11825,
0.10051436, 0.085439015, 0.07255786, 0.06149566, 0.05201591, 0.04390966,
0.03699271, 0.03108891, 0.02604911, 0.021761005, 0.01812435, 0.01505025,
0.01246015, 0.010284921, 0.008456392, 0.0069183215, 0.005631801,
0.004561686, 0.003676501, 0.002948321, 0.0023525905, 0.00186788,
0.00147565, 0.001159975, 0.00090728705, 0.0007059566, 0.0005462926,
0.0004204236, 0.0003217836, 0.00024493755, 0.000185422, 0.000139599,
0.00010452401, 7.7672515e-05, 5.679251e-05, 4.0142505e-05, 2.635e-05,
1.5e-05 ;
ilev = 1, 0.98500004826, 0.969999752, 0.9549998, 0.94000011, 0.92500001,
0.90999981, 0.89500001, 0.87999991, 0.86500001, 0.85000011, 0.83500014,
0.82000018, 0.80000022, 0.77499982, 0.75000011, 0.7250001, 0.7000001,
0.6750001, 0.6375002, 0.6000002, 0.5625001, 0.5250001, 0.4875001,
0.4500001, 0.4125001, 0.3750001, 0.3375001, 0.28808306, 0.24487504,
0.208152025, 0.176930008175413, 0.150393, 0.127837, 0.108663, 0.09236572,
0.07851231, 0.06660341, 0.05638791, 0.04764391, 0.04017541, 0.03381001,
0.02836781, 0.02373041, 0.0197916, 0.0164571, 0.0136434, 0.0112769,
0.009292942, 0.007619842, 0.006216801, 0.005046801, 0.004076571,
0.003276431, 0.002620211, 0.00208497, 0.00165079, 0.00130051, 0.00101944,
0.0007951341, 0.0006167791, 0.0004758061, 0.0003650411, 0.0002785261,
0.000211349, 0.000159495, 0.000119703, 8.934502e-05, 6.600001e-05,
4.758501e-05, 3.27e-05, 2e-05, 1e-05 ;
lat = -89, -86, -82, -78, -74, -70, -66, -62, -58, -54, -50, -46, -42, -38,
-34, -30, -26, -22, -18, -14, -10, -6, -2, 2, 6, 10, 14, 18, 22, 26, 30,
34, 38, 42, 46, 50, 54, 58, 62, 66, 70, 74, 78, 82, 86, 89 ;
lon = -180, -175, -170, -165, -160, -155, -150, -145, -140, -135, -130,
-125, -120, -115, -110, -105, -100, -95, -90, -85, -80, -75, -70, -65,
-60, -55, -50, -45, -40, -35, -30, -25, -20, -15, -10, -5, 0, 5, 10, 15,
20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105,
110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175 ;
}
You can also use ncdump to display the data values for a given variable in the netCDF file. This command will display the values in the SpeciesConc_O3 variable to the screen:
$ ncdump -v SpeciesConc_O3 GEOSChem.SpeciesConc.20190701_0000z.nc4 | less
Or you can redirect the output to a file:
$ ncdump -v SpeciesConc_O3 GEOSChem.SpeciesConc.20190701_0000z.nc4 > log
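If you have saved the header text from ncdump to a file, a short script can list the variables and their dimensions without scrolling through the attributes. The sketch below is illustrative only: it parses the text layout shown in the sample output above, the function name list_variables is hypothetical, and it does not read the netCDF file itself.

```python
import re

# A variable declaration looks like:  float NAME(time, lev, lat, lon) ;
# Scalars such as "double P0 ;" have no dimension list.
VAR_DECL = re.compile(
    r"^\s*(byte|ubyte|char|short|ushort|int|uint|int64|uint64|float|double|string)"
    r"\s+(\w+)\s*(?:\(([^)]*)\))?\s*;"
)


def list_variables(header_text):
    """Return {variable name: tuple of dimension names} from ncdump header text."""
    variables = {}
    for line in header_text.splitlines():
        match = VAR_DECL.match(line)
        if match:
            dims = match.group(3)
            names = tuple(d.strip() for d in dims.split(",")) if dims else ()
            variables[match.group(2)] = names
    return variables


if __name__ == "__main__":
    sample = """variables:
double time(time) ;
time:long_name = "Time" ;
double P0 ;
float AREA(lat, lon) ;
"""
    for name, dims in list_variables(sample).items():
        print(name, dims)
```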
Read the contents of a netCDF file
Read data with Python
The easiest way to read a netCDF file is to use the xarray Python package.
#!/usr/bin/env python
# Imports
import numpy as np
import xarray as xr
# Read a diagnostic file into an xarray Dataset object
ds = xr.open_dataset("GEOSChem.SpeciesConc.20190701_0000z.nc4")
# Print the contents of the Dataset
print(ds)
# Print units of data
print(f"\nUnits of SpeciesConc_O3: {ds['SpeciesConc_O3'].units}")
# Print the sum, max, and min of the data
# NOTE: .values returns a numpy ndarray so that we can use
# other numpy functions like np.sum() on the data
print(f"Sum of SpeciesConc_O3: {np.sum(ds['SpeciesConc_O3'].values)}")
print(f"Max of SpeciesConc_O3: {np.max(ds['SpeciesConc_O3'].values)}")
print(f"Min of SpeciesConc_O3: {np.min(ds['SpeciesConc_O3'].values)}")
The above script will print the following output:
<xarray.Dataset>
Dimensions: (ilev: 73, lat: 46, lev: 72, lon: 72, nb: 2, time: 1)
Coordinates:
* time (time) datetime64[ns] 2019-07-01T00:30:00
* lev (lev) float64 0.9925 0.9775 ... 2.635e-05 1.5e-05
* ilev (ilev) float64 1.0 0.985 0.97 ... 3.27e-05 2e-05 1e-05
* lat (lat) float64 -89.0 -86.0 -82.0 ... 82.0 86.0 89.0
* lon (lon) float64 -180.0 -175.0 -170.0 ... 170.0 175.0
Dimensions without coordinates: nb
Data variables: (12/315)
lat_bnds (lat, nb) float64 ...
lon_bnds (lon, nb) float64 ...
hyam (lev) float64 ...
hybm (lev) float64 ...
hyai (ilev) float64 ...
hybi (ilev) float64 ...
... ...
SpeciesConc_AONITA (time, lev, lat, lon) float32 ...
SpeciesConc_ALK4 (time, lev, lat, lon) float32 ...
SpeciesConc_ALD2 (time, lev, lat, lon) float32 ...
SpeciesConc_AERI (time, lev, lat, lon) float32 ...
SpeciesConc_ACTA (time, lev, lat, lon) float32 ...
SpeciesConc_ACET (time, lev, lat, lon) float32 ...
Attributes:
title: GEOS-Chem diagnostic collection: Species...
history:
format: not found
conventions: COARDS
ProdDateTime:
reference: www.geos-chem.org; wiki.geos-chem.org
contact: GEOS-Chem Support Team (geos-chem-suppor...
simulation_start_date_and_time: 2019-07-01 00:00:00z
simulation_end_date_and_time: 2019-07-01 01:00:00z
Units of SpeciesConc_O3: mol mol-1 dry
Sum of SpeciesConc_O3: 0.4052325189113617
Max of SpeciesConc_O3: 1.01212954177754e-05
Min of SpeciesConc_O3: 3.758645839013752e-09
Read data from multiple files in Python
The xarray package will also let you read data from multiple files into a single Dataset object. This is done with the open_mfdataset (open multi-file-dataset) function as shown below:
#!/usr/bin/env python
# Imports
import xarray as xr
# Create a list of files to open
filelist = [
'GEOSChem.SpeciesConc.20160101_0000z.nc4',
'GEOSChem.SpeciesConc.20160201_0000z.nc4',
...
]
# Read all of the files into a single xarray Dataset object
ds = xr.open_mfdataset(filelist)
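For longer periods it is less error-prone to generate the file list than to type it by hand. The sketch below assumes one SpeciesConc file per month, named with the pattern used in the example above; the function name monthly_file_list and the template argument are hypothetical, so adjust the template to match the files in your own OutputDir.

```python
from datetime import date


def monthly_file_list(start_year, start_month, n_months,
                      template="GEOSChem.SpeciesConc.{:%Y%m%d}_0000z.nc4"):
    """Return n_months file names, one for the first day of each month."""
    files = []
    year, month = start_year, start_month
    for _ in range(n_months):
        # date objects accept strftime codes inside a format spec
        files.append(template.format(date(year, month, 1)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return files


if __name__ == "__main__":
    for name in monthly_file_list(2016, 1, 3):
        print(name)
```

The resulting list can be passed to xr.open_mfdataset in place of a hand-written one.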
Determining if a netCDF file is COARDS-compliant
All netCDF files used as input to GEOS-Chem and/or HEMCO must adhere to the COARDS netCDF conventions. You can use the isCoards script (from our netcdf-scripts repository at GitHub) to determine if a netCDF file adheres to the COARDS conventions.
Run the isCoards script at the command line on any netCDF file, and you will receive a report as to which elements of the file do not comply with the COARDS conventions.
$ isCoards myfile.nc
===========================================================================
Filename: myfile.nc
===========================================================================
The following items adhere to the COARDS standard:
---------------------------------------------------------------------------
-> Dimension "time" adheres to standard usage
-> Dimension "lev" adheres to standard usage
-> Dimension "lat" adheres to standard usage
-> Dimension "lon" adheres to standard usage
-> time(time)
-> time is monotonically increasing
-> time:axis = "T"
-> time:calendar = "gregorian"
-> time:long_name = "Time"
-> time:units = "hours since 1985-1-1 00:00:0.0"
-> lev(lev)
-> lev is monotonically decreasing
-> lev:axis = "Z"
-> lev:positive = "up"
-> lev:long_name = "GEOS-Chem levels"
-> lev:units = "sigma_level"
-> lat(lat)
-> lat is monotonically increasing
-> lat:axis = "Y"
-> lat:long_name = "Latitude"
-> lat:units = "degrees_north"
-> lon(lon)
-> lon is monotonically increasing
-> lon:axis = "X"
-> lon:long_name = "Longitude"
-> lon:units = "degrees_east"
-> OH(time,lev,lat,lon)
-> OH:long_name = "Chemically produced OH"
-> OH:units = "kg/m3"
-> OH:_FillValue = 1.e+30f
-> OH:missing_value = 1.e+30f
-> conventions: "COARDS"
-> history: "Mon Apr 3 08:26:19 2017"
-> title: "COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)"
-> format: "NetCDF-3"
The following items DO NOT ADHERE to the COARDS standard:
---------------------------------------------------------------------------
-> time[0] != 0 (problem for GCHP)
The following optional items are RECOMMENDED:
---------------------------------------------------------------------------
-> Consider adding the "references" global attribute
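To make the report easier to interpret, here is a sketch of the kind of per-axis checks it lists: monotonic coordinate values, the axis attribute, and the presence of long_name and units. This is not the isCoards implementation; the function names are hypothetical, and the coordinate values and attributes are passed in as plain Python lists and dicts.

```python
def is_monotonic(values, increasing=True):
    """True if values are strictly increasing (or strictly decreasing)."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def check_axis(name, values, attrs, expected_axis, increasing=True):
    """Return a list of problems found for one coordinate variable."""
    problems = []
    if not is_monotonic(values, increasing):
        direction = "increasing" if increasing else "decreasing"
        problems.append(f"{name} is not monotonically {direction}")
    if attrs.get("axis") != expected_axis:
        problems.append(f'{name}:axis should be "{expected_axis}"')
    for required in ("long_name", "units"):
        if required not in attrs:
            problems.append(f"{name}:{required} is missing")
    return problems


if __name__ == "__main__":
    lat_attrs = {"axis": "Y", "long_name": "Latitude", "units": "degrees_north"}
    print(check_axis("lat", [-89, -86, -82], lat_attrs, "Y"))
```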
Edit variables and attributes
As discussed in the preceding section, you may find that you need to edit your netCDF files for COARDS-compliance. Below are several useful commands for editing netCDF files. Many of these commands utilize the nco and cdo utilities.
Display the header and coordinate variables of a netCDF file, with the time variable displayed in human-readable format. Also show status of file compression and/or chunking.
$ ncdump -cts file.nc
Compress a netCDF file. This can considerably reduce the file size!
# No deflation
$ nccopy -d0 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
# Minimum deflation (good for most applications)
$ nccopy -d1 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
# Medium deflation
$ nccopy -d5 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
# Maximum deflation
$ nccopy -d9 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Change variable name from SpeciesConc_NO to NO:
$ ncrename -v SpeciesConc_NO,NO myfile.nc
Set all missing values to zero:
$ cdo setmisstoc,0 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Add/change the
long_name
attribute of the vertical coordinate (lev
) toGEOS-Chem levels
. This will ensure that HEMCO recognizes the vertical levels of the input file as GEOS-Chem model levels.$ ncatted -a long_name,lev,o,c,"GEOS-Chem levels" myfile.nc
Add/change the
axis
andpositive
attributes of the vertical coordinate (lev
):$ ncatted -a axis,lev,o,c,"Z" myfile.nc $ ncatted -a positive,lev,o,c,"up" myfile.nc
Add/change the units attribute of the latitude (lat) coordinate to degrees_north:
$ ncatted -a units,lat,o,c,"degrees_north" myfile.nc
Convert the CHLA variable from mg/m3 to kg/m3 and update its units attribute to match:
$ ncap2 -v -s "CHLA=CHLA/1000000.0f" myfile.nc tmp.nc
$ ncatted -a units,CHLA,o,c,"kg/m3" tmp.nc
$ mv tmp.nc myfile.nc
Add/change the references, title, and history global attributes:
$ ncatted -a references,global,o,c,"www.geos-chem.org; wiki.geos-chem.org" myfile.nc
$ ncatted -a history,global,o,c,"Tue Mar 3 12:18:38 EST 2015" myfile.nc
$ ncatted -a title,global,o,c,"XYZ data from ABC source" myfile.nc
Remove the references global attribute:
$ ncatted -a references,global,d,, myfile.nc
Add a time dimension to a file that does not have one:
$ ncap2 -h -s 'defdim("time",1);time[time]=0.0;time@long_name="time";time@calendar="standard";time@units="days since 2007-01-01 00:00:00"' -O myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Add a time dimension to a variable:
# Assume myVar has lat and lon dimensions to start with
$ ncap2 -h -s 'myVar[$time,$lat,$lon]=myVar;' myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Make the time dimension unlimited:
$ ncks --mk_rec_dmn time myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Change the file reference date and time (i.e. time:units) from 1 Jan 1985 to 1 Jan 2000:
$ cdo setreftime,2000-01-01,00:00:00 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Shift all time values ahead or back by 1 hour in a file:
# Shift ahead 1 hour
$ cdo shifttime,1hour myfile.nc tmp.nc
$ mv tmp.nc myfile.nc

# Shift back 1 hour
$ cdo shifttime,-1hour myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Set the date of all variables in the file. (Useful for files that have only one time point.)
$ cdo setdate,2019-07-02 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Tip
The following cdo commands are similar to cdo setdate, but allow you to manipulate other time variables:
$ cdo settime,03:00:00 ...   # Sets time to 03:00 UTC
$ cdo setday,26 ...          # Sets day of month to 26
$ cdo setmon,10 ...          # Sets month to 10 (October)
$ cdo setyear,1992 ...       # Sets year to 1992
See the cdo user manual for more information.
Change the time:calendar attribute:
GEOS-Chem and HEMCO cannot read data from netCDF files where:
time:calendar = "360_day"
time:calendar = "365_day"
time:calendar = "noleap"
We recommend converting the calendar used in the netCDF file to the standard netCDF calendar with these commands:
$ cdo setcalendar,standard myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Change the type of the time coordinate from int to double:
$ ncap2 -s 'time=double(time)' myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
Concatenate netCDF files
There are a couple of ways to concatenate multiple netCDF files into a single netCDF file, as shown in the sections below.
Concatenate with the netCDF operators
You can use the ncrcat utility (from nco) to concatenate the individual netCDF files into a single netCDF file.
Let’s assume we want to combine 12 monthly data files (e.g. month_01.nc, month_02.nc, ..., month_12.nc) into a single file called annual_data.nc.
First, make sure that each of the month_*.nc files has an unlimited time dimension. Type this at the command line:
$ ncdump -ct month_01.nc | grep "time"
Then you should see this as the first line in the output:
time = UNLIMITED ; // (1 currently)
This indicates that the time dimension is unlimited. If on the other hand you see this output:
time = 1 ;
Then it means that the time dimension is fixed. If this is the case, you will have to use the ncks command to make the time dimension unlimited, as follows:
$ ncks --mk_rec_dmn time month_01.nc tmp.nc
$ mv tmp.nc month_01.nc
... etc for the other files ...
Then use ncrcat to combine the monthly data along the time dimension, and save the result to a single netCDF file:
$ ncrcat -hO month_*nc annual_data.nc
You may then discard the month_*.nc files if so desired.
Concatenate with Python
You can use the xarray Python package to create a single netCDF file from multiple files. Click HERE to view a sample Python script that does this.
Regrid netCDF files
The following tools can be used to regrid netCDF data files (such as GEOS-Chem restart files and GEOS-Chem diagnostic files).
Regrid with cdo
cdo includes several tools for regridding netCDF files. For example:
# Apply conservative regridding
$ cdo remapcon,gridfile infile.nc outfile.nc
For gridfile, you can use the files here. Also see this reference.
Issue with cdo remapdis regridding tool
GEOS-Chem user Bram Maasakkers wrote:
I have noticed a problem regridding GEOS-Chem diagnostic file to 2x2.5 using cdo version 1.9.4. When I use:
$ cdo remapdis,geos.2x25.grid GEOSChem.Restart.4x5.nc GEOSChem.Restart.2x25.nc
The last latitudinal band (-89.5) remains empty and gets filled with the standard missing value of cdo, which is really large. This leads to immediate problems in the methane simulation as enormous concentrations enter the domain from the South Pole. For now I’ve solved this problem by just using bicubic interpolation:
$ cdo remapbic,geos.2x25.grid GEOSChem.Restart.4x5.nc GEOSChem.Restart.2x25.nc
You can also use conservative regridding:
$ cdo remapcon,geos.2x25.grid GEOSChem.Restart.4x5.nc GEOSChem.Restart.2x25.nc
Regrid with GCPy
GCPy (the GEOS-Chem Python Toolkit) contains file regridding utilities that allow you to regrid from lat/lon to cubed-sphere grids (and vice versa). Regridding weights can be generated on-the-fly, or can be archived and reused. For detailed instructions, please see the GCPy Regridding documentation.
Regrid with nco
nco also includes several regridding utilities. See the Regridding section of the NCO User Guide for more information.
Regrid with xarray
The xarray Python package has a built-in capability for 1-D interpolation. It wraps the SciPy interpolation module. This functionality can also be used for vertical regridding.
Regrid with xESMF
xESMF is a universal regridding tool for geospatial data, which is written in Python. It can be used to regrid data not only on cartesian grids, but also on cubed-sphere and unstructured grids.
Note
xESMF only handles horizontal regridding.
Crop netCDF files
If needed, a netCDF file can be cropped to a subset of the globe with the nco or cdo utilities (cf. Useful tools).
For example, cdo has a sellonlatbox operator for selecting a box by specifying the lat/lon bounds:
$ cdo sellonlatbox,lon1,lon2,lat1,lat2 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
See the cdo guide for more information.
Add a new variable to a netCDF file
You have a couple of options for adding a new variable to a netCDF file (for example, when having to add a new species to an existing GEOS-Chem restart file).
You can use cdo and nco utilities to copy the data from one variable to another variable. For example:
#!/bin/bash

# Extract field SpeciesRst_PMN from the original restart file
cdo selvar,SpeciesRst_PMN initial_GEOSChem_rst.4x5_standard.nc NPMN.nc4

# Rename selected field to SpeciesRst_NPMN
ncrename -h -v SpeciesRst_PMN,SpeciesRst_NPMN NPMN.nc4

# Append new species to existing restart file
ncks -h -A -M NPMN.nc4 initial_GEOSChem_rst.4x5_standard.nc
Sal Farina wrote a simple Python script for adding a new species to a netCDF restart file:
#!/usr/bin/env python

import sys
import netCDF4 as nc

for nam in sys.argv[1:]:
    f = nc.Dataset(nam, mode='a')

    # Use SpeciesRst_OCPI as the template for the new variable
    try:
        o = f['SpeciesRst_OCPI']
    except (IndexError, KeyError):
        print("SpeciesRst_OCPI not defined in " + nam)
        f.close()
        continue

    # Create SpeciesRst_SOAP with the same type, dimensions, and fill value
    f.createVariable('SpeciesRst_SOAP', o.datatype,
                     dimensions=o.dimensions, fill_value=o._FillValue)
    soap = f['SpeciesRst_SOAP']
    soap[:] = 0.0
    soap.long_name = 'SOAP species'
    soap.units = o.units
    soap.add_offset = 0.0
    soap.scale_factor = 1.0
    soap.missing_value = 1.0e30
    f.close()
Bob Yantosca wrote this Python script to insert a fake species into GEOS-Chem Classic and GCHP restart files (13.3.0)
#!/usr/bin/env python
"""
Adds an extra DataArray for a new species into restart files.

Calling sequence:
    ./append_species_into_restart.py
"""

# Imports
import gcpy.constants as gcon
import xarray as xr
from xarray.coding.variables import SerializationWarning
import warnings

# Suppress harmless run-time warnings (mostly about underflow or NaNs)
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=SerializationWarning)


def main():
    """
    Appends extra species to restart files.
    """

    # Data vars to skip
    skip_vars = gcon.skip_these_vars

    # List of restart files
    file_list = [
        'GEOSChem.Restart.fullchem.20190101_0000z.nc4',
        'GEOSChem.Restart.fullchem.20190701_0000z.nc4',
        'GEOSChem.Restart.TOMAS15.20190701_0000z.nc4',
        'GEOSChem.Restart.TOMAS40.20190701_0000z.nc4',
        'GCHP.Restart.fullchem.20190101_0000z.c180.nc4',
        'GCHP.Restart.fullchem.20190101_0000z.c24.nc4',
        'GCHP.Restart.fullchem.20190101_0000z.c360.nc4',
        'GCHP.Restart.fullchem.20190101_0000z.c48.nc4',
        'GCHP.Restart.fullchem.20190101_0000z.c90.nc4',
        'GCHP.Restart.fullchem.20190701_0000z.c180.nc4',
        'GCHP.Restart.fullchem.20190701_0000z.c24.nc4',
        'GCHP.Restart.fullchem.20190701_0000z.c360.nc4',
        'GCHP.Restart.fullchem.20190701_0000z.c48.nc4',
        'GCHP.Restart.fullchem.20190701_0000z.c90.nc4'
    ]

    # Keep all netCDF attributes
    with xr.set_options(keep_attrs=True):

        # Loop over files
        for f in file_list:

            # Input and output files
            infile = '../' + f
            outfile = f

            print("Creating " + outfile)

            # Open input file
            ds = xr.open_dataset(infile, drop_variables=skip_vars)

            # Create a new DataArray from a given species (EDIT ACCORDINGLY)
            if "GCHP" in infile:
                dr = ds["SPC_ETO"]
                dr.name = "SPC_ETOO"
            else:
                dr = ds["SpeciesRst_ETO"]
                dr.name = "SpeciesRst_ETOO"

            # Update attributes (EDIT ACCORDINGLY)
            dr.attrs["FullName"] = "peroxy radical from ethene"
            dr.attrs["Is_Gas"] = "true"
            dr.attrs["long_name"] = "Dry mixing ratio of species ETOO"
            dr.attrs["MW_g"] = 77.06

            # Merge the new DataArray into the Dataset
            ds = xr.merge([ds, dr], compat="override")

            # Create a new file
            ds.to_netcdf(outfile)

            # Free memory by setting ds to a null dataset
            ds = xr.Dataset()


if __name__ == "__main__":
    main()
Chunk and deflate a netCDF file to improve I/O
We recommend that you chunk the data in your netCDF file. Chunking specifies the order in which the data will be read from disk. The Unidata web site has a good overview of why chunking a netCDF file matters.
For GEOS-Chem with the high-performance option (aka GCHP), the best file I/O performance occurs when the file is split into one chunk per level (assuming your data has a lev dimension). This allows each individual vertical level of data to be read in parallel.
You can use the nccopy command (which ships with the netCDF library utilities) to do the chunking. For example, say you have a netCDF file called myfile.nc with these dimensions:
dimensions:
time = UNLIMITED ; // (12 currently)
lev = 72 ;
lat = 181 ;
lon = 360 ;
Then you can use the nccopy command to apply the optimal chunking along levels:
$ nccopy -c lon/360,lat/181,lev/1,time/1 -d1 myfile.nc tmp.nc
$ mv tmp.nc myfile.nc
This will create a new file called tmp.nc that has the proper chunking. We then replace myfile.nc with this temporary file.
You can specify the chunk sizes that will be applied to the variables in the netCDF file with the -c argument to nccopy. To obtain the optimal chunking, the lon chunksize must be identical to the number of values along the longitude dimension (e.g. lon/360) and the lat chunksize must be equal to the number of points in the latitude dimension (e.g. lat/181).
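The chunking rule above (full extent along lon and lat, one chunk per level and per time slice) can be sketched as a small helper that builds the value for the -c argument. This is only an illustration: the function name nccopy_chunk_spec and its one_per_chunk parameter are made up for this example and are not part of nccopy or of any GEOS-Chem tool.

```python
def nccopy_chunk_spec(dims, one_per_chunk=("lev", "time")):
    """Build the value of nccopy's -c option from a {name: size} mapping.

    Dimensions listed in one_per_chunk get a chunk size of 1; every other
    dimension is chunked over its full extent.
    """
    parts = []
    for name, size in dims.items():
        chunk = 1 if name in one_per_chunk else size
        parts.append(f"{name}/{chunk}")
    return ",".join(parts)


# Dimension sizes taken from the example file described above
dims = {"lon": 360, "lat": 181, "lev": 72, "time": 12}
spec = nccopy_chunk_spec(dims)

# Print the command line rather than running it
print(f"nccopy -c {spec} -d1 myfile.nc tmp.nc")
```

For the example dimensions this prints the same nccopy command shown earlier in this section.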
We also recommend that you deflate (i.e. compress) the netCDF data variables at the same time you apply the chunking. Deflating can substantially reduce the file size, especially for emissions data that are only defined over the land but not over the oceans. You can deflate the data in a netCDF file by specifying the -d argument to nccopy. There are 10 possible deflation levels, ranging from 0 (no deflation) to 9 (max deflation). For most purposes, a deflation level of 1 (-d1) is sufficient.
The GEOS-Chem Support Team has created a Perl script named nc_chunk.pl (contained in the netcdf-scripts repository at GitHub) that will automatically chunk and compress data for you.
$ nc_chunk.pl myfile.nc # Chunk netCDF file
$ nc_chunk.pl myfile.nc 1 # Chunk and compress file using deflate level 1
You can use the ncdump -cts myfile.nc command to view the chunk size and deflation level in the file. After applying the chunking and compression to myfile.nc, you would see output such as this:
dimensions:
time = UNLIMITED ; // (12 currently)
lev = 72 ;
lat = 181 ;
lon = 360 ;
variables:
float PRPE(time, lev, lat, lon) ;
PRPE:long_name = "Propene" ;
PRPE:units = "kgC/m2/s" ;
PRPE:add_offset = 0.f ;
PRPE:scale_factor = 1.f ;
PRPE:_FillValue = 1.e+15f ;
PRPE:missing_value = 1.e+15f ;
PRPE:gamap_category = "ANTHSRCE" ;
PRPE:_Storage = "chunked" ;
PRPE:_ChunkSizes = 1, 1, 181, 360 ;
PRPE:_DeflateLevel = 1 ;
PRPE:_Endianness = "little" ;
float CO(time, lev, lat, lon) ;
CO:long_name = "CO" ;
CO:units = "kg/m2/s" ;
CO:add_offset = 0.f ;
CO:scale_factor = 1.f ;
CO:_FillValue = 1.e+15f ;
CO:missing_value = 1.e+15f ;
CO:gamap_category = "ANTHSRCE" ;
CO:_Storage = "chunked" ;
CO:_ChunkSizes = 1, 1, 181, 360 ;
CO:_DeflateLevel = 1 ;
CO:_Endianness = "little" ;
The attributes that begin with a _ character are “hidden” netCDF attributes. They represent file properties instead of user-defined properties (like the long name, units, etc.). The “hidden” attributes can be shown by adding the -s argument to ncdump.
Prepare COARDS-compliant netCDF files
On this page we discuss how you can generate netCDF data files in the proper format for HEMCO and GEOS-Chem.
The COARDS netCDF standard
The Harmonized Emissions Component (HEMCO) reads data stored in the netCDF file format, which is a common data format used in atmospheric and climate sciences. NetCDF files contain data arrays as well as metadata, which is a description of the data.
Several netCDF conventions have been developed in order to facilitate data exchange and visualization. The Cooperative Ocean Atmosphere Research Data Service (COARDS) standard defines regular conventions for naming dimensions as well as the attributes describing the data. You will find more information about these conventions in the sections below. HEMCO requires its input data to adhere to the COARDS standard.
Our “Work with netCDF files” supplemental guide contains detailed instructions on how you can check a netCDF file for COARDS compliance.
COARDS dimensions
The dimensions of a netCDF file define how many grid boxes there are along a given direction. While the COARDS standard does not require any specific names for dimensions, accepted practice is to use these names for rectilinear grids:
- time
Specifies the number of points along the time (T) axis.
The time dimension must always be specified. When you create the netCDF file, you may declare time to be UNLIMITED and then later define its size. This allows you to append further time points into the file later on.
- lev
Specifies the number of points along the vertical level (Z) axis.
This dimension may be omitted if none of the data arrays in the netCDF file have a vertical dimension.
- lat
Specifies the number of points along the latitude (Y) axis.
- lon
Specifies the number of points along the longitude (X) axis.
COARDS coordinate vectors
Coordinate vectors (aka index variables or axis variables) are 1-dimensional arrays that define the values along each axis.
The only COARDS requirements for coordinate vectors are these:
Each coordinate vector must be given the same name as the dimension that is used to define it.
All of the values contained within a coordinate vector must be either monotonically increasing or monotonically decreasing.
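The monotonicity requirement can be checked with a few lines of code before a file is written. The sketch below assumes the coordinate values are already available as a plain Python sequence; the function name is hypothetical, and it tests for strictly increasing or strictly decreasing values, which is slightly stricter than "monotonic" but is usually what a coordinate vector needs.

```python
def is_strictly_monotonic(values):
    """Return True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


# Latitude centers from South Pole to North Pole: acceptable
print(is_strictly_monotonic([-90.0, -45.0, 0.0, 45.0, 90.0]))

# Longitudes that wrap around partway through: not acceptable
print(is_strictly_monotonic([0.0, 90.0, 180.0, -90.0]))
```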
time
A COARDS-compliant time coordinate vector will have these features:
dimensions
time = UNLIMITED ; // (12 currently)
. . .
variables
double time(time) ;
time:long_name = "time" ;
time:units = "hours since 2010-01-01 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T";
Note
The above was generated by the ncdump command.
As you can see, time is an 8-byte floating point (aka REAL*8) with 12 time points.
The time coordinate vector has the following attributes:
- time:long_name
A detailed description of the contents of this array. This is usually set to time or Time.
- time:units
Specifies the number of hours, minutes, seconds, etc. that have elapsed with respect to a reference datetime YYYY-MM-DD hh:mn:ss. Set this to one of the following values:
"days since YYYY-MM-DD hh:mn:ss"
"hours since YYYY-MM-DD hh:mn:ss"
"minutes since YYYY-MM-DD hh:mn:ss"
"seconds since YYYY-MM-DD hh:mn:ss"
Tip
We recommend that you choose the reference datetime to correspond to the first time value in the file (i.e. time(0) = 0).
- time:calendar
Specifies the calendar used to define the time system. Set this to one of the following values:
- gregorian
Selects the Gregorian calendar system.
- time:axis
Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to T.
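As an illustration of the time:units attribute and of the tip about making the first time value zero, the following sketch derives both the units string and the time values from a list of timestamps. The helper name time_axis is an assumption made for this example, and it uses only the Python standard library, so it assumes the standard (Gregorian) calendar.

```python
from datetime import datetime

SECONDS_PER_UNIT = {"days": 86400.0, "hours": 3600.0,
                    "minutes": 60.0, "seconds": 1.0}


def time_axis(datetimes, unit="hours"):
    """Return (units attribute, time values) with the first value equal to 0."""
    reference = datetimes[0]
    units_attr = f"{unit} since {reference:%Y-%m-%d %H:%M:%S}"
    values = [(t - reference).total_seconds() / SECONDS_PER_UNIT[unit]
              for t in datetimes]
    return units_attr, values


# Three monthly time points at the start of 2010
stamps = [datetime(2010, month, 1) for month in (1, 2, 3)]
units_attr, values = time_axis(stamps)

print(units_attr)   # hours since 2010-01-01 00:00:00
print(values)       # [0.0, 744.0, 1416.0]
```

Storing the values as floating point numbers also matches the recommendation that time be of type float or double.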
Special considerations for time vectors
We recommend that index variables (such as time) be declared with type float or double. GCHP cannot parse files that have index variables of type int.
We have noticed that netCDF files having a time:units reference datetime prior to 1900/01/01 00:00:00 may not be read properly when using HEMCO or GCHP within an ESMF environment. We therefore recommend that you use reference datetime values after 1900 whenever possible.
Weekly data must contain seven time slices in increments of one day. The first entry must represent Sunday data, regardless of the real weekday of the assigned datetime. It is possible to store weekly data for more than one time interval, in which case the first weekday (i.e. Sunday) must hold the starting date for the given set of (seven) time slices.
For instance, weekly data for every month of a year can be stored as 12 sets of 7 time slices. The reference datetime of the first entry of each set must fall on the first day of every month, and the following six entries must be increments of one day.
Currently, weekly data from netCDF files is not correctly read in an ESMF environment.
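Two of the considerations above (the type of the time variable and the reference datetime) lend themselves to an automated check. The sketch below is a hypothetical helper, not part of HEMCO or of the isCoards script; it takes the time:units string and the name of the variable type as reported by ncdump and returns a list of problems.

```python
import re
from datetime import datetime

# Pattern for "<unit> since YYYY-MM-DD hh:mn:ss"
UNITS_PATTERN = re.compile(
    r"^(days|hours|minutes|seconds) since "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$"
)


def check_time_vector(units, type_name):
    """Return a list of problems found in a time coordinate vector."""
    problems = []
    if type_name not in ("float", "double"):
        problems.append(f"time has type {type_name}; use float or double")
    match = UNITS_PATTERN.match(units)
    if match is None:
        problems.append("time:units is not '<unit> since YYYY-MM-DD hh:mn:ss'")
    else:
        reference = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
        if reference < datetime(1900, 1, 1):
            problems.append("reference datetime is earlier than 1900-01-01")
    return problems


print(check_time_vector("hours since 2010-01-01 00:00:00", "double"))
print(check_time_vector("days since 1850-01-01 00:00:00", "int"))
```

An empty list means that neither of the two issues was detected; it does not guarantee full COARDS compliance.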
lev
A COARDS-compliant lev coordinate vector will have these features:
dimensions:
lev = 72 ;
. . .
variables:
double lev(lev) ;
lev:long_name = "level" ;
lev:units = "level" ;
lev:positive = "up" ;
lev:axis = "Z" ;
Here, lev is an 8-byte floating point (aka REAL*8) with 72 levels.
The lev coordinate vector has the following attributes:
- lev:long_name
A detailed description of the contents of this array. You may set this to values such as:
"level"
"GEOS-Chem levels"
"Eta centers"
"Sigma centers"
- lev:units
(Required) Specifies the units of vertical levels. Set this to one of the following:
"levels"
"eta_level"
"sigma_level"
Important
If you set long_name to level as well, then HEMCO will be able to regrid between GEOS-Chem vertical grids.
- lev:axis
Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to Z.
- lev:positive
Specifies the direction in which the vertical dimension is indexed. Set this to one of these values:
"up" (Level 1 is the surface, and level indices increase upwards)
"down" (Level 1 is the atmosphere top, and level indices increase downwards)
For emissions and most other data sets, you can set lev:positive to "up".
Important
GCHP and the NASA GEOS-ESM use a vertical grid where lev:positive is "down".
Additional considerations for lev vectors:
When using GEOS-Chem or HEMCO in a non-ESMF environment, data is interpolated onto the simulation levels if the input data is on vertical levels other than the HEMCO model levels (see HEMCO vertical regridding).
Data on non-model levels must be on a hybrid sigma pressure coordinate system. In order to properly determine the vertical pressure levels of the input data, the file must contain the surface pressure values and the hybrid coefficients (a, b) of the coordinate system. Furthermore, the level variable must contain the attributes standard_name and formula_terms (the attribute positive is recommended but not required). A header excerpt of a valid netCDF file is shown below:
float lev(lev) ;
lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
lev:units = "level" ;
lev:positive = "down" ;
lev:formula_terms = "ap: hyam b: hybm ps: PS" ;
float hyam(nhym) ;
hyam:long_name = "hybrid A coefficient at layer midpoints" ;
hyam:units = "hPa" ;
float hybm(nhym) ;
hybm:long_name = "hybrid B coefficient at layer midpoints" ;
hybm:units = "1" ;
float time(time) ;
time:standard_name = "time" ;
time:units = "days since 2000-01-01 00:00:00" ;
time:calendar = "standard" ;
float PS(time, lat, lon) ;
PS:long_name = "surface pressure" ;
PS:units = "hPa" ;
float EMIS(time, lev, lat, lon) ;
EMIS:long_name = "emissions" ;
EMIS:units = "kg m-2 s-1" ;
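To show how the formula_terms attribute ties these variables together: for the atmosphere_hybrid_sigma_pressure_coordinate with terms ap, b, and ps, the CF conventions define the pressure at level k as p(k) = ap(k) + b(k) * ps. The sketch below applies that formula to a single grid column. The coefficient values are invented for illustration and do not correspond to any GEOS-Chem grid, and this is not how HEMCO itself is implemented.

```python
def level_pressures(ap, b, ps):
    """Pressure at each level: p(k) = ap(k) + b(k) * ps.

    ap and ps must share the same pressure units (hPa in the header above);
    b is dimensionless.
    """
    return [ap_k + b_k * ps for ap_k, b_k in zip(ap, b)]


# Invented coefficients, ordered from the top of the atmosphere downward
# (consistent with lev:positive = "down")
hyam = [100.0, 50.0, 0.0]   # hPa
hybm = [0.0, 0.5, 1.0]      # dimensionless
surface_pressure = 1000.0   # hPa

print(level_pressures(hyam, hybm, surface_pressure))   # [100.0, 550.0, 1000.0]
```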
lat
A COARDS-compliant lat coordinate vector will have these features:
dimensions:
lat = 181 ;
variables:
double lat(lat) ;
lat:long_name = "Latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
Here, lat is an 8-byte floating point (aka REAL*8) with 181 values.
The lat coordinate vector has the following attributes:
- lat:long_name
A detailed description of the contents of this array. Set this to Latitude.
- lat:units
Specifies the units of latitude. Set this to degrees_north.
- lat:axis
Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to Y.
lon
A COARDS-compliant lon coordinate vector will have these features:
dimensions:
lon = 360 ;
variables:
double lon(lon) ;
lon:long_name = "Longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
Here, lon is an 8-byte floating point (aka REAL*8) with 360 values.
The lon coordinate vector has the following attributes:
- lon:long_name
A detailed description of the contents of this array. Set this to Longitude.
- lon:units
Specifies the units of longitude. Set this to degrees_east.
- lon:axis
Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to X.
Longitudes may be represented modulo 360. For example, -180, 180, and 540 are all valid representations of the International Dateline and 0 and 360 are both valid representations of the Prime Meridian. Note, however, that the sequence of numerical longitude values stored in the netCDF file must be monotonic in a non-modulo sense.
Practical guidelines:
If your grid begins at the International Dateline (-180°), then place your longitudes into the range -180..180.
If your grid begins at the Prime Meridian (0°), then place your longitudes into the range 0..360.
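If a data set stores longitudes in the range 0..360 but needs to start at the International Dateline, the values have to be shifted and then reordered so that they stay monotonic. A minimal sketch of that step is shown below; the function name is hypothetical, and the returned index order would still have to be applied to every data array along its lon axis.

```python
def to_dateline_range(lons):
    """Map longitudes onto -180..180 and sort them in increasing order.

    Returns (sorted longitudes, index order). The index order tells you how
    to rearrange the data along the longitude axis to match.
    """
    shifted = [((lon + 180.0) % 360.0) - 180.0 for lon in lons]
    order = sorted(range(len(shifted)), key=lambda i: shifted[i])
    return [shifted[i] for i in order], order


new_lons, order = to_dateline_range([0.0, 90.0, 180.0, 270.0])
print(new_lons)   # [-180.0, -90.0, 0.0, 90.0]
print(order)      # [2, 3, 0, 1]
```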
COARDS data arrays
A COARDS-compliant netCDF file may contain several data arrays. In our example file shown above, there are two data arrays:
dimensions:
time = UNLIMITED ; // (12 currently)
lev = 72 ;
lat = 181 ;
lon = 360 ;
variables:
float PRPE(time, lev, lat, lon) ;
PRPE:long_name = "Propene" ;
PRPE:units = "kgC/m2/s" ;
PRPE:add_offset = 0.f ;
PRPE:missing_value = 1.e+15f ;
float CO(time, lev, lat, lon) ;
CO:long_name = "CO" ;
CO:units = "kg/m2/s" ;
CO:_FillValue = 1.e+15f ;
CO:missing_value = 1.e+15f ;
These arrays contain emissions for the species PRPE (lumped >= C3 alkenes) and CO.
Attributes for data arrays
- long_name
Gives a detailed description of the contents of the array.
- units
Specifies the units of data contained within the array. SI units are preferred.
Special usage for HEMCO:
Use kg/m2/s or kg m-2 s-1 for emission fluxes of species;
Use kg/m3 or kg m-3 for concentration data;
Use 1 for dimensionless data instead of unitless. HEMCO will recognize unitless, but it is non-standard and not recommended.
- missing_value
Specifies the value that should represent missing data. This should be set to a number that will not be mistaken for a valid data value.
- _FillValue
Synonym for missing_value. It is recommended to set both missing_value and _FillValue to the same value. Some data visualization packages look for one but not the other.
Ordering of the data
2D and 3D array variables in netCDF files must have a specific dimension order. If the order is incorrect you will encounter the netCDF read error “start+count exceeds dimension bound”. You can check the dimension ordering of your arrays by using the ncdump command as shown below:
$ ncdump -h file.nc
Be sure to check the dimensions listed next to the array name rather than the ordering of the dimensions listed at the top of the ncdump output.
The following dimension orders are acceptable:
array(time,lat,lon)
array(time,lev,lat,lon)
The rest of this section explains why the dimension ordering of arrays matters.
When you use ncdump to examine the contents of a netCDF file, you will notice that it displays the dimensions of the data in the opposite order with respect to Fortran. In our sample file, ncdump says that the CO and PRPE arrays have these dimensions:
CO(time,lev,lat,lon)
PRPE(time,lev,lat,lon)
But if you tried to read this netCDF file into GEOS-Chem (or any other program written in Fortran), you must use data arrays that have these dimensions:
CO(lon,lat,lev,time)
PRPE(lon,lat,lev,time)
Here’s why:
Fortran is a column-major language, which means that arrays are stored in memory by columns first, then by rows. If you have declared arrays such as:
INTEGER :: I, J, L, T
INTEGER, PARAMETER :: N_LON = 360
INTEGER, PARAMETER :: N_LAT = 181
INTEGER, PARAMETER :: N_LEV = 72
INTEGER, PARAMETER :: N_TIME = 12
REAL*4 :: CO (N_LON,N_LAT,N_LEV,N_TIME)
REAL*4 :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)
then for optimal efficiency, the leftmost dimension (I) needs to vary the fastest, and needs to be accessed by the innermost DO-loop. Then the next leftmost dimension (J) should be accessed by the next innermost DO-loop, and so on. Therefore, the proper way to loop over these arrays is:
DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
CO (I,J,L,T) = ...
PRPE(I,J,L,T) = ...
ENDDO
ENDDO
ENDDO
ENDDO
Note that the I index is varying most often, since it is the innermost DO-loop, then J, L, and T. This is opposite to how a car’s odometer reads.
If you loop through an array in this fashion, with leftmost indices varying fastest, then the code minimizes the number of times it has to load subsections of the array into cache memory. In this optimal manner of execution, all of the array elements sitting in the cache memory are read in the proper order before the next array subsection needs to be loaded into the cache. But if you step through array elements in the wrong order, the number of cache loads is proportionally increased. Because it takes a finite amount of time to reload array elements into cache memory, the more times you have to access the cache, the longer it will take the code to execute. This can slow down the code dramatically.
On the other hand, C is a row-major language, which means that arrays are stored by rows first, then by columns. This means that the rightmost index varies the fastest, and should be accessed by the innermost loop. This is identical to how a car’s odometer reads.
If you use a Fortran program to write data to disk, and then try to read that data from disk into a program written in C, then unless you reverse the order of the array indices, you will be reading the array in the wrong order. In C you would have to use this ordering scheme (using Fortran-style syntax to illustrate the point):
DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
CO(T,L,J,I) = ...
PRPE(T,L,J,I) = ...
ENDDO
ENDDO
ENDDO
ENDDO
Because ncdump is written in C, the order of the array appears opposite with respect to Fortran. The same goes for any other code written in a row-major programming language.
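The equivalence between the two views can be verified numerically. The sketch below computes the position of an element in memory for a column-major (Fortran) array indexed as (lon,lat,lev,time) and for a row-major (C) array indexed as (time,lev,lat,lon), using zero-based indices. Both functions are illustrative helpers written for this example; they are not taken from the netCDF library.

```python
def column_major_offset(index, shape):
    """Memory offset when the leftmost index varies fastest (Fortran)."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n
    return offset


def row_major_offset(index, shape):
    """Memory offset when the rightmost index varies fastest (C)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


# Same element, addressed in Fortran order and in the order shown by ncdump
i, j, l, t = 10, 20, 3, 5
fortran = column_major_offset((i, j, l, t), (360, 181, 72, 12))
c_style = row_major_offset((t, l, j, i), (12, 72, 181, 360))

print(fortran == c_style)   # True
```

In other words, the array on disk is the same; only the order in which the indices are written differs between the two languages.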
COARDS Global attributes
Global attributes are netCDF attributes that contain information about a netCDF file, as opposed to information about an individual data array.
From our example in the “Examine the contents of a netCDF file” section, the output from ncdump showed that our sample netCDF file has several global attributes:
// global attributes:
:Title = "COARDS/netCDF file containing X data" ;
:Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
:References = "www.geos-chem.org; wiki.geos-chem.org" ;
:Conventions = "COARDS" ;
:Filename = "my_sample_data_file.1x1" ;
:History = "Mon Mar 17 16:18:09 2014 GMT" ;
:ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
:ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
:VersionID = "1.2" ;
:Format = "NetCDF-3" ;
:Model = "GEOS5" ;
:Grid = "GEOS_1x1" ;
:Delta_Lon = 1.f ;
:Delta_Lat = 1.f ;
:SpatialCoverage = "global" ;
:NLayers = 72 ;
:Start_Date = 20050101 ;
:Start_Time = 00:00:00.0 ;
:End_Date = 20051231 ;
:End_Time = 23:59:59.99999 ;
- Title (or title)
Provides a short description of the file.
- Contact (or contact)
Provides contact information for the person(s) who created the file.
- References (or references)
Provides a reference (citation, DOI, or URL) for the data contained in the file.
- Conventions (or conventions)
Indicates if the netCDF file adheres to a standard (e.g. COARDS or CF).
- Filename (or filename)
Specifies the name of the file.
- History (or history)
Specifies the datetime of file creation, and of any subsequent modifications.
Note
If you edit the file with nco or cdo, then this attribute will be updated to reflect the modification that was done.
- Format (or format)
Specifies the format of the netCDF file (such as netCDF-3 or netCDF-4).
For more information
Please see our Work with netCDF files Supplemental Guide for more information about commands that you can use to combine, edit, or manipulate data in netCDF files.
View GEOS-Chem species properties
Properties for GEOS-Chem species are stored in the GEOS-Chem Species Database, which is a YAML file (species_database.yml) that is placed into each GEOS-Chem run directory.
View species properties from the current stable GEOS-Chem version:
Species properties defined
The following sections contain a detailed description of GEOS-Chem species properties.
Required default properties
All GEOS-Chem species should have these properties defined:
Name:
FullName: full name of the species
Formula: chemical formula of the species
MW_g: molecular weight of the species in grams
EITHER Is_Gas: true
OR Is_Aerosol: true
All other properties are species-dependent. You may omit properties that do not apply to a given species. GEOS-Chem will assign a “missing value” (e.g. false, -999, -999.0, or UNKNOWN) to these properties when it reads the species_database.yml file from disk.
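The list of required properties can be turned into a quick sanity check. The sketch below assumes that one species entry has already been loaded into a Python dictionary (for example with a YAML parser); the dictionary layout, the function name, and the sample values are assumptions made for illustration and may not match how species_database.yml is actually organized.

```python
REQUIRED_KEYS = ("Name", "FullName", "Formula", "MW_g")


def check_required_properties(props):
    """Return a list of problems with the required species properties."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in props]

    # Exactly one of Is_Gas / Is_Aerosol should be true
    is_gas = props.get("Is_Gas", False) is True
    is_aerosol = props.get("Is_Aerosol", False) is True
    if is_gas == is_aerosol:
        problems.append("set either Is_Gas or Is_Aerosol to true (not both)")
    return problems


# Illustrative entry for isoprene
isop = {
    "Name": "ISOP",
    "FullName": "Isoprene",
    "Formula": "CH2=C(CH3)CH=CH2",
    "MW_g": 68.12,
    "Is_Gas": True,
}

print(check_required_properties(isop))             # []
print(check_required_properties({"Name": "XYZ"}))
```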
Identification
- Name
Species short name (e.g. ISOP).
- Formula
Species chemical formula (e.g. CH2=C(CH3)CH=CH2). This is used to define the species’ formula attribute, which gets written to GEOS-Chem diagnostic files and restart files.
- FullName
Species long name (e.g. Isoprene). This is used to define the species’ long_name attribute, which gets written to GEOS-Chem diagnostic files and restart files.
- Is_Aerosol
Indicates that the species is an aerosol (true), or isn’t (false).
- Is_Advected
Indicates that the species is advected (true), or isn’t (false).
- Is_DryAlt
Indicates that dry deposition diagnostic quantities for the species can be archived at a specified altitude above the surface (true), or can’t (false).
Note
The Is_DryAlt flag only applies to species O3 and HNO3.
- Is_DryDep
Indicates that the species is dry deposited (true), or isn’t (false).
- Is_HygroGrowth
Indicates that the species is an aerosol that is capable of hygroscopic growth (true), or isn’t (false).
- Is_Gas
Indicates that the species is a gas (true), or isn’t (false).
- Is_Hg0
Indicates that the species is elemental mercury (true), or isn’t (false).
- Is_Hg2
Indicates that the species is a mercury compound with oxidation state +2 (true), or isn’t (false).
- Is_HgP
Indicates that the species is a particulate mercury compound (true), or isn’t (false).
- Is_Photolysis
Indicates that the species is photolyzed (true), or isn’t (false).
- Is_RadioNuclide
Indicates that the species is a radionuclide (true), or isn’t (false).
Physical properties
- Density
Density (\(kg\ m^{-3}\)) of the species. Typically defined only for aerosols.
- Henry_K0
Henry’s law solubility constant (\(M\ atm^{-1}\)), used by the default wet deposition scheme.
- Henry_K0_Luo
Henry’s law solubility constant (\(M\ atm^{-1}\)) used by the Luo et al. [2020] wet deposition scheme.
- Henry_CR
Henry’s law volatility constant (\(K\)) used by the default wet deposition scheme.
- Henry_CR_Luo
Henry’s law volatility constant (\(K\)) used by the Luo et al. [2020] wet deposition scheme.
- Henry_pKa
Henry’s Law pH correction factor.
- MW_g
Molecular weight (\(g\ mol^{-1}\)) of the species.
Note
Some aerosol-phase species (such as MONITA and IONITA) are given the molar mass corresponding to the number of nitrogens that they carry, whereas gas-phase species (MONITS and MONITU) get the full molar mass of the compounds that they represent. This treatment has its origins in Fisher et al. [2016].
- Radius
Radius (\(m\)) of the species. Typically defined only for aerosols.
Dry deposition properties
- DD_AeroDryDep
Indicates that dry deposition should consider hygroscopic growth for this species (true), or shouldn’t (false).
Note
DD_AeroDryDep is only defined for sea salt aerosols.
- DD_DustDryDep
Indicates that dry deposition should exclude hygroscopic growth for this species (true), or shouldn’t (false).
Note
DD_DustDryDep is only defined for mineral dust aerosols.
- DD_DvzAerSnow
Specifies the dry deposition velocity (\(cm\ s^{-1}\)) over ice and snow for certain aerosol species. Typically, DD_DvzAerSnow = 0.03.
- DD_DvzAerSnow_Luo
Specifies the dry deposition velocity (\(cm\ s^{-1}\)) over ice and snow for certain aerosol species.
Note
DD_DvzAerSnow_Luo is only used when the Luo et al. [2020] wet scavenging scheme is activated.
- DD_DvzMinVal
Specifies minimum dry deposition velocities (\(cm\ s^{-1}\)) for sulfate species (SO2, SO4, MSA, NH3, NH4, NIT). This follows the methodology of the GOCART model. DD_DvzMinVal is defined as a two-element vector:
- DD_DvzMinVal(1) sets a minimum dry deposition velocity onto snow and ice.
- DD_DvzMinVal(2) sets a minimum dry deposition velocity over land.
- DD_Hstar_Old
Specifies the Henry’s law constant (\(K_0\)) that is used in dry deposition. This will be used to assign the HSTAR variable in the GEOS-Chem dry deposition module.
Note
The value of the DD_Hstar_Old parameter was tuned for each species so that the dry deposition velocity would match observations.
- DD_F0
Specifies the reactivity factor for oxidation of biological substances in dry deposition.
- DD_KOA
Specifies the octanol-air partition coefficient, used for the dry deposition of species POPG.
Note
DD_KOA is only used in the POPs simulation.
Wet deposition properties
- WD_Is_H2SO4
Indicates that the species is H2SO4 (true), or isn’t (false). This allows the wet deposition code to perform special calculations when computing H2SO4 rainout and washout.
- WD_Is_HNO3
Indicates that the species is HNO3 (true), or isn’t (false). This allows the wet deposition code to perform special calculations when computing HNO3 rainout and washout.
- WD_Is_SO2
Indicates that the species is SO2 (true), or isn’t (false). This allows the wet deposition code to perform special calculations when computing SO2 rainout and washout.
- WD_CoarseAer
Indicates that the species is a coarse aerosol (true), or isn’t (false). For wet deposition purposes, the definition of coarse aerosol is radius > 1 \(\mu m\).
- WD_LiqAndGas
Indicates that the ice-to-gas ratio can be computed for this species by co-condensation (true), or can’t (false).
- WD_ConvFacI2G
Specifies the conversion factor (i.e. ratio of sticking coefficients on the ice surface) for computing the ice-to-gas ratio by co-condensation, as used in the default wet deposition scheme.
Note
WD_ConvFacI2G only needs to be defined for those species for which WD_LiqAndGas is true.
- WD_ConvFacI2G_Luo
Specifies the conversion factor (i.e. ratio of sticking coefficients on the ice surface) for computing the ice-to-gas ratio by co-condensation, as used in the Luo et al. [2020] wet deposition scheme.
Note
WD_ConvFacI2G_Luo only needs to be defined for those species for which WD_LiqAndGas is true, and is only used when the Luo et al. [2020] wet deposition scheme is activated.
- WD_RetFactor
Specifies the retention efficiency \(R_i\) of species in the liquid cloud condensate as it is converted to precipitation. \(R_i\) < 1 accounts for volatilization during riming.
- WD_AerScavEff
Specifies the aerosol scavenging efficiency. This factor multiplies \(F\), the fraction of aerosol species that is lost to convective updraft scavenging.
- WD_AerScavEff = 1.0 for most aerosols.
- WD_AerScavEff = 0.8 for secondary organic aerosols.
- WD_AerScavEff = 0.0 for hydrophobic aerosols.
- WD_KcScaleFac
Specifies a temperature-dependent scale factor that is used to multiply \(K\) (aka \(K_c\)), the rate constant for conversion of cloud condensate to precipitation.
WD_KcScaleFac is defined as a 3-element vector:
- WD_KcScaleFac(1) multiplies \(K\) when \(T < 237\) kelvin.
- WD_KcScaleFac(2) multiplies \(K\) when \(237 \le T < 258\) kelvin.
- WD_KcScaleFac(3) multiplies \(K\) when \(T \ge 258\) kelvin.
- WD_KcScaleFac_Luo
Specifies a temperature-dependent scale factor that is used to multiply \(K\) (aka \(K_c\)), the rate constant for conversion of cloud condensate to precipitation.
Used only in the Luo et al. [2020] wet deposition scheme.
WD_KcScaleFac_Luo is defined as a 3-element vector:
- WD_KcScaleFac_Luo(1) multiplies \(K\) when \(T < 237\) kelvin.
- WD_KcScaleFac_Luo(2) multiplies \(K\) when \(237 \le T < 258\) kelvin.
- WD_KcScaleFac_Luo(3) multiplies \(K\) when \(T \ge 258\) kelvin.
- WD_RainoutEff
Specifies a temperature-dependent scale factor that is used to multiply \(F_i\) (aka RAINFRAC), the fraction of species scavenged by rainout. WD_RainoutEff is defined as a 3-element vector:
- WD_RainoutEff(1) multiplies \(F_i\) when \(T < 237\) kelvin.
- WD_RainoutEff(2) multiplies \(F_i\) when \(237 \le T < 258\) kelvin.
- WD_RainoutEff(3) multiplies \(F_i\) when \(T \ge 258\) kelvin.
This allows us to better simulate scavenging by snow and impaction scavenging of BC. For most species, we need to be able to turn off rainout when \(237 \le T < 258\) kelvin. This can be easily done by setting WD_RainoutEff(2) = 0.
Note
For SOA species, the maximum value of WD_RainoutEff will be 0.8 instead of 1.0.
- WD_RainoutEff_Luo
Specifies a temperature-dependent scale factor that is used to multiply \(F_i\) (aka RAINFRAC), the fraction of species scavenged by rainout. Used only in the Luo et al. [2020] wet deposition scheme. WD_RainoutEff_Luo is defined as a 3-element vector:
- WD_RainoutEff_Luo(1) multiplies \(F_i\) when \(T < 237\) kelvin.
- WD_RainoutEff_Luo(2) multiplies \(F_i\) when \(237 \le T < 258\) kelvin.
- WD_RainoutEff_Luo(3) multiplies \(F_i\) when \(T \ge 258\) kelvin.
This allows us to better simulate scavenging by snow and impaction scavenging of BC. For most species, we need to be able to turn off rainout when \(237 \le T < 258\) kelvin. This can be easily done by setting WD_RainoutEff_Luo(2) = 0.
Note
For SOA species, the maximum value of WD_RainoutEff_Luo will be 0.8 instead of 1.0.
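The three-element vectors described above (WD_KcScaleFac, WD_RainoutEff, and their Luo variants) are all indexed by the same temperature ranges. The following is a minimal standalone sketch of that lookup, not the GEOS-Chem wet deposition code. The program name, the function Temp_Bin, and the sample values in RainoutEff are hypothetical and chosen only for illustration.

```fortran
PROGRAM Temp_Scale_Demo

  IMPLICIT NONE

  ! Hypothetical 3-element vector, laid out like WD_RainoutEff:
  ! (1) T < 237 K,  (2) 237 K <= T < 258 K,  (3) T >= 258 K
  ! Element 2 is zero, which turns off rainout in the middle range.
  REAL, PARAMETER :: RainoutEff(3) = [ 1.0, 0.0, 1.0 ]

  ! Sample temperatures [K], one in each range
  REAL, PARAMETER :: T(3) = [ 220.0, 245.0, 280.0 ]

  INTEGER :: N

  DO N = 1, 3
     PRINT '(a,f6.1,a,f4.1)', 'T = ', T(N),                          &
           ' K -> scale factor = ', RainoutEff( Temp_Bin( T(N) ) )
  ENDDO

CONTAINS

  ! Return the vector element (1, 2, or 3) that applies at temperature TK
  INTEGER FUNCTION Temp_Bin( TK )
    REAL, INTENT(IN) :: TK

    IF ( TK < 237.0 ) THEN
       Temp_Bin = 1
    ELSE IF ( TK < 258.0 ) THEN
       Temp_Bin = 2
    ELSE
       Temp_Bin = 3
    ENDIF
  END FUNCTION Temp_Bin

END PROGRAM Temp_Scale_Demo
```

The same Temp_Bin lookup would apply to any of the three-element vectors, because they share the 237 K and 258 K thresholds.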
Transport tracer properties
These properties are defined for species used in the TransportTracers simulation. We will refer to these species as tracers.
- Is_Tracer
Indicates that the species is a transport tracer (
true
), or is not (false
).
- Snk_Horiz
Specifies the horizontal domain of the tracer sink term. Allowable values are:
- all
The tracer sink term will be applied throughout the entire horizontal domain of the simulation grid.
- lat_zone
The tracer sink term will only be applied within the latitude range specified in Snk_Lats.
- Snk_Lats
Defines the latitude range [min_latitude, max_latitude] for the tracer sink term. Will only be used if Snk_Horiz is set to lat_zone.
- Snk_Mode
Specifies how the tracer sink term will be applied. Allowable values are:
- efolding
The tracer sink term has an e-folding decay constant (specified in Snk_Period).
- halflife
The tracer sink term has a half-life (specified in Snk_Period).
- none
The tracer does not have a sink term.
- Snk_Period
Specifies the period (in days) for which the tracer sink term will be applied.
- Snk_Value
Specifies a value for the tracer sink term.
- Snk_Vert
Specifies the vertical domain of the tracer sink term. Allowable values are:
- all
The tracer sink term will be applied throughout the entire vertical domain of the simulation grid.
- boundary_layer
The tracer sink term will only be applied within the planetary boundary layer.
- surface
The tracer sink term will only be applied at the surface.
- troposphere
The tracer sink term will only be applied within the troposphere.
- Src_Add
Specifies whether the tracer has a source term (true) or not (false).
- Src_Horiz
Specifies the horizontal domain of the tracer source term. Allowable values are:
- all
The tracer source term will be applied across the entire horizontal extent of the simulation grid.
- lat_zone
The tracer source term will only be applied within the latitude range specified in Src_Lats.
- Src_Lats
Defines the latitude range [min_latitude, max_latitude] for the tracer source term. Will only be applied if Src_Horiz is set to lat_zone.
- Src_Mode
Describes the type of tracer source term. Allowable values are:
- decay_of_another_species
The tracer source term comes from the decay of another species (e.g. Pb210 source comes from Rn222 decay).
- HEMCO
The tracer source term will be read from a file via HEMCO.
- maintain_mixing_ratio
The tracer source term will be calculated as needed to maintain a constant mixing ratio at the surface.
- none
The tracer does not have a source term.
- Src_Unit
Specifies the unit of the source term that will be applied to the tracer.
- ppbv
The source term has units of parts per billion by volume.
- timestep
The source term has units of per emissions timestep.
- Src_Value
Specifies a value for the tracer source term, in the units given by Src_Unit.
- Src_Vert
Specifies the vertical domain of the tracer source term. Allowable values are:
- all
The tracer source term will be applied throughout the entire vertical domain of the simulation grid.
- pressures
The tracer source term will only be applied within the pressure range specified in Src_Pressures.
- stratosphere
The tracer source term will only be applied in the stratosphere.
- troposphere
The tracer source term will only be applied in the troposphere.
- surface
The tracer source term will only be applied at the surface.
- Src_Pressures
Defines the pressure range [min_pressure, max_pressure], in hPa, for the tracer source term. Will only be used if Src_Vert is set to pressures.
- Units
Specifies the default units of the tracers (e.g. aoa, aoa_nh, aoa_bl are carried in units of days, while all other species in GEOS-Chem are carried in kg/kg dry air).
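To make the difference between the efolding and halflife settings of Snk_Mode concrete, the sketch below converts a Snk_Period value into the fraction of tracer that survives one timestep. This is the standard exponential-decay arithmetic under stated assumptions, not a copy of the GEOS-Chem implementation. The function Loss_Factor and the one-hour timestep are hypothetical; the periods (25 days for CO_25, 53.3 days for Be7) are taken from the property list on this page.

```fortran
PROGRAM Sink_Demo

  IMPLICIT NONE

  INTEGER,  PARAMETER :: dp      = KIND( 1.0d0 )
  REAL(dp), PARAMETER :: dt_days = 1.0_dp / 24.0_dp  ! hypothetical 1-hour step

  ! Fraction of tracer remaining after one timestep
  PRINT '(a,f10.7)', 'efolding, 25.0 days : ',                       &
        Loss_Factor( 'efolding', 25.0_dp, dt_days )
  PRINT '(a,f10.7)', 'halflife, 53.3 days : ',                       &
        Loss_Factor( 'halflife', 53.3_dp, dt_days )
  PRINT '(a,f10.7)', 'none                : ',                       &
        Loss_Factor( 'none',      0.0_dp, dt_days )

CONTAINS

  ! Surviving fraction after Dt days, for a given sink mode and period
  FUNCTION Loss_Factor( Mode, Period, Dt ) RESULT( Factor )
    CHARACTER(LEN=*), INTENT(IN) :: Mode
    REAL(dp),         INTENT(IN) :: Period, Dt
    REAL(dp)                     :: Factor

    SELECT CASE ( TRIM( Mode ) )
       CASE ( 'efolding' )
          ! Concentration falls by a factor of e every Period days
          Factor = EXP( -Dt / Period )
       CASE ( 'halflife' )
          ! Concentration falls by a factor of 2 every Period days
          Factor = EXP( -LOG( 2.0_dp ) * Dt / Period )
       CASE DEFAULT
          ! No sink term: nothing is lost
          Factor = 1.0_dp
    END SELECT
  END FUNCTION Loss_Factor

END PROGRAM Sink_Demo
```

For the same period, a half-life sink removes tracer more slowly than an e-folding sink, because ln(2) is less than 1.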
Properties used by each transport tracer
The list below shows the various transport tracer properties that are used in the current TransportTracers simulation.
Is_Tracer
- true : all
Snk_Horiz
- lat_zone : aoa_nh
- all : all others
Snk_Lats
- [ 30.0, 50.0] : aoa_nh
Snk_Mode
- constant : aoa, aoa_bl, aoa_nh
- efolding : CH3I, CO_25
- none : SF6
- halflife : Be7, Be7s, Be10, Be10s
Snk_Period (days)
- 5 : CH3I
- 25 : CO_25
- 50 : CO_50
- 90 : e90, e90_n, e90_s
- 11742.8 : Pb210, Pb210s
- 5.5 : Rn222
- 53.3 : Be7, Be7s
- 5.84e8 : Be10, Be10s
Snk_Value
- 0 : aoa, aoa_bl, aoa_nh
Snk_Vert
- boundary_layer : aoa_bl
- surface : aoa, aoa_nh
- troposphere : stOx
- all : all others
Src_Add
- false : Passive, stOx, st80_25
- true : all others
Src_Horiz
- lat_zone : e90_n, e90_s, nh_5, nh_50
- all : all others
Src_Lats
- [ 40.0, 91.0] : e90_n
- [-91.0, -40.0] : e90_s
- [ 30.0, 50.0] : nh_5, nh_50
Src_Mode
- constant : aoa, aoa_bl, aoa_nh, nh_50, nh_5, st80_25
- file2d : CH3I, CO_25, CO_50, Rn222, SF6 - HEMCO
- file3d : Be10, Be7 - HEMCO
- maintain_mixing_ratio : e90, e90_n, e90_s
- decay_of_another_species : Pb210, Pb210s
Src_Unit
- ppbv : e90, e90_n, e90_s, st80_25
- timestep : aoa, aoa_bl, aoa_nh
Src_Value
- 1 : aoa, aoa_bl, aoa_nh
- 100 : e90, e90_n, e90_s
- 200 : st80_25
Src_Vert
- all : aoa, aoa_bl, aoa_nh, Pb210
- pressures : st80_25
- stratosphere : Be10s, Be7s, Pb210s, stOx
- surface : all others (not specified when Src_Mode: HEMCO)
Src_Pressures
- [0, 80] : st80_25
Units
- days : aoa, aoa_bl, aoa_nh
Other properties
- BackgroundVV
If a restart file does not contain a global initial concentration field for a species, GEOS-Chem will attempt to set the initial concentration (in \(vol\ vol^{-1}\) dry air) to the value specified in BackgroundVV globally. But if BackgroundVV has not been specified, GEOS-Chem will set the initial concentration for the species to \(10^{-20}\ vol\ vol^{-1}\) dry air instead.
Note
Recent versions of GCHP may require that initial conditions for all species used in a simulation be present in the restart file. See gchp.readthedocs.io for more information.
Access species properties in GEOS-Chem
In this section we will describe the derived types and objects that are used to store GEOS-Chem species properties. We will also describe how you can extract species properties from the GEOS-Chem Species Database when you create new GEOS-Chem code routines.
The Species derived type
The Species derived type (defined in module Headers/species_mod.F90)
describes a complete set of properties for a single GEOS-Chem
species. In addition to the fields mentioned in the preceding sections, the
Species derived type also contains several species indices.
Index | Description |
---|---|
ModelId | Model species index |
AdvectId | Advected species index |
AerosolId | Aerosol species index |
DryAltId | Index of dry-deposited species archived at altitude |
DryDepId | Dry deposition species index |
GasSpcId | Gas-phase species index |
HygGrthId | Hygroscopic growth species index |
KppVarId | KPP variable species index |
KppFixId | KPP fixed species index |
KppSpcId | KPP species index |
PhotolId | Photolysis species index |
RadNuclId | Radionuclide index |
TracerId | Transport tracer index |
WetDepId | Wet deposition index |
The SpcPtr derived type
The SpcPtr derived type (also defined in Headers/species_mod.F90)
describes a container for an object of type Species.
TYPE, PUBLIC :: SpcPtr
TYPE(Species), POINTER :: Info ! Single entry of Species Database
END TYPE SpcPtr
The GEOS-Chem Species Database object
The GEOS-Chem Species Database is stored in the State_Chm%SpcData object.
It is an array in which each element is of type SpcPtr
(a container for an object of type Species).
TYPE(SpcPtr), POINTER :: SpcData(:) ! GC Species database
Species index lookup with Ind_()
Use function Ind_() (in module Headers/state_chm_mod.F90) to look up
species indices by name. For example:
SUBROUTINE MySub( ..., State_Chm, ... )
USE State_Chm_Mod, ONLY : Ind_
! Local variables
INTEGER :: id_O3, id_Br2, id_CO
! Find species indices with the Ind_() function
id_O3 = Ind_( 'O3' )
id_Br2 = Ind_( 'Br2' )
id_CO = Ind_( 'CO' )
! Print tracer concentrations
print*, 'O3 at (23,34,1) : ', State_Chm%Species(id_O3 )%Conc(23,34,1)
print*, 'Br2 at (23,34,1) : ', State_Chm%Species(id_Br2)%Conc(23,34,1)
print*, 'CO at (23,34,1) : ', State_Chm%Species(id_CO )%Conc(23,34,1)
! Print the molecular weight of O3 (obtained from the Species Database object)
print*, 'Mol wt of O3 [g]: ', State_Chm%SpcData(id_O3)%Info%MW_g
END SUBROUTINE MySub
Once you have obtained the species ID (aka ModelId) you can
use that to access the individual fields in the Species Database
object. In the example above, we use the species ID for O3
(stored in id_O3) to look up the molecular weight of O3 from
the Species Database.
You may search for other model indices with Ind_() by passing
an optional second argument:
! Position of HNO3 in the list of advected species
AdvectId = Ind_( 'HNO3', 'A' )
! Position of HNO3 in the list of gas-phase species
GasSpcId = Ind_( 'HNO3', 'G' )
! Position of HNO3 in the list of dry deposited species
DryDepId = Ind_( 'HNO3', 'D' )
! Position of HNO3 in the list of wet deposited species
WetDepId = Ind_( 'HNO3', 'W' )
! Position of HNO3 in the lists of fixed KPP, variable KPP, & overall KPP species
KppFixId = Ind_( 'HNO3', 'F' )
KppVarId = Ind_( 'HNO3', 'V' )
KppSpcId = Ind_( 'HNO3', 'K' )
! Position of SALA in the list of hygroscopic growth species
HygGthId = Ind_( 'SALA', 'H' )
! Position of Pb210 in the list of radionuclide species
RadNuclId = Ind_( 'Pb210', 'N' )
! Position of ACET in the list of photolysis species
PhotolId = Ind_( 'ACET', 'P' )
Ind_() will return -1 if a species does not belong to the list
being searched.
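Because a missing species is reported as -1, it is worth testing the returned index before using it as an array subscript. The sketch below illustrates that pattern in a standalone program. Find_Index and DryDepNames are hypothetical stand-ins that only mimic the documented return convention; they are not part of GEOS-Chem.

```fortran
PROGRAM Index_Check_Demo

  IMPLICIT NONE

  ! Hypothetical stand-in for a list of dry-deposited species
  CHARACTER(LEN=4), PARAMETER :: DryDepNames(3) = [ 'O3  ', 'HNO3', 'SO2 ' ]

  INTEGER :: id_HNO3, id_SF6

  ! Look up each name once, then test the result before using it
  id_HNO3 = Find_Index( 'HNO3' )
  id_SF6  = Find_Index( 'SF6'  )

  IF ( id_HNO3 > 0 ) PRINT '(a,i0)', 'HNO3 found at position ', id_HNO3
  IF ( id_SF6  < 0 ) PRINT '(a)',    'SF6 is not in the list; skipping it'

CONTAINS

  ! Mimics the documented convention:
  ! returns the position in the list, or -1 if the name is absent
  INTEGER FUNCTION Find_Index( Name )
    CHARACTER(LEN=*), INTENT(IN) :: Name
    INTEGER                      :: N

    Find_Index = -1
    DO N = 1, SIZE( DryDepNames )
       IF ( TRIM( DryDepNames(N) ) == TRIM( Name ) ) THEN
          Find_Index = N
          EXIT
       ENDIF
    ENDDO
  END FUNCTION Find_Index

END PROGRAM Index_Check_Demo
```

In GEOS-Chem code the same test would be applied to the value returned by Ind_(), so that a species absent from a given simulation is skipped rather than used as an out-of-bounds index.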
Tip
For maximum efficiency, we recommend that you use Ind_() to obtain
the species indices during the initialization phase of a
GEOS-Chem simulation. This will minimize the number of
name-to-index lookup operations that need to be performed, thus
reducing computational overhead.
Implementing the tip mentioned above:
MODULE MyModule
IMPLICIT NONE
. . .
! Species ID of CO. All subroutines in MyModule can refer to id_CO.
INTEGER, PRIVATE :: id_CO
CONTAINS
. . . other subroutines . . .
SUBROUTINE Init_MyModule
! Ind_() is provided by State_Chm_Mod
USE State_Chm_Mod, ONLY : Ind_
! This subroutine only gets called at startup
. . .
! Store ModelId in the global id_CO variable
id_CO = Ind_('CO')
. . .
END SUBROUTINE Init_MyModule
END MODULE MyModule
Species lookup within a loop
If you need to access species properties from within a loop, it is
better not to use the Ind_() function, as repeated
name-to-index lookups will incur computational overhead. Instead, you
can access the species properties directly from the GEOS-Chem Species
Database object, as shown here.
SUBROUTINE MySub( ..., State_Chm, ... )
!%%% MySub is an example of species lookup within a loop %%%
! Uses
USE Precision_Mod
USE State_Chm_Mod, ONLY : ChmState
USE Species_Mod, ONLY : Species
! Chemistry state object (which also holds the species database)
TYPE(ChmState), INTENT(INOUT) :: State_Chm
! Local variables
INTEGER :: N
TYPE(Species), POINTER :: ThisSpc
INTEGER :: ModelId, DryDepId, WetDepId
REAL(fp) :: Mw_g
REAL(f8) :: Henry_K0, Henry_CR, Henry_pKa
! Loop over all species
DO N = 1, State_Chm%nSpecies
! Point to the species database entry for this species
! (this makes the coding simpler)
ThisSpc => State_Chm%SpcData(N)%Info
! Get species properties
ModelId = ThisSpc%ModelId
DryDepId = ThisSpc%DryDepId
WetDepId = ThisSpc%WetDepId
MW_g = ThisSpc%MW_g
Henry_K0 = ThisSpc%Henry_K0
Henry_CR = ThisSpc%Henry_CR
Henry_pKa = ThisSpc%Henry_pKA
IF ( ThisSpc%Is_Gas ) THEN
! ... The species is a gas-phase species
! ... so do something appropriate
ELSE
! ... The species is an aerosol
! ... so do something else appropriate
ENDIF
IF ( ThisSpc%Is_Advected ) THEN
! ... The species is advected
! ... (i.e. undergoes transport, PBL mixing, cloud convection)
ENDIF
IF ( ThisSpc%Is_DryDep ) THEN
! ... The species is dry deposited
ENDIF
IF ( ThisSpc%Is_WetDep ) THEN
! ... The species is soluble and wet deposits
! ... it is also scavenged in convective updrafts
! ... it probably has defined Henry's law properties
ENDIF
... etc ...
! Free the pointer
ThisSpc => NULL()
ENDDO
END SUBROUTINE MySub
Parallelize GEOS-Chem and HEMCO source code
Single-node parallelization in GEOS-Chem Classic and HEMCO is achieved with OpenMP. OpenMP directives, which are supported by every modern Fortran compiler, allow you to divide the work done in DO loops among several computational cores. In this Guide, you will learn more about how GEOS-Chem Classic and HEMCO utilize OpenMP.
Overview of OpenMP parallelization
Most GEOS-Chem and HEMCO arrays represent quantities on a geospatial grid (such as meteorological fields, species concentrations, production and loss rates, etc.). When we parallelize the GEOS-Chem and HEMCO source code, we give each computational core its own region of the “world” to work on, so to speak. Although all cores can see the entire “world” (i.e. the entire memory on the machine) at once, each core is restricted to working on its own region of the “world”.

It is important to remember that OpenMP is loop-level parallelization. That means that only commands within selected DO loops will execute in parallel. GEOS-Chem Classic and HEMCO (when running within GEOS-Chem Classic, or as the HEMCO standalone) start off on a single core (known as the “main core”). Upon entering a parallel DO loop, other cores will be invoked to share the workload within the loop. At the end of the parallel DO loop, the other cores return to standby status and the execution continues only on the “main” core.
One restriction of using OpenMP parallelization is that a simulation may use only as many cores as share the same memory. In practice, this limits GEOS-Chem Classic and HEMCO standalone simulations to using 1 node (typically fewer than 64 cores) of a shared computer cluster.
We should also note that GEOS-Chem High Performance (aka GCHP) uses a different type of parallelization (MPI). This allows GCHP to use hundreds or thousands of cores across several nodes of a computer cluster. We encourage you to consider using GCHP for your high-resolution simulations.
Example using OpenMP directives
Consider the following nested loop that has been parallelized with OpenMP directives:
!$OMP PARALLEL DO &
!$OMP SHARED( A ) &
!$OMP PRIVATE( I, J, B ) &
!$OMP COLLAPSE( 2 ) &
!$OMP SCHEDULE( DYNAMIC, 4 )
DO J = 1, NY
DO I = 1, NX
B = A(I,J)
A(I,J) = B * 2.0
ENDDO
ENDDO
!$OMP END PARALLEL DO
This loop will assign different (I,J) pairs to different
computational cores. In general, the more cores you use, the less time
the operation will take, although the speedup is not perfectly linear.
Let us now look at the important features of this loop.
- !$OMP PARALLEL DO
This is known as a loop sentinel. It tells the compiler that the following DO-loop is to be executed in parallel. The clauses following the sentinel specify further options for the parallelization. These clauses may be spread across multiple lines by using a continuation character (&) at the end of the line.
- !$OMP SHARED( A )
This clause tells the compiler that all computational cores can write to A simultaneously. This is OK because each core will receive a unique set of (I,J) pairs. Thus data corruption of the A array will not happen. We say that A is a SHARED variable.
Note
We recommend using the clause !$OMP DEFAULT( SHARED ), which will declare all variables as shared, unless they are explicitly placed in an !$OMP PRIVATE clause.
- !$OMP PRIVATE( I, J, B )
Because different cores will be handling different (I,J) pairs, each core needs its own private copy of variables I and J. The compiler creates these temporary copies of these variables in memory “under the hood”.
If the I and J variables were not declared PRIVATE, then all of the computational cores could simultaneously write to I and J. This would lead to data corruption. For the same reason, we must also place the variable B within the !$OMP PRIVATE clause.
- !$OMP COLLAPSE( 2 )
By default, OpenMP will parallelize only the outer loop in a set of nested loops. To gain more efficiency, we can collapse the nested loops. “Under the hood”, the compiler converts the two nested loops over NX and NY into a single loop of size NX * NY, and then parallelizes over that single loop. Because we wish to collapse 2 loops together, we use the !$OMP COLLAPSE( 2 ) clause.
- !$OMP SCHEDULE( DYNAMIC, 4 )
Normally, OpenMP will split the domain to be parallelized (i.e. (NX, NY)) evenly between the cores. But if some computations take longer than others (e.g. photochemistry at the day/night boundary), this static scheduling may be inefficient.
The SCHEDULE( DYNAMIC, 4 ) clause will send groups of 4 grid boxes to each core. As soon as a core finishes its work, it will immediately receive another group of 4 grid boxes. This can help to achieve better load balancing.
- !$OMP END PARALLEL DO
This is a sentinel that declares the end of the parallel DO loop. It may be omitted, but we encourage you to include it, as defining both the beginning and end of a parallel loop is good programming style.
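The clauses discussed above can be combined into a small program that compiles and runs on its own. This is a minimal sketch that uses the recommended DEFAULT( SHARED ) clause in place of an explicit SHARED( A ). The program name and the grid dimensions are hypothetical.

```fortran
PROGRAM Omp_Loop_Demo

  IMPLICIT NONE

  INTEGER, PARAMETER :: NX = 72, NY = 46   ! hypothetical grid size
  REAL               :: A(NX,NY), B
  INTEGER            :: I, J

  ! Start with every element equal to 1
  A = 1.0

  ! A is shared by default; the loop indices and the scalar B are private
  !$OMP PARALLEL DO             &
  !$OMP DEFAULT( SHARED       ) &
  !$OMP PRIVATE( I, J, B      ) &
  !$OMP COLLAPSE( 2           ) &
  !$OMP SCHEDULE( DYNAMIC, 4  )
  DO J = 1, NY
  DO I = 1, NX
     B      = A(I,J)
     A(I,J) = B * 2.0
  ENDDO
  ENDDO
  !$OMP END PARALLEL DO

  ! Every element is now 2, whatever the number of threads
  PRINT *, 'Sum of A = ', SUM( A )

END PROGRAM Omp_Loop_Demo
```

If the program is compiled without an OpenMP option, the directives are treated as comments and the loop runs on a single core with the same result.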
Environment variable settings for OpenMP
Please see Set environment variables for parallelization to learn which environment variables you must add to your login environment to control OpenMP parallelization.
OpenMP parallelization FAQ
Here are some frequently asked questions about parallelizing GEOS-Chem and HEMCO code with OpenMP:
How can I tell what should go into the !$OMP PRIVATE clause?
Here is a good rule of thumb:
All variables that appear on the left side of an equals sign, and that have lower dimensionality than the dimensionality of the parallel loop, must be placed in the !$OMP PRIVATE clause.
In the example shown above, I, J, and B are scalars, so their dimensionality
is 0. But the parallelization occurs over two DO loops (1..NY
and 1..NX), so the dimensionality of the parallelization is 2.
Thus I, J, and B must go inside the !$OMP PRIVATE clause.
Tip
You can also think of dimensionality as the number of indices a
variable has. For example A has dimensionality 0, but
A(I) has dimensionality 1, A(I,J) has dimensionality 2, etc.
Why do the !$OMP statements begin with a comment character?
This is by design. In order to invoke the parallel processing commands,
you must use a specific compiler option (such as -openmp, -fopenmp,
or similar, depending on the compiler). If you
omit these compiler switches, then the parallel processing directives
will be considered as Fortran comments, and the associated DO-loops
will be executed on a single core.
Do subroutine variables have to be declared PRIVATE?
Consider this subroutine:
SUBROUTINE mySub( X, Y, Z )
! Dummy variables for input
REAL, INTENT(IN) :: X, Y
! Dummy variable for output
REAL, INTENT(OUT) :: Z
! Add X + Y to make Z
Z = X + Y
END SUBROUTINE mySub
which is called from within a parallel loop:
INTEGER :: N
REAL :: A, B, C
!$OMP PARALLEL DO &
!$OMP DEFAULT( SHARED ) &
!$OMP PRIVATE( N, A, B, C )
DO N = 1, nIterations
! Get inputs from some array
A = Input(N,1)
B = Input(N,2)
! Add A + B to make C
CALL mySub( A, B, C )
! Save the output in an array
Output(N) = C
ENDDO
!$OMP END PARALLEL DO
Using the rule of thumb described above, because N, A, B, and C
are scalars (having dimensionality = 0), they
must be placed in the !$OMP PRIVATE clause.
But note that the variables X, Y, and Z do not
need to be placed within a !$OMP PRIVATE clause within
subroutine mySub. This is because each core calls
mySub in a separate thread of execution, and will create its
own private copy of X, Y, and Z in memory.
What does the THREADPRIVATE statement do?
Let’s modify the above example slightly. Let’s now suppose that
subroutine mySub from the prior example is now part of a
Fortran-90 module, which looks like this:
MODULE myModule
! Module variable:
REAL, PUBLIC :: Z
CONTAINS
SUBROUTINE mySub( X, Y )
! Dummy variables for input
REAL, INTENT(IN) :: X, Y
! Add X + Y to make Z
! NOTE that Z is now a global variable
Z = X + Y
END SUBROUTINE mySub
END MODULE myModule
Note that Z is now a global scalar variable with
dimensionality = 0. Let’s now use the same parallel loop
(dimensionality = 1) as before:
! Get the Z variable from myModule
USE myModule, ONLY : Z
INTEGER :: N
REAL :: A, B, C
!$OMP PARALLEL DO &
!$OMP DEFAULT( SHARED ) &
!$OMP PRIVATE( N, A, B, C )
DO N = 1, nIterations
! Get inputs from some array
A = Input(N,1)
B = Input(N,2)
! Add A + B; mySub stores the result in the module variable Z
CALL mySub( A, B )
! Save the output in an array
Output(N) = Z
ENDDO
!$OMP END PARALLEL DO
Because Z is now a global variable with lower dimensionality
than the loop, we must try to place it within an !$OMP PRIVATE
clause. However, Z is defined in a different program unit
than where the parallel loop occurs, so we cannot place it in an
!$OMP PRIVATE clause for the loop.
In this case we must place Z into an !$OMP THREADPRIVATE
clause within the module where it is declared, as shown
below:
MODULE myModule
! Module variable:
! This is global and acts as if it were in a F77-style common block
REAL, PUBLIC :: Z
!$OMP THREADPRIVATE( Z )
... etc ...
This tells the compiler to create a separate private copy of Z
in memory for each core.
Important
When you place a variable into an !$OMP PRIVATE or
!$OMP THREADPRIVATE clause, this means that the variable
will have no meaning outside of the parallel loop where it is used.
So you should not rely on using the value of PRIVATE or
THREADPRIVATE variables elsewhere in your code.
Most of the time you won’t have to use the !$OMP THREADPRIVATE
statement. You may need to use it if you are trying to parallelize code
that came from someone else.
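For reference, the module and the parallel loop from this answer can be assembled into one program that compiles and runs. This is a minimal sketch. The driver program ThreadPrivate_Demo, the array sizes, and the input values are hypothetical and were added only so that the example is self-contained.

```fortran
MODULE myModule

  IMPLICIT NONE

  ! Module variable: each thread gets its own copy
  REAL, PUBLIC :: Z
  !$OMP THREADPRIVATE( Z )

CONTAINS

  SUBROUTINE mySub( X, Y )
    REAL, INTENT(IN) :: X, Y

    ! Z is the module variable (one private copy per thread)
    Z = X + Y
  END SUBROUTINE mySub

END MODULE myModule

PROGRAM ThreadPrivate_Demo

  USE myModule, ONLY : Z, mySub

  IMPLICIT NONE

  INTEGER, PARAMETER :: nIterations = 8   ! hypothetical loop size
  REAL               :: Input(nIterations,2), Output(nIterations)
  REAL               :: A, B
  INTEGER            :: N

  ! Fill the inputs with simple values: N and 10
  DO N = 1, nIterations
     Input(N,1) = REAL( N )
     Input(N,2) = 10.0
  ENDDO

  ! Z does not appear in any clause here; THREADPRIVATE covers it
  !$OMP PARALLEL DO         &
  !$OMP DEFAULT( SHARED   ) &
  !$OMP PRIVATE( N, A, B  )
  DO N = 1, nIterations
     A = Input(N,1)
     B = Input(N,2)
     CALL mySub( A, B )
     Output(N) = Z
  ENDDO
  !$OMP END PARALLEL DO

  ! Each element holds N + 10
  PRINT '(8f6.1)', Output

END PROGRAM ThreadPrivate_Demo
```

Each thread reads Z immediately after its own call to mySub, inside the loop, so the value is never used outside the parallel region where it is defined.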
Can I use pointers within an OpenMP parallel loop?
You may use pointer-based variables (including derived-type objects) within an OpenMP parallel loop. But you must make sure that you point to the target within the parallel loop section AND that you also nullify the pointer within the parallel loop section. For example:
INCORRECT:
! Declare variables
REAL, TARGET :: myArray(NX,NY)
REAL, POINTER :: myPtr (: )
! Declare an OpenMP parallel loop
!$OMP PARALLEL DO &
!$OMP DEFAULT( SHARED ) &
!$OMP PRIVATE( I, J, myPtr, ...etc... )
DO J = 1, NY
DO I = 1, NX
! Point to a variable.
!This must be done in the parallel loop section.
myPtr => myArray(:,J)
. . . do other stuff . . .
ENDDO
ENDDO
!$OMP END PARALLEL DO
! Nullify the pointer.
! NOTE: This is incorrect because we nullify the pointer outside of the loop.
myPtr => NULL()
CORRECT:
! Declare variables
REAL, TARGET :: myArray(NX,NY)
REAL, POINTER :: myPtr (: )
! Declare an OpenMP parallel loop
!$OMP PARALLEL DO &
!$OMP DEFAULT( SHARED ) &
!$OMP PRIVATE( I, J, myPtr, ...etc... )
DO J = 1, NY
DO I = 1, NX
! Point to a variable.
!This must be done in the parallel loop section.
myPtr => myArray(:,J)
. . . do other stuff . . .
! Nullify the pointer within the parallel loop
myPtr => NULL()
ENDDO
ENDDO
!$OMP END PARALLEL DO
In other words, pointers used in OpenMP parallel loops only have meaning within the parallel loop.
How many cores may I use for GEOS-Chem or HEMCO?
You can use as many computational cores as there are on a single node of your cluster. With OpenMP parallelization, the restriction is that all of the cores have to see all the memory on the machine (or node of a larger machine). So if you have 32 cores on a single node, you can use them. We have shown that run times will continue to decrease (albeit asymptotically) when you increase the number of cores.
Why is GEOS-Chem not using all the cores I requested?
The number of threads for an OpenMP simulation is determined by the
environment variable OMP_NUM_THREADS.
You must define OMP_NUM_THREADS in your environment file
to specify the desired number of computational cores for your
simulation. For the bash shell, use this command to
request 8 cores:
export OMP_NUM_THREADS=8
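One way to confirm that the setting has taken effect is to ask the OpenMP runtime how many threads it will use. The short program below is a sketch for that purpose and is not part of GEOS-Chem. It relies on the standard OMP_LIB module and its omp_get_max_threads() function; the program name and the variable nThreads are hypothetical.

```fortran
PROGRAM Thread_Count

  ! Lines starting with "!$ " are compiled only when OpenMP is enabled
  !$ USE OMP_LIB, ONLY : omp_get_max_threads

  IMPLICIT NONE

  INTEGER :: nThreads

  ! Default for a build without OpenMP
  nThreads = 1

  ! With OpenMP enabled, this reflects OMP_NUM_THREADS
  !$ nThreads = omp_get_max_threads()

  PRINT '(a,i0)', 'Threads available to OpenMP: ', nThreads

END PROGRAM Thread_Count
```

If the program is built with an OpenMP option such as -fopenmp and run after the export command above, the reported thread count should match the requested value.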
MPI parallelization
The OpenMP parallelization used by GEOS-Chem Classic and HEMCO standalone is an example of shared memory parallelization (also known as serial parallelization). As we have seen, we are restricted to using a single node of a computer cluster. This is because all of the cores need to talk with all of the memory on the node.
On the other hand, MPI (Message Passing Interface) parallelization is an example of distributed-memory parallelization. An MPI library installation is required for passing data from one physical system to another (i.e. across nodes).
GEOS-Chem High Performance (GCHP) uses Earth System Model Framework (ESMF) and MAPL libraries to implement MPI parallelization. For detailed information, please see gchp.readthedocs.io.
Run nested-grid simulations
A nested-grid simulation is a GEOS-Chem Classic simulation running at the native horizontal resolution of the GEOS-FP (0.25° x 0.3125°) or MERRA-2 (0.5° x 0.625°) meteorology fields over a subset of the globe. Nested-grid simulations use boundary conditions for transport that are archived from a global simulation.
Follow these steps to set up a GEOS-Chem Classic nested-grid simulation:
Run a global simulation to create boundary conditions
1. Download the GEOS-Chem source code
Download the GEOS-Chem Classic source code by following these instructions.
2. Create a global simulation run directory
Create a run directory for your global simulation by executing these commands:
$ cd /path/to/GCClassic/run # or whatever you named the source code directory
$ ./createRunDir.sh
and then follow the prompts.
Tip
A 4° x 5° global simulation should be adequate for producing boundary condition output.
3. Activate the BoundaryConditions diagnostic collection
The BoundaryConditions diagnostic collection is deactivated by default in the HISTORY.rc file that ships with the run directory. Activate this collection by removing the comment character (#) as shown below.
COLLECTIONS: 'Restart',
'SpeciesConc',
... etc ...
#'BoundaryConditions', <== Remove the # sign in front
The BoundaryConditions collection will save out instantaneous concentrations of advected species every three hours to daily files. You may change those settings by modifying the BoundaryConditions collection section in the HISTORY.rc file.
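If you prefer to script this edit rather than make it by hand, a minimal Python sketch is shown below. The function `activate_collection` is a hypothetical helper (it is not shipped with GEOS-Chem), and it assumes that the collection is listed on its own line in the form `#'BoundaryConditions',` as in the HISTORY.rc excerpt above.

```python
import re


def activate_collection(history_text, name):
    """Remove the leading '#' from the COLLECTIONS entry for `name`.

    Hypothetical helper: only a line of the form  #'<name>',  (optionally
    indented) is changed; every other line is returned untouched.
    """
    pattern = re.compile(
        r"^([ \t]*)#[ \t]*('" + re.escape(name) + r"',.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1\2", history_text, count=1)


if __name__ == "__main__":
    sample = (
        "COLLECTIONS: 'Restart',\n"
        "             'SpeciesConc',\n"
        "             #'BoundaryConditions',\n"
    )
    print(activate_collection(sample, "BoundaryConditions"))
```

To apply it to a real run directory you would read HISTORY.rc, pass its text through the function, and write the result back, ideally after making a backup copy.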
Tip
If you wish to save disk space, use the .LON_RANGE and .LAT_RANGE settings to reduce the size of the region in which the boundary conditions will be saved. The region in which the boundary conditions are archived should be a little larger than the nested-grid simulation window.
#==============================================================================
# %%%%% THE BoundaryConditions COLLECTION %%%%%
#
# GEOS-Chem boundary conditions for use in nested grid simulations
#
# Available for all simulations
#==============================================================================
BoundaryConditions.template: '%y4%m2%d2_%h2%n2z.nc4',
BoundaryConditions.format: 'CFIO',
BoundaryConditions.frequency: 00000000 030000
BoundaryConditions.duration: 00000001 000000
BoundaryConditions.mode: 'instantaneous'
BoundaryConditions.LON_RANGE: -130.0 -60.0,
BoundaryConditions.LAT_RANGE: 10.0 60.0,
BoundaryConditions.fields: 'SpeciesBC_?ADV? ', 'GIGCchem',
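The sketch below shows one way to derive .LON_RANGE and .LAT_RANGE values that are "a little larger" than a nested window. It is illustrative only: the function names are hypothetical, and the default padding of one 4° x 5° grid box on each side (4° in latitude, 5° in longitude) is an assumption rather than a GEOS-Chem requirement.

```python
def padded_bc_region(lon_range, lat_range, pad_lon=5.0, pad_lat=4.0):
    """Expand a nested-grid window by a margin on every side.

    Assumption: the default margins correspond to one 4 x 5 degree box.
    The result is clamped to valid longitude and latitude limits.
    """
    lon_min = max(-180.0, lon_range[0] - pad_lon)
    lon_max = min(180.0, lon_range[1] + pad_lon)
    lat_min = max(-90.0, lat_range[0] - pad_lat)
    lat_max = min(90.0, lat_range[1] + pad_lat)
    return (lon_min, lon_max), (lat_min, lat_max)


def history_lines(lon_range, lat_range):
    """Format the two settings in the style of the HISTORY.rc excerpt."""
    return [
        f"BoundaryConditions.LON_RANGE: {lon_range[0]:.1f} {lon_range[1]:.1f},",
        f"BoundaryConditions.LAT_RANGE: {lat_range[0]:.1f} {lat_range[1]:.1f},",
    ]


if __name__ == "__main__":
    # Hypothetical nested window; replace with your own domain.
    lon, lat = padded_bc_region((-125.0, -65.0), (14.0, 56.0))
    print("\n".join(history_lines(lon, lat)))
```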
4. Configure the global simulation
Configure your global simulation by changing settings in the relevant configuration files. If you do not need the output from your global simulation, you may choose to turn off most of the diagnostic output in HISTORY.rc and HEMCO_Diagn.rc.
Tip
Turn off most diagnostic output in the HISTORY.rc and HEMCO_Diagn.rc files. This will reduce both the run time and the size of the diagnostic output.
5. Compile GEOS-Chem and run the global simulation
Follow the steps outlined in these sections to compile and run your GEOS-Chem global simulation.
Download input data (e.g. do a dry-run and download data if necessary)
As your global simulation runs, boundary condition files will be written to the OutputDir subdirectory of your run directory. You should see files named GEOSChem.BoundaryConditions.YYYYMMDD_0000z.nc4 (where YYYYMMDD is replaced by the simulation date) begin to appear there. You will need to tell your nested-grid simulation where to find these files.
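Before starting the nested-grid run, it can be useful to confirm that a boundary condition file exists for every simulation day. The Python sketch below builds the expected file names from the naming pattern given above; the helper names are hypothetical, and treating the end date as exclusive is an assumption made for this example.

```python
from datetime import date, timedelta
from pathlib import Path


def expected_bc_files(start, end):
    """List the daily boundary-condition file names for start <= day < end."""
    names = []
    day = start
    while day < end:
        names.append(f"GEOSChem.BoundaryConditions.{day:%Y%m%d}_0000z.nc4")
        day += timedelta(days=1)
    return names


def missing_bc_files(output_dir, start, end):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [n for n in expected_bc_files(start, end) if not (out / n).exists()]


if __name__ == "__main__":
    # Hypothetical dates and path; adjust to your own global simulation.
    print(missing_bc_files("OutputDir", date(2019, 7, 1), date(2019, 7, 8)))
```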
Set up your nested grid run directory
1. Create a nested-grid simulation run directory
Using the same GEOS-Chem Classic source code directory that we downloaded above, follow these steps to create a run directory for your nested-grid simulation.
$ cd /path/to/GCClassic/run # or whatever you named the source code directory
$ ./createRunDir.sh
Select the native resolution corresponding to your choice of meteorology. You will then be asked to specify which nested region you would like to use.
2. Configure your nested-grid simulation
Check the run-directory configuration files to make sure that you have the same chemistry, emissions, transport, etc. options selected as in the global simulation.
In HEMCO_Config.rc, make sure the GC_BCs option is set to true and update the BC_ entry to point to your boundary condition files.
# ExtNr ExtName on/off Species
0 Base : on *
# ----- RESTART FIELDS ----------------------
--> GC_RESTART : true
--> GC_BCs : true <== make sure this is true
--> HEMCO_RESTART : true
...
#==============================================================================
# --- GEOS-Chem boundary condition file ---
#==============================================================================
(((GC_BCs
* BC_ /path/to/your/GEOSChem.BoundaryConditions.$YYYY$MM$DD_$HH$MNz.nc4 SpeciesBC_?ADV? 1980-2023/1-12/1-31/0-23 RFY xyz 1 * - 1 1
)))GC_BCs
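To check that the path template in the BC_ entry resolves to files that actually exist, you can expand the date tokens yourself. The Python sketch below is a simplified stand-in for that substitution and is not HEMCO code; it handles only the five tokens that appear in the entry above ($YYYY, $MM, $DD, $HH, $MN), and the function name is hypothetical.

```python
from datetime import datetime


def expand_hemco_tokens(template, when):
    """Replace the date/time tokens in a HEMCO-style path template.

    Simplified illustration: only $YYYY, $MM, $DD, $HH and $MN are handled.
    """
    replacements = {
        "$YYYY": f"{when.year:04d}",
        "$MM": f"{when.month:02d}",
        "$DD": f"{when.day:02d}",
        "$HH": f"{when.hour:02d}",
        "$MN": f"{when.minute:02d}",
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


if __name__ == "__main__":
    template = "GEOSChem.BoundaryConditions.$YYYY$MM$DD_$HH$MNz.nc4"
    print(expand_hemco_tokens(template, datetime(2019, 7, 1, 0, 0)))
```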
Activate your preferred diagnostics by changing the relevant settings in these configuration files:
3. Copy the executable to the nested-grid run directory
You do not have to recompile GEOS-Chem Classic when changing grids. Therefore, you can copy the gcclassic executable from your global simulation run directory to your nested-grid run directory.
4. Run the nested-grid simulation
Follow the steps outlined in these sections to run your nested-grid simulation.
Download input data (e.g. do a dry-run and download data if necessary)
Frequently asked questions
Can I run nested GEOS-Chem simulations on the AWS cloud?
Yes, you can run the nested grid simulations on AWS cloud. Please see the Running GEOS-Chem on AWS cloud online tutorial and contact the GEOS-Chem Support Team with any questions.
Can I save out boundary conditions for more than one nested grid in the same global run?
We recommend that you generate boundary conditions over the entire global domain (at 4° x 5° or 2° x 2.5°). Then these boundary conditions can be used as input to simulations on different nested domains.
How can I find which data are available for nested grid simulations?
You will download meteorology and emissions data from one of the GEOS-Chem data portals. You can browse the WashU data portal (http://geoschemdata.wustl.edu/ExtData) to see if the data you need are available.
Where can I find out more info about nested grid errors?
Please see the following Supplemental Guides:
I noticed abnormal concentrations at boundaries of the nested region. Is that normal?
If you see high tracer concentrations right at the boundary of your nested grid region, then this may be normal.
For nested grid simulations, we have to leave a “buffer zone” (i.e. typically 3 boxes along each boundary) in which the TPCORE advection is not applied. However, all other operations (chemistry, wetdep, drydep, convection, PBL mixing) will be applied. Therefore, in the “buffer zone”, the concentrations will not be realistic because the advection is not allowed to transport the tracer out of these boxes.
In any case, the tracer concentrations in the “buffer zone” will get overwritten by the 2° x 2.5° or 4° x 5° boundary conditions at the specified time (usually every 3h).
Attention
You should exclude the boxes in the “buffer zone” from your scientific analysis.
The following diagram illustrates this:
<----------------------------NX global grid------------------------->
+-------------------------------------------------------------------+ ^
| GLOBAL REGION | |
| | |
| <----------NX nested grid---------> | |
| | |
| +=================================[Y] ^ | |
| | NESTED GRID WINDOW REGION | | | |
| | | | | |
| | <------- IM_W -------> | | | |
| | +--------------------+ ^ | | | |
| | | TPCORE REGION | | | | | |
| | | (advection is | | | NY | NY
|<------- I0 ---------->|<---->| done in this | JM_W | nested | global
| | I0_W | window!!!) | | | grid | grid
| | | | | | | | |
| | +--------------------+ V | | | |
| | ^ | | | |
| | | J0_W | | | |
| | V | | | |
| [X]=================================+ V | |
| ^ | |
| | J0 | |
| V | |
[1]------------------------------------------------------------------+ V
Diagram notes:
The outermost box (GLOBAL REGION) is the global grid size. This region has NX global grid boxes in longitude and NY global grid boxes in latitude. The origin of the GLOBAL REGION is at the south pole, at the lower left-hand corner (point [1]).
The next innermost box (NESTED GRID WINDOW REGION) is the nested-grid window. This region has NX nested grid boxes in longitude and NY nested grid boxes in latitude. This is the size of the trimmed met fields that will be used for a nested-grid simulation.
The innermost region (TPCORE REGION) is the actual area in which TPCORE advection will be performed. Note that this region is smaller than the NESTED GRID WINDOW REGION. It is set up this way since a cushion of grid boxes is needed for boundary conditions.
I0 is the longitude offset (# of boxes) and J0 is the latitude offset (# of boxes) which translate between the GLOBAL REGION and the NESTED GRID WINDOW REGION.
I0_W is the longitude offset (# of boxes) and J0_W is the latitude offset (# of boxes) which translate between the NESTED GRID WINDOW REGION and the TPCORE REGION. These define the thickness of the buffer zone mentioned above.
The lower left-hand corner of the NESTED GRID WINDOW REGION (point [X]) has longitude and latitude indices (I1_W, J1_W). Similarly, the upper right-hand corner (point [Y]) has longitude and latitude indices (I2_W, J2_W).
Note that setting I0=0, J0=0, I0_W=0, J0_W=0, NX nested grid = NX global grid, and NY nested grid = NY global grid specifies a global simulation. In this case the NESTED GRID WINDOW REGION totally coincides with the GLOBAL REGION.
In order for the nested-grid simulation to work we must save out concentrations over the NESTED GRID WINDOW REGION from a coarse model (e.g. 2° x 2.5° or 4° x 5°). These concentrations are copied along the edges of the NESTED GRID WINDOW REGION and are thus used as boundary conditions for TPCORE.
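When post-processing nested-grid output, the buffer zone can be removed with simple index arithmetic. The Python sketch below is a hedged illustration: the function names are hypothetical, the field is assumed to be a list of latitude rows (each row a list of longitude values), and the buffer is assumed to have the same thickness (I0_W boxes in longitude, J0_W boxes in latitude) on opposite sides of the window.

```python
def tpcore_slices(nx_nested, ny_nested, i0_w, j0_w):
    """Return (lon_slice, lat_slice) selecting the TPCORE REGION.

    Assumes a buffer of i0_w boxes on both longitude edges and j0_w boxes
    on both latitude edges of the nested-grid window.
    """
    if 2 * i0_w >= nx_nested or 2 * j0_w >= ny_nested:
        raise ValueError("buffer zone leaves no interior boxes")
    return slice(i0_w, nx_nested - i0_w), slice(j0_w, ny_nested - j0_w)


def trim_buffer(field, i0_w=3, j0_w=3):
    """Drop the buffer-zone boxes from a 2-D field indexed as field[lat][lon]."""
    ny, nx = len(field), len(field[0])
    lon_slice, lat_slice = tpcore_slices(nx, ny, i0_w, j0_w)
    return [row[lon_slice] for row in field[lat_slice]]


if __name__ == "__main__":
    # Small synthetic field: 7 latitude rows by 8 longitude columns.
    field = [[10 * j + i for i in range(8)] for j in range(7)]
    print(trim_buffer(field))
```

The default of 3 boxes follows the "typically 3 boxes along each boundary" statement above; check the settings of your own run before relying on it.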
Update chemical mechanisms with KPP
This Guide demonstrates how you can use the Kinetic PreProcessor (aka KPP) to translate a chemical mechanism specification in plain text format to highly-optimized Fortran90 code for use with GEOS-Chem.
Attention
You must use at least KPP 3.0.0 with the current GEOS-Chem release series.
Using KPP: Quick start
2. Edit the chemical mechanism configuration files
The KPP/custom folder contains sample chemical mechanism specification files (custom.eqn and custom.kpp). These files define the chemical mechanism and are copies of the default fullchem mechanism configuration files found in the KPP/fullchem folder. (For a complete description of KPP configuration files, please see the documentation at kpp.readthedocs.io.)
You can edit these custom.eqn and custom.kpp configuration files to define your own custom mechanism (cf. Using KPP: Reference section for details).
Important
We recommend always building a custom mechanism from the KPP/custom folder and leaving the other folders untouched. This will allow you to validate your modified mechanism against one of the standard mechanisms that ship with GEOS-Chem.
custom.eqn
The custom.eqn configuration file contains:
List of active species
List of inactive species
Gas-phase reactions
Heterogeneous reactions
Photolysis reactions
custom.kpp
The custom.kpp configuration file is the main configuration file. It contains:
Solver options
Production and loss family definitions
Functions to compute reaction rates
Global definitions
An #INCLUDE custom.eqn command, which tells KPP to look for chemical reaction definitions in custom.eqn.
Important
The symbolic link gckpp.kpp points to custom.kpp. This is necessary in order to generate Fortran files with the naming convention gckpp*.F90.
3. Run the build_mechanism.sh script
Once you are satisfied with your custom mechanism specification you may now use KPP to build the source code files for GEOS-Chem.
Return to the top-level KPP folder from KPP/custom:
$ cd ..
There you will find a script named build_mechanism.sh, which is the driver script for running KPP. Execute the script as follows:
$ ./build_mechanism.sh custom
This will run the KPP executable (located in the folder $KPP_HOME/bin) on the custom.kpp configuration file (via the symbolic link gckpp.kpp). It also runs a Python script to generate code for the OH reactivity diagnostic. You should see output similar to this:
This is KPP-X.Y.Z.
KPP is parsing the equation file.
KPP is computing Jacobian sparsity structure.
KPP is starting the code generation.
KPP is initializing the code generation.
KPP is generating the monitor data:
- gckpp_Monitor
KPP is generating the utility data:
- gckpp_Util
KPP is generating the global declarations:
- gckpp_Main
KPP is generating the ODE function:
- gckpp_Function
KPP is generating the ODE Jacobian:
- gckpp_Jacobian
- gckpp_JacobianSP
KPP is generating the linear algebra routines:
- gckpp_LinearAlgebra
KPP is generating the utility functions:
- gckpp_Util
KPP is generating the rate laws:
- gckpp_Rates
KPP is generating the parameters:
- gckpp_Parameters
KPP is generating the global data:
- gckpp_Global
KPP is generating the driver from none.f90:
- gckpp_Main
KPP is starting the code post-processing.
KPP has succesfully created the model "gckpp".
Reactivity consists of xxx reactions # NOTE: xxx will be replaced by the actual number
Written to gckpp_Util.F90
where X.Y.Z denotes the KPP version that you are using.
If this process is successful, the custom folder will have several new files starting with gckpp:
$ ls gckpp*
gckpp_Function.F90 gckpp_Jacobian.F90 gckpp.log gckpp_Precision.F90
gckpp_Global.F90 gckpp_JacobianSP.F90 gckpp_Model.F90 gckpp_Rates.F90
gckpp_Initialize.F90 gckpp.kpp@ gckpp_Monitor.F90 gckpp_Util.F90
gckpp_Integrator.F90 gckpp_LinearAlgebra.F90 gckpp_Parameters.F90
The gckpp*.F90 files contain optimized Fortran-90 instructions for solving the chemical mechanism that you have specified. The gckpp.log file is a human-readable description of the mechanism. Also, gckpp.kpp is a symbolic link to the custom.kpp file.
A complete description of these KPP-generated files can be found at kpp.readthedocs.io.
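If you run KPP from a script, you may want to verify that the build produced every file before recompiling GEOS-Chem. The following Python sketch compares a folder listing against the file names shown in the `ls gckpp*` output above. The helper names are hypothetical, and the expected list is simply copied from that listing, so it may differ for other KPP versions or settings.

```python
from pathlib import Path

# File names copied from the `ls gckpp*` listing above (gckpp.kpp is a
# pre-existing symbolic link, so it is not treated as generated output).
EXPECTED_FILES = (
    "gckpp_Function.F90", "gckpp_Global.F90", "gckpp_Initialize.F90",
    "gckpp_Integrator.F90", "gckpp_Jacobian.F90", "gckpp_JacobianSP.F90",
    "gckpp_LinearAlgebra.F90", "gckpp_Model.F90", "gckpp_Monitor.F90",
    "gckpp_Parameters.F90", "gckpp_Precision.F90", "gckpp_Rates.F90",
    "gckpp_Util.F90", "gckpp.log",
)


def missing_generated_files(present_names):
    """Return the expected KPP output files absent from present_names."""
    present = set(present_names)
    return sorted(name for name in EXPECTED_FILES if name not in present)


def check_folder(folder):
    """Apply the check to the contents of a folder such as KPP/custom."""
    return missing_generated_files(p.name for p in Path(folder).iterdir())


if __name__ == "__main__":
    print(check_folder("."))
```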
4. Recompile GEOS-Chem with your custom mechanism
By default, GEOS-Chem uses the mechanism named fullchem. To tell GEOS-Chem to use the custom mechanism instead, follow these steps.
Tip
GEOS-Chem Classic run directories have a subdirectory named build in which you can configure and build GEOS-Chem. If you don’t have a build directory, you can add one to your run directory with mkdir build.
From the build directory, type:
$ cmake ../CodeDir -DMECH=custom -DRUNDIR=..
You should see output similar to this written to the screen:
-- General settings:
* MECH: fullchem carbon Hg **custom**
This confirms that the custom mechanism has been selected.
Once you have configured GEOS-Chem to use the custom mechanism, you may build the executable. Type:
$ make -j
$ make -j install
The executable file (gcclassic or gchp, depending on which mode of GEOS-Chem you are using) will be placed in the run directory.
Using KPP: Reference section
Adding species to a mechanism
List chemically-active (aka variable) species in the #DEFVAR section of custom.eqn, as shown below:
#DEFVAR
A3O2 = IGNORE; {CH3CH2CH2OO; Primary RO2 from C3H8}
ACET = IGNORE; {CH3C(O)CH3; Acetone}
ACTA = IGNORE; {CH3C(O)OH; Acetic acid}
...etc ...
The IGNORE setting tells KPP not to perform mass-balance checks, which would make GEOS-Chem execute more slowly.
List species whose concentrations do not change in the #DEFFIX section of custom.eqn, as shown below:
#DEFFIX
H2 = IGNORE; {H2; Molecular hydrogen}
N2 = IGNORE; {N2; Molecular nitrogen}
O2 = IGNORE; {O2; Molecular oxygen}
... etc ...
Species may be listed in any order, but we have found it convenient to list them alphabetically.
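If you maintain a long species list, a small script can confirm that it is still in alphabetical order after you add new entries. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that each species is declared on its own line in the `NAME = IGNORE; {comment}` form shown above.

```python
def parse_species(section_text):
    """Extract species names from the body of a #DEFVAR or #DEFFIX section."""
    names = []
    for line in section_text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers, and placeholder lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.append(line.split("=", 1)[0].strip())
    return names


def first_unsorted(names):
    """Return the first name that breaks alphabetical order, or None."""
    for previous, current in zip(names, names[1:]):
        if current.upper() < previous.upper():
            return current
    return None


if __name__ == "__main__":
    sample = """#DEFVAR
A3O2 = IGNORE; {CH3CH2CH2OO; Primary RO2 from C3H8}
ACET = IGNORE; {CH3C(O)CH3; Acetone}
ACTA = IGNORE; {CH3C(O)OH; Acetic acid}
"""
    print(first_unsorted(parse_species(sample)))
```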
Adding reactions to a mechanism
Gas-phase reactions
List gas-phase reactions first in the #EQUATIONS section of custom.eqn.
#EQUATIONS
//
// Gas-phase reactions
//
...skipping over the comment header...
//
O3 + NO = NO2 + O2 : GCARR_ac(3.00E-12, -1500.0);
O3 + OH = HO2 + O2 : GCARR_ac(1.70E-12, -940.0);
O3 + HO2 = OH + O2 + O2 : GCARR_ac(1.00E-14, -490.0);
O3 + NO2 = O2 + NO3 : GCARR_ac(1.20E-13, -2450.0);
... etc ...
Gas-phase reactions: General form
No matter what reaction is being added, the general procedure is the same. A new line must be added to custom.eqn of the following form:
A + B = C + 2.000D : RATE_LAW_FUNCTION(ARG_A, ARG_B ...);
This line denotes the reactants (\(A\) and \(B\)) as well as the products (\(C\) and \(D\)) of the reaction. If exactly one molecule is consumed or produced, then the factor can be omitted; otherwise the number of molecules consumed or produced should be specified with at least 1 decimal place of accuracy. The final section, between the colon and semi-colon, specifies the function RATE_LAW_FUNCTION and its arguments which will be used to calculate the reaction rate constant k. Rate-law functions are specified in the custom.kpp file.
For an equation such as the one above, the overall rate at which the reaction will proceed is determined by \(k[A][B]\). However, if the rate constant \(k\) is a fixed value that does not need to be computed by a rate-law function, you may write it as a constant, such as:
A + B = C + 2.000D : 8.95d-17;
This will save the overhead of a function call.
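To make the general form concrete, the Python sketch below splits one reaction line into reactants, products, and the rate expression. It is not part of KPP and the function names are hypothetical. It assumes that species names begin with a letter, that an optional numeric coefficient is written directly in front of the species name (as in 2.000D), and that the rate expression itself contains no semicolon.

```python
import re

# Optional coefficient (e.g. 2.000) followed by a species name.
_TERM = re.compile(r"^(\d+(?:\.\d+)?)?\s*([A-Za-z]\w*)$")


def parse_side(side):
    """Parse 'C + 2.000D' into [(1.0, 'C'), (2.0, 'D')]."""
    terms = []
    for term in side.split("+"):
        match = _TERM.match(term.strip())
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        coeff = float(match.group(1)) if match.group(1) else 1.0
        terms.append((coeff, match.group(2)))
    return terms


def parse_reaction(line):
    """Return (reactants, products, rate_expression) for one reaction line."""
    equation, rate = line.split(":", 1)
    lhs, rhs = equation.split("=", 1)
    rate = rate.split(";", 1)[0].strip()  # drop ';' and any trailing comment
    return parse_side(lhs), parse_side(rhs), rate


if __name__ == "__main__":
    print(parse_reaction("A + B = C + 2.000D : RATE_LAW_FUNCTION(ARG_A, ARG_B);"))
```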
Rates for two-body reactions according to the Arrhenius law
For many reactions, the calculation of k follows the Arrhenius law:
k = a0 * ( 300 / TEMP )**b0 * EXP( c0 / TEMP )
Important
In relation to Arrhenius parameters that you may find in scientific literature, \(a_0\) represents the \(A\) term and \(c_0\) represents \(-E/R\) (not \(E/R\), which is usually listed).
For example, the JPL chemical data evaluation (Feb 2017) specifies that the reaction O3 + NO produces NO2 and O2, and that its Arrhenius parameters are \(A\) = 3.0x10^-12 and \(E/R\) = 1500. To use the Arrhenius formulation above, we must specify \(a_0 = 3.0e-12\) and \(c_0 = -1500\).
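The sign convention is easy to get wrong, so here is a short Python sketch of the formula above. It is a plain transcription for checking numbers by hand, not the Fortran code used by GEOS-Chem; the function names are hypothetical and the temperature of 298 K is an arbitrary example value.

```python
import math


def arrhenius(a0, b0, c0, temp):
    """Evaluate k = a0 * (300 / temp)**b0 * exp(c0 / temp)."""
    return a0 * (300.0 / temp) ** b0 * math.exp(c0 / temp)


def c0_from_literature(e_over_r):
    """Convert a tabulated E/R value to the c0 parameter (c0 = -E/R)."""
    return -e_over_r


if __name__ == "__main__":
    # O3 + NO: A = 3.0e-12, E/R = 1500, evaluated at an example 298 K.
    k = arrhenius(3.0e-12, 0.0, c0_from_literature(1500.0), 298.0)
    print(f"k(298 K) = {k:.3e}")
```

Passing +1500 instead of -1500 would make the rate constant grow as temperature falls, which is a quick way to spot a sign error.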
To specify a two-body reaction whose rate follows the Arrhenius law, you can use the GCARR rate-law function, which is defined in gckpp.kpp. For example, the entry for the \(O3 + NO = NO2 + O2\) reaction can be written in custom.eqn as:
O3 + NO = NO2 + O2 : GCARR(3.00E-12, 0.0, -1500.0);
Other rate-law functions
The gckpp.kpp file contains other rate law functions, such as those required for three-body, pressure-dependent reactions. Any rate function which is to be referenced in the custom.eqn file must be available in gckpp.kpp prior to building the reaction mechanism.
Making your rate law functions computationally efficient
We recommend writing your rate-law functions so as to avoid explicitly casting variables from REAL*4 to REAL*8. Code that looks like this:
REAL, INTENT(IN) :: A0, B0, C0
rate = DBLE(A0) * ( 300.0 / TEMP )**DBLE(B0) * EXP( DBLE(C0)/ TEMP )
can be rewritten as:
REAL(kind=dp), INTENT(IN) :: A0, B0, C0
rate = A0 * ( 300.0d0 / TEMP )**B0 * EXP( C0/ TEMP )
Not only do casts lead to a loss of precision, but each cast takes a few CPU clock cycles to execute. Because these rate-law functions are called for each cell in the chemistry grid, wasted clock cycles can accumulate into a noticeable slowdown in execution.
You can also make your rate-law functions more efficient if you rewrite them to avoid computing terms that evaluate to 1. We saw above (cf. Rates for two-body reactions according to the Arrhenius law) that the rate of the reaction \(O3 + NO = NO2 + O2\) can be computed according to the Arrhenius law. But because b0 = 0, the term (300/TEMP)**b0 evaluates to 1. We can therefore rewrite the computation of the reaction rate as:
k = 3.0x10^-12 * EXP( -1500 / TEMP )
Tip
The EXP() and ** mathematical operations are among the most costly in terms of CPU clock cycles. Avoid calling them whenever possible.
A recommended implementation would be to create separate rate-law functions that take different arguments depending on which parameters are nonzero. For example, the Arrhenius law function GCARR can be split into multiple functions:
GCARR_abc(a0, b0, c0): Use when a0, b0, and c0 are all nonzero
GCARR_ab(a0, b0): Use when a0 and b0 are nonzero (c0 = 0)
GCARR_ac(a0, c0): Use when a0 and c0 are nonzero (b0 = 0)
Thus we can write the O3 + NO reaction in custom.eqn as:
O3 + NO = NO2 + O2 : GCARR_ac(3.00d-12, -1500.0d0);
using the rate law function for when both a0 and c0 are nonzero.
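The idea behind the specialized functions can be sketched in Python as follows. These are stand-ins written for illustration, not the Fortran rate-law functions themselves: the names mirror the ones above, but here the temperature is passed as an explicit argument, which is an assumption made to keep the example self-contained.

```python
import math


def gcarr_abc(a0, b0, c0, temp):
    """Full Arrhenius form: one power and one exponential per call."""
    return a0 * (300.0 / temp) ** b0 * math.exp(c0 / temp)


def gcarr_ab(a0, b0, temp):
    """Use when c0 = 0: the exponential term equals 1 and is skipped."""
    return a0 * (300.0 / temp) ** b0


def gcarr_ac(a0, c0, temp):
    """Use when b0 = 0: the power term equals 1 and is skipped."""
    return a0 * math.exp(c0 / temp)


if __name__ == "__main__":
    temp = 250.0  # example temperature in K
    print(gcarr_ac(3.0e-12, -1500.0, temp))
    print(gcarr_abc(3.0e-12, 0.0, -1500.0, temp))
```

Both calls in the usage example should give the same rate constant; the specialized version simply avoids one `**` operation per call.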
Heterogeneous reactions
List heterogeneous reactions after all of the gas-phase reactions in custom.eqn, according to the format below:
//
// Heterogeneous reactions
//
HO2 = O2 : HO2uptk1stOrd( State_Het ); {2013/03/22; Paulot2009; FP,EAM,JMAO,MJE}
NO2 = 0.500HNO3 + 0.500HNO2 : NO2uptk1stOrdAndCloud( State_Het );
NO3 = HNO3 : NO3uptk1stOrdAndCloud( State_Het );
NO3 = NIT : NO3hypsisClonSALA( State_Het ); {2018/03/16; XW}
... etc ...
A simple example is uptake of HO2, specified as
HO2 = H2O : HO2uptk1stOrd( State_Het );
Note
KPP requires that each reaction have at least one product. In order to satisfy this requirement, you might need to set the product of your heterogeneous reaction to a dummy product or a fixed species (i.e. one whose concentration does not change with time).
The rate law function HO2uptk1stOrd is contained in the Fortran module KPP/fullchem/fullchem_RateLawFuncs.F90, which is symbolically linked to the custom folder. The fullchem_RateLawFuncs.F90 file is inlined into gckpp_Rates.F90 so that it can be used within the custom mechanism.
To implement an additional heterogeneous reaction, the rate calculation must be added to the KPP/custom/custom.eqn file. Rate calculations may be specified as mathematical expressions (using any of the variables contained in gckpp_Global.F90):
SPC1 + SPC2 = SPC3 + SPC4: 8.0e-13 * TEMP_OVER_K300; {Example}
or you may define a new rate law function in fullchem_RateLawFuncs.F90, such as:
SPC1 + SPC2 = SPC3 + SPC4: myNewRateFunction( State_Het ); {Example}
Photolysis reactions
List photolysis reactions after the heterogeneous reactions, as shown below.
//
// Photolysis reactions
//
O3 + hv = O + O2 : PHOTOL(2); {2014/02/03; Eastham2014; SDE}
O3 + hv = O1D + O2 : PHOTOL(3); {2014/02/03; Eastham2014; SDE}
O2 + hv = 2.000O : PHOTOL(1); {2014/02/03; Eastham2014; SDE}
... etc ...
NO3 + hv = NO2 + O : PHOTOL(12); {2014/02/03; Eastham2014; SDE}
... etc ...
A photolysis reaction can be specified by giving the correct index of the PHOTOL array. This index can be determined by inspecting the file FJX_j2j.dat.
Tip
See the photolysis section of geoschem_config.yml to determine the folder in which FJX_j2j.dat is located.
For example, one branch of the \(NO_3\) photolysis reaction is specified in the custom.eqn file as
NO3 + hv = NO2 + O : PHOTOL(12)
Referring back to FJX_j2j.dat shows that reaction 12, as specified by the left-most index, is indeed \(NO_3 = NO2 + O\):
12 NO3 PHOTON NO2 O 0.886 /NO3 /
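Looking up the index by eye works for a single reaction, but a short script can help when you are checking many entries. The Python sketch below parses a line like the one above. It assumes whitespace-separated fields in the order index, reactant, PHOTON, products, numeric factor, and a trailing label between slashes; the real file may use fixed-width columns and contains header lines that must be skipped, so treat this as an illustration only. The function and key names are hypothetical.

```python
def parse_j2j_line(line):
    """Split one reaction line of FJX_j2j.dat into its parts.

    Assumed layout: index, reactant, 'PHOTON', products..., factor, /label/.
    """
    body, _, label = line.partition("/")
    fields = body.split()
    return {
        "index": int(fields[0]),
        "reactant": fields[1],
        "products": fields[3:-1],
        "factor": float(fields[-1]),
        "label": label.strip(" /"),
    }


def find_photol_index(lines, reactant, products):
    """Return the index of the first line matching reactant and products."""
    for line in lines:
        entry = parse_j2j_line(line)
        if entry["reactant"] == reactant and entry["products"] == list(products):
            return entry["index"]
    return None


if __name__ == "__main__":
    sample = ["12 NO3 PHOTON NO2 O 0.886 /NO3 /"]
    print(find_photol_index(sample, "NO3", ["NO2", "O"]))
```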
If your reaction is not already in FJX_j2j.dat, you may add it there. You may also need to modify FJX_spec.dat (in the same folder as FJX_j2j.dat) to include cross-sections for your species. Note that if you add new reactions to FJX_j2j.dat you will also need to set the parameter JVN_ in GEOS-Chem module Headers/CMN_FJX_MOD.F90 to match the total number of entries.
If your reaction involves new cross section data, you will need to follow an additional set of steps. Specifically, you will need to:
Estimate the cross section of each wavelength bin (using the correlated-k method), and
Add this data to the FJX_spec.dat file.
For the first step, you can use tools already available on the Prather research group website. To generate the cross-sections used by Fast-JX, download the file UCI_fastJ_addX_73cx.tar.gz. You can then simply add your data to FJX_spec.dat and refer to it in FJX_j2j.dat as specified above. The following describes how to generate a new set of cross-section data for the example of some new species MEKR:
To generate the photolysis cross sections of a new species, come up with a unique name that you will use to refer to it in the FJX_j2j.dat and FJX_spec.dat files (e.g. MEKR). You will need to copy one of the addX_*.f routines and make your own (say, addX_MEKR.f). Your edited version will need to read in whatever cross section data you have available, and you will need to decide how to handle out-of-range information. This is particularly crucial if your cross section data is not defined in the visible wavelengths, as problems have been caused in the past by implicitly assuming that the cross section (XS) can be extrapolated. As a conservative first guess, we recommend buffering your data with zero values at the exact limits of your data. Then compile that routine as a standalone code and run it; this will produce a file fragment containing the aggregated 18-bin cross sections, based on a combination of your measured or calculated XS data and the non-contiguous bin subranges used by Fast-JX. Once that data has been generated, add it to FJX_spec.dat and refer to it as above. There are examples in the addX files of how to deal with variations of cross section with temperature or pressure, but the main takeaway is that you will generate multiple cross section entries to be added to FJX_spec.dat with the same name.
Important
If your cross section data varies as a function of temperature AND pressure, you need to do something a little different. The acetone XS documentation shows one possible way to handle this; Fast-JX currently interpolates over either T or P, but not both, so if your data varies over both simultaneously then this will take some thought. The general idea seems to be that one determines which dependence is more important and uses that to generate a set of 3 cross sections (for interpolation), assuming values for the unused variable based on the standard atmosphere.
Adding production and loss families to a mechanism
Certain common families (e.g. \(PO_x\), \(LO_x\)) have been pre-defined for you. You will find the family definitions near the top of the custom.kpp file (which is symbolically linked to gckpp.kpp):
#FAMILIES
POx : O3 + NO2 + 2NO3 + PAN + PPN + MPAN + HNO4 + 3N2O5 + HNO3 + BrO + HOBr + BrNO2 + 2BrNO3 + MPN + ETHLN + MVKN + MCRHN + MCRHNB + PROPNN + R4N2 + PRN1 + PRPN + R4N1 + HONIT + MONITS + MONITU + OLND + OLNN + IHN1 + IHN2 + IHN3 + IHN4 + INPB + INPD + ICN + 2IDN + ITCN + ITHN + ISOPNOO1 + ISOPNOO2 + INO2B + INO2D + INA + IDHNBOO + IDHNDOO1 + IDHNDOO2 + IHPNBOO + IHPNDOO + ICNOO + 2IDNOO + MACRNO2 + ClO + HOCl + ClNO2 + 2ClNO3 + 2Cl2O2 + 2OClO + O + O1D + IO + HOI + IONO + 2IONO2 + 2OIO + 2I2O2 + 3I2O3 + 4I2O4;
LOx : O3 + NO2 + 2NO3 + PAN + PPN + MPAN + HNO4 + 3N2O5 + HNO3 + BrO + HOBr + BrNO2 + 2BrNO3 + MPN + ETHLN + MVKN + MCRHN + MCRHNB + PROPNN + R4N2 + PRN1 + PRPN + R4N1 + HONIT + MONITS + MONITU + OLND + OLNN + IHN1 + IHN2 + IHN3 + IHN4 + INPB + INPD + ICN + 2IDN + ITCN + ITHN + ISOPNOO1 + ISOPNOO2 + INO2B + INO2D + INA + IDHNBOO + IDHNDOO1 + IDHNDOO2 + IHPNBOO + IHPNDOO + ICNOO + 2IDNOO + MACRNO2 + ClO + HOCl + ClNO2 + 2ClNO3 + 2Cl2O2 + 2OClO + O + O1D + IO + HOI + IONO + 2IONO2 + 2OIO + 2I2O2 + 3I2O3 + 4I2O4;
PCO : CO;
LCO : CO;
PSO4 : SO4;
LCH4 : CH4;
PH2O2 : H2O2;
Note
The \(PO_x\), \(LO_x\), \(PCO\), and \(LCO\) families are used for computing budgets in the GEOS-Chem benchmark simulations. \(PSO4\) is required for simulations using TOMAS aerosol microphysics.
To add a new prod/loss family, add a new line to the #FAMILIES section with the format
FAM_NAME : MEMBER_1 + MEMBER_2 + ... + MEMBER_N;
The family name must start with P or L to indicate whether KPP should calculate a production or a loss rate. You will also need to make a corresponding update to the GEOS-Chem species database (species_database.yml) in order to define the FullName, Is_Gas, and MW_g attributes. For example, the entries for family species LCO and PCO are:
LCO:
  FullName: Dummy species to track loss rate of CO
  Is_Gas: true
  MW_g: 28.01
PCO:
  FullName: Dummy species to track production rate of CO
  Is_Gas: true
  MW_g: 28.01
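The two pieces of text that a new family needs (the #FAMILIES line and the species database entry) can be generated together, which helps keep them consistent. The Python sketch below is a hedged illustration: the function names are hypothetical, and the two-space indentation of the YAML entry is an assumption about the layout of species_database.yml.

```python
def family_line(name, members):
    """Build a '#FAMILIES' entry such as 'PCO : CO;'."""
    if not name or name[0] not in ("P", "L"):
        raise ValueError("family name must start with 'P' or 'L'")
    if not members:
        raise ValueError("a family needs at least one member")
    return f"{name} : {' + '.join(members)};"


def species_db_entry(name, full_name, mw_g, is_gas=True):
    """Build the matching species database entry as YAML text."""
    return "\n".join([
        f"{name}:",
        f"  FullName: {full_name}",
        f"  Is_Gas: {'true' if is_gas else 'false'}",
        f"  MW_g: {mw_g}",
    ])


if __name__ == "__main__":
    print(family_line("PCO", ["CO"]))
    print(species_db_entry("PCO", "Dummy species to track production rate of CO", 28.01))
```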
The maximum number of families allowed by KPP is currently set to 300. Depending on how many prod/loss families you add, you may need to increase that to a larger number to avoid errors in KPP. You can change the number for MAX_FAMILIES in KPP/kpp-code/src/gdata.h and then rebuild the KPP executable.
// - Many limits can be changed here by adjusting the MAX_* constants
// - To increase the max size of inlined code (F90_GLOBAL etc.),
// change MAX_INLINE in scan.h.
//
// NOTES:
// ------
// (1) Note: MAX_EQN or MAX_SPECIES over 1023 causes a seg fault in CI build
// -- Lucas Estrada, 10/13/2021
//
// (2) MacOS has a hard limit of 65332 bytes for stack memory. To make
// sure that you are using this max amount of stack memory, add
// "ulimit -s 65532" in your .bashrc or .bash_aliases script. We must
// also set smaller limits for MAX_EQN and MAX_SPECIES here so that we
// do not exceed the avaialble stack memory (which will result in the
// infamous "Segmentation fault 11" error). If you are stll having
// problems on MacOS then consider reducing MAX_EQN and MAX_SPECIES
// to smaller values than are listed below.
// -- Bob Yantosca (03 May 2022)
#ifdef MACOS
#define MAX_EQN 2000 // Max number of equations (MacOS only)
#define MAX_SPECIES 1000 // Max number of species (MacOS only)
#else
#define MAX_EQN 11000 // Max number of equations
#define MAX_SPECIES 6000 // Max number of species
#endif
#define MAX_SPNAME 30 // Max char length of species name
#define MAX_IVAL 40 // Max char length of species ID ?
#define MAX_EQNTAG 32 // Max length of equation ID in eqn file
#define MAX_K 1000 // Max length of rate expression in eqn file
#define MAX_ATOMS 10 // Max number of atoms
#define MAX_ATNAME 10 // Max char length of atom name
#define MAX_ATNR 250 // Max number of atom tables
#define MAX_PATH 300 // Max char length of directory paths
#define MAX_FILES 20 // Max number of files to open
#define MAX_FAMILIES 300 // Max number of family definitions
#define MAX_MEMBERS 150 // Max number of family members
#define MAX_EQNLEN 300 // Max char length of equations
#define MAX_EQNLEN 200
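To see how close a mechanism is to this limit, you can count the family definitions and compare the result with the value in gdata.h. The Python sketch below does this on text that you have already read from the two files. The function names are hypothetical, and the counting logic assumes that each family is defined on a single line of the form `NAME : members;` within the #FAMILIES section.

```python
import re


def read_max_constant(header_text, name):
    """Return the integer value of '#define <name> <value>', or None."""
    match = re.search(
        r"^#define\s+" + re.escape(name) + r"\s+(\d+)", header_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def count_families(kpp_text):
    """Count one-line family definitions inside the #FAMILIES section."""
    count = 0
    in_section = False
    for line in kpp_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any other '#' directive ends the #FAMILIES section.
            in_section = stripped.upper().startswith("#FAMILIES")
            continue
        if in_section and ":" in stripped and stripped.endswith(";"):
            count += 1
    return count


if __name__ == "__main__":
    header = "#define MAX_FAMILIES 300 // Max number of family definitions\n"
    kpp = "#FAMILIES\nPCO : CO;\nLCO : CO;\n#INLINE F90_RATES\n"
    print(count_families(kpp), "of", read_max_constant(header, "MAX_FAMILIES"))
```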
Important
When adding a prod/loss family or changing any of the other settings in gckpp.kpp, you must re-run KPP to produce new Fortran90 files for GEOS-Chem.
Production and loss families are archived via the HISTORY diagnostics. For more information, please see the Guide to GEOS-Chem History diagnostics on the GEOS-Chem wiki.
Changing the numerical integrator
Several global options for KPP are listed at the top of the gckpp.kpp file:
#MINVERSION 3.0.0 { Need this version of KPP or later }
#INTEGRATOR rosenbrock_autoreduce { Use Rosenbrock integration method }
#AUTOREDUCE on { ... with autoreduce enabled but optional }
#LANGUAGE Fortran90 { Generate solver code in Fortran90 ... }
#UPPERCASEF90 on { ... with .F90 suffix (instead of .f90) }
#DRIVER none { Do not create gckpp_Main.F90 }
#HESSIAN off { Do not create the Hessian matrix }
#MEX off { MEX is for Matlab, so skip it }
#STOICMAT off { Do not create stoichiometric matrix }
The #INTEGRATOR tag specifies the choice of numerical integrator that you wish to use with your chemical mechanism. The table below lists the #INTEGRATOR and #AUTOREDUCE settings for each simulation.
Simulation | #INTEGRATOR | #AUTOREDUCE |
---|---|---|
carbon | | |
custom | | |
fullchem | | |
Hg | | |
Attention
The auto-reduction option is built into the GEOS-Chem carbon and fullchem mechanisms but is disabled by default. To use it, you must activate the auto-reduction option in geoschem_config.yml.
If you wish to use a different integrator for research purposes, you may select from several more options.
The #LANGUAGE should be set to Fortran90 and #UPPERCASEF90 should be set to on.
The #MINVERSION should be set to 3.0.0. This is the minimum KPP version you should be using with GEOS-Chem.
The other options should be left as they are, as they are not relevant to GEOS-Chem.
For more information about KPP settings, please see https://kpp.readthedocs.io.
GEOS-Chem version history
For a list of updates by GEOS-Chem version, please see:
Known bugs and issues
Please see our Issue tracker on GitHub for a list of recent bugs and fixes.
Current bug reports
These bug reports (listed at GitHub) are currently unresolved. We hope to fix these in future releases.
Contributing Guidelines
Thank you for looking into contributing to GEOS-Chem! GEOS-Chem is a grass-roots model that relies on contributions from community members like you. Whether you’re new to GEOS-Chem or a longtime user, you’re a valued member of the community, and we want you to feel empowered to contribute.
Updates to the GEOS-Chem model benefit both you and the entire GEOS-Chem community. You benefit through coauthorship and citations. Priority development needs are identified at GEOS-Chem users’ meetings with updates between meetings based on GEOS-Chem Steering Committee (GCSC) input through Working Groups.
We use GitHub and ReadTheDocs
We use GitHub to host the GEOS-Chem Classic source code, to track issues, user questions, and feature requests, and to accept pull requests: https://github.com/geoschem/geos-chem. Please help out as you can in response to issues and user questions.
GEOS-Chem Classic documentation can be found at geos-chem.readthedocs.io.
When should I submit updates?
Submit bug fixes right away, as these will be given the highest priority. Please see “Support Guidelines” for more information.
Submit updates (code and/or data) for mature model developments once you have submitted a paper on the topic. Your Working Group chair can offer guidance on the timing of submitting code for inclusion into GEOS-Chem.
The practical aspects of submitting code updates are listed below.
How can I submit updates?
We use GitHub Flow, so all changes happen through pull requests. This workflow is described here.
As the author you are responsible for:
Testing your changes
Updating the user documentation (if applicable)
Supporting issues and questions related to your changes
Process for submitting code updates
Contact your GEOS-Chem Working Group leaders to request that your updates be added to GEOS-Chem. They will forward your request to the GCSC.
The GCSC meets quarterly to set GEOS-Chem model development priorities. Your update will be slated for inclusion into an upcoming GEOS-Chem version.
Create or log into your GitHub account.
Fork the relevant GEOS-Chem repositories into your GitHub account.
Clone your forks of the GEOS-Chem repositories to your computer system.
Add your modifications into a new branch off the main branch.
Test your update thoroughly and make sure that it works. For structural updates we recommend performing a difference test (i.e. testing against the prior version) to ensure that identical results are obtained.
Review the coding conventions and checklists for code and data updates listed below.
Create a pull request in GitHub.
The GEOS-Chem Support Team (GCST) will add your updates into the development branch for an upcoming GEOS-Chem version. They will also validate your updates with benchmark simulations.
If the benchmark simulations reveal a problem with your update, the GCST will request that you take further corrective action.
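The difference test mentioned in the steps above can be sketched as follows, under stated assumptions. The function name difference_test, the directory arguments, and the "*.nc4" file pattern are hypothetical choices for illustration. The sketch compares output files byte for byte, which is stricter than comparing the data values: two files holding identical data can still differ if they store metadata such as a creation timestamp.

```python
import filecmp
from pathlib import Path


def difference_test(ref_dir, dev_dir, pattern="*.nc4"):
    """Compare output files from a reference run and a development run.

    ref_dir -- output directory from the prior (reference) version
    dev_dir -- output directory from the version containing your update
    pattern -- glob pattern for the files to compare (an assumption)
    """
    ref_dir, dev_dir = Path(ref_dir), Path(dev_dir)
    report = {"identical": [], "different": [], "missing": []}
    for ref_file in sorted(ref_dir.rglob(pattern)):
        rel = ref_file.relative_to(ref_dir)
        dev_file = dev_dir / rel
        if not dev_file.is_file():
            report["missing"].append(rel.as_posix())
        elif filecmp.cmp(ref_file, dev_file, shallow=False):
            # shallow=False forces a comparison of the file contents
            report["identical"].append(rel.as_posix())
        else:
            report["different"].append(rel.as_posix())
    return report
```

A call such as difference_test("ref/OutputDir", "dev/OutputDir") (both paths hypothetical) returns the three lists. Files reported as different deserve a closer look with a tool that compares the data variables rather than the raw bytes.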
Coding conventions
The GEOS-Chem codebase dates back several decades and includes contributions from many people and multiple organizations. Therefore, some inconsistent conventions are inevitable, but we ask that you do your best to be consistent with nearby code.
Checklist for submitting code updates
Use Fortran-90 free format instead of Fortran-77 fixed format.
Include thorough comments in all submitted code.
Include full citations for references at the top of relevant source code modules.
Remove extraneous code updates (e.g. testing options, other science).
Submit any related code or configuration files for GCHP along with code or configuration files for GEOS-Chem Classic.
Checklist for submitting data files
Choose a final file naming convention before submitting data files for inclusion to GEOS-Chem.
Make sure that all netCDF files adhere to the COARDS conventions.
Concatenate netCDF files to reduce the number of files that need to be opened. This results in more efficient I/O operations.
Chunk and deflate netCDF files in order to improve file I/O.
Include an updated HEMCO configuration file corresponding to the new data.
Include a README file detailing data source, contents, etc.
Include script(s) used to process original data.
Include a summary or description of the expected results (e.g. emission totals for each species).
Also follow these additional steps to ensure that your data can be read by GCHP:
All netCDF data variables should be of type float (aka REAL*4) or double (aka REAL*8).
Use a recent reference datetime (i.e. after 1900-01-01) for the netCDF time:units attribute.
The first time value in each file should be 0, corresponding with the reference datetime.
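The two time-axis guidelines above can be checked with a few lines of Python. This is a minimal sketch that works on plain values: it assumes you have already read the time:units string and the first time value from the file with whatever netCDF tool you use. The function name check_time_axis is hypothetical, and the sketch assumes the reference datetime starts with a zero-padded YYYY-MM-DD date.

```python
from datetime import datetime

# Per the guideline above, the reference datetime should fall after this date
EARLIEST_REFERENCE = datetime(1900, 1, 1)


def check_time_axis(units, first_value):
    """Return a list of problems with a netCDF time axis (empty if none).

    units       -- the time:units string, e.g. "hours since 1985-01-01 00:00:00"
    first_value -- the first value of the time variable in the file
    """
    _, sep, reference = units.partition(" since ")
    if not sep:
        return [f"cannot parse time units {units!r}"]
    try:
        # Only the date part matters here: the first 10 characters
        ref_date = datetime.strptime(reference.strip()[:10], "%Y-%m-%d")
    except ValueError:
        return [f"cannot parse reference datetime in {units!r}"]

    problems = []
    if ref_date <= EARLIEST_REFERENCE:
        problems.append(f"reference date {ref_date:%Y-%m-%d} is not after 1900-01-01")
    if first_value != 0:
        problems.append(f"first time value is {first_value}, expected 0")
    return problems


print(check_time_axis("hours since 1985-01-01 00:00:00", 0))
```

An empty list means that both guidelines are met. The data type guideline (float or double) is not covered by this sketch.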
How can I request a new feature?
We accept feature requests through issues on GitHub. To request a new feature, open a new issue and select the feature request template. Please include all the information that might be relevant, including the motivation for the feature.
How can I report a bug?
Please see Support Guidelines.
Where can I ask for help?
Please see Support Guidelines.
Support Guidelines
GEOS-Chem support is maintained by the GEOS-Chem Support Team (GCST), which is based jointly at Harvard University and Washington University in St. Louis.
We track bugs, user questions, and feature requests through GitHub issues. Please help out as you can in response to issues and user questions.
How to report a bug
We use GitHub to track issues. To report a bug, open a new issue. Please include your name, institution, and all relevant information, such as simulation log files and instructions for replicating the bug.
Where can I ask for help?
We use GitHub issues to support user questions. To ask a question, open a new issue and select the question template. Please include your name and institution in the issue.
What type of support can I expect?
We will be happy to assist you in resolving bugs and technical issues that arise when compiling or running GEOS-Chem. User support and outreach is an important part of our mission to support the International GEOS-Chem User Community.
Even though we can assist in several ways, we cannot possibly do everything. We rely on GEOS-Chem users being resourceful and willing to try to resolve problems on their own to the greatest extent possible.
If you have a science question rather than a technical question, you should contact the relevant GEOS-Chem Working Group(s) directly. If you do not know whom to ask, you may open a new issue (see “Where can I ask for help?” above) and we will be happy to direct your question to the appropriate person(s).
How to submit changes
Please see Contributing Guidelines.
How to request an enhancement
Please see Contributing Guidelines.
Editing this User Guide
This user guide is generated with Sphinx. Sphinx is an open-source Python project designed to make writing software documentation easier. The documentation is written in reStructuredText (it’s similar to markdown), which Sphinx extends for software documentation. The source for the documentation is the docs/source directory in the top level of the source code.
Quick start
To build this user guide on your local machine, you need to install Sphinx and its dependencies. Sphinx is a Python 3 package and is available via pip. This user guide uses the Read The Docs theme, so you will also need to install sphinx-rtd-theme. It also uses the sphinxcontrib-bibtex and recommonmark extensions, which you’ll need to install.
$ cd docs
$ pip install -r requirements.txt
To build this user guide locally, navigate to the docs/ directory and make the html target.
$ make html
This will build the user guide in docs/build/html, and you can open index.html in your web browser. The source files for the user guide are found in docs/source.
Note
You can clean the documentation with make clean.
Learning reST
Writing reST can be tricky at first. Whitespace matters, and some directives can be easily miswritten. Two important things you should know right away are:
Indents are 3 spaces
“Things” are separated by 1 blank line. For example, a list or code-block following a paragraph should be separated from the paragraph by 1 blank line.
You should keep these in mind when you’re first getting started. Dedicating an hour to learning reST will save you time in the long run. Below are some good resources for learning reST.
reStructuredText primer: (single best resource; however, it’s better read than skimmed)
Official reStructuredText reference (there is a lot of information here)
Presentation by Eric Holscher (co-founder of Read The Docs) at DjangoCon US 2015 (the entire presentation is good, but reST is described from 9:03 to 21:04)
A good starting point would be Eric Holscher’s presentation followed by the reStructuredText primer.
Style guidelines
Important
This user guide is written in semantic markup. This is important so that the user guide remains maintainable. Before contributing to this documentation, please review our style guidelines (below). When editing the source, please refrain from using elements with the wrong semantic meaning for aesthetic reasons. Aesthetic issues can be addressed by changes to the theme.
For titles and headers:
Section headers should be underlined by # characters
Subsection headers should be underlined by - characters
Subsubsection headers should be underlined by ^ characters
Subsubsubsection headers should be avoided, but if necessary, they should be underlined by " characters
File paths (including directories) occurring in the text should use the :file: role.
Program names (e.g. cmake) occurring in the text should use the :program: role.
OS-level commands (e.g. rm) occurring in the text should use the :command: role.
Environment variables occurring in the text should use the :envvar: role.
Inline code or code variables occurring in the text should use the :code: role.
Code snippets should use the .. code-block:: <language> directive, like so:
.. code-block:: python

   import gcpy
   print("hello world")
The language can be “none” to omit syntax highlighting.
For command line instructions, the “console” language should be used. The $ should be used to denote the console’s prompt. If the current working directory is relevant to the instructions, a prompt like $~/path1/path2$ should be used.
Inline literals (e.g. the $ above) should use the :literal: role.
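The header conventions listed under “For titles and headers” above can be illustrated with a short Python sketch. The function name rst_header and the mapping of levels 1 to 4 onto section, subsection, subsubsection, and subsubsubsection are hypothetical conveniences for this example and are not part of Sphinx. The sketch makes the underline exactly as long as the title, since reStructuredText expects an underline at least as long as the title text.

```python
# Underline characters from the style guidelines above, keyed by header level
UNDERLINES = {1: "#", 2: "-", 3: "^", 4: '"'}


def rst_header(title, level=1):
    """Return a reST header: the title followed by a matching underline."""
    char = UNDERLINES[level]
    return f"{title}\n{char * len(title)}\n"


print(rst_header("Editing this User Guide", level=1))
print(rst_header("Quick start", level=2))
```

Passing a level outside 1 to 4 raises a KeyError, which fits the guideline that deeper nesting should be avoided.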