Download data from dry-run output

Once you have successfully executed a GEOS-Chem dry-run, you can use the output from the dry-run (contained in the log.dryrun file) to download the data files that GEOS-Chem will need to perform the corresponding “production” simulation.

Choose a data portal

You can download input data from any of the portals listed below.

GEOS-Chem data portals and access methods

Portal                                       S3 Explorer  AWS CLI  HTTP  Bashdatacatalog  Globus
-------------------------------------------  -----------  -------  ----  ---------------  ------
GEOS-Chem Input Data                         Yes          Yes      Yes   Yes              Yes
(The main source of GEOS-Chem input data)
GEOS-Chem Nested Input Data                  Yes          Yes      Yes   No               No
GCAP 2.0 meteorology hosted at U. Rochester  No           No       Yes   No               No

Most of the data that you will need is contained in the GEOS-Chem Input Data portal.

Activate the GCPy Python environment

You will need an active Python environment before you can start downloading data. We recommend using the Python environment for GCPy, as it has all of the relevant packages installed. If you installed GCPy from PyPI, then no further action is needed. If you installed GCPy from conda-forge, activate the GCPy Python environment with this command:

$ conda activate gcpy_env
(gcpy_env) $

The prefix (gcpy_env) will be added to the command prompt, which lets you know that the Python environment is active. (If you installed GCPy from PyPI, you will not see this prefix.)
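If you would rather check the environment from a script than rely on the prompt prefix, the following is a minimal sketch. It assumes the environment is named gcpy_env, as above, and uses the CONDA_DEFAULT_ENV variable that conda sets on activation. The function name gcpy_env_active is a hypothetical helper and is not part of GCPy.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GCPy): succeeds if the conda
# environment named "gcpy_env" is currently active.
# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
gcpy_env_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "gcpy_env" ]
}

if gcpy_env_active; then
    echo "gcpy_env is active"
else
    # This is also the expected result if GCPy was installed from PyPI.
    echo "gcpy_env is not active"
fi
```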

Run the download_data.py script on the dry-run log file

Navigate to your GEOS-Chem run directory. The command that you will use to download data takes the form:

(gcpy_env) $ ./download_data.py log.dryrun PORTAL-NAME

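where log.dryrun is the log file from your dry-run simulation and PORTAL-NAME specifies the data portal (and access method) to download from.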

For example, to download data from the GEOS-Chem Input Data portal, use this command:

(gcpy_env) $ ./download_data.py log.dryrun geoschem+http

If you have the AWS CLI (command-line interface) set up on your machine, use this command instead:

(gcpy_env) $ ./download_data.py log.dryrun geoschem+aws

This results in a much faster data transfer than HTTP. It is also the command to use if you are running GEOS-Chem Classic on an AWS EC2 cloud instance.
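The choice between the two commands above can be scripted. The following is a minimal sketch, not part of GEOS-Chem or GCPy: the helper name choose_portal is hypothetical, and it only checks that an aws executable is on your PATH. It does not verify that the AWS CLI is configured correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GEOS-Chem or GCPy): print the portal
# name to pass to download_data.py, preferring the AWS CLI when the
# "aws" executable is found on PATH.
choose_portal() {
    if command -v aws >/dev/null 2>&1; then
        echo "geoschem+aws"
    else
        echo "geoschem+http"
    fi
}

portal="$(choose_portal)"
echo "Selected portal: ${portal}"

# Start the download only when run from a directory that contains
# both the download script and the dry-run log.
if [ -x ./download_data.py ] && [ -f log.dryrun ]; then
    ./download_data.py log.dryrun "${portal}"
fi
```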

(Optional) Examine the log of unique data files

The download_data.py program will generate a log of unique data files (i.e. with all duplicate listings removed), which looks similar to this:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! LIST OF (UNIQUE) FILES REQUIRED FOR THE SIMULATION
!!! Start Date       : 20190701 000000
!!! End Date         : 20190701 010000
!!! Simulation       : fullchem
!!! Meteorology      : MERRA2
!!! Grid Resolution  : 4.0x5.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
./HEMCO_Config.rc
./HEMCO_Config.rc.gmao_metfields
./HEMCO_Diagn.rc
./HISTORY.rc
./Restarts/GEOSChem.Restart.20190701_0000z.nc4 --> /home/ubuntu/ExtData/GEOSCHEM_RESTARTS/GC_14.5.0/GEOSChem.Restart.fullchem.20190701_0000z.nc4
./Restarts/HEMCO_restart.201907010000.nc
./geoschem_config.yml
/path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_j2j.dat
/path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-aer.dat
/path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-cld.dat
/path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_scat-ssa.dat
/path/to/ExtData/CHEM_INPUTS/CLOUD_J/v2024-09/FJX_spec.dat
/path/to/ExtData/CHEM_INPUTS/FastJ_201204/fastj.jv_atms_dat.nc
/path/to/ExtData/CHEM_INPUTS/Linoz_200910/Linoz_March2007.dat
/path/to/ExtData/CHEM_INPUTS/Olson_Land_Map_201203/Olson_2001_Drydep_Inputs.nc
/path/to/ExtData/CHEM_INPUTS/UCX_201403/NoonTime/Grid4x5/InitCFC_JN2O_01.dat

 ... etc ...

The name of this “unique” log file will be the same as the log file with dry-run output, with .unique appended. In our above example, we passed log.dryrun to download_data.py, so the “unique” log file will be named log.dryrun.unique. This “unique” log file can be very useful for documentation purposes.

If you wish to only produce the log of unique data files without downloading any data, then type the following command from within your GEOS-Chem run directory:

$ ./download_data.py log.dryrun skip-download

or for short:

$ ./download_data.py log.dryrun skip

This can be useful if you already have the necessary data downloaded to your system but wish to create the log of unique files for documentation purposes (for example, for benchmark simulations).
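The log of unique files can also serve as a quick check of which files are already on disk. The following is a minimal sketch that assumes the log format shown above: header lines begin with !!!, each remaining line holds one path, and for lines containing " --> " only the path before the arrow is checked. The helper name list_missing_files is hypothetical and is not part of download_data.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path listed in a "unique" dry-run
# log that does not exist on disk. Relative paths are resolved against
# the current directory, so run this from the GEOS-Chem run directory.
list_missing_files() {
    local unique_log="$1"
    local line path
    while IFS= read -r line; do
        # Skip the "!!!" header lines and blank lines.
        case "$line" in
            '!!!'*|'') continue ;;
        esac
        # For "A --> B" entries, keep only the path before the arrow.
        path="${line%% --> *}"
        [ -e "$path" ] || echo "MISSING: $path"
    done < "$unique_log"
}

# Check the log only if it exists in the current directory.
if [ -f log.dryrun.unique ]; then
    list_missing_files log.dryrun.unique
fi
```

No output means that every listed path was found.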

Deactivate the GCPy Python environment

Once you have downloaded all of the data needed for your GEOS-Chem Classic simulation, you can deactivate the GCPy Python environment.

(gcpy_env) $ conda deactivate
$

This will remove the (gcpy_env) prefix from the command prompt.

(Optional) Download additional meteorology data

You may need to perform a subsequent dry-run simulation to download additional data that are stored separately from the GEOS-Chem Input Data portal:

  1. If you plan to run a GEOS-Chem Classic nested-grid simulation with meteorology fields that have been cropped to a specific nested grid domain, then follow these steps:

    $ ./gcclassic --dryrun | tee log.dryrun.nested
    
    $ conda activate gcpy_env                                      # Skip if using GCPy from PyPI
    
    (gcpy_env) $ ./download_data.py log.dryrun.nested nested+http  # or nested+aws if you have the AWS CLI
    
    (gcpy_env) $ conda deactivate                                  # Skip if using GCPy from PyPI
    

    This will download the cropped meteorology fields from our GEOS-Chem Nested Input Data portal to your computer system or EC2 instance.

  2. If you plan to perform a GEOS-Chem Classic simulation driven by GCAP 2.0 meteorology, follow these steps:

    $ ./gcclassic --dryrun | tee log.dryrun.gcap2
    
    $ conda activate gcpy_env                                      # Skip if using GCPy from PyPI
    
    (gcpy_env) $ ./download_data.py log.dryrun.gcap2 rochester
    
    (gcpy_env) $ conda deactivate                                  # Skip if using GCPy from PyPI
    

    This will download the GCAP 2.0 meteorology data from the GCAP 2.0 data portal hosted at U. Rochester to your computer system or EC2 instance.