River Discharge Data

The GEOGloWS model provides forecast and retrospective simulations of river discharge. The retrospective simulation is derived from ERA5 runoff data, and the daily forecasts use Integrated Forecast System (IFS) 48r1 data. You can access this data in two ways: through a web data service or as bulk downloads from an AWS storage bucket (S3). Specific instructions for each method can be found in the "Web Services" tab.

About Bulk Downloads (NetCDF and Zarr)

When downloaded in bulk from AWS S3, the data comes as netCDF files. There are three primary netCDF variables you will need to query. The "rivid" variable is a 1-dimensional array of the ID numbers of all rivers contained in that file. The "time" variable is a 1-dimensional array of the timestamps of the discharge data in that file, usually in units of days since 1 January of the first year of the decade. The "Qout" variable is a 2-dimensional array with one column per river and one row per time step. Typically, users want to select all time steps (every row, the first axis) for a specific river (a single column, the second axis). We recommend Python and the Xarray package for writing scripts to query this data.
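As a minimal sketch of that selection, the example below builds a small synthetic dataset with the same three variables and pulls out every time step for one river. The river IDs and discharge values are made up for illustration, and treating "rivid" as an indexable coordinate is an assumption about how the real files are laid out.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one bulk-download file: 5 time steps x 3 rivers.
# The river IDs and discharge values here are hypothetical.
rivids = np.array([101, 102, 103])
times = np.arange(5)  # day offsets, as described for the "time" variable
qout = np.arange(15, dtype=float).reshape(5, 3)  # rows = time steps, columns = rivers

ds = xr.Dataset(
    data_vars={"Qout": (("time", "rivid"), qout)},
    coords={"time": times, "rivid": rivids},
)


def river_series(dataset, river_id):
    """Return every time step of Qout for a single river ID."""
    return dataset["Qout"].sel(rivid=river_id)


series = river_series(ds, 102)
print(series.values)
```

With a real file, replacing the synthetic dataset with xr.open_dataset("path/to/file.nc") should work the same way, provided the file exposes "rivid" as a coordinate. The path shown is a placeholder, not an actual file name.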

Retrospective Discharge (netCDF, CSV, JSON)

Time Variable

The retrospective discharge simulation data goes back roughly 80 years, beginning on January 1, 1940, and continuing through the present.
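Since the "time" variable is usually stored as a number of days since 1 January of the first year of a decade, a day offset can be turned into a calendar date with the standard library alone. The sketch below assumes whole-day offsets and uses hypothetical helper names; the exact units for a given file are recorded in that file's time variable, and Xarray will normally decode them for you when opening the file.

```python
from datetime import date, timedelta


def decade_start_year(year):
    """Return the first year of the decade containing `year` (1947 -> 1940)."""
    return year - year % 10


def decode_time(days_since_start, decade_year):
    """Convert a day offset into a calendar date.

    Assumes the offset counts whole days from 1 January of `decade_year`.
    """
    return date(decade_year, 1, 1) + timedelta(days=days_since_start)


first_step = decode_time(0, 1940)     # first day of the retrospective record
one_year_on = decode_time(366, 1940)  # 1940 is a leap year, so 366 days later
print(first_step, one_year_on)
```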

Discharge Variable - Qout

NetCDFs containing retrospective discharge simulation data are structured according to the Climate and Forecast (CF) Conventions. They include the average river discharge downstream of each river reach, labeled "Qout", ordered by the unique identifier of each river reach, the "rivid" variable. Each "rivid" has a corresponding latitude and longitude for a point associated with that river reach.
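Because each "rivid" carries a latitude and longitude, one common task is finding the river reach closest to a point of interest. The sketch below shows one way to do that with great-circle distances. The plain lists stand in for arrays read from a file, and the IDs, coordinates, and function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_rivid(rivids, lats, lons, lat, lon):
    """Return the river ID whose reference point is closest to (lat, lon)."""
    closest = min(
        range(len(rivids)),
        key=lambda k: haversine_km(lat, lon, lats[k], lons[k]),
    )
    return rivids[closest]


# Made-up river IDs and reference points, ordered the same way as "rivid".
rivids = [101, 102, 103]
lats = [40.0, 41.0, 42.0]
lons = [-111.0, -112.0, -113.0]

nearest = nearest_rivid(rivids, lats, lons, 40.9, -111.8)
print(nearest)
```

The closest reference point is not guaranteed to be the river you intend, especially near confluences, so it is worth checking the result against a map.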

Zarr vs NetCDF

Meteorological data is commonly archived in the NetCDF file format, but these files can be enormous. Because reading such large files from Python is slow, GEOGloWS 2.0 will move to Zarr files for cloud access and use in applications. For bulk downloads, the data will be split into one NetCDF file per decade.