GEOGLOWS Cookbook

This page is a collection of mini-tutorials for common tasks retrieving, processing, and browsing GEOGLOWS data. Most are short code snippets, or "recipes", you can adapt for retrieving and using GEOGLOWS river data.

Don't see the recipe you need here? Please ask the GEOGLOWS mailing list and your solution could be posted here! Browse the table of contents and click on an entry to jump to that code block.

Download GEOGLOWS forecast data for my river

If you only need to download data for a few rivers, or you do not want to write code, use our web app! GEOGLOWS has a web tool for graphically browsing the latest forecast using maps, downloading forecast or retrospective data, comparing forecasts with the latest satellite imagery, and finding links to more information. Visit https://apps.geoglows.org/apps/geoglows-hydroviewer/ to get started.

Get a list of all river ID numbers in my watershed

Every river in the GEOGLOWS model has an attribute called "TerminalLink". The TerminalLink is the ID number of the river at the outlet of the watershed that contains a given river of interest. Every river in the watershed shares that same outlet, so you can filter the table of rivers to select only the rivers that drain to it. You can do this with the GIS datasets in ArcGIS or QGIS; links to the GIS files for the streams are on the Available Data page and in the tutorial on Finding River Numbers. Alternatively, you can do this in code using the model metadata table, as sketched below. You will need to download that table (about 250 MB) to solve this problem via code.
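
For example, here is a minimal sketch of the code approach using pandas. It assumes you have already downloaded the model metadata table as a Parquet file; the file name, the example river ID, and the 'LINKNO' river ID column name are assumptions, while 'TerminalLink' is the attribute described above.

```python
# A minimal sketch: filter the model metadata table by TerminalLink.
# The file name, example river ID, and 'LINKNO' column name are assumptions.
import pandas as pd

# read a locally saved copy of the model metadata table
metadata = pd.read_parquet('geoglows-model-metadata.parquet')

# the river you are interested in (hypothetical example ID)
my_river_id = 760000123

# find the outlet (TerminalLink) of the watershed containing that river
outlet_id = metadata.loc[metadata['LINKNO'] == my_river_id, 'TerminalLink'].iloc[0]

# select every river that drains to the same outlet
watershed_rivers = metadata.loc[metadata['TerminalLink'] == outlet_id, 'LINKNO'].tolist()
print(f'{len(watershed_rivers)} rivers share outlet {outlet_id}')
```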

Retrieve forecasts for many rivers at once

The geoglows Python package allows you to request data for many rivers simultaneously, so you do not need to write for loops that make sequential requests for one river at a time. Prepare a list of all the river ID numbers you want data for. For example, you could get a list of all rivers in a watershed (see the recipe above). You can then pass that entire list of rivers to the geoglows package's data retrieval functions.
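
Here is a minimal sketch, assuming the geoglows package's data functions accept a list of river IDs through the river_id argument (the IDs below are hypothetical placeholders):

```python
# A minimal sketch: one request for forecast data for several rivers at once.
# Assumes geoglows.data.forecast accepts a list of river IDs; the IDs are placeholders.
import geoglows

river_ids = [760000123, 760000456, 760000789]  # e.g. every river in your watershed

# one call retrieves forecast data for every river in the list
forecast_df = geoglows.data.forecast(river_id=river_ids)
print(forecast_df.head())
```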

Archive the forecast records dataset

New forecasts are generated daily. The ensemble average flows predicted to occur in the 24 hours between new forecasts are archived each day at the start of a new forecast simulation. This dataset is referred to as the forecast record, and it is continuously updated each day. It is not archived in an AWS bucket; it is only available from the REST service for convenience in plotting on the fly. However, the full ensemble streamflow prediction is saved every day.

Please do not write code that loops through a list of rivers and downloads the forecast records each day. This is burdensome to the REST service and is neither fast nor efficient when the number of rivers is large. If you want to download a copy of these, retrieve the full ensemble forecast stored on AWS each day and calculate the record values using the tools available in the geoglows Python package.
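
As an illustration, here is a sketch of deriving a forecast-record style series from the archived ensemble forecast. It assumes geoglows.data.forecast_ensembles returns a time-indexed DataFrame with one column per ensemble member; that function name, the column layout, and the river ID are assumptions.

```python
# A sketch: approximate a day's forecast record from the full ensemble forecast.
# Assumes forecast_ensembles returns a time-indexed DataFrame with one column
# per ensemble member; the river ID is a placeholder.
import pandas as pd
import geoglows

river_id = 760000123
ensembles = geoglows.data.forecast_ensembles(river_id=river_id)

# ensemble average flow, then keep only the first 24 hours of the simulation,
# which is the window archived as the forecast record
ensemble_mean = ensembles.mean(axis=1)
first_day = ensemble_mean.loc[
    ensemble_mean.index < ensemble_mean.index[0] + pd.Timedelta(hours=24)
]
print(first_day)
```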

Save a local copy of forecast or retrospective data

Many users want to keep copies of new forecasts on their own devices. In particular, some users are required to download data so it can be moved to a secure compute environment or high performance computing center. You can use the geoglows Python package to retrieve either forecast or retrospective data and save a copy.

By default, you download data as a DataFrame (tabular data). You have many options for saving DataFrames to disk, such as Parquet, CSV, or Excel. If you are storing large tables of data, such as retrospective or forecast ensemble streamflow for many rivers, we recommend the Parquet format. It is faster to read and write and produces smaller files than most other formats.
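
For example, here is a minimal sketch that retrieves retrospective data and writes it to Parquet and CSV. The river ID and file names are placeholders, and geoglows.data.retrospective is assumed to be the retrieval function.

```python
# A minimal sketch: save a local copy of retrospective data.
# The river ID and file names are placeholders.
import geoglows

river_id = 760000123
retro_df = geoglows.data.retrospective(river_id=river_id)

# Parquet: compact, fast to read back with pandas.read_parquet
retro_df.to_parquet('retrospective_760000123.parquet')

# CSV also works but produces larger files and is slower for big tables
retro_df.to_csv('retrospective_760000123.csv')
```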

More advanced users can also retrieve data as an xarray Dataset, which is suited to higher dimensional data and to file formats such as netCDF or Zarr. You can save xarray Datasets to disk in several formats; the right format depends on your anticipated use case. Most users do not need this option.

To specify the format of data to download, use format='df' to retrieve tables of data (DataFrames) or format='xarray' to retrieve a multidimensional dataset. Most users should use the default format, DataFrame.
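
For example, here is a minimal sketch using the format='xarray' option described above and writing the result to netCDF. The river ID and output file name are placeholders, and writing netCDF requires an installed backend such as netCDF4.

```python
# A minimal sketch: retrieve data as an xarray Dataset and save it to netCDF.
# The river ID and output file name are placeholders.
import geoglows

river_id = 760000123
ds = geoglows.data.retrospective(river_id=river_id, format='xarray')

# write to netCDF (requires a netCDF backend such as netCDF4 or h5netcdf)
ds.to_netcdf('retrospective_760000123.nc')
```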