Full details

Input data

The model takes as input ERA5-Land data, previous river discharge measurements, and, when forecasting, weather forecast data assembled from the NOAA GEFS and CFS forecasts. During training, the weather forecast data are not used.

We have experimented with a variety of input weather variables, but at the moment the weather parameters used are:

  • tp (Daily total precipitation)

  • t2m (2-meter temperature daily mean)

  • skt (Skin (surface) temperature daily mean)

  • snowc (Percent snow cover daily mean)

  • sd (Snow-depth water-equivalent daily mean)

  • slhf (Surface latent heat flux daily mean)

  • swvl1 (Level 1 soil moisture daily mean)

  • stl1 (Level 1 soil temperature daily mean)

All variables are individually standardized before inclusion in the model, so whether they are presented as sums or averages is irrelevant to the output of the forecast so long as the choice is consistent.
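A minimal sketch of why the sum-vs-average choice washes out under standardization (the function name and sample values are ours, purely illustrative):

```python
import numpy as np

def standardize(x, mean=None, std=None):
    """Standardize a variable to zero mean and unit variance.

    Statistics would be computed over the training period and reused at
    inference so that forecast inputs are scaled identically.
    """
    mean = np.nanmean(x) if mean is None else mean
    std = np.nanstd(x) if std is None else std
    return (x - mean) / std, mean, std

# Whether `tp` is presented as a daily sum or a mean rate does not matter:
# standardization removes any constant factor between the two conventions.
tp_sum = np.array([1.2, 0.0, 3.6, 2.4])   # daily totals (mm)
tp_mean = tp_sum / 24.0                   # equivalent hourly-mean rate
z_sum, _, _ = standardize(tp_sum)
z_mean, _, _ = standardize(tp_mean)
assert np.allclose(z_sum, z_mean)
```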

Model design

Our model is based on the design presented in Deng et al.1 They create a two-part LSTM model for the discharge of a river basin (Fig. 1), specifically the discharge at the Lobith gauge station on the Rhine, just inside the Netherlands across the border from Germany. It is the final gauging point on the Rhine before the river splits into its many distributaries. The model contains two primary components: a hindcast model that uses observed weather and discharge measurements to encode the current state of the river basin, and a forecast model that ingests the encoded state and numerical weather forecasts to predict future river discharges. Deng et al. forecast a single gauge station, but our model forecasts all gauge stations that are included as input to the model. The lengths of the lookback window and forecast horizon are adjustable parameters, as are the size of the hidden layer and the number of layers (stacked LSTM). Currently we forecast to a horizon of 28 days. The model is implemented with Lightning, which is itself built on PyTorch. We have experimented with a variety of architectures and lookback periods and settled on the best-performing configuration; an overview of these experiments is presented later in this document.

Figure 1 (from Deng et al.): The model ingests a time series of observed weather data (ERA5-Land) aggregated over the river basin, combined with observed measurements of select gauge stations (blue boxes). The second component of the model is also an LSTM that is fed weather forecasts (in inference/operational mode) or observed weather data (in training mode), and produces predictions of future river discharge. Fully connected layers connect the hidden and cell states of the two components to each other.
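The two-part architecture can be sketched in PyTorch roughly as follows. This is an illustrative reduction, not the production code: class and tensor names are ours, and the tanh-activated linear transfer of the hidden/cell states is an assumption about one reasonable way to wire the fully connected bridge.

```python
import torch
from torch import nn

class HindcastForecastLSTM(nn.Module):
    """Minimal sketch of the hindcast/forecast LSTM pair described above."""

    def __init__(self, n_hindcast_feat, n_forecast_feat, n_gauges,
                 hidden_size=64, num_layers=1):
        super().__init__()
        self.hindcast = nn.LSTM(n_hindcast_feat, hidden_size,
                                num_layers, batch_first=True)
        self.forecast = nn.LSTM(n_forecast_feat, hidden_size,
                                num_layers, batch_first=True)
        # Fully connected layers transfer the encoded basin state
        # from the hindcast LSTM to the forecast LSTM.
        self.fc_h = nn.Linear(hidden_size, hidden_size)
        self.fc_c = nn.Linear(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, n_gauges)

    def forward(self, x_hind, x_fore):
        _, (h, c) = self.hindcast(x_hind)          # encode lookback window
        h0, c0 = torch.tanh(self.fc_h(h)), torch.tanh(self.fc_c(c))
        out, _ = self.forecast(x_fore, (h0, c0))   # roll out the horizon
        return self.head(out)                      # (batch, horizon, n_gauges)

# 8 weather variables + 9 gauges in the hindcast; weather only in the forecast.
model = HindcastForecastLSTM(n_hindcast_feat=17, n_forecast_feat=8, n_gauges=9)
x_hind = torch.randn(2, 90, 17)   # assumed 90-day lookback window
x_fore = torch.randn(2, 28, 8)    # 28-day forecast horizon
y = model(x_hind, x_fore)
assert y.shape == (2, 28, 9)
```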

Description of input data

ERA5-Land

The ECMWF Reanalysis v5 (ERA5) is a 0.25-0.5 degree gridded estimation of past observed weather based on ground- and space-based weather measurements, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and updated daily. ERA5-Land is a replay of the standard ERA5 dataset at an enhanced resolution of 0.1 degrees that focuses on the accuracy of land-based variables2, which are what interest us. The temporal gridding is hourly, but we use daily averages of the hourly data in our model. Reanalysis data are interpolated using both past and future weather measurements, which means availability lags real time by several days (typically 5 for ERA5). Technically, the data released on a 5-day lag are the ERA5 Early Release (ERA5T), which then undergoes validation by ECMWF. The finalized ERA5 dataset is released on a 2-3 month lag but is typically identical to ERA5T. If any irregularities in ERA5T are found during validation, ECMWF communicates this to the user community and updates ERA5.

We download the ERA5-Land data we need daily at 1100 UTC from the CDS data store. We download 5 files for each day since the variables we want come in several different formats. There are variables returned in both GRIB1 and GRIB2 formats, variables returned with separate time (date) and step (hour) dimensions, variables returned with a single time dimension (hour), hourly instantaneous variables, and daily accumulated variables. Each variable we use comes in a particular combination of those factors, leading to 5 different output specifications for the 8 variables we use in the model. We download individual files for each basin for which we produce forecasts, which as of December 10th, 2025 are the Rhine and Mississippi rivers. All files for a given basin are processed into a consistent format, merged together, and stored in an S3 bucket as Zarr files, with each month stored individually. Because ECMWF may correct ERA5 for irregularities up to 2-3 months later, we always download data for the past 5 months; if a correction is issued, our pipeline picks up the corrected data automatically. This could affect further training runs or operational forecasts for models whose lookback period exceeds 2-3 months, but such corrections are rare.
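The trailing re-download window can be sketched as follows (the function name is ours; the pipeline's actual implementation may differ):

```python
from datetime import date

def months_to_refresh(today, n_months=5):
    """Return (year, month) tuples for the trailing re-download window.

    Because ECMWF may correct ERA5T up to 2-3 months after release, the
    pipeline always re-downloads the past `n_months` months so that any
    corrections propagate automatically into the stored monthly Zarr files.
    """
    year, month = today.year, today.month
    out = []
    for _ in range(n_months):
        out.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return out[::-1]  # oldest month first

# e.g. a run on 2025-12-10 refreshes August through December 2025:
print(months_to_refresh(date(2025, 12, 10)))
# [(2025, 8), (2025, 9), (2025, 10), (2025, 11), (2025, 12)]
```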

River discharge

For the Rhine river, discharge measurements come from the PEGELONLINE portal maintained by the German Federal Waterways and Shipping Administration (WSV)3. PEGELONLINE provides level and discharge measurements for many monitoring stations across Germany, with 2000 being the earliest year with available data. However, not all stations provide both level and discharge measurements. Although we report forecasts of river levels, we actually predict the river discharge and convert that forecast to a level using the rating curve of the river. Since PEGELONLINE does not provide rating curves directly, we compute them ourselves for stations where both level and discharge measurements are available, which limits our model to a subset of the stations on PEGELONLINE. As of December 10th, 2025, the stations we use in our model, and for which we download measurements, are listed on this page.
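A rating curve can be fitted from paired level/discharge measurements; the sketch below uses the common power-law form Q = a(h - h0)^b fitted in log space. This is illustrative only: the functional form and function names are our assumptions, not necessarily what the production pipeline uses.

```python
import numpy as np

def fit_rating_curve(level, discharge, h0=0.0):
    """Fit Q = a * (h - h0)^b by least squares in log space."""
    x = np.log(np.asarray(level) - h0)
    y = np.log(np.asarray(discharge))
    b, log_a = np.polyfit(x, y, 1)
    return np.exp(log_a), b

def discharge_to_level(q, a, b, h0=0.0):
    """Invert the fitted curve: turn a discharge forecast into a level."""
    return h0 + (q / a) ** (1.0 / b)

# Synthetic check: data generated from a known curve is recovered.
h = np.linspace(1.0, 8.0, 50)       # levels (m)
q = 120.0 * h ** 1.6                # discharges (m^3/s)
a, b = fit_rating_curve(h, q)
assert np.allclose([a, b], [120.0, 1.6])
assert np.isclose(discharge_to_level(q[10], a, b), h[10])
```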

Our model produces forecasts for all of those stations, but we only present the stations on the Rhine (excepting Kehl-Kronenhof) in the application. PEGELONLINE updates its measurements every 15 minutes at 0/15/30/45 minutes past every hour, so we download from the PEGELONLINE API at 5/20/35/50 minutes past every hour to allow for up to 5 minutes of latency. Each time the measurements are downloaded, the daily mean/max/min levels which are shown in the UI are recalculated. Therefore, the mean level seen by the application user for the current day is the mean as of the last time the user refreshed their browser or reloaded the data in the application, with a possible lag of 15 minutes plus PEGELONLINE's own reporting latency, which is usually no more than 30 minutes. The download script runs as an AWS Lambda task and takes a few minutes to execute.
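The daily recalculation of the UI statistics amounts to a resample of the 15-minute series; a minimal sketch with pandas (the synthetic series is ours, purely illustrative):

```python
import numpy as np
import pandas as pd

# Two days of 15-minute level measurements, as delivered by PEGELONLINE.
idx = pd.date_range("2025-12-09", "2025-12-10 23:45", freq="15min", tz="UTC")
levels = pd.Series(3.0 + 0.5 * np.sin(np.arange(len(idx)) / 20.0), index=idx)

# Recompute the daily mean/max/min shown in the UI on every download.
# For the current (incomplete) day this is the mean/max/min so far.
daily = levels.resample("1D").agg(["mean", "max", "min"])
print(daily)
```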

For the Mississippi river, discharge measurements come from the United States Geological Survey's National Water Information System (USGS NWIS)4. USGS measurements are also reported every 15 minutes, so our download script runs on the same schedule as the PEGELONLINE download script (5/20/35/50 minutes past the hour). Many USGS gages have measurement records spanning several decades; however, as our model cannot handle large swaths of missing data in its training set, we are limited by the station with the shortest historical record. As of December 10th, 2025, the earliest date for which we have collected data is January 1st, 1980, and we download measurements for the stations on this page.

Stations were chosen to balance several criteria. Each station needed to have sufficient historical measurements of both discharge and level to provide a reasonably long training period. We break up the full river basin into subbasins, which are determined by the location of available discharge measurements. We strove to create subbasins that are similar in size and compact in shape, which further constrained the possible stations. For the Rhine there are 9 subbasins, and for the Mississippi there are

Global Ensemble Forecast System (GEFS)

We use the Global Ensemble Forecast System (GEFS) from the United States National Centers for Environmental Prediction (NCEP) for short-term weather forecasts (16 days). The GEFS model is run 4 times daily, at 0000, 0600, 1200, and 1800 UTC. We download daily the variables we require from the 0000 UTC run at 1100 UTC using Herbie. We download the GEFS control forecast and the 30 perturbed forecasts. Due to particularities in the data format similar to what is described in the section covering the ERA5-Land data, Herbie retrieves the requisite variables in several download steps. GEFS forecasts are reported in 3-hr steps, but the 6n+3 steps are semi-redundant with the 6n steps: the 6n steps contain averages or accumulations over the past 6 hours, while the 6n+3 steps cover only the past 3 hours. Since we take a daily average of each variable, we only download the 6n steps. The 0000 UTC run of the perturbed forecasts extends to 35 days, but as of December 10th, 2025 we do not download past a 16-day horizon.
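The step selection can be illustrated with a toy example: keeping only the 6n steps of a 6-hour-averaged variable covers the day exactly, so the daily mean is just the mean of those four steps (values below are made up):

```python
import numpy as np

# One forecast day of 3-hourly GEFS steps for a 6-h-averaged variable.
# Steps at hours 3, 9, 15, 21 average the past 3 h; steps at 6, 12, 18, 24
# average the past 6 h, so the 6n steps alone tile the full day.
hours = np.array([3, 6, 9, 12, 15, 18, 21, 24])
values = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])

six_hour_steps = values[hours % 6 == 0]   # keep only the 6n steps
daily_mean = six_hour_steps.mean()
print(daily_mean)                         # 3.0
```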

Climate Forecast System (CFS)

We use the Climate Forecast System (CFS) produced by NCEP for lead times beyond 16 days in our river level forecasts. There are a variety of CFS products, but the one we use has a 6-hr step, is run multiple times daily (including at 0000 UTC), extends to 9 months, and is produced as a 4-member ensemble. We download all 4 members daily at 1100 UTC from the 0000 UTC run and average across each day's four 6-hourly steps to get a daily average for each variable.
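The reduction from 4 members and 6-hourly steps to one daily value per variable is a pair of averages; a shape-level sketch (array layout and names are ours, not the pipeline's):

```python
import numpy as np

# Illustrative CFS array: (member, step) with 4 ensemble members and
# 6-hourly steps over two forecast days (8 steps).
rng = np.random.default_rng(0)
cfs = rng.normal(size=(4, 8))

ens_mean = cfs.mean(axis=0)                   # average the 4 members -> (8,)
daily = ens_mean.reshape(2, 4).mean(axis=1)   # average each day's 4 steps -> (2,)
assert daily.shape == (2,)
```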

Data preprocessing

Harmonization of ECMWF and NCEP data

There are slight differences in the definition of variables reported by ECMWF (ERA5-Land) and NCEP (GEFS/CFS). Because we train using the reanalysis data but run inference using the forecast data, an additional harmonization step ensures that the two data sources use identical variable definitions, or at least as close to identical as possible. For the NCEP forecasts, we actually download the average precipitation rate and convert it to daily total precipitation to match the ERA5-Land variable. The snow-depth water equivalent differs by a constant factor; the surface latent heat flux differs by a constant factor and a sign flip (outgoing - incoming for NCEP vs. incoming - outgoing for ERA5-Land); and the definition of the first soil level depth differs slightly (10 cm vs 7 cm). For that soil level, the soil temperatures are similar enough that they require no transformation, while the soil water content is adjusted by an empirically determined constant factor. The relationship between the two soil water definitions is not strictly linear, so this adjustment does introduce a small source of error.
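A sketch of the harmonization step, assuming standard NCEP GRIB shortnames (`prate`, `lhtfl`, `soilw`, `tsoil`); the numeric soil-moisture factor below is a placeholder, not the calibrated value:

```python
SECONDS_PER_DAY = 86400.0

def ncep_to_era5(ncep):
    """Convert a dict of daily-mean NCEP fields to ERA5-Land conventions."""
    era5like = {}
    # Mean precipitation rate (kg m-2 s-1) -> daily total (mm == kg m-2).
    era5like["tp"] = ncep["prate"] * SECONDS_PER_DAY
    # Surface latent heat flux: constant factor (W m-2 mean -> J m-2 per day)
    # plus the sign flip between the two centers' conventions.
    era5like["slhf"] = -ncep["lhtfl"] * SECONDS_PER_DAY
    # Level-1 soil moisture: empirically determined scale factor (placeholder)
    # bridging the 10 cm vs 7 cm layer definitions.
    era5like["swvl1"] = ncep["soilw"] * 0.9
    # Level-1 soil temperature needs no transformation.
    era5like["stl1"] = ncep["tsoil"]
    return era5like

out = ncep_to_era5({"prate": 1e-5, "lhtfl": 100.0, "soilw": 0.3, "tsoil": 280.0})
print(out["tp"])    # 0.864 mm/day
```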

Creation of dataset object

The code object that holds and provides the processed data for training and inference is an instance of the RiverForecastingDataset class, which ultimately derives from the PyTorch Dataset class. When indexed, it returns a tuple of PyTorch Tensors, which include the input tensor to the hindcast, the input tensor to the forecast, the target discharge values (when training), and the current value of the river discharge, which is necessary to provide for certain target options for our models.
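A simplified stand-in for that dataset class (the real `RiverForecastingDataset` has more options; the tensor names and window lengths here are our assumptions):

```python
import torch
from torch.utils.data import Dataset

class RiverForecastingDatasetSketch(Dataset):
    """Minimal sketch of the dataset object described above."""

    def __init__(self, weather, discharge, lookback=90, horizon=28):
        self.weather = weather        # (T, n_weather_features)
        self.discharge = discharge    # (T, n_gauges)
        self.lookback, self.horizon = lookback, horizon

    def __len__(self):
        return self.weather.shape[0] - self.lookback - self.horizon + 1

    def __getitem__(self, i):
        t = i + self.lookback
        # Hindcast input: observed weather + observed discharge.
        x_hind = torch.cat([self.weather[i:t], self.discharge[i:t]], dim=1)
        # Forecast input: weather only (observed when training, NWP at inference).
        x_fore = self.weather[t:t + self.horizon]
        y = self.discharge[t:t + self.horizon]   # target discharges
        q_now = self.discharge[t - 1]            # current discharge value
        return x_hind, x_fore, y, q_now

T, nw, ng = 200, 8, 9
ds = RiverForecastingDatasetSketch(torch.randn(T, nw), torch.randn(T, ng))
x_hind, x_fore, y, q_now = ds[0]
assert x_hind.shape == (90, nw + ng) and y.shape == (28, ng)
```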

Prior to training and inference, we have created area masks for all of the subbasins in the

Training procedure

We train our models on an AWS EKS cluster managed by Flyte using m5.xlarge and m5.2xlarge instances.

Overview of hyperparameter optimization
