Interactions between data assimilation and post-processing
I have recently contributed to a paper where we investigate how statistical post-processing and data assimilation (also called real-time model updating in the engineering community) can be intrinsically related in the hydrological forecasting framework. The paper, co-written with François Bourgin (main author), Guillaume Thirel, and Vazken Andréassian, can be found here. We were basically guided by the following questions:
- How does data assimilation impact hydrological ensemble forecasts?
- How does post-processing impact hydrological ensemble forecasts?
- How does data assimilation interact with post-processing to improve the quality and skill of hydrological ensemble forecasts over the forecast lead times?
Our study was based on 202 unregulated catchments in France, data at hourly time steps over 1997–2006, bias-corrected short-range meteorological ensemble predictions from the PEARP system of Météo-France (11 members, 60-h forecast range, spatially disaggregated to an 8 km × 8 km grid), and the GRP hydrological model (a continuous, parsimonious, lumped storage-type model designed for flood forecasting and developed at Irstea in France).
Our main conclusions indicated that:
- Data assimilation has a strong impact on improving the quality of the ensemble mean, and a much lesser effect on the variability of the ensemble members (i.e., their spread). Post-processing has a strong impact on forecast reliability.
- The benefits of the combined use of data assimilation and post-processing are clearly shown: both contribute to achieving reliable and sharp forecasts, with impacts acting differently according to the target lead time.
- The strongest impact on forecast reliability comes from the use of post-processing. Adding data assimilation to the system helps improve sharpness and reliability at all lead times, with higher gains in performance at shorter lead times.
But what were we considering as “post-processing” and as “data assimilation”?
An interesting comment from an anonymous reviewer of the paper concerned our definition of what pertains to post-processing and what pertains to data assimilation (by the way, interesting posts on these topics were previously published on the Hepex blog and can be found by following the links).
The reviewer asked us to clearly explain the difference between data assimilation and post-processing.
In our study, a hydrological uncertainty processor (HUP) was used as a post-processing technique to estimate the conditional errors of the hydrological model (i.e., the errors obtained when the model is run with observed weather data). Basically, it can be summarized by the following characteristics:
- Data-based and non-parametric method to assess model simulation uncertainties.
- Empirical quantiles of relative errors estimated (stratified by different flow groups).
- HUP trained during the period used for calibrating the parameters of the hydrological model.
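To make those characteristics concrete, here is a minimal sketch of such a data-based, non-parametric processor: empirical quantiles of multiplicative relative errors, stratified by flow groups. The function names, group counts, and quantile levels are illustrative assumptions, not the code used in the paper.

```python
import numpy as np

def train_hup(sim, obs, n_groups=5, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Train a simple HUP-like processor: empirical quantiles of relative
    errors (obs/sim), stratified by groups of simulated flow magnitude."""
    rel_err = obs / sim                                # multiplicative relative errors
    # Flow-group edges taken from quantiles of the simulated discharge
    edges = np.quantile(sim, np.linspace(0, 1, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, sim, side="right") - 1, 0, n_groups - 1)
    # For each flow group, store the empirical error quantiles
    table = {g: np.quantile(rel_err[groups == g], quantiles) for g in range(n_groups)}
    return edges, table

def apply_hup(sim_forecast, edges, table):
    """Dress a deterministic forecast value with the trained error quantiles."""
    n_groups = len(table)
    g = int(np.clip(np.searchsorted(edges, sim_forecast, side="right") - 1,
                    0, n_groups - 1))
    return sim_forecast * table[g]                     # predictive quantiles

# Synthetic training data (stand-in for a calibration period)
rng = np.random.default_rng(42)
sim = rng.lognormal(2.0, 1.0, 5000)
obs = sim * rng.lognormal(0.0, 0.3, 5000)              # multiplicative error structure

edges, table = train_hup(sim, obs)
q = apply_hup(10.0, edges, table)                      # predictive quantiles around 10 m³/s
print(q)
```

Training would be done over the same period used to calibrate the hydrological model parameters, as in the third point above.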
As for data assimilation, we considered two procedures in the flood forecasting chain:
- the last available observed discharge is used to directly update the routing store state,
- the last relative error is used to correct the model output with a multiplicative coefficient.
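The second procedure can be illustrated with a short sketch, assuming the simplest possible form of a multiplicative output correction: the ratio of the last observed to last simulated discharge is applied to the subsequent forecast values. This is a hypothetical minimal version, not the GRP implementation (which also updates the routing store state directly).

```python
import numpy as np

def multiplicative_error_correction(model_output, last_obs, last_sim):
    """Correct the model output with a multiplicative coefficient derived
    from the last known relative error (minimal illustrative form)."""
    coeff = last_obs / last_sim            # last relative error as a ratio
    return model_output * coeff

# At forecast time, the model simulated 95 m³/s while 100 m³/s was observed,
# so the forecast trajectory is scaled up by that same ratio.
forecast = np.array([90.0, 85.0, 80.0])
corrected = multiplicative_error_correction(forecast, last_obs=100.0, last_sim=95.0)
print(corrected)
```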
The question of the reviewer more specifically addressed the last point. The reviewer wrote the following:
“…[the] DA scheme includes both model-state updates (“routing store state”), along with multiplicative model error corrections (MEC) applied to the model output discharge time-series. The latter (MEC) could fall into a gray-zone between DA and PP: if DA were defined to operate only on states of the hydrological model, then MEC would not qualify as DA; similarly, if PP [post-processing] could “operate” on time-series of model output (but was agnostic to hydrologic model states), then the MEC could qualify as PP; however, if PP were only to “operate” on static distributions of model errors, but potentially conditional on model output (as the authors’ PP is constructed), but not conditional on model output error values, then the MEC would not fall under the “PP umbrella”. […]”
“Gray-zones” in hydro-meteorological ensemble forecasting
I found it a very interesting comment; one that forced us to clarify our procedures, as I indicated above and you can read in the paper. Interestingly, some time later, at the 2015 EGU General Assembly in Vienna (where we presented a poster on the results of this paper during the ‘Ensemble hydro-meteorological forecasting’ session), a researcher from a meteorological institute came to discuss the same issue: for him, DA implies techniques that change the states of a model (it thus does not include output error corrections).
I didn’t dare to ask this researcher if he was our reviewer, but it made me think again that the “gray-zone” seems to be a real one. Overall, post-processing and data assimilation represent techniques that may be used in a forecasting system to improve the quality of the forecasts (i.e., to provide more accurate and reliable forecasts) and, ultimately, to enhance the usefulness of the forecasts in decision-making. But what we have also learned is that it is very useful to clearly define these operations in the context in which we use them in our pre-operational and operational systems.
Besides, I can only believe that other “gray-zones” may exist in hydro-meteorological ensemble forecasting. If you have one in mind, share it using the comment box. We will be glad to hear more about it!
2 thoughts on “Interactions between data assimilation and post-processing”
Some years ago Fredrik Wetterhall and I discussed the need for a hydrological-meteorological dictionary, in particular to address the problem of “les faux amis”, i.e. similar words meaning different things. As the story goes about the German-speaking guest at a London restaurant:
– Waiter, when will I become a beef? – Never, I hope, Sir! (German bekommen = to get).
For a meteorologist, data assimilation (DA) means updating your analysis of the current state of the atmosphere with new observations before launching the numerical forecast (NWP). In the most simple fashion, it is done by running a very short forecast from the previous analysis and then modifying this forecast with newly received observations. In NWP this will improve the forecast accuracy (reduce the errors) but not affect the spread (the variability). For that, different properties of the model (numerics, physics, etc.) have to be changed.
Post-processing (PP) in meteorology can mean almost anything you do with the NWP output, but mostly some sort of refinement on statistical grounds (SPP). This can improve both the accuracy and the spread.
This SPP can in principle take two forms:
a) from historical data (some weeks back to some years back), forecasts are calibrated against verifying observations, in the most simple form through a linear regression equation such as
corr = A*P1 + B*P2 + C*P3 + …, where A, B, C, etc. are the optimal weights and P1, P2, P3, etc. are different forecast parameters.
b) on a day-to-day basis, where the last forecast is matched against the verifying observation through an adaptive scheme that slightly modifies the time-dependent coefficients At, Bt, Ct, etc. in a similar but time-dependent correction equation: corr = At*P1 + Bt*P2 + Ct*P3 + … This approach, using the Kalman filter technique, was my “bread and butter” during the last 25 years of my career.
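One way such an adaptive scheme can be sketched is with a Kalman filter over the regression coefficients themselves (here in a recursive-least-squares-like form). This is an illustrative sketch, not the operational implementation described in the comment; the observation-error variance r and coefficient-drift variance q are assumed values.

```python
import numpy as np

def kalman_coeff_update(theta, P, x, y, r=1.0, q=1e-4):
    """One adaptive step for corr = theta @ x: update the time-dependent
    coefficients (At, Bt, ...) against the verifying observation y.
    r = observation-error variance, q = coefficient drift variance."""
    P = P + q * np.eye(len(theta))       # let the coefficients drift slowly in time
    y_hat = theta @ x                    # current corrected forecast
    s = x @ P @ x + r                    # innovation variance
    k = P @ x / s                        # Kalman gain
    theta = theta + k * (y - y_hat)      # pull coefficients toward the observation
    P = P - np.outer(k, x @ P)           # shrink coefficient uncertainty
    return theta, P

# Recover a known correction y = 2*P1 - 1*P2 from noisy forecast/observation pairs
rng = np.random.default_rng(0)
theta, P = np.zeros(2), np.eye(2)
for _ in range(500):
    x = rng.normal(size=2)                         # forecast parameters P1, P2
    y = 2.0 * x[0] - 1.0 * x[1] + rng.normal(scale=0.1)
    theta, P = kalman_coeff_update(theta, P, x, y, r=0.01)
print(theta)                                       # close to [2, -1]
```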
But this approach sounds very much like the DA approach mentioned in the main article…
I would agree that, in a classical sense, DA involves updating model states and is distinct from PP, which just transforms model output without touching those states. I do think there’s a gray area that houses techniques such as particle filtering (PF), where particle states are not updated, but rather selected, weighted, culled, resampled, and used to initialize forecasts. To some extent, the reweighting of particles and of the forecasts initialized with them (which affects the spread and bias of the forecasts) is conceptually similar to post-processing calibration procedures. Nonetheless, one difference may be that the objective functions of the PF utilize (minimize) analysis errors, which is common to DA, rather than training on forecast verifications, a characteristic of PP.
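That selection/reweighting step can be illustrated with a toy example, assuming a Gaussian observation-error model (a generic sketch, not any specific operational particle filter): the particle states are never modified, yet the weighted resampling pulls the ensemble toward the observation and reduces its spread.

```python
import numpy as np

def pf_update(particles, obs, obs_std=1.0, rng=None):
    """One particle-filter analysis step (sketch): particles are not moved,
    only weighted by their fit to the observation and then resampled."""
    if rng is None:
        rng = np.random.default_rng()
    # Likelihood weights from a Gaussian observation-error model
    w = np.exp(-0.5 * ((particles - obs) / obs_std) ** 2)
    w /= w.sum()
    # Multinomial resampling: duplicate likely particles, cull unlikely ones
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

rng = np.random.default_rng(1)
prior = rng.normal(0.0, 3.0, 10000)                 # spread ensemble, off the truth
posterior = pf_update(prior, obs=2.0, obs_std=1.0, rng=rng)
print(posterior.mean(), posterior.std())            # mean pulled toward 2, spread reduced
```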
Just a thought…