| Title: | Flood Frequency Analysis Framework |
|---|---|
| Description: | Tools to support systematic and reproducible workflows for both stationary and nonstationary flood frequency analysis, with applications extending to other hydroclimate extremes, such as precipitation frequency analysis. This package implements the FFA framework proposed by Vidrio- Sahagún et al. (2024) <doi:10.1016/j.envsoft.2024.105940>, originally developed in 'MATLAB', now adapted for the 'R' environment. This work was funded by the Flood Hazard Identification and Mapping Program of Environment and Climate Change Canada, as well as the Canada Research Chair (Tier 1) awarded to Dr. Pietroniro. |
| Authors: | Riley Wheadon [aut, cre], Cuauhtémoc Vidrio-Sahagún [aut], Alain Pietroniro [aut, fnd], Jianxun He [aut], Environment and Climate Change Canada (ECCC) [fnd] |
| Maintainer: | Riley Wheadon <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.1.2 |
| Built: | 2026-05-28 07:17:47 UTC |
| Source: | https://github.com/rileywheadon/ffa-framework |
This package provides tools for stationary (S-FFA) and nonstationary (NS-FFA)
flood flood frequency analysis of annual maximum series data. High-level wrapper
functions with the framework_* prefix orchestrate the EDA and/or FFA modules from
Vidrio-Sahagún et al. (2024) and generate
reports. Users who wish to develop customized workflows may use methods with
the following prefixes:
eda_*: Explore annual maximum series data for evidence of nonstationarity to
inform approach selection (S-FFA or NS-FFA):
Detect statistically significant change points.
Detect statistically significant temporal trends in the mean and variability.
select_*: Select a suitable probability distribution using the L-moments.
fit_*: Fit parameters given a distribution and approach (S-FFA or NS-FFA).
uncertainty_*: Quantify uncertainty by computing confidence intervals.
model_assessment() evaluates model performance using a variety of metrics.
Additional utility functions for visualization and computation are also available:
data_* methods load, transform, and decompose annual maximum series data.
plot_* methods produce diagnostic and summary plots.
utils_* methods implement distribution-specific computations.
Datasets from five hydrometric stations in Canada are provided as representative
use cases (other datasets in /inst/extdata are for testing purposes only):
Athabasca River at Athabasca (CAN-07BE001): An unregulated station with no statistical evidence of trends or change points (S-FFA recommended).
Kootenai River at Porthill (CAN-08NH21): A regulated station with evidence of an abrupt change in mean in 1972 (piecewise NS-FFA recommended).
Bow River at Banff (CAN-05BB001). An unregulated station with statistical evidence of a trend in the mean (NS-FFA recommended).
Chilliwack River at Chilliwack Lake (CAN-08MH016): An unregulated station with statistical evidence of a linear trend in variability (NS-FFA recommended).
Okanagan River at Penticton (CAN-08NM050): A regulated station with statistical evidence of a linear trend in both the mean and variability (NS-FFA recommended).
This package assumes familiarity with statistical techniques used in FFA, including parameter estimation (e.g., L-moments and maximum likelihood), dataset decomposition, and uncertainty quantification (parametric bootstrap and profile likelihood). For an explanation of these methods, see the FFA Framework wiki. For examples, see the vignettes on exploratory data analysis and flood frequency analysis.
Maintainer: Riley Wheadon [email protected]
Authors:
Cuauhtémoc Vidrio-Sahagún [email protected]
Alain Pietroniro [email protected] [funder]
Jianxun He [email protected]
Other contributors:
Environment and Climate Change Canada (ECCC) [funder]
Useful links:
Report bugs at https://github.com/rileywheadon/ffa-framework/issues
A dataframe of annual maximum series observations for station 05BB001, BOW RIVER AT BANFF in Alberta, Canada.
CAN_05BB001CAN_05BB001
A dataframe with 110 rows and 2 columns spanning the period 1909-2018.
Variables:
max: Numeric; the observed annual maximum series, in m/s.
year: Integer; the corresponding years.
This is an unregulated station in the RHBN. Whitfield & Pomeroy (2016) found that floods may be caused by rain, snow, or a combination of both. Therefore, practitioners should be careful when interpreting the results of FFA. Minimal human intervention in the basin means there is little justification for change points. EDA finds evidence of a decreasing trend in the mean.
Meteorological Service of Canada (MSC) GeoMet Platform
Whitfield P. H., and Pomeroy J. W. (2016) Changes to flood peaks of a mountain river: implications for analysis of the 2013 flood in the Upper Bow River, Canada, Hydrological Processes, 30: 4657–4673. doi:10.1002/hyp.10957.
A dataframe of annual maximum series observations for station 07BE001, ATHABASCA RIVER AT ATHABASCA in Alberta, Canada.
CAN_07BE001CAN_07BE001
A dataframe with 108 rows and 2 columns spanning the period 1913-2020.
Variables:
max: Numeric; the observed annual maximum series, in m/s.
year: Integer; the corresponding years.
This is an unregulated station that is not in the RHBN. Additionally,
The MKS/Pettitt tests find no evidence of change points at the 0.05 significance level.
Trend detection finds no evidence of trends in the mean or variability.
This dataset is an excellent introductory example to stationary FFA.
Meteorological Service of Canada (MSC) GeoMet Platform
A dataframe of annual maximum series observations for station 08MH016, CHILLIWACK RIVER AT CHILLIWACK LAKE in British Columbia, Canada.
CAN_08MH016CAN_08MH016
A dataframe with 95 rows and 2 columns spanning the period 1922-2016.
Variables:
max: Numeric; the observed annual maximum series, in m/s.
year: Integer; the corresponding years.
This is an unregulated station in the RHBN. Additionally,
The MKS/Pettitt tests find no evidence of change points at the 0.05 significance level.
Trend detection finds evidence of an increasing trend in the variability.
Meteorological Service of Canada (MSC) GeoMet Platform
A dataframe of annual maximum series observations for station 08NH021, KOOTENAI RIVER AT PORTHILL in British Columbia, Canada.
CAN_08NH021CAN_08NH021
A dataframe with 91 rows and 2 columns spanning the period 1928-2018.
Variables:
max: Numeric; the observed annual maximum series, in m/s.
year: Integer; the corresponding years.
This is a regulated station that is not in the RHBN. Additionally,
The Libby dam was constructed upstream of this station in 1972.
The Pettitt test finds evidence of a change point in 1972 at the 0.05 significance level.
The MKS test finds evidence of change points in 1960 & 1985 at the 0.05 significance level.
This dataset is an excellent introduction to change point detection and piecewise NS-FFA.
Meteorological Service of Canada (MSC) GeoMet Platform
A dataframe of annual maximum series observations for station 08NM050, OKANAGAN RIVER AT PENTICTON in British Columbia, Canada.
CAN_08NM050CAN_08NM050
A dataframe with 97 rows and 2 columns spanning the period 1921-2017.
Variables:
max: Numeric; the observed annual maximum series, in m/s.
year: Integer; the corresponding years.
This is a regulated station that is not part of the RHBN. The Okanagan River upstream of the station has been regulated since 1914 due to the construction of the first dam, followed by a second dam in 1920, and a regulation system in the early 1950s, consisting of four dams and 38 km of engineered channel. Rapid human settlement, development, and agricultural activity have also occurred in the watershed.
This dataset exhibits serial correlation and trends in both the mean and variability. Since the station is heavily influenced by reservoir operations, this dataset is intended for teaching purposes—not for practical flood estimation.
Meteorological Service of Canada (MSC) GeoMet Platform
Decomposes a nonstationary annual maxima series to derive its stationary stochastic component, which can be used to identify a best-fit distribution using conventional stationary methods, like those based on L-moments. The decomposition procedure follows that proposed by Vidrio-Sahagún and He (2022), which relies on the statistical representation of nonstationary stochastic processes.
data_decomposition(data, ns_years, ns_structure)data_decomposition(data, ns_years, ns_structure)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
Internally, the function does the following:
If there is a trend in the location, fit Sen's trend estimator and subtract away the fitted trend.
If there is a trend in the scale, estimate the variability of the data
with data_mw_variability(), fit Sen's trend estimator to the variability
vector, and rescale the data to remove the trend.
If necessary, shift the data so that its minimum is at least 1.
Numeric vector of decomposed data.
Vidrio-Sahagún, C. T., and He, J. (2022). The decomposition-based nonstationary flood frequency analysis. Journal of Hydrology, 612 (September 2022), 128186. doi:10.1016/j.jhydrol.2022.128186
data_mw_variability(), eda_sens_trend()
data <- rnorm(n = 100, mean = 100, sd = 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) data_decomposition(data, ns_years, ns_structure)data <- rnorm(n = 100, mean = 100, sd = 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) data_decomposition(data, ns_years, ns_structure)
Gets annual maximum series data for a hydrological monitoring station from the MSC GeoMet API.
data_geomet(station_id)data_geomet(station_id)
station_id |
A character scalar containing the ID of a hydrological monitoring station. You can search for station IDs by name, province, drainage basin, and location here. |
A dataframe with two columns:
max: A float, the annual maximum series observation, in m/s.
year: An integer, the corresponding year.
# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_geomet("05BB001")# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_geomet("05BB001")
Fetch annual maximum series data for a hydrological monitoring station from the package data directory.
data_local(csv_file)data_local(csv_file)
csv_file |
A character scalar containing the file name of a local dataset in
|
A dataframe with two columns:
max: A float, the annual maximum series observation, in m/s.
year: An integer, the corresponding year.
# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv")# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv")
Generates a time series of standard deviations using a moving window algorithm,
which can be used to explore potential evidence of nonstationarity in the
variability of a dataset. It returns a list that pairs each window’s mean year with
its window standard deviation. The hyperparameters size and step control the
behaviour of the moving window. Following the simulation findings from Vidrio-Sahagún
and He (2022), the default window size and step are set to 10 and 5 years
respectively. However, these can be changed by the user.
data_mw_variability(data, years, size = 10L, step = 5L)data_mw_variability(data, years, size = 10L, step = 5L)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
size |
Integer scalar. The number of years in each moving window.
Must be a positive number less than or equal to |
step |
Integer scalar. The offset (in years) between successive moving windows. Must be a positive number (default is 5). |
A list with two entries:
years: Numeric vector containing the mean year within each window.
std: Numeric vector of standard deviations within each window.
Vidrio-Sahagún, C. T., and He, J. (2022). The decomposition-based nonstationary flood frequency analysis. Journal of Hydrology, 612 (September 2022), 128186. doi:10.1016/j.jhydrol.2022.128186
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) data_mw_variability(data, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) data_mw_variability(data, years)
Checks for missing entries and generates a list of summary statistics about a dataset.
data_screening(data, years)data_screening(data, years)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
A list with seven entries:
years_min: The minimum value in the 'years' argument.
years_max: The maximum value in the 'years' argument.
data_min: The minimum value in the 'data' argument.
data_med: The median value in the 'data' argument.
data_max: The maximum value in the 'data' argument.
missing_years: An integer vector of years with no data.
missing_count: The number of missing entries in the dataset.
data <- rnorm(n = 10, mean = 100, sd = 10) years <- c(1900, 1902, 1903, 1904, 1905, 1907, 1909, 1911, 1912, 1914) data_screening(data, years)data <- rnorm(n = 10, mean = 100, sd = 10) years <- c(1900, 1902, 1903, 1904, 1905, 1907, 1909, 1911, 1912, 1914) data_screening(data, years)
Performs a bootstrapped version of the Mann-Kendall trend test to adjust for serial correlation in annual maximum series data. The BBMK test uses Spearman’s serial correlation test to identify the least insignificant lag, then applies a shuffling procedure to obtain the empirical p-value and confidence bounds for the Mann-Kendall test statistic.
eda_bbmk_test(data, alpha = 0.05, samples = 10000L)eda_bbmk_test(data, alpha = 0.05, samples = 10000L)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
alpha |
Numeric scalar in |
samples |
Integer scalar. The number of bootstrap samples. Default is 10000. |
The block size for reshuffling is equal to the least_lag as estimated in
eda_spearman_test(). Bootstrap samples are generated by resampling blocks
of the original data without replacement. This procedure effectively removes
serial correlation from the data.
A list containing the test results, including:
data: The data argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
statistic: The Mann-Kendall S-statistic computed on the original series.
bootstrap: Vector of bootstrapped Mann-Kendall test statistics.
p_value: Empirical two-sided p-value derived from the bootstrap distribution.
bounds: Empirical confidence interval bounds from the bootstrap distribution.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
Bayazit, M., 2015. Nonstationarity of hydrological records and recent trends in trend analysis: a state-of-the-art review. Environmental Processes 2 (3), 527–542. doi:10.1007/s40710-015-0081-7
Khaliq, M.N., Ouarda, T.B.M.J., Gachon, P., Sushama, L., St-Hilaire, A., 2009. Identification of hydrological trends in the presence of serial and cross correlations: a review of selected methods and their application to annual flow regimes of Canadian rivers. Journal Hydrolology 368 (1–4), 117–130. doi:10.1016/j.jhydrol.2009.01.035
Sonali, P., Nagesh Kumar, D., 2013. Review of trend detection methods and their application to detect temperature changes in India. Journal Hydrology 476, 212–227. doi:10.1016/j.jhydrol.2012.10.034
plot_bbmk_test(), eda_mk_test(), eda_spearman_test()
data <- rnorm(n = 100, mean = 100, sd = 10) eda_bbmk_test(data, samples = 1000L)data <- rnorm(n = 100, mean = 100, sd = 10) eda_bbmk_test(data, samples = 1000L)
Performs the KPSS unit root test on annual maximum series data. The null hypothesis is that the time series is trend-stationary with a linear deterministic trend and constant drift. The alternative hypothesis is that the time series has a unit root (also known as a stochastic trend).
eda_kpss_test(data, alpha = 0.05)eda_kpss_test(data, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
alpha |
Numeric scalar in |
The implementation of the KPSS test is based on the 'aTSA' package, which
interpolates a significance table from Hobijn et al. (2004). Therefore, a result
of implies that and a result of
implies that .
A list containing the test results, including:
data: The data argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
statistic: The KPSS test statistic.
p_value: The interpolated p-value. See the details for more information.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
Hobijn, B., Franses, P.H. and Ooms, M. (2004), Generalizations of the KPSS-test for stationarity. Statistica Neerlandica, 58: 483-502.
Kwiatkowski, D.; Phillips, P. C. B.; Schmidt, P.; Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54 (1-3): 159-178.
data <- rnorm(n = 100, mean = 100, sd = 10) eda_kpss_test(data)data <- rnorm(n = 100, mean = 100, sd = 10) eda_kpss_test(data)
Performs the Mann–Kendall trend test on a numeric vector to detect the presence of an increasing or decreasing monotonic trend over time. The test is nonparametric and accounts for tied observations in the data. The null hypothesis assumes there is no monotonic trend.
eda_mk_test(data, alpha = 0.05)eda_mk_test(data, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
alpha |
Numeric scalar in |
The test statistic is the sum over all pairs of the sign
of the difference . Ties are explicitly accounted for when
calculating the variance of , using grouped frequencies of tied observations.
The test statistic is then computed based on the sign and magnitude of
, and the p-value is derived from the standard normal distribution.
A list containing the test results, including:
data: The data argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
statistic: The Mann–Kendall test statistic.
variance: The variance of the test statistic under the null hypothesis.
p_value: The p-value associated with the two-sided hypothesis test.
reject: Logical. If TRUE, the null hypothesis is rejected at alpha.
Kendall, M. (1975). Rank Correlation Methods. Griffin, London, 202 pp.
Mann, H. B. (1945). Nonparametric Tests Against Trend. Econometrica, 13(3): 245-25
data <- rnorm(n = 100, mean = 100, sd = 10) eda_mk_test(data)data <- rnorm(n = 100, mean = 100, sd = 10) eda_mk_test(data)
Performs the Mann–Kendall–Sneyers (MKS) test to detect a trend change in annual maximum series data. The test computes normalized progressive and regressive Mann–Kendall statistics and identifies statistically significant crossing points, indicating potential change points. Under the null hypothesis, there are no trend changes.
eda_mks_test(data, years, alpha = 0.05)eda_mks_test(data, years, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
alpha |
Numeric scalar in |
This function computes progressive and regressive Mann–Kendall-Sneyers statistics, normalized by their expected values and variances under the null hypothesis. The crossing points occur when the difference between the progressive and regressive statistics switches sign. The significance of detected crossing points is assessed using the quantiles of the normal distribution.
A list containing the test results, including:
data: The data argument.
years: The years argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
progressive_series: Normalized progressive Mann–Kendall-Sneyers statistics.
regressive_series: Normalized regressive Mann–Kendall-Sneyers statistics.
bound: Critical confidence bound for significance based on alpha.
change_points: A dataframe of potential change points.
p_value: Two-sided p-value of the most significant crossing point.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
change_points contains the years, test statistics, and p-values of each
potential change point. If no change points were identified, change_points
is empty.
Sneyers, R. (1990). On the statistical analysis of series of observations. Technical note No. 143, World Meteorological Organization, Geneva.
plot_mks_test(), eda_pettitt_test()
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_mks_test(data, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_mks_test(data, years)
Performs the nonparametric Pettitt test to detect a single abrupt change in the central tendency of a time series. Under the null hypothesis, there is no change.
eda_pettitt_test(data, years, alpha = 0.05)eda_pettitt_test(data, years, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
alpha |
Numeric scalar in |
A list containing the test results, including:
data: The data argument.
years: The years argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
u_series: Numeric vector of absolute U-statistics for all years.
statistic: The test statistic and maximum absolute U-statistic.
bound: The critical value of the test statistic based on alpha.
change_points: A dataframe containing the potential change point.
p_value: An asymptotic approximation of the p-value for the test.
reject: Logical scalar. If TRUE, the null hypothesis was rejected.
change_points contains the years, test statistics, and p-values of each
potential change point. If no change points were identified, change_points
is empty.
Pettitt, A.N., 1979. A Non-parametric Approach to the Change-point Problem. J. Royal Statistics Society 28 (2), 126–135. http://www.jstor.org/stable/2346729
plot_pettitt_test(), eda_mks_test()
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_pettitt_test(data, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_pettitt_test(data, years)
Applies the Phillips–Perron (PP) test to check for a unit root in annual maximum series data. The null hypothesis assumes the time series contains a unit root (also known as a stochastic trend). The alternative hypothesis is that the time series is trend-stationary with a deterministic linear trend.
eda_pp_test(data, alpha = 0.05)eda_pp_test(data, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
alpha |
Numeric scalar in |
The implementation of this test is based on the 'aTSA' package, which
interpolates p-values from the table of critical values presented in Fuller W.
A. (1996). The critical values are only available for .
Therefore, a reported p-value of 0.01 indicates that .
A list containing the test results, including:
data: The data argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
statistic: The PP test statistic.
p_value: Reported p-value from the test. See the details for more information.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
Fuller, W. A. (1976). Introduction to Statistical Time Series. New York: John Wiley and Sons
Phillips, P. C. B.; Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75 (2): 335-346
data <- rnorm(n = 100, mean = 100, sd = 10) eda_pp_test(data)data <- rnorm(n = 100, mean = 100, sd = 10) eda_pp_test(data)
Applies the Wald–Wolfowitz runs test to a numeric vector in order to assess whether it behaves as a random sequence. The test statistic and p-value is computed using the number of runs (sequences of values above or below the median). Under the null hypothesis, the data is random. The runs test can be used to assess whether the residuals of a nonstationary model are random, indicating the suitability of the assumed nonstationary structure (e.g., linear).
eda_runs_test(values, years, alpha = 0.05)eda_runs_test(values, years, alpha = 0.05)
values |
A numeric vector of values to check for randomness. |
years |
Numeric vector of observation years corresponding to |
alpha |
Numeric scalar in |
A list containing the test results, including:
values: The values argument.
years: The years argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
n: The length of the input vector after removing the median.
runs: The number of runs in the transformed sequence of residuals.
statistic: The runs test statistic, computed using runs and n.
p_value: The p-value derived from the normally distributed test statistic.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
Wald, A. and Wolfowitz, J. (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics, 11(2), 147–162.
plot_runs_test(), eda_sens_trend()
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) sens_trend <- eda_sens_trend(data, years) eda_runs_test(sens_trend$residuals, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) sens_trend <- eda_sens_trend(data, years) eda_runs_test(sens_trend$residuals, years)
Computes Sen's linear trend estimator for a univariate time series. The estimated
slope and y-intercept are given in terms of the data and the covariate (time),
which is derived from the years using the formula .
eda_sens_trend(data, years)eda_sens_trend(data, years)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
Sen's slope estimator is a robust, nonparametric trend estimator based on the
median of all pairwise slopes between data points. The corresponding intercept
is the median of each where is the estimated slope.
A list containing the estimated trend:
data: The data argument.
years: The years argument.
slope: The estimated slope.
intercept: The estimated y-intercept.
residuals: Vector of differences between the predicted and observed values.
Sen, P.K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63(324), 1379–1389.
eda_runs_test(), plot_ams_data()
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_sens_trend(data, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_sens_trend(data, years)
Performs the Spearman serial correlation test on annual maximum series data to check for serial correlation at various lags. Reports the smallest lag where the serial correlation is not statistically significant at the given significance level (the least insignificant lag).
eda_spearman_test(data, alpha = 0.05)eda_spearman_test(data, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
alpha |
Numeric scalar in |
A list containing the test results, including:
data: The data argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
rho: Numeric vector of serial correlation estimates for lags to .
least_lag: The smallest lag at which the serial correlation is insignificant.
significant: Indicates whether the serial correlation is significant at each lag.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
stats::cor.test(), eda_bbmk_test(), plot_spearman_test()
data <- rnorm(n = 100, mean = 100, sd = 10) eda_spearman_test(data)data <- rnorm(n = 100, mean = 100, sd = 10) eda_spearman_test(data)
Performs the White test for heteroskedasticity by regressing the squared residuals of a linear model on the original regressors and their squared terms. The null hypothesis is homoskedasticity.
eda_white_test(data, years, alpha = 0.05)eda_white_test(data, years, alpha = 0.05)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
alpha |
Numeric scalar in |
The White test regresses the squared residuals from a primary linear model
lm(data ~ years) against both the original regressor and its square.
The test statistic is calculated as , where is the
coefficient of determination from the auxiliary regression and is
the number of elements in the time series. Under the null hypothesis, the
test statistic has the distribution with 2 degrees of freedom.
A list containing the results of the White test:
data: The data argument.
years: The years argument.
alpha: The significance level as specified in the alpha argument.
null_hypothesis: A string describing the null hypothesis.
alternative_hypothesis: A string describing the alternative hypothesis.
statistic: White test statistic based on sample size and r_squared.
p_value: The p-value derived from a Chi-squared distribution with df = 2.
reject: If TRUE, the null hypothesis was rejected at significance alpha.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838.
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_white_test(data, years)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) eda_white_test(data, years)
Estimates the parameters of the generalized extreme value (GEV) distribution by
maximizing the generalized log‐likelihood, which incorporates a Beta prior on the
shape parameter. Initial parameter estimates are obtained using the method of L‐moments
and optimization is performed via stats::nlminb() with repeated perturbations if
needed.
For NS-FFA: To estimate parameters for a nonstationary model, include the
observation years (ns_years) and the nonstationary model structure (ns_structure).
fit_gmle(data, prior, ns_years = NULL, ns_structure = NULL)fit_gmle(data, prior, ns_years = NULL, ns_structure = NULL)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
prior |
Numeric vector of length 2. Specifies the parameters of the
Beta prior for the shape parameter |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
Calls fit_lmoments() on the data to obtain initial parameter estimates.
Initializes trend parameters to zero if necessary.
Defines an objective function using utils_generalized_likelihood().
Runs stats::nlminb() with box constraints. Attempts minimization up to 100 times.
A list containing the results of parameter estimation:
data: The data argument.
prior: The prior argument.
ns_years: The ns_years argument, if given.
ns_structure: The ns_structure argument, if given.
method: "GMLE".
params: Numeric vector of estimated parameters.
mll: The maximum value of the generalized log‐likelihood.
El Adlouni, S., Ouarda, T.B.M.J., Zhang, X., Roy, R., Bobee, B., 2007. Generalized maximum likelihood estimators for the nonstationary generalized extreme value model. Water Resources Research 43 (3), 1–13. doi:10.1029/2005WR004545
Martins, E. S., and Stedinger, J. R. (2000). Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research, 36(3), 737–744. doi:10.1029/1999WR900330
utils_generalized_likelihood(), fit_lmoments(), stats::nlminb()
data <- rnorm(n = 100, mean = 100, sd = 10) prior <- c(6, 9) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) fit_gmle(data, prior, ns_years, ns_structure)data <- rnorm(n = 100, mean = 100, sd = 10) prior <- c(6, 9) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) fit_gmle(data, prior, ns_years, ns_structure)
For S-FFA only: Estimates the parameters of a stationary probability model using the L-moments.
fit_lmoments(data, distribution)fit_lmoments(data, distribution)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
First, the sample L-moments of the data are computed using utils_sample_lmoments().
Then, formulas from Hosking (1997) are used to match the parameters to
the sample L-moments. The distributions "GNO", "PE3", and "LP3" use a
rational approximation of the parameters since no closed-form expression is known.
A list containing the results of parameter estimation:
data: The data argument.
distribution: The distribution argument.
method: "L-moments".
params: Numeric vector of estimated parameters.
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
fit_lmoments_kappa(), fit_mle(), fit_gmle()
data <- rnorm(n = 100, mean = 100, sd = 10) fit_lmoments(data, "GUM")data <- rnorm(n = 100, mean = 100, sd = 10) fit_lmoments(data, "GUM")
This function estimates the parameters of the four-parameter Kappa distribution using the method of L-moments. Since no closed-form solution for the parameters in terms of the L-moments is known, the parameters are estimated numerically using Newton-Raphson iteration.
fit_lmoments_kappa(data)fit_lmoments_kappa(data)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
First, the sample L-moments of the data are computed using utils_sample_lmoments().
Then, the stats::optim() function is used to determine the parameters
by minimizing the euclidian distance between the sample and theoretical L-moment
ratios. The implementation of this routine is based on the deprecated 'homtest'
package, formerly available at https://CRAN.R-project.org/package=homtest.
A list containing the results of parameter estimation:
data: The data argument.
distribution: "KAP".
method: "L-moments".
params: numeric vector of 4 parameters in the order location, scale, shape (2).
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
utils_sample_lmoments(), fit_lmoments()
data <- rnorm(n = 100, mean = 100, sd = 10) fit_lmoments_kappa(data)data <- rnorm(n = 100, mean = 100, sd = 10) fit_lmoments_kappa(data)
Estimates the parameters of a probability distribution by maximizing the log‐likelihood.
Initial parameter estimates are obtained using the method of L‐moments and optimization
is performed via stats::nlminb() with repeated perturbations if needed.
For NS-FFA: To estimate parameters for a nonstationary model, include the
observation years (ns_years) and the nonstationary model structure
(ns_structure).
fit_mle(data, distribution, ns_years = NULL, ns_structure = NULL)fit_mle(data, distribution, ns_years = NULL, ns_structure = NULL)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
Calls fit_lmoments() on data to obtain initial parameter estimates.
Initializes trend parameters to zero if necessary.
For WEI models, sets the location parameter to zero to ensure support.
Defines an objective function using utils_log_likelihood().
Runs stats::nlminb() with box constraints. Attempts minimization up to 100 times.
A list containing the results of parameter estimation:
data: The data argument.
distribution: The distribution argument.
ns_years: The ns_years argument, if given.
ns_structure: The ns_structure argument, if given.
method: "MLE".
params: Numeric vector of estimated parameters.
mll: The maximum value of the log‐likelihood.
utils_log_likelihood(), fit_lmoments(), stats::nlminb()
data <- rnorm(n = 100, mean = 100, sd = 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) fit_mle(data, "GNO", ns_years, ns_structure)data <- rnorm(n = 100, mean = 100, sd = 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) fit_mle(data, "GNO", ns_years, ns_structure)
First, this method identifies change points in the original annual maximum series data. Then, the user is given the option to split the dataset into two or more homogenous subperiods (trend-free or with monotonic trends). Finally, this method performs a collection of statistical tests for identifying monotonic nonstationarity in the mean and variability of each subperiod (if the dataset was split) or of the entire dataset (if it was not split). The results of EDA can help guide FFA approach selection (stationary or nonstationary) and FFA model determination.
framework_eda( data, years, ns_splits = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )framework_eda( data, years, ns_splits = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
ns_splits |
An integer vector of years used to split the data into homogeneous
subperiods. For S-FFA, set to |
generate_report |
If |
report_path |
A character scalar, the file path for the generated report.
If |
report_formats |
A character vector specifying the output format for the
report. Supported values are |
... |
Additional arguments. See the "Optional Arguments" section for a complete list. |
eda_recommendations: A list containing the recommended FFA approach, split
point(s) and nonstationary structure(s) from EDA:
approach: Either "S-FFA", "NS-FFA" (for a single homogeneous period), or
"Piecewise NS-FFA" (for multiple homogeneous subperiods).
ns_splits: The split point(s) identified by the change point detection test with
the the lowest statistically significant p-value, or NULL if no such point exists.
ns_structures: A list of structure objects for each homogeneous subperiod. Each
structure is a list with boolean items location and scale, which represent a
linear trend in the in the mean or variability of the data, respectively. If no
trends were found in any homogeneous subperiod, ns_structures will be NULL.
submodule_results: A list of lists of statistical tests. Each list contains:
name: Either "Change Point Detection" or "Trend Detection".
start: The first year of the homogeneous subperiod.
end: The last year of the homogeneous subperiod.
Additional items from the statistical tests within the submodule.
alpha: The numeric significance level for all statistical tests (default is 0.05).
bbmk_samples: The number of samples used in the Block-Bootstrap Mann-Kendall
(BBMK) test (default is 10000). Must be an integer.
window_size: The size of the window used to compute the variability series.
window_step: The number of years between successive moving windows. Used to
compute the variability series.
eda_pettitt_test(), eda_mks_test(), eda_mk_test(),
eda_spearman_test(), eda_bbmk_test(), eda_pp_test(), eda_kpss_test(),
eda_sens_trend(), eda_runs_test(), eda_white_test()
# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run EDA (takes several minutes) ## Not run: framework_eda(df$max, df$year)# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run EDA (takes several minutes) ## Not run: framework_eda(df$max, df$year)
Performs frequency analysis of annual maximum series data including distribution selection, parameter estimation, uncertainty quantification, and model assessment. Supports both stationary (S-FFA) or nonstationary (NS-FFA) flood frequency analysis.
framework_ffa( data, years, ns_splits = NULL, ns_structures = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )framework_ffa( data, years, ns_splits = NULL, ns_structures = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
ns_splits |
An integer vector of years used to split the data into homogeneous
subperiods. For S-FFA, set to |
ns_structures |
For S-FFA, set to |
generate_report |
If |
report_path |
A character scalar, the file path for the generated report.
If |
report_formats |
A character vector specifying the output format for the
report. Supported values are |
... |
Additional arguments. See the "Optional Arguments" section for a complete list. |
modelling_assumptions: A list describing the model(s) used for the analysis.
approach: Either "S-FFA", "NS-FFA", or "Piecewise NS-FFA".
ns_splits: The ns_splits argument, if given.
ns_structures: The ns_structures argument, if given.
submodule_results: A list of lists of containing the results of frequency analysis.
Each list contains:
name: Either "Distribution Selection", "Parameter Estimation", "Uncertainty
Quantification", or "Model Assessment".
start: The first year of the homogeneous subperiod.
end: The last year of the homogeneous subperiod.
Additional items specific to the the submodule.
selection: Distribution selection method (default is "L-distance"). Must be
one of "L-distance", "L-kurtosis" or "Z-statistic". Alternatively, set
selection to a three-letter distribution code (e.g., "GUM") to use a specific
distribution.
s_estimation: Parameter estimation method for S-FFA (default is "L-moments").
Must be "L-moments", "MLE", or "GMLE". Method "GMLE" requires
selection = "GEV".
ns_estimation: Parameter estimation method for NS-FFA (default is "MLE").
Must be "MLE" or "GMLE". Method "GMLE" requires selection = "GEV".
s_uncertainty: Uncertainty quantification method for S-FFA (default is
"Bootstrap"). Must be one of "Bootstrap", "RFPL", or RFGPL". Using
method "RFPL" requires s_estimation = "MLE" and method "RFGPL" requires
s_estimation = "GMLE".
ns_uncertainty: Uncertainty quantification method for NS-FFA (default is
"RFPL"). Must be one of "Bootstrap", "RFPL", or RFGPL". Using method
"RFPL" requires ns_estimation = "MLE" and method "RFGPL" requires
ns_estimation = "GMLE".
z_samples: Integer number of bootstrap samples for selection method
"Z-statistic" (default is 10000).
gev_prior: Parameters for the prior distribution of the shape parameter of the
GEV distribution (default is 6, 9). Used with estimation method "GMLE".
return_periods: Integer list of return periods (in years) for estimating return
levels. Uses the 2, 5, 10, 20, 50, and 100 year return periods by default.
ns_slices: Integer vector of years at which to estimate the return levels for
nonstationary models. Slices outside of the period will be ignored (default is
1925, 1975, 2025).
bootstrap_samples: Integer number of samples for uncertainty quantification
method '"Bootstrap" (default is 10000).
rfpl_tolerance: Log-likelihood tolerance for uncertainty quantification method
"RFPL"(default is 0.01).
pp_formula: Plotting position formula for model assessment. Must be one of:
"Weibull" (default):
"Blom":
"Cunnane":
"Gringorten":
"Hazen":
select_ldistance(), select_lkurtosis(), select_zstatistic(),
fit_lmoments(), fit_mle(), fit_gmle(), uncertainty_bootstrap(),
uncertainty_rfpl(), uncertainty_rfgpl(), model_assessment()
# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run the FFA module (takes several minutes) ## Not run: framework_ffa(df$max, df$year)# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run the FFA module (takes several minutes) ## Not run: framework_ffa(df$max, df$year)
Runs the entire flood frequency analysis framework using the results of exploratory data analysis (EDA) to guide approach selection (stationary or nonstationary) and perform flood frequency analysis. Returns a comprehensive and reproducible summary of the results.
framework_full( data, years, ns_splits = NULL, ns_structures = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )framework_full( data, years, ns_splits = NULL, ns_structures = NULL, generate_report = TRUE, report_path = NULL, report_formats = "html", ... )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
ns_splits |
An integer vector of years used to split the data into homogeneous
subperiods. For S-FFA, set to |
ns_structures |
For S-FFA, set to |
generate_report |
If |
report_path |
A character scalar, the file path for the generated report.
If |
report_formats |
A character vector specifying the output format for the
report. Supported values are |
... |
Additional arguments to be passed to the statistical tests and frequency
analysis functions. See the details of |
eda_recommendations: See framework_eda().
modelling_assumptions: See framework_ffa().
submodule_results: A list of lists of results. Each list contains:
name: Either "Change Point Detection", "Trend Detection", "Distribution
Selection", "Parameter Estimation", "Uncertainty Quantification", or "Model
Assessment".
start: The first year of the homogeneous subperiod.
end: The last year of the homogeneous subperiod.
Additional items specific to the the submodule.
framework_eda(), framework_ffa()
# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run the complete FFA framework (takes several minutes) ## Not run: framework_full(df$max, df$year)# Get data for the BOW RIVER AT BANFF (05BB001) df <- data_local("CAN-05BB001.csv") # Run the complete FFA framework (takes several minutes) ## Not run: framework_full(df$max, df$year)
Computes various metrics for assessing the quality of a fitted flood frequency model. Metrics include accuracy (residual statistics), fitting efficiency (information criteria), and uncertainty (coverage based metrics using confidence intervals).
For NS-FFA: The metrics are computed from the normalized empirical/theoretical quantiles, which are determined based on the selected distribution family. Therefore, metrics from stationary and nonstationary models should not be compared.
model_assessment( data, distribution, method, prior = NULL, ns_years = NULL, ns_structure = NULL, alpha = 0.05, pp_formula = "Weibull", ci = NULL )model_assessment( data, distribution, method, prior = NULL, ns_years = NULL, ns_structure = NULL, alpha = 0.05, pp_formula = "Weibull", ci = NULL )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
method |
Character scalar specifying the estimation method.
Must be |
prior |
Numeric vector of length 2. Specifies the parameters of the
Beta prior for the shape parameter |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
alpha |
Numeric scalar in |
pp_formula |
Character string specifying the plotting position formula.
Must |
ci |
For S-FFA only. Dataframe containing return periods (in the column
|
These metrics are are computed for all models:
AIC_MLL: Akaike Information Criterion, computed using the maximum log-likelihood.
BIC_MLL: Bayesian Information Criterion, computed using the maximum log-likelihood.
R2: Coefficient of determination from linear regression of estimates vs. data.
RMSE: Root mean squared error of quantile estimates.
bias: Mean bias of quantile estimates.
AIC_RMSE: Akaike Information Criterion, computed using the RMSE.
BIC_RMSE: Bayesian Information Criterion, computed using the RMSE.
These metrics are only computed if the ci argument is provided:
AW: Average width of the confidence interval(s).
POC: Percent of observations covered by the confidence interval(s).
CWI: Confidence width index, a metric that combines AW and POC.
List containing the results of model assessment:
data: The data argument.
q_theoretical: The theoretical return level estimates based on the plotting
positions. Normalized for nonstationary models.
q_empirical: The empirical return levels. Normalized for nonstationary models.
metrics: A list of model assessment metrics (see details).
uncertainty_bootstrap(), uncertainty_rfpl(), uncertainty_rfgpl(),
plot_sffa_assessment(), plot_nsffa_assessment()
# Initialize example data and params data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 10) # Perform uncertainty analysis ci <- uncertainty_bootstrap(data, "NOR", "L-moments")$ci # Run model assessment model_assessment(data, "NOR", "L-moments", ci = ci)# Initialize example data and params data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 10) # Perform uncertainty analysis ci <- uncertainty_bootstrap(data, "NOR", "L-moments")$ci # Run model assessment model_assessment(data, "NOR", "L-moments", ci = ci)
Produces a scatterplot of annual maximum series data against time, optionally overlaid with the sample mean/variability or Sen's trend estimator of the mean/variability.
plot_ams_data( data, years, plot_mean = "None", plot_variability = "None", show_line = TRUE, ... )plot_ams_data( data, years, plot_mean = "None", plot_variability = "None", show_line = TRUE, ... )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
years |
Numeric vector of observation years corresponding to |
plot_mean |
If "None" (default), the mean will not be plotted. If |
plot_variability |
If "None" (default), the variability will not be plotted.
If |
show_line |
If |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot containing:
Gray points for each year’s annual maximum series value.
A gray line connecting the data if show_line = TRUE.
A solid black line representing a constant mean, if plot_mean == "Constant".
A solid blue line representing a trend in the mean, if plot_mean == "Trend".
A dashed black line representing constant variability, if plot_variability == "Constant".
A dashed blue line representing a trend in variability, if plot_variability == "Trend".
eda_sens_trend(), data_mw_variability()
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) plot_ams_data(data, years, plot_mean = "Trend", plot_variability = "Constant")data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) plot_ams_data(data, years, plot_mean = "Trend", plot_variability = "Constant")
Generates a histogram of bootstrapped Mann–Kendall S‐statistics with vertical lines indicating the observed S‐statistic and confidence bounds.
plot_bbmk_test(results, ...)plot_bbmk_test(results, ...)
results |
List of BB‐MK test results generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; A plot containing:
A gray histogram of the distribution of bootstrapped S‐statistics.
A red vertical line at the lower and upper confidence bounds.
A black vertical line at the observed S‐statistic.
data <- rnorm(n = 100, mean = 100, sd = 10) results <- eda_bbmk_test(data, samples = 1000L) plot_bbmk_test(results)data <- rnorm(n = 100, mean = 100, sd = 10) results <- eda_bbmk_test(data, samples = 1000L) plot_bbmk_test(results)
Generates a plot of L-moment ratios with the L-skewness on the x-axis and L-kurtosis on the y-axis. Plots the sample and log-sample L-moment ratios alongside the theoretical L-moment ratios for a set of candidate distributions. Also includes a small inset around the L-moment ratios of the recommended distribution.
plot_lmom_diagram(results, ...)plot_lmom_diagram(results, ...)
results |
List of distribution selection results generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; plot object containing the L-moment ratio diagram, with:
L-moment ratio curves for each 3-parameter distribution.
Points for the L-moment ratios of each 2-parameter distribution.
Sample and log-sample L-moment ratio points.
select_ldistance(), select_lkurtosis(), select_zstatistic()
data <- rnorm(n = 100, mean = 100, sd = 10) results <- select_ldistance(data) plot_lmom_diagram(results)data <- rnorm(n = 100, mean = 100, sd = 10) results <- select_ldistance(data) plot_lmom_diagram(results)
Constructs a two‐panel visualization of the MKS test. The upper panel plots the normalized progressive and regressive Mann–Kendall S‐statistics over time, with dashed confidence bounds and potential trend‐change points. The lower panel contains the annual maximum series data with the change points highlighted.
plot_mks_test(results, show_line = TRUE, ...)plot_mks_test(results, show_line = TRUE, ...)
results |
A list generated by |
show_line |
If |
... |
Optional named arguments: 'title', 'top_xlabel', 'top_ylabel', 'bottom_xlabel' and 'bottom_ylabel'. |
A 'patchwork' object with two 'ggplot2' panels stacked vertically.
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- eda_mks_test(data, years) plot_mks_test(results)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- eda_mks_test(data, years) plot_mks_test(results)
Creates a normalized and detrended quantile–quantile plot (a worm plot) comparing empirical annual maximum series data to quantile estimates from a fitted, parametric, and nonstationary model. Confidence intervals are also provided. The worm plot can be interpreted using the following heuristics:
For a satisfactory fit, the worm (red data points) should be within the 95% confidence interval (dashed black lines).
If the worm (red points) is passes above/below the origin, the model mean is too small/large respectively.
If the worm has a clear positive/negative slope, the model variance is too small/large respectively.
If the worm has a U-shape or inverted U-shape, the model is too skew to the left/right respectively.
plot_nsffa_assessment(results, ...)plot_nsffa_assessment(results, ...)
results |
List; model assessment results generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot containing:
A black line representing a model with no deviation from the empirical quantiles.
Red points denoting the estimated quantiles against the empirical quantiles.
A dashed black line showing the 95% confidence intervals of the residuals.
van Buuren, S., Fredriks, M. Worm plot: a simple diagnostic device for modelling growth reference curves. Statistics in Medicine 20 (8), 1259-1277. doi:10.1002/sim.746
# Initialize example data and params data <- rnorm(n = 100, mean = 100, sd = 10) + seq(from = 1, to = 100) years <- seq(from = 1901, to = 2000) structure <- list(location = TRUE, scale = FALSE) # Evaluate model diagnostics results <- model_assessment(data, "NOR", "MLE", NULL, years, structure) # Generate a model assessment plot plot_nsffa_assessment(results)# Initialize example data and params data <- rnorm(n = 100, mean = 100, sd = 10) + seq(from = 1, to = 100) years <- seq(from = 1901, to = 2000) structure <- list(location = TRUE, scale = FALSE) # Evaluate model diagnostics results <- model_assessment(data, "NOR", "MLE", NULL, years, structure) # Generate a model assessment plot plot_nsffa_assessment(results)
Generates a plot with effective return periods on the x-axis and effective return levels (annual maxima magnitudes) on the y-axis. Each slice is displayed in a distinct color. Confidence bounds are shown as semi-transparent ribbons, and the point estimates are overlaid as solid lines. Return periods have a logarithmic scale.
plot_nsffa_estimates( results, slices = c(1900, 1940, 1980, 2020), periods = c(2, 5, 10, 20, 50, 100), ... )plot_nsffa_estimates( results, slices = c(1900, 1940, 1980, 2020), periods = c(2, 5, 10, 20, 50, 100), ... )
results |
A fitted flood frequency model generated by |
slices |
Default time slices for plotting the return levels if confidence intervals are not provided. |
periods |
Numeric vector used to set the return periods for FFA. All entries must be greater than or equal to 1. |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot with one line and ribbon per slice.
# Fit a nonstationary model data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) results <- fit_mle( data, "GEV", ns_years = years, ns_structure = ns_structure ) # Generate the plot plot_nsffa_estimates(results)# Fit a nonstationary model data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) results <- fit_mle( data, "GEV", ns_years = years, ns_structure = ns_structure ) # Generate the plot plot_nsffa_estimates(results)
Generates a plot showing probability densities of a nonstationary model for selected time slices (left panel) and the data (right panel).
plot_nsffa_fit( results, slices = c(1925, 1950, 1975, 2000), show_line = TRUE, ... )plot_nsffa_fit( results, slices = c(1925, 1950, 1975, 2000), show_line = TRUE, ... )
results |
A fitted flood frequency model generated by |
slices |
Years at which to plot the nonstationary probability model. |
show_line |
If |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot showing:
The likelihood function of the distribution plotted vertically on the left panel.
The data, connected with a line if show_line == TRUE, on the right panel.
data <- rnorm(n = 100, mean = 100, sd = 10) + seq(1, 100) years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) results <- fit_mle( data, "GEV", ns_years = years, ns_structure = ns_structure ) plot_nsffa_fit(results)data <- rnorm(n = 100, mean = 100, sd = 10) + seq(1, 100) years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) results <- fit_mle( data, "GEV", ns_years = years, ns_structure = ns_structure ) plot_nsffa_fit(results)
Creates a two‐panel visualization of the Mann–Whitney–Pettitt test. The
upper panel plots the Pettitt statistic over time along with the
significance threshold and potential change point. The lower panel displays
the annual maximum series data with an optional trend line, the period
mean(s), and potential change point(s).
plot_pettitt_test(results, show_line = TRUE, ...)plot_pettitt_test(results, show_line = TRUE, ...)
results |
A list generated by |
show_line |
If |
... |
Optional named arguments: 'title', 'top_xlabel', 'top_ylabel', 'bottom_xlabel' and 'bottom_ylabel'. |
A 'patchwork' object with two 'ggplot2' panels stacked vertically.
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- eda_pettitt_test(data, years) plot_pettitt_test(results)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- eda_pettitt_test(data, years) plot_pettitt_test(results)
Generates a residual plot of Sen's estimator applied to annual maximum series data (or the variability of the data) with a horizontal dashed line at zero and an annotation indicating the p-value of the Runs test.
plot_runs_test(results, ...)plot_runs_test(results, ...)
results |
A list of runs test results generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot containing:
Black points for the residual at each year.
A red dashed horizontal line at .
A text annotation “Runs p-value: X.XXX” in the plot area.
# Initialize data and years data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) # Generate the runs test plot sens_trend <- eda_sens_trend(data, years) results <- eda_runs_test(sens_trend$residuals, years) plot_runs_test(results)# Initialize data and years data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) # Generate the runs test plot sens_trend <- eda_sens_trend(data, years) results <- eda_runs_test(sens_trend$residuals, years) plot_runs_test(results)
Creates a quantile–quantile plot comparing observed annual maximum series data to quantile estimates from a fitted parametric model. The 1:1 line is drawn in black and the parametric model estimates are plotted as semi‐transparent red points.
plot_sffa_assessment(results, ...)plot_sffa_assessment(results, ...)
results |
List; model assessment results generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot containing:
A black line representing a model with no deviation from the empirical quantiles.
Red points denoting the estimated quantiles against the empirical quantiles.
# Initialize example data data <- rnorm(n = 100, mean = 100, sd = 10) # Evaluate model diagnostics results <- model_assessment(data, "NOR", "L-moments") # Generate a model assessment plot plot_sffa_assessment(results)# Initialize example data data <- rnorm(n = 100, mean = 100, sd = 10) # Evaluate model diagnostics results <- model_assessment(data, "NOR", "L-moments") # Generate a model assessment plot plot_sffa_assessment(results)
Generates a plot with return periods on the x-axis and return levels (annual maxima magnitudes) on the y-axis for S-FFA. The confidence bound is shown as a semi-transparent ribbon, and the point estimates are overlaid as a solid line. Return periods are shown on a logarithmic scale.
plot_sffa_estimates(results, periods = c(2, 5, 10, 20, 50, 100), ...)plot_sffa_estimates(results, periods = c(2, 5, 10, 20, 50, 100), ...)
results |
A fitted flood frequency model generated by |
periods |
Numeric vector used to set the return periods for FFA. All entries must be greater than or equal to 1. |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot showing:
A solid black line for the point estimates produced by the model.
A semi-transparent gray ribbon indicating the confidence interval, if given.
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- fit_lmoments(data, "WEI") plot_sffa_estimates(results)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- fit_lmoments(data, "WEI") plot_sffa_estimates(results)
Generates a plot showing the probability density of a stationary model (left panel) and the data (right panel).
plot_sffa_fit(results, show_line = TRUE, ...)plot_sffa_fit(results, show_line = TRUE, ...)
results |
A fitted flood frequency model generated by |
show_line |
If |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot showing:
The likelihood function of the distribution plotted vertically on the left panel.
The data, connected with a line if show_line == TRUE, on the right panel.
data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- fit_lmoments(data, "WEI") plot_sffa_fit(results)data <- rnorm(n = 100, mean = 100, sd = 10) years <- seq(from = 1901, to = 2000) results <- fit_lmoments(data, "WEI") plot_sffa_fit(results)
Visualizes Spearman’s rho serial correlation coefficients with shaded points indicating statistical significance.
plot_spearman_test(results, ...)plot_spearman_test(results, ...)
results |
A list generated by |
... |
Optional named arguments: 'title', 'xlabel', and 'ylabel'. |
ggplot; a plot showing:
Vertical segments from up to each value at its lag.
Filled circles at each lag, filled black if serial correlation is detected.
data <- rnorm(n = 100, mean = 100, sd = 10) results <- eda_spearman_test(data) plot_spearman_test(results)data <- rnorm(n = 100, mean = 100, sd = 10) results <- eda_spearman_test(data) plot_spearman_test(results)
Selects a distribution from a set of candidate distributions by minimizing the
Euclidean distance between the theoretical L-moment ratios
and the sample L-moment ratios .
For NS-FFA: To select a distribution for a nonstationary model, include the
observation years (ns_years) and the nonstationary model structure
(ns_structure). Then, this method will detrend the original, nonstationary data
internally using the data_decomposition() function prior to distribution selection.
select_ldistance(data, ns_years = NULL, ns_structure = NULL)select_ldistance(data, ns_years = NULL, ns_structure = NULL)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
For each candidate distribution, this method computes the Euclidean distance between
sample L-moment ratios (, ) and the closest point on the
theoretical distribution's L-moment curve. For two-parameter distributions (Gumbel,
Normal, Log-Normal), the theoretical L-moment ratios are compared directly with
the sample L-moment ratios. The distribution with the minimum distance is selected.
If a distribution is fit to log-transformed data (Log-Normal or Log-Pearson Type
III), the L-moment ratios for the log-transformed sample are used instead.
A list with the results of distribution selection:
method: "L-distance".
decomposed_data: The detrended dataset used to compute the L-moments. For S-FFA,
this is the data argument. For NS-FFA, it is output of data_decomposition().
metrics: A list of L-distance metrics for each candidate distribution.
recommendation: The name of the distribution with the smallest L-distance.
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
utils_sample_lmoments(), select_lkurtosis(), select_zstatistic(),
plot_lmom_diagram()
data <- rnorm(n = 100, mean = 100, sd = 10) select_ldistance(data)data <- rnorm(n = 100, mean = 100, sd = 10) select_ldistance(data)
Selects a probability distribution by minimizing the absolute distance
between the theoretical L-kurtosis () and the sample L-kurtosis
(). Only supports 3-parameter distributions.
For NS-FFA: To select a distribution for a nonstationary model, include the
observation years (ns_years) and the nonstationary model structure
(ns_structure). Then, this method will detrend the original, nonstationary data
internally using the data_decomposition() function prior to distribution selection.
select_lkurtosis(data, ns_years = NULL, ns_structure = NULL)select_lkurtosis(data, ns_years = NULL, ns_structure = NULL)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
This method computes the distance between the sample and theoretical L-kurtosis
values at a fixed L-skewness. For three parameter distributions, the shape parameter
that best replicates the sample L-skewness is determined using stats::optim().
A list with the results of distribution selection:
method: "L-kurtosis".
decomposed_data: The detrended dataset used to compute the L-moments. For S-FFA,
this is the data argument. For NS-FFA, it is output of data_decomposition().
metrics: A list of L-kurtosis metrics for each distribution.
recommendation: Name of the distribution with the smallest L-kurtosis metric.
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
utils_sample_lmoments(), select_ldistance(), select_zstatistic(),
plot_lmom_diagram()
data <- rnorm(n = 100, mean = 100, sd = 10) select_lkurtosis(data)data <- rnorm(n = 100, mean = 100, sd = 10) select_lkurtosis(data)
Selects the best-fit distribution by comparing a bias-corrected Z-statistic for
the sample L-kurtosis () against the theoretical L-moments for a set
of candidate distributions. The distribution with the smallest absolute Z-statistic
is selected.
For NS-FFA: To select a distribution for a nonstationary model, include the
observation years (ns_years) and the nonstationary model structure
(ns_structure). Then, this method will detrend the original, nonstationary data
internally using the data_decomposition() function prior to distribution selection.
select_zstatistic(data, ns_years = NULL, ns_structure = NULL, samples = 10000L)select_zstatistic(data, ns_years = NULL, ns_structure = NULL, samples = 10000L)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
samples |
Integer scalar. The number of bootstrap samples. Default is 10000. |
First, this method fits a four-parameter Kappa distribution to both the original and log-transformed data. Then, bootstrapping is used to estimate the bias and variance of the L-kurtosis. These values, along with the difference between the sample and theoretical L-kurtosis, are used to compute the Z-statistic for each distribution.
A list with the results of distribution selection:
method: "Z-selection".
decomposed_data: The detrended dataset used to compute the L-moments. For S-FFA,
this is the data argument. For NS-FFA, it is output of data_decomposition().
metrics: List of computed Z-statistics for each candidate distribution.
recommendation: Name of the distribution with the smallest Z-statistic.
reg_params: Kappa distribution parameters for the data.
reg_bias_t4: Bias of the L-kurtosis from the bootstrap.
reg_std_t4: Standard deviation of the L-kurtosis from the bootstrap.
log_params: Kappa distribution parameters for the log-transformed data.
log_bias_t4: Bias of the L-kurtosis from the bootstrap using log_params.
log_std_t4: Standard deviation of the L-kurtosis from the bootstrap using log_params.
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
select_ldistance(), select_lkurtosis(), fit_lmoments_kappa(),
utils_quantiles(), plot_lmom_diagram()
data <- rnorm(n = 100, mean = 100, sd = 10) select_zstatistic(data)data <- rnorm(n = 100, mean = 100, sd = 10) select_zstatistic(data)
Computes return level estimates and confidence intervals at the specified return periods (defaults to 2, 5, 10, 20, 50, and 100 years) using the parametric bootstrap. This function supports many probability models and parameter estimation methods.
For NS-FFA: To perform uncertainty quantification for a nonstationary model,
include the observation years (ns_years), the nonstationary model structure
(ns_structure), and a list of years at which to compute the return level estimates
and confidence intervals (ns_slices).
uncertainty_bootstrap( data, distribution, method, prior = NULL, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, samples = 10000L, periods = c(2, 5, 10, 20, 50, 100) )uncertainty_bootstrap( data, distribution, method, prior = NULL, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, samples = 10000L, periods = c(2, 5, 10, 20, 50, 100) )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
method |
Character scalar specifying the estimation method.
Must be |
prior |
Numeric vector of length 2. Specifies the parameters of the
Beta prior for the shape parameter |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
ns_slices |
For NS-FFA only: Numeric vector specifying the years at which to
evaluate the return levels confidence intervals of a nonstationary probability
distribution. |
alpha |
Numeric scalar in |
samples |
Integer scalar. The number of bootstrap samples. Default is 10000. |
periods |
Numeric vector used to set the return periods for FFA. All entries must be greater than or equal to 1. |
Bootstrap samples are obtained from the fitted distribution via inverse transform
sampling. For each bootstrapped sample, the parameters are re-estimated based on the
method argument. Then, the bootstrapped parameters are used to compute a new set of
bootstrapped quantiles. Confidence intervals are obtained from the empirical
nonexceedance probabilities of the bootstrapped quantiles.
A list containing the following six items:
method: "Bootstrap"
distribution: The distribution argument.
params: The fitted parameters.
ns_structure: The ns_structure argument, if given.
ns_slices: The ns_slices argument, if given.
ci: A dataframe containing confidence intervals (S-FFA only)
ci_list: A list of dataframes containing confidence intervals (NS-FFA only).
The dataframe(s) in ci and ci_list have four columns:
estimates: Estimated quantiles for each return period.
lower: Lower bound of the confidence interval for each return period.
upper: Upper bound of the confidence interval for each return period.
periods: The periods argument.
The parametric bootstrap is known to give unreasonably wide confidence intervals
for small datasets. If this method yields a confidence interval that is at least
5 times greater than the magnitude of the return levels, it will return an error
and recommend uncertainty_rfpl() or uncertainty_rfgpl() as alternatives.
Vidrio-Sahagún, C.T., He, J. Enhanced profile likelihood method for the nonstationary hydrological frequency analysis, Advances in Water Resources 161, 10451 (2022). doi:10.1016/j.advwatres.2022.104151
fit_lmoments(), fit_mle(), fit_gmle(), utils_sample_lmoments()
utils_quantiles(), plot_sffa_estimates(), plot_nsffa_estimates()
data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_bootstrap(data, "WEI", "L-moments")data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_bootstrap(data, "WEI", "L-moments")
Calculates return level estimates and confidence intervals at specified return periods (defaults to 2, 5, 10, 20, 50, and 100 years) using the regula-falsi generalized profile likelihood root‐finding method for the GEV distribution.
For NS-FFA: To perform uncertainty quantification for a nonstationary model,
include the observation years (ns_years), the nonstationary model structure
(ns_structure), and a list of years at which to compute the return level estimates
and confidence intervals (ns_slices).
uncertainty_rfgpl( data, prior, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, periods = c(2, 5, 10, 20, 50, 100), tolerance = 0.01 )uncertainty_rfgpl( data, prior, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, periods = c(2, 5, 10, 20, 50, 100), tolerance = 0.01 )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
prior |
Numeric vector of length 2. Specifies the parameters of the
Beta prior for the shape parameter |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
ns_slices |
For NS-FFA only: Numeric vector specifying the years at which to
evaluate the return levels confidence intervals of a nonstationary probability
distribution. |
alpha |
Numeric scalar in |
periods |
Numeric vector used to set the return periods for FFA. All entries must be greater than or equal to 1. |
tolerance |
The log-likelihood tolerance for Regula-Falsi convergence (default is 0.01). |
Uses fit_gmle() to obtain the maximum generalized log‐likelihood.
Defines an objective function by reparameterizing the
generalized log-likelihood.
Iteratively brackets the root by rescaling initial guesses by 0.05 until
changes sign.
Uses the regula-falsi method to solve for each
return period probability.
Returns lower and upper confidence bounds at significance level alpha.
A list containing the following six items:
method: "RFGPL"
distribution: "GEV"
params: The fitted parameters.
ns_structure: The ns_structure argument, if given.
ns_slices: The ns_slices argument, if given.
ci: A dataframe containing confidence intervals (S-FFA only)
ci_list: A list of dataframes containing confidence intervals (NS-FFA only).
The dataframe(s) in ci and ci_list have four columns:
estimates: Estimated quantiles for each return period.
lower: Lower bound of the confidence interval for each return period.
upper: Upper bound of the confidence interval for each return period.
periods: The periods argument.
RFGPL uncertainty quantification can be numerically unstable for some datasets.
If this function encounters an issue, it will return an error and recommend
uncertainty_bootstrap() instead.
Vidrio-Sahagún, C.T., He, J. Enhanced profile likelihood method for the nonstationary hydrological frequency analysis, Advances in Water Resources 161, 10451 (2022). doi:10.1016/j.advwatres.2022.104151
Vidrio-Sahagún, C.T., He, J. & Pietroniro, A. Multi-distribution regula-falsi profile likelihood method for nonstationary hydrological frequency analysis. Stochastic Environmental Research and Risk Assessment 38, 843–867 (2024). doi:10.1007/s00477-023-02603-0
utils_quantiles(), uncertainty_bootstrap(), uncertainty_rfpl(),
plot_sffa_estimates(), plot_nsffa_estimates()
data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_rfgpl(data, c(6, 9))data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_rfgpl(data, c(6, 9))
Calculates return level estimates and confidence intervals at specified return periods (defaults to 2, 5, 10, 20, 50, and 100 years) using the regula-falsi profile likelihood root‐finding method.
For NS-FFA: To perform uncertainty quantification for a nonstationary model,
include the observation years (ns_years), the nonstationary model structure
(ns_structure), and a list of years at which to compute the return level estimates
and confidence intervals (ns_slices).
uncertainty_rfpl( data, distribution, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, periods = c(2, 5, 10, 20, 50, 100), tolerance = 0.01 )uncertainty_rfpl( data, distribution, ns_years = NULL, ns_structure = NULL, ns_slices = NULL, alpha = 0.05, periods = c(2, 5, 10, 20, 50, 100), tolerance = 0.01 )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
ns_slices |
For NS-FFA only: Numeric vector specifying the years at which to
evaluate the return levels confidence intervals of a nonstationary probability
distribution. |
alpha |
Numeric scalar in |
periods |
Numeric vector used to set the return periods for FFA. All entries must be greater than or equal to 1. |
tolerance |
The log-likelihood tolerance for Regula-Falsi convergence (default is 0.01). |
Uses fit_mle() to obtain the maximum log‐likelihood.
Defines an objective function by reparameterizing the
log-likelihood.
Iteratively brackets the root by rescaling initial guesses by 0.05 until
changes sign.
Uses the regula-falsi method to solve for each
return period probability.
Returns lower and upper confidence bounds at significance level alpha.
A list containing the following four items:
method: "RFPL"
distribution: The distribution argument.
params: The fitted parameters.
ns_structure: The ns_structure argument, if given.
ns_slices: The ns_slices argument, if given.
ci: A dataframe containing confidence intervals (S-FFA only)
ci_list: A list of dataframes containing confidence intervals (NS-FFA only).
The dataframe(s) in ci and ci_list have four columns:
estimates: Estimated quantiles for each return period.
lower: Lower bound of the confidence interval for each return period.
upper: Upper bound of the confidence interval for each return period.
periods: The periods argument.
RFPL uncertainty quantification can be numerically unstable for some datasets.
If this function encounters an issue, it will return an error and recommend
using the parametric bootstrap method uncertainty_bootstrap() instead.
Vidrio-Sahagún, C.T., He, J. Enhanced profile likelihood method for the nonstationary hydrological frequency analysis, Advances in Water Resources 161, 10451 (2022). doi:10.1016/j.advwatres.2022.104151
Vidrio-Sahagún, C.T., He, J. & Pietroniro, A. Multi-distribution regula-falsi profile likelihood method for nonstationary hydrological frequency analysis. Stochastic Environmental Research and Risk Assessment 38, 843–867 (2024). doi:10.1007/s00477-023-02603-0
utils_quantiles(), uncertainty_bootstrap(), uncertainty_rfgpl(),
plot_sffa_estimates(), plot_nsffa_estimates()
data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_rfpl(data, "GLO")data <- rnorm(n = 100, mean = 100, sd = 10) uncertainty_rfpl(data, "GLO")
Compute probabilities from quantiles for both stationary and nonstationary models.
For NS-FFA: To compute the probabilities for a nonstationary model, specify a
time slice (ns_slice) and the nonstationary model structure (ns_structure).
utils_cdf(q, distribution, params, ns_slice = 0, ns_structure = NULL)utils_cdf(q, distribution, params, ns_slice = 0, ns_structure = NULL)
q |
Numeric vector of quantiles with no missing values. |
distribution |
A three-character code indicating the distribution family.
Must be |
params |
Numeric vector of distribution parameters, in the order (location,
scale, shape). The length must be between 2 and 5, depending on the specified
|
ns_slice |
For NS-FFA only: Numeric scalar specifying the year at which to
evaluate the quantiles of a nonstationary probability distribution. |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
A numeric vector of quantiles with the same length as q.
q <- seq(1, 10) params <- c(1, 1, 1) utils_cdf(q, "GEV", params)q <- seq(1, 10) params <- c(1, 1, 1) utils_cdf(q, "GEV", params)
Computes the generalized log-likelihood for stationary and nonstationary variants of the Generalized Extreme Value (GEV) distribution with a geophysical (Beta) prior distribution for the shape parameter (Martins and Stedinger, 2000).
For NS-FFA: To compute the generalized log-likelihood for a nonstationary
probability model, include the observation years (ns_years) and the nonstationary
model structure (ns_structure).
utils_generalized_likelihood( data, params, prior, ns_years = NULL, ns_structure = NULL )utils_generalized_likelihood( data, params, prior, ns_years = NULL, ns_structure = NULL )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
params |
Numeric vector of distribution parameters, in the order (location,
scale, shape). The length must be between 2 and 5, depending on the specified
|
prior |
Numeric vector of length 2. Specifies the parameters of the
Beta prior for the shape parameter |
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
The generalized log-likelihood is defined as sum of (1) the log-likelihood and (2)
the log-density of the Beta prior with parameters . The contribution of
the prior is:
Numeric scalar. The generalized log-likelihood value.
El Adlouni, S., Ouarda, T.B.M.J., Zhang, X., Roy, R., Bobee, B., 2007. Generalized maximum likelihood estimators for the nonstationary generalized extreme value model. Water Resources Research 43 (3), 1–13. doi:10.1029/2005WR004545
Martins, E. S., and Stedinger, J. R. (2000). Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research, 36(3), 737–744. doi:10.1029/1999WR900330
data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 10, 0.1) prior <- c(1, 1) # Compute the generalized log-likelihood utils_generalized_likelihood(data, params, prior)data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 10, 0.1) prior <- c(1, 1) # Compute the generalized log-likelihood utils_generalized_likelihood(data, params, prior)
Compute the log-likelihood for stationary and nonstationary probability models.
For NS-FFA: To compute the log-likelihood for a nonstationary probability model,
include the observation years (ns_years) and the nonstationary model structure
(ns_structure).
utils_log_likelihood( data, distribution, params, ns_years = NULL, ns_structure = NULL )utils_log_likelihood( data, distribution, params, ns_years = NULL, ns_structure = NULL )
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
distribution |
A three-character code indicating the distribution family.
Must be |
params |
Numeric vector of distribution parameters, in the order (location,
scale, shape). The length must be between 2 and 5, depending on the specified
|
ns_years |
For NS-FFA only: Numeric vector of observation years corresponding
to |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
Numeric scalar. The log-likelihood value.
utils_generalized_likelihood()
data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 1, 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) # Compute the log-likelihood utils_log_likelihood(data, "NOR", params, ns_years, ns_structure)data <- rnorm(n = 100, mean = 100, sd = 10) params <- c(100, 1, 10) ns_years <- seq(from = 1901, to = 2000) ns_structure <- list(location = TRUE, scale = FALSE) # Compute the log-likelihood utils_log_likelihood(data, "NOR", params, ns_years, ns_structure)
Compute the quantiles for stationary and nonstationary probability models.
For NS-FFA: To compute the quantiles for a nonstationary probability model,
specify a time slice (ns_slice) and the nonstationary model structure
(ns_structure).
utils_quantiles(p, distribution, params, ns_slice = 0, ns_structure = NULL)utils_quantiles(p, distribution, params, ns_slice = 0, ns_structure = NULL)
p |
Numeric vector of probabilities between 0 and 1 with no missing values. |
distribution |
A three-character code indicating the distribution family.
Must be |
params |
Numeric vector of distribution parameters, in the order (location,
scale, shape). The length must be between 2 and 5, depending on the specified
|
ns_slice |
For NS-FFA only: Numeric scalar specifying the year at which to
evaluate the quantiles of a nonstationary probability distribution. |
ns_structure |
For NS-FFA only: Named list indicating which distribution parameters are modeled as nonstationary. Must contain two logical scalars:
|
A numeric vector of quantiles with the same length as p.
p <- runif(n = 100) params <- c(1, 1, 1) utils_quantiles(p, "GEV", params)p <- runif(n = 100) params <- c(1, 1, 1) utils_quantiles(p, "GEV", params)
Computes the first four sample L-moments and L-moment ratios from a numeric vector of data. L-moments are linear combinations of order statistics that provide robust alternatives to conventional moments, with advantages in parameter estimation for heavy-tailed and skewed distributions.
utils_sample_lmoments(data)utils_sample_lmoments(data)
data |
Numeric vector of observed annual maximum series values. Must be strictly positive, finite, and not missing. |
Given probability weighted moments ,
the first four sample L-moments are:
Then, the sample L-skewness is and the sample L-kurtosis
is .
A numeric vector containing the first four sample L-moments and L-moment ratios:
: L-mean
: L-variance
: L-skewness
: L-kurtosis
Hosking, J. R. M. (1990). L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B (Methodological), 52(1), 105–124.
data <- rnorm(n = 100, mean = 100, sd = 10) utils_sample_lmoments(data)data <- rnorm(n = 100, mean = 100, sd = 10) utils_sample_lmoments(data)
Computes the first four L-moments and L-moment ratios for stationary probability models.
utils_theoretical_lmoments(distribution, params)utils_theoretical_lmoments(distribution, params)
distribution |
A three-character code indicating the distribution family.
Must be |
params |
Numeric vector of distribution parameters, in the order (location,
scale, shape). The length must be between 2 and 5, depending on the specified
|
The distributions "GUM", "NOR", "GEV", "GLO", and "WEI" have
closed-form solutions for the L-moments and L-moment ratios in terms of the parameters.
The distributions "GNO" and "PE3" use rational approximations of the L-moment ratios
from Hosking (1997). The L-moments ratios for the "LNO" and "LP3" distributions
are should be compared to the log-transformed data and are thus identical to the "NOR"
and "PE3" distributions respectively.
A numeric vector of with four elements:
: L-mean
: L-variance
: L-skewness
: L-kurtosis
Hosking, J.R.M. & Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-Moments. Cambridge University Press, New York, USA.
utils_theoretical_lmoments("GEV", c(1, 1, 1))utils_theoretical_lmoments("GEV", c(1, 1, 1))