Title: | Goodness-of-Fit Tests Based on Kullback-Leibler Divergence |
---|---|
Description: | An implementation of Vasicek and Song goodness-of-fit tests. Several functions are provided to estimate differential Shannon entropy, i.e., estimate Shannon entropy of real random variables with density, and test the goodness-of-fit of some family of distributions, including uniform, Gaussian, log-normal, exponential, gamma, Weibull, Pareto, Fisher, Laplace and beta distributions; see Lequesne and Regnault (2020) <doi:10.18637/jss.v096.c01>. |
Authors: | Justine Lequesne [aut], Philippe Regnault [aut, cre] |
Maintainer: | Philippe Regnault <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-1 |
Built: | 2024-11-06 04:11:43 UTC |
Source: | https://github.com/cran/vsgoftest |
The DESCRIPTION file:
Package: | vsgoftest |
Type: | Package |
Title: | Goodness-of-Fit Tests Based on Kullback-Leibler Divergence |
Version: | 1.0-1 |
Date: | 2020-12-17 |
Author: | Justine Lequesne [aut], Philippe Regnault [aut, cre] |
Maintainer: | Philippe Regnault <[email protected]> |
Description: | An implementation of Vasicek and Song goodness-of-fit tests. Several functions are provided to estimate differential Shannon entropy, i.e., estimate Shannon entropy of real random variables with density, and test the goodness-of-fit of some family of distributions, including uniform, Gaussian, log-normal, exponential, gamma, Weibull, Pareto, Fisher, Laplace and beta distributions; see Lequesne and Regnault (2020) <doi:10.18637/jss.v096.c01>. |
Depends: | stats, fitdistrplus |
Imports: | Rcpp (>= 0.12.1) |
Suggests: | knitr |
VignetteBuilder: | knitr |
LinkingTo: | Rcpp |
Encoding: | UTF-8 |
License: | GPL (>= 2) |
Packaged: | 2020-12-17 13:30:49 UTC; philippe |
NeedsCompilation: | yes |
Date/Publication: | 2020-12-17 16:30:02 UTC |
Repository: | https://testpregnault.r-universe.dev |
RemoteUrl: | https://github.com/cran/vsgoftest |
RemoteRef: | HEAD |
RemoteSha: | 4d091f66be09886f437f673a47f238ec9a5e6a39 |
Index of help topics:
contaminants | Organic and inorganic contaminant concentration data |
dlaplace | The Laplace distribution |
dpareto | The Pareto distribution |
entropy.estimate | Vasicek estimate of differential Shannon Entropy |
vs.test | Vasicek-Song goodness-of-fit test for various distributions |
vsgoftest-package | Goodness-of-Fit Tests Based on Kullback-Leibler Divergence |
Further information is available in the following vignettes:
vsgoftest_tutorial | Tutorial |
Vasicek, O., A test for normality based on sample entropy, Journal of the Royal Statistical Society, 38(1), 54-59 (1976).
Song, K. S., Goodness-of-fit tests based on Kullback-Leibler discrimination information, Information Theory, IEEE Transactions on, 48(5), 1103-1117 (2002).
Girardin, V., Lequesne, J. Entropy-based goodness-of-fit tests - a unifying framework. Application to DNA replication. Communications in Statistics: Theory and Methods (2017). https://doi.org/10.1080/03610926.2017.1401084
Lequesne, J., Regnault, P. vsgoftest: An R Package for Goodness-of-Fit Testing Based on Kullback-Leibler Divergence. Journal of Statistical Software, 96 (2020). doi:10.18637/jss.v096.c01
    set.seed(1)
    samp <- rnorm(50, mean = 2, sd = 3)
    ##Estimating entropy
    entropy.estimate(x = samp, window = 8)
    log(2*pi*exp(1)*3^2)/2 #true value of entropy of the normal distribution with sd = 3
    ##Testing normality
    vs.test(x = samp, densfun = 'dnorm', param = c(2,3), B = 500) #Simple null hypothesis
    vs.test(x = samp, densfun = 'dnorm', B = 500) #Composite null hypothesis
Organic and inorganic contaminant concentration data from Superfund sites; see Singh et al. (1997).
data(contaminants)
Four numeric vectors of respective lengths 17, 17, 23 and 23.
aluminium1 and manganese are groundwater concentration measurements of aluminium and manganese from seventeen wells at the Naval Construction Battalion Center Superfund Site in Rhode Island.
aluminium2 and toluene are concentration measurements of aluminium and toluene compiled from two waste piles at Elmara School Superfund site in Washington County, PA.
Singh, A. K., Singh, A., Engelhardt, M. The lognormal distribution in environmental applications, Technology Support Center Issue Paper, US EPA (1997).
Density, cumulative distribution function, quantile function and random generation for the Laplace distribution.
    dlaplace(x, mu, b, log = FALSE)
    plaplace(q, mu, b, lower.tail = TRUE, log.p = FALSE)
    qlaplace(p, mu, b, lower.tail = TRUE, log.p = FALSE)
    rlaplace(n, mu, b)
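x, q | Vector of quantiles. |
p | Vector of probabilities. |
n | Number of observations. |
mu | Location parameter. |
b | Scale parameter. |
log, log.p | Logical; if TRUE, densities and probabilities are given on the log scale. |
lower.tail | Logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |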
The Laplace distribution with location parameter $\mu$ and scale parameter $b$ has density
$$f(x) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right).$$
dlaplace gives the density, plaplace gives the distribution function, qlaplace gives the quantile function, and rlaplace generates random deviates.
The length of the result is determined by n for rlaplace, and is the maximum of the lengths of the numerical arguments for the other functions.
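As a rough illustration of the four functions described above, the following base-R sketch writes out the standard Laplace density, distribution function, quantile function and an inverse-transform sampler. The helper names (laplace_density and so on) are hypothetical and this is not the package's own code; it assumes that mu is the location and b the scale, as in the usual Laplace parametrisation.

```r
# Sketch only: hypothetical helpers, assuming density exp(-|x - mu| / b) / (2 * b).
laplace_density <- function(x, mu, b) {
  exp(-abs(x - mu) / b) / (2 * b)
}

laplace_cdf <- function(q, mu, b) {
  # Closed form on each side of the location mu
  ifelse(q < mu,
         0.5 * exp((q - mu) / b),
         1 - 0.5 * exp(-(q - mu) / b))
}

laplace_quantile <- function(p, mu, b) {
  # Inverse of laplace_cdf for p in (0, 1)
  mu - b * sign(p - 0.5) * log(1 - 2 * abs(p - 0.5))
}

laplace_random <- function(n, mu, b) {
  # Inverse-transform sampling from uniform draws
  laplace_quantile(runif(n), mu, b)
}

# Sanity checks: total mass (integrated on each side of mu) and round trip
left_mass  <- integrate(laplace_density, -Inf, 2, mu = 2, b = 1)$value
right_mass <- integrate(laplace_density, 2, Inf, mu = 2, b = 1)$value
total_mass <- left_mass + right_mass
probs <- c(0.1, 0.5, 0.9)
round_trip <- laplace_cdf(laplace_quantile(probs, mu = 2, b = 1), mu = 2, b = 1)

set.seed(1)
draws <- laplace_random(100, mu = 2, b = 1)
```

If the assumed parametrisation matches the package, laplace_density(x, 2, 1) and dlaplace(x, mu = 2, b = 1) should agree; this has to be checked against the installed package.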
J. Lequesne [email protected]
    set.seed(1)
    rlaplace(100, mu = 2, b = 1)
Density, cumulative distribution function, quantile function and random generation for the Pareto distribution.
    dpareto(x, mu, c, log = FALSE)
    ppareto(q, mu, c, lower.tail = TRUE, log.p = FALSE)
    qpareto(p, mu, c, lower.tail = TRUE, log.p = FALSE)
    rpareto(n, mu, c)
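x, q | Vector of quantiles. |
p | Vector of probabilities. |
n | Number of observations. |
mu | Shape parameter. |
c | Scale parameter. |
log, log.p | Logical; if TRUE, densities and probabilities are given on the log scale. |
lower.tail | Logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |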
The Pareto distribution with shape parameter $\mu$ and scale parameter $c$ has density
$$f(x) = \frac{\mu c^{\mu}}{x^{\mu + 1}}$$
for $x \geq c$.
dpareto gives the density, ppareto gives the distribution function, qpareto gives the quantile function, and rpareto generates random deviates.
The length of the result is determined by n for rpareto, and is the maximum of the lengths of the numerical arguments for the other functions.
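The Pareto functions can be sketched the same way in base R. The helper names below are hypothetical and the code is not taken from the package; it assumes the density mu * c^mu / x^(mu + 1) on x >= c, with shape mu and scale c.

```r
# Sketch only: hypothetical helpers, assuming shape mu and scale c.
pareto_density <- function(x, mu, c) {
  # Zero below the scale parameter c
  ifelse(x >= c, mu * c^mu / x^(mu + 1), 0)
}

pareto_cdf <- function(q, mu, c) {
  ifelse(q >= c, 1 - (c / q)^mu, 0)
}

pareto_quantile <- function(p, mu, c) {
  # Inverse of pareto_cdf for p in [0, 1)
  c * (1 - p)^(-1 / mu)
}

pareto_random <- function(n, mu, c) {
  # Inverse-transform sampling from uniform draws
  pareto_quantile(runif(n), mu, c)
}

# Sanity checks: total mass on [c, Inf) and a round trip through the quantile
total_mass <- integrate(pareto_density, 1, Inf, mu = 2, c = 1)$value
upper_quartile <- pareto_quantile(0.75, mu = 2, c = 1)
round_trip <- pareto_cdf(upper_quartile, mu = 2, c = 1)

set.seed(1)
draws <- pareto_random(100, mu = 2, c = 1)
```

Every simulated value is at least c, since the support of the distribution starts at the scale parameter.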
J. Lequesne [email protected]
Arnold, B.C. Pareto distribution, International Cooperative Publishing House, Fairland (1983).
Philbrick, S.W. A practical guide to the single parameter Pareto distribution. Proceedings of the Casualty Actuarial Society LXXII, 44, 44-85 (1985).
    n <- 100
    rpareto(n, mu = 2, c = 1)
Computes Vasicek estimate of differential Shannon entropy from a numeric sample.
    entropy.estimate(x, window)
x | Numeric sample. |
window | Window size of the estimator. |
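The Vasicek estimator of Shannon entropy is defined, for a random sample $X_1, \dots, X_n$, by
$$V_{mn} = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{2m}\left[X_{(i+m)} - X_{(i-m)}\right]\right),$$
where $X_{(i)}$ is the $i$-th order statistic, $m < n/2$ is the window size, and $X_{(i)} = X_{(1)}$ for $i < 1$ and $X_{(i)} = X_{(n)}$ for $i > n$.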
A single numeric value representing the Vasicek estimate of entropy of the sample.
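The estimator can be written in a few lines of base R. The sketch below is not the package's implementation (the package is compiled, and its internals may differ); the function name vasicek_entropy is hypothetical. It follows Vasicek's spacing formula, clamping the order statistics at the sample minimum and maximum.

```r
# Sketch only: hypothetical helper implementing Vasicek's spacing estimator.
vasicek_entropy <- function(x, window) {
  n <- length(x)
  stopifnot(window >= 1, window < n / 2)
  xs <- sort(x)                          # order statistics X_(1) <= ... <= X_(n)
  idx <- seq_len(n)
  upper <- xs[pmin(idx + window, n)]     # X_(i+m), clamped to X_(n)
  lower <- xs[pmax(idx - window, 1)]     # X_(i-m), clamped to X_(1)
  mean(log(n / (2 * window) * (upper - lower)))
}

# Compare with the entropy of the standard normal distribution
set.seed(2)
samp <- rnorm(100, mean = 0, sd = 1)
estimate <- vasicek_entropy(samp, window = 8)
truth <- log(2 * pi * exp(1)) / 2
```

The estimate is expected to be close to, but not equal to, the true value, because the spacing estimator is biased in finite samples.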
J. Lequesne [email protected]
Vasicek, O., A test for normality based on sample entropy, Journal of the Royal Statistical Society, 38(1), 54-59 (1976).
See also vs.test, which performs Vasicek-Song goodness-of-fit tests to the specified maximum entropy distribution family.
    set.seed(2)
    samp <- rnorm(100, mean = 0, sd = 1)
    entropy.estimate(x = samp, window = 8)
    log(2*pi*exp(1))/2 #true value of entropy of normal distribution
Performs Vasicek-Song goodness-of-fit test to the specified distribution family.
    vs.test(x, densfun, param = NULL, simulate.p.value = NULL, B = 5000,
            delta = NULL, extend = FALSE, relax = FALSE)
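x | Numeric sample. |
densfun | A character string specifying the fitted distribution, given as the name of its density function (for example 'dnorm'). |
param | Parameters of the null distribution for a simple null hypothesis. If NULL (default), the null hypothesis is composite and the parameters are estimated by maximum likelihood. |
simulate.p.value | Logical; if FALSE, the p-value is computed from the asymptotic distribution of the test statistic. If NULL (default), the choice depends on the sample size. |
B | Number of replicates used when the p-value is computed by simulation (default 5000). |
delta | Default is NULL. |
extend | Default is FALSE. |
relax | Default is FALSE. |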
The test statistic is
$$I_{mn} = -V_{mn} - \frac{1}{n} \sum_{i=1}^{n} \log f_0(X_i; \theta),$$
where $V_{mn}$ is the Vasicek estimator of Shannon entropy computed from the numeric sample x with window size $m$, and $f_0(\cdot; \theta)$ is the density function of the specified distribution densfun to be tested, with $\theta$ the parameter of the null for a simple hypothesis or its maximum likelihood estimate for a composite null hypothesis (param=NULL); see Song (2002), Girardin and Lequesne (2017) and Lequesne and Regnault (2020).
An optimal window size is automatically computed; see Song (2002).
An exact p-value is computed if the sample size is less than 100. Otherwise, the asymptotic distribution of the test statistic is used; this approximation may be inaccurate for small samples; see Lequesne and Regnault (2020).
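To make the procedure concrete, here is a minimal base-R sketch of the test for a simple null hypothesis with a Monte Carlo p-value. It is not the package's code: the helper names are hypothetical, the window size is fixed by hand rather than chosen automatically, and the p-value is taken as the proportion of simulated statistics at least as large as the observed one, which is an assumption about how the simulation is used.

```r
# Sketch only: hypothetical helpers, fixed window, simple null hypothesis.
vasicek_entropy <- function(x, window) {
  n <- length(x)
  xs <- sort(x)
  idx <- seq_len(n)
  upper <- xs[pmin(idx + window, n)]     # X_(i+m), clamped to X_(n)
  lower <- xs[pmax(idx - window, 1)]     # X_(i-m), clamped to X_(1)
  mean(log(n / (2 * window) * (upper - lower)))
}

vs_statistic <- function(x, window, dens, ...) {
  # Minus the entropy estimate, minus the mean log-density under the null
  -vasicek_entropy(x, window) - mean(log(dens(x, ...)))
}

set.seed(1)
n <- 50
samp <- rnorm(n, mean = 2, sd = 3)
m <- 4                                   # assumed window size, chosen by hand
observed <- vs_statistic(samp, m, dnorm, mean = 2, sd = 3)

# Simulate the statistic under the null N(2, 3^2) and compare
B <- 500
simulated <- replicate(
  B,
  vs_statistic(rnorm(n, mean = 2, sd = 3), m, dnorm, mean = 2, sd = 3)
)
p_value <- mean(simulated >= observed)
```

Large values of the statistic indicate a departure from the null distribution, which is why the upper tail of the simulated values is used here.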
A list with class "htest" containing the following components:
observed | The sample under study. |
data.name | The name (as an R object) of the sample. |
null.value | A character string specifying the name of the fitted distribution. |
method | The character string |
statistic | Vasicek test statistic, as defined above. |
parameter | The optimal window for Vasicek test statistic. |
estimate | Parameter(s) of the fitted distribution. If |
p.value | The p-value of the test. |
J. Lequesne [email protected]
Vasicek, O., A test for normality based on sample entropy, Journal of the Royal Statistical Society, 38(1), 54-59 (1976).
Song, K. S., Goodness-of-fit tests based on Kullback-Leibler discrimination information, Information Theory, IEEE Transactions on, 48(5), 1103-1117 (2002).
Girardin, V., Lequesne, J. Entropy-based goodness-of-fit tests - a unifying framework. Application to DNA replication. Communications in Statistics: Theory and Methods (2017). https://doi.org/10.1080/03610926.2017.1401084
Lequesne, J., Regnault, P. vsgoftest: An R Package for Goodness-of-Fit Testing Based on Kullback-Leibler Divergence. Journal of Statistical Software, 96 (2020). doi:10.18637/jss.v096.c01
See also entropy.estimate, which computes the Vasicek estimator of Shannon entropy.
    set.seed(1)
    samp <- rnorm(50, 2, 3)
    vs.test(x = samp, densfun = 'dnorm', param = c(2,3), B = 500) #Simple null hypothesis
    vs.test(x = samp, densfun = 'dnorm', B = 500) #Composite null hypothesis
    ## Using asymptotic distribution to compute the p-value
    vs.test(x = samp, densfun = 'dnorm', simulate.p.value = FALSE) #Composite null hypothesis