Probability distributions and their generalizations have contributed greatly to
the modeling and analysis of random variables. However, the growing number of
new distributions in the literature has created a practical problem: deciding
which distribution is most appropriate for a given set of data. A data set
often fits two or more probability distributions, so one has to be chosen over
the others. The Lomax-Weibull and Lomax-Log-Logistic distributions, introduced
in an earlier study using a Lomax-based generator, were both found to be
positively skewed, so this difficulty may arise when they are used to model
positively skewed datasets. In this article, we apply the two distributions to
some selected datasets to compare their performance and to provide insight on
how to select the better-fitting of the two in real-life situations. We used
the value of the log-likelihood function, AIC, CAIC, BIC, HQIC, and the
Cramér-von Mises (W*) and Anderson-Darling (A*) statistics as performance
evaluation tools for selecting between the two distributions.
Keywords: Lomax-Weibull distribution, Lomax-Log-Logistic distribution,
Lomax-based generator, Performance Evaluation.
Choosing clearly between two related probability distributions is vital and
has been studied by researchers such as Atkinson (1969), Atkinson (1970),
Dumonceaux et al. (1973), Dumonceaux and Antle (1973), Kundu and Manglick
(2004), and Kundu and Manglick (2005).
The Lomax distribution was introduced by Lomax (1954) to model business failure
data. It has since found wide application in different fields of human endeavor,
including income and wealth inequality, the size of cities, actuarial science,
medical and biological sciences, engineering, and lifetime and reliability
modeling.
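A random variable X is said to follow a Lomax distribution with parameters
α and β if its probability density function (pdf) is given by

$$f(x) = \alpha\beta^{\alpha}\,(\beta + x)^{-(\alpha+1)}, \tag{1}$$

where the corresponding cumulative distribution function (cdf) is given as

$$F(x) = 1 - \beta^{\alpha}\,(\beta + x)^{-\alpha}, \tag{2}$$

for x > 0, where α > 0 and β > 0 are the shape and scale parameters,
respectively.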
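According to Cordeiro et al. (2014), the cdf and pdf of the Lomax-G family of
distributions (based on a Lomax generator) are, respectively,

$$F(x) = 1 - \beta^{\alpha}\left\{\beta - \log\left[1 - G(x)\right]\right\}^{-\alpha}, \tag{3}$$

$$f(x) = \frac{\alpha\beta^{\alpha}\, g(x)}{\left[1 - G(x)\right]\left\{\beta - \log\left[1 - G(x)\right]\right\}^{\alpha+1}}, \tag{4}$$

where g(x) and G(x) are the pdf and cdf of any continuous distribution to be
generalized, while α > 0 and β > 0 are the additional shape and scale
parameters of the distribution, respectively.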
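The Lomax-G construction can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the Cordeiro et al. (2014) form F(x) = 1 - β^α {β - log[1 - G(x)]}^(-α), and the function names and the unit-exponential baseline are hypothetical choices made only for the example.

```python
import math
from typing import Callable


def lomax_g_cdf(x: float, G: Callable[[float], float],
                alpha: float, beta: float) -> float:
    """cdf of the Lomax-G family for a baseline cdf G (assumed form)."""
    # t = -log(1 - G(x)) is the cumulative hazard of the baseline; needs G(x) < 1.
    t = -math.log1p(-G(x))
    return 1.0 - (beta / (beta + t)) ** alpha


def lomax_g_pdf(x: float, g: Callable[[float], float],
                G: Callable[[float], float],
                alpha: float, beta: float) -> float:
    """pdf of the Lomax-G family for a baseline pdf g and cdf G (assumed form)."""
    t = -math.log1p(-G(x))
    return alpha * beta ** alpha * g(x) / ((1.0 - G(x)) * (beta + t) ** (alpha + 1))


# Hypothetical baseline for illustration only: the unit exponential.
def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x)


def exp_pdf(x: float) -> float:
    return math.exp(-x)


if __name__ == "__main__":
    print(lomax_g_cdf(2.0, exp_cdf, alpha=3.0, beta=2.0))
    print(lomax_g_pdf(2.0, exp_pdf, exp_cdf, alpha=3.0, beta=2.0))
```

With the exponential baseline, -log[1 - G(x)] = x, so the family reduces to the ordinary Lomax distribution, which gives a convenient sanity check on the code.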
The rest of this article is organized as follows: in Section 2, we define both
the Lomax-Weibull and Lomax-Log-Logistic distributions. In Section 3, we present
a description of the goodness-of-fit tests, some datasets, their summary and
analysis. Finally, we offer some concluding remarks in Section 4.
2.1 The Lomax-Weibull Distribution (LWD)
The Weibull distribution is a very popular continuous probability distribution
named after the Swedish engineer, scientist and mathematician Waloddi Weibull
(1887-1979). The distribution was proposed and applied in 1939 to analyze the
breaking strength of materials. Since then, it has been widely used for
analyzing lifetime data in reliability engineering. It is a versatile
distribution that can take on the characteristics of other types of
distributions, depending on the value of the shape parameter, and it is widely
used as a statistical model for studying fatigue and endurance life in
engineering devices and materials.
Suppose a random variable X follows a Weibull distribution with scale
parameter a > 0 and shape parameter b > 0. Its cdf and pdf, equations (5) and
(6) respectively, are defined for x > 0.
Substituting equations (5) and (6) into (3) and (4) and simplifying gives the
cdf and pdf of the Lomax-Weibull distribution, equations (7) and (8)
respectively.
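As a hedged worked example of this substitution, assume the common Weibull parametrization G(x) = 1 - exp[-(x/a)^b] with pdf g(x) = (b/a)(x/a)^(b-1) exp[-(x/a)^b], and the Lomax-G form of Cordeiro et al. (2014) with shape α and scale β. The source's own parametrization may differ, for example by using G(x) = 1 - exp(-a x^b). Under these assumptions, -log[1 - G(x)] = (x/a)^b and g(x)/[1 - G(x)] = (b/a)(x/a)^(b-1), so that for x > 0:

$$F_{LWD}(x) = 1 - \beta^{\alpha}\left[\beta + \left(\frac{x}{a}\right)^{b}\right]^{-\alpha},$$

$$f_{LWD}(x) = \alpha\beta^{\alpha}\,\frac{b}{a}\left(\frac{x}{a}\right)^{b-1}\left[\beta + \left(\frac{x}{a}\right)^{b}\right]^{-(\alpha+1)}.$$

Differentiating the first expression with respect to x recovers the second, which is a quick consistency check on the algebra.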
The following is a plot of the pdf of the LWD at different parameter values.
Figure 1: Graph of the pdf of the LWD at different parameter values.
The plot indicates that the LWD is skewed to the right with a very high degree
of peakedness, so it can be used for modeling positively skewed data sets with
high kurtosis.
2.2 The Lomax-Log-Logistic Distribution (LLD)
The log-logistic distribution, also referred to as the Fisk distribution in
economics, is a continuous probability distribution for a non-negative random
variable. It is often used to model random lifetime data and hence has
applications in reliability analysis.
The cdf and pdf of the log-logistic distribution, equations (9) and (10)
respectively, are defined for x > 0, where a > 0 and b > 0 are the scale and
shape parameters, respectively.
Substituting equations (9) and (10) into (3) and (4) and simplifying gives the
cdf and pdf of the Lomax-Log-Logistic distribution, equations (11) and (12)
respectively.
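As a hedged worked example of this substitution, assume the common log-logistic parametrization G(x) = (x/a)^b / [1 + (x/a)^b] with pdf g(x) = (b/a)(x/a)^(b-1) / [1 + (x/a)^b]^2, and the Lomax-G form of Cordeiro et al. (2014) with shape α and scale β. The source's own parametrization may differ. Under these assumptions, -log[1 - G(x)] = log[1 + (x/a)^b] and g(x)/[1 - G(x)] = (b/a)(x/a)^(b-1) / [1 + (x/a)^b], so that for x > 0:

$$F_{LLD}(x) = 1 - \beta^{\alpha}\left\{\beta + \log\left[1 + \left(\frac{x}{a}\right)^{b}\right]\right\}^{-\alpha},$$

$$f_{LLD}(x) = \frac{\alpha\beta^{\alpha}\,\frac{b}{a}\left(\frac{x}{a}\right)^{b-1}}{\left[1 + \left(\frac{x}{a}\right)^{b}\right]\left\{\beta + \log\left[1 + \left(\frac{x}{a}\right)^{b}\right]\right\}^{\alpha+1}}.$$

The extra logarithm, compared with the Weibull case, comes from the heavier tail of the log-logistic baseline.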
Below is a graph of the pdf of the LLD for some selected parameter values.
Figure 2: Graph of the pdf of the LLD at different parameter values.
The plot of the pdf shows that the LLD is positively skewed with a very low
coefficient of kurtosis and is therefore suitable only for datasets that are
skewed to the right with moderate kurtosis.
2.3 Goodness-of-Fit Test
To compare these two distributions, we considered the following criteria: the
value of the log-likelihood function evaluated at the MLEs (ll), AIC (Akaike
Information Criterion), CAIC (Consistent Akaike Information Criterion), BIC
(Bayesian Information Criterion), and HQIC (Hannan-Quinn Information
Criterion). These statistics are given as:

$$\mathrm{AIC} = -2\ell\ell + 2k, \qquad \mathrm{CAIC} = -2\ell\ell + \frac{2kn}{n - k - 1},$$

$$\mathrm{BIC} = -2\ell\ell + k\log(n), \qquad \mathrm{HQIC} = -2\ell\ell + 2k\log\left[\log(n)\right],$$

where $\ell\ell$ denotes the log-likelihood function evaluated at the MLEs,
k is the number of model parameters
and n is the sample size.
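These criteria are simple functions of the maximized log-likelihood, so they can be sketched in Python as below. The sketch assumes the forms AIC = -2ll + 2k, CAIC = -2ll + 2kn/(n - k - 1), BIC = -2ll + k log(n) and HQIC = -2ll + 2k log[log(n)]. The CAIC form in particular is an assumption, since other definitions of "consistent AIC" exist, and the function name and the log-likelihood value in the usage lines are hypothetical.

```python
import math


def information_criteria(ll: float, k: int, n: int) -> dict:
    """Return AIC, CAIC, BIC and HQIC for a fitted model.

    ll: log-likelihood evaluated at the MLEs
    k:  number of model parameters
    n:  sample size (must satisfy n > k + 1 and n > 1)
    """
    return {
        "AIC": -2.0 * ll + 2.0 * k,
        # Assumed CAIC form; other definitions exist in the literature.
        "CAIC": -2.0 * ll + 2.0 * k * n / (n - k - 1),
        "BIC": -2.0 * ll + k * math.log(n),
        "HQIC": -2.0 * ll + 2.0 * k * math.log(math.log(n)),
    }


if __name__ == "__main__":
    # Hypothetical log-likelihood for a four-parameter model on n = 128 points.
    for name, value in information_criteria(ll=-100.0, k=4, n=128).items():
        print(f"{name}: {value:.4f}")
```

Because all four criteria share the term -2ll, they differ only in how strongly they penalize the number of parameters, which matters little when the two competing models have the same k.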
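To determine which distribution fits the data better, we also apply the
Cramér-von Mises (W*) and Anderson-Darling (A*) goodness-of-fit statistics.
Further information about these statistics can be obtained from Chen and
Balakrishnan (1995). These statistics can be computed as:

$$W^{*} = \left\{\sum_{i=1}^{n}\left[z_i - \frac{2i-1}{2n}\right]^{2} + \frac{1}{12n}\right\}\left(1 + \frac{0.5}{n}\right),$$

$$A^{*} = \left\{-n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i-1)\log z_i + (2n+1-2i)\log(1 - z_i)\right]\right\}\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right),$$

where $z_i = \Phi\left((y_i - \bar{y})/s_y\right)$, $y_i = \Phi^{-1}(v_i)$ and
$v_i = F(x_i; \hat{\theta})$ for the ordered observations $x_i$,
$F(x; \theta)$ is the known cdf with $\theta$ (a k-dimensional
parameter vector), $\Phi^{-1}(\cdot)$ is the standard normal
quantile function, $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$, and
$s_y^{2} = (n-1)^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^{2}$.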
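The computation can be sketched in Python using only the standard library. This is a minimal illustration of the Chen and Balakrishnan (1995) procedure as commonly stated (transform the data through the fitted cdf, map to normal scores, standardize, and map back to the unit interval), not the authors' code. The function name, the sample data, and the exponential cdf in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Callable, Sequence, Tuple


def cvm_ad_statistics(data: Sequence[float],
                      cdf: Callable[[float], float]) -> Tuple[float, float]:
    """Return the corrected statistics (W*, A*) for a fitted cdf.

    The fitted cdf must return values strictly between 0 and 1 at every
    observation, otherwise the normal quantile is undefined.
    """
    x = sorted(data)
    n = len(x)
    std_normal = NormalDist()                      # standard normal N(0, 1)
    v = [cdf(xi) for xi in x]                      # v_i = F(x_i; theta_hat)
    y = [std_normal.inv_cdf(vi) for vi in v]       # y_i = Phi^{-1}(v_i)
    y_bar, s_y = mean(y), stdev(y)                 # sample mean and std (n - 1)
    z = [std_normal.cdf((yi - y_bar) / s_y) for yi in y]

    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2
             for i, zi in enumerate(z, start=1)) + 1 / (12 * n)
    a2 = -n - sum((2 * i - 1) * math.log(zi)
                  + (2 * n + 1 - 2 * i) * math.log(1 - zi)
                  for i, zi in enumerate(z, start=1)) / n

    w_star = w2 * (1 + 0.5 / n)
    a_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return w_star, a_star


if __name__ == "__main__":
    # Hypothetical data and a hypothetical fitted exponential cdf.
    sample = [0.5, 1.2, 2.3, 3.1, 4.8, 7.4, 9.9, 15.0]
    rate = 1.0 / mean(sample)
    print(cvm_ad_statistics(sample, lambda t: 1.0 - math.exp(-rate * t)))
```

Any fitted cdf can be passed in as a callable, so the same function would serve for both distributions being compared once their parameters have been estimated.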
Note: In decision making, the model with the lowest values of these statistics
is chosen as the best-fitting model.
3 Results and Discussions
Analysis of Data
In this section, seven different datasets are used to fit both the LWD and the
LLD, applying the formulas of the test statistics in Section 2.3 in order to
discriminate between the two distributions. The available datasets and their
respective summary statistics are provided as follows:
Dataset I: This dataset represents the remission times (in months) of a
random sample of 128 bladder cancer patients. It has previously been used by
Lee and Wang (2003).
It is summarized as follows:
Table 1: Summary statistics for dataset I
Dataset II: This dataset consists of strength data for aircraft window glass
reported by Fuller et al. (1994).
Table 2: Summary statistics for dataset II
Dataset III: This dataset represents the waiting times (in minutes) before
service of 100 bank customers, examined and analyzed by Ghitany et al. (2013)
for fitting the Lindley distribution.
Table 3: Summary statistics for dataset III
Dataset IV: This dataset consists of lifetime data on the relief times (in
minutes) of 20 patients receiving an analgesic, reported by Gross et al. (1975)
and used by Shanker et al. (2016).
Table 4: Summary statistics for dataset IV
Dataset V: This dataset represents the survival times (in weeks) of male rats
(Lawless, 2003).
Table 5: Summary statistics for dataset V
Dataset VI: This dataset is from Lawless (1982) and arose in tests on the
endurance of deep-groove ball bearings. The data are the number of million
revolutions before failure for each of the 23 ball bearings in the life tests.
Its summary is given as follows:
Table 6: Summary statistics for dataset VI
Dataset VII: This dataset represents 66 observations of the breaking stress
of carbon fibres of 50mm length (in GPa) given by Nicholas and Padgett (2006).
The descriptive statistics for this data are as follows:
Table 7: Descriptive statistics for dataset VII
From the summary statistics of the seven datasets, we found that datasets I,
II, III, IV and VI are positively skewed, while V is approximately normal.
Also, datasets I, III and IV have higher kurtosis, while the others have a
moderate level of peakedness.
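Skewness and kurtosis of the kind summarized above can be computed with a short Python sketch. The source does not state which estimator was used, so this example assumes the plain moment-based coefficients (third and fourth central moments scaled by powers of the second). The function name and sample values are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def skewness_kurtosis(data: Sequence[float]) -> Tuple[float, float]:
    """Return moment-based skewness and (non-excess) kurtosis.

    Skewness > 0 indicates a longer right tail; a normal distribution has
    kurtosis 3 under this definition.
    """
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    # Hypothetical right-skewed sample, not one of the datasets in the text.
    print(skewness_kurtosis([1.0, 1.0, 2.0, 2.0, 3.0, 10.0]))
```

Statistical packages often apply small-sample corrections to these coefficients, so values from such software may differ slightly from this sketch.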
Performance of the distributions based on the AIC, CAIC, BIC and HQIC values
computed at the MLEs of the models for datasets I-VII.