Abstract:
Probability distributions and their generalizations have contributed greatly to
the modeling and analysis of random variables. However, the growing number of
new distributions in the literature has created a practical problem: deciding
which distribution is most appropriate for a given data set. A data set often
fits two or more probability distributions, so one of them must be chosen over
the others. The Lomax-Weibull and Lomax-Log-Logistic distributions, introduced
in an earlier study using a Lomax-based generator, were both found to be
positively skewed and may therefore be subject to this problem, especially when
modeling positively skewed datasets. In this article, we apply the two
distributions to selected datasets to compare their performance and to provide
insight into how to select the better-fitting one in real-life situations. We
used the value of the log-likelihood function, AIC, CAIC, BIC, HQIC, and the
Cramér-von Mises (W*) and Anderson-Darling (A*) statistics as performance
evaluation tools for selecting between the two distributions.

Keywords:
Lomax-Weibull distribution, Lomax-Log-Logistic distribution, Lomax-based
generator, Performance Evaluation.

1   Introduction

Making a clear choice between two related probability distributions is vital,
and the problem has been studied by researchers such as Atkinson (1969, 1970),
Dumonceaux et al. (1973), Dumonceaux and Antle (1973), and Kundu and Manglick
(2004, 2005), among others.

The Lomax distribution was introduced by Lomax (1954) to model business failure
data. It has since found wide application in many fields, including income and
wealth inequality, the size of cities, actuarial science, medical and biological
sciences, engineering, and lifetime and reliability modeling.

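A random variable X is said to follow a Lomax distribution with parameters
$\alpha$ and $\beta$ if its probability density function (pdf) is given by

$$f(x) = \frac{\alpha}{\beta}\left(1 + \frac{x}{\beta}\right)^{-(\alpha+1)} \tag{1}$$

where the corresponding cumulative distribution function (cdf) is given as

$$F(x) = 1 - \left(1 + \frac{x}{\beta}\right)^{-\alpha} \tag{2}$$

for $x > 0$, where $\alpha > 0$ and $\beta > 0$ are the shape and scale
parameters respectively.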

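According to Cordeiro et al. (2014), the cdf and pdf of the Lomax-G family of
distributions (based on a Lomax generator) are respectively given by:

$$F(x) = 1 - \left\{\frac{\beta}{\beta - \log[1 - G(x)]}\right\}^{\alpha} \tag{3}$$

and

$$f(x) = \frac{\alpha\beta^{\alpha}\, g(x)}{[1 - G(x)]\,\{\beta - \log[1 - G(x)]\}^{\alpha+1}} \tag{4}$$

where g(x) and G(x) are the pdf and cdf of any continuous distribution to be
generalized, while $\beta > 0$ and $\alpha > 0$ are the additional new
parameters responsible for the scale and shape of the distribution
respectively.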
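The construction of a Lomax-G distribution from a baseline can be sketched numerically. The snippet below is only an illustration under stated assumptions: it takes the Lomax-G cdf in the form 1 - {beta / (beta - log[1 - G(x)])}^alpha, uses SciPy's `weibull_min` and `fisk` (log-logistic) as baselines in SciPy's own scale/shape parameterisation, and uses hypothetical function names and parameter values. It checks that the pdf integrates to the corresponding cdf difference.

```python
import numpy as np
from scipy import integrate, stats


def lomax_g_cdf(x, alpha, beta, base):
    """Assumed Lomax-G cdf: 1 - {beta / (beta - log[1 - G(x)])}**alpha."""
    h = -base.logsf(x)  # -log[1 - G(x)], the baseline cumulative hazard
    return 1.0 - (beta / (beta + h)) ** alpha


def lomax_g_pdf(x, alpha, beta, base):
    """Lomax-G pdf, i.e. the derivative of lomax_g_cdf with respect to x."""
    h = -base.logsf(x)
    hazard = np.exp(base.logpdf(x) - base.logsf(x))  # g(x) / [1 - G(x)]
    return alpha * beta**alpha * hazard / (beta + h) ** (alpha + 1)


# Hypothetical baseline parameters (scale a, shape b), SciPy parameterisation.
a, b = 2.0, 1.5
weibull = stats.weibull_min(c=b, scale=a)  # G(x) = 1 - exp(-(x/a)**b)
loglogistic = stats.fisk(c=b, scale=a)     # G(x) = 1 / (1 + (x/a)**(-b))

for name, base in [("Lomax-Weibull", weibull), ("Lomax-Log-Logistic", loglogistic)]:
    # The integral of the pdf over [0.5, 3] should match the cdf difference.
    area, _ = integrate.quad(lomax_g_pdf, 0.5, 3.0, args=(1.2, 0.8, base))
    gap = lomax_g_cdf(3.0, 1.2, 0.8, base) - lomax_g_cdf(0.5, 1.2, 0.8, base)
    print(f"{name}: integral of pdf = {area:.6f}, cdf difference = {gap:.6f}")
```

Working with `logsf` and `logpdf` rather than `1 - cdf` keeps the baseline hazard numerically stable in the tails. The baseline parameterisation used in the article may differ from SciPy's.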

The rest of this article is organized as follows: in Section 2, we define the
Lomax-Weibull and Lomax-Log-Logistic distributions and describe the
goodness-of-fit tests. In Section 3, we present the datasets, their summaries
and analysis. Finally, we offer some concluding remarks in Section 4.

2   Materials and Methods

2.1   The Lomax-Weibull Distribution (LWD)

The Weibull distribution is a very popular continuous probability distribution
named after the Swedish engineer, scientist and mathematician Waloddi Weibull
(1887 – 1979). It was proposed and applied in 1939 to analyze the breaking
strength of materials, and has since been widely used for analyzing lifetime
data in reliability engineering, including fatigue and endurance life in
engineering devices and materials. It is a versatile distribution that can take
on the characteristics of other types of distributions, depending on the value
of the shape parameter.

If a random variable X follows the Weibull distribution, then its cdf and pdf
are respectively given by:

(5)

(6)

for x > 0, where a > 0 and b > 0 are the scale and shape parameters
respectively.

By substituting equations
(5) and (6) into (3) and (4) and simplifying, we obtain the cdf and pdf of the Lomax-Weibull distribution respectively as:

(7)

(8)
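As a worked illustration of this substitution, the sketch below assumes a Weibull baseline of the form $G(x) = 1 - \exp\{-(x/a)^b\}$ and a Lomax-G cdf of the form $F(x) = 1 - \{\beta/(\beta - \log[1 - G(x)])\}^{\alpha}$. Both forms, and the symbols $\alpha$ and $\beta$, are assumptions made for illustration; the parameterisation used in the article may differ.

```latex
% Assumed baseline: G(x) = 1 - exp{-(x/a)^b}, so that -log[1 - G(x)] = (x/a)^b
% and the baseline hazard is g(x)/[1 - G(x)] = (b/a)(x/a)^{b-1}.
\begin{align*}
F_{LWD}(x) &= 1 - \left\{\frac{\beta}{\beta + (x/a)^{b}}\right\}^{\alpha}, \\
f_{LWD}(x) &= \frac{\alpha\beta^{\alpha}\,(b/a)\,(x/a)^{b-1}}
                   {\left\{\beta + (x/a)^{b}\right\}^{\alpha+1}},
\qquad x > 0.
\end{align*}
```

Differentiating the first line with respect to x gives the second, which is a quick consistency check on the pair.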

The following is a plot of the pdf of the LWD at different parameter values.

Figure 1: Graph of the pdf of the LWD at different parameter values.

From the plot above, the LWD is skewed to the right with a very high degree of
peakedness, and can therefore be used for modeling positively skewed datasets
with high kurtosis.

2.2   The Lomax-Log-Logistic Distribution (LLD)

The log-logistic distribution, also referred to as the Fisk distribution in
economics, is a continuous probability distribution for a non-negative random
variable. It is often used to model random lifetime data and hence has
applications in reliability analysis.

The cdf and pdf of the Log-logistic are respectively given by:

 (9)

and

 (10)

for x > 0, where a > 0 and b > 0 are the scale and shape parameters
respectively.

By substituting equations
(9) and (10) into (3) and (4) and simplifying, we obtain the cdf and pdf of the Lomax-Log-Logistic distribution as follows:

(11)

  (12)
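As a worked illustration of this substitution, the sketch below assumes a log-logistic baseline of the form $G(x) = 1/\{1 + (x/a)^{-b}\}$ and a Lomax-G cdf of the form $F(x) = 1 - \{\beta/(\beta - \log[1 - G(x)])\}^{\alpha}$. Both forms, and the symbols $\alpha$ and $\beta$, are assumptions made for illustration; the parameterisation used in the article may differ.

```latex
% Assumed baseline: G(x) = 1 / {1 + (x/a)^{-b}}, so that
% 1 - G(x) = 1 / {1 + (x/a)^b} and -log[1 - G(x)] = log{1 + (x/a)^b}.
% The baseline hazard is g(x)/[1 - G(x)] = (b/a)(x/a)^{b-1} / {1 + (x/a)^b}.
\begin{align*}
F_{LLD}(x) &= 1 - \left\{\frac{\beta}{\beta + \log\left[1 + (x/a)^{b}\right]}\right\}^{\alpha}, \\
f_{LLD}(x) &= \frac{\alpha\beta^{\alpha}\,(b/a)\,(x/a)^{b-1}}
                   {\left[1 + (x/a)^{b}\right]
                    \left\{\beta + \log\left[1 + (x/a)^{b}\right]\right\}^{\alpha+1}},
\qquad x > 0.
\end{align*}
```

The extra logarithm in the denominator, compared with a Weibull baseline, comes from the heavier tail of the log-logistic survival function.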

Below is a graph of the pdf of the LLD for some selected values of the model
parameters.

Figure 2: Graph of the pdf of the LLD at different parameter values.

The plot of the pdf shows that the LLD is positively skewed with a comparatively
low coefficient of kurtosis, and is therefore best suited to right-skewed
datasets with moderate kurtosis.

2.3  Goodness-of-Fit Test

To compare the two distributions, we consider the following criteria: the value
of the log-likelihood function evaluated at the MLEs (ll), the AIC (Akaike
Information Criterion), CAIC (Consistent Akaike Information Criterion), BIC
(Bayesian Information Criterion) and HQIC (Hannan-Quinn Information Criterion).
These statistics are given as:

$$\mathrm{AIC} = -2ll + 2k, \qquad \mathrm{CAIC} = -2ll + \frac{2kn}{n - k - 1},$$

$$\mathrm{BIC} = -2ll + k\log(n) \qquad \text{and} \qquad \mathrm{HQIC} = -2ll + 2k\log[\log(n)],$$

where ll denotes the log-likelihood function evaluated at the MLEs, k is the
number of model parameters and n is the sample size.
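The four criteria can be computed in a few lines. The sketch below assumes the standard definitions AIC = -2ll + 2k, CAIC = -2ll + 2kn/(n - k - 1), BIC = -2ll + k log(n) and HQIC = -2ll + 2k log(log(n)); the function name is hypothetical. With these definitions, the dataset I row for the LWD in Table 8 is reproduced when the tabulated "log-likelihood value" of 420.7675 is treated as the negative log-likelihood, with k = 4 and n = 128.

```python
import math


def information_criteria(neg_loglik, k, n):
    """Return AIC, CAIC, BIC and HQIC from the negative log-likelihood.

    neg_loglik: -ll evaluated at the MLEs, k: number of parameters,
    n: sample size. Standard definitions are assumed.
    """
    minus_two_ll = 2.0 * neg_loglik
    return {
        "AIC": minus_two_ll + 2 * k,
        "CAIC": minus_two_ll + 2 * k * n / (n - k - 1),
        "BIC": minus_two_ll + k * math.log(n),
        "HQIC": minus_two_ll + 2 * k * math.log(math.log(n)),
    }


# Dataset I, LWD row of Table 8: -ll = 420.7675, k = 4 parameters, n = 128.
ic = information_criteria(420.7675, k=4, n=128)
for name, value in ic.items():
    print(f"{name} = {value:.4f}")
```

The results agree with the tabulated values to about three decimal places; the small remaining differences are consistent with rounding of the reported log-likelihood.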

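To determine which distribution fits the data better, we also apply two
goodness-of-fit tests: the Cramér-von Mises (W*) and Anderson-Darling (A*)
statistics. Further information about these statistics can be obtained from
Chen and Balakrishnan (1995). These statistics can be computed as:

$$W^{*} = \left\{\sum_{i=1}^{n}\left[u_i - \frac{2i-1}{2n}\right]^{2} + \frac{1}{12n}\right\}\left(1 + \frac{0.5}{n}\right)$$

and

$$A^{*} = \left\{-n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i-1)\log(u_i) + (2n+1-2i)\log(1-u_i)\right]\right\}\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right)$$

where

$u_i = \Phi\{(y_i - \bar{y})/s_y\}$,

$v_i = F(x_i; \hat{\theta})$ for the ordered observations $x_1 \le \dots \le x_n$,

$F(\cdot\,; \theta)$ is the known cdf with $\theta$ (a k-dimensional parameter
vector), $y_i = \Phi^{-1}(v_i)$ where $\Phi^{-1}(\cdot)$ is the standard normal
quantile function, $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$ and
$s_y^{2} = (n-1)^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^{2}$.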

Note: In decision making, the model with the lowest values of these statistics
is chosen as the best-fitting model.
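A minimal sketch of how the W* and A* statistics might be computed is given below. It assumes the Chen and Balakrishnan (1995) procedure in its usual form: transform the ordered data through the fitted cdf, map to normal scores, standardise, and apply the Cramér-von Mises and Anderson-Darling formulas with their small-sample correction factors. The function name, the synthetic sample and the exponential model are hypothetical and are not taken from the article.

```python
import numpy as np
from scipy import stats


def chen_balakrishnan_stats(x, fitted_cdf):
    """Return (W*, A*) for data x under a fitted cdf (callable on arrays)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    eps = 1e-12  # guard against log(0) and infinite normal scores
    v = np.clip(fitted_cdf(x), eps, 1 - eps)   # v_i = F(x_i; theta_hat)
    y = stats.norm.ppf(v)                      # y_i = Phi^{-1}(v_i)
    u = stats.norm.cdf((y - y.mean()) / y.std(ddof=1))
    u = np.clip(u, eps, 1 - eps)
    i = np.arange(1, n + 1)
    w2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    a2 = -n - np.mean((2 * i - 1) * np.log(u)
                      + (2 * n + 1 - 2 * i) * np.log1p(-u))
    return w2 * (1 + 0.5 / n), a2 * (1 + 0.75 / n + 2.25 / n**2)


# Synthetic illustration only: exponential data with the scale set to its MLE.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=50)
w_star, a_star = chen_balakrishnan_stats(
    sample, lambda t: stats.expon.cdf(t, scale=sample.mean())
)
print(f"W* = {w_star:.4f}, A* = {a_star:.4f}")
```

To compare two candidate models, the same function would be called once per fitted cdf, and the model with the smaller W* and A* preferred.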

3   Results and Discussions

3.1   Analysis of Data

In this section, seven different datasets are fitted with both the LWD and the
LLD, applying the test statistics of Section 2.3 in order to discriminate
between the two distributions. The datasets and their respective summary
statistics are as follows:

Dataset I: This dataset represents the remission times (in months) of a
random sample of 128 bladder cancer patients. It has previously been used by
Lee and Wang (2003).
It is summarized as follows:

Table 1: Summary statistics for dataset I

Parameter   n     Minimum   Q1      Median   Q3       Mean    Maximum   Variance   Skewness   Kurtosis
Value       128   0.0800    3.348   6.395    11.840   9.366   79.05     110.425    3.3257     19.1537

Dataset II: This dataset is the strength data of glass of aircraft windows
reported by Fuller et al. (1994).

Table 2: Summary statistics for dataset II

Parameter   n    Minimum   Q1      Median   Q3      Mean    Maximum   Variance   Skewness   Kurtosis
Value       31   18.83     25.51   29.90    35.83   30.81   45.38     52.61      0.43       2.38

Dataset III: This dataset represents the waiting times (in minutes) before
service of 100 bank customers, examined and analyzed by Ghitany et al. (2013)
for fitting the Lindley distribution.

Table 3: Summary statistics for dataset III

Parameter   n     Minimum   Q1      Median   Q3       Mean    Maximum   Variance   Skewness   Kurtosis
Value       100   0.80      4.675   8.10     13.020   9.877   38.500    52.3741    1.4953     5.7345

Dataset IV: This dataset represents the relief times (in minutes) of 20 patients
receiving an analgesic, reported by Gross et al. (1975) and used by Shanker et
al. (2016).

Table 4: Summary statistics for dataset IV

Parameter   n    Minimum   Q1      Median   Q3     Mean   Maximum   Variance   Skewness   Kurtosis
Value       20   1.10      1.475   1.70     2.05   1.90   4.10      0.4958     1.8625     7.1854

Dataset V: This dataset represents the survival times (in weeks) of male rats
(Lawless, 2003).

Table 5: Summary statistics for dataset V

Parameter   n    Minimum   Q1      Median   Q3       Mean     Maximum   Variance   Skewness   Kurtosis
Value       20   40.00     86.75   119.00   140.80   113.45   165.00    1280.892   -0.3552    2.2120

Dataset VI: This dataset is from Lawless (1982). The data arose in tests on the
endurance of deep groove ball bearings and give the number of million
revolutions before failure for each of the 23 ball bearings in the life tests.
Its summary is given as follows:

Table 6: Summary statistics for dataset VI

Parameter   n    Minimum   Q1      Median   Q3      Mean    Maximum   Variance   Skewness   Kurtosis
Value       23   17.88     47.20   67.80    95.88   72.23   173.40    1404.78    1.0089     3.9288

Dataset VII: This dataset represents 66 observations of the breaking stress (in
GPa) of carbon fibres of length 50 mm, given by Nicholas and Padgett (2006). The
descriptive statistics for this dataset are as follows:

Table 7: Descriptive statistics for dataset VII

Parameter   n    Minimum   Q1      Median   Q3      Mean    Maximum   Variance   Skewness   Kurtosis
Value       66   0.390     2.178   2.835    3.278   2.760   4.900     0.795      -0.1285    3.2230

From the summary statistics of the seven datasets, we found that datasets I, II,
III, IV and VI are positively skewed, while datasets V and VII have skewness
close to zero and are approximately normal. Also, datasets I, III and IV have
higher kurtosis, while the others have a moderate level of peakedness.
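Summary statistics of this kind can be obtained with NumPy and SciPy. The sketch below is an illustration under assumptions: it uses the sample variance with an n - 1 divisor, SciPy's default skewness estimator, and Pearson (non-excess) kurtosis, under which a normal sample scores about 3. The article does not state which estimators were used, and the sample values and function name are hypothetical.

```python
import numpy as np
from scipy import stats


def summarize(x):
    """Return the descriptive statistics of a one-dimensional sample."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "n": x.size,
        "Minimum": x.min(),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Mean": x.mean(),
        "Maximum": x.max(),
        "Variance": x.var(ddof=1),                    # n - 1 divisor (assumed)
        "Skewness": stats.skew(x),
        "Kurtosis": stats.kurtosis(x, fisher=False),  # Pearson scale, normal = 3
    }


# Hypothetical right-skewed sample, not one of the article's datasets.
summary = summarize([1.0, 2.0, 3.0, 4.0, 10.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A positive skewness together with a kurtosis well above 3 would point to the kind of heavy right tail described for datasets I, III and IV.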

Table 8: Performance of the distributions based on the log-likelihood, AIC,
CAIC, BIC and HQIC values evaluated at the MLEs for datasets I-VII.

Dataset   Model   Negative log-likelihood   Parameter estimates              AIC        CAIC       BIC        HQIC       Rank
I         LWD     420.7675                  0.3928, 0.8735, 4.4202, 6.5906   849.5355   849.8607   860.9437   854.1707   2
I         LLD     411.4727                  7.9519, 1.6252, 8.1254, 5.4517   830.9454   831.2707   842.3536   835.5806   1
II        LWD     146.435                   0.0987, 0.7832, 7.1911, 5.3806   300.8701   302.4085   306.606    302.7398   1
II        LLD     148.548                   9.5745, 3.3012, 2.2311, 6.2759   305.096    306.6345   310.832    306.9658   2
III       LWD     342.2547                  0.5010, 0.7455, 3.4439, 8.6494   692.5095   692.9305   702.9302   696.7269   2
III       LLD     319.8772                  9.5864, 2.2868, 7.5884, 4.8861   647.7543   648.1754   658.175    651.9718   1