Abstract:

Probability distributions and their generalizations have contributed greatly to the modeling and analysis of random variables. However, as new distributions continue to be introduced, choosing the most appropriate distribution for a given data set has become a major problem in applications. A data set often fits two or more probability distributions, and one of them has to be chosen. The Lomax-Weibull and Lomax-Log-Logistic distributions, introduced in an earlier study using a Lomax-based generator, were both found to be positively skewed and are therefore prone to this problem, especially when modeling positively skewed datasets. In this article, we apply the two distributions to selected datasets to compare their performance and to provide insight into how to select the better-fitting one in real-life situations. We used the value of the log-likelihood function, AIC, CAIC, BIC, HQIC, and the Cramér-von Mises (W*) and Anderson-Darling (A*) statistics as performance evaluation tools for selecting between the two distributions.

Keywords:

Lomax-Weibull distribution, Lomax-Log-Logistic distribution, Lomax-based generator, Performance Evaluation.

1 Introduction

Making a clear choice between two related probability distributions is vital, and the problem has been studied by several researchers, including Atkinson (1969, 1970), Dumonceaux et al. (1973), Dumonceaux and Antle (1973), and Kundu and Manglick (2004, 2005).

The Lomax distribution was introduced by Lomax (1954) to model business failure data. It has since found wide application in many fields, including income and wealth inequality, the size of cities, actuarial science, medical and biological sciences, engineering, and lifetime and reliability modeling.

A random variable X is said to follow a Lomax distribution with a shape parameter and a scale parameter if its probability density function (pdf) is given by

(1)

where the corresponding cumulative distribution function (cdf) is given as

(2)

for x > 0.
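For reference, one common parametrization of the Lomax pdf and cdf referred to in equations (1) and (2) is sketched below. The symbols α (shape) and β (scale) are assumptions made here, chosen to agree with the generator notation of Cordeiro et al. (2014); other papers use different letters or an equivalent rescaled form.

```latex
\begin{align}
f(x) &= \frac{\alpha}{\beta}\left(1 + \frac{x}{\beta}\right)^{-(\alpha+1)},
      \qquad x > 0,\ \alpha > 0,\ \beta > 0, \tag{1} \\
F(x) &= 1 - \left(1 + \frac{x}{\beta}\right)^{-\alpha}. \tag{2}
\end{align}
```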

According to Cordeiro et al. (2014), the cdf and pdf of the Lomax-G family of distributions (based on a Lomax generator) are respectively given by:

(3)

and

(4)

where g(x) and G(x) are the pdf and cdf of any continuous distribution to be generalized, and the two additional positive parameters control the scale and shape of the distribution, respectively.
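As a sketch of how the Lomax-G construction can be evaluated numerically, the Python functions below wrap an arbitrary baseline cdf G and pdf g. They assume the generator form F(x) = 1 - β^α {β - log[1 - G(x)]}^(-α), with pdf f(x) = α β^α g(x) / ([1 - G(x)] {β - log[1 - G(x)]}^(α+1)), as given by Cordeiro et al. (2014). The names `alpha`, `beta`, `lomax_g_cdf` and `lomax_g_pdf`, and the unit-rate exponential baseline used for illustration, are hypothetical and not taken from the text.

```python
import math


def lomax_g_cdf(x, G, alpha, beta):
    """cdf of the Lomax-G family for a baseline cdf G (assumed generator form)."""
    t = -math.log(1.0 - G(x))  # t >= 0 plays the role of the Lomax variable
    return 1.0 - (beta / (beta + t)) ** alpha


def lomax_g_pdf(x, G, g, alpha, beta):
    """pdf of the Lomax-G family for a baseline cdf G and baseline pdf g."""
    survival = 1.0 - G(x)
    t = -math.log(survival)
    return alpha * beta ** alpha * g(x) / (survival * (beta + t) ** (alpha + 1.0))


# Illustrative baseline only: unit-rate exponential distribution.
def exp_cdf(x):
    return 1.0 - math.exp(-x)


def exp_pdf(x):
    return math.exp(-x)
```

With the exponential baseline, -log[1 - G(x)] = x, so the family collapses to the ordinary Lomax distribution, which gives a simple way to check the implementation.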

The rest of this article is organized as follows. Section 2 defines the Lomax-Weibull and Lomax-Log-Logistic distributions and describes the goodness-of-fit tests. Section 3 presents the datasets, their summary statistics and the analysis. Finally, Section 4 offers some concluding remarks.

2 Materials and Methods

2.1 The Lomax-Weibull Distribution (LWD)

The Weibull distribution is a very popular continuous probability distribution named after the Swedish engineer, scientist and mathematician Waloddi Weibull (1887-1979). It was proposed and applied in 1939 to analyze the breaking strength of materials. Since then, it has been widely used for analyzing lifetime data in reliability engineering, in particular for studying fatigue and endurance life in engineering devices and materials. It is a versatile distribution that can take on the characteristics of other types of distributions, depending on the value of the shape parameter.

If a random variable X follows a Weibull distribution with scale parameter a > 0 and shape parameter b > 0, then its cdf and pdf are respectively given by:

(5)

(6)

for x > 0.

By substituting equations (5) and (6) into (3) and (4) and simplifying, we obtain the cdf and pdf of the Lomax-Weibull distribution respectively as:

(7)

(8)
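The substitution can be sketched as follows. Two things are assumed here: the Weibull parametrization G(x) = 1 - exp[-(x/a)^b], since other forms of the scale parameter are also in use, and the symbols α and β for the shape and scale parameters of the Lomax generator of Cordeiro et al. (2014). The resulting expressions are therefore one possible form of equations (5) to (8).

```latex
\begin{align}
G(x) &= 1 - e^{-(x/a)^{b}}, \qquad
g(x) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}}, \\
F(x) &= 1 - \beta^{\alpha}\bigl\{\beta - \log[1 - G(x)]\bigr\}^{-\alpha}
      = 1 - \beta^{\alpha}\left[\beta + \left(\frac{x}{a}\right)^{b}\right]^{-\alpha}, \\
f(x) &= \frac{\alpha\beta^{\alpha}\, g(x)}{[1 - G(x)]\bigl\{\beta - \log[1 - G(x)]\bigr\}^{\alpha+1}}
      = \alpha\beta^{\alpha}\,\frac{b}{a}\left(\frac{x}{a}\right)^{b-1}
        \left[\beta + \left(\frac{x}{a}\right)^{b}\right]^{-(\alpha+1)}.
\end{align}
```

The simplification uses -log[1 - G(x)] = (x/a)^b and g(x)/[1 - G(x)] = (b/a)(x/a)^(b-1), the Weibull hazard rate.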

The following is a plot of the pdf of the LWD at different parameter values.

Figure 1: The graph of the pdf of the LWD at different parameter values.

The plot shows that the LWD is skewed to the right with a very high degree of peakedness, so it can be used to model positively skewed datasets with high kurtosis.

2.2 The Lomax-Log-Logistic Distribution (LLD)

The log-logistic distribution, also known as the Fisk distribution in economics, is a continuous probability distribution for a non-negative random variable. It is often used to model random lifetime data and hence has applications in reliability analysis.

The cdf and pdf of the log-logistic distribution are respectively given by:

(9)

and

(10)

for x > 0, where a > 0 and b > 0 are the scale and shape parameters respectively.

By substituting equations (9) and (10) into (3) and (4) and simplifying, we obtain the cdf and pdf of the Lomax-Log-Logistic distribution as follows:

(11)

(12)
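The substitution can be sketched as follows, assuming the common log-logistic parametrization G(x) = 1 / [1 + (x/a)^(-b)] and the symbols α and β for the shape and scale parameters of the Lomax generator of Cordeiro et al. (2014). Both choices are assumptions, so the expressions below are one possible form of equations (9) to (12).

```latex
\begin{align}
G(x) &= \frac{(x/a)^{b}}{1 + (x/a)^{b}}, \qquad
g(x) = \frac{(b/a)(x/a)^{b-1}}{\left[1 + (x/a)^{b}\right]^{2}}, \\
F(x) &= 1 - \beta^{\alpha}\bigl\{\beta - \log[1 - G(x)]\bigr\}^{-\alpha}
      = 1 - \beta^{\alpha}\left\{\beta + \log\left[1 + \left(\frac{x}{a}\right)^{b}\right]\right\}^{-\alpha}, \\
f(x) &= \frac{\alpha\beta^{\alpha}\, g(x)}{[1 - G(x)]\bigl\{\beta - \log[1 - G(x)]\bigr\}^{\alpha+1}}
      = \frac{\alpha\beta^{\alpha}\,(b/a)(x/a)^{b-1}}
             {\left[1 + (x/a)^{b}\right]\left\{\beta + \log\left[1 + (x/a)^{b}\right]\right\}^{\alpha+1}}.
\end{align}
```

The simplification uses 1 - G(x) = [1 + (x/a)^b]^(-1), so that -log[1 - G(x)] = log[1 + (x/a)^b].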

Below is a graph of the pdf of the LLD for some selected values of the model parameters.

Figure 2: The graph of the pdf of the LLD at different parameter values.

The plot of the pdf shows that the LLD is positively skewed with a low coefficient of kurtosis, so it is best suited to right-skewed datasets with low to moderate kurtosis.

2.3 Goodness-of-Fit Test

To compare these two distributions, we considered the following criteria: the value of the log-likelihood function evaluated at the MLEs (ll), the Akaike Information Criterion (AIC), the Consistent Akaike Information Criterion (CAIC), the Bayesian Information Criterion (BIC), and the Hannan-Quinn Information Criterion (HQIC). These statistics are given as:

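\[ \mathrm{AIC} = -2\,ll + 2k, \qquad \mathrm{BIC} = -2\,ll + k\log(n), \]

\[ \mathrm{CAIC} = -2\,ll + \frac{2kn}{n-k-1} \qquad \text{and} \qquad \mathrm{HQIC} = -2\,ll + 2k\log[\log(n)], \]

where ll denotes the log-likelihood function evaluated at the MLEs, k is the number of model parameters and n is the sample size.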
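A minimal sketch of how the four criteria can be computed is given below. It assumes the usual definitions AIC = -2ll + 2k, CAIC = -2ll + 2kn/(n - k - 1), BIC = -2ll + k log(n) and HQIC = -2ll + 2k log[log(n)]; the function name `information_criteria` and its argument names are hypothetical. The function takes the negative log-likelihood, because the values in the "Log-likelihood value" column of Table 8 reproduce the tabulated criteria when they are treated as -ll.

```python
import math


def information_criteria(neg_loglik, k, n):
    """Return AIC, CAIC, BIC and HQIC from the negative log-likelihood (-ll)."""
    base = 2.0 * neg_loglik  # equals -2 * ll
    return {
        "AIC": base + 2.0 * k,
        "CAIC": base + 2.0 * k * n / (n - k - 1),
        "BIC": base + k * math.log(n),
        "HQIC": base + 2.0 * k * math.log(math.log(n)),
    }


# LWD fitted to dataset I (Table 8): -ll = 420.7675, k = 4 parameters, n = 128.
crit = information_criteria(420.7675, k=4, n=128)
for name, value in crit.items():
    print(f"{name} = {value:.4f}")
```

The printed values agree with the LWD row for dataset I in Table 8 up to the rounding of the reported log-likelihood.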

To determine which distribution fits the data better, we also applied goodness-of-fit tests based on the Cramér-von Mises (W*) and Anderson-Darling (A*) statistics. Further information about these statistics can be obtained from Chen and Balakrishnan (1995). These statistics can be computed as:

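\[ W^{*} = \left\{ \sum_{i=1}^{n} \left[ u_i - \frac{2i-1}{2n} \right]^{2} + \frac{1}{12n} \right\} \left( 1 + \frac{0.5}{n} \right) \]

and

\[ A^{*} = \left\{ -n - \frac{1}{n} \sum_{i=1}^{n} \left[ (2i-1)\log(u_i) + (2n+1-2i)\log(1-u_i) \right] \right\} \left( 1 + \frac{0.75}{n} + \frac{2.25}{n^{2}} \right) \]

where $u_i = \Phi\{(y_i - \bar{y})/s_y\}$, $v_i = F(x_i; \hat{\theta})$ for the ordered observations $x_1 \le \dots \le x_n$, $F$ is the known cdf with $\theta$ (a k-dimensional parameter vector), $\Phi^{-1}$ is the standard normal quantile function, $y_i = \Phi^{-1}(v_i)$, $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$, and $s_y^{2} = (n-1)^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^{2}$.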
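The computation can be sketched in Python as follows, following the procedure of Chen and Balakrishnan (1995) as we understand it: transform the ordered data with the fitted cdf, map the result to normal scores, standardize, and apply the Cramér-von Mises and Anderson-Darling formulas with their small-sample corrections. The function name `cvm_ad_statistics` is hypothetical, and `cdf` is assumed to be the fitted cdf with the MLEs already plugged in.

```python
import math
from statistics import NormalDist, mean, stdev


def cvm_ad_statistics(data, cdf):
    """Corrected Cramér-von Mises (W*) and Anderson-Darling (A*) statistics."""
    std_normal = NormalDist()
    x = sorted(data)
    n = len(x)
    v = [cdf(xi) for xi in x]                     # v_i = F(x_i; theta_hat)
    y = [std_normal.inv_cdf(vi) for vi in v]      # y_i = Phi^{-1}(v_i)
    y_bar, s_y = mean(y), stdev(y)                # sample mean and sd (n - 1)
    u = [std_normal.cdf((yi - y_bar) / s_y) for yi in y]
    w2 = sum((ui - (2 * i - 1) / (2 * n)) ** 2
             for i, ui in enumerate(u, start=1)) + 1 / (12 * n)
    a2 = -n - sum((2 * i - 1) * math.log(ui)
                  + (2 * n + 1 - 2 * i) * math.log(1 - ui)
                  for i, ui in enumerate(u, start=1)) / n
    w_star = w2 * (1 + 0.5 / n)
    a_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return w_star, a_star
```

The fitted cdf must return values strictly between 0 and 1 for every observation, otherwise the normal quantile is undefined.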

Note: In decision making, the model with the lowest values of these statistics is chosen as the best-fitting model.

3 Results and Discussions

3.1 Analysis of Data

In this section, both the LWD and the LLD are fitted to seven different datasets, and the test statistics of Section 2.3 are applied in order to discriminate between the two distributions. The datasets and their respective summary statistics are as follows:

Dataset I: This dataset represents the remission times (in months) of a

random sample of 128 bladder cancer patients. It has previously been used by

Lee and Wang (2003).

It is summarized as follows:

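Table 1: Summary statistics for dataset I

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 128 | 0.0800 | 3.348 | 6.395 | 11.840 | 9.366 | 79.05 | 110.425 | 3.3257 | 19.1537 |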

Dataset II: This dataset consists of strength measurements of aircraft window glass reported by Fuller et al. (1994).

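Table 2: Summary statistics for dataset II

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 31 | 18.83 | 25.51 | 29.90 | 35.83 | 30.81 | 45.38 | 52.61 | 0.43 | 2.38 |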

Dataset III: This dataset represents the waiting times (in minutes) before service of 100 bank customers, examined and analyzed by Ghitany et al. (2013) for fitting the Lindley distribution.

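Table 3: Summary statistics for dataset III

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 100 | 0.80 | 4.675 | 8.10 | 13.020 | 9.877 | 38.500 | 52.3741 | 1.4953 | 5.7345 |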

Dataset IV: This dataset represents lifetime data on the relief times (in minutes) of 20 patients receiving an analgesic. It was reported by Gross et al. (1975) and has been used by Shanker et al. (2016).

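Table 4: Summary statistics for dataset IV

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 20 | 1.10 | 1.475 | 1.70 | 2.05 | 1.90 | 4.10 | 0.4958 | 1.8625 | 7.1854 |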

Dataset V: This dataset represents the survival times (in weeks) of male rats (Lawless, 2003).

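Table 5: Summary statistics for dataset V

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 20 | 40.00 | 86.75 | 119.00 | 140.80 | 113.45 | 165.00 | 1280.892 | -0.3552 | 2.2120 |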

Dataset VI: This dataset, from Lawless (1982), arose in endurance tests of deep-groove ball bearings. The data are the number of million revolutions before failure for each of the 23 ball bearings in the life tests. Its summary is given as follows:

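Table 6: Summary statistics for dataset VI

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 23 | 17.88 | 47.20 | 67.80 | 95.88 | 72.23 | 173.40 | 1404.78 | 1.0089 | 3.9288 |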

Dataset VII: This dataset represents 66 observations of the breaking stress (in GPa) of carbon fibres of 50 mm length, given by Nicholas and Padgett (2006). The descriptive statistics for this dataset are as follows:

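Table 7: Descriptive statistics for dataset VII

| Parameters | n | Minimum | Q1 | Median | Q3 | Mean | Maximum | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 66 | 0.390 | 2.178 | 2.835 | 3.278 | 2.760 | 4.900 | 0.795 | -0.1285 | 3.2230 |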

From the summary statistics of the seven datasets, we found that datasets I, II, III, IV and VI are positively skewed, while dataset V is approximately normal. Also, datasets I, III and IV have high kurtosis, while the others have a moderate level of peakedness.
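Summary statistics of this kind can be computed with a short Python sketch such as the one below. It assumes moment-based skewness (m3 / m2^1.5) and non-excess kurtosis (m4 / m2^2), the sample variance with divisor n - 1, and quartiles obtained by linear interpolation. These conventions are assumptions, since the text does not state which definitions were used, and the function name `summary_statistics` is hypothetical.

```python
import statistics


def summary_statistics(data):
    """Return the summary measures reported in Tables 1 to 7 for a sample."""
    n = len(data)
    m = statistics.fmean(data)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Central moments with divisor n, used for skewness and kurtosis.
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return {
        "n": n,
        "minimum": min(data),
        "Q1": q1,
        "median": med,
        "Q3": q3,
        "mean": m,
        "maximum": max(data),
        "variance": statistics.variance(data),  # divisor n - 1
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,               # equals 3 for a normal law
    }
```

Under these conventions a positive skewness indicates a longer right tail, and a kurtosis well above 3 indicates a sharper peak and heavier tails than the normal distribution.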

Table 8: Performance of the distributions based on the AIC, CAIC, BIC and HQIC values evaluated at the models' MLEs for datasets I-VII.

| Dataset | Model | Log-likelihood value | Parameter estimates | AIC | CAIC | BIC | HQIC | Model rank |
|---|---|---|---|---|---|---|---|---|
| Dataset I | LWD | 420.7675 | 0.3928, 0.8735, 4.4202, 6.5906 | 849.5355 | 849.8607 | 860.9437 | 854.1707 | 2 |
| Dataset I | LLD | 411.4727 | 7.9519, 1.6252, 8.1254, 5.4517 | 830.9454 | 831.2707 | 842.3536 | 835.5806 | 1 |
| Dataset II | LWD | 146.435 | 0.0987, 0.7832, 7.1911, 5.3806 | 300.8701 | 302.4085 | 306.606 | 302.7398 | 1 |
| Dataset II | LLD | 148.548 | 9.5745, 3.3012, 2.2311, 6.2759 | 305.096 | 306.6345 | 310.832 | 306.9658 | 2 |
| Dataset III | LWD | 342.2547 | 0.5010, 0.7455, 3.4439, 8.6494 | 692.5095 | 692.9305 | 702.9302 | 696.7269 | 2 |
| Dataset III | LLD | 319.8772 | 9.5864, 2.2868, 7.5884, 4.8861 | 647.7543 | 648.1754 | 658.175 | 651.9718 | 1 |