Vol 5 No 2 (2019) | Journal of Biostatistics and Epidemiology


Original Article(s)

Multiclass Response Feature Selection and Cancer Tumour Classification With Support Vector Machine

Alabi Waheed Banjoko , Waheed Babatunde Yahya , Mohammed Kabir Garba

Pages: 91-104

Abstract

Background & Aim: In this study, an efficient Support Vector Machine (SVM) algorithm was proposed for feature selection and classification of multi-category tumour classes of biological samples using gene expression profiles.
Methods: The feature selection interface of the algorithm employed the F-statistic of an ANOVA-like testing scheme at a chosen family-wise error rate, which ensured efficient detection of false-positive genes. The gene subsets selected in this way were further screened for optimality using the misclassification error rates yielded by each of them and by their combinations in a sequential selection manner. In a 10-fold cross-validation, the optimal values of the SVM parameters with an appropriate kernel were determined for tissue sample classification using the one-versus-all approach. The entire data matrix was randomly partitioned into a 95% training set to train the SVM classifier and a 5% test set to evaluate the predictive performance of the classifier over 1,000 Monte-Carlo cross-validation runs. A published microarray breast cancer dataset with five clinical endpoints was employed to validate the results from the simulation studies.
Results: Results from the Monte-Carlo study showed excellent performance of the SVM classifier, with high prediction accuracy for the tissue samples based on the few gene biomarkers selected by the proposed feature selection method.
Conclusion: SVM could be considered for the classification of multi-category tumour classes of biological samples using gene expression profiles.
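
As an illustration of the F-statistic screening step described in the Methods, the following is a minimal sketch that computes a one-way ANOVA F-statistic per gene and ranks genes by it. The toy expression values, class labels and function name are hypothetical, and the sketch omits the family-wise error rate control, the sequential subset screening and the SVM itself.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic for one gene across tumour classes."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical toy data: 3 classes, 3 samples each, 2 genes
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
expression = {
    "gene_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],  # class means differ
    "gene_2": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0, 3.0, 5.0, 7.0],  # class means equal
}
f_scores = {gene: anova_f(vals, labels) for gene, vals in expression.items()}
# Genes with the largest F-statistic are the strongest candidates for selection
ranked = sorted(f_scores, key=f_scores.get, reverse=True)
print(ranked, f_scores)
```

A gene whose class means differ strongly relative to its within-class spread gets a large F value; a gene with identical class means gets zero and would be screened out.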

Use of Bayesian Mixture Models in Analyzing Heterogeneous Survival Data: A Simulation Study

Naser Ahmadi , Saeed Shirazi , Hamed Baziyad

Pages: 105-109

Abstract

Background and Aim: Survival analysis is one of the statistical methods used to analyze time-to-event medical data. In survival models, the response variable is the time to the occurrence of an event. The main characteristic of survival data is the presence of censored observations. When the distribution of the survival time is known, parametric methods can be used. The Weibull distribution is among the most important and popular choices. If the data derive from a heterogeneous population, simple parametric models (such as the Weibull) do not fit the data appropriately. Mixture models are one of the methods that have been introduced to overcome this problem.
Methods: To assess the validity of the two-component Weibull mixture model, we used a simulation method on heterogeneous survival data. For this purpose, data with different sample sizes were produced in a batch of 1000. The validity of the model was then checked using the root mean square error (RMSE) criterion.
Results: Increasing the sample size decreased the RMSE of the parameters. However, the maximum observed RMSE across all the parameters was negligible.
Conclusion: The Bayesian Weibull mixture model was a proper fit for the heterogeneous survival data.
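
For reference, a two-component Weibull mixture and the RMSE criterion can be written as follows. This is a sketch under assumed notation: the mixing weight p, shapes k_1, k_2, scales \lambda_1, \lambda_2 and the replicate count R are symbols chosen here, and the abstract does not state the parameterization the authors used.

```latex
% Survival function of a two-component Weibull mixture (assumed parameterization)
S(t) = p \exp\!\left\{-\left(\frac{t}{\lambda_1}\right)^{k_1}\right\}
     + (1 - p) \exp\!\left\{-\left(\frac{t}{\lambda_2}\right)^{k_2}\right\},
\qquad 0 < p < 1,\; t \ge 0

% RMSE of an estimator \hat{\theta} over R simulated data sets,
% where \hat{\theta}_r is the estimate from replicate r and \theta is the true value
\mathrm{RMSE}(\hat{\theta}) = \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left(\hat{\theta}_r - \theta\right)^2}
```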

Comparison of Parametric Models: Application to Hypertensive Patients in a Teaching Hospital, Awka

Amuche Henrietta Ibenegbu , George Amaeze Osuji , Edith Uzoma Umeh

Pages: 110-119

Abstract

Introduction: In Nigeria, hypertension is a common illness among adults. This research was carried out to determine the best model for predicting the survival of hypertensive patients using the goodness-of-fit criteria Standard Error (SE), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).
Method: A total of 105 patients diagnosed with hypertension from January 2013 to July 2018 were considered, with death as the event of interest. Six parametric models (exponential, Weibull, lognormal, log-logistic, Gompertz and hypertabastic) were fitted to the data, and goodness-of-fit measures (SE, AIC and BIC) were used to determine the best model. These parametric models were considered because they are all lifetime distributions.
Results: The results show that the hypertabastic distribution has the lowest AIC and BIC, followed by the Gompertz distribution. The standard error also indicates that the hypertabastic model is better, because it has the smallest standard error. This indicates that, in terms of relative efficiency and parameterization, the hypertabastic model is the best. The survival probability plot of the six parametric models shows that the hypertabastic distribution fitted the data best, because it shows a clearer step function than the other distributions, which supports the SE, AIC and BIC results.
Conclusion: Since the hypertabastic distribution has the lowest SE, AIC and BIC, it is the best parametric model for predicting the survival of hypertensive patients in Chukwuemeka Odumegwu Ojukwu University Teaching Hospital, Awka, Nigeria.
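
To make the AIC and BIC comparison concrete, here is a minimal sketch that fits the simplest of the six models (the exponential) to right-censored data and computes both criteria. The follow-up times and event indicators are hypothetical, not the study data, and the use of the full sample size n in the BIC is an assumption (some survival analyses use the number of events instead).

```python
import math

def exponential_loglik(times, events):
    """Maximised log-likelihood of an exponential model with right censoring."""
    d = sum(events)            # number of observed deaths
    total_time = sum(times)    # total follow-up time
    rate = d / total_time      # closed-form MLE of the hazard rate
    return d * math.log(rate) - rate * total_time

def aic(loglik, k):
    """Akaike Information Criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# Hypothetical follow-up times (months) and indicators (1 = death, 0 = censored)
times = [5.0, 12.0, 20.0, 33.0, 48.0, 60.0]
events = [1, 0, 1, 1, 0, 0]

ll = exponential_loglik(times, events)
print(aic(ll, k=1), bic(ll, k=1, n=len(times)))
```

The same two functions apply to any of the other models once its maximised log-likelihood and parameter count are available; the model with the lowest value is preferred.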

Quantifying the Relationship between Adaptive Traits and Agro-climatic Conditions

Mehari Gebre Teklezgi

Pages: 120-136

Abstract

Background & Aim: Durum wheat is an economically important and regularly eaten food for billions of people in the world. At the International Center for Agricultural Research in the Dry Areas (ICARDA), genebanks use the Focused Identification of Germplasm Strategy (FIGS) to find and quantify relationships between agro-climatic conditions and the presence of specific traits. This study therefore aimed to investigate the predictive value of various types of long-term agro-climatic variables for the future values of different traits.
Method: Ordinary multiple linear regression with stepwise variable selection on the complete data set, and multiple linear regression models with predictors selected by penalized methods using cross-validated mean square error as the model selection criterion, were used to analyze 238 durum wheat landraces. Each model was fitted independently to the Days to Heading and Days to Maturity response variables with 57 predictor variables. Ordinary least squares (OLS) and weighted least squares (WLS) estimation methods were used.
Result: The findings implied high multicollinearity among the predictor variables. Under both the ordinary and the shrinkage-based models, some predictors affected Days to Heading and Days to Maturity positively and others negatively. Prediction from the lasso-based model was not very reasonable. Prediction appeared better for Days to Heading, where the predicted values increased steadily with the actual values, although with considerable variability.
Conclusion: Inferences and predictions from the ordinary MLR models cannot be trusted because of multicollinearity and violation of some model assumptions. However, predictions from the models with predictors selected by the shrinkage methods may be better, as the effects of the variability on these methods are minimal. Moreover, the WLS methods might give more sensible predictions than the OLS estimation methods. Better predictions were found for Days to Heading.
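
One standard way to quantify the multicollinearity mentioned above is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1 / (1 - r^2) with r the Pearson correlation. The abstract does not say which diagnostic the author used, and the two climate variables are hypothetical values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model contains exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical long-term climate variables for five collection sites
temp_min = [10.0, 12.0, 14.0, 16.0, 18.0]
temp_max = [20.0, 23.0, 24.0, 27.0, 28.0]

vif = vif_two_predictors(temp_min, temp_max)
print(round(vif, 2))
```

A VIF well above 10 is a common rule-of-thumb signal that ordinary least squares coefficients are unstable, which is the situation where shrinkage methods such as the lasso are typically considered.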

Truncated log-logistic Family of Distributions

Mohadese Akbarinasab , Ali Reza Arabpour , Abbas Mahdavi

Pages: 137-147

Abstract

Background & Aim: Many kinds of data arise from real-world events and need to be analyzed. In response, many researchers attempt to find methods that fit these data better using new models. One such method is to introduce new distributions that describe the available data better. During the last few years, new distributions have been developed by extending existing well-known distributions. New distributions usually have more parameters than existing ones. The additional parameter(s) have proved useful in exploring tail properties and in improving the goodness-of-fit of the family under study.
Methods & Materials: A new family of distributions is introduced using the truncated log-logistic distribution. Some statistical and reliability properties of the new family are derived.
Results: Four special lifetime models of the new family are investigated. The parameters are estimated by the maximum likelihood method. The results are validated using a real dataset, and the new distributions are shown to provide a better fit than some other known distributions.
Conclusion: We have provided four new distributions. The flexibility of the proposed distributions and their increased range of skewness allowed them to fit and capture features of one real dataset much better than some competitor distributions.
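
One way such a family can be built is sketched below, under stated assumptions. A log-logistic distribution with unit scale has CDF t^c / (1 + t^c); truncating it to (0, 1) gives 2 t^c / (1 + t^c), and substituting a baseline CDF G(x) for t yields a new CDF. The abstract does not give the authors' exact construction, so this form, the function names and the exponential baseline are illustrative assumptions only.

```python
import math

def exponential_cdf(x, rate=1.0):
    """Baseline CDF G(x); an exponential baseline is assumed for illustration."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def truncated_loglogistic_g_cdf(x, shape, baseline_cdf):
    """CDF obtained by plugging G(x) into a log-logistic CDF truncated to (0, 1).

    Assumed form: F(x) = 2 * G(x)**shape / (1 + G(x)**shape).
    """
    g = baseline_cdf(x)
    return 2.0 * g ** shape / (1.0 + g ** shape)

# Evaluate the assumed CDF on a small grid for two shape values
grid = [0.0, 0.5, 1.0, 2.0, 5.0]
for shape in (0.5, 2.0):
    values = [truncated_loglogistic_g_cdf(x, shape, exponential_cdf) for x in grid]
    print(shape, [round(v, 4) for v in values])
```

The extra shape parameter changes how quickly the CDF rises, which is the kind of added flexibility in skewness and tail behaviour that the abstract describes.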

Twin Births and Their Survival under Age Five: Evidence from Bangladesh Demographic and Health Survey 2014

M. Mazharul Islam , Uzma Marium

Pages: 148-162

Abstract

Background & Aim: Little is known about twinning in developing countries due to lack of reliable data. However, the large data set from the national level Demographic and Health Surveys (DHSs) in developing countries can fill this gap. This paper examines the level, trends and determinants of twin births, and their risk of survival until age five relative to singletons in Bangladesh.
Methods & Materials: The data for the study were obtained from the 2014 Bangladesh DHS. The analysis was based on the birth histories of 43,842 live births experienced by 17,863 women between 1978 and the survey date in November 2014. Frequency distributions, cross tabulation, univariate and multivariate logistic regression models, and demographic methods such as the conventional life table approach were used for data analysis.
Results: About 1.52% of the total live births in Bangladesh were found to be twins. The twin birth rate has increased by 13.4% over the last 20 years in Bangladesh. Maternal age, parity, region of residence, economic status, father’s education, contraceptive use status and religion were identified as significant predictors of twin births. Twinning appeared as a significant predictor of high childhood mortality. Twins were found to have more than eight times the risk of death during the neonatal period compared with singletons.
Conclusion: The increasing trend in twin births in Bangladesh and the associated higher risk of childhood mortality among twins underscore the need for a more focused care strategy during pregnancy and after birth. Further studies are needed to identify the reasons for the exceptionally high childhood mortality among twins in Bangladesh.

Assessing the level of Knowledge and attitude of the young couples about HIV in Shiraz, Iran

Firooz Esmailzadeh , Mojtaba Sepandi , Abdolhalim Rajabi , Zahra Kavosi , Manije Alimohammadi , Yousef Alimohamadi

Pages: 163-171

Abstract

Introduction: HIV infection is one of the main public health problems in the world. This study aimed to assess the knowledge and attitudes of young married couples in the city of Shiraz, and ultimately to suggest an operational program for the prevention of HIV in Iran.
Method: The data collection tool was a questionnaire consisting of 32 questions on the transmission and prevention of HIV infection. The young couples were selected through simple random sampling, and the sample size was 400. The data analysis was performed using SPSS 19 software.
Results: Of the total of 400 cases, 201 (50.25%) were male and 199 (49.75%) were female. The mean age of the couples was 25.96±5.95 years. The most frequent correct answer was related to the knowledge of transmission through sharing needles among drug users (87.4%). Regarding attitude, 94.6% of the subjects agreed with the struggle against HIV. Knowledge and age had a significant relationship (P=0.002). There was also a significant relationship between attitude and gender (P=0.004).
Conclusion: One of the important ways to stop the epidemic and prevent new cases of HIV is educating people at an early age.

Effects of Collinearity on Cox Proportional Hazard Model with Time Dependent Coefficients: A Simulation Study

Bayowa Teniola Babalola , Babatunde W Yahya

Pages: 172-182

Abstract

Background: The Cox proportional hazard model has gained ground in biostatistics and other related fields. It has been extended to capture different scenarios, including violation of the proportionality of the hazards, the presence of time-dependent covariates, and time-dependent coefficients. This paper focuses on the behaviour of the Cox model with time-dependent coefficients in the presence of different levels of collinearity.
Objectives: The objectives of this study are to examine the effects of collinearity on the estimates of time-dependent coefficients in the Cox proportional hazard model and to compare the estimates of the model for the logarithm and the square functions of time.
Materials and methods: The algorithm based on a binomial model was extended to incorporate the different correlation structures required for the study. The scaled Schoenfeld residual plots revealed the behaviour of the estimated betas at different degrees of collinearity. Results and conclusions are based solely on the outcome of the simulation study performed.
Results: The estimated betas were compared graphically to the true betas at the different levels of collinearity.
Conclusion: The study shows that collinearity is a major factor influencing the correctness of the estimates of the regressors within the framework of the Cox model.
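
For reference, a Cox model with time-dependent coefficients is commonly written as below. The notation (h_0 for the baseline hazard, x_j for covariates, \beta_j for the coefficient scale) is assumed here, and the two forms of \beta_j(t) are one reading of the "logarithm and square functions of time" named in the Objectives, not a formula taken from the paper.

```latex
% Hazard for a subject with covariates x_1, ..., x_p and time-dependent coefficients
h(t \mid x) = h_0(t) \exp\!\left\{ \sum_{j=1}^{p} \beta_j(t)\, x_j \right\}

% Two assumed functional forms for the coefficient
\beta_j(t) = \beta_j \log t
\qquad \text{or} \qquad
\beta_j(t) = \beta_j\, t^{2}
```

When \beta_j(t) is constant in t, this reduces to the ordinary proportional hazards model.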