Abstract


A HYBRID RANDOM FORESTS-BORUTA FEATURE SELECTION ALGORITHM FOR BIODEGRADABILITY PREDICTION

Zhe F. Liu, Hedia Fgaier, Stanislav Y. Ivanov, Ali Elkamel, Xiang H. Meng, and Suo Q. Zhao


A priori knowledge of biodegradability saves time and money in the research and design of new products. Environmental organizations have encouraged the use of quantitative structure-activity relationship (QSAR) models as a tool for predicting the biodegradability of chemicals. In the current work, a new algorithm is proposed to investigate the importance of the chemical descriptors used as input variables in the modeling and prediction of biodegradability. The algorithm yields an ensemble of feature subsets that trade off model complexity against generalization performance. It uses random forests as the classifier, coupled with the Boruta algorithm, to automatically rank descriptors and omit them on the basis of their Z-scores. It is shown how the four least relevant variables were identified and removed from the model while retaining its generalization ability. Furthermore, a hybrid feature selection method is developed that inspects weakly relevant features and omits them in a loop while preserving the generalization of the classifiers. The prediction accuracy of the new model improved on that of previous works.
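
The ranking and pruning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn and NumPy, uses synthetic data in place of real chemical descriptors, and substitutes per-tree Gini importance for the permutation importance that Boruta normally uses. The function names `shadow_z_scores` and `prune_weak_features`, the tolerance `tol`, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def shadow_z_scores(X, y, n_rounds=5, n_trees=100, seed=0):
    """Average Z-score per feature and the share of rounds it beat the best shadow."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    z_sum = np.zeros(n_features)
    hits = np.zeros(n_features)
    for r in range(n_rounds):
        # Shadow features: each column shuffled on its own, so any link to y is destroyed.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed + r)
        forest.fit(np.hstack([X, shadows]), y)
        # Assumption: per-tree Gini importance stands in for permutation importance.
        per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
        z = per_tree.mean(axis=0) / (per_tree.std(axis=0) + 1e-12)
        z_sum += z[:n_features]
        hits += z[:n_features] > z[n_features:].max()
    return z_sum / n_rounds, hits / n_rounds


def prune_weak_features(X, y, z, tol=0.02, seed=0):
    """Drop the lowest-ranked features one by one while cross-validated accuracy holds."""
    def cv_accuracy(cols):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        return cross_val_score(model, X[:, cols], y, cv=5).mean()

    kept = list(np.argsort(z)[::-1])  # best-ranked feature first
    baseline = cv_accuracy(kept)
    while len(kept) > 1:
        candidate = kept[:-1]  # tentatively remove the weakest remaining feature
        score = cv_accuracy(candidate)
        if score < baseline - tol:
            break  # removal hurts generalization, so stop pruning
        kept, baseline = candidate, max(baseline, score)
    return sorted(int(j) for j in kept)


# Synthetic stand-in for a descriptor matrix: only columns 0 and 1 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

z, hits = shadow_z_scores(X, y)
kept = prune_weak_features(X, y, z)
print("mean Z-scores:", np.round(z, 2))
print("share of rounds above best shadow:", hits)
print("columns kept after pruning:", kept)
```

In this sketch the shadow comparison gives a data-driven threshold for calling a descriptor irrelevant, while the pruning loop checks each removal against cross-validated accuracy so that generalization is monitored directly. A full Boruta run would additionally apply a statistical test to the hit counts before confirming or rejecting a feature.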