Concept Drift Detection in Metabolomic Data

Prediction models built on biomedical data, and in particular model-based prediction from metabolite datasets, have great potential. Such models could enable early disease detection, which helps achieve a milder course of the disease and lowers the cost of treatment. Moreover, current research is moving toward the analysis of, and insight into, the phenotype-genotype relationship, an understanding that relies on examining and predicting changes in metabolite levels. Reliable tools for building prediction models therefore need to include self-healing algorithms that prevent unwanted effects such as concept drift.

The main goal of these scripts is to offer a new perspective on the analysis of metabolomics datasets through concept drift detection. The evaluation follows two main approaches: the first detects concept drift in available metabolomics datasets, and the second assesses commonly used drift detection tools to identify the best detection approach for a general metabolomics dataset.
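Detecting drift in a dataset usually means streaming the samples through a classifier and watching its error rate. As a hypothetical illustration (not the code used in this study), the sketch below implements the Drift Detection Method (DDM) of Gama et al. [8] from scratch: it tracks the running error rate p and its standard deviation s, remembers the lowest p + s seen, and raises a warning at 2 and a drift at 3 standard deviations above that minimum. The 30-sample warm-up and the error stream are assumptions chosen for illustration; the stream is synthetic, not derived from the metabolomic data.

```python
import math
from typing import List


class DDM:
    """Sketch of the Drift Detection Method (DDM) of Gama et al. [8].

    Consumes prediction outcomes (1 = misclassified, 0 = correct) and signals
    a warning or a drift when the running error rate rises significantly
    above the lowest error rate observed so far.
    """

    def __init__(self, min_instances: int = 30,
                 warning_level: float = 2.0, drift_level: float = 3.0) -> None:
        self.min_instances = min_instances  # warm-up before any decision
        self.warning_level = warning_level  # multiples of s_min for a warning
        self.drift_level = drift_level      # multiples of s_min for a drift
        self.reset()

    def reset(self) -> None:
        # Forget all statistics, e.g. after a drift has been signalled.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Add one outcome and return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n             # running error rate p_i
        s = math.sqrt(p * (1 - p) / self.n)  # standard deviation s_i
        if self.n < self.min_instances:
            return "stable"                  # too few samples to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s    # best state seen so far
        if p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()                     # discard the old concept
            return "drift"
        if p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def run_detector(errors: List[int]) -> List[str]:
    """Feed a whole error stream to a fresh DDM and collect its states."""
    ddm = DDM()
    return [ddm.update(e) for e in errors]


# Hypothetical error stream: a classifier wrong on 10% of samples,
# then, from sample 300 on, wrong on 50% of samples (simulated drift).
stream = [1 if i % 10 == 0 else 0 for i in range(300)]
stream += [1 if i % 2 == 0 else 0 for i in range(300, 600)]
states = run_detector(stream)
first_drift = states.index("drift")
print("first warning:", states.index("warning"), "first drift:", first_drift)
```

Because DDM consumes labelled prediction outcomes, it monitors the classifier's performance rather than the metabolite values themselves; the warning state gives an adaptive model time to start collecting samples of the new concept before the drift is confirmed.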

As a result, this study provides the first insight into concept drift detection in metabolite data: it detects drift in two different metabolite experiments and evaluates the detection tools on available datasets. Although the drift detection results are not unambiguous and further analysis on larger datasets is needed, the presence of concept drift has been confirmed: metabolite data are affected by concept drift, just as financial credit applications are. For this reason, innovative adaptive approaches to prediction should be used with caution. The study opens new paths toward combining correction algorithms with concept drift analysis in metabolomic prediction.
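Assessing a detector is easier on a stream whose change point is known, because detection delay and false alarms can then be counted directly. The sketch below is a hypothetical illustration, not the study's evaluation protocol: it implements a minimal incremental form of the Page-Hinkley test [46], which accumulates the deviation m_T of each observation from the running mean (minus a tolerance delta) and signals drift when m_T exceeds its running minimum M_T by more than a threshold lambda. The synthetic feature (mean 1.0 for 500 samples, then 2.0) and the parameter values (delta = 0.005, lambda = 10 and 50, 30 warm-up samples) are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Tuple


class PageHinkley:
    """Sketch of the Page-Hinkley test [46] for an upward shift in the mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0,
                 min_instances: int = 30) -> None:
        self.delta = delta                  # tolerated change magnitude
        self.threshold = threshold          # lambda: alarm threshold
        self.min_instances = min_instances  # warm-up before alarms are allowed
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # m_T: cumulative deviation from the mean
        self.cum_min = 0.0  # M_T: smallest m_t seen so far

    def update(self, x: float) -> bool:
        """Add one observation; return True when a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_instances and self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


def evaluate(detector: PageHinkley, stream: List[float],
             change_point: int) -> Tuple[Optional[int], int]:
    """Return (detection delay, number of false alarms) for a known change point."""
    delay: Optional[int] = None
    false_alarms = 0
    for i, x in enumerate(stream):
        if detector.update(x):
            if i < change_point:
                false_alarms += 1
            elif delay is None:
                delay = i - change_point
    return delay, false_alarms


# Synthetic, hypothetical feature: mean 1.0 for 500 samples, then mean 2.0.
rng = random.Random(42)
stream = [rng.gauss(1.0, 0.1) for _ in range(500)]
stream += [rng.gauss(2.0, 0.1) for _ in range(500)]
for lam in (10.0, 50.0):
    print(lam, evaluate(PageHinkley(threshold=lam), stream, change_point=500))
```

Lowering the threshold shortens the detection delay but raises the risk of false alarms on noisy data, which is the trade-off an assessment of drift detection tools has to weigh.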

The metabolomic data used for the concept drift detection analysis in this study are taken from [30]; see the Datasets file.

REFERENCES:

[1] WEBB, Geoffrey I., et al. Characterizing concept drift. Data Mining and Knowledge Discovery, 2016, 30.4: 964-994.

[2] LU, Jie, et al. Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering, 2018, 31.12: 2346-2363.

[3] GAMA, João, et al. Concept drift in decision-tree learning for data streams. In: Proceedings of the Fourth European Symposium on Intelligent Technologies and their implementation on Smart Adaptive Systems, Aachen, Germany, Verlag Mainz. 2004. p. 218-225.

[4] MONTEMAYOR, Daniel; SHARMA, Kumar. mGWAS: next generation genetic prediction in kidney disease. Nature Reviews Nephrology, 2020, 16.5: 255-256.

[5] MOATS, Rex A., et al. Abnormal cerebral metabolite concentrations in patients with probable Alzheimer disease. Magnetic resonance in medicine, 1994, 32.1: 110-115.

[6] WANG, Thomas J., et al. Metabolite profiles and the risk of developing diabetes. Nature medicine, 2011, 17.4: 448-453.

[7] COSTA, Albert França Josuá; ALBUQUERQUE, Régis Antônio Saraiva; DOS SANTOS, Eulanda Miranda. A drift detection method based on active learning. In: 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018. p. 1-8.

[8] GAMA, Joao, et al. Learning with drift detection. In: Brazilian symposium on artificial intelligence. Springer, Berlin, Heidelberg, 2004. p. 286-295.

[9] BAENA-GARCIA, Manuel, et al. Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams. 2006. p. 77-86.

[10] NISHIDA, Kyosuke; YAMAUCHI, Koichiro. Detecting concept drift using statistical testing. In: CORRUBLE, Vincent; TAKEDA, Masayuki; SUZUKI, Einoshin (eds.). Discovery Science. Springer, Berlin, Heidelberg, 2007. p. 264-269. ISBN 978-3-540-75488-6.

[11] CABRAL, Danilo Rafael de Lima; DE BARROS, Roberto Souto Maior. EMZD: Equal means z-test concept drift detector. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 2020. p. 1037-1044.

[12] ROBERTS, S. W. Control chart tests based on geometric moving averages. Technometrics, 2000, 42.1: 97. ISSN 0040-1706.

[13] BARROS, Roberto S. M., et al. RDDM: Reactive drift detection method. Expert Systems with Applications, 2017, 90: 344-355. ISSN 0957-4174.

[14] FRÍAS-BLANCO, Isvani, et al. Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering, 2015, 27.3: 810-823.

[15] SANTOS, Silas G. T. C.; BARROS, Roberto S. M.; GONÇALVES, Paulo M. A differential evolution based method for tuning concept drift detectors in data streams. Information Sciences, 2019, 485: 376-393. ISSN 0020-0255.

[16] SANTOS, Silas G. T. C.; BARROS, Roberto S. M.; GONÇALVES JÚNIOR, Paulo Mauricio. Optimizing the parameters of drift detection methods using a genetic algorithm. In: 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI). 2015. p. 1077-1084.

[17] BARROS, Roberto; SANTOS, Silas. An overview and comprehensive comparison of ensembles for concept drift. Information Fusion, 2019, 52: 213-244.

[18] SPINOSA, Eduardo; DE CARVALHO, Andre; GAMA, João. OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams. Jan. 2007. p. 448-452.

[19] FARIA, Elaine R.; GAMA, João; CARVALHO, André C. P. L. F. Novelty detection algorithm for data streams multi-class problems. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC '13). Coimbra, Portugal: Association for Computing Machinery, 2013. p. 795-800. ISBN 9781450316569.

[20] MASUD, Mohammad, et al. Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Transactions on Knowledge and Data Engineering, 2011, 23.6: 859-874.

[21] VELIPASAOGLU, Emre. Machine learned model quality monitoring in fast data and streaming applications. O'Reilly, 2018.

[22] DITZLER, Gregory; POLIKAR, Robi. Hellinger distance based drift detection for nonstationary environments. In: 2011 IEEE Symposium on Computational Intelligence in Dynamic and Uncertain Environments (CIDUE). 2011. p. 41-48.

[23] SETHI, Tegjyot Singh; KANTARDZIC, Mehmed. On the reliable detection of concept drift from streaming unlabeled data. Expert Systems with Applications, 2017, 82: 77-99. ISSN 0957-4174.

[24] COHEN, Aaron M.; BHUPATIRAJU, Ravi Teja; HERSH, William R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In: TREC. 2004.

[25] ŽLIOBAITĖ, Indrė. Learning under concept drift: an overview. arXiv preprint arXiv:1010.4784, 2010.

[26] SONG, Xiuyao, et al. A bayesian mixture model with linear regression mixing proportions. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008. p. 659-667.

[27] BLACK, Michaela; HICKEY, Ray. Detecting and adapting to concept drift in bioinformatics. In: International Symposium on Knowledge Exploration in Life Science Informatics. Springer, Berlin, Heidelberg, 2004. p. 161-168.

[28] KOYAMA, Takahiko, et al. Emergence of drift variants that may affect COVID-19 vaccine development and antibody treatment. Pathogens, 2020, 9.5: 324.

[29] IP, Clement, et al. Chemical form of selenium, critical metabolites, and cancer prevention. Cancer research, 1991, 51.2: 595-600.

[30] CHU, Xiaojing, et al. Integration of metabolomics, genomics, and immune phenotypes reveals the causal roles of metabolites in disease. Genome biology, 2021, 22.1: 1-22.

[31] HUNG, Jen Yu, et al. Overexpression and proliferation dependence of acyl CoA thioesterase 11 and 13 in lung adenocarcinoma. Oncology letters, 2017, 14.3: 3647-3656.

[32] LANDI, Maria Teresa, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PloS one, 2008, 3.2: e1651.

[33] CHU, X.; LI, Y. MTBLS2633: integration of metabolomics, genomics and immune phenotypes reveals the causal roles of metabolites in disease. MetaboLights, 2021.

[34] ZHANG, Kungang; BUI, Anh T.; APLEY, Daniel W. Concept drift monitoring and diagnostics of supervised learning models via score vectors. arXiv preprint arXiv:2012.06916, 2020.

[35] BIECEK, P. drifter: Concept drift and concept shift detection for predictive models. 2019. Available from: https://ModelOriented.github.io/drifter/

[36] MITCHELL, Tom. Machine Learning. Maidenhead, U.K.: McGraw-Hill, 1997. ISBN 0-07-115467-1.

[37] MONTIEL, Jacob, et al. Scikit-multiflow: A multi-output streaming framework. The Journal of Machine Learning Research, 2018, 19.1: 2915-2919.

[38] BISONG, Ekaba. Logistic Regression. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Apress, Berkeley, CA, 2019. p. 243-250.

[39] PEDREGOSA, Fabian, et al. Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 2011, 12: 2825-2830.

[40] PESARANGHADER, Ali. A reservoir of adaptive algorithms for online learning from evolving data streams. Ph.D. Dissertation, Université d'Ottawa/University of Ottawa, 2018. DOI: http://dx.doi.org/10.20381/ruor-22444

[41] PESARANGHADER, Ali, et al. Reservoir of diverse adaptive learners and stacking fast Hoeffding drift detection methods for evolving data streams. Machine Learning, 2018. DOI: https://doi.org/10.1007/s10994-018-5719-z

[42] PESARANGHADER, Ali, et al. A framework for classification in data streams using multi-strategy learning. In: International Conference on Discovery Science. 2016. DOI: https://doi.org/10.1007/978-3-319-46307-0_22

[43] BIFET, Albert, et al. Fast perceptron decision tree learning from evolving data streams. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, Heidelberg, 2010. p. 299-310.

[44] CHOLLET, F., et al. Keras. GitHub, 2015. Available from: https://github.com/fchollet/keras

[45] PESARANGHADER, Ali; VIKTOR, Herna L. Fast hoeffding drift detection method for evolving data streams. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Cham, 2016. p. 96-111.

[46] PAGE, Ewan S. Continuous inspection schemes. Biometrika, 1954, 41.1/2: 100-115.

[47] BIFET, Albert; GAVALDA, Ricard. Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2007. p. 443-448.

[48] PEARS, Russel; SAKTHITHASAN, Sripirakas; KOH, Yun Sing. Detecting concept change in dynamic data streams. Machine Learning, 2014, 97.3: 259-293.

[49] SONDHEIMER, Steven. Metabolic effects of the birth control pill. Clinical obstetrics and gynecology, 1981, 24.3: 927-942.

[50] HAY, William M., et al. Menstrual cycle, tolerance and blood alcohol level discrimination ability. Addictive behaviors, 1984, 9.1: 67-77.

[51] GULDI, Melanie. Fertility effects of abortion and birth control pill access for minors. Demography, 2008, 45.4: 817-827.

[52] LANDI, Maria Teresa, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PloS one, 2008, 3.2: e1651.

[53] DU, Bei, et al. A prospective study of serum metabolomic and lipidomic changes in myopic children and adolescents. Experimental Eye Research, 2020, 199: 108182.

[54] DUFT, Renata G., et al. Altered metabolomic profiling of overweight and obese adolescents after combined training is associated with reduced insulin resistance. Scientific Reports, 2020, 10.1: 1-11.

[55] PERNG, Wei, et al. Metabolomic determinants of metabolic risk in Mexican adolescents. Obesity, 2017, 25.9: 1594-1602.