IMPROVING THE RELIABILITY OF MACHINE LEARNING MODELS BY FILLING IN MISSING NAN VALUES IN MEDICAL DATASETS USING A GENETIC ALGORITHM
Keywords:
Genetic algorithm, ML, AI, Random Forest, KNN
Abstract
This article proposes a genetic algorithm-based approach to optimizing the imputation of missing (NaN) values in a dataset. The aim is to select replacement values for the missing entries according to their direct effect on the results of the classification task. In the proposed method, each individual is a chromosome encoded as a vector of all the missing values. The search space is bounded by given intervals for numerical attributes and by the set of admissible categories for categorical attributes. The classification accuracy of a Random Forest ensemble model serves as the fitness function of the genetic algorithm.
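The encoding described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes missing cells are marked with `None`, that one bound per missing cell is supplied in row-major order (a tuple for a numerical interval, a list for a set of categories), and that the fitness is any callable that scores a filled dataset. The names `decode`, `random_gene`, `evolve`, and `toy_fitness` are hypothetical. The selection scheme (keep the better half), the uniform crossover, and the mutation rate are illustrative choices, and the toy rule-based fitness only stands in for the Random Forest accuracy used in the article.

```python
import random

def decode(rows, chromosome):
    """Fill each missing cell (None) with the next gene, in row-major order."""
    genes = iter(chromosome)
    return [[next(genes) if v is None else v for v in row] for row in rows]

def random_gene(bound, rng):
    """Draw one gene: a tuple is a numeric interval, a list is a category set."""
    if isinstance(bound, tuple):
        return rng.uniform(*bound)
    return rng.choice(bound)

def evolve(rows, bounds, fitness, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Search for the vector of missing values that maximizes `fitness`."""
    rng = random.Random(seed)

    def score(chromosome):
        return fitness(decode(rows, chromosome))

    # Each individual is one candidate vector of all missing values.
    population = [[random_gene(b, rng) for b in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation redraws a gene inside its own bound, so every
            # individual stays within the search space.
            child = [random_gene(bd, rng) if rng.random() < mutation_rate else g
                     for g, bd in zip(child, bounds)]
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Toy data: columns are [numeric feature, categorical feature, class label].
rows = [
    [2.0, "a", 0],
    [None, "b", 1],
    [8.0, None, 1],
    [None, "a", 0],
]
# One bound per missing cell, in row-major order (assumed convention).
bounds = [(0.0, 10.0), ["a", "b"], (0.0, 10.0)]

def toy_fitness(filled):
    # Stand-in for classifier accuracy: the rule "x > 5 means class 1".
    hits = sum((row[0] > 5) == bool(row[2]) for row in filled)
    return hits / len(filled)

best = evolve(rows, bounds, toy_fitness)
filled = decode(rows, best)
print(filled, toy_fitness(filled))
```

To approximate the setting in the abstract, `toy_fitness` would be replaced by a function that trains a Random Forest on the filled data and returns its accuracy on held-out or cross-validated samples; the rest of the loop would stay unchanged.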
License
Copyright (c) 2025 Shohruh Sariyev

This work is licensed under a Creative Commons Attribution 4.0 International License.
