Data-Centric Versus Algorithm-Centric Machine Learning Approaches: A Systematic Review of Comparative Effectiveness and Evaluation Frameworks
Awodele Oludele
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Noze-Otote Aisosa *
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Agu Ekeoma Emmanuel
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Sunday Bridget Nneamaka
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Afelumo Ifeoluwa
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Peter Iyogun
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Arowojobe Yemi Adisa
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Ogunwumi Oluyemi Samuel
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Benard Adepoju Victor
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Oyinloye Adebayo
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Adewoye Adekunle Samuel
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Abigail Ogunlolu
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Ajaegbu Ikechukwu Udo
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Ogunlolu Gabriel
Department of Computer Science, Babcock University, Ogun State, Nigeria.
Sowemimo Oluwakemi
Department of Computer Science, Babcock University, Ogun State, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Machine learning research has traditionally emphasised algorithmic innovation as the main driver of performance improvement, with advances in model architecture, optimisation, and hyperparameter tuning shaping progress in computer vision, natural language processing, healthcare, and industrial automation. Recent work, however, has paid increasing attention to data-centric artificial intelligence, which treats the quality, quantity, and preparation of training data as central determinants of model performance. Despite this growing interest, there is limited agreement on how data-centric approaches compare with algorithm-centric methods across domains and evaluation settings. This systematic review synthesised evidence from 44 peer-reviewed studies published between January 2021 and February 2026. Its aims were to compare the effectiveness of data-centric and algorithm-centric machine learning approaches, identify dominant data engineering techniques, evaluate domain-specific performance outcomes, and assess the frameworks used to measure data quality improvements. Following PRISMA guidance, studies were selected from major academic databases and examined through narrative synthesis. The findings indicate that data-centric techniques consistently improved model performance, particularly in low-data settings and on highly imbalanced datasets. These techniques included data augmentation, synthetic data generation, preprocessing, feature engineering, and annotation refinement. Several studies reported that data-centric interventions matched or exceeded the gains from algorithm-centric modifications while requiring fewer computational resources. Healthcare showed the most frequent and substantial benefits from data-centric approaches, followed by manufacturing, environmental science, and cybersecurity. The review also identified an important methodological gap: relatively few studies used standardised frameworks or rigorous statistical validation to evaluate data quality improvements directly. The study concludes that data-centric and algorithm-centric approaches should be understood as complementary rather than competing paradigms, and that standardised evaluation methods are needed to clarify the contribution of data quality to machine learning performance.
Keywords: Data-centric AI, algorithm-centric machine learning, machine learning performance, data quality, data augmentation, synthetic data generation, data preprocessing, feature engineering, evaluation frameworks, systematic literature review