An Enhanced K-NN Algorithm Leveraging BERT Techniques for Resume Parsing System
Oluwaseyi Ezekiel Olorunshola*
Department of Computer Science, Air Force Institute of Technology, Kaduna, Nigeria.
Ikuponiyi Oluwapelumi Ampitan
Department of Computer Science, Air Force Institute of Technology, Kaduna, Nigeria.
Fatimah Adamu-Fika
Department of Cyber Security, Air Force Institute of Technology, Kaduna, Nigeria.
Adeniran Kolade Ademuwagun
Department of Cyber Security, Air Force Institute of Technology, Kaduna, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
The increasing volume of job applications has made it difficult for organizations to screen and rank candidate resumes efficiently. Manual screening and keyword-based automated systems often struggle with accuracy and contextual understanding. This study presents an experimental design for a hybrid ensemble model for resume parsing and ranking that combines k-nearest neighbors (KNN) with Bidirectional Encoder Representations from Transformers (BERT). The enhancement lies in BERT's ability to generate deep contextual embeddings, which are integrated into KNN's distance-based classification and keyword matching to improve contextual accuracy, a combination not commonly explored in previous resume parsing systems. The research comprised data cleaning, preprocessing, feature extraction using named entity recognition (NER), and model development and training. The system accepts resumes in DOCX, PDF, or image formats. Using Natural Language Processing (NLP) techniques, term frequency-inverse document frequency (TF-IDF) vectorization, and cosine similarity, it processes resumes and ranks them by relevance to a job description, reporting a similarity score for each. The system achieved 96.91% parsing accuracy and 100% ranking accuracy across 962 resumes, with precision, recall, and F1-score of 97.0%. The study was conducted at the Air Force Institute of Technology between December 2024 and June 2025. The resulting system highlights the importance of automated resume parsing in recruitment processes.
Keywords: NLP, KNN, BERT, vectorization, TF-IDF, cosine similarity, similarity score, resumes
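To make the ranking step named in the abstract concrete, the following is a minimal sketch of TF-IDF vectorization followed by cosine similarity scoring of resumes against a job description. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `rank_resumes`), the simple regular-expression tokenizer, the smoothed IDF formula, and the sample texts are all hypothetical, and the BERT embedding and KNN classification stages are not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_description, resumes):
    """Return (resume_index, similarity_score) pairs, highest score first."""
    # Fit TF-IDF over the job description and the resumes together so they share one vocabulary.
    vectors = tfidf_vectors([job_description] + list(resumes))
    job_vec, resume_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine_similarity(job_vec, vec)) for i, vec in enumerate(resume_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, for illustration only.
job = "Python developer with NLP and machine learning experience"
resumes = [
    "Accountant experienced in auditing and tax reporting",
    "Machine learning engineer, Python, NLP, text classification",
    "Frontend developer skilled in JavaScript and CSS",
]
ranking = rank_resumes(job, resumes)
for index, score in ranking:
    print(f"resume {index}: similarity {score:.3f}")
```

Because every weight is non-negative, each score falls between 0 and 1, and a resume that shares more distinctive terms with the job description ranks higher. A purely lexical score like this cannot recognize synonyms or paraphrases, which is the limitation that contextual embeddings are meant to address.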