WhatsApp Romanized Sinhala (Singlish) Group Chat Summarization Using NLP Techniques
Patabandi K. P. D. P *
Department of Statistics and Computer Science, University of Kelaniya, Sri Lanka.
Rathsara K. M. A. C. D
Department of Statistics and Computer Science, University of Kelaniya, Sri Lanka.
Nirmani H. M. C
Faculty of Computing, Sabaragamuwa University of Sri Lanka, Sri Lanka.
*Author to whom correspondence should be addressed.
Abstract
With the growing popularity of WhatsApp group chats, especially in Sri Lanka, users increasingly face information overload, which leads them to miss important messages or leave them unread. While solutions exist for summarizing messages typed in English, there has been no significant attempt to summarize Singlish, an informal typing style in which Sinhala words are written using the English alphabet. This research addresses this gap by developing a Natural Language Processing (NLP)-based system that automatically summarizes Singlish-typed WhatsApp group chats covering a 24-hour period. Using chat data exported without media attachments, a customized pre-processing pipeline was developed to clean, tokenize, and extract keywords from the chats. Two popular summarization models, facebook/bart-base and sshleifer/distilbart-cnn-12-6, were employed to generate concise summaries, which were then distributed to users via email. The system was evaluated using information retrieval metrics and human assessments of summary relevance and quality. The study highlights the challenges of processing Singlish, which arise from its informal spelling variations and its lack of language resources, and lays a foundation for future improvements in chat summarization for low-resource languages. The developed solution has the potential to improve user productivity and contributes to the broader field of localized NLP research.
Keywords: Extractive summarization, abstractive summarization, chat summarization, chat data preprocessing
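The abstract names the pre-processing stages (cleaning, tokenizing, and keyword extraction over a 24-hour window of exported chat) without detailing them. The Python sketch below illustrates how such stages might fit together using only the standard library. It is not the authors' pipeline: the export line format `DD/MM/YYYY, HH:MM - Sender: message`, the `<Media omitted>` placeholder, the tiny Singlish stop-word list, and the frequency-based keyword ranking are all assumptions, and every function name (`parse_chat`, `last_24_hours`, `tokenize`, `top_keywords`) is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# ASSUMPTION: export line format; real WhatsApp exports vary by OS and locale.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): (.*)$")
# ASSUMPTION: placeholder written in place of media when exporting without media.
MEDIA_PLACEHOLDER = "<Media omitted>"
# HYPOTHETICAL stop-word list (Singlish particles/pronouns plus English filler).
STOPWORDS = {"eka", "ne", "da", "mama", "oya", "api", "the", "is"}


def parse_chat(lines):
    """Yield (timestamp, sender, text) for lines matching the assumed format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            date_s, time_s, sender, text = m.groups()
            ts = datetime.strptime(f"{date_s} {time_s}", "%d/%m/%Y %H:%M")
            yield ts, sender.strip(), text.strip()


def last_24_hours(messages, now):
    """Keep messages from the 24 hours up to `now`, dropping media placeholders."""
    cutoff = now - timedelta(hours=24)
    return [(ts, s, t) for ts, s, t in messages
            if cutoff <= ts <= now and t != MEDIA_PLACEHOLDER]


def tokenize(text):
    """Lower-case, split on non-alphanumerics, drop stop words and 1-char tokens."""
    return [tok for tok in re.findall(r"[a-z0-9]+", text.lower())
            if tok not in STOPWORDS and len(tok) > 1]


def top_keywords(messages, k=5):
    """Rank tokens by raw frequency (a simple stand-in for keyword extraction)."""
    counts = Counter(tok for _, _, t in messages for tok in tokenize(t))
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    chat = [
        "01/03/2024, 08:00 - Nimal: heta meeting eka 10ta",
        "01/03/2024, 09:30 - Kamala: <Media omitted>",
        "02/03/2024, 07:45 - Sunil: meeting eka cancel da?",
        "28/02/2024, 10:00 - Nimal: old message",
    ]
    recent = last_24_hours(parse_chat(chat), now=datetime(2024, 3, 2, 8, 0))
    print(top_keywords(recent, k=3))
```

Raw frequency counting is chosen here only for brevity. Because Singlish spellings vary (the same word may be typed several ways), a practical pipeline would likely need spelling normalization before counting, and the filtered messages would then be passed to a summarization model such as those named in the abstract.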