Pengembangan Sistem NER (Named Entity Recognition) Untuk Identifikasi Gejala dan Diagnosis Psikologis pada Catatan Klinis Indonesia: Adaptasi Dataset dan Nusabert
Development of a Named Entity Recognition (NER) System for Identification of Symptoms and Psychological Diagnosis in Indonesian Clinical Notes: Adaptation of Dataset and NusaBERT
Date
2025
Author
Tanjung, Muhammad Fadhlan
Advisor(s)
Amalia
Br Ginting, Dewi Sartika
Abstract
The rapid growth of unstructured text data has opened up significant opportunities to use Natural Language Processing (NLP) for analyzing information related to individuals' psychological conditions. However, developing a Named Entity Recognition (NER) system for Indonesian-language clinical psychology still faces several challenges, particularly the scarcity of labeled data and the linguistic differences between English, the language of the source data, and Indonesian, the model's target language. This research aims to build an NER model that recognizes psychological entities such as emotions, symptoms, stressors, and behaviors in Indonesian text by applying transfer learning and fine-tuning the NusaBERT model. The dataset was drawn from two primary sources: 42 counseling interview transcripts, and online conversation datasets from Discord and Kaggle that were translated into Indonesian, for a total of more than 23,000 rows. All data were annotated with the BIO scheme according to the psychological entity categories. After pre-processing and post-translation normalization, the NusaBERT model was fine-tuned for the NER task and evaluated using Precision, Recall, and F1-Score. The results indicate that transfer learning is effective for adapting Indonesian language models to the clinical psychology domain. The developed model detects psychological entities more accurately than common NER-based approaches. The system has the potential to serve as an initial analytical tool for psychologists in understanding individuals' emotional states and behaviors from text; it is not intended to replace clinical professionals.
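To make the BIO annotation and the Precision, Recall, and F1-Score evaluation concrete, the following is a minimal sketch in Python. It is not the thesis's code. The label names (`EMOTION`, `STRESSOR`), the example sentence, and the choice of exact-match, entity-level scoring are all assumptions made for illustration; the abstract does not state whether scores were computed per token or per entity.

```python
from typing import List, Set, Tuple

# (entity label, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence's BIO tags.

    An "I-" tag that does not continue a span of the same label is ignored.
    """
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Exact-match, entity-level scores over a list of tagged sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)   # predicted spans that match gold exactly
        fp += len(p - g)   # predicted spans with no gold match
        fn += len(g - p)   # gold spans the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example: "Saya merasa cemas karena tekanan kerja"
# ("I feel anxious because of work pressure").
tokens = ["Saya", "merasa", "cemas", "karena", "tekanan", "kerja"]
gold = [["O", "O", "B-EMOTION", "O", "B-STRESSOR", "I-STRESSOR"]]
pred = [["O", "O", "B-EMOTION", "O", "B-STRESSOR", "O"]]

print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In this example the predicted stressor span stops one token early, so it counts as both a false positive and a missed gold entity, which is why all three scores are 0.5. Token-level scoring would be more lenient on the same prediction.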
Collections
- Undergraduate Theses [1273]
