dc.description.abstract | In the digital era, online news is an abundant source of information, but it is often unstructured, particularly where it reports on the spread of endemic diseases. This study implements automatic data extraction from news articles about endemic diseases to generate map annotations using a Natural Language Generation (NLG) approach. The developed system uses web scraping to gather online news articles, which are then summarized using the pre-trained BART model and a frequency-based method. Text preprocessing consists of case folding, tokenization, and stopword removal. The geographic locations and disease types mentioned in the extracted data are identified and annotated onto maps for visualization in a Geographic Information System (GIS). The summaries were evaluated with BLEU, ROUGE, and BERTScore. The average scores were 0.88 for BERTScore F1, 0.78 for BLEU, 0.84 for ROUGE-1, 0.80 for ROUGE-2, and 0.84 for ROUGE-L, indicating high consistency between the summaries and the original texts. In addition, an evaluation involving three groups of respondents (general public, medical professionals, and linguists) found that 80% considered the summaries easy to understand, 75% considered them clear, and 70% considered them easy to read. These findings indicate that the NLG approach generates informative news summaries and accurate map annotations that support monitoring and managing the spread of endemic diseases in Indonesia. | en_US |