
dc.contributor.advisor: Harumy, Henny Febriana
dc.contributor.advisor: Hardi, Sri Melvani
dc.contributor.author: Saragih, Andhika Mandalanta
dc.date.accessioned: 2025-07-22T08:37:49Z
dc.date.available: 2025-07-22T08:37:49Z
dc.date.issued: 2025
dc.identifier.uri: https://repositori.usu.ac.id/handle/123456789/106211
dc.description.abstract: Generating relevant and engaging captions for images is a common challenge across digital platforms such as social media and e-commerce, which makes systems that produce image captions automatically increasingly important. However, vision-language models such as BLIP still struggle to generate descriptions that are accurate and contextually appropriate, especially when dealing with ambiguous or complex visual elements. This study aims to improve the captions generated automatically by the BLIP model through stylistic refinement with GPT-3.5. The initial captions generated by BLIP are adapted into four language styles (Formal, Informal, Social Media, and E-Commerce) using a prompt-based approach. The results are evaluated quantitatively using BERTScore and subjectively through a user survey. The quantitative results show that the refined captions have a high degree of semantic similarity to the reference captions, with the highest F1 score in the Formal style (0.888) and the lowest in the Social Media style (0.805). Subjective evaluation by 83 users yielded an average score of 4.19 on a 5-point Likert scale, indicating positive user perceptions of readability, relevance, and ease of use. Nevertheless, the system still has limitations, such as visual misinterpretations by BLIP; for instance, it can mistake patterns on clothing for other objects. This study demonstrates that integrating BLIP and GPT-3.5 can produce image captions that adapt to various communication styles, and that the approach holds strong potential for further development in multimodal applications. [en_US]
dc.language.iso: id [en_US]
dc.publisher: Universitas Sumatera Utara [en_US]
dc.subject: Image Captioning [en_US]
dc.subject: BLIP [en_US]
dc.subject: GPT-3.5 [en_US]
dc.subject: Caption Refinement [en_US]
dc.subject: BERTScore [en_US]
dc.subject: User Evaluation [en_US]
dc.title: Optimasi Caption Otomatis: Studi Refinement Caption dari Model Vision-Language Menggunakan GPT-3.5 [en_US]
dc.title.alternative: Automatic Caption Optimization: A Caption Refinement Study of Vision-Language Models Using GPT-3.5 [en_US]
dc.type: Thesis [en_US]
dc.identifier.nim: NIM211401076
dc.identifier.nidn: NIDN0119028802
dc.identifier.nidn: NIDN0101058801
dc.identifier.kodeprodi: KODEPRODI55201#Ilmu Komputer
dc.description.pages: 90 Pages [en_US]
dc.description.type: Undergraduate Thesis (Skripsi Sarjana) [en_US]
dc.subject.sdgs: SDGs 9. Industry, Innovation and Infrastructure [en_US]

