Analisis Kinerja Word Embedding Glove dalam Penerjemahan Bahasa Batak-Inggris
Performance Analysis of GloVe Word Embedding for Batak Language - English Translation

Date
2024
Author
Syahputra, Andika
Advisor(s)
Nababan, Erna Budhiarti
Mawengkang, Herman
Abstract
This study examines the performance of GloVe word embeddings in a Long
Short-Term Memory (LSTM) model for Batak-to-English translation. GloVe
captures the semantic meaning of words from a wide context. The study includes
training the GloVe model with various parameters, as well as collecting and
processing a unique parallel Batak-English dataset. LSTM is a variant of the
Recurrent Neural Network (RNN) that can maintain long-term dependencies and
handle sequential data. In machine translation, LSTM performance is influenced
by the quality of the word embedding used, which produces vector
representations of words and captures their semantic and contextual
relationships. The author analyzes the performance of GloVe word embeddings in
the LSTM model and compares it with Word2Vec. The dataset consists of 28,420
Batak-English sentence pairs collected from various sources.
The LSTM model, built from encoder and decoder components, is trained for
several epochs, and the results are evaluated using the Bilingual Evaluation
Understudy (BLEU) score. This metric compares the n-grams of the predicted
translations with those of the reference translations and yields a translation
accuracy score. The results show that GloVe word embeddings perform better than
Word2Vec: GloVe achieves an average BLEU score of 0.9415, while Word2Vec
achieves an average BLEU score of 0.9346. The author attributes GloVe's better
performance to its ability to learn language from a larger dataset and to
understand words in a wider context.
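
As an illustration of the n-gram comparison behind the BLEU score mentioned
in the abstract, the following is a minimal sketch of a simplified
sentence-level BLEU in plain Python. It assumes a single reference
translation, uniform weights over 1-grams to 4-grams, and no smoothing. The
record does not say which BLEU implementation or settings the thesis used, so
the function names and the example sentences below are hypothetical and are
not taken from the thesis or its dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference translation.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU (uniform weights, no smoothing)."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(reference, candidate, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision gives a zero score.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


# Made-up example sentences, for illustration only.
reference = "I am going to the market".split()
exact = "I am going to the market".split()
shorter = "I am going to market".split()

exact_score = sentence_bleu(reference, exact)
shorter_score = sentence_bleu(reference, shorter)
```

In this sketch an exact match scores 1.0, while the shorter candidate scores
lower because some of its higher-order n-grams are missing from the reference
and the brevity penalty applies. Reported results such as an average BLEU
score are typically obtained by applying a metric of this kind across a test
set.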
Collections
- Master Theses [620]