Peningkatan Akurasi Long Short-Term Memori (LSTM) Menggunakan Word2Vec dan Fastext untuk Machine Translation Bahasa Batak-Inggris
Improving The Accuracy of Long Short-Term Memory (LSTM) Using Word2Vec and Fastext for Batak-English Machine Translation

Date
2024
Author
Nasution, Nur Amalia
Advisor(s)
Nababan, Erna Budhiarti
Mawengkang, Herman
Abstract
This research examines the performance of the Long Short-Term Memory (LSTM) algorithm combined with two word embedding techniques, FastText and Word2Vec, for translating text between Batak and English. LSTM, a type of recurrent neural network (RNN), is used for its ability to process sequential data and retain long-term dependencies. However, its effectiveness in translation depends heavily on the quality of the word embeddings, which represent words as low-dimensional vectors that capture semantic and contextual relationships. This study compares LSTM performance with FastText embeddings against its performance with Word2Vec embeddings. A dataset of 28,420 Batak-English sentence pairs was collected from various sources, including the Lets Read Asia website and the "Kamus Batak Toba - Indonesia" dictionary. The sentences were embedded using both FastText and Word2Vec, and the resulting vectors were fed into the LSTM model.
The LSTM model, incorporating encoder and decoder components, was trained over multiple epochs, and its performance was evaluated using the BLEU (Bilingual Evaluation Understudy) score. This metric compares n-grams of the predicted translations with reference translations, providing a measure of translation accuracy. The results indicate that the LSTM model with FastText embeddings consistently outperformed the model with Word2Vec embeddings. The FastText-based model achieved an average BLEU score of 0.9516, compared to 0.9389 for the Word2Vec-based model. This superior performance is attributed to FastText's ability to handle out-of-vocabulary words by leveraging subword information, thus providing more accurate and contextually relevant translations.
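The two mechanisms named in the abstract, subword information and n-gram based BLEU scoring, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the 3-to-6 character n-gram range (the default in the fastText library), the whitespace tokenization, the use of a single reference, the lack of smoothing, and the English example words are all assumptions made here for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers.

    FastText-style models build a word vector from the vectors of such
    n-grams, so an unseen word still shares pieces with known words.
    The 3..6 range is an assumption (the fastText library default).
    """
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    }


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference.

    Both arguments are token lists. Returns 0.0 when any n-gram order
    has no match, which is the behaviour of BLEU without smoothing.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Illustrative inputs only; these are not taken from the thesis data.
shared = char_ngrams("translate") & char_ngrams("translated")
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = sentence_bleu(candidate, reference)
print(len(shared), round(score, 4))
```

A word missing from the training vocabulary still overlaps in character n-grams with related known words, which is the property the abstract credits for the FastText result. The BLEU function returns a value between 0 and 1, the same scale as the averages reported above, although scores from this sketch are not comparable with the thesis figures.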
Collections
- Master Theses [620]