Ekstraksi Data Alamat Indonesia dengan Named Entity Recognition Menggunakan Teknik BiLSTM + CRF

Hakim, Arif Rahman

dc.contributor.advisor	Gunawan, Dani
dc.contributor.advisor	Seniman
dc.contributor.author	Hakim, Arif Rahman
dc.date.accessioned	2022-02-15T01:30:21Z
dc.date.available	2022-02-15T01:30:21Z
dc.date.issued	2021
dc.identifier.uri	https://repositori.usu.ac.id/handle/123456789/47670
dc.description.abstract	Address formats in Indonesia vary widely because of the country's long history, its many ethnic groups and tribes, and the vastness of its territory. Before address data can be processed and stored in a data warehouse, the information it contains must be extracted. Address data may include numbers, street names, urban villages (kelurahan), sub-districts (kecamatan), regencies (kabupaten), provinces, and postal codes. Some countries use templates to standardize how addresses are written, but Indonesia has no such standard. Indonesian address extraction is therefore a distinct problem, with a different level of difficulty from that in other countries, because the way an address is written can differ from region to region. This study aims to extract information from Indonesian address data so that the results can be used for further purposes. The extraction is carried out using Named Entity Recognition (NER) with BiLSTM and CRF techniques. It takes into account patterns and relationships between words, and is influenced by the words that precede and follow each word. The evaluation shows that the NER method performs well at extracting information, with an F1-score of 0.9086.	en_US
dc.description.abstract	Beragamnya format pengalamatan di Indonesia dikarenakan sejarah panjang yang telah dilalui, etnik, suku serta luas wilayah Indonesia. Untuk alamat, sebelum diolah dan disimpan pada data warehouse diperlukan suatu proses untuk mengekstrak informasi yang terkandung. Data alamat dapat berupa nomor, nama jalan, kelurahan, kecamatan, kabupaten provinsi hingga kodepos. Beberapa negara memiliki templating untuk menyetarakan penulisan alamat, namun Indonesia belum memiliki standarisasi. Sehingga topik ekstraksi data alamat Indonesia menjadi unik dan memiliki tingkat kesulitan berbeda dibandingkan negara lain. Dimana bisa saja format penulisan alamat dapat berbeda tergantung daerahnya. Penelitian ini bertujuan untuk dapat mengekstraksi informasi dari data alamat Indonesia, sehingga hasil ekstraksi dapat digunakan untuk tujuan lain lebih lanjut. Pada penelitian ini, ekstraksi informasi pada data alamat Indonesia dilakukan menggunakan Named Entity Recognition (NER) dengan teknik biLSTM dan CRF. Ekstraksi mempertimbangkan pola, hubungan antar kata serta dipengaruhi kata depan dan kata di belakangnya. Hasil evaluasi penelitian menunjukkan bahwa metode NER bekerja dengan baik dalam mengekstraksi informasi dengan nilai F1-Score sebesar 0.9086.	en_US
dc.language.iso	id	en_US
dc.publisher	Universitas Sumatera Utara	en_US
dc.subject	information extraction	en_US
dc.subject	Indonesian address data	en_US
dc.subject	named entity recognition	en_US
dc.subject	bi-directional long short-term memory	en_US
dc.subject	conditional random field	en_US
dc.subject	ekstraksi informasi	en_US
dc.subject	data alamat Indonesia	en_US
dc.subject	conditional random field	en_US
dc.title	Ekstraksi Data Alamat Indonesia dengan Named Entity Recognition Menggunakan Teknik BiLSTM + CRF	en_US
dc.type	Thesis	en_US
dc.identifier.nim	NIM151402105
dc.description.pages	51 pages	en_US
dc.description.type	Undergraduate Thesis	en_US

Files in this item

Name: 151402105.pdf
Size: 949.5Kb
Format: PDF
Description: Fulltext

This item appears in the following Collection(s)

Undergraduate Theses [876]
Skripsi Sarjana
