UPT. PERPUSTAKAAN
Institut Teknologi Sepuluh Nopember Surabaya










Posted by tondoindra@gmail.com at 29/03/2016 13:59:28  •  1691 Views


INDONESIAN DOCUMENT SEARCHING BASED ON TOPIC MODEL USING LATENT DIRICHLET ALLOCATION AND METADATA

PENCARIAN DOKUMEN BERBAHASA INDONESIA BERDASARKAN MODEL TOPIK MENGGUNAKAN LATENT DIRICHLET ALLOCATION DAN METADATA

Author :
LUKMANA, INDRA ( 5112201061 )




ABSTRAK

Data in text form is growing very rapidly from sources such as social media, blogs, and news, so methods for searching large amounts of textual data are needed. Various solutions have been developed for this kind of search. The shortcomings of these techniques are that they are still weak at handling synonymy and polysemy and do not yet account for the implicit context in the text. This thesis develops a search method that applies semantic analysis based on the topics contained in a text using Latent Dirichlet Allocation (LDA); research was also carried out to improve this search method through metadata analysis. By integrating topic-based search with metadata, search based on importance weights becomes possible. The proposed technique extracts topics using Gibbs sampling based on LDA; these topics become the representation of the text for searching. The search is then combined with the results of the metadata-based search. The combination is done by adding the LDA-based and metadata-based search results according to an importance weight. The proposed technique is tested using precision, recall, and F-measure. In tests of topic-based search using Latent Dirichlet Allocation and metadata, the proposed technique with a 20-topic representation obtained a highest harmonic F-measure of 0.81 and a lowest of 0.71.


ABSTRACT

Textual data from sources such as social media, blogs, and news is growing very rapidly, which creates a need for search methods that work on large amounts of text. Various solutions have been developed for document searching. In general, these techniques represent documents as bags of words and project them into a vector space, where the search is done by comparing text vectors to find the most similar documents. Their disadvantages are that they remain weak at handling synonymy and polysemy and do not take the implicit context of the text into account. This research develops a search method that applies semantic analysis based on the topics contained in a text using Latent Dirichlet Allocation (LDA), and also investigates improving the method by analyzing metadata. Integrating topic-based search with metadata makes it possible to search according to importance weights. The proposed technique extracts topics using Gibbs sampling based on LDA; these topics become the representation of the text for searching. The topic-based search results are then combined with the metadata-based search results by adding the two according to an importance weight. The proposed technique is tested using precision, recall, and F-measure (the harmonic mean of precision and recall). For comparison, search based on LDA alone obtained highest and lowest F-measure values of 0.78 and 0.71, respectively, and search based on metadata alone obtained highest and lowest values of 0.70 and 0.49, respectively. With a 20-topic representation, the proposed technique obtained highest and lowest F-measure values of 0.81 and 0.71, respectively.
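
The weighted combination and the evaluation measure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the cosine similarity over topic proportions, the term-overlap metadata score, the convex weight `weight`, and every function name and toy value are hypothetical. The topic vectors stand in for the per-document topic proportions that an LDA model fitted with Gibbs sampling would produce.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def combined_score(topic_sim, metadata_sim, weight=0.5):
    """Weighted sum of the topic-based and metadata-based scores (assumed form)."""
    return weight * topic_sim + (1.0 - weight) * metadata_sim


def rank(query_topics, query_terms, docs, weight=0.5):
    """Return document ids ordered by combined score, best first."""
    scored = []
    for doc_id, (topics, meta_terms) in docs.items():
        topic_sim = cosine(query_topics, topics)
        # Assumed metadata score: fraction of query terms found in the metadata.
        overlap = len(query_terms & meta_terms)
        metadata_sim = overlap / len(query_terms) if query_terms else 0.0
        scored.append((combined_score(topic_sim, metadata_sim, weight), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall for sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy collection: doc id -> (topic proportions, metadata terms). All values invented.
DOCS = {
    "d1": ([0.8, 0.1, 0.1], {"lda", "search"}),
    "d2": ([0.1, 0.8, 0.1], {"network"}),
    "d3": ([0.7, 0.2, 0.1], {"topic", "search"}),
}
QUERY_TOPICS = [0.9, 0.05, 0.05]
QUERY_TERMS = {"search", "topic"}

if __name__ == "__main__":
    print(rank(QUERY_TOPICS, QUERY_TERMS, DOCS, weight=0.5))
    print(rank(QUERY_TOPICS, QUERY_TERMS, DOCS, weight=1.0))
    print(f_measure({"d3", "d1"}, {"d3"}))
```

With `weight=1.0` only the topic similarity counts, and with `weight=0.0` only the metadata match counts. In this toy data, lowering the weight lets the better metadata match on `d3` outrank the slightly closer topic vector of `d1`, which is the kind of trade-off an importance weight is meant to control.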



Keywords: Topic Model, Latent Dirichlet Allocation, Metadata, Document Search, Indonesian Language.

Subject: Computer algorithms
Contributor
  1. Dr. Agus Zainal Arifin, S.Kom., M.Kom.
  2. Diana Purwitasari, S.Kom., M.Sc.
Date Created: 29/03/2016
Type: Text
Format: PDF
Language: Indonesian
Identifier: ITS-paper-51121150008726
Collection ID: 51121150008726
Call Number: RTif 005.74 Luk p


Source
Paper And Presentation Of Informatics Engineering RTif 005.74 Luk p, 2016

Coverage
ITS Community

Rights
Copyright © 2016 by ITS Library. This publication is protected by copyright and permission should be obtained from the ITS Library prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permission(s), write to ITS Library.












