DEVELOPMENT OF THAI LANGUAGE PROFANITY INVESTIGATION MODEL FOR ONLINE MEDIA USING DATA MINING TECHNIQUE

  • ณัฐาศิริ เชาว์ประสิทธิ์
  • สมชาย เล็กเจริญ
Keywords: Profanity, web board, dictionary, K-Nearest Neighbors (K-NN), Naïve Bayes

Abstract

This research compares data mining techniques for Thai language profanity investigation in online media. The models investigate Thai language profanity using a profanity dictionary improved with the Term Frequency-Inverse Class Frequency (TFICF) technique. In the comparison, the decision tree technique achieved an accuracy of 0.96 and a root mean square error (RMSE) of 0.19, followed by the Naive Bayes technique with an accuracy of 0.96 and an RMSE of 0.21. The K-Nearest Neighbor technique had the lowest accuracy, 0.95, with an RMSE of 0.22. Although the decision tree and Naive Bayes techniques gave similar accuracy, the decision tree technique had the lowest RMSE and produced patterns that were easier to analyze and understand than those of the other techniques.
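As an illustration of the Term Frequency-Inverse Class Frequency (TFICF) weighting named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes one common formulation, weight(t, c) = tf(t, c) × log(|C| / cf(t)), where cf(t) is the number of classes in which term t occurs. The function name `tficf`, the class labels, and the English placeholder tokens are hypothetical; the abstract does not specify the exact formula, tokenizer, or data used.

```python
import math
from collections import Counter


def tficf(docs_by_class):
    """Return {class: {term: weight}} using tf(t, c) * log(|C| / cf(t)).

    docs_by_class maps a class label to a list of tokenized documents.
    This formula is an assumed variant of TFICF, not taken from the paper.
    """
    # Term frequency per class: occurrences of each term across the class's documents.
    tf = {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in docs_by_class.items()
    }
    n_classes = len(tf)
    # Class frequency: number of classes in which each term appears at least once.
    cf = Counter(term for counts in tf.values() for term in counts)
    return {
        label: {term: n * math.log(n_classes / cf[term]) for term, n in counts.items()}
        for label, counts in tf.items()
    }


# Hypothetical toy data with English placeholder tokens.
corpus = {
    "profane": [["damn", "you"], ["damn", "it"]],
    "clean": [["thank", "you"], ["love", "it"]],
}
weights = tficf(corpus)
# Candidate dictionary entries: terms with a positive weight in the profane class.
candidates = sorted(t for t, w in weights["profane"].items() if w > 0)
```

Under this formulation, a term that occurs in every class receives a weight of zero, so only terms concentrated in the profane class remain as candidates for the dictionary.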

Published
2017-09-17
Section
Engineering and Technology Articles
