PATTERN DISCOVERY AND DOCUMENT CLUSTERING USING K-MEANS, PAM AND HAC

Abstract

Authors: Abdul Ameer Hussain, Prof. Prajna Bodapati

People increasingly turn to the internet to find important information, but most of that information is now stored as text, such as news articles, e-books, email messages, blogs and web pages. This makes it difficult to retrieve exactly the data people want. To make this easier, we apply text mining to discover patterns and to cluster similar information in the text. In this paper we mine the frequent terms from documents and plot them using the RStudio IDE. We use Euclidean distance and cosine similarity as measures. The aim of the paper is to cluster the documents with the k-means, PAM (Partitioning Around Medoids) and HAC (hierarchical agglomerative clustering) methods, using R as the statistical analysis tool, and to compare the three clustering algorithms by calculating precision, recall and F-measure values for the resulting clusters. For the comparison we use class-labelled documents from the 20 Newsgroups data set.
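The two measures and the evaluation scores named in the abstract can be illustrated with a small sketch. The paper itself works in R; the Python below is only an illustration and is not the authors' code. The toy documents, the vocabulary, the cluster assignments and every function name are assumptions made up for the example.

```python
import math
from collections import Counter

def term_vector(text, vocab):
    """Count how often each vocabulary term occurs in a document."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]

def euclidean(a, b):
    """Euclidean distance between two term-frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_scores(cluster_labels, class_labels, cluster, cls):
    """Precision, recall and F-measure of one cluster against one class."""
    pairs = list(zip(cluster_labels, class_labels))
    in_cluster = sum(1 for c, _ in pairs if c == cluster)
    in_class = sum(1 for _, k in pairs if k == cls)
    both = sum(1 for c, k in pairs if c == cluster and k == cls)
    precision = both / in_cluster if in_cluster else 0.0
    recall = both / in_class if in_class else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical toy corpus; not taken from the 20 Newsgroups data.
vocab = ["space", "orbit", "hockey", "goal"]
docs = ["space orbit space", "orbit space", "hockey goal hockey"]
vectors = [term_vector(d, vocab) for d in docs]

print(euclidean(vectors[0], vectors[1]))          # distance between similar docs
print(cosine_similarity(vectors[0], vectors[1]))  # similarity of similar docs
print(cosine_similarity(vectors[0], vectors[2]))  # docs with no shared terms

# Hypothetical clustering result for four labelled documents.
cluster_labels = [0, 0, 1, 1]
class_labels = ["sci.space", "sci.space", "sci.space", "rec.sport.hockey"]
print(cluster_scores(cluster_labels, class_labels, 0, "sci.space"))
```

Cosine similarity ignores document length and depends only on the mix of terms, whereas Euclidean distance grows with differences in raw counts, which is one reason both are worth trying on text. The scores compare a single cluster with a single class; an overall comparison of algorithms would have to aggregate them across clusters, and how the paper does that is not stated in the abstract.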