Abstract


PATTERN DISCOVERY AND DOCUMENT CLUSTERING USING K-MEANS, PAM AND HAC

Abdul Ameer Hussain, Prof. Prajna Bodapati


People increasingly turn to the internet to find important information, but most of that information is now stored as text: news articles, e-books, email messages, blogs and web pages. This makes it difficult to retrieve exactly the data people want. To make this easier, we apply a text mining process that finds patterns in the text and clusters similar information. In this paper we mine the frequent terms from documents and plot them using the RStudio IDE. We use Euclidean distance and cosine similarity as the document similarity measures. The aim of this paper is to cluster the documents using the k-means, PAM (Partitioning Around Medoids) and HAC (hierarchical agglomerative clustering) methods, with R as the statistical analysis tool, and to compare the three clustering algorithms by calculating precision, recall and F-measure values for the resulting clusters. For the comparison we use class-labelled documents from the 20 Newsgroups data set.
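The measures named in the abstract can be illustrated with a minimal sketch. It is written in Python with only the standard library, not in the R used in the paper, and it is not the authors' code. The function names, the whitespace tokenizer and the per-cluster definition of precision and recall (a cluster scored against one chosen class) are assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count terms after lowercasing and splitting on whitespace (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two term-frequency vectors."""
    terms = a.keys() | b.keys()
    # Counter returns 0 for a term that is missing from one document.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


def precision_recall_f(cluster_labels, class_labels, cluster, cls):
    """Score one cluster against one true class.

    precision = |cluster and class| / |cluster|
    recall    = |cluster and class| / |class|
    F-measure = harmonic mean of precision and recall
    """
    in_cluster = sum(1 for c in cluster_labels if c == cluster)
    in_class = sum(1 for k in class_labels if k == cls)
    both = sum(
        1 for c, k in zip(cluster_labels, class_labels)
        if c == cluster and k == cls
    )
    precision = both / in_cluster if in_cluster else 0.0
    recall = both / in_class if in_class else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy documents and labels, not taken from the 20 Newsgroups data.
doc_a = term_frequencies("space shuttle launch orbit")
doc_b = term_frequencies("shuttle orbit mission")
print(cosine_similarity(doc_a, doc_b), euclidean_distance(doc_a, doc_b))

clusters = [0, 0, 0, 1, 1]
classes = ["sci", "sci", "rec", "rec", "rec"]
print(precision_recall_f(clusters, classes, cluster=0, cls="sci"))
```

Cosine similarity ignores document length, while Euclidean distance grows with differences in raw term counts, which is one reason the two can lead to different clusterings of the same documents.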