10. Interval ckMeans: An Algorithm for Clustering Symbolic Data (Domain: Knowledge and Data Engineering)
ABSTRACT:
Clustering is the process of organizing a collection of patterns into groups based on their similarities. Fuzzy clustering techniques aim to find groups such that every object in the database belongs to each group with some membership degree. This paper presents a new algorithm for clustering symbolic data based on the ckMeans algorithm. The new algorithm allows both the input data and the membership degrees to be intervals. To validate the proposal, it is compared with two other algorithms on the same database.
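To illustrate what a membership degree is, the following sketch applies the standard fuzzy c-means membership rule, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), to objects whose attributes are intervals. It is not the Interval ckMeans algorithm from the paper: the distance between intervals (Euclidean over both endpoints), the fuzzifier m = 2, and the function names are assumptions, and the memberships it returns are single numbers rather than the interval-valued memberships the paper proposes.

```python
"""Fuzzy c-means membership degrees for interval-valued objects (illustrative sketch).

Not the paper's Interval ckMeans: the interval distance and fuzzifier are assumptions,
and memberships are point values, not intervals.
"""
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound of one attribute


def interval_distance(a: List[Interval], b: List[Interval]) -> float:
    """Assumed distance: Euclidean distance over the lower and upper bounds."""
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        total += (a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2
    return total ** 0.5


def memberships(x: List[Interval], centers: List[List[Interval]], m: float = 2.0) -> List[float]:
    """Membership of object x in each cluster, using the fuzzy c-means update with fuzzifier m."""
    dists = [interval_distance(x, c) for c in centers]
    # If x coincides with a center, give full membership to the first such center.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
```

For example, the one-attribute object [1, 2] with centers [0, 1] and [4, 5] gets memberships of about 0.9 and 0.1; the degrees always sum to 1, which is what distinguishes fuzzy from hard assignment.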
EXISTING SYSTEM:
- Although dynamic clustering methods applied to large databases such as web page collections yield better clusters, they require additional computation, which increases time complexity.
- When dynamic document clustering is adopted for real-world applications, it does not always yield the desired output. In addition, a dynamic algorithm behaves like a static algorithm during initial clustering.
PROPOSED SYSTEM:
Our objective is an approach to dynamic document clustering based on the structured MARDL technique. First, the documents are clustered statically using the bisecting K-means algorithm. Before bisecting K-means can be applied, all documents must be preprocessed. Preprocessing consists of stop-word removal and stemming. Stop-word removal discards words that negatively influence clustering, such as adverbs and conjunctions; stemming finds the root of each word by removing its prefixes and suffixes.
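A minimal sketch of the two preprocessing steps, assuming plain English text. The stop-word list and the affix lists are tiny hypothetical examples, and `stem` is a naive affix stripper, not a real stemmer such as Porter's algorithm.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (articles, prepositions, conjunctions, adverbs).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "or", "but", "very", "also"}

# Hypothetical affix lists for a naive stemmer (not the Porter algorithm).
PREFIXES = ("un", "re", "pre")
SUFFIXES = ("ing", "ed", "es", "s", "ly")


def stem(word: str) -> str:
    """Strip at most one known prefix and one known suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[: -len(s)]
            break
    return word


def preprocess(text: str) -> List[str]:
    """Lower-case and tokenize the text, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The clustering of documents is very useful")` returns `['cluster', 'document', 'useful']`, and `stem("unordered")` returns `'order'`. A production system would use a full stop-word list and an established stemmer instead.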
After preprocessing, the documents are grouped into the desired number of clusters using the bisecting K-means method. Each document is represented by term weights computed with term frequency-inverse document frequency (TF-IDF), and documents are compared using the cosine similarity measure. The weighted documents are first partitioned with K-means; the largest cluster is then split into two sub-clusters, and this step is repeated until the resulting clusters have high internal similarity.
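The weighting and splitting steps can be sketched as follows, assuming documents are already preprocessed into token lists. It uses the common tf × log(N / df) weighting and cosine similarity, seeds each two-way split deterministically, and stops at a fixed number of clusters k rather than at a similarity threshold. These choices and all function names (`tfidf`, `cosine`, `two_means`, `bisecting_kmeans`) are illustrative assumptions, not the MARDL system's implementation.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> TF-IDF weight


def tfidf(docs: List[List[str]]) -> List[Vector]:
    """Weight term t in document d by tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors."""
    total: Dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def two_means(vectors: List[Vector], iterations: int = 10) -> List[List[int]]:
    """K-means with k = 2 under cosine similarity; returns two lists of local indices."""
    # Assumed deterministic seeding: the first vector and the vector least similar to it.
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centers = [vectors[0], vectors[far]]
    groups: List[List[int]] = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for i, v in enumerate(vectors):
            best = 0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            groups[best].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller decides what to do
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(vectors: List[Vector], k: int) -> List[List[int]]:
    """Bisect the largest cluster until k clusters exist or the largest cannot be split."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()  # after sorting by size, the last cluster is the largest
        if len(largest) < 2:
            clusters.append(largest)
            break
        halves = two_means([vectors[i] for i in largest])
        if not halves[0] or not halves[1]:
            clusters.append(largest)  # this cluster cannot be split further
            break
        # Map local indices of each half back to document indices.
        clusters.extend([largest[i] for i in half] for half in halves)
    return clusters
```

For example, with the token lists `['cluster', 'data', 'mean']`, `['cluster', 'data', 'fuzzy']`, `['stem', 'word', 'stop']` and `['stem', 'word', 'suffix']`, `bisecting_kmeans(tfidf(docs), 2)` places documents 0 and 1 in one cluster and documents 2 and 3 in the other. Replacing the `len(clusters) < k` test with a check on average intra-cluster cosine similarity would bring the stopping rule closer to the "high similarity" criterion described above.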
The overall process is explained in the diagram below.
HARDWARE REQUIREMENTS
• SYSTEM : Pentium IV 2.4 GHz
• HARD DISK : 40 GB
• MONITOR : 15-inch VGA colour
• MOUSE : Logitech
• RAM : 256 MB
• KEYBOARD : 110-key enhanced
SOFTWARE REQUIREMENTS
• Operating system : Windows XP Professional
• Front End : Java
• Tool : NetBeans IDE
REFERENCE:
Rogerio R. de Vargas, Benjamin R. C. Bedregal, “Interval ckMeans: An Algorithm for Clustering Symbolic Data”, IEEE Ref.: 978-1-61284-968-3/11. IEEE Conference 2011.