Dataset Clustering Using K-Means Algorithm

Mayur Kulkarni, Karan Jamdade, Krushna Kashid, Akash Dhotre

Abstract: In this paper, we consider the clustering of very large datasets distributed over a network using a decentralized K-means algorithm. Analyzing such data and identifying clusters is challenging because of processing, storage, and transmission costs. Many algorithms have been proposed for distributed data clustering; they are applied when datasets cannot be concentrated on a single machine, for instance because of network bandwidth limitations or the sheer volume of the distributed data. Low-overhead analysis of huge distributed datasets is a must for current data centers and for future sensor networks. Our experimental evaluations show that dataset clustering using K-means discovers clusters more efficiently, with scalable transmission cost, and outperforms the popular LSP2P method.

Keywords: Distributed systems, clustering, dynamic system, partition-based clustering, density-based clustering.

International Journal of Computer Science and Information Technology Research

ISSN 2348-1196 (print), ISSN 2348-120X (online)

Research Publish Journals