Abstract: The main objective of this paper is to describe an approach to data clustering using the Mini Batch K-Means algorithm. The implementation described here optimizes K-Means by making a single pass over the input data, and it produces as many centroids as it determines to be optimal. Avoiding multiple passes over the input data can have a major impact on running time, because simply reading a large data set adds significant cost to large-scale computations. Mini Batch K-Means is implemented on the Hadoop framework using the Map-Reduce programming paradigm, and the cluster of machines is built from VMware virtual machines. Experimental results compare the existing system (K-Means) with the proposed system (Mini Batch K-Means) on datasets such as Reuters-21578 and the SC time series dataset. The Mini Batch K-Means clustering algorithm can improve accuracy to a good extent, as shown by its compact and well-separated clusters, and it can also reduce computation time compared with the existing algorithm. Performance can be improved further by adding more machines to the Hadoop cluster.
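To make the core update concrete, the following is a minimal single-machine sketch of the standard Mini Batch K-Means update. It is not the authors' Hadoop implementation. It assumes a fixed number of clusters `k`, whereas the abstract says the described system chooses the number of centroids itself. It also assumes uniform random sampling of each mini-batch and a per-centroid learning rate of `1/count`. The function names (`mini_batch_kmeans`, `nearest`, `squared_distance`) are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence

Point = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[Point], p: Sequence[float]) -> int:
    # Index of the centroid closest to p (ties go to the lowest index).
    return min(range(len(centroids)), key=lambda j: squared_distance(centroids[j], p))


def mini_batch_kmeans(data: List[Point], k: int, batch_size: int,
                      iterations: int, seed: int = 0) -> List[Point]:
    # Hypothetical sketch: fixed k, random initial centroids drawn from the data.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k  # number of points each centroid has absorbed so far

    for _ in range(iterations):
        # Draw a small random mini-batch instead of scanning the whole data set.
        batch = [data[rng.randrange(len(data))] for _ in range(batch_size)]

        # Assignment step (a "map"-like step): cache each point's nearest centroid
        # before any centroid moves in this iteration.
        assignments = [nearest(centroids, p) for p in batch]

        # Update step (a "reduce"-like step): move each centroid toward its points
        # with learning rate 1/count, so it tracks the running mean of its points.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]

    return centroids
```

Because every update is a convex combination of the current centroid and a data point, each centroid always stays inside the bounding box of the data. The cost of one iteration depends on `batch_size` rather than on the size of the full data set. In a Map-Reduce setting, one plausible (assumed) split is for mappers to emit `(nearest centroid id, point)` pairs for their share of a batch and for reducers to apply the per-centroid update.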
Keywords: Mini Batch K-Means; Hadoop; K-Means; Map-Reduce; Clustering.
Title: Mini-Batch K-Means Clustering Using Map-Reduce in Hadoop
Authors: Mr. Krishna Yadav, Mr. Jwalant Baria
International Journal of Computer Science and Information Technology Research
ISSN 2348-120X (online), ISSN 2348-1196 (print)
Research Publish Journals