Abstract: Apache Hadoop is a Java-based open source platform that makes it possible to process large data sets across thousands of distributed nodes. Hadoop's development resulted from the publication of two Google-authored white papers: the Google File System and Google MapReduce. Numerous vendor-specific distributions based on Apache Hadoop are available from companies such as Cloudera (CDH), Hortonworks (1.x/2.x), and MapR (M3/M5). In addition, appliance-based solutions are offered by vendors such as IBM and Oracle.
The Hadoop Distributed File System (HDFS) is the basic component of the Hadoop framework that manages data storage. It stores data on disk as fixed-size blocks (64 MB by default), and the block size should be chosen according to the input data size; for large file sets, a block size of 128 MB is a good choice. A minimal sketch of setting the block size when writing a file is shown below.
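A minimal sketch, using the standard Hadoop FileSystem API, of writing a file with an explicit 128 MB block size rather than the 64 MB default; the NameNode address, file path, and replication factor below are illustrative assumptions, not values from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address for illustration.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/large-input.txt"); // hypothetical path

        // create(path, overwrite, bufferSize, replication, blockSize):
        // a larger block size suits large files by reducing the number
        // of blocks the NameNode must track.
        long blockSize = 128L * 1024 * 1024; // 128 MB
        FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, blockSize);
        out.writeUTF("sample record");
        out.close();
        fs.close();
    }
}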
Keywords: HDFS, Hive, Pig, HBase, MapReduce.
Title: Hadoop Big Data for Processing Data and Performing Workload
Authors: Girish T B, Shadik Mohammed Ghouse, Dr. B. R. Prasad Babu
International Journal of Computer Science and Information Technology Research
ISSN 2348-1196 (print), ISSN 2348-120X (online)