IMPROVING PERFORMANCE IN HADOOP CLUSTER OVER CLOUD COMPUTING ENVIRONMENT USING HOLD & RELEASE MECHANISM
With the increasing use of cloud applications, the volume of data these applications generate is also growing. This has become an important challenge for the cloud, known as the big data problem, so technology is needed to manage big data on the cloud efficiently. Apache Hadoop is an efficient solution for handling big data: it is an open-source technology that stores data in a distributed manner across a cluster of heterogeneous systems, providing reliable storage and processing for big data on the cloud. In this paper, to improve the performance of MapReduce in heterogeneous or shared environments, we propose a data holding and releasing mechanism that holds the map output at the block level and performs shuffling at the map end. This reduces the number of shuffle bytes, which improves overall MapReduce performance on the cloud.
cloud, big data, Hadoop, heterogeneous cluster, performance, load balancing, geo-distributed, hold & release.
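The abstract states that holding map output at the block level and shuffling at the map end reduces shuffle bytes, but it does not say how the held output is processed before release. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes a word-count job and assumes that holding a block's map output lets values with the same key be merged at the map end before release, much like a Hadoop combiner. The helpers `map_block`, `record_size`, and `hold_and_release` and the sample block are hypothetical, and `record_size` is a rough byte estimate rather than Hadoop's real serialization.

```python
from collections import Counter


def map_block(block):
    """Hypothetical map function: emit (word, 1) for every word in one input block."""
    for line in block:
        for word in line.split():
            yield word, 1


def record_size(key, value):
    """Rough byte estimate of one (key, value) record (assumption, not Hadoop's format)."""
    return len(key.encode("utf-8")) + len(str(value))


def shuffle_bytes_without_hold(block):
    """Every map output record is shuffled as soon as it is produced."""
    return sum(record_size(k, v) for k, v in map_block(block))


def hold_and_release(block):
    """Hold all map output for the block, merge values per key, then release it."""
    held = Counter()
    for key, value in map_block(block):
        held[key] += value
    return dict(held)


def shuffle_bytes_with_hold(block):
    """Bytes shuffled when only the merged, held output for the block is released."""
    return sum(record_size(k, v) for k, v in hold_and_release(block).items())


block = ["big data on the cloud", "big data in the cloud", "the cloud"]
print(shuffle_bytes_without_hold(block), shuffle_bytes_with_hold(block))  # → 54 25
```

For this toy block, the estimated shuffle volume drops from 54 to 25 bytes because repeated keys such as `the` and `cloud` are sent once per block instead of once per occurrence. The actual saving under this assumption depends on how often keys repeat within a block.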