Improvement Of Data Throughput In Data-Intensive Cloud Computing Applications
In recent years, the Hadoop framework has become widely known for providing cost-effective solutions for processing large-scale, data-intensive applications in a distributed manner. One of the key issues that can significantly affect the performance of data-intensive cloud computing is computation load balancing among cluster nodes. Replica placement in HDFS plays a significant role in data availability and balanced utilization of clusters. Under the current replica placement policy of the Hadoop Distributed File System (HDFS), the replicas of data blocks cannot be evenly distributed across the cluster's nodes, so HDFS must rely on its load-balancing utility to rebalance replica distribution, which consumes additional time and resources. In this paper, we address the load balancing problem and present a novel replica placement policy for HDFS for data-intensive computing on massive datasets in cloud systems. It evenly balances the computing load among a cluster's nodes in both homogeneous and heterogeneous cluster environments. Experimental results confirm that the proposed replica placement scheme achieves better DataNode utilization than the default replica placement policy of Hadoop.
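To make the idea of load-aware replica placement concrete, below is a minimal Java sketch, not the paper's actual algorithm, in which each replica of a block is placed on the DataNode currently storing the fewest block replicas, so block counts stay balanced across the cluster. The `DataNode` class, its `blockCount` field, and the `chooseTargets` method are hypothetical names introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Minimal sketch of a load-aware replica placement strategy.
 * Hypothetical illustration, not the paper's exact policy:
 * each replica goes to the least-loaded remaining DataNode.
 */
public class LoadAwarePlacement {

    /** Simplified view of a DataNode's current storage load. */
    static class DataNode {
        final String id;
        int blockCount; // number of block replicas currently stored

        DataNode(String id, int blockCount) {
            this.id = id;
            this.blockCount = blockCount;
        }
    }

    /** Pick targets for one block by repeatedly taking the least-loaded node. */
    static List<DataNode> chooseTargets(List<DataNode> nodes, int replicationFactor) {
        List<DataNode> candidates = new ArrayList<>(nodes); // work on a copy
        List<DataNode> targets = new ArrayList<>();
        for (int i = 0; i < replicationFactor && !candidates.isEmpty(); i++) {
            DataNode least = candidates.stream()
                    .min(Comparator.comparingInt(n -> n.blockCount))
                    .orElseThrow();
            least.blockCount++;       // account for the newly placed replica
            candidates.remove(least); // at most one replica per node
            targets.add(least);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<DataNode> cluster = List.of(
                new DataNode("dn1", 120), new DataNode("dn2", 80),
                new DataNode("dn3", 95),  new DataNode("dn4", 80));
        // With replication factor 3, the three least-loaded nodes are chosen.
        chooseTargets(cluster, 3).forEach(n -> System.out.println(n.id));
    }
}
```

Because placement is balanced at write time, a greedy scheme like this avoids the later rebalancing pass that HDFS's balancer utility performs; a full policy would additionally honor HDFS's rack-awareness constraints, which this sketch omits.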
Keywords: Cloud Computing, Hadoop Distributed File System, Replica Placement Policy, Data Replication, Load Balancing