Improving the Performance of Hadoop MapReduce Applications via Optimization of Concurrent Containers per Node
Apache Hadoop is a distributed platform for storing, processing, and analyzing big data on commodity machines. Hadoop exposes tunable configuration parameters that significantly affect the performance of MapReduce applications, so tuning these parameters is an effective approach to performance improvement. Performance optimization is usually driven by memory utilization, disk I/O rate, CPU utilization, and network traffic. In this paper, the effect on MapReduce performance is measured and analyzed by varying the number of concurrent containers (cc) per machine in YARN-based pseudo-distributed mode. In this experiment, we also measure the performance impact of different Hadoop Distributed File System (HDFS) block sizes. From our experiment, we found that tuning cc per node improves performance compared to the default parameter setting; we also observed further performance improvement by optimizing cc together with the HDFS block size.
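The number of concurrent containers a node can run is bounded by the NodeManager's advertised resources divided by the per-container request. The parameter names in the comments below are the standard YARN/MapReduce settings; the node sizes are illustrative assumptions, not values from this paper:

```python
# Sketch of how YARN bounds concurrent containers per node.
# Node capacities come from yarn.nodemanager.resource.memory-mb and
# yarn.nodemanager.resource.cpu-vcores; the per-container request comes
# from settings such as mapreduce.map.memory.mb. Sizes here are assumed.

def concurrent_containers(node_mem_mb, node_vcores,
                          container_mem_mb, container_vcores=1):
    """Containers that fit on one NodeManager, limited by whichever
    resource (memory or vcores) runs out first."""
    return min(node_mem_mb // container_mem_mb,
               node_vcores // container_vcores)

# Example: a 16 GB / 8-vcore node with 2 GB map containers
# (mapreduce.map.memory.mb = 2048) can run 8 containers at once.
print(concurrent_containers(16384, 8, 2048))  # -> 8
```

Shrinking the per-container memory request raises cc until the vcore bound takes over, which is the trade-off the experiment varies.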
MapReduce, parameter tuning, concurrent containers, block size