LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters
Big data refers to large, complex datasets in many forms that require specialized computational platforms for analysis. Hadoop is one of the most popular frameworks for big data analytics. In Hadoop, a large job is split into many small tasks, which MapReduce distributes to worker nodes and runs in parallel to speed up computation. Throughput is therefore a key concern. MapReduce jobs also need quick responses from the worker nodes in order to complete before their deadlines. The existing Hadoop schedulers, such as the FIFO, Fair, and Capacity schedulers, cannot guarantee a response quick enough to meet a predefined deadline. Hadoop therefore needs better response and completion times for heterogeneous MapReduce jobs. In this paper, we propose an efficient preemptive deadline-constraint scheduler based on least slack time and data locality. To improve task allocation and load balancing, we first analyze the task scheduling behavior of the Hadoop platform. Based on this analysis, we propose a novel preemptive approach that takes the remaining execution time of the running job into account when deciding whether to preempt it. Experimental results show that the proposed scheme significantly reduces job execution time and queue waiting time compared to existing schemes.
Hadoop, MapReduce, distributed system, parallel computing, preemptive job scheduling, queuing theory
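
For illustration, least slack time scheduling ranks jobs by slack: the time left until the deadline minus the estimated remaining execution time. The sketch below is a minimal, hypothetical Python version of that idea and is not the LSTPD implementation. The `Job` fields, the function names, and the preemption rule are all assumptions made for this example, and data locality is left out.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline, in seconds (hypothetical field)
    remaining: float  # estimated remaining execution time, in seconds


def slack(job: Job, now: float) -> float:
    """Slack = time until the deadline minus the remaining execution time."""
    return job.deadline - now - job.remaining


def pick_least_slack(queue: list[Job], now: float) -> Job:
    """Return the waiting job with the smallest slack (the most urgent one)."""
    return min(queue, key=lambda j: slack(j, now))


def should_preempt(running: Job, candidate: Job, now: float) -> bool:
    """Assumed rule, for illustration only: preempt when the candidate is more
    urgent than the running job AND waiting for the running job to finish
    would consume more time than the candidate's slack allows."""
    more_urgent = slack(candidate, now) < slack(running, now)
    would_miss = running.remaining > slack(candidate, now)
    return more_urgent and would_miss


now = 0.0
running = Job("A", deadline=100.0, remaining=30.0)
queue = [Job("B", deadline=50.0, remaining=30.0),
         Job("C", deadline=200.0, remaining=40.0)]

candidate = pick_least_slack(queue, now)
print(candidate.name, should_preempt(running, candidate, now))
```

Under this assumed rule, a running job that is close to finishing is left alone even when a more urgent job is waiting, because the waiting job can still meet its deadline once the running job completes.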