A Survey on Parallel Join Algorithms Using MapReduce on Hadoop
In this paper, we present a survey of the improvements made to parallel join algorithms using the popular MapReduce framework on the Hadoop distributed platform. We briefly introduce MapReduce and Hadoop and outline the main steps to install, configure, and start using Hadoop. We then divide parallel join algorithms into categories and, for each category, discuss the main improvements in chronological order, from the earliest work to date. After that, we organize these works into an easy-to-read table and analyze them in terms of their advantages and disadvantages. By gathering these improvements chronologically in one place, this survey is intended to help researchers study prior work and to simplify the process of proposing new approaches to parallel join algorithms using MapReduce on Hadoop.
Parallel and distributed computing, Distributed file systems, Parallel join algorithms, Hadoop, Cloud computing