Statistical Analysis for Detection of Sensitive Data using Hadoop Clusters
The omnipresence of internet technology and the advent of smart devices accumulate large volumes of varied, fast-moving real-time data in a network from many sources, and also open a path for criminals and intruders to attack the network, leading to information theft, financial loss, cyber-attacks, and cyberwar. A major challenge for researchers is to identify sensitive data within large-scale real-time data so that the right action can be taken at the right time. It is therefore important to propose a framework that can handle massive data and detect sensitive data within big data. In this approach, the data collected from the network are stored in cloud drives and other storage devices, then transferred to the Hadoop Distributed File System and processed with MapReduce using distributed computing concepts, machine learning algorithms, and advanced statistical methods. Statistical analyses including descriptive analysis, regression, ANOVA, the parametric Levene's test, Pearson correlation, and the Kruskal-Wallis test show that recent big data analytical tools are more effective than traditional methods for the retrieval of sensitive data from big data.
Big Data, Distributed Computing, Correlation, Regression, Hadoop Distributed File System