Multi-Resolution Hierarchical Structure for Efficient Data Aggregation and Mining of Big Data
Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, and environment and climate monitoring. Traditional data mining and machine learning algorithms do not scale well with data size. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. We have developed a data aggregation structure that summarizes data with a large number of instances and data generated from multiple sources. Data are aggregated at multiple resolutions, and the choice of resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common input for multiple mining and learning algorithms, which are modified to accept the aggregated data. Hierarchical data aggregation thus serves as a paradigm under which novel data representations and algorithms work together for the analysis and mining of big data. To evaluate its performance, we implemented a multi-resolution Naive Bayes classifier on the data aggregation structure. Experimental results show that the proposed structure reduces the classifier's computation time to 25% on average and reduces memory usage while preserving the accuracy of the results.
Big data reduction, data aggregation, multi-resolution data mining.
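
The idea of aggregating once at several resolutions and then classifying from the aggregates can be illustrated with a minimal sketch. This is not the structure described in the abstract, whose details are not given here. It assumes equal-width binning of numeric features over a known range, with the number of bins doubling at each level, and a Naive Bayes classifier with Laplace smoothing that reads only the per-class bin counts. The names `MultiResolutionCounts`, `bin_index`, `add`, and `predict` are hypothetical.

```python
import math
from collections import defaultdict


class MultiResolutionCounts:
    """Per-class bin counts for each feature at several resolutions.

    Hypothetical design: every feature lies in [lo, hi]; level k uses
    2 ** (k + 1) equal-width bins, so higher levels are finer.
    """

    def __init__(self, lo, hi, levels):
        self.lo, self.hi, self.levels = lo, hi, levels
        # counts[level][(feature_index, bin_index, label)] -> instance count
        self.counts = [defaultdict(int) for _ in range(levels)]
        self.class_counts = defaultdict(int)

    def bin_index(self, value, level):
        n_bins = 2 ** (level + 1)
        frac = (value - self.lo) / (self.hi - self.lo)
        # Clamp so that values at or beyond the range edges fall in an end bin.
        return min(n_bins - 1, max(0, int(frac * n_bins)))

    def add(self, features, label):
        """Incremental update: one instance touches one bin per feature per level."""
        self.class_counts[label] += 1
        for level in range(self.levels):
            for j, value in enumerate(features):
                key = (j, self.bin_index(value, level), label)
                self.counts[level][key] += 1


def predict(agg, features, level):
    """Naive Bayes over binned features at the chosen resolution level.

    Uses only the aggregated counts, never the raw instances.
    Laplace (add-one) smoothing is an assumption of this sketch.
    """
    n_bins = 2 ** (level + 1)
    total = sum(agg.class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in agg.class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, value in enumerate(features):
            key = (j, agg.bin_index(value, level), label)
            count = agg.counts[level].get(key, 0)
            score += math.log((count + 1) / (n_c + n_bins))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Usage: build the aggregate once, then query at a coarse or a fine level.
agg = MultiResolutionCounts(lo=0.0, hi=1.0, levels=3)
for v in (0.10, 0.20, 0.15):
    agg.add([v], "a")
for v in (0.80, 0.90, 0.85):
    agg.add([v], "b")
print(predict(agg, [0.12], level=0), predict(agg, [0.88], level=2))
```

In this sketch a coarse level stores fewer distinct bins and is cheaper to query, while a fine level preserves more of the feature distribution, which mirrors the efficiency and accuracy trade-off described above in a simplified form.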