a Hadoop based Query System on Accumulative and
Column-oriented stores, known for their scalability and flexibility, are a common NoSQL database implementation and are increasingly used in big data management. In column-oriented stores, a "full-scan" query strategy is inefficient, and the search space can be reduced if the data is well partitioned or indexed; however, there is no pre-defined schema for building and maintaining partitions and indexes at low cost. We leverage an accumulative and high-dimensional data model, a sophisticated linearization algorithm, and an efficient query algorithm to address the challenge of applying a pre-defined and well-partitioned data model to flexible and time-varied key-value data. We adopt a high-dimensional array as the data model to partition the key-value data without additional storage or massive calculation; improve the flexibility of the Z-order linearization algorithm, which maps multidimensional data to one dimension while preserving the locality of the data points; and build an efficient expansion mechanism for the data model to support time-varied data. The result is Haery, a column-oriented store based on a distributed file system and computing framework. In experiments, we compare the query performance of Haery with that of Hive, HBase, Cassandra, MongoDB, PostgresXL and HyperDex. The results indicate that Haery is on average 4.57x, 4.23x, 3.55x, 1.79x, 1.82x and 120.6x faster, respectively.
Key-value data, Column-oriented store, Multi-dimensional data model, Linearization, Accumulation
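
To make the linearization idea in the abstract concrete, the following is a minimal sketch of the classic Z-order (Morton) mapping, which interleaves the bits of each coordinate so that nearby points in a multidimensional array tend to receive nearby one-dimensional indexes. It illustrates only the standard technique, not the improved variant used in Haery; the function names `z_order_index` and `z_order_coords` and the fixed `bits` parameter are assumptions made for this illustration.

```python
def z_order_index(coords, bits):
    """Map non-negative integer coordinates to a single Z-order (Morton) index.

    Illustrative helper, not part of Haery. `bits` is the number of bits
    kept per coordinate, so every coordinate must be < 2**bits.
    """
    ndim = len(coords)
    index = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            # Move bit `bit` of this coordinate to its interleaved position.
            index |= ((value >> bit) & 1) << (bit * ndim + dim)
    return index


def z_order_coords(index, ndim, bits):
    """Inverse mapping: recover the coordinates from a Z-order index."""
    coords = [0] * ndim
    for bit in range(bits):
        for dim in range(ndim):
            # Pull the interleaved bit back into its coordinate.
            coords[dim] |= ((index >> (bit * ndim + dim)) & 1) << bit
    return tuple(coords)


if __name__ == "__main__":
    # Print the one-dimensional order of the cells in a 4 x 4 grid.
    for y in range(4):
        print([z_order_index((x, y), bits=2) for x in range(4)])
```

Because the number of bits per coordinate is fixed up front, this plain form cannot grow a dimension without re-encoding existing indexes, which is the kind of rigidity that the flexibility improvement and the expansion mechanism mentioned in the abstract are meant to address.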