BE/BTech & ME/MTech Final Year Projects for Computer Science | Information Technology | ECE Engineer | IEEE Projects Topics, PHD Projects Reports, Ideas and Download | Sai Info Solution | Nashik |Pune |Mumbai
director@saiinfo | 02536644344 | 02048626262 | +919270574718 | +919096813348 | +919028924212


SAI INFO SOLUTION

Diploma | BE | B.Tech | ME | M.Tech | PhD

Project Development and Training



An IO Efficient Distributed Approximation Framework Using Cluster Sampling


Abstract


In this paper, we present an I/O efficient distributed approximation framework to support approximations on arbitrary sub-datasets of a large dataset. Because caching offline samples for every sub-dataset has a prohibitive storage overhead, existing offline sample-based systems provide high-accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online sample-based approximation systems, which generate samples at runtime, do not take into account the uneven storage distribution of a sub-dataset. They work well when a sub-dataset is uniformly distributed, but suffer from low I/O efficiency and poor estimation accuracy on unevenly distributed sub-datasets. To address this problem, we develop a distribution-aware method called CLAP (cluster sampling based approximation). Our idea is to collect the occurrences of a sub-dataset in each logical partition of a dataset (its storage distribution) in the distributed system, and to use this information to enable I/O efficient online sampling. There are five thrusts in CLAP. First, we develop a probabilistic map to reduce the exponential number of recorded sub-datasets to a linear one. Second, we apply the theory of cluster sampling with unequal probability to implement a distribution-aware method for efficient online sampling of a single sub-dataset or multiple sub-datasets. Third, we extend CLAP beyond simple aggregation approximations to support more complex approximations, such as ratio and regression, using bootstrap-based estimation. Fourth, we add an option in CLAP that allows users to specify a target error bound when submitting an approximation job. Fifth, we quantitatively derive the optimal sampling unit size in a distributed file system by relating it to approximation cost and accuracy. We have implemented CLAP in Hadoop as an example system and open-sourced it on GitHub. Our comprehensive experimental results show that CLAP can achieve a speedup of up to 20× over precise execution.
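To illustrate what "cluster sampling with unequal probability" over partitions can look like, here is a minimal sketch using the textbook Hansen-Hurwitz estimator. It is not CLAP's implementation. It assumes each logical partition (cluster) i has a known occurrence count of the sub-dataset, draws partitions with replacement with probability proportional to that count, scans each drawn partition for the exact SUM of its matching records, and expands each result by 1/p_i. The helper `clusters_needed` shows one plausible way a target error bound could be turned into a number of draws from a pilot sample. All function names and data below are hypothetical.

```python
import math
import random
import statistics


def pps_draw(counts, n, rng):
    """Draw n partition indices with replacement, with probability
    proportional to each partition's occurrence count of the sub-dataset.
    Partitions with a zero count get probability 0 and are never read,
    which is where the I/O saving comes from."""
    total = sum(counts)
    probs = [c / total for c in counts]
    drawn = rng.choices(range(len(counts)), weights=counts, k=n)
    return drawn, probs


def hansen_hurwitz_total(partition_sums, drawn, probs):
    """Estimate the sub-dataset SUM from the drawn partitions.

    partition_sums[i] is the exact SUM over matching records in partition i
    (obtained by scanning it). Each draw contributes z = y_i / p_i; the
    estimate is the mean of z and its standard error is stdev(z) / sqrt(n),
    as in a with-replacement design. A partition drawn twice counts twice."""
    z = [partition_sums[i] / probs[i] for i in drawn]
    est = statistics.fmean(z)
    se = statistics.stdev(z) / math.sqrt(len(z)) if len(z) > 1 else math.inf
    return est, se, z


def clusters_needed(pilot_z, error_bound, z_score=1.96):
    """Smallest n with z_score * s / sqrt(n) <= error_bound, where s is the
    spread of the per-draw values z from a pilot sample (hypothetical helper)."""
    s = statistics.stdev(pilot_z)
    return max(1, math.ceil((z_score * s / error_bound) ** 2))


if __name__ == "__main__":
    rng = random.Random(42)
    counts = [0, 120, 5, 0, 300, 75]               # hypothetical per-partition occurrences
    sums = [0.0, 250.0, 9.0, 0.0, 610.0, 140.0]    # hypothetical per-partition SUMs
    drawn, probs = pps_draw(counts, n=4, rng=rng)
    est, se, z = hansen_hurwitz_total(sums, drawn, probs)
    print(f"estimate={est:.1f} +/- {1.96 * se:.1f} (true total={sum(sums):.1f})")
    print("draws needed for +/-5:", clusters_needed(z, error_bound=5.0))
```

Sampling proportional to occurrence count is the design choice that matters here: when a partition's SUM is roughly proportional to its count, every z is close to the true total, so the variance (and the number of partitions that must be read) stays small even for an unevenly stored sub-dataset.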
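For approximations beyond simple aggregates, a ratio such as AVG = SUM / COUNT can be estimated from the same drawn clusters, and a bootstrap gives its error bars. The sketch below is a generic percentile bootstrap over sampled clusters, assuming each draw records the cluster's numerator total, denominator total, and draw probability; it is an illustration of the idea, not CLAP's estimator, and regression is not covered. Function names and data are hypothetical.

```python
import random


def ratio_estimate(draws):
    """Ratio estimator R = Y / X from clusters drawn with unequal probability.

    Each draw is (y_i, x_i, p_i): the cluster's totals of the numerator and
    denominator variables and its draw probability. Both totals are
    expanded by 1 / p_i before taking the ratio."""
    num = sum(y / p for y, _, p in draws)
    den = sum(x / p for _, x, p in draws)
    return num / den


def bootstrap_ratio_ci(draws, b=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval, obtained by
    resampling the drawn clusters (not individual records) with replacement."""
    rng = random.Random(seed)
    n = len(draws)
    reps = sorted(
        ratio_estimate([draws[rng.randrange(n)] for _ in range(n)])
        for _ in range(b)
    )
    lo = reps[int(alpha / 2 * b)]
    hi = reps[min(b - 1, int((1 - alpha / 2) * b))]
    return ratio_estimate(draws), (lo, hi)


if __name__ == "__main__":
    # Hypothetical draws: (SUM of matching values, COUNT of matching records, p_i)
    draws = [(250.0, 120.0, 0.24), (610.0, 300.0, 0.6),
             (140.0, 75.0, 0.15), (610.0, 300.0, 0.6)]
    avg, (lo, hi) = bootstrap_ratio_ci(draws)
    print(f"AVG ~ {avg:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

Resampling whole clusters rather than individual records keeps the bootstrap consistent with how the data was actually sampled, so within-partition correlation is reflected in the interval width.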

Keywords
Approximation, Cluster sampling, Sub-dataset, Storage distribution, Hadoop.



Call us : 09096813348 / 02536644344
Mail ID : developer.saiinfo@gmail.com
Skype ID : saiinfosolutionnashik