Efficient Algorithms for Mining Erasable Closed Patterns From Product Datasets
Extracting knowledge from large data sets for use in intelligent systems is becoming increasingly important in the Internet era. Pattern mining, classification, text mining, and opinion mining are topical issues, and pattern mining is an important one among them. The problem of mining erasable patterns (EPs) has been proposed as a variant of frequent pattern mining for optimizing the production plans of factories. Several algorithms have been proposed for effectively mining EPs. However, for large threshold values, many EPs are obtained, leading to large memory usage. Therefore, it is necessary to mine a condensed representation of EPs. This paper first defines erasable closed patterns (ECPs), which can represent the set of EPs without information loss. Then, a theorem for quickly determining ECPs based on the dPidset structure is proposed and proven. Next, two efficient algorithms for mining ECPs based on this theorem are proposed: erasable closed pattern mining (ECPat) and the dNC_Set based algorithm for erasable closed pattern mining (dNC-ECPM). Experimental results show that ECPat is the best method for sparse data sets, while dNC-ECPM outperforms ECPat and a modified mining erasable itemsets algorithm in terms of mining time and memory usage for all remaining data sets.
Data mining, pattern mining, erasable pattern, erasable closed pattern.
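To make the notions of EP and ECP concrete, the following is a minimal brute-force sketch in Python. It is not ECPat or dNC-ECPM and uses neither dPidsets nor dNC_Sets. It assumes the definitions commonly used in the erasable pattern literature, which are not spelled out above: the gain of a pattern is the total profit of the products that use at least one of its items, a pattern is erasable if its gain does not exceed the threshold times the total profit, and an EP is closed if no proper superset has the same gain. The function names and the toy product data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each product is (set of component items, profit).
products = [
    (frozenset({"a", "b"}), 100),
    (frozenset({"b", "c"}), 200),
    (frozenset({"c", "d"}), 300),
    (frozenset({"d", "e"}), 400),
]

def gain(pattern, products):
    """Total profit of the products that use at least one item of the pattern."""
    return sum(profit for items, profit in products if pattern & items)

def erasable_patterns(products, threshold):
    """Enumerate every pattern whose gain is at most threshold * total profit.

    Exhaustive enumeration, exponential in the number of items; for
    illustration only.
    """
    total = sum(profit for _, profit in products)
    universe = sorted(set().union(*(items for items, _ in products)))
    result = {}
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            pattern = frozenset(combo)
            g = gain(pattern, products)
            if g <= threshold * total:
                result[pattern] = g
    return result

def erasable_closed_patterns(eps):
    """Keep only the EPs that have no proper superset with the same gain.

    A superset with the same gain is itself erasable, so searching among
    the EPs is sufficient.
    """
    return {
        p: g
        for p, g in eps.items()
        if not any(p < q and g == h for q, h in eps.items())
    }

eps = erasable_patterns(products, threshold=0.35)
ecps = erasable_closed_patterns(eps)
```

On this toy data the total profit is 1000, so the gain limit is 350. The EPs are {a} (gain 100), {b} (gain 300), and {a, b} (gain 300). Since {b} has the superset {a, b} with the same gain, only {a} and {a, b} are closed, which illustrates how the closed set can be smaller than the full set of EPs.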