A Framework for Incremental Parallel Mining of Interesting Association Patterns for Big Data


Ahmed Sultan Alhegami 1), Hussein Alkhader Alsaeedi 2)

1) University of Sana'a, Yemen, e-mail: Alhegami@su.edu.ye
2) University of Science and Technology, Yemen, e-mail: h.alkhadher@ust.edu

International Journal of Computing, 19(1) 2020, 106-117. Print ISSN 1727-6209, on-line ISSN 2312-5381.

Paper history: received 12 August 2019; received in revised form 13 December 2019; accepted 13 February 2020; available online 31 March 2020.

Keywords: Big Data Mining; Association Pattern Mining; Parallel Mining; Incremental Mining; Interestingness Measure; Novelty Measure; KDD.

Abstract: Association rule mining plays a very important role in the distributed environment for Big Data analysis. The massive volume of data creates an imminent need to design novel, parallel and incremental algorithms for association rule mining in order to handle Big Data. In this paper, a framework is proposed for an incremental parallel interesting association rule mining algorithm for Big Data. The proposed framework incorporates interestingness measures during the process of mining. It processes the incremental data, which usually arrive at different times, so that the knowledge important to the user is discovered by processing only the new data, without having to start from scratch. One of the main features of this framework is that it considers the user's domain knowledge, which increases monotonically. A model that incorporates the user's beliefs during the extraction of patterns is attractive, effective and efficient. The proposed framework is implemented on public datasets and evaluated based on the interesting results that are found.

Copyright © Research Institute for Intelligent Computer Systems, 2020. All rights reserved.

1. INTRODUCTION

Recent advances in digital data collection and data acquisition technologies have opened new avenues to acquire and store increasingly massive volumes of data. This rapid growth of data leads to several considerable issues, such as storage, security, scalability, and the extraction of interesting knowledge, which are difficult to handle using conventional techniques, methods and tools. Data are useful only if they can be interpreted and analyzed, and if conclusions can be drawn from them [1-3].

Big Data mining refers to extraction techniques that are performed on Big Data; it extracts and retrieves interesting patterns from a massive volume of data [4]. Association rule mining plays a very important role in a distributed environment in Big Data analysis [5].

Although many efficient algorithms have been developed to extract association rules, traditional algorithms do not work well on Big Data [6-9]. The main drawback of such algorithms is that they do not consider the data size or the time when the data arrives, and therefore build a model in a batch manner. In contrast, an incremental algorithm constructs and refines the model as new data arrives at different times [10-13].

The aim of this paper is to propose a framework for incremental parallel mining of interesting association rules for Big Data. One of the main advantages of the proposed framework is that it handles time-changing Big Data and user domain knowledge. This is useful when many datasets arrive at different times or from a distributed environment. Certainly, it is desirable to update the discovered patterns each time new data arrives.
The incremental and parallel nature of the proposed framework makes it valuable for extracting the patterns that are interesting at the current time with respect to the previously discovered patterns, rather than exhaustively re-extracting all patterns. Parallel and incremental association rule algorithms that incorporate the user's domain knowledge during the extraction of patterns are attractive, effective and efficient for the knowledge discovery in databases (KDD) process.

2. RELATED WORKS

Frequent itemset mining algorithms fall into two families: Apriori-based methods [14] and tree-based methods [15]. Parallel frequent itemset mining algorithms based on Apriori [14], such as those in [6, 7, 16], are categorized into count distribution (e.g., parallel data mining (PDM) [6] and fast parallel mining [7]) and data distribution (DD) [17] approaches. These approaches assume that each processor of a parallel system calculates the local support counts of all candidate itemsets; all processors then compute the total support counts of the candidates by exchanging the local support counts. Other parallel frequent itemset mining algorithms are based on tree methods [15], for example the Parallel FP-Growth algorithm (PFP-Growth) for clustered systems [18], the load-balanced parallel FP-Growth algorithm [19], an efficient parallel algorithm using the message passing interface on a shared-nothing multiprocessor system [9], and the Parallel FP-Growth (PFP) algorithm for mining frequent patterns [8]. The PFP algorithm makes use of the MapReduce parallel programming model for the analysis and mining of data [8, 20]. It splits the database into small chunks, uses MapReduce in three phases to count values, group items and build trees, and eventually integrates and combines the results of the previous phases. The main drawbacks of PFP-Growth are that it does not work on an incremental database and does not use any subjective measure of interestingness.

Many works have been conducted to develop algorithms for incremental association rule mining [21-25]. The main hypothesis of these algorithms is to update the discovered model when a new data stream arrives. In [24], the DEMON algorithm is proposed to handle evolving data more effectively and efficiently. In [25], the DELI algorithm is proposed for monitoring changes in the environment of the data stream; it makes use of statistical methods for the updating process. DELI uses a sampling method to estimate the support counts, using approximate upper/lower bounds on the number of changes in the newly discovered association rules. As the lower bound gets smaller, the changes to the association rules get smaller, and therefore model maintenance is not required. Although these algorithms are incremental, they do not reuse the previously discovered knowledge when new data arrives at a new time instance. In [22, 23], the Fast UPdate (FUP) algorithm, which is incremental in nature, is proposed for mining association rules in huge databases. It works by scanning the database to verify whether large itemsets exist, and it computes the large itemsets in the updated database. The main purpose of this algorithm is to solve the efficient-update problem of association patterns in the updated database. The algorithm was extended to FUP* and FUP2, which scan the database k times. In [26], Parallel incremental FP-Growth (PIFP-Growth) is proposed as an improvement of the PFP algorithm [8] to solve the problem of an incremental database. PIFP-Growth is based on MapReduce [20] for parallelized incremental mining. The drawbacks of these algorithms are that they have many time-consuming stages and perform MapReduce several times; for instance, PFP uses MapReduce in three of its five stages, while PIFP uses MapReduce in four of its seven stages. In addition, neither algorithm uses any subjective measure of interestingness.
The novelty measure of discovered patterns is proposed in [10-12]. It is quantified with respect to known knowledge, and it eliminates the patterns that are not interesting from the user's point of view. In our work, we take advantage of the novelty measure of interestingness proposed in [10-12]. Although PFP and PIFP are proposed to deal with parallel and incremental Big Data mining, both approaches are based on traditional FP-Growth [15] and make use of the MapReduce programming model. Our framework can use any frequent pattern mining algorithm that uses MapReduce. It is similar to PFP and PIFP in that it uses MapReduce to achieve parallelism, but it differs from PIFP in its incremental manner. The major differences between the proposed framework and PFP and PIFP are:

- PFP uses MapReduce in three out of its five stages and PIFP uses MapReduce in four out of its seven stages, while the proposed framework uses MapReduce only twice out of its four stages.
- Both PFP and PIFP do not consider the previously discovered patterns when new data arrives, while the framework updates the model with novel patterns as a new data stream arrives.
- PIFP resets the threshold value as new data arrives and updates the old local tree, while our approach constructs a different local tree as new data arrives and generates new frequent items.
- To achieve parallelism, PFP and PIFP divide items into groups and generate group-dependent transactions to build trees and extract frequent items, while the framework uses MapReduce to construct trees directly from transactions after pruning the infrequent items that do not meet the minimum support criterion.

Even though PFP and PIFP are based on FP-Growth, which includes two steps, the framework adds extra steps in order to update the model as new data arrives and to guarantee that the discovered patterns are interesting.

The rest of the paper is organized as follows. Section 3 presents the problem statement. The framework for incremental parallel mining of interesting association patterns is presented in Section 4. Section 5 illustrates a detailed example. Section 6 presents the experimental results, and the conclusion is given in Section 7.

3. PROBLEM STATEMENT

At time instance t_i, an incremental Big Data D_i, i ∈ {1, ..., n}, is collected. Suppose D_i is partitioned into m parts dp_1, ..., dp_m, where D_i = dp_1 ∪ ... ∪ dp_m. Each part dp_j is saved on a processor P_j, and an F-List is generated to construct the local trees FP-Tree_1, ..., FP-Tree_m. Subsequently, frequent items are extracted and association rules are generated to form the model T_i.

Let M_i and M_{i+1} be two models discovered at time instances t_i and t_{i+1} from the datasets ∪_{j=1..i} D_j and ∪_{j=1..i+1} D_j, respectively. The objective is to update M_i to M_{i+1} using D_{i+1} and M_i. The model M_i discovered at time t_i now represents the previously discovered knowledge (PDK). M_{i+1} is the up-to-date model obtained by adding the interesting patterns discovered from D_{i+1}. This is achieved by constructing a model Ť_{i+1} from D_{i+1} such that the association patterns in Ť_{i+1} have a user-specified degree of interestingness with respect to the rules in M_i. Subsequently, Ť_{i+1} is used to update M_i to M_{i+1}.

4. A FRAMEWORK FOR INCREMENTAL PARALLEL MINING OF INTERESTING ASSOCIATION PATTERNS

In this paper, we present a framework that efficiently discovers interesting patterns from Big Data. It makes use of MapReduce [20] to deal with data in a parallel manner. Our proposed framework is similar to the PFP algorithm [8], except that each rule generated from the frequent itemset list in PFP may not be interesting. At time T_i, our framework computes the novelty aspect of the interestingness measure with respect to the existing model M_{T_i} and prunes uninteresting patterns that are not significant in the current dataset. The framework is shown in Fig. 1. It comprises three phases, namely building local trees, finding frequent itemsets, and building the incremental interesting model. These phases are explained in the following subsections.

4.1 BUILDING LOCAL TREES

In this phase, the Big Data is divided into m small parts, where m can be set manually, among P processors using the MapReduce parallel programming model for the purpose of analyzing and mining data. In the first MapReduce iteration, each processor reads its small part to count item supports in parallel; the counts are integrated into a frequent list called the F-List, whose items are then sorted in descending order of support. In the second iteration, MapReduce reads each small part again and builds a local FP-Tree. The outputs of this phase are the local trees FP-Tree_1, ..., FP-Tree_m. The following steps are required to build the local trees; the algorithm is presented in Algorithm 1.

1. Set m; define and clear the F-List.
2. Divide D_i into m parts dp_j, 1 ≤ j ≤ m, and save each dp_j on a different processor P_j.
3. First scan: for each transaction T in dp_j, compute the supports of all items in a parallel manner.
4. Integrate the count results into the F-List.
5. Sort the items of the F-List in descending order of support.
6. Second scan: for each transaction T in dp_j:
   - sort the items of T in descending order according to the F-List;
   - build the local tree localTree_j as in the FP-Tree algorithm [15].
7. Return localTree_j.

Algorithm 1: Building Local Trees
Procedure: BuildingLocalTrees(D_i, m)
  set m;
  define and clear F-List: F[];
  divide D_i into dp_1, ..., dp_m;
  send each dp_j to a different processor P_j;
  on each P_j:
    F[] = Mapper(dp_j);
    sort F[];
    call Reduce(dp_j, F[]);

Procedure: Mapper(dp_j)
  define and clear count[];
  foreach transaction T in dp_j do
    foreach item a in T do
      count[a]++;
  return count;

Procedure: Reduce(dp_j, F[])
  define and clear the root of localTree_j: r;
  foreach transaction T in dp_j do
    order T according to F[];
    call ConstructTree(T, r);
  call FindingFrequentItemset(r, F[]);
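For concreteness, the following is a minimal plain-Python sketch of Algorithm 1, with the MapReduce machinery simulated by iterating over the partitions (the paper's actual implementation runs as Java MapReduce jobs on Hadoop); the names mapper, build_flist, build_local_tree and Node are ours, chosen for illustration:

```python
from collections import Counter

class Node:
    """A node of a local FP-tree: item label, count and children by item."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

def mapper(part):
    """First scan (step 3): local support counts for one partition."""
    return Counter(item for t in part for item in t)

def build_flist(parts, min_support):
    """Merge the local counts into the global F-List, prune infrequent
    items, and sort in descending order of support (steps 4-5)."""
    total = sum((mapper(p) for p in parts), Counter())
    return [a for a, c in total.most_common() if c >= min_support]

def build_local_tree(part, flist):
    """Second scan (step 6): insert each transaction, reordered by F-List
    rank and stripped of infrequent items, into the partition's local tree."""
    rank = {a: i for i, a in enumerate(flist)}
    root = Node()
    for t in part:
        node = root
        for a in sorted((x for x in t if x in rank), key=rank.get):
            node = node.children.setdefault(a, Node(a))
            node.count += 1
    return root

# Toy usage: two partitions holding three transactions, minimum support 2.
parts = [[{'a', 'b'}, {'b', 'c'}], [{'a', 'b', 'c'}]]
flist = build_flist(parts, min_support=2)       # ['b', 'a', 'c']
trees = [build_local_tree(p, flist) for p in parts]
```

Each Node keeps only an item, a running count and its children, which is all the second phase needs in order to read prefix paths back out of the trees.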

Figure 1 – The Framework for Incremental Parallel Mining of Interesting Association Patterns for Big Data

4.2 FINDING FREQUENT ITEMSETS

In this phase, the local trees FP-Tree_1, ..., FP-Tree_m generated in the first phase are taken by the Mappers, which connect the trees from the different nodes with each other. Subsequently, the Reducers extract the frequent itemsets from the trees and save them temporarily in memory. The output of this phase is the list of frequent itemsets. The following steps are required to find the frequent itemsets; the algorithm is presented in Algorithm 2.

1. Divide the F-List into a number of groups (m groups), called G-Lists.
2. Send each G-List to a different processor, each of which runs MapReduce.
3. On every processor, the items of the descending-order F-List are examined from the last item to the first to find out whether they belong to the local G-List or not:
   a. The Mapper reads all paths of each item in the different FP-Trees and extracts a temporary local F-List for the item.
   b. The Reducer constructs a temporary local tree for each item based on its paths and the temporary local F-List.
   c. For every temporary local tree, the Reducer extracts the local frequent items if the tree has a unique path; otherwise, the previous steps are repeated.
4. Merge all the local frequent item lists on the different processors to form the frequent itemsets.

Algorithm 2: Finding Frequent Itemsets
Procedure: FindingFrequentItemset(localTrees, F-List, σ)
  divide F-List into m groups, G-Lists;
  send each G-List to a different processor P;
  on each processor P:
    for (i = F-List.size - 1; i >= 0; i--) {
      a = F-List[i];
      if (a ∈ G-List) {
        Mapper: read all paths of a in the local trees into the temporary local F-List of a;
        Reducer: build the temporary local tree of item a;
        if (the temporary local tree of item a is a single path)
          extract the local frequent items of a;
        else
          repeat the previous steps on the temporary local tree;
      }
    }
  merge the local frequent item lists into the frequent items list;

4.3 BUILDING THE INCREMENTAL INTERESTING MODEL

In this phase, association patterns are extracted from the frequent itemsets. These patterns are evaluated using the confidence measure, and the patterns that do not satisfy this criterion are pruned, resulting in a set of strong association patterns. These patterns are then subjected to the novelty criterion [11] in order to decide whether they are interesting or not. This phase takes into consideration the existing model M_i, which represents the known association rules, and consequently results in the discovery of M_{i+1}. For each frequent itemset, only novel rules are extracted and used to update the model M_{i+1}. We compute the novelty degree of a rule with the novelty measure (NM) presented in [10], as shown in equation (1):

NM(S1, S2) = (|S1| + |S2| - 2K + Σ_{i=1..K} diff(p_i)) / (|S1| + |S2|),   (1)

where S1 and S2 are two conjunct sets with cardinalities |S1| and |S2| respectively, K is the number of pairs of compatible conjuncts between S1 and S2, and diff(p_i) is the degree of difference of p_i, the i-th pair of compatible conjuncts. The algorithm computes the novelty measure (NM) at every stage of rule generation to determine whether a rule is likely to lead to an interesting rule or not. A rule becomes a candidate for the next stage of rule generation if its novelty measure value is 1 or if the relevance factor of the closest rule in M is less than the relevance factor threshold value. An interestingness value of 1 for a partial temporal rule indicates that this rule is unlikely to expand to any existing temporal association rule. The following steps are required to build the incremental interesting model; the algorithm is presented in Algorithm 3.

1. Generate an association rule R from the frequent items list.
2. Compute the confidence of R.
3. If Confidence(R) meets the confidence threshold, go to step 4; else go to step 1.
4. Compute the novelty measure NM(R) of R with respect to the model M_i.
5. If NM(R) ≥ Φ, go to step 6; else go to step 1.
6. Update the model M_{i+1}.

5. A DETAILED EXAMPLE

For a better understanding of our framework, consider a Big Data D arriving at time T1, denoted by D0. It contains 6 transactions, as shown in Table 1. Suppose D0 is partitioned into 3 parts for the sake of parallel mining, i.e., m = 3, each of which is called dP_i, i = 1, 2, 3, where D0 = dP1 ∪ dP2 ∪ dP3. Table 2 shows the data in every partition, which is sent to a different computer P_i. Each computer P_i in turn computes the support of its items using a Mapper and stores the counts in a local f-list. The following local f-lists are generated from P1, P2 and P3 respectively: f-list_L1 = {A:1, B:2, C:1, D:2, E:1}, f-list_L2 = {A:2, B:2, C:1, D:2, E:1, F:1, G:1}, and f-list_L3 = {A:1, B:2, E:2, C:2, D:1, G:1}.
Table 1. The transactions of D0

ID | Items
1  | A, B, C, D
2  | B, E, D
3  | A, B, C, D, F
4  | A, B, E, D, G
5  | A, B, E, C, D
6  | B, E, G, C
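As a quick check of the first phase on this example, the self-contained sketch below recomputes the three local f-lists, the merged and pruned F-List, and the reordered transactions that the text and Table 2 describe next (plain Python again; the variable names are ours):

```python
from collections import Counter

# The three partitions of D0 (Table 2), two transactions each.
parts = {'P1': [['A','B','C','D'], ['B','E','D']],
         'P2': [['A','B','C','D','F'], ['A','B','E','D','G']],
         'P3': [['A','B','E','C','D'], ['B','E','G','C']]}

# Mapper step: one local f-list per processor.
local = {p: Counter(i for t in ts for i in t) for p, ts in parts.items()}
print(local['P1'])   # B:2, D:2, A:1, C:1, E:1, i.e. f-list_L1

# Merge into the cumulative F-List, prune at minimum support 50% (3 of 6),
# and sort in descending order of support.
merged = sum(local.values(), Counter())   # A:4 B:6 C:4 D:5 E:4 F:1 G:2
flist = [a for a, c in merged.most_common() if c >= 3]
print(flist)         # ['B', 'D', 'A', 'C', 'E']

# Reorder each transaction by F-List rank, dropping the pruned items
# (the last column of Table 2).
rank = {a: i for i, a in enumerate(flist)}
for p, ts in parts.items():
    print(p, [sorted((i for i in t if i in rank), key=rank.get) for t in ts])
```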

Table 2. D0 partitioned into 3 parts

Part | ID | Items         | Ordered frequent items
dP1  | 1  | A, B, C, D    | B, D, A, C
dP1  | 2  | B, E, D       | B, D, E
dP2  | 3  | A, B, C, D, F | B, D, A, C
dP2  | 4  | A, B, E, D, G | B, D, A, E
dP3  | 5  | A, B, E, C, D | B, D, A, C, E
dP3  | 6  | B, E, G, C    | B, C, E

Subsequently, the local f-lists are merged into a cumulative list called the F-List as follows: F-List = {A:4, B:6, C:4, D:5, E:4, F:1, G:2}. Now, if we consider a minimum support of 50%, the items G and F are eliminated from the F-List. The F-List is then sorted by support in descending order: F-List = {B:6, D:5, A:4, C:4, E:4}. The final F-List is the reference for every P_i when the local trees are constructed using Reduce. During the construction of the local trees, the items in every transaction are sorted in descending order according to their position in the F-List, and items not in the F-List are ignored, as shown in the last column of Table 2. The ordered frequent items are used to construct local trees whose roots are set to null. The local trees are constructed using the FP-Growth algorithm on every computer P_i, as shown in Fig. 2. These local trees and the F-List, which are maintained in the memory of P_i using MapReduce, are the outcome of the first stage of the proposed framework.

As FP-Growth uses a bottom-up strategy, the last item in the F-List is considered first, which is E in our example. All paths of E are examined in all local trees, resulting in the following: dP1: [D,B:1], dP2: [A,D,B:1], dP3: [C,D,B:1]. These paths are used to generate a temporary F-List using the Mapper as follows: F-List_new-E = {D:3, B:3, A:1, C:1}. The items C and A are then removed, as they do not meet the support criterion; F-List_new-E is reordered in descending order, and the paths are reordered according to F-List_new-E. Finally, the frequent items for item E are extracted using Reduce, as the path is unique. The next item considered is C, for which all paths are examined, resulting in the following: dP1: [A,D,B:1], dP2: [A,D,B:1], dP3: [D,B:1], [B:1]. The Mapper then generates a temporary F-List for the item C, F-List_new-C = {D:3, B:4, A:2}. Note that the item A is removed due to the minimum support criterion. Finally, the frequent items are generated, as the path of the item C is unique. Similarly, the same process is executed for the remaining items, and all frequent items are merged together to form the outcome of this stage. In our example, the frequent items list is {[B,D,E:3], [A,B,C,D:3]}.

The next stage is to generate association rules from the frequent itemsets generated in the previous stage. Table 3 shows the corresponding set of discovered association rules for the frequent itemset {B,D,E:3}, assuming that the confidence threshold value is 0.6.

Table 3. The Association Rules Discovered at Time T1

Rule No. | Association rule | Confidence | Accept | Compare with rule | Novelty | Novel (≥ 0.50) | Add to PDK
R1  | B → D  | 3/6 = .50 | No  | ---        | ---                   | --- | ---
R2  | D → B  | 3/5 = .60 | Yes | Null       | 1                     | Yes | Yes
R3  | B → E  | 3/6 = .50 | No  | ---        | ---                   | --- | ---
R4  | E → B  | 3/4 = .75 | Yes | R2         | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R5  | D → E  | 3/5 = .60 | Yes | R2, R4     | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R6  | E → D  | 3/4 = .75 | Yes | R2, R4, R5 | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R7  | B → DE | 3/6 = .50 | No  | ---        | ---                   | --- | ---
R8  | BD → E | 3/5 = .60 | Yes | R5         | (2+3-2(2))/(2+3) = .2 | No  | No
R9  | BE → D | 3/4 = .75 | Yes | R6         | (2+3-2(2))/(2+3) = .2 | No  | No
R10 | D → BE | 3/5 = .60 | Yes | R5         | (2+3-2(2))/(2+3) = .2 | No  | No
R11 | DE → B | 3/4 = .75 | Yes | R2         | (2+3-2(2))/(2+3) = .2 | No  | No
R12 | E → BD | 3/4 = .75 | Yes | R2         | (2+3-2(2))/(2+3) = .2 | No  | No
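The confidence computations in Table 3 can be reproduced with a short sketch. The code below (plain Python; generate_rules is an illustrative name, not the paper's) splits the frequent itemset {B, D, E} into every antecedent/consequent pair whose union is the whole itemset, computes the confidences from the D0 transactions, and applies the 0.6 threshold, e.g., B → DE: 3/6 = .50 rejected (R7); BD → E: 3/5 = .60 accepted (R8); E → BD: 3/4 = .75 accepted (R12):

```python
from itertools import combinations

# The six transactions of D0 (Table 1).
D0 = [{'A','B','C','D'}, {'B','E','D'}, {'A','B','C','D','F'},
      {'A','B','E','D','G'}, {'A','B','E','C','D'}, {'B','E','G','C'}]

def support(itemset):
    """Absolute support: the number of transactions containing the itemset."""
    return sum(itemset <= t for t in D0)

def generate_rules(itemset, min_conf):
    """Split a frequent itemset into antecedent -> consequent pairs and
    flag the rules whose confidence meets the threshold."""
    for r in range(1, len(itemset)):
        for ante in map(set, combinations(sorted(itemset), r)):
            conf = support(itemset) / support(ante)
            yield ante, itemset - ante, conf, conf >= min_conf

for ante, cons, conf, ok in generate_rules({'B', 'D', 'E'}, min_conf=0.6):
    print(sorted(ante), '->', sorted(cons), f'conf={conf:.2f}',
          'accepted' if ok else 'rejected')
```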

Figure 2 – The Local Trees on the Different P_i

Notice that in Table 3 the rules R1, R3 and R7 are eliminated due to the confidence criterion, which is set to 60%. The remaining rules are subjected to the novelty measure, whose threshold is set in the example to 50%. As the data arrived at time T1, no comparison is made against the model M1, because no novel rules have been discovered before time T1. The last columns of Table 3 show the computation of the novelty degree of the rules; the rules that are not novel are eliminated as uninteresting. Subsequently, the model M1 is updated as shown in Fig. 3.

Now suppose another dataset D1 arrives at time T2, as shown in Table 4. The same stages are repeated, taking into consideration the novel rules in the model M1. Table 5 shows the corresponding set of discovered association rules for the frequent itemset {B,C,E:2}, assuming that the confidence threshold value is 0.6. Notice that in Table 5 the rules {R13, R17, R18} are found to be novel, and hence the model M1 is updated incrementally to form the model M2, as shown in Fig. 4.

Table 4. The transactions of D1

ID | Items
1  | A, C, D
2  | B, C, E
3  | A, B, C, E
4  | B, E

Table 5. The Association Rules Discovered at Time T2

Rule No. | Association rule | Confidence | Accept | Compare with rule | Novelty | Novel (≥ 0.50) | Add to PDK
R13 | B → C  | 2/3 = .67 | Yes | R2, R4  | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R14 | C → B  | 2/3 = .67 | Yes | R13     | (2+2-2(2))/(2+2) = 0  | No  | No
R15 | B → E  | 2/2 = 1   | Yes | R4      | (2+2-2(2))/(2+2) = 0  | No  | No
R16 | E → B  | 2/3 = .67 | Yes | R4      | (2+2-2(2))/(2+2) = 0  | No  | No
R17 | C → E  | 2/3 = .67 | Yes | R5, R13 | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R18 | E → C  | 2/3 = .67 | Yes | R13     | (2+2-2(1))/(2+2) = .5 | Yes | Yes
R19 | B → CE | 2/3 = .67 | Yes | R4      | (2+3-2(2))/(2+3) = .2 | No  | No
R20 | CE → B | 2/2 = 1   | Yes | R4      | (2+3-2(2))/(2+3) = .2 | No  | No
R21 | C → BE | 2/3 = .67 | Yes | R4      | (2+3-2(2))/(2+3) = .2 | No  | No
R22 | BE → C | 2/3 = .67 | Yes | R13     | (2+3-2(2))/(2+3) = .2 | No  | No
R23 | E → BC | 2/3 = .67 | Yes | R13     | (2+3-2(2))/(2+3) = .2 | No  | No
R24 | BC → E | 2/2 = 1   | Yes | R17     | (2+3-2(2))/(2+3) = .2 | No  | No
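The novelty columns of Tables 3 and 5 follow directly from equation (1). The sketch below is a minimal plain-Python transcription of it, under the assumptions that all diff terms are zero (the compatible conjunct pairs in the example are identical items) and that a rule is scored against its closest rule in the PDK, i.e., the minimum NM over the known rules; the helper names are ours:

```python
def novelty(rule, pdk):
    """Novelty measure of equation (1) with all diff terms zero:
    NM = (|S1| + |S2| - 2K) / (|S1| + |S2|), minimized over the previously
    discovered knowledge (PDK); with an empty PDK the rule is maximally
    novel (NM = 1)."""
    def nm(r1, r2):
        s1 = set(r1[0]) | set(r1[1])    # conjuncts of the new rule
        s2 = set(r2[0]) | set(r2[1])    # conjuncts of the known rule
        k = len(s1 & s2)                # pairs of compatible conjuncts
        return (len(s1) + len(s2) - 2 * k) / (len(s1) + len(s2))
    return min((nm(rule, known) for known in pdk), default=1.0)

M1 = []                                 # model before time T1
r2 = ({'D'}, {'B'})
print(novelty(r2, M1))                  # 1.0 -> add R2 to the PDK
M1.append(r2)
print(novelty(({'E'}, {'B'}), M1))      # 0.5 -> R4 is novel at 0.50
M1 += [({'E'}, {'B'}), ({'D'}, {'E'}), ({'E'}, {'D'})]
print(novelty(({'B', 'D'}, {'E'}), M1))              # 0.2 -> R8 pruned
print(novelty(({'B'}, {'C'}), M1))                   # 0.5 -> R13 novel at T2
print(novelty(({'C'}, {'B'}), M1 + [({'B'}, {'C'})]))  # 0.0 -> R14 pruned
```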

Figure 3 – Model M1

Figure 4 – Model M2

6. EXPERIMENTAL RESULTS

In this section, experimental results are presented, in particular those related to the performance of the proposed framework. We conducted two experiments; the first is described in Section 6.1 and the second in Section 6.2. The proposed framework and the other algorithms were written in Java and implemented on Hadoop. All experiments were conducted on a PC with an Intel Core i5 2.6 GHz processor and 4 GB of main memory, running Microsoft Windows 10 64-bit. The experiments use real-life datasets available at http://kdd.ics.uci.edu. The datasets are considered as evolving with time and are divided into three increments, D1, D2 and D3, assumed to arrive at times T1, T2 and T3 respectively. Table 6 shows the characteristics of these datasets.

Table 6. The characteristics of the dataset (T10I4D100K) increments arriving at T1, T2 and T3

6.1 FIRST EXPERIMENT

In this experiment, the performance of the proposed framework is compared to the FP-Growth, PFP-Growth and PIFP algorithms. Since the numbers of rules discovered by FP-Growth, PFP-Growth and PIFP are the same, we report the comparison against FP-Growth only.

Table 7. The rules discovered on the Kosarak dataset using FP-Growth versus our framework (novelty threshold Φ = 0.50) at T1, T2 and T3, for minimum supports with minimum confidences of 0.60, 0.65 and 0.70

The performance is measured in terms of the number of discovered rules under various minimum support and confidence thresholds, with a fixed novelty threshold Φ = 0.50. The dataset used is Kosarak; it is considered to evolve with time and is partitioned into three parts representing times T1, T2 and T3, as shown in Table 6. As Table 7 shows, the number of discovered rules is reduced in the proposed framework compared to FP-Growth for all the minimum support and confidence settings. Fig. 5, Fig. 6 and Fig. 7 show the reduction of the discovered rules on the Kosarak dataset at times T1, T2 and T3.

Figure 5 – The comparison between FP-Growth and the proposed framework in terms of the number of discovered rules on the Kosarak dataset

Figure 6 – Discovered rules and novel rules for the dataset at T2

Figure 7 – Discovered rules and novel rules for the dataset at T3

6.2 SECOND EXPERIMENT

The objective of the second experiment is to show the effectiveness of our framework in reducing the number of discovered rules compared with the PIFP algorithm. It is expected that the number of discovered rules keeps decreasing over time. We work with four datasets, consider them as evolving with time, and partition them into three increments, D1, D2 and D3, mined at times T1, T2 and T3 respectively. For each dataset, the minimum support and minimum confidence are fixed and the novelty threshold Φ varies. It is observed that the number of interesting rules decreases in our framework, in contrast to the number of rules discovered by the PIFP algorithm at T1, T2 and T3. Intuitively, the interesting rules discovered by our framework at time T1 are no longer interesting at time T2, and the interesting rules discovered at time T2 are no longer interesting at time T3. Consequently, as the value of the novelty threshold Φ increases, the number of discovered interesting rules decreases at each time, as per our expectations. The results are demonstrated in Table 8.

Table 8. The comparison between PIFP and the proposed framework in terms of the number of discovered rules on several datasets with various novelty thresholds Φ

7. CONCLUSION AND FUTURE WORK

In this research, we proposed a framework for incremental parallel mining of interesting association rules for Big Data. The proposed approach incorporates an interestingness measure during the process of mining. It yields a self-upgrading model that uses the novelty criterion to reflect the user's subjectivity and extracts patterns incrementally from datasets arriving at different points in time. Our future work includes enhancing the framework to create an association system in which the model can adapt to a data stream environment.

8. REFERENCES

[1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, et al., Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, June 2011, 156 p.
[2] A. Kejariwal, "Big data challenges: a program optimization perspective," Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012, pp. 702-707.
[3] S. Kaisler, F. Armour, J. A. Espinosa, and W. Money, "Big data: Issues and challenges moving forward," Proceedings of the 2013 46th Hawaii International Conference on System Sciences, 2013, pp. 995-1004.
[4] S. Moens, E. Aksehirli, and B. Goethals, "Frequent itemset mining for big data," Proceedings of the 2013 IEEE International Conference on Big Data, 2013, pp. 111-118.
[5] R. Agrawal, T. Imieliński, and A. Swami, "Mining association rules between sets of items in large databases," Proceedings of the 1993 ACM SIGMOD Conference, 1993, pp. 207-216.
[6] J. Park, M. Chen, and P. Yu, "Efficient parallel data mining for association rules," Proceedings of the Fourth International Conference on Information and Knowledge Management CIKM'95, 1995, pp. 31-36.
[7] O. R. Zaïane, M. El-Hajj, and P. Lu, "Fast parallel association rule mining without candidacy generation," Proceedings of the 2001 IEEE International Conference on Data Mining, 2001, pp. 665-668.
[8] H. Li, Y. Wang, D. Zhang, M. Zhang, and E. Y. Chang, "Pfp: parallel fp-growth for query recommendation," Proceedings of the 2008 ACM Conference on Recommender Systems, 2008, pp. 107-114.
[9] L. Liu, E. Li, Y. Zhang, and Z. Tang, "Optimization of frequent itemset mining on multiple-core processor," Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 2007, pp. 1275-1285.
[10] V. Bhatnagar, A. S. Al-Hegami,
