Data Mining Case Studies



Data Mining Case Studies
Proceedings of the First International Workshop on Data Mining Case Studies
held at the 2005 IEEE International Conference on Data Mining
Houston, USA. November 27, 2005

Edited by
Brendan Kitts, iProspect
Gabor Melli, Simon Fraser University
Karl Rexer, Rexer Analytics

ISBN 0-9738918-2-3

Contents

Improved Cardiac Care via Automated Mining of Medical Patient Records
R. Bharat Rao
SIEMENS MEDICAL ..... 12

The Mining of SAS Technical Support Data
Annette Sanders and Craig DeVault
SAS CORPORATION ..... 33

Closing the Gap: Automated Screening of Tax Returns to Identify Egregious Tax Shelters
Dave DeBarr and Zach Eyler-Walker
MITRE CORPORATION ..... 34

Data-driven Modeling of Acute Toxicity of Pesticide Residues
Frank Lemke, Emilio Benfenati, Johann-Adolf Mueller
KNOWLEDGEMINER INC. / ISTITUTO DI RICERCHE FARMACOLOGICHE ..... 42

External Search Term Marketing Program: A Return on Investment Approach
Pramod Singh, Laksminarayan Choudur and Alan Benson
HEWLETT-PACKARD COMPANY ..... 60

Optimal Allocation of Online Advertisements using Decision Trees and Non-Linear Programming
David Montgomery
POINDEXTER SYSTEMS ..... 67

Autonomous Profit Maximization in Online Search Advertising
Brendan Kitts, Benjamin J. Perry, Benjamin LeBlanc, Parameshvyas Laxminarayan
IPROSPECT / ISOBAR COMMUNICATIONS CORPORATION ..... 73

Price Optimization in Grocery Stores with Cannibalistic Product Interactions
Brendan Kitts and Kevin Hetherington
IPROSPECT / MITRE CORPORATION ..... 74

Market Basket Recommendations for the HP SMB Store
Pramod Singh, Charles Thomas and Ariel Sepulveda
HEWLETT-PACKARD COMPANY ..... 92

Publishing Operational Models of Data Mining Case Studies
Timm Euler
UNIVERSITY OF DORTMUND ..... 99

Championing LTV at LTC
Ed Freeman and Gabor Melli
WASHINGTON MUTUAL / PREDICTIONWORKS INC. ..... 107

Survival Analysis Models to Estimate Customer Lifetime Value
Paolo Giudici, Silvia Figini, Claudia Galliano and Daniela Polla
UNIVERSITY OF PAVIA / SKY ITALY ..... 114

Neural Network to identify and prevent bad debt in Telephone Companies
Carlos André R. Pinheiro, Alexandre G. Evsukoff, Nelson F. F. Ebecken
BRASIL TELECOM / COPPE / UFRJ ..... 125

Data Mining-Based Segmentation for Targeting: A Telecommunications Example
Kasindra Maharaj and Robert Ceurvorst
SYNOVATE INC. ..... 140

Organizers

Chairs
Brendan Kitts, iProspect
Gabor Melli, Simon Fraser University

Prize Committee
Karl Rexer, PhD., Rexer Analytics
John Elder, PhD., Elder Research
Brendan Kitts, iProspect

Program Committee
Gregory Piatetsky-Shapiro, PhD., KDNuggets
Richard Bolton, PhD., KnowledgeBase Marketing, Inc.
Diane Lye, PhD., Amazon
Simeon J. Simoff, PhD., University of Technology Sydney
David Freed, PhD., Exa Corp.
Kevin Hetherington, MITRE Corp.
Parameshvyas Laxminarayan, iProspect
Tom Osborn, PhD., Verism Inc.
Ed Freeman, Washington Mutual
Brendan Kitts, iProspect
Gabor Melli, Simon Fraser University
Karl Rexer, PhD., Rexer Analytics
John Elder, PhD., Elder Research
Martin Vrieze, Harborfreight
Martin Ester, PhD., Simon Fraser University
Kristen Stevensen, iProspect

Sponsors
Elder Research Inc. (ERI)
The Institute of Electrical and Electronics Engineers (IEEE)

Participants

Authors
Emilio Benfenati, Istituto Di Ricerche Farmacologiche
Alan Benson, Hewlett-Packard Company
Robert Ceurvorst, Synovate
Laksminarayan Choudur, Hewlett-Packard Company
Dave DeBarr, Mitre Corporation
Craig DeVault, SAS Corporation
Nelson F. F. Ebecken, Coppe / UFRJ
John Elder, Elder Research Inc.
Timm Euler, University of Dortmund
Alexandre G. Evsukoff, Coppe / UFRJ
Ed Freeman, Washington Mutual
Silvia Figini, University of Pavia
Claudia Galliano, Sky Italy
Paolo Giudici, University of Pavia
Kevin Hetherington, Mitre Corporation
Brendan Kitts, Isobar Communications
Parameshvyas Laxminarayan, iProspect
Benjamin LeBlanc, Isobar Communications
Frank Lemke, KnowledgeMiner Inc.
Kasindra Maharaj, Synovate
Gabor Melli, PredictionWorks / Simon Fraser University
David Montgomery, Poindexter Systems
Johann-Adolf Mueller, Istituto Di Ricerche Farmacologiche
Benjamin J. Perry, iProspect
Carlos André R. Pinheiro, Brasil Telecom
Daniela Polla, Sky Italy
R. Bharat Rao, Siemens Medical
Karl Rexer, Rexer Analytics
Annette Saunders, SAS Corporation
Ariel Sepulveda, Hewlett-Packard Company
Pramod Singh, Hewlett-Packard Company
Charles Thomas, Hewlett-Packard Company
Zach Eyler-Walker, Mitre Corporation

Gold Sponsors

Elder Research Inc.
Elder Research is a leader in the practice of Data Mining -- discovering useful patterns in data and successfully harnessing the information gained. The principals are active researchers in Data Mining, contributing to the literature of this emerging field in books, conferences, and through highly-regarded short courses and training seminars.

IEEE - The Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers is a non-profit, technical professional association of more than 360,000 individual members in approximately 175 countries. The IEEE is a leading authority in technical areas ranging from computer engineering, biomedical technology and telecommunications, to electric power, aerospace and consumer electronics, among others. Through its technical publishing, conferences and consensus-based standards activities, the IEEE produces 30 percent of the world's published literature in electrical engineering, computers and control technology.

Overview

Motivation

From its inception the field of Data Mining has been guided by the need to solve practical problems. This is reflected in the establishment of the Industrial Track at the annual Association for Computing Machinery KDD conference, and in practical tutorials at IEEE's Conference on Data Mining. Yet because of client confidentiality restrictions, few articles describe working, real-world success stories. Anecdotally, success stories are one of the most discussed topics at data mining conferences. It is only human to favor the telling of stories. Stories can capture the imagination and inspire researchers to do great things. The benefits of good case studies include:

1. Education: Success stories help to build understanding.
2. Inspiration: Success stories inspire future data mining research.
3. Public Relations: Applications that are socially beneficial, and even those that are just interesting, help to raise awareness of the positive role that data mining can play in science and society.
4. Problem Solving: Success stories demonstrate how whole problems can be solved. Often 90% of the effort is spent on problems unrelated to the prediction algorithm.
5. Connections to Other Scientific Fields: Completed data mining systems often exploit methods and principles from a wide range of scientific areas. Fostering connections to these fields will benefit data mining academically, and will help practitioners learn how to harness these fields to develop successful applications.

The Workshop

It is our pleasure to announce the "Data Mining Case Studies" workshop. This workshop will highlight data mining implementations that have been responsible for a significant and measurable improvement in business operations, an equally important scientific discovery, or some other benefit to humanity. Data Mining Case Studies organizing committee members reserve the right to contact the deployment site and validate the various facts of the implementation.

Data Mining Case Studies papers have greater latitude in (a) range of topics: authors may touch upon areas such as optimization, operations research, inventory control, and so on; (b) page length: longer submissions are allowed; (c) scope: more complete context, problem and solution descriptions are encouraged; (d) prior publication: if the paper was published in part elsewhere, it may still be considered if the new article is substantially more detailed; and (e) novelty: successful data mining practitioners often use well-established techniques to achieve successful implementations, and allowance for this will be given.

The Data Mining Practice Prize

Introduction

The Data Mining Practice Prize will be awarded to work that has had a significant and quantitative impact in the application in which it was applied, or has significantly benefited humanity. All papers submitted to Data Mining Case Studies will be eligible for the Data Mining Practice Prize, with the exception of papers authored by members of the Prize Committee. Eligible authors consent to allowing the Practice Prize Committee to contact third parties and their deployment client in order to independently validate their claims.

Award

Winners and runners-up can expect an impressive array of honors including:

a. A plaque awarded at the IEEE / ICDM conference awards dinner on November 29th, 2005.
b. Prize money of $500 for first place, $300 for second place, and $200 for third place, donated by Elder Research.
c. Article summaries about each of the deployments, to be published in the journal SIGKDD Explorations, which will also announce the results of the competition and the prize winners.
d. An awards dinner with organizers and prize winners.

We wish to thank Elder Research for their generous donation of prize money, incidental costs, time, and support, and the IEEE, for making our competition and workshop possible.

Improved Cardiac Care via Automated Mining of Medical Patient Records

R. Bharat Rao
Computer-Aided Diagnosis & Therapy Group
Siemens Medical Solutions, USA, Inc.
bharat.rao ατ siemens.com

Abstract

Cardiovascular Disease (CVD) is the single largest killer in the world. Although several CVD treatment guidelines have been developed to improve quality of care and reduce healthcare costs, for a number of reasons, adherence to these guidelines remains poor. Further, due to the extremely poor quality of data in medical patient records, most of today's healthcare IT systems cannot provide significant support to improve the quality of CVD care (particularly in chronic CVD situations, which contribute to the majority of costs).

We present REMIND, a probabilistic framework for Reliable Extraction and Meaningful Inference from Nonstructured Data. REMIND integrates the structured and unstructured clinical data in patient records to automatically create high-quality structured clinical data. There are two principal factors that enable REMIND to overcome the barriers associated with inference from medical records. First, patient data is highly redundant – exploiting this redundancy allows us to deal with the inherent errors in the data. Second, REMIND performs inference based on external medical domain knowledge to combine data from multiple sources and to enforce consistency between different medical conclusions drawn from the data – via a probabilistic reasoning framework that overcomes the incomplete, inconsistent, and incorrect nature of data in medical patient records.

This high-quality structuring allows existing patient records to be mined to support guideline compliance and to improve patient care. However, once REMIND is configured for an institution's data repository, many other important clinical applications are also enabled, including: quality assurance; therapy selection for individual patients; automated patient identification for clinical trials; data extraction for research studies; and relating financial and clinical factors. REMIND provides value across the continuum of healthcare, ranging from small physician practice databases to the most complex hospital IT systems, from acute cardiac care to chronic CVD management, and to experimental research studies. REMIND is currently deployed across multiple disease areas over a total of over 5,000,000 patients across the US.

1. Introduction

Cardiovascular Disease (CVD) is a global epidemic that is the leading cause of death worldwide (17 million deaths) [78]. The World Health Organization estimates that CVD is responsible for 10% of "Disability Adjusted Life Years" (DALYs) lost in low- and middle-income countries and 18% in high-income countries. (The DALYs lost can be thought of as "healthy years of life lost" and indicate the total burden of a disease, as opposed to counting resulting deaths.)

Section 2 motivates our research by describing how current technologies are unable to combat the CVD epidemic. We begin by describing the cardiology burden faced today, with an emphasis on the United States, and discuss some of the factors contributing to the further deterioration of the CVD epidemic.
A number of CVD treatment guidelines have been developed by health organizations to assist the physician on how to best treat patients with CVD. Yet adherence to these guidelines remains poor, despite studies overwhelmingly showing that adherence to these guidelines reduces morbidity and mortality, improves quality of life, and dramatically reduces healthcare costs.

One of the most promising ways to improve the quality of healthcare is to implement these guidelines within healthcare IT systems. Unfortunately, as we discuss in Section 2, due to the poor quality of healthcare data in medical patient records (the "Data Gap"), most healthcare IT systems are unable to provide significant support for CVD care: this is particularly true in chronic CVD situations, which contribute to the majority of costs. Furthermore, this "Data Gap" is not likely to improve with the introduction of the Electronic Health Record (EHR), and is further hampered by the lack of standards for clinical data and the fragmented nature of the healthcare IT industry. Medical patient data is typically scattered across multiple sources, and most of the information about the clinical context is stored as unstructured free text – dictated by physicians at different time points over the continuum of care delivered to the patient. It is important to note that the data is only "poor" from the point of view of automated analysis by computers; it is of high enough quality for physicians to document and summarize the delivery of healthcare over multiple patient visits with different physicians. Many of the patients we have analyzed already have electronic data documenting their medical histories for more than 5 years (some going back even 20 years). Over time, exponentially increasing electronic data will be available for analysis for more and more patients. Analyzing this data will allow us to improve the healthcare of individual patients and also to mine new population-based knowledge that can be used to develop improved healthcare methodologies.

In Section 3, we introduce our solution for bridging the "Data Gap," the REMIND algorithm for Reliable Extraction and Meaningful Inference from Nonstructured Data. REMIND is a probabilistic framework for automatically extracting and inferring high-quality clinical data from existing patient records – namely, from patient data collected by healthcare institutions in the day-to-day care of patients, without requiring any additional manual data entry or data cleaning. We discuss the business decisions that influenced the design and development of the REMIND platform – namely, the need to rapidly deploy REMIND in diverse healthcare IT situations, for different clinical applications, and for different diseases, and to easily plug in different analysis algorithms for natural language processing and probabilistic inference. In Section 4 we briefly review the details of the REMIND algorithm [59]. Our goal is not to build a solution for a single application (e.g., implement a particular Heart Failure guideline) but to build a general solution that supports multiple different applications for different diseases. Although REMIND was initially developed for automated guideline compliance, many other clinical applications are also supported by our solution, both at the individual patient level and the population level. These include automated methods for: therapy selection for individual patients [26]; patient identification for clinical trials; data extraction for research [67]; quality assurance; and relating financial and clinical factors [57].

In Section 5 we describe a number of successful deployments of our solution for the various applications listed above. This section illustrates that the REMIND platform can be deployed on the entire range of healthcare IT systems in use today, from relatively simple physician office systems to some of the most complex hospital databases in existence. Further, our solution provides value in both chronic and acute care settings; can support all aspects of physician workflow (screening, diagnosis, therapy and monitoring) and healthcare administration; and provides research support, both in academic institutions and for ongoing pharmaceutical and medical device clinical trials [58][62]. The results provided have been rigorously verified by clinicians and scientists.
In this paper we have focused solely on cardiac applications from clinical data. REMIND is currently deployed across multiple disease areas on a total of over 5,000,000 patients.

We review related research in the fields of medicine and probabilistic inference in Section 6, discuss some future applications of REMIND in Section 7, and conclude in Section 8 with our thoughts on further research.

2. Motivation

Since 1990, more people have died worldwide from CVD than from any other cause. Clearly CVD is an international crisis; however, since all applications described in this paper are from US healthcare institutions, we focus on the United States.

2.1. CVD in the United States

In the United States, an estimated 70 million people have some form of CVD. CVD accounts for roughly one million deaths per year (38% of all deaths), and is a primary or contributing cause in 60% of all deaths [4][1]. CVD claims as many lives per year as the next 5 leading causes of death combined. Unfortunately, a number of trends suggest that the problems of cardiovascular disease will only be exacerbated in the future. First, the aging of the U.S. population will undoubtedly result in an increased incidence of CVD [9]. Second, there is an explosive increase in the number of Americans that are obese or have type 2 diabetes; these conditions result in increased cardiovascular complications.

In addition to being a personal health problem, CVD is also a huge public health problem. In the United States, it is estimated that $394 billion will be spent in 2005 on treatment and management of cardiovascular disease. By comparison, the estimated cost of all cancers is $190 billion. By any measure, the burden of CVD is staggering.

Most patients with CVD will never be cured; rather, their disease must be managed. Often, people with CVD will live for 10 or 20 years after initial diagnosis. A significant portion of the costs associated with CVD comes about when the chronic disease is not managed well, and the patient comes to the emergency room of a hospital with an acute disease, such as a heart attack or stroke. This is further exacerbated by the shortage of cardiologists in the United States. Of the approximately 18,000 practicing cardiologists in the US, over 5,000 are above the age of 55, and 400-500 will retire every year, while fewer than 300 will enter the workforce. This highlights the need to better manage CVD patients after diagnosis – particularly to provide tools that help the overburdened cardiologist improve the quality of care delivered to CVD patients.

2.2. CVD Guidelines

As the problem of CVD has exploded, so has medical knowledge about how to best diagnose and treat it. New diagnostic tests and therapies are constantly being developed. These tests and therapies have shown great promise for both improving the quality of life of the CVD patient and reducing the burden of health care by reducing the incidence of acute episodes. In an attempt to improve the quality of care for patients, national health organizations, such as the American Heart Association (AHA) and the American College of Cardiology (ACC), have created expert panels to review the results of various clinical trials and studies, extract best practices, and then codify them into a series of guidelines. These guidelines attempt to assist the physician on how to best treat patients with CVD. (This process is not unique to cardiovascular disease, but happens in every branch of medicine.)

Recent studies have shown that strict adherence to these guidelines results in improvements at a personal level, including reduced morbidity and mortality and improved quality of life, as well as reduced costs to the overburdened health care system. Based on these studies, CMS (the Centers for Medicare & Medicaid Services) has begun a series of programs to reward physicians and hospitals who comply with guidelines, in an attempt to improve guideline adherence. These "pay-for-performance" schemes are intended to provide a direct financial incentive to healthcare providers – in this case, CMS is working with hospitals to promote the adoption of the heart attack component of the AHA and ACC cardiac treatment guidelines, which recommend that physicians prescribe a medicine called a beta blocker early after an acute heart attack and continue the treatment indefinitely in most patients. Beta blockers are prescription medicines that help protect the heart muscle and make it easier for the heart to beat normally. Despite being well known, compliance with this guideline in the U.S. is estimated to be below 50%.

There is overwhelming evidence showing the huge benefits of following these guidelines, from the perspective of the patient, physician, hospital, and public health. Yet overall guideline adherence remains woefully low. There are 3 principal factors which contribute to this lack of compliance.

First, in recent years, there has been an explosion in guidelines.
In the United States, the National Guideline Clearinghouse (www.guideline.gov) has almost 1000 guidelines for physicians to follow. These guidelines are often modified on a periodic basis, such as every year, in response to new medical knowledge. A quick search on Google or Med-Line for heart failure guidelines returns several hundred references – some of them heart failure guidelines [7][28].

Second, with the growing trend of HMOs, and the economic realities of medicine today, physicians are forced to see more and more patients in a limited amount of time. Often, physicians will only average 10-18 minutes per patient, and carry a patient load of 20-30 patients per day.[1]

Third, there are often multiple physicians and nurses who interact with the patient, and there is often poor communication between these health care workers with regards to the patient. In such a hectic and chaotic environment, it is impossible to (manually) consistently and accurately identify and follow the specific guidelines for that patient among the hundreds of ever-changing requirements in use. Unless the proper clinical guideline is identified and followed at the point of care (that is, when the patient is with his physician), it is not useful.

[1] 10-20 minutes per patient appears reasonable, but it includes all activities associated with the patient visit, including: reviewing previous patient history; talking with the patient about their symptoms and history; examining the patient; arriving at a diagnosis; ordering additional tests and procedures; determining what drugs the patient is currently taking; prescribing treatment and medication; explaining the diagnosis and treatment to the patient; counseling the patient on the risks and rewards of the therapy; and ordering referrals if needed. This time also includes the time needed for the physician to record all the details of the patient visit, including positive and negative findings, impressions, orders, final instructions, and finally signing off on the patient bill.

2.3. Electronic Health Records (EHR)

The electronic health record (EHR) is increasingly being deployed within health care organizations to improve the safety and quality of care [20]. Because a guideline is simply a set of eligibility conditions (followed by a set of recommended treatment actions), it appears fairly straightforward to determine guideline eligibility by evaluating a guideline's inclusion and exclusion criteria against an EHR. Unfortunately, as discussed below and later in Section 5.3, even the best EHRs in the world do not fully capture the information needed to support automated guideline evaluation.

Medical patient data in electronic form is of two types: financial data and clinical data. Financial data consists of all the information required to document the physician's diagnoses and the procedures performed, and is collected primarily for the purpose of being reimbursed by the insurance company or the government. Financial data is collected in a highly structured, well-organized, and normalized fashion, because if it were not in this form, the payers would not reimburse the institution or physician. This data can, therefore, be analyzed, dissected, and summarized in a variety of ways using well-established database and data warehousing methods from computer science.

In addition to structured information about patient demographics, this "financial data" also includes standardized patient diagnoses which are classified according to the internationally accepted standards, ICD-9 (International Classification of Diseases, 9th Revision [76]) and ICD-10 [77]. Many of the criteria used to determine whether a patient is eligible for (and therefore should be treated according to) a particular guideline are based upon diagnostic information. Therefore, it appears as if these structured diagnosis codes would be a rich source for data mining, and particularly for determining whether a patient was eligible for a particular treatment guideline.

Unfortunately, these ICD-9 (and ICD-10) codes are unreliable from the clinical point of view. Various studies have shown that the clinical accuracy of ICD codes is only 60%-80% [7]; in other words, when an ICD code is assigned, the patient will have the corresponding clinical diagnosis only 60-80% of the time. The principal reason for this is that billing data reflects financial rather than clinical priorities.

In the United States, reimbursement is based primarily on the severity of diagnosis: for example, although the patient treatments for AMI (heart attack) and Unstable Angina (a less severe cardiac illness) are virtually indistinguishable, the former diagnosis code generates twice the reimbursement for the institution. There have been several well-publicized cases where institutions have received hefty fines for "over-coding" (i.e., assigning higher diagnosis codes than is justified).
Alternately, billing codes may be missing, or "under-coded", so that institutions are not accused by insurance companies of fraudulent claims. Furthermore, at least in the US, this coding is done by medical abstractors who, although trained to do this coding, typically lack the medical training to assess the clinical data and arrive at the correct diagnosis.

Clearly, financial data alone is insufficient for any kind of patient-level clinical decision support (including determining guideline eligibility), because the errors will multiply when multiple such diagnoses are jointly needed to make a decision (for instance, to determine eligibility for a guideline).

Operational clinical systems have very poor data quality from the standpoint of access and analysis. The structured clinical data in clinical repositories (labs, pharmacy, etc.) is sparse, with gaps in data and in time, inconsistent due to variations in terminology, and can be clinically misleading. Key clinical information is stored in unstructured form in the clinical repository, typically as unstructured free text in patient history and physicals, discharge summaries, progress notes, radiology reports, etc. Further, the nature of the relationships within the data is not well defined, and causal relationships and temporal dependencies cannot be unearthed without medical knowledge; for example, it may not be immediately clear to which diagnosis a procedure "belongs". Efforts to extract key clinical information based on natural language processing alone have met with limited success [44] – and for even slightly complex decisions like guideline eligibility, reliability is very poor. Simply put, the data in clinical repositories is often messy, and thus only a small fraction of the clinical data is available for analysis.
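
The compounding of coding errors described above can be made concrete with a short, purely illustrative calculation in Python. The 60%-80% single-code accuracy is the range cited by the studies above; the independence assumption and the number of codes a guideline requires are our own, chosen only for illustration.

    # Illustrative sketch only (not from the paper): if each ICD code matches the
    # true clinical diagnosis with probability p, and a guideline's eligibility
    # criteria require k codes to all be correct, then under an (optimistic)
    # independence assumption the joint decision is correct only p**k of the time.
    for p in (0.6, 0.8):        # reported range of single-code clinical accuracy
        for k in (1, 2, 3):     # number of codes jointly required (hypothetical)
            print(f"single-code accuracy {p:.0%}, {k} code(s) required -> {p**k:.0%}")

Even at the optimistic end of the reported range, a decision that rests on three codes being simultaneously correct is right only about half the time; at the pessimistic end it is right barely one time in five.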

[Figure (garbled in transcription; recoverable labels only): Clinical Data / Observations (mainly unstructured); Finance; Facilities; Supplies; HR; Monitors; Waveforms; Clinic...; with a "Gap" separating the clinical data from the structured operational systems.]
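
The abstract identifies two factors that make inference from such records feasible: redundancy in the patient record, and probabilistic reasoning guided by domain knowledge. The following Python sketch illustrates that general idea only; it is not the REMIND algorithm, and the prior, the evidence sources, and all likelihoods are hypothetical values chosen for illustration. Several individually unreliable observations of the same fact, drawn from structured and unstructured sources, are combined with a naive Bayes log-odds update.

    import math

    # Minimal illustration of combining redundant, individually unreliable evidence
    # with a naive Bayes log-odds update. This is NOT the REMIND algorithm; the
    # prior, the evidence sources, and all likelihoods below are hypothetical.
    prior = 0.10  # assumed prevalence of the diagnosis in this population

    # (P(observation | diagnosis present), P(observation | diagnosis absent))
    evidence = {
        "ICD-9 billing code for the diagnosis":          (0.70, 0.05),
        "diagnosis mentioned in discharge summary text": (0.60, 0.02),
        "supporting laboratory value above threshold":   (0.80, 0.10),
    }

    log_odds = math.log(prior / (1.0 - prior))
    for source, (p_present, p_absent) in evidence.items():
        log_odds += math.log(p_present / p_absent)  # each redundant source updates the belief

    posterior = 1.0 / (1.0 + math.exp(-log_odds))
    print(f"posterior probability of the diagnosis: {posterior:.3f}")

None of these hypothetical sources would be trusted on its own; combined, they support a much stronger conclusion, which is the intuition behind exploiting the redundancy of the patient record rather than relying on any single field or document.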
