Pekka Neittaanmaki Dean Of The Faculty Of Information Technology .


Pekka Neittaanmaki
Dean of the Faculty of Information Technology
Professor, Dept. of Mathematical Information Technology
University of Jyvaskyla

Anthony Ogbechie
Service Innovation Management
University of Jyvaskyla

Contents

1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible
2. Medical technologist drives semantic data lake development
3. Montefiore Semantic Data Lake Tackles Predictive Analytics
4. Semantic Big Data Lakes Can Support Better Population Health
5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine
6. Semantic Computing, Predictive Analytics Need Reliable Metadata
7. Partners Data Lake Offers Healthcare Analytics as a Service
8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support
9. Hadoop, Triple Stores, and the Semantic Data Lake
10. Medical Insight Set to Flow from Semantic Data Lakes
11. Semantic graph database underpins healthcare data lake
12. Data Lakes Get Smart with Semantic Graph Models
13. Semantic Data Lakes and the Advance of Medicine
14. Semantic Data Lakes Dives In For Healthcare
15. The Data Lake Concept Is Maturing
16. The Potential of Data Lake Technology in the Healthcare Industry
17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL
18. Dealing with Big Data: The ascendency of data lakes
19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source
20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology
21. Big Data and Healthcare Payers
22. The Bright Future of Semantic Graphs and Big Connected Data
22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases
23. How the Search for Smart Data Drives Healthcare IT Investment
24. Data Lake Management: Do You Know the Type of “Fish” You Caught?

1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible

One of the prominent problems plaguing the current healthcare system is the narrow scope of patient data used to facilitate most aspects of care, from initial diagnoses to treatment.

According to Dr. Parsa Mirhaji, director of clinical research informatics at Montefiore Health System and Albert Einstein College of Medicine, the vast majority of research findings are based on averages of middle-aged white males: “We don’t really know much about women, other ethnicities, children, you name it – there’s no evidence,” he says.

The White House launched The Precision Medicine Initiative in 2015 as a means of redressing the situation and expanding the breadth of patient data to personalize treatment for individuals and historically underrepresented groups. Achieving that objective requires not only amassing patient-specific data for wider demographics, but also storing, accessing and analyzing them with a number of avant-garde data management technologies

t-possible-114312.aspx

2. Medical technologist drives semantic data lake development

A pivotal magazine article helped point medical doctor Parsa Mirhaji along a path to a semantic data lake for healthcare analytics applications, using Hadoop, RDF, graph databases and

e-development

3. Montefiore Semantic Data Lake Tackles Predictive Analytics

This new approach to analytics eschews the rigid, limited capabilities of the traditional relational database and instead focuses on creating a fluid pool of standardized data elements that can be mixed and matched on the fly to answer a large number of unique queries.

Montefiore Medical Center, in partnership with Franz Inc., is among the first healthcare organizations to invest in a robust semantic data lake as the foundation for advanced clinical decision support and predictive analytics

lytics

4. Semantic Big Data Lakes Can Support Better Population Health

As healthcare providers navigate the treacherous transitional waters of Stage 2 and try to predict how future regulations will shape their actions, the need to lay the groundwork for advanced population health management and accountable care is only becoming clearer.

No matter what the outcome of debates about the future course of the EHR Incentive Programs, one thing remains abundantly clear for organizations of all shapes and sizes: advancements in healthcare big data analytics will not be driven solely by rules and mandates, but by the pressing financial need to collect, corral, understand, and leverage information in order to refine and expand population health management techniques.

ta-lakes-can-support-better-populationhealth

5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine

Can data lake technology simplify the development of the Internet of Things, create a welcoming environment for precision medicine, and change the way providers approach big data

sionmedicine

6. Semantic Computing, Predictive Analytics Need Reliable Metadata

Reliable metadata is the key to leveraging semantic computing and predictive analytics for healthcare applications, such as population health management and crisis care.

ing-predictive-analytics-need-reliablemetadata

7. Partners Data Lake Offers Healthcare Analytics as a Service

Big data analytics is a key component of the complex transition, and Partners already has a history of success with innovative clinical decision support tools like QPID.

In order to add to their analytics toolkit while continuing to attract world-class clinical research and technical development talent to the Boston area, the health system is pairing its new EHR with the Integrated Data Environment for Analytics (IDEA) platform.

This tool, developed in conjunction with EMC, is geared towards researchers and investigators working on everything from precision medicine projects to developing apps for clinical decision support and patient

8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support

Can the complexity be removed and tacit knowledge delivered from the plethora of medical information available in the world?

“Let Doctors be Doctors”

Semantic Data Lake becomes the Book of Knowledge ascertained by correlation and causation, resulting in Weighted Evidence

delivering-tacit-knowledge-evidenceboray

9. Hadoop, Triple Stores, and the Semantic Data Lake

Hadoop-based data lakes are springing up all over the place as organizations seek low-cost repositories for storing huge mounds of semi-structured data. But when it comes to analyzing that data, some organizations are finding the going tougher than expected. One solution to the dilemma may be found in Hadoop-resident graph databases and the notion of the semantic data lake.

Despite their growing popularity, data lakes have taken a bit of heat lately as analyst firms like Gartner call into question their long-term viability. Without a way to organize the schemaless data that people are shunting into Hadoop en masse, the data lakes risk becoming convoluted quagmires, where data goes in and nothing useful comes

ple-stores-and-the-semantic-data-lake/

10. Medical Insight Set to Flow from Semantic Data Lakes

The potential for data analytics to disrupt healthcare delivery is large, and getting larger by the day. But in many cases, the need to hammer data into a structured format creates a barrier to productivity. Now a hospital chain in New York City is hoping to change that by adopting a Hadoop-based semantic data lake.

Located in the Bronx, Montefiore Health System is the first hospital to implement a semantic data lake as part of the New York City Clinical Data Research Network (NYC-CDRN), an association of seven hospitals in the NYC area that are sharing data. As the pioneer, Montefiore is working with several technology providers, including Intel and Franz, to test a big data system capable of delivering precision medicine.

Most Datanami readers are familiar with the term “data lake,” but a “semantic data lake” provides an interesting twist on the familiar concept. According to Franz CEO Jans Aasman, a semantic data lake employs a combination of technologies, including Hadoop, graph analytics, a semantic “triple store,” the SPARQL query language, and Spark-based machine learning, to allow doctors to connect the dots between patient conditions and a world of knowledge contained in structured internal systems, as well as unstructured data sources outside of

kes/

11. Semantic graph database underpins healthcare data lake

12. Data Lakes Get Smart with Semantic Graph Models

Been swimming in a data lake recently? Perhaps not, as many companies still are just dipping their toes into these waters, as they become more familiar with the general idea of a data lake. As research firm Gartner describes it, a data lake is:

“a collection of storage instances of various data assets additional to the originating data sources [and whose purpose is to] present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).”

Even as enterprises consider the returns they may expect from diving deeper into data lakes, they’re now being exposed to another twist on the concept. Enter the smart data lake, also known as the semantic data lake. At DATAVERSITY’s Smart Data Conference in San Jose in August, the issue came up in sessions including presentations by Cambridge Semantics CTO Sean Martin and Franz CEO Jans Aasman. Franz, in fact, notes that it has copyrighted the term Semantic Data Lake, and points out that Gartner also has explained that data lakes need semantics in order to be usable by a broad set of

art-with-semantic-graph-models/

13. Semantic Data Lakes and the Advance of Medicine

This article charts the long path of Dr. Parsa Mirhaji and his work to bring Semantic Data Lakes and healthcare analytics together: “Mirhaji saw something in the story [from 2001] that spoke to a practical problem he had encountered in searching for and saving information on medical advances, and it set the tone for research and analytics work he continues to do now.”

“I was being challenged to do analytics on heterogeneous data sets from across the Web,” he said. “I started to learn about artificial intelligence and software agents and how they could ‘grab concepts’ from the Web or databases.”

It continues with a much deeper discussion of how Dr. Mirhaji began to integrate his ideas into real-world practice: “Mirhaji said the data lake system is still in training mode, being tested out on some specific analytics tasks. It takes in all sorts of genetic, population, health and wellness data; that includes data from the U.S. Census, clinical trials and patient records – for example, heart rate, temperature and blood pressure measurements collected by patient

mantic-data-lakes-and-the-advance-of-medicine/

14. Semantic Data Lakes Dives In For Healthcare

The healthcare industry has had a hard time saving time and money while improving patient outcomes through the use of data analytics. Many health systems find it difficult to capture and use the data from their patients to make a real impact on their businesses, especially considering the enormous amount of redundancies and unanswered questions when it comes to dealing with healthcare.

“Making sense out of big data is a challenge, particularly in the healthcare industry where information comes from a variety of sources and in different forms including structured, unstructured, images, temporal, geo-location and signal data,” says Jans Aasman, PhD, CEO of Franz, Inc., specializing in semantic web technologies.

The solution may have finally appeared in the form of SDLs (Semantic Data Lakes), says Aasman. A collaboration between The Montefiore Medical Center in Bronx, New York, Franz, Inc., Intel, Cloudera, and Cisco, SDL is a system that allows you to get all the data together from different silos for analytics. It will then be used to transform statistical databases, such as spreadsheets, into interactive graph databases that can be used to make better informed and predictive healthcare decisions.

15. The Data Lake Concept Is Maturing

lake-concept-is-maturing/fulltext

16. The Potential of Data Lake Technology in the Healthcare Industry

Data lake technology has been named the future of big data. Unlike data warehousing techniques, data lakes offer a different type of data management better suited to handling and collaborating various forms of data in a way which doesn’t confuse systems. But how exactly do data lakes work? What are their benefits in the healthcare system?

Data lakes embrace a semantic approach to data storage, retaining enormous amounts of raw information (whether structured or unstructured) in its original state within a single centralised location. This technology, often referred to as graph databases, enables analysts to select which data to use when necessary, giving them the opportunity to reuse it when required. Analysts also have the option of combining seemingly incompatible sources of information in ways previously thought impossible with data

ndustry

17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL

With the global push towards putting EHRs and the unified medical record in the cloud, hospitals and clinics are finally getting on board this big data ride. EHR (Electronic Health Record) and Patient Portal use has been rising steadily over the past three years, with more and more patients having access to their medical data through Patient Portals and mobile devices. As providers finally begin to roll out their EHR and Patient Portal programs, both patients and doctors are enjoying the newfound access to the data they need to make medical decisions, and it’s literally at their fingertips. This has given the patients a new sense of ownership of their medical data and perhaps a feeling of control over their destiny – especially in an environment where ACA has left them feeling somewhat limited. In today’s connected society, medical data of all kinds are being gathered and gleaned from a myriad of sources: from EHRs to mobile device apps to telemetry at the office level and to test results of all kinds. Medical data lakes, collections of unstructured medical data, are growing and thriving; but what is the risk of this access? What are the purposes of data lakes and how can we protect

ct-medical-data-lakes/

18. Dealing with Big Data: The ascendency of data lakes

The data lake concept occupies a central place of prominence in contemporary big data initiatives. The past two years have unveiled numerous headlines, vendor solutions (including repackaging of former solutions) and enterprise use cases for the utility of this centralized approach for accumulating, analyzing and actuating big data.

The fervor for this method of managing big data is based on a simple premise that promises value for organizations regardless of size or vertical industry. Data lakes provide a singular repository for storing all data – unstructured, semi-structured and structured – in their native formats, granting access and insight to all without lengthy IT preparation.

Moreover, the data lake movement is largely spurred by adoption rates for Hadoop. As Hadoop’s presence increases, its function as an integration hub for all data delivers more credence and traction to the notion of data lakes. The data lake concept may be relatively new, but the association of Hadoop and big data is nearly as ubiquitous as big data itself.

The combination of these two factors, Hadoop’s deployment as a data lake and the storage and access benefits this method produces, is largely responsible for the widespread attention data lakes have garnered. A recent post from Gartner reveals that data lake interest is “becoming quite widespread.” Forbes indicates that “one phrase in particular has become popular for the massing of data into Hadoop, the ‘Data

ith-big-data-the-ascendency-of-data-lakes/

19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source

Big Data platforms, like Cloudera’s Data Hub, Hortonworks’ Data Platform, MapR’s Converged Data Platform, and others including IBM, Oracle, Pivotal, promise to easily bring diverse sets of application data together into one data cluster running on Hadoop. This collection of data sets is called a data lake.

It is a wonderful thing to be able to bring data together so easily with Hadoop’s schema-on-read ability rather than the schema-on-write required by traditional database systems. But we need to remember that along with the data comes all the data’s associated data problems. Bringing data into the data lake does not, unfortunately, wash away all the problems of non-standard, non-integrated, redundant, and inconsistent data that are buried in application data.

For data lakes, with great ease of data access comes a great need for data management. But this is not what I want to talk about today (perhaps I’ll do so in a later blog post). I want to talk about how we need to make it easy for users to access data in a data lake when there is nonintegrated and redundant

it-should-be-open-source/

20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology

Personalizing healthcare services for individuals creates several demands on data-driven functions in the medical field. Healthcare organizations are tasked with integrating structured, unstructured and semi-structured data, storing and cataloging them in relevant ways across use cases and locations, and leveraging emerging AI techniques for predictive capabilities which could potentially save lives.

Most of all, this process must occur in time to make a difference for patients.

According to Montefiore Health System Senior Vice President and Chief Medical Officer Andrew Racine, who spoke at a recent event for the unveiling of Intel’s Xeon Scalable Processors, all of these measures must be implemented so providers can: “use information in real time to make clinical decisions that are going to allow us to intervene with patients and prevent them from having adverse outcomes.”

Montefiore is currently engaged in such an undertaking with a Semantic Data Lake for Healthcare (SDL). The SDL is powered by Franz’s AllegroGraph, architected by Intel, and fortified by Cloudera’s Hadoop distribution. By merging a unique set of data management techniques with some of the most pertinent technologies across the data landscape, Montefiore is seeking to tailor its medical treatment and diagnoses for individual

telligence-semantic-graph-technology/

21. Big Data and Healthcare Payers

With the implementation of the Affordable Care Act, the advent of Healthcare Information Exchanges (HIE), the introduction of new provider models, such as Accountable Care Organizations (ACO), and the transition to a more member-centric relationship model, Healthcare Payers face seismic changes in their business models. As with many large-scale business transformations, there are challenges to navigate as well as opportunities to realize around improving patient outcomes, reducing cost, and increasing revenue. Capitalizing on these opportunities will depend on an organization’s capability to leverage information. The ability to capture, integrate, and interrogate large information sets will be foundational in realizing objectives, such

-healthcare-payers/

22. The Bright Future of Semantic Graphs and Big Connected Data

The big data revolution is generating a mess of unruly data that’s difficult to parse and understand. This is to be expected – explosions don’t generally occur in a nice, orderly fashion, after all. But if the folks at Cloudera and Franz have their way, the world of connected data will become more accessible and useful when viewed through the lens of semantic graph technologies.

Semantic graph technology is shaping up to play a key role in how organizations access the growing stores of public data. This is particularly true in the healthcare space, where organizations are beginning to store their data using so-called triple stores, often defined by the Resource Description Framework (RDF), which is a model for storing metadata created by the World Wide Web Consortium
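The triple-store idea mentioned in several of the excerpts above can be sketched in a few lines. This is only an illustrative toy, not how AllegroGraph or any real RDF store works: the `TRIPLES` data, the identifiers such as `patient:1`, and the helper names `match` and `drugs_for_patient` are all hypothetical. It shows facts held as (subject, predicate, object) triples and a query answered by matching patterns with wildcards, which is loosely the kind of matching a SPARQL query expresses.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example data; every identifier here is invented for illustration.
TRIPLES: List[Triple] = [
    ("patient:1", "hasCondition", "condition:diabetes"),
    ("patient:2", "hasCondition", "condition:asthma"),
    ("condition:diabetes", "treatedBy", "drug:metformin"),
    ("condition:asthma", "treatedBy", "drug:albuterol"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A position left as None acts as a wildcard and matches anything.
    """
    for ts, tp, to in triples:
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)


def drugs_for_patient(triples: List[Triple], patient: str) -> List[str]:
    """Follow two edges of the graph: patient -> condition -> drug."""
    drugs: List[str] = []
    for _, _, condition in match(triples, s=patient, p="hasCondition"):
        for _, _, drug in match(triples, s=condition, p="treatedBy"):
            drugs.append(drug)
    return drugs


if __name__ == "__main__":
    print(drugs_for_patient(TRIPLES, "patient:1"))
```

Because every fact has the same three-part shape, data from different sources can be appended to the same list without agreeing on a table schema first; the two-step lookup in `drugs_for_patient` is the "connect the dots" step the excerpts describe, done here by hand.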

22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases

In healthcare, big data tools and technologies have the potential to create significant value by improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic test results and biometric information are increasingly generated and stored in electronic health records, presenting us with challenges in data that is by nature high volume, variety and velocity, thereby necessitating novel ways to store, manage and process big data. This presents an urgent need to develop new, scalable and expandable big data infrastructure and analytical methods that can enable healthcare providers to access knowledge for the individual patient, yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data and the role of semantic web and data analysis for generating “smart data” which offer actionable information that supports better decisions for personalized medicine. In our view, the biggest challenge is to create a system that makes big data robust and smart for healthcare providers and patients that can lead to more effective clinical decision-making, improved health outcomes, and ultimately, managing the healthcare costs. We highlight some of the challenges in using big data and propose the need for a semantic data-driven environment to address them. We illustrate our vision with practical use cases, and discuss a path for empowering personalized medicine using big data and semantic web

ument/7004307/?reload true

23. How the Search for Smart Data Drives Healthcare IT Investment

There’s no question that the healthcare industry has become extremely “data rich” over the past few years. Thanks to the rapid pace of electronic health record adoption, the vast majority of healthcare organizations are now sitting on an enormous nest egg of big data, including petabytes of clinical, administrative, demographic, and even genomic data on thousands or millions of their patients.

But having data and knowing how to use it are two very different things. Despite the keen interest in adopting a growing collection of big data analytics, predictive analytics, and risk stratification tools, few organizations have really cracked the secret of how to turn a wealth of fresh, unfiltered data into the spendable coin of actionable information.

Some fall into the trap of buying new products to solve each individual problem as it arises, not realizing that they are creating a patchwork of competing technologies, or developing ad hoc workarounds and an endless array of user interfaces that produce more headaches than

/08/search-smart-data-drives-healthcareinvestment/

24. Data Lake Management: Do You Know the Type of “Fish” You Caught?

“Different types of fish live in a community and when you understand their relationship to each other, you have a better chance of catching what you want.” thompsonadvertisinginc.com

You navigated your way to the lake, read up on the fundamentals of fish management and were introduced to the data lake management principles. As you drove up to the lake, you were thinking about what type of fish you are looking to catch: are you looking for a trophy fish or a fish to eat? Without knowing the type of fish you’re going to catch, you don’t know how far off-shore you have to boat for a catch.

The primary charter for any data lake initiative is the ability to catalog all the data, enterprise-wide, regardless of form (variety) or where it’s stored, whether on Hadoop, NoSQL or an enterprise data warehouse, along with the associated business, technical, and operational metadata. To carry on with our analogy, cataloging fish into off-shore, near-shore or bottom fish can determine the type of fish you catch and how far out you go fishing.

The catalog must enable business analysts, data architects, and data stewards to easily search and discover data assets, data set patterns, data domains and data lineage, and to understand the relationships between data assets – a 360 degree view of the data. A catalog provides advanced discovery capabilities, smart tagging, data set recommendations, metadata versioning, a comprehensive business glossary, and drill down to finer grained

data-lake-management-know-type-fishcaught/#fbid L81Agt5pe2b
