Sharing Sensitive Health Data in a Federated Data Consortium Model
An Eight-Step Guide
INSIGHT REPORT
JULY 2020
Contents
Foreword
Introduction
Step 1 Establish and sustain trust
Step 2 Jointly determine the problem for a federated approach
Step 3 Align on incentives and organizational capacity
Step 4 Identify resourcing – team leadership and funding
Step 5 Identify institutional differences or gaps in policy
Step 6 Create a consortium governance model
Step 7 Structure the data
Step 8 Deploy the API
Endnotes

© 2020 World Economic Forum. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, or by any information storage and retrieval system.
Foreword
Accessing sensitive health data at scale will advance research, innovation and patient outcomes

Genya Dana, Head of Healthcare Transformation, Shaping the Future of Health and Healthcare, World Economic Forum
Arnaud Bernaert, Head, Shaping the Future of Health and Healthcare, World Economic Forum

At the World Economic Forum, we think of data as the oxygen that fuels the fire of the Fourth Industrial Revolution. It is readily available and necessary but, if used improperly, it can generate dangerous and unwelcome results. Concerns over how to protect valuable data, especially sensitive, personal data, are at the core of many countries’ and institutions’ data policies. We see a complex and dynamic data policy landscape evolving around health data in particular; it is becoming more and more complicated to share data to the extent desired to advance research, innovation and patient outcomes. The need to rapidly provide access to health data, while protecting patient privacy and data security, has never been more urgent than in the fight against the COVID-19 pandemic.

This paper is part of the Forum’s work to create actionable resources for policy-makers, healthcare professionals and leaders of the Fourth Industrial Revolution to navigate complex and sensitive health data policies globally. The Forum is testing a federated approach, in which data sets are accessed remotely without moving the data from its secure location of origin, as a practical way to access the disparate genomic and health data sets needed to accelerate the diagnosis of rare disease in patients in four countries. Federated data systems are not new per se, but they are starting to be used more frequently as a solution for accessing multiplying, disparate data repositories in a multinational and multi-jurisdictional world. Being able to quickly and securely access disparate data sets accelerates the ability to gather insights and inform care decisions for a precision medicine approach, which uses data to drive more personalized and tailored diagnosis and treatment of disease in patients.

Offering practical advice on how to build a federated data consortium is only possible with the partners in the Breaking Barriers to Health Data project, which is key to the Forum’s precision medicine portfolio of projects. Four genomics institutions in Canada, Australia, the United Kingdom and the United States have worked tirelessly to have the difficult conversations and build the governance model that informs this eight-step guide. We applaud their leadership. This guide also forms a critical input to the Forum’s Data for Common Purpose Initiative, which is focused on new models of data governance in the Fourth Industrial Revolution.

A recently released Roadmap for Cross-Border Data Flows: Future-Proofing Readiness and Cooperation in the New Data Economy expressly recommends that governments recognize Federated Data Learning as a valid means of cross-border data (insight) sharing and that it not be blocked by legislation. Proactive efforts will be needed to motivate government officials, business leaders and civil society members to establish real-world pilots and to enable continuous and active experimentation with federated data systems, particularly in situations where they are most valuable.
Introduction
Accessing global health data through federated consortiums will reveal disease causes and cures

In the current era of the Fourth Industrial Revolution, data is our most valuable resource.[1] The five leading companies of our time – Alphabet, Amazon, Alibaba, Facebook and Microsoft – rely on data to fuel their successful enterprises. Data is also a resource in the healthcare ecosystem that can improve the standards, quality and outcomes of healthcare and healthcare delivery for patients worldwide.

But just how are health ecosystems using data? As volumes of healthcare data increase, genomic data and other types of sensitive health data provide a treasure trove of information on how to diagnose, treat and generally manage the most complex and destructive diseases, but only if we can look at data across the global population.

Genomic data is a particularly valuable type of health data because it represents the hereditary material in humans (and almost all organisms) called deoxyribonucleic acid (DNA), which stores the “master code” dictating how our bodies operate. More than 99% of genetic code is the same in all people. Without ways to comb through large amounts of data, it is therefore difficult to pick out the “glitches”, or specific small differences in the genetic code, that are useful for research, diagnosis and treatment of disease.

Aggregating large genomic data sets in ways that researchers and clinicians can use to improve patient outcomes is complicated, in part due to the flood of genomic data from national and institutional genetic sequencing efforts. The human genome (your genome is the sum of the DNA in your body or the sum of your genetic data) represents data roughly equal in size to 100,000 digital photos. Sequencing most of one person’s genome now takes approximately a day and costs several hundred dollars, compared with 13 years and 1 billion dollars in 2003. Countries and institutions are sequencing hundreds of thousands of people. In 2018, the UK announced the completion of 100,000 sequences from National Health Service patients. Accessing all of this data, however, remains a challenge due to a complex landscape of data protection laws and health data privacy regulations.

BOX 1
Why genomic data?

Genomic data represents our shared DNA and can be broken down into a machine-readable format in a process called genetic sequencing. During genetic sequencing, DNA is broken down into its four chemical bases (adenine, guanine, cytosine and thymine) for analysis. Human DNA consists of about 3 billion bases.[2] Every human being has DNA represented by billions of bases, but it is only possible to understand more about our shared DNA and, more importantly, how our DNA impacts or even predicts our health, by comparing large volumes of DNA. This is because more than 99% of bases are the same in all people, making any differentiation more difficult to discern in smaller data sets. In contrast to a base, a gene is the unit by which an individual’s one-of-a-kind combinations of DNA bases are inherited. Genes can vary in size from a few hundred DNA bases to more than 2 million bases per gene.[3]

Aggregating such data to improve patient outcomes is complicated by both the sheer scale of genomic data and the complex regulatory landscape of health data policy. The human genome (your genome is the sum of the DNA in your body or the sum of your genetic data) represents roughly 100 gigabytes (GB) of data, which is equivalent to the size of about 100,000 digital photos. In 2011, our sequencing capacity hit 13 quadrillion bases, which was the equivalent in data storage of two miles of stacked DVDs (the storage medium of that era, before data storage moved to the cloud). By 2018, however, the human genome (roughly 3 billion bases) fit on a single DVD, rather than on the hundreds of discs spanning two miles in 2011.[4] Storing the human genome is progressively getting easier, smaller in size and cheaper. Compared with Silicon Valley’s Moore’s Law, which states that computers double in speed but halve in size every 18 months, genomic data is outpacing Moore’s Law by a factor of four in storage size.[5]
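The storage figures quoted in Box 1 can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions that are not in the report: that each of the four bases is packed into 2 bits (the minimum needed to distinguish four symbols) and that a single-layer DVD holds 4.7 GB. Under those assumptions, one bare genome sequence takes far less space than the roughly 100 GB quoted for a sequenced genome, a gap that is plausibly explained by raw sequencing output carrying repeated reads and quality information, though the report does not say so.

```python
# Figures taken from the text.
BASES_PER_GENOME = 3_000_000_000      # about 3 billion bases
GENOME_DATA_BYTES = 100_000_000_000   # roughly 100 GB per sequenced genome
PHOTOS_EQUIVALENT = 100_000           # about 100,000 digital photos

# Assumptions for illustration (not from the report).
BITS_PER_BASE = 2                     # four bases (A, C, G, T) fit in 2 bits
DVD_BYTES = 4_700_000_000             # nominal single-layer DVD capacity


def packed_genome_bytes(bases: int = BASES_PER_GENOME,
                        bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store one sequence at a fixed number of bits per base."""
    return bases * bits_per_base // 8


def fits_on_dvd(size_bytes: int, dvd_bytes: int = DVD_BYTES) -> bool:
    """True if the given size fits on one disc of the assumed capacity."""
    return size_bytes <= dvd_bytes


# Implied average photo size behind the "100 GB is about 100,000 photos" comparison.
bytes_per_photo = GENOME_DATA_BYTES // PHOTOS_EQUIVALENT

if __name__ == "__main__":
    size = packed_genome_bytes()
    print(f"Packed genome: {size / 1e6:.0f} MB")
    print(f"Fits on one DVD: {fits_on_dvd(size)}")
    print(f"Implied photo size: {bytes_per_photo / 1e6:.0f} MB")
```

The point of the comparison is that the sequence itself is small; the volume that makes aggregation hard comes from the surrounding sequencing data and from the number of people being sequenced.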
The World Economic Forum’s Global Precision Medicine Council, in its May 2020 Precision Medicine Vision Statement, cited the gap in data sharing and interoperability as a key barrier preventing the wider adoption of a more personalized approach to healthcare.[6] Precision medicine depends on the availability of health data in the aggregate. For genomic data in particular, the costs of storage and analysis usually exceed the lab costs of sequencing. The cost to store, process and analyse the data can be justified in the global patient interest if the data can be used beyond its initial diagnostic capacity for a single patient.[7] Accessing and using sensitive health data and genomic information to its full potential requires care and creativity, with strong governance protocols to guide this process.

To tackle the challenge of governing cross-border access to health data, the World Economic Forum led the Breaking Barriers to Health Data project from July 2018 to July 2020. The project tested how a distributed federated data system could be set up and run sustainably across countries, with clear governance optimizing for operational efficiency, patient privacy and data security. Federated data systems are a promising way to enable access to health data, including genomic data, that must remain inside a country or institution because of its sensitivity. Although examples of federating health and genomic data sets are growing, it was less clear how to create a federated data system in practice with a group of institutions.[8]

Allowing access to data sets is not particularly difficult technically. The larger challenge lies in forming the relationships between institutions that enable trust, transparency and sustained, predictable operations in a consortium model.
In close partnership with Australia (the Australian Genomics Health Alliance), Canada (Genomics4RD), the United Kingdom (Genomics England) and the United States (Intermountain Healthcare), the Forum created and led a multistakeholder community that supported these institutions through the journey of determining how to maximize the benefits and minimize the risks of federating genomic data to diagnose rare diseases.[9]

In order to federate data, a consortium of institutions must be formed. As outlined in Figure 1, this eight-step guide distils the learnings from the Breaking Barriers to Health Data project’s work to set up a federated data consortium for the purpose of diagnosing rare disease using genomic data from a global, distributed data set. Other institutions are also encouraged to adapt this federated data consortium model for additional use cases. Before creating a data consortium that leverages sensitive health data, it is crucial to plan carefully and to consider meticulously how to craft, and implement, clear governance structures. Global federated data consortiums provide a tremendous opportunity to improve patient outcomes and healthcare delivery pathways, but they also require robust security and continually improving policy to safeguard against bad actors, data breaches and other types of preventable risk.

FIGURE 1
Eight steps to follow to build a federated data consortium
Step 1: Establish Trust
Step 2: Define Problem
Step 3: Align Incentives
Step 4: Identify Resources
Step 5: Identify Institutional Gaps
Step 6: Create Governance Model
Step 7: Structure the Data
Step 8: Deploy the Technology
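The federated approach described in this introduction, in which data sets are queried where they sit and only results travel, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the consortium's implementation: the `Site` class, the `federated_count` function, the site names and the gene and variant labels are all hypothetical placeholders, and a real deployment would add the authentication, governance checks and disclosure controls that the later steps address.

```python
from typing import Dict, List, Tuple


class Site:
    """One institution in the consortium; its records never leave this object."""

    def __init__(self, name: str, records: List[dict]) -> None:
        self.name = name
        self._records = records  # held locally, never returned to callers

    def count_matching(self, gene: str, variant: str) -> int:
        """Answer a query locally and return only an aggregate count."""
        return sum(
            1
            for record in self._records
            if record["gene"] == gene and record["variant"] == variant
        )


def federated_count(sites: List[Site], gene: str, variant: str) -> Tuple[Dict[str, int], int]:
    """Send the same query to every site and combine the per-site counts."""
    per_site = {site.name: site.count_matching(gene, variant) for site in sites}
    return per_site, sum(per_site.values())


# Hypothetical placeholder data; no real genes, variants or patients.
sites = [
    Site("site_a", [{"gene": "GENE_X", "variant": "v1"},
                    {"gene": "GENE_X", "variant": "v2"}]),
    Site("site_b", [{"gene": "GENE_X", "variant": "v1"}]),
    Site("site_c", []),
]

per_site, total = federated_count(sites, "GENE_X", "v1")

if __name__ == "__main__":
    print(per_site, total)
```

The design choice worth noting is that the coordinating function only ever sees counts. Whether counts alone are safe to share, for example when a site holds very few matching patients, is a governance question rather than a coding one.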
Step 1
Establish and sustain trust
Generating trust is more important than ever and requires the right partners, thorough relationship building and support from leadership teams

The first step, and the single component that appears to make or break a federated data consortium, is establishing trust with the prospective partners identified for the consortium. Establishing trust between partners is also the most time-consuming part of setting up a successful data consortium.

The creators of a new data framework called Trust::Data Consortium, which include the Massachusetts Institute of Technology, United Nations, White House Cybersecurity Initiative and the Forum, argue that today’s social structures do not readily accommodate the new reality of integrated systems that can leverage autonomous, dynamic, digital feedback mechanisms. Our social structures struggle to adapt to digital methods, which can illuminate trust between data-sharing systems by transparently tracking when and how data is accessed or exchanged.[10] In other words, despite the many technical solutions designed to encourage trustworthy behaviour between data-sharing partners once a consortium is up and running, establishing trust at the beginning of the relationship still depends on our everyday social structures and perceived social relationships.

1.1 Identify consortium partners

Before beginning to form social relationships with partners, however, it is important to select the correct partners for a data consortium. Identifying the best partners requires understanding each prospective partner’s origin, strategic goals and research objectives, and whether these align with those of your own institution. Thorough vetting at the beginning of the relationship cannot be done with a quick website check or even a phone call; it requires a series of in-person meetings.