Data Governance Decisions For Platform Ecosystems

Proceedings of the 52nd Hawaii International Conference on System Sciences 2019
URI: https://hdl.handle.net/10125/60072
ISBN: 978-0-9981331-2-6
(CC BY-NC-ND 4.0)

Sung Une Lee, Data61, CSIRO; University of New South Wales, Sungune.Lee@data61.csiro.au
Liming Zhu, Data61, CSIRO; University of New South Wales, Liming.Zhu@data61.csiro.au
Ross Jeffery, Data61, CSIRO; University of New South Wales, Ross.Jeffery@data61.csiro.au

Abstract
Platform ecosystems have become an information systems research subject after many years of industry success. The platform ecosystem concept enables fast, self-sustaining growth of a platform by encouraging multiple networks to contribute and consume data. As a result, data has become especially important and valuable in platforms. It is essential to understand how data should be managed in platform ecosystems, where there are complicated relationships between multiple participating groups. However, industry and academia have rarely addressed this topic. Industry governance frameworks focus on organizational data, and prior research on platform ecosystems is still at an early stage. To respond to this limitation, we propose critical data governance decisions for platform ecosystems and discuss how they should be implemented in practice. This study supports sound decision making about data and facilitates a secure platform ecosystem. We perform a case study to illustrate the practical implications of this study.

1. Introduction
The Facebook-Cambridge Analytica data scandal is currently one of the hottest topics in the IT press. A number of news articles report that the scandal has affected Facebook's share price and reputation. It has raised public awareness of the business risks caused by data abuse or misuse, a concern that academia and industry have highlighted for some time.

A platform ecosystem (PE) can reach critical mass through data contributions from multiple external parties [1]. The collected data is analyzed or shared to add value to the PE, and is used by the platform owner, partners or family companies, and users. Such complicated interactions between multiple parties providing, using or sharing data may give rise to data abuse or misuse. PEs need to impose certain regulations to mitigate the risks that arise when multiple parties use data [2].

Data governance refers to comprehensive control over data assets, including processes, policies and structures. Data governance for PEs has to orchestrate the complicated processes and relationships created by the participation of multiple parties [3]. A lack of data governance, or poor implementation of it, can lead to unclear ownership and access rights for data contributors and to invisible use of data [4]. Existing governance frameworks deal with general concerns for an enterprise, where data ownership is simpler and clearer and the use of data is limited. A number of studies have articulated those concerns [5-8]. However, prior studies have paid less attention to data and data governance in PEs [9]. There is also a lack of understanding of how data governance should be managed, for example, which areas data governance decisions must cover for a viable and sustainable PE.

Our previous study [5] identified data governance factors for PEs. Building on those factors, we here focus on what decisions should be made and how they should be implemented for practical data governance. Practitioners can use the decisions and practices when they improve existing data governance or design new data governance. For researchers, this paper delivers broad information and knowledge of PEs and data governance. Through a case study, we validate the theoretical concepts discussed in this paper. We identify how the theoretically important governance decisions are addressed in the real world, and illustrate the practical implications of this study.

The next section provides background information to support an understanding of PEs and data governance.
Section 3 describes the methodology of this study. We then discuss data governance decisions and management practices in section 4. Section 5 presents the results of a case study, and section 6 concludes the study.

2. Background
There are multiple types of governance, such as IT, information and data governance. IT governance supports the right decisions about IT assets to ensure that IT investments support business objectives, whereas data governance focuses on data assets [10]. The term information governance is often used in the same sense
as data governance by some authors [11], but it addresses information issues rather than individual pieces of data [12]. However, IT and information governance often include data governance [11]. Data governance should therefore align with the goals and concepts of higher-level governance [10]. The goal cascading mechanism in industry governance frameworks shows that stakeholders' needs, enterprise goals, IT-related goals and information/data-level goals must be aligned [13].

A PE is defined as a platform that connects two or more sides (networks) transacting with each other [3]. It enables interactions between multiple groups by providing a meeting place [14]. It is regarded as a building block that acts as a foundation upon which an array of firms can develop complementary products, technologies or services [15]. For example, YouTube has one group that provides videos and another group that watches them. The groups generate various benefits, and the platform grows as the groups contribute data themselves [1].

Every PE collects data from its participating groups, which contribute content or non-content data such as logs, and uses or shares the collected data. The main purpose of data use can differ according to the platform type (e.g. content portal or social network), the business purpose (e.g. commercial or non-commercial) or the platform strategy. Facebook uses the collected data for its business and reaps the benefits of ecosystem growth, such as high revenue, whereas Apple does not use user data for commercial purposes. Nonetheless, both of them, like all PEs, use user data for service and product improvement, service use analysis and communication with users. Traditional organizations can easily control their participants (employees) and the relationships between them. Platform owners, however, have limited power to fully control their platforms, because multiple parties contribute, derive and use data [3].
This can result in a loss of control over the use of data (data abuse or misuse), lawsuits by disgruntled users and low data quality [16].

Data breach cases at Facebook and AOL illustrate these risks. The Facebook-Cambridge Analytica scandal [17] was publicly uncovered this year. It was reported that 50 million user profiles were shared (sold) and used without permission. A similar case was found in 2008 [18]. A research project team collected 1,700 user profiles from Facebook and then released the data publicly, and the source of the data could be quickly identified. The AOL case occurred in 2006. AOL published its users' search log data, and soon after the release the data was identified as Personally Identifiable Information (PII) [19, 20]. AOL did not release any PII directly. However, the log data was easily turned into PII because it was grouped by user and revealed a great deal of information about individuals. The three incidents expose data governance issues such as the unauthorized use of data and highly ambiguous control mechanisms for the use of data.

The current state of data governance in industry PEs is still immature [4]. Many sources of data are not considered. PEs generally focus on user content, so there is no clear definition of who owns or uses non-user content (e.g. logs or keywords). Data usage in the supply chain is also invisible to users. Platform policies are imprecise, so it is not clear how, when and by whom data is used. Researchers identify this issue as one of the critical challenges [5, 8]. It must be resolved for trust between platform owners and users and for business success [9, 11].

Prior studies support these findings and concerns. A number of studies identify unclear data ownership [5-7], the importance of the user contribution model [2, 21, 22] and invisible data usage [8] as challenges.
However, how such concerns should be managed in the data governance of PEs has received little attention in both industry and academia [9].

Our analysis of 19 existing industry governance frameworks and academic works [10, 13, 14, 16, 23-33] shows that most of them address the general roles and responsibilities of stakeholders within an enterprise. This can make it difficult to apply new data governance, or improve existing data governance, in practice when there are multiple networks. Meanwhile, prior studies on PEs pay more attention to the concept of a PE and to control mechanisms, because this research is still at a relatively embryonic stage. How to manage data is largely neglected, and the importance of a visible data supply chain is overlooked.

3. Methodology
This study used various data sources to identify scientifically important aspects and grounds, and the practical implications of data governance for PEs. We conducted a literature review, surveyed existing governance frameworks, industry PEs and data breach incidents, and carried out a case study of one industry PE.

3.1. Literature review and survey
For the literature review, we conducted a keyword search using specific queries and interchangeable keywords [31]. The keywords were “platform ecosystem”, “multi-sided platform” or “two-sided platform”, combined with “data governance” or “management”. To obtain broad information and knowledge, we included literature that addresses platform governance, the characteristics of PEs, or the role of data in PEs. We then drilled down to specific interests based on the results of this first step. We used
[…], “conformance”, “data breach”, “monitor” and “provenance” for the detailed search.

Using the results of the literature review, we surveyed five main industry governance frameworks: COBIT 5.0, ISO/IEC 38505-1, the DGI framework, the Informatica framework and IBM information governance. We also surveyed PEs to identify how governance practices are implemented and which practices are overlooked in the real world. The survey covers four commercial PEs (Facebook, YouTube, eBay and Uber) and two non-commercial PEs (RIBIT, an Australian platform, and SW bank, a Korean platform). We conducted the survey by analyzing the platforms' policies and websites and reviewing academic papers and news articles. In our previous studies, we surveyed most of these governance frameworks and PEs. In this study, we replaced ISO/IEC 38500 with ISO/IEC 38505-1 because this data governance standard has recently been released. We also added new platforms (the two non-commercial PEs) and used a different lens to identify specific data governance decisions for PEs.

We analyzed three data breach cases (two Facebook cases and one AOL case) by reviewing academic papers and news articles. We reviewed the cases from a data governance point of view and identified significant lessons that data governance for PEs should take into account.

We distinguished and categorized all the collected data in a table, then examined and cross-checked the data across the different sources. Based on the refined data, we first identified fundamental principles that should apply to every data governance decision area. We then identified the important governance decisions that should be made, and the practices that should be implemented, for successful management of data in PEs.

3.2. Case study
We conducted a case study to validate the theoretical concepts we discuss in the next section [20], and to illustrate the practical implementation and possible implications of this study. We selected Platform A, which is currently running and is managed by a government agency.
We chose this platform because one of the authors of this paper previously worked there. We surveyed the platform to understand how the PE addresses theoretically important governance decisions in reality, i.e. whether and how it implements the proposed decisions and practices.

We used five sources of evidence to collect data from the case, following Yin's principles [44]: documentation, archival records, interviews, direct observations and physical artifacts. We first analyzed the policies and websites along with other documents. We then reviewed the collected data and validated it through interviews with former and current managers of the platform. To obtain detailed information and opinions, we prepared ten open-ended questions based on the governance decision questions identified in this study (section 4). The interviews were carried out by phone because the interviewees were overseas.

We analyzed the collected data using the identified governance decisions and practices (four decision domains and 13 practices). We classified and summarized how the platform implements the data governance decisions. We used a simple metric (sufficiency) with three levels to test whether the platform implements the proposed data governance decisions and practices:
Not implemented: no documentation and no observed activity.
Partially implemented: documentation or activity is found, but the implementation is not fully satisfied; e.g. use cases of data are defined in policies, but it is not clear which types of data are used for each purpose.
Implemented: documentation or activity is found, and the implementation is fully satisfied.

In the last step, we discussed the results and drew conclusions. We first presented how the platform implements the data governance decisions. We then identified the gaps between our discussion (theoretical considerations) and the practical implementation.
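The three-level sufficiency metric used in the case analysis amounts to a small decision rule. The sketch below only illustrates that rule and is not the instrument used in the study. The boolean inputs (a document exists, an activity was observed, and the analyst judges the implementation fully satisfied) and the practice names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Sufficiency(Enum):
    NOT_IMPLEMENTED = "not implemented"
    PARTIALLY_IMPLEMENTED = "partially implemented"
    IMPLEMENTED = "implemented"


def rate_practice(has_document: bool, has_activity: bool, fully_satisfied: bool) -> Sufficiency:
    """Apply the three-level sufficiency metric to one governance practice.

    `fully_satisfied` is the analyst's judgement (an assumed input); it only
    matters when some evidence, a document or an observed activity, exists.
    """
    if not (has_document or has_activity):
        return Sufficiency.NOT_IMPLEMENTED
    if fully_satisfied:
        return Sufficiency.IMPLEMENTED
    return Sufficiency.PARTIALLY_IMPLEMENTED


def summarize(ratings: dict[str, Sufficiency]) -> Counter:
    """Count how many practices fall into each sufficiency level."""
    return Counter(ratings.values())


# Hypothetical practice names and evidence, for illustration only.
ratings = {
    # Use cases are defined in policy, but data types per purpose are unclear.
    "define data use cases": rate_practice(True, False, False),
    "third-party audit": rate_practice(False, False, False),
    "define data classification": rate_practice(True, True, True),
}

for level, count in summarize(ratings).items():
    print(f"{level.value}: {count}")
```

The rule still depends on the analyst's judgement of "fully satisfied". The code only makes the boundary between the three levels explicit, so that different analysts apply it consistently.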
Based on the gaps, we identified potential risks and opportunities. To understand the context of the case, we also analyzed the effects of the different implementation choices.

4. Data governance decisions for PEs
Researchers broadly agree that data governance must answer two questions: what decisions need to be made, and which roles should be involved in the decision-making process and how [10, 29]. In this study, we concentrate on the first question to identify critical decisions.

4.1. Key principles for decision making
IT and data governance frameworks are generally built on fundamental principles, which present sets of guidelines and considerations for all decisions [10, 13, 25, 26]. In traditional governance, the principles focus on generic goals and a universal approach to managing the data of an enterprise [29]. We pinpoint specific principles for PEs based on their characteristics. They serve as a starting point for designing new data governance or evolving a legacy one. The first principle (4.1.1) helps identify significant governance
decisions, and the other principles provide key considerations for implementing the decisions.

4.1.1. Align with platform governance concepts and business goals. Data governance goals should align with the business goals and with higher-level governance goals and concepts to maximize the value of a PE [10, 24]. The business goals influence the direction and design of data governance. If a PE aims to increase user satisfaction, it needs formal and strict control mechanisms to increase the quality of data [34]. Likewise, higher-level governance concepts affect data governance decisions. Roles, revenue sharing, trust and control are the key concepts of platform governance [9, 20, 32]. Roles in data governance refer to a form of data ownership with clear responsibility, which allows a PE to protect data and the rights of a data owner or subject. The revenue sharing concept suggests that a platform owner should consider a reward for data contributors. Trust is regarded as a prerequisite for success [9, 20, 35]. To improve trust, data governance needs high transparency in the use of data. Sharing decision rights with platform users can also increase trust. If decision rights are not shared, the platform owner has to implement rigorous control mechanisms, and the results or process of decision making must be open to all participating groups. The literature addresses control as a vital factor for the successful use of data [1, 30-33]. Control concerns how to monitor and preserve the use of data and how to conform to data governance rules.

4.1.2. Consider all participating groups. In traditional data governance, there are simple and clear roles for data management, such as create, store, update, archive and delete [25]. Data governance of a PE needs to address complicated relationships and interactions between multiple parties. The participating groups of a PE consist of the platform owner (including the roles of platform sponsor, orchestrator and provider) and platform user groups (supply-side and demand-side users). All the groups play critical roles in the data governance of a PE. Governance policies should therefore apply equally to all parties so that the rules are fair for everyone [33]. Every participant should be given the same opportunity and accessibility, because this results in more participation and more ideas and ultimately leads to new innovation [36]. This principle enables a PE to develop realistic data governance, grounded in a good understanding of the needs of all participating groups. It also allows a PE to share its data management strategy with all participants. If a PE needs more participation and trust, a platform owner can give users more opportunities to join the decision-making processes in certain ways. This helps a PE to design and implement data governance from the perspectives of all parties.

4.1.3. Cover all types of data. Platform data is collected from various sources, such as humans or systems. Industry PEs generally focus on user content [4], and other types of data are often ignored in data governance decisions. This can lead to ambiguous and incomplete governance decisions. PEs generally focus on privacy laws to protect Personally Identifiable Information (PII). However, the boundary between PII and non-PII is not fixed [37]. Non-PII data can become PII when combined with extra information, as the AOL data breach case shows. A secure platform must therefore treat non-user content as important. In addition, the value of non-user content is increasing because of advertising, the main source of revenue for the majority of PEs. PEs use non-user content such as service use information (e.g. logs) for targeted advertising. Targeted advertising mechanisms show how such data is used through invisible and hidden markets [38]. This increases concerns about data abuse, privacy violation and related ethical issues [8, 38]. To reduce the risks, data governance of PEs should consider how to make the supply chain visible for all types of data in a PE.

4.1.4. Consider different platform contexts; one size does not fit all. Platforms have to consider different business strategies, goals and market regulations. Such contingencies affect data governance [29]. Under this principle, data governance decisions can be made flexibly based on the context of a platform and tailored for efficient implementation. For instance, Apple (App Store) and Facebook make explicitly different governance decisions on control mechanisms [20]. Apple aims to provide good-quality services, and therefore adopts tight control through manual reviews. In contrast, Facebook has loose control, allowing any input without restrictions.

Governance decisions often have serious consequences, as the Facebook-Cambridge Analytica scandal shows. Facebook allowed apps to collect user data (even friends' data) for higher market share and revenue, which greatly increased the risks of data misuse or abuse and privacy violation. In contrast, Apple's policies do not allow apps to collect user data, and they restrict the use of user data for advertising [39].

4.2. Decision domains
4.2.1. The architecture overview. Decision domains are the data governance areas that should be controlled to achieve the business goals of a PE. In the
previous study [4], we identified seven data governance factors for PEs (Table 1). We transform them into decision domains by grouping them according to similar characteristics and aspects (Figure 1). We identify the first four factors in Table 1 as the main decision domains, because they are core to setting governance policies and strategies. We treat the remaining factors as subdomains, since they generally support other decisions [10, 13, 27, 28]. The decision domains are intended specifically to manage the complicated situations and relationships of a PE. We therefore do not discuss all the domains that can appear in a universal data governance framework.

Every decision needs to be made by harmonizing all the considerations and information of the decision domains [10]. As Figure 1 shows, the decision domains are tightly interrelated, so that decisions can be made in alignment with the principles.

Industry and academia commonly agree that governance and management are conceptually different and should be treated separately [10, 13, 24, 25]. Governance means the decisions that should be made to ensure effective management and use of data, whereas management means a set of practices for implementing those decisions. Based on this concept, we introduce core governance decisions for PEs and, separately, the management practices.

Table 1. Data governance factors for PEs [4]
Factor | Description
Regulatory environment | Regulations, laws or court cases that could affect the ownership and use of data.
Data ownership and access | Definition of who owns, uses and accesses platform data.
Data use case | The purpose of the data collected by PEs (how data is used).
Contribution measurement | Mechanisms to measure contribution against the value created by providing data.
Conformance | An audit for compliance based on strict processes and rules.
Monitoring | Mechanisms to monitor a data supply chain and all activities related to data.
Data provenance | Means to trace the derivation history of the data transparently.

Figure 1. The data governance decision domains

4.2.2. Governance decisions. 1) Regulatory environment. The potential decisions of this domain are “what regulations, specific policies, standards and guidelines should be considered?” and “how does the regulatory environment influence the uses of data?”. The first decision requires identifying external legal requirements, internal policies and contractual agreements. For example, when a PE deals with personal information such as names or addresses in Australia, the Privacy Act 1988 should be considered to identify the legal requirements. In addition, the decision model for data ownership and access rights should be established based on legal aspects, such as the creativity, originality, investment and source of data. These aspects are derived from a review of the regulatory environment: the Berne Convention and its derivatives [40, 41], a 2004 ruling of the European Court of Justice (ECJ) (the William Hill case [40]) and the policies of platforms (Table 2).

Table 2. Regulations for data ownership
Category | Description | Regulation
Creativity | Creative data (video/photo); non-creative data (profile/log) | Berne Convention and its derivatives
Originality | Original data (new, raw data); derived data (modified, transformed data) | Depends on context
Investment | Non-creative data managed by a platform owner; non-investment data | Court cases (e.g. European Court of Justice (ECJ))
Source | Internal (data created in a PE); external (data by users) | General policies of PEs

A mechanism to track compliance with the regulations, and to report on it, should be taken into account. This requires identifying external and internal compliance requirements, setting conformance targets and auditing them. The concept of due process is regarded as a pivotal control mechanism to cope with the risks of data abuse and misuse. It enforces desirable behavior among participants [8] and supports the successful implementation of data governance. External users such as partners or researchers often use platform data, so the platform should confirm that each such use is legally permitted. In particular, if the data is taken out of the platform and possibly disseminated for secondary use, the openness of the data and the platform policies must be checked. Third parties have to audit all of these processes to avoid bias or conflicts of interest and to maintain the transparency of the PE.

2) Data ownership and access definition. This domain covers the decision “who owns and uses the data in a PE?”. Data ownership has been a central concept in platform design [9, 33, 42]. The decisions enable a PE to clarify the roles, responsibilities, and
comprehensive rights to data of all the corresponding participants, including the data owners and subjects.

The practice for these decisions is defining data ownership and access rights for all types of data. To support implementation and keep the outcomes of this practice consistent and intact, collaboration with other domains is necessary (Figure 2). The practice should use the classification of all types of data defined in the data use case domain. This makes the data ownership and access definition clearer, because few types of data are then likely to be missing from it. The relevant regulations identified in the regulatory environment domain must be used to develop a decision model for data ownership and access rights. As stated, the decision should be based on the relevant regulations, laws or court cases [10]. To help practitioners, we present a potential decision model that can be applied in the real world (Figure 3). The model is based on the regulations identified in Table 2. It supports the primary decision of who owns (specific) data: the platform owner or the users (data contributors) of a PE. This decision should be made carefully, because it is related to revenue sharing and often leads to lawsuits such as the Huffington Post case in 2011 [21].

Figure 2. Collaboration with other domains
Figure 3. A data ownership decision model

Defining clear access rights facilitates platform transparency. Stakeholders should have a method for obtaining appropriate information about their data with adequate security. However, the policies or context of a PE can restrict a data contributor's access to the data. For example, a platform may prohibit users from accessing their last password for security reasons. Governance decision makers need to consider such particular contexts for every single type of data in a PE. We suggest a Contribute, Own and Access (COA) matrix to support and simplify such complicated circumstances (Table 3).
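A COA matrix can be held as a mapping from each data type to the set of rights granted on it. The hedged sketch below uses the data types of Table 3, but the rights assigned to them are assumptions. The one exception is the missing Access right on the last password, which follows the example above. `Right`, `COA_MATRIX`, `has_right` and `describe` are hypothetical names.

```python
from __future__ import annotations

from enum import Flag


class Right(Flag):
    NONE = 0
    CONTRIBUTE = 1
    OWN = 2
    ACCESS = 4


# Hypothetical COA matrix for the user group, using the data types of Table 3.
# Only the missing ACCESS on "Last p/w" follows the text; other entries are assumed.
COA_MATRIX: dict[str, Right] = {
    "Video/photo": Right.CONTRIBUTE | Right.OWN | Right.ACCESS,
    "Location": Right.CONTRIBUTE | Right.ACCESS,
    "Service use": Right.ACCESS,
    "Last p/w": Right.CONTRIBUTE,
}


def has_right(data_type: str, right: Right) -> bool:
    """True if the matrix grants `right` on `data_type`; unknown types grant nothing."""
    return right in COA_MATRIX.get(data_type, Right.NONE)


def describe(data_type: str) -> str:
    """Render one matrix row as C/O/A letters, with '-' for a right not granted."""
    letters = (("C", Right.CONTRIBUTE), ("O", Right.OWN), ("A", Right.ACCESS))
    return "".join(letter if has_right(data_type, flag) else "-" for letter, flag in letters)


for data_type in COA_MATRIX:
    print(f"{data_type:12} {describe(data_type)}")
```

Each cell is a combination of flags, so adding a data type from the classification adds exactly one row. A type missing from the matrix grants no rights, which is the safer default when a data type has been overlooked.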
The COA matrix lets users clearly understand which data they can own or access and which they cannot, and helps them exercise their legitimate rights to data properly.

Table 3. An example of the use of a COA matrix
Data type | Video/photo | Location | Service use | Last p/w
Contribute (C) | | | |
Own (O) | | | |
Access (A) | | | |

Table 4. Facebook data classification
Level 1 (2) | Level 2 (8) | Level 3 (>70), examples
User profile | User content | Video, photo
User profile | Extra information of user content | Created time of photo
User profile | User information | Name, email
User profile | Information about a user from other users | Post by others
User profile | Information about a user from Facebook companies | User id, name
User profile | Information about others | Post to others
Service use information | Service use information | Logins, logouts
Service use information | Service use information from third parties | Log

3) Data use case. For PEs, how data is used is a critical concern for winning markets. This domain should therefore address a series of questions: “what types of data are collected and what are the uses of data for the business?” and “how should data be used without losing control?”.

To support the decisions, a data classification gives a good understanding of the different types of data [10], since a PE collects data from various sources. The majority of data comes from users, who upload content such as videos, images or user information (human-sourced data) [43]. While a user uses platform services, the platform systems generate data such as logs, search keywords or location (machine-generated data). This type of data is generally referred to as service use information. Data is also collected through system processes such as transactions, reference tables or interactions (process-mediated data). A data classification should include all these types of data. As an example, we identify three levels of data classification for Facebook by analyzing its policies (Table 4). The first level consists of user profile (from humans) and service use information (from machines and processes). The second level is divided into eight categories (six under user profile and two under service use information).
The last level of data classification comprises more than 70 types of data.
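A three-level classification like this can be represented as a nested mapping, which also makes it easy to check the classification against use cases later. The sketch below encodes only the example entries listed in Table 4 (one or two level-3 types per level-2 category). The full Facebook classification has more than 70 level-3 types, and the structure and function names are our own illustration.

```python
from __future__ import annotations

# Abridged Table 4: level 1 -> level 2 -> example level-3 data types.
# Only the examples listed in the table are included.
CLASSIFICATION: dict[str, dict[str, list[str]]] = {
    "User profile": {
        "User content": ["Video", "Photo"],
        "Extra information of user content": ["Created time of photo"],
        "User information": ["Name", "Email"],
        "Information about a user from other users": ["Post by others"],
        "Information about a user from Facebook companies": ["User id", "Name"],
        "Information about others": ["Post to others"],
    },
    "Service use information": {
        "Service use information": ["Logins", "Logouts"],
        "Service use information from third parties": ["Log"],
    },
}


def count_per_level(tree: dict[str, dict[str, list[str]]]) -> tuple[int, int, int]:
    """Return the number of level-1 and level-2 categories and distinct level-3 types."""
    level2 = [types for children in tree.values() for types in children.values()]
    level3 = {t for types in level2 for t in types}
    return len(tree), len(level2), len(level3)


def paths_of(data_type: str, tree: dict[str, dict[str, list[str]]]) -> list[tuple[str, str]]:
    """List every (level 1, level 2) category that contains a level-3 type."""
    return [
        (l1, l2)
        for l1, children in tree.items()
        for l2, types in children.items()
        if data_type in types
    ]


print(count_per_level(CLASSIFICATION))
print(paths_of("Name", CLASSIFICATION))
```

`paths_of` shows that one level-3 type, such as a name, can appear under more than one level-2 category. This matters when ownership and access rights are assigned per category.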

In addition, the governing body needs to decide appropriate use cases for the collected data in alignment with the business goals. Our survey of PE policies found 11 common use cases, e.g. providing, improving and developing (testing) services, communicating with users, and showing and measuring ads and services. The use cases must specify which types of data can be used for each case. This helps a platform to detect and prevent the unauthorized use of data in the data supply chain [25]. For this purpose, the platform should use the data classification from the previous step and confirm that every (level 3) type of data belongs to at least one use case, and that every use case uses at least one classified data type.

Monitoring and data provenance can serve as mechanisms for detecting and reporting all activities in the use of data, and for tracking the derivation history of the data [8, 10]. For visible and reliable data use, monitoring of the use of data should be based on the defined use cases. Data provenance allows a platform to record all activities related to data and identify all the associated stakeholders. It also prevents parties from denying that they manipulated data. When data has multiple owners, provenance can also be used to measure the contribution of each data provider explicitly.
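The coverage rule and the two supporting mechanisms can be combined in one small sketch. It assumes a hypothetical mapping from use cases, named after the survey examples, to level-3 data types. `coverage_gaps` flags data types that no use case covers and use cases that name no known type. `is_authorized` is a monitoring check of a data use against the defined use cases. `ProvenanceLog` is a hash-chained, append-only log, chosen here as an assumption rather than prescribed by the text. It only makes edits to recorded activities detectable; binding entries to actors for non-repudiation would also require digital signatures. All names are illustrative.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical use cases (named after the survey examples) mapped to the
# level-3 data types each may use. The assignments are assumptions.
USE_CASES: dict[str, set[str]] = {
    "provide services": {"Video", "Photo", "Name"},
    "improve services": {"Logins", "Logouts"},
    "show and measure ads": {"Logins"},
}
LEVEL3_TYPES = {"Video", "Photo", "Name", "Email", "Logins", "Logouts"}
GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


def coverage_gaps(types: set[str], use_cases: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (types covered by no use case, use cases covering no known type)."""
    covered = set().union(*use_cases.values())
    uncovered_types = types - covered
    empty_cases = {name for name, allowed in use_cases.items() if not allowed & types}
    return uncovered_types, empty_cases


def is_authorized(data_type: str, purpose: str, use_cases: dict[str, set[str]] = USE_CASES) -> bool:
    """Monitoring check: allow a use only if its purpose is a defined use case for that type."""
    return data_type in use_cases.get(purpose, set())


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class ProvenanceLog:
    """Append-only log; each entry stores the hash of the previous one."""

    entries: list[dict] = field(default_factory=list)

    def record(self, actor: str, action: str, data_type: str, purpose: str) -> dict:
        body = {
            "actor": actor,
            "action": action,
            "data_type": data_type,
            "purpose": purpose,
            "authorized": is_authorized(data_type, purpose),
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def contributions(self) -> dict[str, int]:
        """Count 'contribute' actions per actor, e.g. to weigh shared ownership."""
        counts: dict[str, int] = {}
        for entry in self.entries:
            if entry["action"] == "contribute":
                counts[entry["actor"]] = counts.get(entry["actor"], 0) + 1
        return counts


log = ProvenanceLog()
log.record("user-1", "contribute", "Photo", "provide services")
flagged = log.record("advertiser-7", "read", "Email", "show and measure ads")
print(coverage_gaps(LEVEL3_TYPES, USE_CASES), flagged["authorized"], log.verify())
```

In this example, the coverage check reports that no use case covers "Email". The monitoring check then flags the advertiser's read of that type as unauthorized, and the provenance log keeps that flag in a record that cannot be silently edited.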
