DPIA Diagnostic Data in Microsoft Office ProPlus


DPIA DIAGNOSTIC DATA IN MICROSOFT OFFICE PROPLUS
5 November 2018
Commissioned by the Ministry of Justice and Security for the benefit overnment)
Sjoera Nas
Arnold Roosendaal
2018; Ministry of Justice and Security. All rights reserved. No part of this report may be reproduced and/or made public by means of print, photocopy, microfilm, digital processing or otherwise, without the prior written permission of the Ministry of Justice any.eu

Contents
Change log
Summary
Introduction
DPIA
Federal negotiations versus individual DPIAs
Definition diagnostic data
Previous DPIA on Windows 10 telemetry
Technical limitations
Meetings with Microsoft
Outline
Part A. Description of the Office diagnostic data processing
1. Topic: the processing of diagnostic data in Microsoft Office Software
2. Personal data and data subjects
2.1 Data Subject Requests and Audit logs
2.2 Definition of personal data
2.3 Possible types of personal data and data subjects
3. Data processing through diagnostic data
3.1 Privacy choices in Office
4. Purposes of the processing
5. Controller, processor and sub-processors
6. Interests in the data processing
7. Transfer of personal data outside of the EU
8. Techniques and methods of the data processing
9. Additional legal obligations: ePrivacy Directive
10. Retention Period
Part B. Lawfulness of the data processing
11. Legal Grounds
12. Purpose limitation
13. Special categories of personal data
14. Necessity and proportionality
15. Rights of Data Subjects
Part C. Discussion and Assessment of the Risks
16. Risks
16.1 Identification of Risks
16.2 Assessment of Risks
16.3 Summary of Risks
Part D. Description of risk mitigating measures
17. Risk mitigating measures
17.1 Announced risk mitigating measures
17.2 Residual risks
Conclusion
ANNEX 1 – Description of key functionalities in Office

Change log

Version | Date | Summary of input
0.1 | 5 September 2018 | First draft, with part A nearly completed, parts B, C and D in bullets, reviewed by Arnold Roosendaal
0.2 | 9 September 2018 | Input processed from D. Paardenkooper (lab), R.B. Herkemij (J&V), P. J. G. van den Berg (Manager SLM Rijk) and S. L. Hartholt (J&V lawyer) on technical and legal aspects
0.3 | 17 September 2018 | Input processed from internal government feedback (achterbanconsultatie), first complete version of parts C and D, first draft of Summary, added graphic
0.4 | 19 September 2018 | Internal Privacy Company processing of language and style corrections
0.5 (final draft) | 27 September 2018 | Input from Microsoft processed, revisions visible in version with track changes
1.0 | 28 September 2018 | Clean final draft, final spelling and lay-out check
1.1 | 2 October 2018 | Answers from Microsoft dated 1 October 2018 to questions 1-10 processed, written comments from R.B. Herkemij, D. Paardenkooper, P.J.G. van den Berg and S.L. Hartholt. Relevant expansion of arguments with regard to legal ground, relevant change in the (numbering of) high risks, added new table with risks and possible mitigating measures. Revisions are visible in the version with track changes.
1.2 | 3 October 2018 | Minor typing and style edits at the request of P.G.B van den Berg and S.L. Hartholt.
1.3 | 1 November 2018 | Processing of results of discussions with Microsoft and input on different data classifications

Summary

The Dutch government has commissioned a general data protection impact assessment on the processing of data about the use of the Microsoft Office software. The purpose of this DPIA is to help the individual government organisations map and assess the data protection risks for data subjects caused by this data processing, and to ensure adequate safeguards to prevent or at least mitigate these risks. This report provides a snapshot of the current risks. As Microsoft provides more information, and more research can be done to inspect the diagnostic data, new versions of this DPIA will be drafted.

The Office software is deployed on a large scale by different governmental organisations, such as ministries, the judiciary, the police and the tax authority. Approximately 300.000 government employees work with the software on a daily basis, to send and receive e-mails, create documents and spreadsheets and prepare visual presentations. Generally, these organisations store the content they produce with the Office software in governmental data centres, on premises. Since the Dutch government currently tests the use of the online SharePoint / OneDrive cloud storage facilities, this DPIA also includes the data Microsoft processes about the use of SharePoint to store and access documents.

Federal negotiations versus individual DPIAs

The Dutch government has a Microsoft Strategic Vendor Management office (SLM Rijk). This office conducts the negotiations with Microsoft for the federal government, but the individual organisations buy the licenses and determine the settings and scope of the processing by Microsoft Corporation in the USA. Therefore this general DPIA can help the different government organisations with the DPIAs they must conduct, but this document does not replace the specific risk assessments the different government organisations must make. Only the organisations themselves can assess the specific data protection risks, based on their specific deployment, the level of confidentiality of their work and the types of personal data they process.

Scope: diagnostic data, not functional data

This report addresses the data protection risks of the storing by Microsoft of data about the individual use of the Office software, including the use of Connected Services. These metadata (about the use of the services and software) are called ‘diagnostic data’ in this report. This includes so-called ‘telemetry data’.

Following the logic of ePrivacy legislation in Europe, this report distinguishes between 3 categories of data:
1. Contents of communication with Microsoft’s services, defined by Microsoft as ‘Customer Data’
2. Diagnostic data, all observations stored in event logs about the behaviour of individual users of the services
3. Functional data, which should be immediately deleted or anonymised upon completion of the transmission of the communication.

In this report, the term functional data is used for all data that are only necessary for a short period of time, to be able to communicate with services on the Internet, including Microsoft’s own apps and services. Examples of such functional data are the data processed by an e-mail server, and the data stream necessary to allow the user to authenticate or to verify if the user has a valid license.

Privacy Company 2 November 2018, page 1 of 83

According to the distinction between the 3 categories of data made in this report, functional data may also include the content of text you want to have translated. In that case, Microsoft may collect the sentence before and after the sentence you mark for translation, to provide a better translation. The key difference between functional data and diagnostic data as defined in this report is that functional data are and should be transient. As long as Microsoft doesn’t store these functional data, or only collects these data in a strictly anonymous way, they are not diagnostic data.

Microsoft uses different words and classifications. The term ‘diagnostic data’ for Microsoft only refers to the specific telemetry data collected through Office itself about the use of the Office software. Microsoft does not have an overall category for the metadata that are generated on its servers by the individual use of the services and software, such as the telemetry data and other metadata stored in server logs. Microsoft uses the term ‘Customer Data’ to refer to all data that are provided by users when using the software. Most of Microsoft’s contractual privacy guarantees relate to these ‘Customer Data’.

Data collection via event logs and telemetry

Technically, Microsoft Corporation collects diagnostic data in different ways, via system-generated event logs and via the Office telemetry client. Similar to the telemetry client in Windows 10, Microsoft has programmed the Office software to collect telemetry data on the device, and regularly send these to Microsoft. After an investigation by several European DPAs in 2016-17, Microsoft has published extensive documentation about the Windows telemetry data. Microsoft has also made a data viewer tool available within Windows that allows users to see the telemetry data Microsoft collects. Microsoft has explained that it collects Office telemetry data on a much larger scale (up to 25.000 event types, compared to the maximum of 1.200 event types in Windows 10 telemetry). Within Microsoft, the Office telemetry data are added and analysed by a higher number of engineering teams (20 to 30 teams, compared with the 10 teams that work on Windows telemetry).

Personal data

Currently, Microsoft provides no documentation, settings or data viewer tool for the Office telemetry data. Prior to this DPIA, Microsoft assumed the telemetry data were not personal data. As a result of this DPIA, Microsoft recognises that many diagnostic data about the use of the Office software and connected services, including the telemetry data, contain personal data.

The technical administrators of the Office Enterprise software at the different government organisations (the admins) can see some system-generated event data if they export the audit log. For the purpose of this DPIA, tests were performed by the technical lab of the Ministry of Justice and Security. The exported audit logs from these tests show that the diagnostic data may include both behavioural metadata and data relating to filenames, file path and e-mail subject lines.

Roles and purposes

Microsoft considers itself to be a data processor for the processing of most of the data it processes through Office, including the Office telemetry data. The only exception is the use of voluntary Connected Services. In that case, Microsoft considers itself to be a data controller, and may process the diagnostic data for the 12 different purposes described in its general privacy statement that are not excluded in the Online Service Terms.

As a data processor, Microsoft processes the personal diagnostic data ‘to provide Office’. This covers processing for the following purposes:
1. Security (identifying and mitigating security threats and risks as quickly as possible through updates to Office ProPlus Applications and remediation of connected services)
2. Up to Date (delivering and installing the latest updates to the Office ProPlus Applications without disruption to the experience)
3. Performing Properly (identifying and mitigating anomalies, “bugs,” and other product issues as quickly as possible through updates to the Office ProPlus Applications and remediation of connected services)
4. Product development (learning to add new features)
5. Product innovation (business intelligence, develop new services)
6. General inferences based on long-term analysis, support machine learning
7. Showing targeted recommendations on screen to the user
8. Purposes Microsoft deems compatible with any of these 7 purposes.

Only data controllers may determine the purposes of the processing. In view of the nature of the data processing as examined in this DPIA, Microsoft does not act as a data processor, but as a data controller. Because government organisations enable Microsoft to process personal data for these purposes, the organisations are joint controllers with Microsoft.

The government offers employees no choice in using the Microsoft Office tools. They are not free to select other tools. Employees cannot distinguish between voluntary and mandatory Connected Services and the implications of providing data to Microsoft as an independent data controller. That is why the government organisations and Microsoft are also joint controllers for these discretionary Connected Services.

Legal grounds

As joint data controllers, Microsoft and the government organisations can only appeal to 3 of the 6 possible legal grounds. Based on the necessity to perform a contract, including the employment contract, as well as the necessity for a legitimate interest, government organisations may allow Microsoft to process personal diagnostic data for the first three purposes (security, providing updates and troubleshooting). The government organisations can also rely on their legal obligation to process audit logs for security purposes. This can be necessary to collect evidence of possible security breaches as a legal ground for the processing of personal data. Currently, neither Microsoft nor the government organisations have a legal ground for the processing of diagnostic data for any other purpose.

Risks

Currently, Microsoft provides no comprehensive documentation, settings or data viewer tool for an accurate overview of the Office telemetry data. There is limited documentation about the audit logs and system-generated event logs, but no information about the (collection and contents of) telemetry data. New telemetry events that collect other types of data can be added dynamically, if they comply with any of the 8 purposes described above.

It is not clear what types of content may be included in the diagnostic data. Microsoft has assured that the audit logs do not contain any part of email content, but the logs do contain the subject lines of emails. Microsoft has also stated that telemetry data may not contain sensitive data or other content, but has simultaneously explained that engineers may have mistakenly added events that could include content. Additionally, snippets of content (such as the line preceding and following a word) may be included in system-generated event logs about the use of Connected Services.

Until further examination of the diagnostic data proves otherwise, this report assumes that diagnostic data may include both metadata (about the behaviour of users) and content.

Microsoft does not accept its role as joint controller for the diagnostic data with the government organisations that use Office. The Office telemetry data and system-generated event logs are stored for a minimum of 30 days, and long term for a period of 18 months in the central Cosmos database in the USA. The data can be stored longer if an individual team has exported its own subset of data. There is no central possibility for admins to delete historical diagnostic data, except for terminating the user account. Microsoft has developed rules for the collection of new telemetry events, but there was no scheme governing the purposes for the addition of telemetry data in the past. Though Microsoft stores the Customer Data in European data centres, diagnostic data may be processed and stored anywhere. If an employee uses a voluntary Connected Service, Microsoft may process the data for 12 broad purposes.

These circumstances lead to the following data protection risks:
1. No overview of the specific risks for individual organisations due to the lack of transparency (no data viewer tool, no public documentation)
2. No possibility to influence or end the collection of diagnostic data (no settings for telemetry levels)
3. The unlawful storage of sensitive/classified/special categories of data, both in metadata and in for example subject lines of e-mails
4. The incorrect qualification of Microsoft as a data processor, instead of a joint controller as defined in article 26 of the GDPR
5. Not enough control over sub-processors and factual processing
6. The lack of purpose limitation both for the processing of historically collected diagnostic data and the possibility to dynamically add new events
7. The transfer of (all kinds of) diagnostic data outside of the EEA, while the current legal ground is the Privacy Shield and the validity of this agreement is the subject of a procedure at the European Court of Justice
8. The indefinite retention period of diagnostic data and the lack of a tool to delete historical diagnostic data

Risk mitigating measures

Microsoft has committed to publish documentation about the Office telemetry data and to offer new telemetry choices for Office admins. Microsoft has also committed to develop a data viewer tool in Office for the Office telemetry data. The timing of these measures is not public information.

In the interim, Microsoft has helped the Dutch government to implement settings to minimise the processing of telemetry data, based on the blocking of traffic from certain ports that send information to the telemetry end-point in the USA. The effectiveness of this solution still has to be tested in combination with a data viewer tool. Microsoft and SLM Rijk are negotiating about the use of a data viewer tool. The results of this inspection will be the subject of a follow-up DPIA.

Residual risks

Some residual risks can be mitigated if the government organisations use the newly developed settings to minimise the processing of telemetry data.

Assuming Microsoft will be offering a data viewing tool and assuming Microsoft will provide global solutions to the risks of the lack of transparency and ability to control the level of telemetry collection, the first two risks will be mitigated by the measures Microsoft has currently committed to take. Microsoft has not yet agreed to any of the other possible risk mitigating measures.

Government organisations must exert every effort to mitigate the remaining high risks, amongst others by centrally prohibiting the use of Connected Services. They must also block the option for users to send personal data to Microsoft to ‘improve Office’. Government organisations should also refrain from using the SharePoint/OneDrive online storage, and delay switching to the web-only version of Office 365 until Microsoft has provided adequate guarantees with regard to the types of personal data and purposes of the processing.

Additionally, the tenants should consider the following measures:
- delete some specific users such as VIPs and create new AD accounts for them
- consider using a stand-alone deployment without Microsoft account for confidential/sensitive data
- conduct a pilot with alternative software, after having conducted a DPIA on that specific processing

SLM Rijk should continue to work with Microsoft to obtain further information and conduct follow-up DPIAs on future Office versions that may lead to a different appreciation of the data protection risks.

The risks and possible risk mitigating measures can be visualised in the following table.

Nr | Risk | Possible measure Microsoft | Possible measure per tenant
1 | Lack of transparency | Public documentation and data viewer tool | Use tool when it becomes available
2 | No possibility to influence or end the collection of telemetry data | a. Temporary settings to minimise the processing; b. Permanent settings for telemetry levels | Use temporary minimisation settings; Do not use SharePoint/OneDrive; Do not use web-only Office 365; Use setting telemetry Off when switch is available
3 | Unlawful collection and storage of sensitive/classified/special categories of data | a. Option to delete historical diagnostic data by Device ID; b. Guarantee never to store content data in telemetry data or in other system-generated event logs unless strictly necessary | Consider deleting some specific users and creating new accounts for them; Prohibit users from sending personal data to Microsoft to ‘improve’ Office; Consider pilot with other software for some functionality (after conducting a separate DPIA)
4 | Incorrect qualification Microsoft as data processor | a. Minimisation of purposes to be able to act as a processor OR New framework agreement as joint controller; b. Only process data from voluntary Connected Services as a data processor OR change default for voluntary Connected Services to ‘Off’ | Endorse new framework agreement as processor or joint controller; Prohibit voluntary Connected Services unless Microsoft offers these services as a processor
5 | Not enough control over sub-processors and factual processing | More audit rights | Consider stand-alone deployment without Microsoft account for confidential/sensitive data
6 | The lack of purpose limitation | Processing only for strictly necessary purposes for which the tenants have a legal ground | - no specific measure, see above
7 | The transfer of data outside of the EEA | New contractual guarantees and/or storage of diagnostic data within the EU | - no specific measure, see above
8 | The indefinite retention period of diagnostic data | Determine necessary retention periods | - no specific measure, see above

Given the ongoing negotiations with Microsoft to mitigate the remaining risks, SLM Rijk postpones consultation of the Dutch data protection authority for risks 3 - 8.

Introduction

DPIA

Under the terms of the General Data Protection Regulation (GDPR), an organisation may be obliged to carry out a data protection impact assessment (DPIA) under certain circumstances, for instance where large-scale processing of personal data is concerned. The assessment is intended to shed light on, among other things, the specific processing activities which are carried out, the inherent risk to data subjects, and the safeguards applied to mitigate these risks. The purpose of a DPIA is to ensure that any risks attached to the process in question are mapped and assessed, and that adequate safeguards have been implemented to tackle those risks.

This DPIA is focussed on the processing of personal data via diagnostic data generated during the installation and use of Microsoft Office ProPlus software (installed locally, on the device of the users, in combination with online Office 365 services). This DPIA follows the structure of the DPIA Model mandatory for the Dutch government.1

Federal negotiations versus individual DPIAs

The Dutch government has a Microsoft supply management office (SLM Rijk). This office conducts the negotiations for the federal government, but the individual organisations buy the licenses and determine the settings and scope of the processing by Microsoft Corporation in the USA. Therefore this general DPIA does not replace the specific risk assessments the different procuring organisations must make, based on their specific deployment, the level of confidentiality of their work and personal data they process.

Definition diagnostic data

This report addresses the data protection risks of the storing by Microsoft of data about the individual use of the Office software, including the use of Connected Services. These metadata (about the use of the services and software) are called ‘diagnostic data’ in this report. This includes so-called ‘telemetry data’.

Following the logic of ePrivacy legislation in Europe, this report distinguishes between 3 categories of data:
1. Contents of communication with Microsoft’s services, defined by Microsoft as ‘Customer Data’
2. Diagnostic data, all observations stored in event logs about the behaviour of individual users of the services
3. Functional data, which should be immediately deleted or anonymised upon completion of the transmission of the communication.

In this report, the term functional data is used for all data that are only necessary for a short period of time, to be able to communicate with services on the Internet, including Microsoft’s own apps and services. Examples of such functional data are the data processed by an e-mail server, and the data stream necessary to allow the user to authenticate or to verify if the user has a valid license.

1 Model Gegevensbeschermingseffectbeoordeling Rijksdienst (PIA) (September 2017). For an explanation and examples (in Dutch) see: ing-rijksdienst-pia.

According to the distinction between the 3 categories of data made in this report, functional data may also include the content of text you want to have translated. In that case, Microsoft may collect the sentence before and after the sentence you mark for translation, to provide a better translation. The key difference between functional data and diagnostic data as defined in this report is that functional data are and should be transient.2 As long as Microsoft doesn’t store these functional data, or only collects these data in a strictly anonymous way, they are not diagnostic data.

Microsoft uses different words and classifications. The term ‘diagnostic data’ for Microsoft only refers to the specific telemetry data collected through Office itself about the use of the Office software. Microsoft does not have an overall category for the metadata that are generated on its servers by the individual use of the services and software, such as the telemetry data and other metadata stored in server logs. Microsoft uses the term ‘Customer Data’ to refer to all data that are provided by users when using the software. Most of Microsoft’s contractual privacy guarantees relate to these ‘Customer Data’. Microsoft has provided the following examples of Customer Data: Customer password, content of customer’s email account or Azure data base, email subject line, Machine learning built models with data that is unique to a customer, and email content.3

The definition of diagnostic data used in this report is independent from the legal role of Microsoft as a data processor or a data controller.

Previous DPIA on Windows 10 telemetry

The Ministry of Justice and Security in the Netherlands has a separate Microsoft supply management office. This office (SLM Rijk4) procures the Microsoft software for all employees of the federal Dutch government. In the spring of 2018, SLM Rijk commissioned a DPIA report about the telemetry or diagnostic dataflow from both Windows 10 Enterprise and the two different Office implementations deployed by Dutch government organisations. SLM Rijk required this analysis as a direct result of the findings of the Dutch Data Protection Authority (Autoriteit Persoonsgegevens, hereinafter: Dutch DPA) that the processing of personal data through Windows 10 telemetry was not compliant with the Dutch data protection act.

This previous DPIA report was
