Citizen Data Science - Royal Society


Data governance: from principles to practice
Civil society, volunteer data science skills, and open datasets
Workshop report
November 2020

Summary
There is a significant benefit to be gained from the better use of data, and civil society and volunteer groups can benefit greatly from the use of data that is open, accessible and meaningful. However, there are some important considerations relating to how civil society and volunteer groups gather the skills and infrastructures to make better use of data, and how they establish the systems to ensure that the collection and use of data is governed appropriately and collectively. Exploring the issues in this area puts into focus some of the main findings and recommendations from the Royal Society and British Academy report Data Management and Use: Governance for the 21st Century.

This report summarises the discussions at a workshop held in partnership with the Ada Lovelace Institute, the Alan Turing Institute, the British Academy, DataKind UK, the Leverhulme Centre for the Future of Intelligence and the Open Data Institute, on 12 March 2020. The workshop set out each organisation’s different perspective on the opportunity for using data for the benefit of civil society, and the ways in which principles for the governance of data use can be put into practice in the voluntary sector.

This note does not necessarily represent the views or positions of organisations or individuals who took part. A number of the sources referred to in the note can be found in Annex A: resource guide.

Report structure
The principles set out in Data Management and Use: Governance for the 21st Century formed the structure of this workshop and therefore of this report, with each section exploring an aspect of their application (a detailed outline of the existing tensions and disconnects in data management and use, and of the principles for data governance, is provided in Annex B).

Context
Learning from citizen science and environmental data: an opening reflection drawing on experience at the interface of academic research and citizen data science, to highlight some of the challenges in volunteer and civic society uses of data.

Principle 1
Transparent, inclusive and democratic decision-making about trade-offs: exploring the concept and practice of collaborative data maintenance, meaning the process and data infrastructure by which organisations and communities share the responsibility and work to collect, maintain, govern and use data.

Principle 2
Individual and collective rights and interests: exploring data practices and social value, considering how data governance can protect both individual rights, goods and benefits, and collective rights, goods and benefits.

Principle 3
Seek out good practices and learn from success and failure: effective data governance should display a commitment to promoting good practice and embedding continuous learning as a way of improving practices and standards.

Principle 4
Enhance existing democratic governance: data management and use should support democratic processes, help enact democratic decisions and be subject to democratic oversight.

Conclusions and actions
This note concludes with some actions to promote data use by civil society, including supporting collaboration in the civil society community; developing guidance and case studies for organisations to learn from; providing support for technical literacy and for navigating the data space; and promoting inclusive dialogue.

Full details of sources and locations for these resources are given in Annex A: resource guide and further reading on page 20.

Data governance: from principles to practice Civil society, volunteer data science skills, and open datasets 2

Context
Open and shared datasets, pro bono data science skills, and civil society
This opening reflection draws on experience at the interface of academic research and citizen data science to highlight some of the challenges in volunteer and civic society uses of data.

Muki Haklay: learning from citizen science and environmental data
Four snapshots from the past 25 years, drawn from the area of environmental information and its civic-societal use, can show multiple, persistent digital divides, and what we can learn from them at an organisation, community and individual level. This draws on a background of 30 years of looking at the creation of, public access to, and use of environmental information.

Access to Environmental Information
Nearly 25 years ago, in 1997, Friends of the Earth found themselves in a situation where, in the US, the Toxics Release Inventory (TRI), a database of what factories are releasing into the environment, had been open and available to the public since 1986. In 1997 a website (Scorecard) was created that shared this information openly on the web, but such information was not accessible in the UK. Susan Pipes and Lesley St James, two technologists from Friends of the Earth, received a donation of a Sun workstation, an Oracle database and ArcInfo geographic information system (GIS) software. With this free access to about 30,000 of resources, they set up a server providing information about the UK chemical release inventory, using a dataset that was passed to them by the Environment Agency. The FoE system allowed a user to enter a postcode to see what was going on in a particular area via a website called Factory Watch. That changed people’s ability to access information and, a year later, the Environment Agency released a website called What’s In Your Backyard? enabling access to this information.

This snapshot shows the unique ability and innovation of a civic-society organisation, where a web-mapping server, the like of which did not previously exist, was created by the organisation from scratch.

Who could use this information? In 1997 the part of the population with internet access fast enough to browse this website properly was just a few people in universities, potentially some journalists, and a few others (in total, less than 9% of the population). It was about raising awareness, but there was still some distance to go in terms of understanding the information, such as the toxicity of the chemicals in the inventory.

Making data meaningful to civic society
In a second snapshot, going back to 1999, the national resource on air quality provided information that enabled users to view readings from automatic monitoring stations. It was possible to click on each station and on each of the pollutants. However, the output was essentially a CSV (comma-separated values) file of numbers, which had to be made sense of. The resource continued to be updated, and by around 2008 it was possible to ask for a specific location and get information. But what does a “level of benzene” in the area mean to a member of the public? Again, what can society do with these numbers?

In the area of environmental information, data has been accessible and open for a very long time, but how has it led to actionable knowledge? Between 2000 and 2010, air quality was not a priority issue for civic society. By 2010 it had become more prominent, but the data was not enough to be considered actionable information, that is, information that can be used as a basis for action. When information such as air quality data was used in community settings, the feedback was that ‘this is not community information in community language that we can understand.’ The Aarhus Convention was introduced with the assumption that the critical issue for participation in environmental decision-making is access to information. My argument, however, is that while air quality information has been open for 25 years, we have found that communities find it difficult to make sense of it.

A third example concerns the significance of the form in which data is available. Since the International Geophysical Year in 1957, earth scientists and others have been used to working with digital data. The use of computers has increased ever since and is now rarely questioned, even in field settings. In community settings, however, the Breeding Bird Survey was still relying on paper forms in 1986, and a project currently being run by Open Air Laboratories also uses paper forms. So, in community settings, data is not necessarily digital. It is therefore important to be aware that not all data is born digital, and to be aware of the technical abilities of groups. Civic society is made up not only of charities that are big and technically capable, but often of volunteers who are more interested in the issue than in the form of the data.

The final example is from my current European Research Council (ERC) project, Extreme Citizen Science, Analysis and Visualisation (ECSAnVis), which is about creating data collection tools that allow any community, regardless of literacy, to carry out citizen science activities. In an example from the Masai Mara, the community of warriors and the tribes there are concerned about the impact of climate change. With the support of Professor Jacqueline McGlade, who was the Chief Scientist of the UN Environment Programme, they are now collecting data about tree health. They created an icon-based app with which they can collect information about 170 types of trees and record the condition of each one. They recorded 7,000 data points, which are then analysed with AI alongside remotely sensed data.

Literacy here also means technological literacy. We are working with people who have not used digital technology before, and certainly have not used mapping information before. But we have discovered that aerial imagery is accessible to anyone, anywhere, in the sense that they can understand it. We have done enough experiments and studies to know that, given high-resolution and detailed aerial imagery, even people who have never seen it before in digital form can understand it.

Understanding digital data and consent for its use
Still, there are issues in working with non-literate people in remote communities to explain to them the nature and mutability of digital data. There are also questions of how to deal with data that does not belong to an individual but belongs to the community. GDPR and data protection law are about the individual, and do not necessarily translate into the ideas of data ownership held in different communities.

The ECSAnVis project is focusing on creating a visualisation tool that gives people who have not seen digital technology, and are not familiar with the transfer of information, a way to understand how data collected on a device can appear on a server somewhere else. This is done in a way that is understandable and supports a meaningful conversation. We have also used practices such as capturing informed consent on video, rather than on paper with a signature, so that consent can be discussed in contexts where paper forms do not have meaning. Across my work, it has always been the case that the data does not belong to me; at best, I am a custodian of the data. Every time I use it, I need to ask the community for their consent.

Access to technology and resources
Another aspect is the need to dedicate resources to digital updates. I have been running a social enterprise that uses digital tools for 10 years, and the burden of rewriting code from scratch every five years is very heavy, even though it can at least be supported through access to research funding. What is this like for charities that do not have research funding?

PCs were difficult to access in the 1990s. Today, in order to access data science, charities need cloud servers, but what is the financial cost of that, and what skills are needed to set them up? Using GIS is challenging, though it has been made easier by recent apps, and open data can now be downloaded. But when working with marginalised communities, it is important to ensure that they don’t have to pay for their own data access to use apps and data. And access to open data is not guaranteed: the GEOTHINK consortium in Canada demonstrated that some governments are closing down certain open services.

Demographics and data skills
When the issue is simply providing data, there is quite a big group of people who can access the data and use it if it is understandable to them. When the issue is use of a system, you require an understanding of how to use the system, which, by definition, reduces the number of people who can do it. When you get all the way up to creating new systems or setting up proper data collection systems that will work on mobile devices, which requires specialist data science, the number of people with the necessary skills is really small.

There is also a persistent issue with the digital exclusion of some demographics, and a generational divide. There are people in their fifties and sixties who are community activists and would like to use computing and data, but may not have the skills to navigate existing technology. There is untapped potential in the explosion in the number of people with higher education qualifications, including those skilled in data science. But younger tech developers may not always know enough about user-centred design to contribute products that work for a diverse range of users.

Roles for intermediaries
Challenges in accessing data science skills, and in communities not knowing what to look for in data, can be addressed by using intermediaries. Mapping for Change is an intermediary set up to provide this ability to make sense of and access community information. Civic society is not expected to go to the Environment Agency website or to download the data directly, but groups need to know who the intermediary is and how to find them.

But issues around making data meaningful to communities are ‘wicked’ and difficult to solve. We need to keep thinking about them as this area of citizen data science develops.

Discussion
The civil society sector, its challenges and opportunities
It is clear that there are many opportunities for the civil society sector to make use of data, but it also faces a wide range of issues. The table below summarises some opportunities and challenges. The rest of this workshop report sets out the ways in which these challenges can be addressed by putting data governance principles into practice.

Issue: Data skills in civil society
Opportunities: There is a wealth of volunteers who are willing to support civil society.
Challenges: Civil society organisations need more skills in several areas, including technological skills, research method skills, understanding of ethics, and skills in community engagement and collaboration. The lack of skills and capacity within civil society could be addressed by involving more people, additional funding and more collaboration. It is important to ensure that willing volunteers are valued and cared for.

Issue: Access to data by civil society
Opportunities: Some data is open and available.
Challenges: There can be challenges to accessing data, even when data is available. Making data open alone is not enough; there needs to be further support to make it accessible, including support for developing skills. There are challenges relating to data quality and discoverability, with a lack of common standards and vocabularies relating to data. There are also questions around consent to use the data.

Issue: Defining data challenges
Opportunities: Enthusiasm, skills and influencing policy.
Challenges: Initiatives might garner enthusiasm and maybe even skills, but there is a challenge of problem definition that some organisations struggle to address. This is the first stage: outlining why the project is being carried out, and what purpose and outcome we are looking for, beyond the existence or availability of the data. This means encouraging more of the strategic thinking that might be needed, ie the idea of strategy versus hacktivism. This can lead to opportunities to inform policy and to find and address gaps in services.

Issue: Securing long-term benefits
Opportunities: Efforts in different areas within civil society, of people working with data and building relationships around data, can be really powerful; 360Giving and Open Data Manchester are examples.
Challenges: A challenge for civil society groups is ensuring that the open data is available, useful and documented, allowing effective use and avoiding the risks of misinterpretation. What happens after a data-led civil society programme runs its course? Is the data left to the side and never used again, or is it incorporated into corporate systems? How can it be ensured that the organisations that are supposedly benefiting from these efforts do actually benefit?

Key challenges to consider are: how can civil society enable people to empower themselves? How do civil society organisations ensure that their use of data really does empower poor and marginalised communities?

Transparent, inclusive and democratic decision-making about trade-offs
Open Data Institute: the example of collaborative data maintenance
Leigh Dodds, Rachel Wilson, Chris Thorpe and Julian Tait
The Royal Society and British Academy Data management and use report highlights that there are tensions to be navigated in the governance of data, for example between individual and collective rights, as discussed in the next section. It argues that if these trade-offs are to be navigated transparently, then all of those affected should have real and effective opportunities to participate in making the choices. How does this apply in citizen data science?

The Open Data Institute (ODI) defines collaborative data maintenance as the process and data infrastructure by which organisations and communities share the responsibility and work to collect, maintain, govern and use data. Collaborative maintenance occurs in a wider context of “open culture”, including open standards, open source code and open data.

Open culture
Open standards are reusable agreements, available for anyone to access, use or share, that can shape how we choose to collect, share and use data. They can be highly technical, for example focusing on file formats or data structures, or higher-level, such as standardised codes of practice or checklists. Open standards are most effective when all the organisations that might be affected by the adoption of a standard come together to help shape it. Open source code is when organisations or individuals work together to create reusable code and applications, resulting in shared resources that help to deploy websites or data collection. Open data involves publishing data under an open licence, so that it can be accessed, used and shared by anyone for any purpose. Open data has historically been about increasing transparency and accountability, but recently the focus has shifted towards using open data to solve shared challenges and to address social, economic and environmental problems.

Collaborative maintenance
Collaborative data maintenance goes further than either open source code or open data, emphasising collaboration across the data lifecycle of collection, maintenance, governance and use. An example is OpenStreetMap and Humanitarian OpenStreetMap: a collaboratively produced map of the world, developed by enthusiasts and by large organisations such as Microsoft, Uber and Apple as a shared system, with the community collectively involved in data collection, addressing data gaps and maintaining data accuracy.

Research conducted by the ODI has explored the different ways in which communities can be involved in collaborative data maintenance. These range from deciding what data to collect, to ensure relevance, fairness and equity; to sharing the maintenance and governance of the data in terms of data access, quality and inclusive engagement; to working with open source code when developing the tools that support data collection, use and management. In so doing, collaborative data maintenance can engage communities and organisations in ways that address possible tensions around data governance.

To support the application of this framework in different scenarios, the ODI has produced a Collaborative Data Maintenance Guidebook (see resource guide). For example, if you are collecting data with the community, how do you manage quality when people with very different skills and experience, using different data-collection devices, are contributing to that dataset? How might you be transparent about how data quality is managed? And how might you build consensus across the community? To help navigate these issues, the language of patterns can be borrowed from architecture: each building is unique, but architects must manage similar tensions or challenges, such as ensuring enough sunlight without causing excessive heat. A pattern catalogue is not overly prescriptive; instead, it allows people to quickly recognise that elements of a particular solution might work for them and the issue they are working on, because of similarities in their context.

Leadership and data strategy
Data leadership is central to collaborative data maintenance. Within large organisations, there are people who understand the value of data, but that awareness may not exist at the top of those organisations. Within smaller organisations, there may be fear of uncovering processes that are not as robust as expected. To address these issues, adopting a standard is key, and this should be driven by leadership that is focused on making teams feel comfortable with the process of adopting more open practices. There is no expectation to be at a gold standard at the beginning of the implementation phase. It is more important to focus on the journey towards an enhanced and comprehensive data strategy.

Promoting this culture depends on leadership across a number of stakeholders:
Leadership in government: The National Data Strategy should provide more training opportunities, as well as more opportunities for organisations to receive financial support to improve their data practices through organisational change and access to training.
Leadership in big data companies: A lot of big data companies use their corporate social responsibility (CSR) funds to support exciting projects; however, they could improve how they channel and focus that funding and their in-house skills.
Leadership in charities: Trustees are key in charities. Digital trustee roles should be standard, and having a leader who drives data strategy should be considered good practice. It is also worth considering the feasibility of funding local Council for Voluntary Services (CVS) umbrella organisations to support collaborative maintenance projects. Charities exist for their beneficiaries, and trustees need to be clear on the benefits offered by collaborative data maintenance. That includes the ability to show impact in relation to a charity’s aims.

Managing organisational capacity and volunteers
Charities and voluntary organisations are often working at capacity with very limited time and resources. Resources are likely to need triaging to ensure they are spent most appropriately, especially where volunteer resources are limited. Strategic thinking is required to use technical volunteers and data resources collectively and in an impactful way.

Collaborative maintenance projects are often successful because they have made space to work with communities to develop solutions together. In doing so, the cost and burden are spread across organisations, sectors and networks. This type of multi-level engagement pays dividends in the long term.

There is also work to be done on how to make collaborative data management work for less exciting topics or projects. It is easy to get people involved in nature-conservation work, for example, but there are other projects that people will not want to give up their time for. Finding a way to increase the appeal of these types of collaborative projects will therefore be important to realise their potential impact.

Power dynamics
In addition to time and funding constraints, the power imbalances relating to funding need more attention: who is funding a project and why? Collaborative maintenance may help address some of the tensions around co-production, equity and ownership. This requires deep and strategic thinking about the problems organisations are trying to solve with data collection, about power dynamics in terms of who is guiding and leading these projects, and about the extent to which there is real opportunity for communities to get involved in shaping them. It is important to avoid a situation where those who are hosting data or providing the governance are also the ones with greater power. There is some ambition among those who are least advantaged in society to have a say or to be involved in the process of creating these datasets, and this would start to even out the potential imbalances. It is important to involve such groups in governance, to avoid hierarchies and power asymmetries.

Individual and collective rights and interests
Data practices and social value
Another key principle for data governance from the report Data management and use was that data governance should offer meaningful and effective protection against both tangible and intangible harms, such as discriminatory treatment and exclusion from opportunities respectively. It should protect both individual rights, goods and benefits, such as health, and collective rights, goods and benefits, such as protection of the environment.

The Ada Lovelace Institute: the need to rethink data
Reema Patel, Jenny Brennan and Silvia Mollicchi
Rethinking Data is the Ada Lovelace Institute’s largest-scale research and public engagement programme. It considers data systems, their complex and emergent properties, and the interaction between people and data systems. The challenge with data is its symbiotic relationship with people and society. In an emerging and complex system, it is essential to revisit the fundamental concepts that underpin how people think about data itself. It is therefore necessary to rethink data through narratives, practices and regulation.

The opening position here is that data is never neutral; it reflects society. That means there is an interesting tension between data conceptualised as, in some way, an objective description of reality, and data as non-objective in the sense that it is created by people and often serves a certain purpose or delivers a certain outcome.

Rethinking data: key issues
Many issues around the exploitation of data have been articulated in different contexts. Data is often exploited in different ways, including through enclosure models, in which it is gathered, deployed and then enclosed. Data enclosure may inhibit the ability to treat data as a public good, to achieve its social value, and even to fully understand how it can have social value.

The rate of change in the emergent system is another issue. There are political and administrative institutions, in the UK and beyond, that struggle to govern data holistically and to acknowledge the central role it has in the modern world. The governments of France, Germany and the UK have all been working very rapidly to understand the emerging issues raised by the new use and governance of data.

There is a challenge around agency and over how data is used. If data is thought of as co-created by people, groups and society, it raises compelling questions about the nature of the relationship between the organisations that often use, deploy and apply data, and the individuals to whom it relates. The conversations we have about relationships between the NHS, patients going through the NHS, and third-party organisations such as DeepMind or Amazon are examples of the power asymmetries that may exist in the use of data.

Shaping a new future: key concepts
To shape future relationships between people, communities, data and AI, some key concepts need to be well understood. These include the following ideas and functions.

The case for the social value of data needs to be made and emphasised. It is essential to identify and recognise that injustices exist in the governance and use of data. There are asymmetries of power, so how should they be tackled to have a more inclusive conversation about the benefits of data for everyone?

Data stewardship is a necessary and invaluable function in terms of the relationships between organisations holding data and data subjects, and in terms of the rights and responsibilities between them. What do those responsibilities look like, and what does stewardship look like in the context of these rights and responsibilities?

Purpose-driven innovation is the idea that innovation should be responsible for generating outcomes that work for people in society. This is about ensuring that the infrastructure created for the effective use of data, and to enable innovation, is responsible, legitimate and trustworthy, and works for people and society.

Similarly, progress cannot be made unless the right regulatory frameworks and incentives are in place in the system, and so it is about developing data rights. What is meant by ‘data rights’ is an interesting question, as there is often a focus on a very individualistic conception of data rights rather than a recognition of rights that may belong to groups of people.

Creating an inclusive language is important, along with understanding how the narratives around data affect who can be involved in the discourse about data.

The notion of what good data practices look like ac
