Handbook on Data Quality Assessment Methods and Tools


EUROPEAN COMMISSION
EUROSTAT

Handbook on Data Quality Assessment Methods and Tools

Mats Bergdahl, Manfred Ehling, Eva Elvers, Erika Földesi, Thomas Körner, Andrea Kron, Peter Lohauß, Kornelia Mag, Vera Morais, Anja Nimmergut, Hans Viggo Sæbø, Ulrike Timm, Maria João Zilhão
Manfred Ehling and Thomas Körner (eds)

Cover design: Siri Boquist
Photo: Crestock


Contributors to the handbook:
Manfred Ehling (chair), Federal Statistical Office Germany
Thomas Körner (chair till 6.2.2007), Federal Statistical Office Germany
Mats Bergdahl, Statistics Sweden
Eva Elvers, Statistics Sweden
Erika Földesi, Hungarian Central Statistical Office
Andrea Kron, Federal Statistical Office Germany
Peter Lohauß, State Statistical Institute Berlin-Brandenburg
Kornelia Mag, Hungarian Central Statistical Office
Vera Morais, National Statistical Institute of Portugal
Anja Nimmergut, Federal Statistical Office Germany
Hans Viggo Sæbø, Statistics Norway
Katalin Szép, Hungarian Central Statistical Office
Ulrike Timm, Federal Statistical Office Germany
Maria João Zilhão, National Statistical Institute of Portugal

Wiesbaden, 2007

Reproduction and free distribution, also of parts, for non-commercial purposes are permitted provided that the source is mentioned. All other rights reserved.

Contents

1 Introduction
1.1 Scope of the Handbook
1.2 Aspects of Data Quality
2 Data Quality Assessment Methods and Tools
2.1 Quality Reports and Indicators
2.2 Measurement of Process Variables
2.3 User Surveys
2.4 Self-assessment and Auditing
3 Labelling and Certification
3.1 Labelling
3.2 Certification to the International Standard on Market, Opinion and Social Research (ISO 20252:2006)
4 Towards a Strategy for the Implementation of Data Quality Assessment
4.1 The Fundamental Package
4.2 The Intermediate Package
4.3 The Advanced Package
4.4 Recommendations
ANNEX A: General Framework of Data Quality Assessment
ANNEX B: Examples
  Examples for Chapter 2.1: Quality Reports and Indicators
  Examples for Chapter 2.2: Measurement of Process Variables
  Examples for Chapter 2.3: User Surveys
  Examples for Chapter 2.4: Self-assessment and Auditing
  Examples for Chapter 3.1: Labelling
  Examples for Chapter 3.2: Certification to the International Standard on Market, Opinion and Social Research (ISO 20252:2006)
ANNEX C: Basic Quality Tools
ANNEX D: Glossary
Abbreviations
List of Figures and Tables
References

1 Introduction

Production of high-quality statistics depends on the assessment of data quality. Without a systematic assessment of data quality, the statistical office risks losing control of the various statistical processes, such as data collection, editing or weighting. Doing without data quality assessment would amount to assuming that the processes cannot be further improved and that problems will always be detected without systematic analysis. At the same time, data quality assessment is a precondition for informing users about the possible uses of the data, or about which results could be published with or without a warning. Indeed, without good approaches to data quality assessment, statistical institutes are working blind and can make no justified claim of being professional and of delivering quality in the first place. Assessing data quality is therefore one of the core aspects of a statistical institute's work.

Consequently, the European Statistics Code of Practice highlights the importance of data quality assessment in several instances. Its principles require an assessment of the various product quality components, such as relevance, accuracy (sampling and non-sampling errors), timeliness and punctuality, accessibility and clarity, as well as comparability and coherence. At the same time, the Code requires systematic assessments of the processes, including the operations in place for data collection, editing, imputation and weighting, as well as the dissemination of statistics.

Several efforts to implement data quality assessment methods have been undertaken in recent years. Following the work of the Leadership Expert Group (LEG) on Quality, some development projects have been carried out on assessment methods such as self-assessment, auditing and user satisfaction surveys (Karlberg and Probst 2004). A number of National Statistical Institutes (NSIs) have also developed national approaches (see, e.g., Bergdahl and Lyberg 2004).
Nevertheless, although the importance of the topic is generally agreed, there is no coherent system for data quality assessment in the European Statistical System (ESS). The report on the ESS self-assessment against the European Statistics Code of Practice points in this direction and suggests that quality control and quality assurance in the production processes are not very well developed in most NSIs (Eurostat 2006c).

This Handbook on Data Quality Assessment Methods and Tools (DatQAM) aims to facilitate a systematic implementation of data quality assessment in the ESS. It presents the most important assessment methods: quality reports, quality indicators, measurement of process variables, user surveys, self-assessment and auditing, as well as the approaches of labelling and certification. The handbook provides a concise description of the data quality assessment methods currently in use. Furthermore, it gives recommendations on how these methods and tools should be implemented and how they can reasonably be combined: an efficient and cost-effective use of the methods requires that they are used in combination with each other. For example, quality reports could be the basis for audits and user feedback. The handbook presents numerous successful examples of such combinations. Through its recommendations, the handbook also aims at a further harmonisation of data quality assessment in the ESS and at a coherent implementation of the European Statistics Code of Practice.

The handbook is primarily targeted at quality managers in the ESS. It is intended to enable them to introduce, systematise and improve the work carried out in the field of data quality management in the light of the experiences of colleagues from other statistical institutes within the ESS. The handbook should also help to avoid overburdening subject-matter statisticians with assessment work and to make data quality assessment an effective support for their work.
Finally, the handbook should support top management in their managerial planning in the quality field.

After a short presentation of the basic quality components for products, processes and user perception, chapters 2 and 3 give concise descriptions of each of the methods. The presentation focuses on the practical implementation of the methods and, where applicable, their interlinkages. The handbook also names up-to-date examples from statistical institutes (see ANNEX B). To facilitate the use of the handbook, the chapters presenting the methods follow a standardised structure covering the following items:
- Definition and objectives of the method(s)
- Description of the method(s)
- Experiences in statistical institutes
- Recommendations for implementation
- Interlinkages with other methods (where applicable)
- Recommended readings

Chapter 4 proposes a strategy for the implementation of the methods in different contexts. The handbook recommends a sequential implementation of the methods, identifying three packages with increasing levels of ambition. Of course, a particular NSI may apply methods and tools from different packages at the same time, given the particular circumstances in which it functions.

Because the length of the handbook is strictly limited, it cannot go into much detail. In order to present more examples and to elaborate certain aspects in more detail, a comprehensive annex is provided together with the handbook. First, it includes a background paper on the position of data quality assessment in the general framework of quality management (ANNEX A). ANNEX B presents good practice examples in more detail. Furthermore, the annex provides a systematic presentation of basic quality tools (ANNEX C) and a glossary (ANNEX D).

1.1 Scope of the Handbook

Data quality assessment is an important part of the overall quality management system of a statistical agency (see ANNEX A for more details). However, its scope is limited to the statistical products and certain aspects of the processes leading to their production. Thus, the handbook does not cover areas such as support processes, management systems or leadership. Nor does it cover the institutional environment of statistics production. Figure 1 shows the scope of DatQAM within the context of quality management.
It also refers to the relevant principles in the European Statistics Code of Practice.

Figure 1: Scope of the handbook within the context of quality management

Elements of a quality management system, with the corresponding principles from the European Statistics Code of Practice:
- User needs
- Management systems & leadership
- Statistical products: relevance, accuracy and reliability, timeliness and punctuality, coherence and comparability, accessibility and clarity
- Support processes
- Production processes: sound methodology, appropriate statistical procedures, non-excessive burden on respondents, cost effectiveness
- Institutional environment: professional independence, mandate for data collection, adequacy of resources, quality commitment, statistical confidentiality, impartiality and objectivity

The methods and tools presented in this handbook facilitate an assessment of statistical products, statistics production processes, and the user perception of statistical products.

Before discussing methods and tools, it should be clarified what is meant by a method and what is meant by a tool. In the context of this handbook, the term assessment method refers to the approach of evaluation, e.g. documenting/reporting, calculating (indicators), auditing, self-assessing or questioning the user. The term assessment tool refers to the concrete form in which the method is implemented, e.g. producing a quality report, calculating key indicators, an auditing procedure, a checklist or a user survey.

To a certain degree, the methods rely on a number of preconditions. On the one hand, the application of data quality assessment methods always requires some basic information on the products and processes under consideration. For this reason, at least a basic systematic quality measurement regarding processes and products should be in place. There also has to be some documentation system giving access to key characteristics of the products and processes. On the other hand, data quality assessment methods require an (internal or external) reference against which the assessment can be carried out. Such a reference can be provided in the form of general quality guidelines, policies, minimum standards, ISO (International Organization for Standardization) standards or process-specific guidelines (e.g. for questionnaire testing or editing). Similarly, user requirements are a further key input to data quality assessment.

As figure 2 shows, different data quality assessment me
