Chapter 5. The Role Of Big Data For ICT Monitoring And For Development

Measuring the Information Society Report 2014

Chapter 5. The role of big data for ICT monitoring and for development

5.1 Introduction

One of the key challenges in measuring the information society has been the lack of up-to-date and reliable data, in particular from developing countries. The information and communication technologies (ICT) sector is evolving rapidly, as are the types of service and application that are driving the information society, all of which makes identifying and tracking new trends even more challenging. As the key global source for internationally comparable ICT statistics, ITU is continuously working to improve the availability and quality of those statistics and identify new data sources. In this context, the emergence of big data holds great promise, and there is an opportunity to explore their use in order to complement the existing, but often limited, ICT data.

There is no unique definition of the relatively new phenomenon known as big data. At the most basic level it is understood as being datasets whose volume, velocity or variety is very high compared to the kinds of datasets that have traditionally been used. The emergence of big data is closely linked to advances in ICTs. In today’s hyper-connected digital world, people and things leave digital footprints in many different forms and through ever-increasing data flows originating from, among other things, commercial transactions, private and public records that companies and governments collect and store about their clients and citizens, user-generated online content such as photos, videos, tweets and other messages, but also traces left by the Internet of Things (IoT), i.e. by those uniquely identifiable objects that can be tracked.

Big data have great potential to help produce new and insightful information, and there is a growing debate on how businesses, governments and citizens can maximize the benefits of big data.
Although it was the private sector that first used big data to enhance efficiency and increase revenues, the practice has expanded to the global statistical community. The United Nations Statistical Commission (UNSC) and national statistical organizations (NSOs) are looking into ways of using big data sources to complement official statistics and better meet their objectives for providing timely and accurate evidence for policy-making.1

So far, there is limited evidence as to the value added by big data in the context of monitoring of the information society, and there is a need to explore its potential as a new data source. While existing data can provide a relatively accurate picture of the spread of telecommunication networks and services, there are significant data gaps when it comes to understanding the development of the information society. Relatively little information, for example, is available on the demand side. While an increasing number of countries currently collect data on the individual use of ICTs, many developing countries do not produce such information (collected through household surveys or national population and housing censuses) on a regular basis. Consequently, not enough data are available about the types of activity that the Internet is used for, and little is known about the Internet user in terms of age, gender, educational or income level, and so on.

In other areas, such as education, health or public services, even fewer data are available to show developments over time and enable informed policy decisions. The recently published Final WSIS Targets Review report (Partnership, 2014), which attempts to assess developments in the information society between 2003 and 2013/14, shows that little information is available to track progress over time.
It is obvious that greater efforts must be made to overcome the lack of reliable, timely and relevant statistics on the information society, and that big data have the potential to help realize those efforts.

In addition to the data produced and held by telecommunication operators, the broader ICT sector, which includes not just telecommunication companies but also over-the-top (OTT) service providers such as Google, Twitter, Facebook, WhatsApp, Netflix, Amazon and many others, captures a wide array of behavioural data. Together, these data sources hold great promise for ICT monitoring, and this chapter will explore the potential of today’s hyper-connected digital world to expand on existing access and infrastructure indicators and move towards indicators on use, quality and equality of use.

At the same time, there is a growing debate on the role and potential of big data when it comes to providing new insights for broader social and economic development. Big data are already being leveraged to understand socio-economic well-being, forecast unemployment and analyse societal ties. Big data from the ICT industry play a particularly important role because they are the only stream of big data with global socio-economic coverage. In particular, mobile telephone access is quasi-ubiquitous, and ITU estimates that by the end of 2014 the number of global mobile subscriptions will be approaching 7 billion. At the same time, almost 3 billion people – 40 per cent of the world’s population – will be using the Internet. In recent years, moreover, the strongest growth in telecommunication access and use has been recorded in the developing economies, where ICT penetration levels have increased and where big data hold great promise for development. However, while there are a growing number of research collaborations and promising proof-of-concept studies, no significant project has yet been brought to a replicable scale in the development sphere.
Future efforts will have to overcome a number of barriers, including the development of models to protect user privacy while at the same time allowing for the extraction of insights that can improve service delivery to low-income populations. To this end, this chapter will contribute to the debate on big data for development, highlight advances, point to some best practices and identify challenges, including in regard to the production and sharing of big data for development.

The chapter will first (in Section 5.2) describe some of the current big data trends and definitions, highlight the technological developments that have facilitated the emergence of big data, and identify the main sources and uses of big data, including the use of big data for development and ICT monitoring. Section 5.3 will examine the range and type of data that telecommunication companies, in particular mobile-cellular operators, produce, and how those data are currently being used to track ICT developments and improve their business. Section 5.4 looks at the ways in which telecom big data may be used to complement official ICT statistics and assist in the provision of new evidence for a host of policy domains, while Section 5.5 discusses the challenges of leveraging big data for ICT monitoring and broader development, including in terms of standardization and privacy. It will also make some recommendations for mainstreaming and fully exploiting telecom big data for monitoring and for social and economic development, in particular with regard to the different stakeholders involved in the area of big data from the ICT industry.

5.2 Big data sources, trends and analytics

With the origins of the term “big data” being shared between academic circles, industry and the media, the term itself is amorphous, with no single definition (Ward and Barker, 2013). At the most basic level of understanding, it usually refers to large and complex datasets, and reflects advances in technology that make it possible to capture, store and process increasing amounts of data from different data sources. Indeed, one of the key trends fostering the emergence of big data is the massive “datafication” and digitization, including of human activity, into digital “breadcrumbs” or “footprints”.

In an increasingly digitized world, big data are generated in digital form from a number of sources. They include administrative records (for example, bank or electronic medical records), commercial transactions between two entities (such as online purchases or credit card transactions), sensors and tracking devices (for example, mobile phones and GPS devices), and activities carried out by users on the Internet (including searches and social media content) (Table 5.1).

Big data is not just about the volume of the data.
One of the earliest definitions, introduced by the Gartner consultancy firm, describes big data characteristics such as velocity and variety, in addition to volume (Laney, 2001). “Velocity” refers to the speed at which data are generated, assessed and analysed, while the term “variety” encompasses the fact that data can exist as different media (text, audio and video) and come in different formats (structured and unstructured). The three-Vs definition has caught on and been expanded upon. A fourth V – veracity – was introduced to capture aspects relating to data quality and provenance, and the uncertainty that may exist in their analysis (IBM, 2013). A fifth V – value – is included by some to acknowledge the potentially high socio-economic value that may be generated by big data (Jones, 2012) (Figure 5.1).

Table 5.1: Sources of big data
Sources | Some examples
Administrative data | Electronic medical records; Insurance records; Tax records
Commercial transactions | Bank transactions (inter-bank as well as personal); Credit card transactions; Supermarket purchases; Online purchases
Sensors and tracking devices | Road and traffic sensors; Climate sensors; Equipment and infrastructure sensors; Mobile phones; Satellite/GPS devices
Online activities/social media | Online search activities; Online page views; Blogs and posts and other authored and unauthored online content and social media activities; Audio/images/videos
Source: ITU, adapted from UNSC (2013).

Figure 5.1: The five Vs of big data
VOLUME: Vast amounts of data generated through large-scale datafication and digitization of information
VELOCITY: Speed at which data are generated and analyzed
VARIETY: Different types and forms of data, including large amounts of unstructured data
VERACITY: Level of quality, accuracy and uncertainty of data and data sources
VALUE: Potential of big data for socioeconomic development
Source: ITU.

Included within the scope of big data is the category of transaction-generated data (TGD),2 also sometimes described as “data exhaust” or “trace data”. These are digital records or traces that have been generated as by-products of doing things (such as processing payments, making a phone call and so on) that leave behind bits of information. The value of this subset of big data is that it is directly connected to human behaviour and its accuracy is generally high. Most of the data captured by telecommunication companies can be classified as TGD.

Big data uses by the private and public sectors

As is often the case with technological innovation, it is the private sector that has been at the forefront of extracting value from this data deluge. Encouraged by promising results but also reduced budgets, the public sector is turning towards big data to improve its service delivery and increase operational efficiency. In addition, there are uses for big data in broader development and monitoring, and there is an increasing focus on big data’s role in producing timely (even real-time) information, as well as new insights that can be used to drive social and economic well-being.

Marketing professionals, whose constant aim is to understand their customers, are now increasingly shifting from conventional methods, such as surveys, to the extraction of customer preferences from the analysis of big data. Walmart, the world’s biggest retailer, has been one of the largest and earliest users of big data. In 2004, it discovered that the snack food known as Pop Tarts was heavily purchased by United States citizens preparing for serious weather events such as hurricanes. The correlation analysis revealed a behaviour associated with a specific condition that then led Walmart to improve its production chain – in this case, by increasing the supply of Pop Tarts to areas likely to be affected by a disaster. Walmart has also made use of predictive analytics, which uses personal information and purchasing patterns to extrapolate to a likely future behaviour, and to better target and address customer needs. Together, large-scale automated correlation analysis and predictive analytics are two of the key techniques that have helped unleash the value of big data.

Nor is the private sector’s use of big data techniques restricted solely to market research. Companies and whole industries (healthcare, energy and utilities, transport, etc.) are using such techniques to optimize supply chains and production (see Box 5.1 for an example from the energy industry). New value is extracted by being able to link new information on customers to the production process in a way that enables companies to tailor and segment their products at low cost. Firms that are highly proficient in their use of data-driven decision-making have been found to have productivity levels up to 6 per cent higher than firms making minimal to no use of data for decision-making (Brynjolfsson, Hitt and Kim, 2011). Significantly, industries now have the ability to conduct controlled experiments at a scale and with a speed that are unprecedented. Google, for example, is running about a thousand experiments at any given point in time (Varian, 2013a). Telecom network operators make extensive use of such techniques when rolling out new services, among other things for the purpose of pricing.
Telecom operators also use big data techniques to understand and control churn, optimize their management of customer relations and manage their network quality and performance.

Box 5.1: How big data saves energy – Vestas Wind Systems improves turbine performance

Vestas, a global energy company dedicated to wind energy, with installations in over 70 countries, has used big data platforms to improve the modelling of wind energy production and identify the optimal placement for turbines.

Wind turbines represent a major investment and have a typical lifespan of 20 to 30 years. To determine the optimal placement for a turbine, a large number of location-dependent factors must be considered, including temperature, precipitation, wind velocity, humidity and atmospheric pressure. By using big data techniques based on a large set of factors and an extended set of structured and unstructured data, Vestas was able to significantly improve customer turbine location models and optimize turbine performance.

Big data have enabled the creation of a new information environment and allowed the company to manage and analyse weather and location data in ways that were previously not possible. These new insights have led to improved decisions relating not only to wind turbine placement and operation, but also to more accurate power-production forecasts, not to mention greater business-case certainty, speedier results, and increased predictability and reliability. This reduces the cost to customers per kilowatt-hour produced, while increasing the accuracy of the customer’s return-on-investment estimates.

Source: ITU, based on IBM (2012).

These fundamental shifts in data exploitation to generate new socio-economic value, coupled with the simultaneous emergence of new rich data sources that can potentially be linked together and analysed with ease, have also sparked the interest of governments, researchers and development agencies. Encouraged by the potential of big data to produce new insights, and also by slimmer budgets, governments (at all levels) are now looking to exploit big data and increase the application of data analytics to a range of activities, including monitoring and improvement of tax compliance and revenues, crime detection and prediction, and improvement of public service delivery (Giles, 2012; Lazer et al., 2009).

To this end, governments, in addition to the data they collect and generate themselves, complement their official statistics by leveraging data from new sources, including crowd-sourced data generated by the public. In the United States, for example, Boston City Hall released the mobile app “Street Bump”, which uses a phone’s accelerometer to detect potholes while the app user is driving around Boston and notifies City Hall.3 Some of the richest data sources for enabling governments and development agencies to improve service delivery are actually external. Such external data include those captured and/or collected by the private sector, as well as the digital breadcrumbs left behind by citizens as they go about their daily lives.

According to a recently published White House report, United States government agencies can make use of public and private databases and big data analytics to improve public administration, from land management to the administration of benefits. The Department of the Treasury has set up a “Do Not Pay” portal, which links various databases and identifies ineligible recipients to avoid wrong payments and reduce waste and fraud4 (The White House, 2014).

Big data for development and ICT monitoring

One of the richest sources of big data is the data captured by the use of ICTs. This broadly includes data captured directly by telecommunication operators as well as by Internet companies and by content providers such as Google, Facebook, Twitter, etc. Big data from the ICT services industry are already helping to produce large-scale development insights of relevance to public policy. Collectively, they can provide rich and potentially real-time insights to a host of policy domains. It should be noted that in some countries and regions the use of big data, including big data from the ICT industry, is subject to national regulation.
In the EU, for example, a number of directives require data producers to obtain users’ consent before gathering any of their personal data.5

One of the best-known examples of leveraging the online population’s digital breadcrumbs for development purposes is Google Flu Trends (GFT). Following its launch in 2008, GFT was remarkably accurate in tracking the spread of influenza in the United States, doing so more rapidly than the Centers for Disease Control and Prevention (CDC), with a lag time of only one day as opposed to one week. Although it has since been subject to criticism (see Section 5.5), GFT was held up as an outstanding example of big data in action and of the great potential of big data for broader development and monitoring (Mayer-Schönberger and Cukier, 2013; McAfee and Brynjolfsson, 2012). GFT worked by monitoring health-seeking behaviour expressed through online searches, with the search terms being correlated wherever they related to flu-like symptoms (Ginsberg et al., 2009). This proved to be so successful that it spawned similar efforts focusing on the use of search-engine data to understand dengue fever outbreaks,6 monitor prescription drug use (Simmering, Polgreen and Polgreen, 2014), predict unemployment claims in the United States (Choi and Varian, 2009) and Germany (Askitas and Zimmermann, 2009), and forecast near-term values for economic indicators such as car and home sales and international visitor statistics (Choi and Varian, 2012).

The Internet has also been a rich source of big data beyond the realm of user search terms. Online job-posting data are being used to supplement traditional labour statistics in the United States7 and other countries. In another effort, an academic project at MIT known as the Billion Prices Project collects high-frequency price data from hundreds of online retailers.8 The data are then used by researchers to understand a whole host of macroeconomic questions relating to, among other things, pricing behaviour, daily inflation and asset-price movements.
This has the advantage of providing near real-time inflation statistics that are traditionally published monthly.
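
The search-based indicators described above, from flu tracking to unemployment claims, rest on a simple idea: an online activity series is compared with an official series, often at several time lags, to see whether the online signal moves ahead of the published figure. The following is a minimal sketch of that idea in Python, and it is not the method used by Google Flu Trends or by any of the studies cited. Both weekly series are invented purely for illustration, and the function names `pearson` and `best_lead` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lead(search, official, max_lead):
    """Return (lead, r) for the lead, in periods, at which the search
    series correlates most strongly with later official values."""
    results = []
    for lead in range(max_lead + 1):
        # Pair search[t] with official[t + lead].
        xs = search[: len(search) - lead]
        ys = official[lead:]
        results.append((lead, pearson(xs, ys)))
    return max(results, key=lambda item: item[1])


# Invented weekly data (assumption): share of searches containing
# flu-related terms, and officially reported cases that echo the
# search series one week later.
search_share = [1.0, 1.2, 2.0, 3.5, 5.0, 4.0, 2.5, 1.5, 1.1, 1.0]
reported_cases = [100, 100, 120, 200, 350, 500, 400, 250, 150, 110]

lead, r = best_lead(search_share, reported_cases, max_lead=3)
print(f"best lead: {lead} week(s), correlation {r:.2f}")
```

A high correlation at a positive lead only suggests that the online series may be useful as an early indicator; as the criticism of GFT noted in this chapter indicates, such relationships need to be re-validated over time.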

Box 5.2: How Twitter helps understand key post-2015 development concerns

As the process of formulating the post-2015 development agenda continues, UN Global Pulse and the Millennium Campaign are using big data and visual analytics to identify the most pressing development topics that people around the world are concerned about and consider a priority.

Their interactive visualization tool shows the 16 topics that people have tweeted the most about. Users can select a country to see the number of tweets generated by its Twitter users in regard to the highlighted topic, as compared to tweets about all the other topics. This information provides insight as to which of the various post-2015 issues are being talked about the most.

In September 2014, at the global (“all countries”) level, An honest and responsive government was the key priority, followed by Better job opportunities and Freedom from discrimination. Also highly ranked, in 7th position, was Phone and Internet access. By clicking on any of the data points in the chart, the application provides information on the number of tweets (per month) for each topic. It also lists the top words that those tweets contained.

Chart Box 5.2: Using Twitter to visualize trends in global development topics
[Line chart: number of tweets per month (0 to 1'200'000) by topic, September 2013 to September 2014. Topics shown: An honest and responsive government; Better job opportunities; Freedom from discrimination; A good education; Political freedoms; Protecting forests rivers and oceans; Phone and Internet access; Equality between men and women.]
Source: UN Global Pulse, see: http://post2015.unglobalpulse.net/#.

UN Global Pulse, a UN initiative to use big data for sustainable development and humanitarian action, has been mining Twitter data from Indonesia (where Twitter usage is high)9 to understand food price crises.
Global Pulse was able to identify a consistent pattern among specific food-related tweets and the daily food price index. In fact, it was able to use predictive analytics on the Twitter data to forecast the consumer price index several weeks in advance (Byrne, 2013). As discussions on the post-2015 development agenda continue, UN Global Pulse is also using Twitter data to understand and compare the relevance of different development topics among countries (Box 5.2).

In fact, the ICT sector is itself using the Internet as a source of big data for monitoring purposes. Regulators and others are now using the Internet to crowd-source quality of service (QoS) data on broadband quality. For example, the United States Federal Communications Commission (FCC) has released mobile apps that enable consumers to check their broadband quality. The test results, which are anonymous, are then used by FCC to understand and address coverage and quality issues in different areas.10
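
Crowd-sourced QoS results of the kind just described are typically useful only once they are aggregated by area. The sketch below illustrates one possible aggregation, a median download speed per area with a threshold check; it is an assumption made for illustration and does not describe how FCC processes its data. The area names, speeds, threshold and the function names `median_speed_by_area` and `areas_below` are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_speed_by_area(tests):
    """Group (area, download_mbps) results and return the median per area."""
    by_area = defaultdict(list)
    for area, mbps in tests:
        by_area[area].append(mbps)
    return {area: median(values) for area, values in by_area.items()}


def areas_below(medians, threshold_mbps):
    """Return, sorted by name, the areas whose median is below the threshold."""
    return sorted(a for a, m in medians.items() if m < threshold_mbps)


# Invented anonymous test results (assumption): no user identifiers,
# only a coarse area label and a measured download speed in Mbit/s.
tests = [
    ("north", 12.0), ("north", 8.0), ("north", 10.0),
    ("south", 2.0), ("south", 3.0),
    ("east", 25.0),
]

medians = median_speed_by_area(tests)
print(medians)
print(areas_below(medians, threshold_mbps=5.0))
```

The median is used here rather than the mean because a handful of very fast or very slow tests would otherwise dominate the figure for an area with few measurements.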

Mobile data

Despite the rapid growth in Internet access, 60 per cent of the world’s population is still not using the Internet. Household Internet penetration in developing economies is expected to reach 31 per cent by the end of 2014, as against almost 80 per cent in developed economies. In addition, as Internet penetration rates remain limited, Internet users are not (yet) representative of the population at large. For example, Internet users tend to be younger, relatively well educated, with men still more likely to be online than women, especially in developing countries11 (ITU, 2013).

Depending on the source of Internet data, results may also be more or less biased. A 2013 study into the characteristics and behaviour of Facebook users, for example, revealed that while in many ways Facebook users have real-life behaviour and characteristics, in many ways the social network fails as a representation of society. On the one hand, for example, the American Facebook user’s relationship status of “married” on Facebook is very similar to real life (census) data on the average age when American people get married. On the other hand, however, the average American Facebook user is much younger than the average citizen.12 This is just one example, but it highlights the need to take account of particular characteristics and the limitations of producing representative results when extracting information from online users’ behaviour.

Given the popularity of mobile-cellular services, non-Internet-related mobile network big data seems to have the widest socio-economic coverage in the near term, and the greatest potential to produce relatively representative information globally, particularly in developing countries. By the end of 2014, the number of mobile-cellular subscriptions is expected to be nearing 7 billion, and the number of mobile-cellular subscriptions per 100 inhabitants is expected to reach 90 per cent.
Mobile data are already being utilized for research and policy-making, not only in developed but also in developing economies.

There are various examples of how mobile phone records have been used to identify socio-economic patterns and migration patterns, describe local, national and international societal ties, and forecast economic developments.13 Data are also being used to improve responsiveness in the event of natural disasters or disease outbreaks. Lu, Bengtsson and Holme (2012) used mobile call records to study the population displacements following Haiti’s 2010 earthquake, with a view to using such methods to improve the effectiveness of humanitarian relief operations immediately after a disaster. Call records have also been merged with epidemiological data to understand the spread of malaria in Kenya (Pindolia et al., 2012; Wesolowski et al., 2012a), and of cholera in Haiti after the 2010 earthquake (Bengtsson, Lu, Thorson, Garfield and von Schreeb, 2011) and in Côte d’Ivoire (Azman, Urquhart, Zaitchik and Lessler, 2013). Mobile network big data have been utilized to great effect in the area of transportation, helping to measure and model people’s movements (even in real time) and understand traffic flows (Wu et al., 2013).

It is evident from the examples given that big data from the ICT sector, and especially those available to telecommunication operators, have wide applicability for informing multiple public policy domains. Leveraging such data to complement official statistics and facilitate broader development will enable governments as well as development agencies to better serve their citizens and beneficiaries. Less use has thus far been made of telecommunication big data with a view to understanding its potential for producing additional information and statistics on the information society. In assessing that potential, including the potential for providing complementary information on the development of the information society, it is first important to better understand the type of data that can be made available.

5.3 Telecommunication data and their potential for big data analytics

Fixed and mobile telecommunication network operators, including Internet service providers (ISPs), are an important source of data and for the purpose of this chapter, all forms of telecommunication big data (either volume, velocity or variety) are being considered. Most telecommunication data can be considered as TGD,14 that is, the result of an action undertaken (such as making a call, sending an SMS, accessing the Internet or recharging a prepaid card). Since the service with the widest coverage and greatest uptake and popularity is the mobile-cellular service, data from mobile operators have the greatest potential to produce representative results and reveal developmental insights on the population, including in developing countries and, increasingly, low-income areas. Not surprisingly, the big data for development initiatives (outlined in Section 5.2) have mainly drawn on mobile-network big data rather than on those from fixed-telephone operators or ISPs. Figure 5.2 illustrates some of the similarities and differences in the type of information that mobile-network operators, as opposed to fixed-telephone operators and ISPs, produce, and shows some of the additional insights, in particular in terms of the location and mobility information that mobile networks and services generate.

Telecommunication data

The mobile telecommunication data that operators possess can be classified into different types, depending on the nature of the information they produce. They include traffic data, service access detail records, location and movement data, device characteristics, customer details and tariff data.
For a more detailed overview of these types of data, see Chapter 5 Annex.

To collect traffic data, operators use a range of metrics to understand and manage the traffic flowing through their networks, including the measurement of Internet data volumes, call, SMS and MMS volumes, and value-added service (VAS) volumes. Internet service providers can also use deep packet inspection (DPI),15 which is a special process for scanning data packages transiting the network.

Service access detail records, including call detail records (CDRs), are collected by operators whenever clients use a service. They are used to manage the infrastructure and for billing purposes, and include information on the time and duration of services used and the technology used, for example, for the mobile network (2G, 3G, etc.). These data are potentially also very useful for building a rich profile of customers, as outlined in this section.

Mobile networks capture a range of movement and location variables to identify user location and movement patterns. The degree of accuracy of this information depends on a number of factors, including the network used and device generation, and can be broadly classified into two different types: passive and active positioning data, with the latter providing more detailed and precise location information.

Since mobile user devices used to access m
