Analytics-Driven Smart Community Framework

Wei Lin
Chief Data Scientist, Application and Big Data/IoT Transformation
Dell EMC Services
w.lin@dell.com

Albert Turano
Director, Application and Big Data/IoT Transformation
Dell EMC Global Services
albert.turano@dell.com

Josh Siegel
Director, Application and Big Data/IoT Transformation
Dell EMC Global Services
joshua.siegel@dell.com

William Schmarzo
CTO, Dell EMC Big Data Consulting
william.schmarzo@dell.com

Knowledge Sharing Article
2017 Dell Inc. or its subsidiaries.

Table of Contents

Abstract
1. Introduction
2. Smart Community Analytics
2.1 Streaming analytics
2.2 Machine learning
2.3 Spatial analytics
2.4 Time series analytics
3. Smart Community Objectives
4. Smart Community Analytics: Improve the Flow of patients through our hospitals
5. Smart Community Analytics: Increase Education Efficiency and Quality
6. Smart Community Analytics: Prevent Unplanned Community Electricity Interruptions
7. Conclusion
References

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies.

2017 Dell EMC Proven Professional Knowledge Sharing

Abstract

Cities and communities in the U.S. and around the world are entering a new era of transformational change, in which their inhabitants and the surrounding built and natural environments are increasingly connected by smart technologies, leading to new opportunities for innovation, improved services, and enhanced quality of life.

The goal of this paper is to demonstrate a basic framework that supports interdisciplinary and integrative activities, improves understanding of smart and connected communities, and enables sustainable change that enhances community functioning. Three basic community-supporting functions are discussed:

1. Improve community healthcare
2. Improve education quality
3. Prevent unplanned energy interruption

1. Introduction

Smart communities [2] are driven by commitments to better use their natural resources, generate economic opportunities, and preserve the natural quality of the region. These approaches lead to optimization in areas of land use, economic opportunity, transportation, infrastructure, real estate, education, healthcare, food systems, communication, renewable energy solutions, and practices of sustainable development. The basic analytics framework for the smart community presented here focuses on healthcare, education and renewable energy solutions.

From the data and analytics perspectives, smart community analytics must operate on heterogeneous data types and on different levels of data granularity, at the device/protocol level and/or at network/middleware aggregated levels. Smart community data presents various natural rhythms from connected assets, which are generated by collaboration with other organizations and systems. Detailed data can reveal previously undiscovered or hidden turn-key points in events and could potentially drive decision-support automation accordingly. Together, these two characteristics form an analytics-driven smart community.

The analytics-driven smart community could leverage data from the ever-growing network of human organizations and physical objects that feature a cohesive data store (such as a data lake) and IP-addressed connections for internet connectivity [1]. One of the benefits of the smart community is to alleviate the decision-making challenges imposed by information collection processes, e.g. long process/wait times and missing information about where and under what circumstances data was generated. These constraints could be overcome by predictive analytics and smart device-to-smart device communication.

It is possible to integrate information from both the physical and digital worlds into usable knowledge (right time, right location, actionable and more accurate) for fast communication, fast responses and more comprehensive optimization. This could greatly extend our capabilities within a community to examine complex questions, evaluate decision thresholds, and predict future scenarios.

The analytical cycle used in this paper consists of Descriptive, Exploratory, Predictive, and Prescriptive analytics. These are the four essential components for understanding emerging patterns, converting outliers into controllable variables, estimating time to events, making real-time predictions, and enabling analytics-driven smart interactions.

This paper is arranged in sections.
Section 1 is an Introduction of the Basic Analytics Framework for a Smart Community
Section 2 describes Smart Community Analytics
Section 3 describes Smart Community Objectives
Section 4 describes Smart Community Analytics: Improve the Flow of patients through our hospitals
Section 5 describes Smart Community Analytics: Increase Education Efficiency and Quality
Section 6 describes Smart Community Analytics: Prevent Unplanned Community Electricity Interruptions
Section 7 is the Conclusion.

2. Smart Community Analytics

Despite the existence of abundant technological tools to collect and assess data almost instantaneously, enormous time lags persist in reporting vital statistics and in responding properly to events of concern. More up-to-date and extensive data for business decisions are urgently needed.

A methodology is needed for estimating the urgency and degree of treatment, so that more timely data can be provided that integrates all relevant information, including findings that have become available in the network.

Improvements in collecting high-quality data on events determine how the knowledge can be consumed, and will allow for a more complete and current assessment of the state of asynchronous data. This in turn enables more effective steps to reduce risk or increase value.
The models and scores are based on source data that are continually revised by the business responsible for compiling them, and the data are reviewed within each predefined time interval to reflect these revisions.

These revisions result in improvements in the data, but they also mean that smart community modeling scores from different years may not be comparable with one another. However, the objectives behind the scores, such as throughput (for example, the velocity at which one moves through a system such as a hospital emergency room), are relatively absolute numbers that can be compared with one another, allowing for in-depth analyses of trends.

This multidimensional approach offers several advantages. It reflects the treatment situation not only of the population as a whole, but also of a risk-vulnerable segment for which a lack of response leads to a high risk of negative development and long-term impacts. In addition, by combining independently measured indicators, it reduces the effects of random measurement errors. Risk is ranked on a 100-point scale in which zero is the best score (no risk) and 100 is the worst, although neither of these extremes is reached in practice. The next step is to grade the severity index from “extremely alarming” to “low alarming” and associate it with the range of possible scores of other variables.

The analytics-driven smart community includes the following four types of analytics.

2.1 Streaming analytics

Streaming analytics [9], also called event stream processing, is the analysis of large, in-motion data called event streams. These streams comprise events that occur as the result of an action or set of actions, such as a financial transaction, equipment failure, or some other trigger at the Global, Regional, or National level.

Stream Analytics performs real-time analytic computations on data streaming from devices, sensors, web sites, social media, applications, infrastructure systems, and more.

A stream processing job specifies the input sources and the output sinks where results are stored. Its scale and speed can be monitored and adjusted to handle from a kilobyte to a gigabyte or more of events per second.

Many real-time streaming analytics scenarios can be evaluated. Examples include personalized, real-time stock-trading analysis and alerts; real-time fraud detection; data and identity protection services; analysis of data generated by sensors and actuators embedded in physical objects (e.g. the Internet of Things); web clickstream analytics; and customer relationship management applications that issue alerts when a customer’s experience within a time frame is degraded.

Stream Analytics leverages stream ingestion.
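As an illustration of the last scenario above (an alert when a customer's experience within a time frame is degraded), the following is a minimal sketch of a time-based sliding window over an event stream. It is not tied to any particular streaming product. The `Event` fields, the `SlidingWindowAlert` class, the latency measure, the 30-second window and the 300 ms threshold are all hypothetical choices made for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # seconds since an arbitrary start (assumed unit)
    latency_ms: float  # hypothetical "customer experience" measurement


class SlidingWindowAlert:
    """Keep events from the last `window_s` seconds and flag a degraded mean."""

    def __init__(self, window_s: float, threshold_ms: float):
        self.window_s = window_s
        self.threshold_ms = threshold_ms
        self._events = deque()
        self._total = 0.0

    def process(self, event: Event) -> bool:
        """Ingest one event; return True if the windowed mean exceeds the threshold."""
        self._events.append(event)
        self._total += event.latency_ms
        # Evict events that have fallen out of the time window.
        cutoff = event.timestamp - self.window_s
        while self._events and self._events[0].timestamp <= cutoff:
            old = self._events.popleft()
            self._total -= old.latency_ms
        mean = self._total / len(self._events)
        return mean > self.threshold_ms


def run_demo():
    """Feed a small, made-up stream through the detector and collect alert flags."""
    stream = [
        Event(0, 100), Event(10, 120), Event(20, 110),
        Event(30, 400), Event(40, 450), Event(50, 500),
    ]
    detector = SlidingWindowAlert(window_s=30, threshold_ms=300)
    return [detector.process(e) for e in stream]


if __name__ == "__main__":
    print(run_demo())  # [False, False, False, False, True, True]
```

The running total avoids re-scanning the window on every event, which matters once the event rate is high. A production job would also need to handle late or out-of-order events, which this sketch does not.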
Analytics results can be written from Stream Analytics to Storage Blobs or Tables in the Data Lake Stores, Event Hubs, Service Bus Topics or Messaging Queues, and Business Intelligence (BI) tools, where they can then be visualized, further processed by workflows, used in batch analytics or processed again as a series of events. It is also possible to compose multiple Stream Analytics jobs together with other data sources and process the results again as secondary streaming computations.

2.2 Machine learning

Machine learning [8] is commonly defined as the field that "gives computers the ability to learn without being explicitly programmed". Growing out of the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome the rigidity of static program instructions and make data-driven predictions or decisions by building a model from sample inputs. Machine learning can be employed in a range of computing tasks where designing and programming explicit algorithms is difficult or infeasible.

Machine learning focuses on prediction-making through the use of computers. It has ties to mathematical optimization, which delivers prescriptive methods, theory and application. Machine learning also includes data mining capabilities for exploratory data analysis, known as unsupervised learning, which can be used to learn and establish baseline behavioral profiles and then be leveraged further to find meaningful anomalies.

2.3 Spatial analytics

Spatial analysis [10] is a type of geographical analysis which seeks to explain patterns of human behavior and its spatial expression in terms of mathematics and geometry, that is, locational analysis, e.g. nearest neighbor analysis and polygons. Many of the models are grounded in microeconomics and predict the spatial patterns which should occur in the growth of networks and urban systems, given a number of preconditions such as the isotropic plain, movement minimization, and profit maximization.

The methodology of spatial analysis includes geo-computation, spatial statistical theory, and the study of entities using their topological, geometric, or geographic properties. Spatial analysis applications extend to a variety of fields, such as studies of placement relationships and "drop and route" algorithms to build complex routing structures.

Spatial analysis has fundamental dependencies in the definition of its objects of study, the construction of the analytic operations, and the limitations and particularities of the analyses [2]. Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively. Spatial dependency leads to the spatial autocorrelation problem in statistics, analogous to temporal autocorrelation. It violates the assumption, made by many statistical techniques, that observations or samples are independent of one another. It is also possible to use spatial dependency as a source of information rather than something to be corrected.

Spatial sampling involves determining a limited number of locations in geographic space for properly measuring events that are subject to dependency and heterogeneity.
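The spatial dependency described in this section can be quantified with a spatial autocorrelation statistic before a sampling design is chosen. The following is a minimal sketch using Moran's I, a standard statistic that this paper does not itself name. The function names `morans_i` and `chain_weights`, the binary adjacency weights, and the sample values are illustrative assumptions, not data from the study.

```python
def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of value i from the mean and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n):
    """Binary adjacency for n locations along a line (hypothetical layout):
    each location is a neighbor of the one before and after it."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = chain_weights(4)
    # Smoothly increasing values: neighbors are similar, so I is positive.
    print(morans_i([1, 2, 3, 4], w))
    # Alternating values: neighbors are dissimilar, so I is negative.
    print(morans_i([1, 4, 1, 4], w))
```

Values near zero suggest little spatial structure, in which case simple random sampling loses little. Strongly positive values suggest that nearby samples are partly redundant, which is one way spatial dependency can inform, rather than merely complicate, a sampling design.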
Basic spatial sampling schemes include random, clustered and systematic sampling, and can be applied at multiple levels in a designated spatial hierarchy (e.g. urban area, zip, city, zone, street, and block). It is also possible to exploit ancillary data, e.g. property values or the density of groups of interest, as a guide in a spatial sampling scheme that measures correlated elements such as educational attainment and income. Spatial models such as autocorrelation statistics, regression and interpolation can also dictate sampling design.

2.4 Time series analytics

A time series is a sequence of observations taken at successive points in time, namely a one-way ordered sequence of discrete-time data. Time series analysis can be applied to real-valued continuous data, discrete numeric data, or discrete symbolic data. Examples of time series are the hourly temperature of a patient, minute-by-minute counts of travelers in an airport, and the daily closing value of the Dow Jones Industrial Average. Time series arise in largely any domain of applied science and engineering which involves temporal measurements.

Time series analysis [8] comprises methods for analyzing time series data in order to extract meaningful statistics and characteristics of the data such as trend, seasonality and randomness. Time series forecasting is the use of a model to predict future values based on previously observed values. Time series data have a natural temporal ordering. This makes time series analysis work well with cross-sectional studies, e.g. correlating wage trends with individuals' respective education levels, where each individual's data can be entered for reference.

Regression analysis is often employed first to test the hypothesis that one or more independent time series affect another time series. Regression can compare values of a single time series or of multiple dependent time series at the same or different points in time. A stochastic time series model generally reflects the fact that observations close together in time are more closely related than observations further apart.

3. Smart Community Objectives

For a given municipal community, there is detailed historical household information such as name, address, social security number, property values, tax map, boundaries of counties, etc. Public service installations within the community are tracked by the municipality. Figure 1 is an example of a township data list that is available for public access.

Figure 1: Example of a community data list at township level.

To transform a community into a smart community, the community will need to integrate internal data and to connect and cross-reference each resident’s other non-public essential services, such as medical services, education, and electrical suppliers, to achieve individualized smart assistance via analytics. The smart community could serve as the overarching “federated resident data hub” to link data currently in silos. There is no shortage of possibilities in the smart and connected community.

Three goals are selected as the foundation for delivering public value in the analytics-driven smart community.

1. Improve the Flow of People through our Hospitals [4]: Due to the aging population, one of the key objectives is to increase patient treatment quality by enhancing the velocity with which a patient is connected to a care provider or physician (known as throughput), reducing response time on patient hospital visits, and enhancing automation in the hospital to enable both.

2. Increase education efficiency and quality [6]: Education is a key to a prosperous future and a foundation for economic improvement, improved job/labor market skills and poverty reduction. One of the key objectives is to help schools achieve high-quality education, with teachers achieving effective teaching and students achieving effective learning. Teachers could leverage years of student grade history to identify students’ learning strengths and weaknesses and devise an approach to enrich their learning.

3. Prevent Unplanned Community Electricity Interruptions [7]: Continuous electricity supply is important for business development, manufacturing outcomes, and community residents’ quality of life. A key component of preventing unplanned electricity interruptions is to conduct predictive maintenance on the equipment.

4. Smart Community Analytics: Improve the Flow of patients through our hospitals

To analyze the flow of patients, the focus herein is limited to three studies:

1. Admissions Volume Forecast (ED): Correlate ED admissions to daily, monthly and seasonal patterns and external events (weather, holidays, flu).
2. Service Variance Analysis (OR): Uncover OR service and patient movement patterns that may impact cost and care (readmissions).
3. Improve hospital efficiency: The efficiency of a hospital depends heavily on staff efficiency.

4.1 Admissions Volume Forecast (ED)

From an admission volume overview perspective, the hospital under study had 322,232 total encounters between 08/2012 and 08/2014. The breakdown is 166,932 Outpatient (52%), 118,802 Emergency Department (37%) and 36,498 Inpatient (11%). Figures 3 and 4 show the ED volume patterns. Hospital admissions vary on a daily basis, and outpatient types have the largest range. Figures 2, 3 and 4 show the range of daily hospital admission counts by patient type within a given year.

Figure 2: The daily patient Min/Max range
Figure 3: Patient Admission histogram and Trend line.
Figure 4: ED visit total counts broken down per year per type and annual variances
Figure 5: Hospital encounters by month by type
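The encounter breakdown in Section 4.1 can be reproduced with a few lines of arithmetic. The sketch below uses only the three counts stated in the text; the names `encounters` and `encounter_shares` are illustrative. The three counts sum to 322,232, and the rounded shares match the stated 52%, 37% and 11%.

```python
# Encounter counts by patient type, as stated in Section 4.1.
encounters = {
    "Outpatient": 166_932,
    "Emergency Department": 118_802,
    "Inpatient": 36_498,
}


def encounter_shares(counts):
    """Return the total and each category's share as a whole-number percentage."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = encounter_shares(encounters)
    print(total)   # 322232
    print(shares)  # {'Outpatient': 52, 'Emergency Department': 37, 'Inpatient': 11}
```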

For hospital visits by month (30-month aggregate count using admission date), ED and IP (inpatient) admissions are uniform throughout each month. OP (outpatient) admissions decline steeply later in the month. When months are grouped into quarters, the IP admissions quarterly deviation from the average is: Q1: 14.1%, Q2: 5.3%, Q3: -0.8%, and Q4: -7.2%. Figure 5 shows the hospital encounters by month by type.

The notable monthly variations (+/- 5% deviation from expected monthly) are:
Outpatient: October: 12.6% and August: -7.0%
Inpatient: November: -5.3% and February: -6.1%
Emergency: August: 7.1%, February: -9.3% and November: -5.0%

Figure 7: Hospital Encounters by Day of Week

For hospital encounters by day of week (24-month aggregate count, day of admission), ED admissions are roughly uniform across all days of the
