Causal Analytics in IIoT – AI That Knows What Causes What, and When

Authors:

Dr. John Galloway
Founder
Au Sable
johnjgalloway01@gmail.com

Pieter van Schalkwyk
Chief Executive Officer
XMPro
pieter.vanschalkwyk@xmpro.com

Dann Marian
Engineering and Projects Manager
dann.marian@iinet.net.au

IIC Journal of Innovation

June 2018

It is possible to perform Reliable Causal Analytics using industrial IoT data and Artificial Intelligence (AI) to determine the causality of business and operational events such as equipment failure or operational issues. This article demonstrates:

- How Reliable Causal Analytics provides data-driven decision support for traditional Root Cause Analysis approaches
- The approach to embed this causal analytics methodology in IoT Process Management software so that it can be performed in a repeatable and automated manner

This means that the risk of making false decisions about what were, or predictively will be, the causal drivers of an effect is reduced, and with it the potential for costly or disastrous mistakes.

INTRODUCTION

Finding the “because” behind certain business or operations events has always been a key part of any engineering, maintenance or operations manager’s job in industrial businesses. “The First Stage compressor failed because …” or “the supply tank ran dry because …” are common phrases in maintenance and operations departments. Finding the “because” traditionally relies on experienced engineers who can interpret event, contextual and temporal data to deduce the likelihood of specific factors causing others, in either a negative or positive way.

Knowing the real root causes of events is critical to resolving problems rather than continuously dealing with the symptoms. This need gave rise to popular, formalized approaches such as “Root Cause Analysis,” or RCA as it is generally known. The challenge is that there are often multiple causal factors for these events, and finding the one “root cause” may not always be possible. The other causal factors that may influence the outcome of industrial processes and the behavior of equipment also need to be considered.

Au Sable, in collaboration with […]e-based, approach for “Reliable Causal Analytics” (rCA) in industrial IoT applications. rCA is the result of many years of research and application of causal analytics in real-world scenarios. Through this work, Au Sable developed rCA, which enables cause-and-effect relationships to be identified from sensor-driven data and made known to the analyst (e.g. wear on part #105 has causally impacted the performance of device #65 with a causal coefficient of 0.86), as well as correlation relationships in the data.

This article provides background on traditional Root Cause Analysis and the evolution of Causal Analytics. It demonstrates how to automate the analytics to scale with an IoT Process Management platform and how it is applied in an industrial application. It provides a practical example of Reliable Causal
Analytics (rCA) applied to a floating production storage and offloading (FPSO)¹ vessel for an Oil & Gas company.

Au Sable’s rCA has functioned in intelligence, defense and anti-terrorism applications for many years. The solution described in this article combines advanced IoT Process Management software from XMPro with the rCA AI software from Au Sable.

Example Floating Production Storage and Offloading Vessel

Crude oil, gas and water from the reservoir are separated on board the FPSO. Oil is stored on the facility in six pairs of tanks before export to trading tankers. The vessel is designed to store 1.4 million barrels of oil and processes approximately 170,000 barrels of oil per day (bopd).

ROOT CAUSE ANALYSIS BASED ON CORRELATION DOESN’T WORK IN THE IOT ERA

Industrial RCA Background

Formal Root Cause Analysis for industrial applications started with the Total Quality Management (TQM)² movement, which built on Deming’s work in Japan and was widely advocated in the late 1980s and early 1990s.

Paul Wilson et al.³ described the root cause analysis process for Quality Management in detail during the TQM era: “Root cause analysis is a method of problem-solving used for identifying the root causes of faults or problems. A factor is considered a root cause if removal thereof from the problem-fault-sequence prevents the final undesirable outcome from recurring; whereas a causal factor is one that affects an event’s outcome, but is not a root cause. Though removing a causal factor can benefit an outcome, it does not prevent its recurrence with certainty.”

Even though root cause analysis formally originated in TQM, it finds many applications in industrial environments:⁴

- Safety-based Root Cause Analysis arose from the fields of accident analysis and occupational safety and health.
- Production-based Root Cause Analysis has roots in the field of quality control for industrial manufacturing.
- Process-based Root Cause Analysis, a follow-on to production-based RCA, broadens the scope of RCA to include business processes.
- Failure-based Root Cause Analysis originates in the practice of failure analysis as employed in engineering and maintenance.
- Systems-based Root Cause Analysis has emerged as an amalgam of the preceding schools, incorporating elements from other fields such as change management, risk management and systems analysis.

¹ Floating production storage and offloading: https://en.wikipedia.org/wiki/Floating_production_storage_and…
³ Wilson, Paul F.; Dell, Larry D.; Anderson, Gaylord F. (1993). Root Cause Analysis: A Tool for Total Quality Management. Milwaukee, Wisconsin: ASQ Quality Press. pp. 8–17. ISBN 0-87389-163-5.
⁴ Adapted from https://en.wikipedia.org/wiki/Root_cause_analysis (classification)

Root Cause Analysis became popular as an approach to methodically identify and correct the root causes of events instead of addressing their symptoms. The objective of root cause analysis is to prevent problem recurrence. Popular root cause analysis techniques include the “Five Whys” and Cause and Effect (Fishbone) diagrams. These techniques rely on human interpretation of event information and data and require experienced practitioners to conduct the analysis. Because the manual process is time-consuming and laborious, their use is often limited to a few critical production assets. Wilson’s distinction between root causes and other causal factors provides some guidance on the application of causal analytics in an IoT context for this article. Traditional techniques focused only on finding the root causes through manual review. Modern techniques such as the rCA described in this article, combined with IoT data and advances in AI, enable engineers not only to assess root causes but also to find other causal factors. These causal factors may not lead to equipment or process failure but may still impact equipment or process performance.

Recent advances in cloud computing and AI provide the necessary infrastructure to analyze event data from IoT and other sources at massive scale. This means analysts can take a more expansive view of causal events rather than a reductionist view in which the scope of an analysis is limited to what a human can process.

MOTIVATION FOR DATA-DRIVEN, RELIABLE CAUSAL ANALYTICS

There are three main reasons to find a reliable, data-driven approach to identifying root causes and causal factors for equipment failure and operational performance in industrial environments:

- Aging workforce, with a large number of experienced engineers retiring soon
- Complexity of equipment, making it harder to troubleshoot
- Inaccuracy of Root Cause Analysis

Retiring Workforce

With a retiring workforce in many industrial sectors, the experience needed to conduct meaningful RCAs is decreasing. As many of the traditional approaches rely on observational analysis, the number of experienced engineers who can provide reliable analysis is fast reducing.

According to a January 2017 assessment by the US Department of Energy, 25% of US employees in electric and natural gas utilities
will be ready to retire within five years⁵. The US Department of Labor also estimates that the average age of industry employees is now over 50 and that up to half of the current energy industry workforce will retire within 5 to 10 years.⁶

Manual RCA requires the combination of a rigorous methodology, fault analysis technology and experience to evaluate the possible causes of business events such as equipment failure, quality problems or safety incidents. Much of the expertise needed will be lost with the retiring workforce. A data-driven, algorithmic approach provides a viable replacement for human experience in determining causal relationships between business events.

Complexity of Industrial Equipment

As industrial equipment becomes increasingly sophisticated and complex⁷, performing diagnostics becomes increasingly difficult. With more complex equipment, the number of combinations and permutations of potential causal factors for certain events grows exponentially. Even the number of pairwise relationships alone grows with the square of the number of components, following a pattern similar to Metcalfe’s law⁸ for telecommunication devices, which states that “the effect of a telecommunications network is proportional to the square of the number of connected users of the system (n²)”.

Metcalfe’s law, now also used in economics and business management, provides a rough quantification of how the increasing complexity of equipment expands the set of potential causal relationships between operational events that must be troubleshot.

Inaccuracy of Root Cause Analysis

Root Cause Analysis gained popularity in industrial and other sectors such as healthcare. One of the main challenges that emerged is that it requires facilitation and analysis by people, who can process only limited amounts of information. People are also susceptible to opinions and organizational influences such as politics. Peerally et al.⁹ describe eight main problems with Root Cause Analysis:

- The unhealthy quest for “the” root cause
- Questionable quality of RCA investigations
- Political hijack
- Poorly designed or implemented risk controls
- Poorly functioning feedback loops
- Disaggregated analysis focused on single organizations and incidents
- Confusion about blame
- The problem of many hands

⁵ U.S. Department of Energy, Quadrennial Energy Review (QER) Task Force report, second installment, titled “Transforming the Nation’s Electricity System.” Chapter V: Electricity Workforce of the 21st Century: Changing Needs and New Opportunities. January 2017. Retrieved from …ergy-review-qer
⁶ U.S. Department of Labor Employment and Training Administration, “Industry Profile – Energy.” Retrieved from https://www.doleta.gov/brg/indprof/energy_profile.cfm
⁷ Challenges To Complex Equipment Manufacturers: Managing Complexity, Delivering Flexibility, and Providing Optimal …
⁸ Metcalfe’s law: https://en.wikipedia.org/wiki/Metcalfe%27s_law
⁹ The problem with root cause analysis: http://qualitysafety.bmj.com/content/26/5/417

Many of these problems result from the subjective nature of human analysis and can be addressed with a more objective, data-driven approach. People cannot process all the potential sources of event and contextual information. Modern advances in data, stream and event processing address some of that challenge, and AI provides a means to make sense of the data at scale. This removes the reliance on subjective human analysis and opens the opportunity to analyze fact-based information at scale to derive insights.

The unhealthy quest for “the” root cause describes a challenge that can be better addressed with an algorithmic approach to Root Cause Analysis. Peerally states that “the first problem with Root Cause Analysis is its name. By implying—even inadvertently—that a single root cause (or a small number of causes) can be found, the term ‘root cause analysis’ promotes a flawed reductionist view.”

An algorithmic approach often provides more potential causal factors, their relationships to each other and the strength (causal coefficient) of those relationships. It offers additional insights into events and often finds causal relationships that may be counterintuitive to the people performing the analysis manually. An algorithmic approach also provides repeatability and scale: it analyzes the IoT and contextual data in a consistent way that is independent of the person performing the analysis.

USING CAUSAL ANALYTICS TO PERFORM ALGORITHMIC ROOT CAUSE ANALYSIS

Correlation is Not Causation

In this era of big data, it is commonly said that data analytics is a prime driver of value to enterprises¹⁰. This is true, but only if the analytics performed on the data are methodologically well grounded and efficient enough to deliver that value.

¹⁰ How does business analytics contribute to business value? …j.12101
¹¹ The Age of Analytics: Competing in a data-driven world: https://www.mckinsey.com/…he-Age-of-Analytics-Full-report.ashx

Big data creates big and complex data volumes. This is of limited value, however, if it is not accompanied by the best available analytics to enable the most valuable, accurate and reliable decisions¹¹. Hence, there is an increasing requirement for the analytics component in industrial IoT solutions to be fast, reliable and accurate, to identify the problems and opportunities and
ensure that such problems are addressed correctly and urgently.

Correlation of events and systems is often a starting point for problem-solving in industrial environments, but “correlation is not causation”¹². Correlation helps point the way by indicating candidate causative or driving factors for a particular effect, keeping in mind that correlation is simply a measure of association, not causation.

Correlations can be misleading. Valuable results and insights are often found, but relying on correlation methods for decisions carries inherent risks that could lead to mistaken or sub-optimal decision-making and outcomes.

Introductory statistics courses tell us that it is not possible to prove causation unless one conducts an experiment in which treatment and control groups are randomized.

This is correct, but conducting such an experiment is simply not feasible in the vast majority of real-world situations. Algorithmic methods take a probabilistic and contributory approach, spurred on by big data’s need for empirically based, data-driven decisions, to answering questions about what caused what, or what will. For example, a causal coefficient of 0.83 for X as a causative influence on Y does not mean that X is necessarily the sole cause of Y (there may be multiple causes), nor that X always causes Y. X is, however, identified as a contributory cause of Y. Similarly, smoking is a contributory cause of lung cancer; it is not the sole cause, nor does it always cause the effect.

The chief analytics tool of most industrial IoT analytics vendors is correlation. Most sensor-driven data (IoT and machine-generated logs) is analyzed using proven but older statistical methods, even when operating within a machine learning framework. Correlational methods are the dominant form of analytics. Some examples from IoT vendor publications and websites demonstrate this approach:

1. Cisco (Attaining IoT Value): enable the company’s customers to perform real-time data correlation and, as a result, quickly react to irregularities¹³
2. Huawei (“The IoT’s Potential for Transformation”): enables correlation-based process and productivity improvements¹⁴
3. ThingWorx: ThingWorx’s capabilities make it possible for users to correlate data, deriving powerful insights¹⁵
4. Siemens PLM: quantitative statistical relationships to real-life usage, called customer correlation¹⁶
5. Industrial Internet Consortium: a common issue in IIoT systems is correlating data between multiple sensors and process control states¹⁷

Correlational methods are established as powerful aids to decision-making, as witnessed by the rise of platforms that provide the capability. However, correlations often vary: at a given time two entities may be positively related, and at other times only weakly related or not related at all.

The lack of stability in correlations indicates complexity in the relationships and the presence of a dynamical system (common in IIoT). This results in variability according to the system state and nonlinearity in system behavior. It means that traditional statistical methods, correlation included, have limitations for obtaining precise analytics and improving decision-making about performance in IIoT.

The result is that an observed correlation over time may or may not be coincidental; or the observed correlation (and any implied causation) may be the result of one or more third-party variables (hidden confounders), that is, another variable that influences two seemingly correlated events. For example, ice cream sales and boating accidents are correlated, but both are affected by summer temperatures, so a causal inference between them would be spurious. In this example summer temperature is the causal factor, yet the high correlation could lead one to infer, incorrectly, that an increase in ice cream sales leads to boating accidents. More humorous examples of such erroneous correlations can be found at Spurious Correlations.

Correlation offers no fact-based causal coefficient describing the strength of potential causal relationships. Mathematically based causal analytics attempts to improve on correlation for identifying causality.

¹² Correlation does not imply causation: https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
¹³ Attaining IoT Value: How To Move from Connecting Things to Capturing Insights: https://www.cisco.com/c/dam/en…ite-paper.PDF
¹⁴ The IoT’s Potential for Transformation: http://e.huawei.com/ensa/publications/global/ict_insights/201703141505/focus/201703141643
¹⁵ A survey of IoT cloud platforms: …S2314728816300149
¹⁶ Customer Correlation Durability: …html
¹⁷ Industrial Analytics: The Engine Driving the IIoT Revolution: https://www.iiconsortium.org/pdf/Industrial Analyticsthe engine driving IIoT revolution 20170321 FINAL.pdf

The Evolution of Causal Analytics

CAUSALITY FOR REAL-WORLD APPLICATIONS

It is well accepted that causation cannot be proven statistically unless one conducts an experiment with randomization to control
for spurious relationships¹⁸ ¹⁹, which is simply not practicable in most real-world situations. The position of the authors is that correlational methods have served well and are proven to provide useful insights, but are nonetheless prone to producing […].

Causal analytics evolved over the past few decades from academic studies to practical solutions such as rCA. A historical stumbling block in reaching this goal has been to devise causal algorithms that produce reliable and accurate results for commercial and government applications.

ADVANCES IN CAUSAL ANALYTICS AND THE DEVELOPMENT OF RELIABLE CAUSAL ANALYTICS (RCA)

In the 1980s, mathematical advances by Judea Pearl²² of UCLA showed that causal relationships can be represented from data in terms of probabilities, and later led him to declare that “causality has been mathematized”. The mathematization was perhaps a little premature, but his outstanding work led to him being awarded, in 2012, the computing industry’s equivalent of the Nobel Prize, the Turing Award, for advances in both machine learning and causality.

As noted earlier, causality research has developed different probabilistic methods and approaches for identifying cause-and-effect relationships in non-experimental or ‘observational’ data. Problems remained, however: for example, how to identify a causal relationship when unknown delays occur between cause and effect, and how to identify and control for so-called hidden confounders. Earlier, the work of Wiener (1950s) laid the basis for several information-theoretic measures of causality (and for well-known data compression algorithms).

A landmark innovation was that of Clive Granger²³, later a Nobel laureate in economics, who developed a test of causality: X is said to cause Y if the past values of X contain information that helps predict future values of Y, above and beyond the information contained in past values of Y. Figure 1 illustrates this graphically.

¹⁸ …les/upm-binaries/14289
¹⁹ …en.wikipedia.org/wiki/Spurious_relationship
²² Judea Pearl: https://en.wikipedia.org/wiki/Judea_Pearl
²³ Clive Granger: https://en.wikipedia.org/wiki/Clive_Granger
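
The Granger test described above can be illustrated with a short linear sketch. It assumes Python with NumPy, a fixed lag order `p` and ordinary least squares. The function names `lagged_matrix` and `granger_f_stat` are illustrative, and this is the textbook linear test, not Au Sable’s rCA. The sketch compares how well the past of Y alone predicts Y against the past of both Y and X, then returns the F statistic for that improvement.

```python
import numpy as np

def lagged_matrix(series, p):
    """Columns hold the series lagged by 1..p, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])

def granger_f_stat(x, y, p=2):
    """F statistic for the linear test 'x Granger-causes y' using p lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged_matrix(y, p)])     # past of y only
    unrestricted = np.hstack([restricted, lagged_matrix(x, p)])  # past of y and x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    # Error reduction per added lag, relative to the unrestricted error variance.
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Simulated example: y depends on the previous value of x (plus its own past).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.normal()

f_xy = granger_f_stat(x, y)  # does the past of x help predict y?
f_yx = granger_f_stat(y, x)  # does the past of y help predict x?
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

In the simulated data, X drives Y with a one-step delay. The statistic for X to Y should therefore be large, while the reverse direction stays near 1. For real analyses, statsmodels offers `grangercausalitytests`, which also reports p-values.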

Figure 1: Granger causality test

Researchers extended this framework, for example to allow for analysis of multiple time series generated by nonlinear models, for lagging the cause and effect variables, and for causal graphical models that better handle latent variables.

Transfer Entropy²⁴ ²⁵ (TE) is a later implementation of the principle that causes must precede and predict their effects. TE improves on Granger in that it directly caters for nonlinear interactions and helps minimize problems with noisy data. TE is a model-free, non-parametric measure of directed information flow from one variable to another.

The application of TE to empirical analytics has been substantial in areas such as biomedicine and climate science. However, further developments were needed to help overcome shortcomings related to unreliability and a lack of accuracy²⁶. Au Sable’s work on improving the reliability of TE, combined with other Au Sable proprietary algorithms, has led to the development of an algorithmic approach to causal analytics that can process IoT event data and provide reliable results. This means that Causal Analytics can now be applied to real-world scenarios with non-experimental data.

²⁴ Transfer Entropy: https://en.wikipedia.org/wiki/Transfer_entropy
²⁵ Transfer entropy between multivariate time series: …S1007570416305020
²⁶ Progress in Root Cause and Fault Propagation Analysis of Large-Scale Industrial …2/478373/
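
To illustrate the transfer entropy idea only, here is a minimal plug-in estimator. It assumes equal-frequency binning into four bins and a history length of one sample, both arbitrary choices. It is a generic estimator, not Au Sable’s improved proprietary version, and it carries the small-sample bias that motivates such improvements. It computes TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) log2[p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t)] in bits.

```python
import math
from collections import Counter

import numpy as np

def discretize(series, bins=4):
    """Map values to equal-frequency bins labelled 0..bins-1."""
    values = np.asarray(series, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate (bits) of transfer entropy from x to y, history length 1."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    triples = list(zip(yd[1:], yd[:-1], xd[:-1]))  # (y_next, y_now, x_now)
    n = len(triples)
    joint = Counter(triples)
    y_next_y_now = Counter((a, b) for a, b, _ in triples)
    y_now_x_now = Counter((b, c) for _, b, c in triples)
    y_now = Counter(b for _, b, _ in triples)

    te = 0.0
    for (a, b, c), count in joint.items():
        p_given_both = count / y_now_x_now[(b, c)]      # p(y_next | y_now, x_now)
        p_given_self = y_next_y_now[(a, b)] / y_now[b]  # p(y_next | y_now)
        te += (count / n) * math.log2(p_given_both / p_given_self)
    return te

# Simulated example: y follows x one step later, with added noise.
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
y[1:] = 0.8 * x[:-1] + 0.6 * rng.normal(size=n - 1)

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

Because Y copies X with a one-step delay, TE from X to Y comes out clearly positive. TE from Y to X stays close to zero, and the small positive remainder reflects estimator bias. Unlike the linear Granger sketch, nothing in this estimator assumes a linear relationship between the variables.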

This approach has led to an area of causal research from a dynamical systems perspective. A dynamical system is one in which a function describes the time dependence of a point in a geometrical space.²⁷ ²⁸ ²⁹ ³⁰ A dynamical systems course at Harvard states that the methods have a focus on the behavior of systems on […] areas “are diverse and multidisciplinary, ranging over areas of applied science and engineering, including biology, chemistry, physics, finance, and industrial applied mathematics.”³¹ This is a fairly recent set of developments, especially with respect to incorporating AI and machine learning so that these algorithms can be applied to IIoT data at scale.

These real-world applications involve methods that take into account the complexity of systems (thereby including analytics of system machine and log data). The interdependencies and dimensionality of many IIoT system devices mean that identifying their behavior (causal and otherwise) can be extremely difficult, depending on the magnitude and nature of the couplings. One variable may be found to be a driver of another, but not alone. The multiple influences on a particular variable must be teased out, such as the timings, state-dependencies and multi-dimensionality of other influences that impact an ‘effect’ of interest, such as a decrease in pressure or a rise in temperature. These are identified as part of the rCA process for IIoT.

Automating Applications in Industrial IoT

Although the rCA approach can be employed on an ad-hoc basis by an analyst, the real benefits come from automating the rCA AI analysis as part of an IoT process. The rCA function can be executed based on trigger events such as data changes or exceptions. The rCA software and algorithms are embedded in the functions library of the XMPro IoT Process platform for IIoT applications.

²⁷ https://en.wikipedia.org/wiki/Dynamical_system
²⁸ https://en.wikipedia.org/wiki/Dynamical_systems_theory
²⁹ https://mathinsight.org/dynamical_system_idea
³⁰ http://math.huji.ac.il/…
³¹ …rd.edu/siams/am-147-nonlinear-dynamical-systems
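
The trigger-based execution described above might look something like the following sketch. Everything in it is hypothetical: the channel names, the `on_new_window` threshold rule and the `directed_score` measure stand in for the proprietary rCA function and the XMPro platform, whose interfaces this article does not describe. `directed_score` is a crude Granger-style proxy: the fraction of the target’s one-step-ahead prediction error that is removed by adding the source’s previous value. The output is a matrix in which entry [i, j] scores channel i as a driver of channel j, in the spirit of a causal coefficient matrix.

```python
import numpy as np

def directed_score(source, target):
    """Hypothetical proxy: share of the target's one-step-ahead residual sum of
    squares removed by adding the source's previous value (0 = no gain)."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    y = target[1:]
    own = np.column_stack([np.ones(len(y)), target[:-1]])  # target's own past
    full = np.column_stack([own, source[:-1]])             # plus the source's past

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    rss_own = rss(own)
    return 1.0 - rss(full) / rss_own if rss_own > 0 else 0.0

def causal_matrix(channels):
    """Score every ordered pair of channels; the diagonal is left at zero."""
    names = list(channels)
    matrix = np.zeros((len(names), len(names)))
    for i, src in enumerate(names):
        for j, dst in enumerate(names):
            if i != j:
                matrix[i, j] = directed_score(channels[src], channels[dst])
    return names, matrix

def on_new_window(channels, trigger_channel, limit):
    """Hypothetical trigger: analyze the window only if the watched channel
    exceeded its limit; otherwise skip the analysis and return None."""
    if np.max(channels[trigger_channel]) <= limit:
        return None
    return causal_matrix(channels)

# Simulated window: flow follows pump speed with a one-sample delay.
rng = np.random.default_rng(1)
n = 1000
pump = rng.normal(size=n)
ambient = rng.normal(size=n)
flow = np.zeros(n)
for t in range(1, n):
    flow[t] = 0.3 * flow[t - 1] + 0.9 * pump[t - 1] + 0.3 * rng.normal()

channels = {"pump_speed": pump, "flow": flow, "ambient_temp": ambient}
result = on_new_window(channels, "flow", limit=1.0)
if result is not None:
    names, matrix = result
    for name, row in zip(names, matrix):
        print(name, np.round(row, 2))
```

Gating the analysis behind a cheap threshold check mirrors the idea of running heavier analytics only on trigger events or on a schedule. A real deployment would use a far more careful estimator, such as transfer entropy, together with controls for hidden confounders.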

Figure 2: XMPro IoT Process Stream for rCA

In this example, event data is ingested from the operator’s Honeywell historian and contextualized with asset data from its IBM Maximo EAM system. Further context is provided from operational data stores. This information is passed to the rCA Causal Analytics AI function, which creates the causal coefficient matrix and other outputs described later in the article.

This automated, process-based approach ensures repeatability and consistency, and allows the analysis to be done at scale for a large number of assets in a process stream. The automated process can analyze much larger volumes of IoT data than human RCA analysts. In the FPSO example, different analyses are automated at different time intervals, such as daily for high-impact equipment and weekly or monthly for other areas. This is configurable by the end users, and ad hoc analysis can also be performed.

CUSTOMER EXAMPLE: RCA IN OIL AND GAS PROCESSING

Background to the Application of rCA in Oil & Gas

The example demonstrates how rCA can enable an FPSO to optimize production and productivity as well as to predict and avoid incidents that threaten health, safety, environment, community and financial
outcomes. The initial field study project was aimed at three key objectives:

EFFICIENT OPERATIONS AND MAINTENANCE

This project will drive down the costs of reduced or lost production caused by unplanned failures. Other operational efficiency gains will be achieved by reducing the risks of environmental impact caused by operational failure and the risks to personnel safety caused by breaches of operational standards. Furthermore, the costs of asset maintenance will be reduced, and the capability of diagnosing asset health in remote and challenging operating environments will be increased.

Going beyond the obvious elements that cause an interruption to production, rCA is used to find root causes and interdependencies that may be overlooked or not realized with current technology. This will enable the operations team on board the FPSO to keep it in production without interruption for long periods and, when it is down, to have it repaired and brought back on-stream faster.

SAFETY AND SOCIAL LICENSE TO OPERATE

Equipment failure and/or an unsafe work environment can potentially result in harm to humans or the environment, ultimately increasing operational risk and impacting an organization’s social license to operate.

This solution will assist in providing a safe production environment. In addition, its inbuilt predictive analytics help eliminate operational risks that could impact the social license to operate if left undetected, uninvestigated and unaddressed.

Most importantly, these improvements reduce the risk of events that impact the safety of all personnel on the FPSO and protect the environment on the vessel and in the geographic vicinity.

ENABLING EFFECTIVE COLLABORATION

Traditionally, a significant divide exists between the operational technology (OT) in heavy asset sectors like Oil & Gas and the information technology (IT) arena. Not only are they typically separated by physical, geographical and network constraints, they are also generally isolated philosophically.

The innovative solution and integrated application suite enable interoperability of data feeds from sensors and devices with the associated referential information from the asset registry and maintenance framework. It combines data from both IT and OT, and this new information provides insights that can be shared collaboratively between OT, IT and Operations. It makes new levels of operational excellence, collaboration and sustained productivity improvements possible.

The project mirrored an upstream oil and gas process flow, including value-added services at each stage of the supply chain, leveraging real-time IoT big data, machine learning and artificial intelligence.

Project Background

The FPSO plant had experienced occasional periods of operational instability. These were largely unexplained, yet some significant and costly problems resulted. It was particularly challenging to identify the
actual cause(s) of the problems. Routine correlational methods of analysis, such as traditional RCA, had been applied but provided the operators with only limited assistance.

The analysis approach in phase 2 consists of three main process steps:

- Ingest data: Real-t
