Failure Mode and Effects Analysis of Software-Based Automation Systems

STUK-YTO-TR 190 / AUGUST 2002

FAILURE MODE AND EFFECTS ANALYSIS OF SOFTWARE-BASED AUTOMATION SYSTEMS

Haapanen Pentti, Helminen Atte (VTT Industrial Systems)

In STUK this study was supervised by Marja-Leena Järvinen

STUK RADIATION AND NUCLEAR SAFETY AUTHORITY
Address: Laippatie 4, 00880 Helsinki
Postal address: P.O. Box 14, FIN-00881 Helsinki, FINLAND
Tel. (09) 759 881, +358 9 759 881
Fax (09) 759 88 500, +358 9 759 88 500
www.stuk.fi

The conclusions presented in the STUK report series are those of the authors and do not necessarily represent the official position of STUK.

ISBN 951-712-584-4 (print)
ISBN 951-712-585-2 (pdf)
ISSN 0785-9325

Dark Oy, Vantaa/Finland 2002

HAAPANEN Pentti, HELMINEN Atte (VTT Industrial Systems). Failure mode and effects analysis of software-based automation systems. STUK-YTO-TR 190. Helsinki 2002. 35 pp. + Appendices 2 pp.

Keywords: safety, safety analysis, reliability analysis, automation, programmable systems, software-based systems, reactor protection systems, nuclear reactor safety, failure mode and effects analysis

Abstract

Failure mode and effects analysis (FMEA) is a well-known analysis method with an established position in traditional reliability analysis. The purpose of FMEA is to identify the possible failure modes of the system components, evaluate their influence on system behaviour and propose proper countermeasures to suppress these effects. The generic nature of FMEA has enabled its wide use in various branches of industry, ranging from business management to the design of spaceships. The popularity and diverse use of the analysis method have led to multiple interpretations, practices and standards presenting the same analysis method.

FMEA is well understood at the systems and hardware levels, where the potential failure modes are usually known and the task is to analyse their effects on system behaviour. Nowadays, more and more system functions are realised at the software level, which has created a desire to apply the FMEA methodology to software-based systems as well. Software failure modes are generally unknown ("software modules do not fail, they only display incorrect behaviour") and depend on the dynamic behaviour of the application. These facts set special requirements on the FMEA of software-based systems and make it difficult to carry out.

In this report, failure mode and effects analysis is studied for use in the reliability analysis of software-based systems. More precisely, the target system of the FMEA is defined to be a safety-critical software-based automation application in a nuclear power plant, implemented on an industrial automation system platform. Through a literature study, the report tries to clarify the open questions related to the practical use of software failure mode and effects analysis.

The study is a part of the research project "Programmable Automation System Safety Integrity assessment (PASSI)", belonging to the Finnish Nuclear Safety Research Programme (FINNUS, 1999–2002). In the project, various safety assessment methods and tools for software-based systems are developed and evaluated. The project is financed jointly by the Radiation and Nuclear Safety Authority (STUK), the Ministry of Trade and Industry (KTM) and the Technical Research Centre of Finland (VTT).

HAAPANEN Pentti, HELMINEN Atte (VTT Tuotteet ja tuotanto). Ohjelmoitavien automaatiojärjestelmien vikaantumis- ja vaikutusanalyysi. STUK-YTO-TR 190. Helsinki 2002. 35 pp. + appendices 2 pp.

Keywords: safety, safety analysis, reliability analysis, automation, programmable systems, reactor protection systems, reactor safety, failure mode and effects analysis

Tiivistelmä (Finnish abstract)

Failure mode and effects analysis (FMEA) is a well-known analysis method with an established position in traditional reliability analyses. The aim of FMEA is to identify the possible failure modes of the system components, assess their effects on system behaviour and propose suitable countermeasures to prevent harmful effects. The general-purpose nature of FMEA has allowed it to be applied to the most diverse targets, ranging from business management to spacecraft design. The wide popularity of the method and its varied uses have led to several differing interpretations, practices and standards for the procedure.

FMEA is well mastered at the system and equipment levels, where the possible failure modes are usually known and the task is to analyse their effects on system behaviour. Nowadays an ever larger share of system functions is implemented in software, which has prompted a desire to apply the FMEA methodology to programmable systems as well. Software failure modes are generally not known ("software does not fail, it can only behave in an unanticipated way"), and they depend on the dynamic behaviour of the application.

This report examines the application of failure mode and effects analysis to programmable systems. More precisely, the object of the analysis is taken to be a safety-critical automation application of a nuclear power plant, implemented on an industrial automation system. The special problems of FMEA related to programmable technology have been investigated mainly on the basis of the literature.

The study has been part of the project "Programmable Automation System Safety Integrity assessment (PASSI)", belonging to the Finnish national nuclear safety research programme (FINNUS 1999–2002). In the project, various safety assessment methods for programmable systems have been developed and evaluated. The project has been jointly funded by the Radiation and Nuclear Safety Authority (STUK), the Ministry of Trade and Industry (KTM) and the Technical Research Centre of Finland (VTT).

... abbreviations

1 INTRODUCTION
1.1 Overview
1.2 Failure Mode and Effects Analysis
1.3 Software Failure Modes and Effects Analysis

2 SURVEY OF LITERATURE

3 FMEA STANDARDS
3.1 Overview
3.2 IEC 60812
3.3 MIL-STD-1629A
3.4 SAE J-1739

4 SWFMEA PROCEDURE
4.1 Level of analysis
4.2 Failure modes
4.3 Information requirements
4.4 Criticality analysis

5 CONCLUSIONS

BIBLIOGRAPHY
Standards & guidelines

APPENDIX 1 EXAMPLE OF THE FMEA-WORKSHEET (IEC 60812)
APPENDIX 2 AN EXAMPLE OF A SWFMEA FORM (TÜV NORD)

Definitions

Terms

Analysis approach
Variations in design complexity and available data will generally dictate the analysis approach to be used. There are two primary approaches for the FMECA. One is the hardware approach that lists individual hardware items and analyzes their possible failure modes. The other is the functional approach that recognizes that every item is designed to perform a number of outputs. The outputs are listed and their failures analyzed. For more complex systems, a combination of the functional and hardware approaches may be considered.

Block diagrams
Block diagrams that illustrate the operation, interrelationships, and interdependencies of the functions of a system are required to show the sequence and the series dependence or independence of functions and operations. Block diagrams may be constructed in conjunction with, or after defining the system and shall present the system breakdown of its major functions. More than one block diagram is sometimes required to represent alternative modes of operation, depending upon the definition established for the system.

Compensating provision
Actions that are available or can be taken by an operator to negate or mitigate the effect of a failure on a system.

Contractor
A private sector enterprise engaged to provide services or products within agreed limits specified by a procuring activity.

Corrective action
A documented design, process, procedure, or materials change implemented and validated to correct the cause of failure or design deficiency.

Criticality
A relative measure of the consequences of a failure mode and its frequency of occurrences.

Criticality analysis (CA)
A procedure by which each potential failure mode is ranked according to the combined influence of severity and probability of occurrence.

Damage effects
The result(s) or consequence(s) a damage mode has upon the operation, function, or status of a system or any component thereof. Damage effects are classified as primary damage effects and secondary damage effects.

Damage mode
The manner by which damage is observed. Generally describes the way the damage occurs.

Damage mode and effects analysis (DMEA)
The analysis of a system or equipment conducted to determine the extent of damage sustained from given levels of hostile damage mechanisms and the effects of such damage modes on the continued controlled operation and mission completion capabilities of the system or equipment.

Design data and drawings
Design data and drawings identify each item and the item configuration that perform each of the system functions. System design data and drawings will usually describe the system's internal and interface functions beginning at system level and progressing to the lowest indenture level of the system. Design data will usually include either functional block diagrams or schematics that will facilitate construction of reliability block diagrams.

Detection
Detection is the probability of the failure being detected before the impact of the effect is realized.

Detection mechanism
The means or method by which a failure can be discovered by an operator under normal system operation or can be discovered by the maintenance crew by some diagnostic action.

Environments
The conditions, circumstances, influences, stresses and combinations thereof, surrounding and affecting systems or equipment during storage, handling, transportation, testing, installation, and use in standby status and mission operation.

Failure
Departure of a system from its required behaviour; failures are problems that users or customers see.

Failure cause
The physical or chemical processes, design defects, quality defects, part misapplication, or other processes which are the basic reason for failure or which initiate the physical process by which deterioration proceeds to failure.

Failure definition
This is a general statement of what constitutes a failure of the item in terms of performance parameters and allowable limits for each specified output.

Failure effect
The consequence(s) a failure mode has on the operation, function, or status of an item. Failure effects are classified as local effect, next higher level, and end effect.

Failure mode
The manner by which a failure is observed. Generally describes the way the failure occurs and its impact on equipment operation.

Failure mode and effects analysis (FMEA)
A procedure by which each potential failure mode in a system is analyzed to determine the results or effects thereof on the system and to classify each potential failure mode according to its severity.

FMECA-Maintainability information
A procedure by which each potential failure is analyzed to determine how the failure is detected and the actions to be taken to repair the failure.

FMECA planning
Planning the FMECA work involves the contractor's procedures for implementing their specified requirements. Planning should include updating to reflect design changes and analysis results. Worksheet formats, ground rules, assumptions, identification of the level of analysis, failure definitions, and identification of coincident use of the FMECA by the contractor and other organizational elements should also be considered.

Functional approach
The functional approach is normally used when hardware items cannot be uniquely identified or when system complexity requires analysis from the top down.

Functional block diagrams
Functional block diagrams illustrate the operation and interrelationships between functional entities of a system as defined in engineering data and schematics.

Ground rules and assumptions
The ground rules identify the FMECA approach (e.g., hardware, functional or combination), the lowest level to be analyzed, and include statements of what might constitute a failure in terms of performance criteria. Every effort should be made to identify and record all ground rules and analysis assumptions prior to initiation of the analysis; however, ground rules and analysis assumptions may be adjusted as requirements change.

Hardware approach
The hardware approach is normally used when hardware items can be uniquely identified from schematics, drawings, and other engineering and design data. This approach is recommended for use in a part-level-up approach, often referred to as the bottom-up approach.

Indenture levels
The item levels which identify or describe relative complexity of assembly or function. The levels progress from the more complex (system) to the simpler (part) divisions.

Initial indenture level
The level of the total, overall item which is the subject of the FMECA.

Other indenture levels
The succeeding indenture levels (second, third, fourth, etc.) which represent an orderly progression to the simpler division of the item.

Interfaces
The systems, external to the system being analyzed, which provide a common boundary or service and are necessary for the system to perform its mission in an undegraded mode; for example, systems that supply power, cooling, heating, air services, or input signals.

Level of analysis
The level of analysis applies to the system hardware or functional level at which failures are postulated. In other words, how the system being analyzed is segregated (e.g., a section of the system, component, sub-component, etc.).

Occurrence
Rating scale that shows the probability or frequency of the failure.

Reliability block diagrams
Reliability block diagrams define the series dependence, or independence, of all functions of a system or functional group for each life cycle event.

Risk Priority Number (RPN)
The risk priority number is usually calculated as th
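
As an illustration of how the rating terms defined above (severity, occurrence, detection) can be combined to rank failure modes, the following Python sketch assumes the widely used convention that the RPN is the product of three integer ratings. The 1 to 10 scales, the class and function names, and the example failure modes are hypothetical and are not taken from this report. The sketch also assumes that detection is expressed as a rating in which a higher number means the failure is harder to detect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One worksheet row; names and rating scales are illustrative only."""
    item: str
    mode: str
    severity: int    # assumed scale: 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (remote) .. 10 (very frequent)
    detection: int   # assumed scale: 1 (almost surely detected) .. 10 (undetectable)

    def rpn(self) -> int:
        # Assumed convention: RPN = severity * occurrence * detection.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes of a software-based automation function.
modes = [
    FailureMode("watchdog", "spurious reset", 4, 5, 2),
    FailureMode("trip logic", "output stuck at previous value", 9, 2, 6),
    FailureMode("input scaling", "wrong engineering unit", 7, 3, 3),
]

ranked = rank_by_rpn(modes)
for m in ranked:
    print(f"{m.rpn():4d}  {m.item}: {m.mode}")
```

Sorting by the product is only one possible ranking rule. A criticality analysis in the sense defined above could instead rank on severity and probability of occurrence alone.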
