Microelectronics Reliability: Physics-of-Failure Based Modeling and Lifetime Evaluation


National Aeronautics and Space Administration

Microelectronics Reliability: Physics-of-Failure Based Modeling and Lifetime Evaluation

Mark White
Jet Propulsion Laboratory
Pasadena, California

Joseph B. Bernstein
University of Maryland
College Park, Maryland

Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California

JPL Publication 08-5, 2/08

National Aeronautics and Space Administration

Microelectronics Reliability: Physics-of-Failure Based Modeling and Lifetime Evaluation

NASA Electronic Parts and Packaging (NEPP) Program
Office of Safety and Mission Assurance

Mark White
Jet Propulsion Laboratory
Pasadena, California

Joseph B. Bernstein
University of Maryland
College Park, Maryland

NASA WBS: 939904.01.11.10
JPL Project Number: 102197
Task Number: 1.18.5

Jet Propulsion Laboratory
4800 Oak Grove Drive
Pasadena, CA 91109
http://nepp.nasa.gov

This research was primarily carried out at the University of Maryland under the direction of Professor Joseph B. Bernstein and was sponsored in part by the National Aeronautics and Space Administration Electronic Parts and Packaging (NEPP) Program, the Aerospace Vehicle Systems Institute (AVSI) Consortium—specifically, AVSI Project #17: Methods to Account for Accelerated Semiconductor Wearout—and the Office of Naval Research.

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

Copyright 2008. All rights reserved.

PREFACE

The solid-state electronics industry faces relentless pressure to improve performance, increase functionality, decrease costs, and reduce design and development time. As a result, device feature sizes are now in the nanometer scale range and design life cycles have decreased to fewer than five years.

Until recently, semiconductor device lifetimes could be measured in decades, which was essentially infinite with respect to their required service lives. It was, therefore, not critical to quantify the device lifetimes exactly, or even to understand them completely. For avionics, medical, military, and even telecommunications applications, it was reasonable to assume that all devices would have constant and relatively low failure rates throughout the life of the system; this assumption was built into the design, as well as reliability and safety analysis processes.

Technological pressures on the electronics industry to reduce transistor size and decrease cost while increasing transistor count per chip, however, run counter to the needs of most high-reliability applications where long life with exceptional reliability is critical. As design rules have become tighter, power consumption has increased and voltage margins have become almost non-existent for the designed performance level. In achieving the desired performance levels, the lifetime of most commercial parts is the ultimate casualty. Most large systems are built with the assumption that electronic components will last for decades without failure. However, counter to this assumption, device reliability physics is becoming so well understood that manufacturing foundries are designing microcircuits for a three- to seven-year useful life, as that is what most of the industry seeks. The military, aerospace, medical, and especially the telecommunications industries cannot afford to depend on custom parts for their most sophisticated circuit designs.

Hence, we have developed this guideline document as an approach for system designers and device reliability engineers to develop a better understanding of device failures as a result of wearout, and to provide a better understanding of how current reliability models are applied in practice. We describe the best possible approaches to modeling reliability concerns in some of the more advanced microelectronic technologies, and provide in-depth descriptions of how to implement these approaches in reliability equivalent circuits for spacecraft, planets, instrument, C-matrix, events (SPICE) simulation. Within the inherent limitations of high-power, high-speed, commercial Complementary Metal Oxide Semiconductor (CMOS) devices, suggestions are developed on how to model the incipient failure rate, how to trade circuit performance with reliability, and how to obtain a predictable end-of-life or component-level system repair rate through realistic time-dependent reliability prediction.

The development of this handbook for evaluating and simulating microelectronic systems reliability has been an ongoing project of the Microelectronics Reliability Engineering program at the University of Maryland, College Park, for more than six years. The program has been funded by the Aerospace Vehicle Systems Institute (AVSI) Consortium and the NASA Electronic Parts and Packaging (NEPP) Program Scaled CMOS Reliability Task, as well as the Office of Naval Research. Several doctoral dissertations have resulted from this work, and major contributions were carried out by a number of individuals, including Jöerg Walters, Xiaohu Zhang, Xiaojun Li, Bing Huang, Jin Qin, Mark White, Moshe Gurfinkel, Shahrzad Salami, Qinguo Fan, Zvi Gur, Michael Talmor, and Yoram Shapira.

ACRONYMS

ADC  Analog-to-Digital Converter
AHI  Anode Hole Injection
AHR  Anode Hydrogen Release
ALT  Accelerated Life Testing
AST  Accelerated Stress Tests
ATPG  Automatic Test Pattern Generation
AVSI  Aerospace Vehicle Systems Institute
BERT  Berkeley Reliability Tools
BIR  Built-In-Reliability
BTI  Biased Temperature Instability
CAD  Computer Aided Design
CADMP-2  Computer-Aided Design of Microelectronic Packages
CALCE  Computer-Aided Life-Cycle Engineering
CDF  Cumulative Distribution Function
CFR  Constant Failure Rate
CHC  Channel Hot Carrier
CHE  Channel Hot Electron
CMOS  Complementary Metal Oxide Semiconductor
DAC  Digital-to-Analog Converter
DAHC  Drain Avalanche Hot Carrier
DFR  Design-For-Reliability
DNL  Differential Nonlinearity
EM  Electromigration
EOS  Electrical Overstress
ETM  Effective Temperature Models
FaRBS  Failure Rate Based SPICE
FPGA  Field Programmable Gate Array
FIT  Failure in Time
FN  Fowler-Nordheim
GCA  Gradual Channel Approximation
GIDL  Gate-Induced Drain Leakage
GOS  Gate Oxide Short
HCD  Hot Carrier Degradation
HCI  Hot Carrier Injection
HISREM  Hot Carrier Induced Series Resistance Enhancement Model
HTOL  High Temperature Operating Life
ICs  Integrated Circuits
INL  Integral Nonlinearity
ITRS  International Technology Roadmap for Semiconductors
KCL  Kirchhoff's Current Law
LDD  Lightly Doped Drain
LEM  Lucky Electron Model
LNA  Low Noise Amplifier
LSB  Least Significant Bit
MaCRO  Maryland Circuit-Reliability Oriented
MIL-HDBK  Military Handbook
MOS  Metal Oxide Semiconductor
MOSFET  Metal Oxide Semiconductor Field Effect Transistor
MSM  Matrix Stressing Method
MTBF  Mean Time Between Failures
MTTF  Mean-Time-To-Failure
NBTI  Negative Bias Temperature Instability
NEPP  NASA Electronic Parts and Packaging Program
NMOS  N-Channel Metal Oxide Semiconductor
NMOSFET  N-Channel Metal Oxide Semiconductor Field Effect Transistor
PBTI  Positive Bias Temperature Instability
PMOS  P-Channel Metal Oxide Semiconductor
PMOSFET  P-Channel Metal Oxide Semiconductor Field Effect Transistor
PoF  Physics-of-Failure
RAC  Reliability Analysis Center
RAMP  Reliability Aware Micro-Processor
RF  Radio Frequency
RT  Room Temperature
SGHE  Secondary Generated Hot Electron
SHA  Sample-and-Hold Amplifier
SNDR  Signal-to-Noise-Plus-Distortion
SNM  Static Noise Margin
SPICE  Spacecraft, Planets, Instrument, C-matrix, Events
SRAM  Static Random Access Memory
TBD  Time-to-Breakdown
TCAD  Technology Computer Aided Design
TDDB  Time-Dependent Dielectric Breakdown
UIUC  University of Illinois at Urbana-Champaign
VHDL  Very High Density Logic
VTC  Voltage Transfer Characteristics
VLSI  Very Large Scale Integration

CONTENTS

Executive Summary
1  Introduction
   1.1  Organization
   1.2  Reliability Prediction from a Historical Perspective
        1.2.1  Traditional Approach
        1.2.2  Physics-of-Failure Approach
        1.2.3  Recent Approach: RAMP
   1.3  Reliability Modeling and Prediction Today
        1.3.1  Competing Mechanisms Theory
        1.3.2  FaRBS
        1.3.3  MaCRO
   1.4  Summary
2  Electron Device Physics of Failure
   2.1  Electromigration
        2.1.1  Introduction
        2.1.2  Basic Physics Process of EM
        2.1.3  Statistical Models of EM
   2.2  Hot Carrier Degradation
        2.2.1  Introduction
        2.2.2  Hot Carriers
        2.2.3  Hot Carrier Injection Mechanisms
        2.2.4  HCD Models
        2.2.5  Acceleration Factors
   2.3  Time-Dependent Dielectric Breakdown
        2.3.1  Introduction
        2.3.2  Physics of Breakdown
        2.3.3  Oxide Breakdown Models
        2.3.4  Acceleration Factors
   2.4  Negative Bias Temperature Instability
        2.4.1  Introduction
        2.4.2  NBTI Failure Mechanisms
        2.4.3  NBTI Models
3  Failure Rate Based SPICE (FaRBS) Reliability Simulation
   3.1  Introduction
   3.2  Modules and the Process of FaRBS
        3.2.1  Sensitivity Analysis
        3.2.2  SPICE Simulation
        3.2.3  Wearout Models
        3.2.4  System Reliability Model
   3.3  Parameter Extraction Model
   3.4  Derating Voltage and Temperature for Reliability
        3.4.1  Circuit Design and Simulation
        3.4.2  Simulation Results and Analysis
   3.5  FaRBS Application: An Analog-to-Digital Converter Reliability Simulation
        3.5.1  Introduction
        3.5.2  ADC Circuits
        3.5.3  FaRBS Analysis of ADC Reliability
4  Microelectronic Circuit Reliability Analysis and MaCRO
   4.1  Introduction
   4.2  Hot Carrier Injection
        4.2.1  Failure-Equivalent Circuit Model
   4.3  Time-Dependent Dielectric Breakdown
        4.3.1  Failure-Equivalent Circuit Model
   4.4  Negative Bias Temperature Instability
        4.4.1  Failure-Equivalent Circuit Model
   4.5  MaCRO Application: An SRAM Reliability Simulation and Analysis
        4.5.1  Introduction
        4.5.2  SRAM Circuit Design and Simulation
        4.5.3  Preview of SRAM Failure Behaviors
        4.5.4  Device Lifetime Calculation
        4.5.5  SPICE Reliability Simulation with Circuit Models
        4.5.6  Reliability Design Techniques
        4.5.7  Summary
5  Microelectronic System Reliability
   5.1  Introduction
   5.2  Individual Failure Mechanism Lifetime Models
   5.3  Microelectronic System Voltage and Temperature Acceleration
        5.3.1  Non-Arrhenius Temperature Acceleration
        5.3.2  Stress-Dependent Voltage Acceleration Factor
        5.3.3  Combined Voltage and Temperature Acceleration Factor
   5.4  Qualification Based on Failure Mechanism
   5.5  Summary
References

EXECUTIVE SUMMARY

This handbook presents a physics-of-failure approach to microelectronics reliability modeling and assessment. Knowledge of the root cause and physical behavior of key failure mechanisms in microelectronic devices has improved dramatically over recent years and has led to the development of more sophisticated reliability modeling tools and techniques. Some of these tools are summarized here.

Chapter 1 provides an overview of traditional reliability prediction approaches, i.e., MIL-HDBK-217, compared with some of the more recent reliability modeling and prediction approaches, including the Reliability Aware Micro-Processor (RAMP) model, Failure Rate Based SPICE (FaRBS) reliability simulation, and Maryland Circuit-Reliability Oriented (MaCRO) simulation. Chapter 2 describes the intrinsic wearout mechanisms of the electron device, including the physics processes, mechanisms, and models of electromigration (EM), hot carrier degradation (HCD), time-dependent dielectric breakdown (TDDB), and negative bias temperature instability (NBTI). In Chapter 3, the modules and processes of FaRBS reliability simulation, model parameter extraction, and derating of voltage and temperature for reliability are described. Sensitivity analysis and spacecraft, planets, instrument, C-matrix, events (SPICE) simulation of the wearout models are also discussed. To account for the effect of wearout mechanisms on circuit functionality and reliability, the device-level accelerated lifetime models are extended to microelectronic circuit-level applications, and an analog-to-digital converter reliability simulation using the FaRBS application is provided. Lifetime and failure equivalent circuit models for HCI, TDDB, and NBTI are presented in Chapter 4, Microelectronic Circuit Reliability Analysis and MaCRO. This chapter includes an illustrative case study demonstrating how to apply MaCRO models and algorithms to circuit reliability simulation, analysis, and improvement. The most common circuit structures used in reliability simulations are the ring oscillator, the differential amplifier, and the Static Random Access Memory (SRAM). The SRAM is selected as the case study vehicle to show the applicability of MaCRO models and algorithms in circuit reliability simulation and analysis. Chapter 5, in conclusion, describes the microelectronic system aspect of reliability, including the impact to the system of individual failure mechanism lifetime models, voltage and temperature acceleration, and qualification based on failure mechanism and application. A failure-mechanism-based qualification methodology using specifically designed stress conditions, rather than traditional approaches (i.e., one voltage and one temperature), can lead to improved reliability predictions for targeted applications and optimized burn-in, screening, and qualification test plans.

1 INTRODUCTION

1.1 Organization

Microelectronics integration density is limited by the reliability of the manufactured product at a desired circuit density. Design rules, operating voltage, and maximum switching speeds are chosen to ensure functional operation over the intended lifetime of the product. To determine the ultimate performance for a given set of design constraints, reliability must be modeled for its specific operating condition.

Reliability modeling for the purpose of lifetime prediction is, therefore, the ultimate task of a failure physics evaluation. Unfortunately, existing industrial approaches to reliability evaluation fall short of predicting failure rates or wearout lifetime of semiconductor products. This is mainly attributed to the lack of a unified approach for predicting device failure rates, and to the fact that all commercial reliability evaluation methods rely on the acceleration of a single, dominant failure mechanism.

Over the last several decades, knowledge of the root cause and physical behavior of the critical failure mechanisms in microelectronic devices has grown significantly. Confidence in historical reliability models has led to more aggressive design rules that have been successfully applied to the latest Very Large Scale Integration (VLSI) technology. One result of improved reliability modeling has been accelerated performance; that is, performance beyond the expectation of Moore's Law. A consequence of more aggressive design rules has been a reduction in the significance of a single failure mechanism. Hence, in modern devices, there is no single failure mode that is more likely to occur than any other within a range of specified operating conditions. This is practically guaranteed by the integration of modern simulation tools in the design process. The consequence of more advanced reliability modeling tools is a new awareness that device failures result from a combination of several competing failure mechanisms.
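A compact way to state this competing-mechanism view is the standard series-model formulation, written here for reference under the usual assumption of independent mechanisms; it is not an equation reproduced from this handbook:

\[ R_{\mathrm{device}}(t) = \prod_{i} R_i(t) \qquad \Longleftrightarrow \qquad \lambda_{\mathrm{device}}(t) = \sum_{i} \lambda_i(t) \]

where \(R_i(t)\) and \(\lambda_i(t)\) are the reliability function and failure (hazard) rate of the ith mechanism (e.g., EM, HCD, TDDB, NBTI). When no single \(\lambda_i\) dominates over the specified operating range, the device failure rate reflects the combined contribution of all mechanisms.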

1.2 Reliability Prediction from a Historical Perspective

Reliability modeling and prediction is a relatively new discipline. Only since World War II has reliability become a subject of study, due to the relatively complex electronic equipment used during the war and the high failure rates observed.

Since then, there have been two different approaches to reliability modeling, corresponding to different time periods. Until the 1980s, the exponential, or constant failure rate (CFR), model [1] had been the only model used for describing the useful life of electronic components. It was common to the six reliability prediction procedures that were reviewed by Bowles [2] and was the foundation of the military handbook for reliability prediction of electronic equipment (known as the Military Handbook 217 [MIL-HDBK-217] [3] series) that became the de facto industry standard for reliability prediction. Although the CFR model was used without physical justification, it is not difficult to reconstruct the rationale for its use: it mathematically describes the failure distribution of systems wherein the failures are due to completely random or chance events. Throughout that period, electronic equipment complexity began to increase significantly. Similarly, the earlier devices were fragile and had several intrinsic failure mechanisms that combined to result in a constant failure rate.

During the 1980s and early 1990s, with the introduction of integrated circuits (ICs), more and more evidence was gathered suggesting that the CFR model was no longer applicable. Phenomena such as infant mortality and device wearout dominated failures; these failures could not be described using the CFR model. In 1991, two research groups, the IIT Research Institute/Honeywell SSED and Westinghouse/University of Maryland teams, both recommended that, on the basis of their research and findings, the CFR model should not be categorically applied [4] to further updates of MIL-HDBK-217. They further recommended that the exponential distribution should not be applied to every type of component and system without due awareness.

The end of the CFR as the sole model for reliability modeling was officially set with the publication of the "Perry Memo." Responding to increasing criticism of the CFR, Secretary of Defense William Perry issued a memorandum in 1994 that effectively eliminated the use of most defense standards, including the MIL-HDBK-217 series. Many defense standards were cancelled at that time and, in their place, the Department of Defense (DoD) encouraged the use of industry standards, such as the ISO 9000 series for quality assurance.

Since then, the physics-of-failure approach has dominated reliability modeling. In this approach, the root cause of an individual failure mechanism is studied and corrected to achieve some determined lifetime. Since wearout mechanisms are better understood, the goal of reliability engineers has been to design dominant mechanisms out of the useful life of the components by applying strict rules for every design feature. The theoretical result of this approach is, of course, that the expected wearout failures are unlikely to occur during the normal service life of microelectronic devices. Nonetheless, failures do occur in the field, and reliability prediction has had to accommodate this new theoretical approach to the virtual elimination of any one failure mechanism limiting the useful life of an electronic device.

1.2.1 Traditional Approach

MIL-HDBK-217

The first brick of all traditional (empirical) reliability improvement methodologies was laid with MIL-HDBK-217. It was published in 1965 to achieve the following goals:

- To organize the reliability data collected from the field.
- To find the basis for better designs.
- To give the "quantitative reliability requirements."
- To estimate the reliability before full-scale production [5].

MIL-HDBK-217 soon became a standard; it was subsequently updated several times to keep pace with technology advancement as well as changes in prediction procedures. Meanwhile, other organizations started to develop their own prediction models suitable for their own industries.
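For reference, the constant-failure-rate assumption that runs through these procedures corresponds to the exponential life distribution; the compact form below is the standard statement of that model, not an equation reproduced from the handbook:

\[ R(t) = e^{-\lambda t}, \qquad f(t) = \lambda e^{-\lambda t}, \qquad \mathrm{MTBF} = \frac{1}{\lambda} \]

where \(\lambda\) is the constant failure rate. Because the hazard rate \(h(t) = f(t)/R(t) = \lambda\) is independent of age, the model describes failures driven by purely random or chance events; it cannot represent infant mortality or wearout, which is precisely the limitation noted above.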

In the 1990s, attempts were focused on finding an electronic system reliability assessment methodology, including causes of failures, that could be used in the design and manufacturing of electronic systems. To cover the vast range of electronic devices, the notion of a "similar-system" was invented. The term "similar-system" refers to a system that uses similar technology and is built for a similar application, or performs a similar function. The next step was to determine whether field data already existed for such a "similar-system." The data from a predecessor system could be used to generate the prediction for a new "similar-system" to the extent that the new generation was evolutionary (not revolutionary). The key process was the translation of the predecessor data to the new similar-system by considering the differences reflected in complexity and temperature, as well as the environmental and learning factors [5].

The last version of MIL-HDBK-217 (MIL-HDBK-217F) covers a wide range of major electronic component categories used in modern military systems, from microcircuits and discrete semiconductors to passive components such as resistors and capacitors [6]; for each of these areas, the handbook presents a straightforward equation for calculating the failure rate in failures per million hours. According to its own claim, the goal of the handbook is to "establish and maintain consistent and uniform methods for estimating the inherent reliability of the mature designs of military equipment and systems" [3].

The concepts behind the traditional MIL-HDBK-217F prediction procedures can be classified as follows:

1. Constant failure rate: The constant-failure-rate reliability model is used by most of the empirical electronic reliability prediction approaches. The failure rate of a system containing different components is the summation of its component failure rates, which means that all system components are treated as being in series.

2. \pi factors: Almost all of the traditional prediction methods have a base failure rate modified by several factors. Microcircuits, gate/logic arrays, and microprocessors incorporate stress models as a combination of package and parts. Examples of these factors include \pi_{CF} (Configuration Factor), \pi_E (Environmental Factor), and \pi_Q (Quality Factor). These multiplication factors are included in the total failure rate calculation, Equation (1.1); they are defined in MIL-HDBK-217F and are based on different configuration levels, environmental stress levels, and quality levels for the part.

3. Parts count and parts stress analysis: Two basic methods for performing reliability prediction from observed data are the parts count and the parts stress analyses. The parts count reliability prediction method is used in the early design phases, when not enough data is available but the numbers of component parts are known. The information needed for the parts count method includes generic part types, part quantity, part quality levels (when known or assumed), and environmental factors. The general expression for the item failure rate with this method is

   \lambda_S = \sum_{i=1}^{n} N_i (\lambda_g \pi_Q)_i                    (1.1)

   where \lambda_S is the total failure rate, \lambda_g is the failure rate of the ith generic part, \pi_Q is the quality factor of the ith part, N_i is the quantity of the ith generic part, and n is the number of generic part categories. If the parts in the equipment operate in more than one environment, the above equation is applied to each portion of the equipment in a distinct environment. The overall equipment failure rate is then obtained by summing the failure rates for each environment; a short numerical sketch of this calculation is given after Equation (1.2).

The parts stress model is based on the effect of mechanical, electrical, and environmental stresses and duty cycles, such as temperature, humidity, and vibration, on the part failure rate. The part failure rate varies with applied stress, and the strength-stress interaction determines the part failure rate [7]. This method is used when most of the design is complete and detailed part stress information is available; it is applicable during later design phases as well. Since more information is available at this stage, the result is more accurate than that of the parts count method. An example of the microelectronic circuit part stress model is

\lambda_p = \pi_Q (C_1 \pi_T + C_2 \pi_E) \pi_L                    (1.2)

where \lambda_p is the part failure rate; C_1 and C_2 are the complexity of the die base failure rate (such as the number of gates) and the complexity of the package type (such as pin count), respectively; \pi_T is the temperature acceleration factor for the related failure mechanism; \pi_E is the environmental factor; and \pi_L is the learning factor.
