Neill Doertenbach QualMark Corporation

Highly Accelerated Life Testing – Testing With a Different Purpose

Neill Doertenbach
QualMark Corporation

IEST, 2000 proceedings, February, 2000.

Biography

Mr. Doertenbach holds an Electrical Engineering degree from CSU in Fort Collins, Colorado. He has a wide range of industry experience, including digital hardware design, software and Quality Assurance, and is the OVS Technical Sales Manager of QualMark Corporation.

Abstract

This paper describes the technique of HALT (Highly Accelerated Life Testing) and the advantages gained by using the technique. HASS (Highly Accelerated Stress Screening) is also introduced and described. The paper begins with a discussion of the HALT philosophy and how it differs from traditional Design Verification Testing (DVT). The advantages of the technique are highlighted. The process of HALT is described in detail, with emphasis on contrasting HALT with DVT and the logic behind the differences. The discussion of the technique will include preparing for the test, fixturing, the sequence of the applied stresses and the post-test activities. HASS is introduced, including the development of a screen, proof of screen and fixture mapping.

Keywords: HALT, HASS, Accelerated Life Testing, Accelerated Stress Screening, DVT, ESS, Proof of Screen.

Overview and Definitions

In recent years, the test techniques known as HALT (Highly Accelerated Life Testing) and HASS (Highly Accelerated Stress Screening) have been gaining advocates and practitioners. These test methods, quite different from standard life testing, design verification testing and end-of-production testing, are becoming recognized as powerful tools for improving product reliability, reducing warranty costs and increasing customer satisfaction. This paper provides a basic description of these techniques, highlights the differences between these techniques and more conventional testing and provides a guideline for their implementation.

HALT is a test that is performed on a product as part of the design process. Typically it is performed on a product when pre-pilot or pilot run units are available, before the design verification testing begins. During HALT, a product is stressed far beyond its specifications as well as far beyond what the product will encounter in a typical use environment. The actual functional and destruct limits of the product are found and pushed out as far as possible. These limits are used as the basis for the implementation of HASS during the production of the product. HASS is a production screen test, performed on products built as part of the production process. Since HALT is required for the implementation of HASS, HALT will be discussed first.

HALT vs. DVT - the difference is the purpose

When first exposed to the concept of HALT, many design engineers are skeptical of the method. Much of this skepticism stems from the fact that these engineers are used to doing standard life testing and design verification testing, and the HALT methods differ so dramatically from these conventional methods that they seem to be almost at odds with them. The key to understanding the value of HALT lies in understanding the basic difference in the purpose of the testing being done. The basic purpose of Design Verification Testing (DVT) is well understood - it is to demonstrate that the product meets its specifications, and to demonstrate that the product will function in its intended environment. DVT is considered successful when all the tests are passed, with no failures detected.

The purpose of HALT is dramatically different. In HALT, the goal is to over-stress the product and to very quickly induce failures in the product. By applying these stresses in a controlled, stepped fashion, while continuously monitoring the product for failures, the testing results in the exposure of the weakest points in the design.
At the completion of HALT, the functional and destruct limits of the product are known, and a “laundry list” of design and process limitations is defined, with corrective actions often defined as well.

In short, the goal of HALT is to quickly break the product and learn from the failure modes the product exhibits. The key value of the testing lies in the failure modes that are uncovered and the speed with which they are uncovered. HALT is considered a success when failures are induced, the failure modes are understood, corrective action has been taken, and the limits of the product are clearly defined and pushed out as far as possible. Unlike DVT, HALT is not a pass/fail test. It is a process of discovery and design optimization.

Although these failure modes are induced by stresses in excess of specification, they are typically valid failure modes that would show up in the product in the field. A full failure analysis of all modes found will help confirm this. The important thing to remember is that HALT is finding the weakest parts of the design. These weak links will be the source of warranty problems in the field. The controlled over-stresses applied during the HALT process simply accelerate the precipitation of these failures to allow early detection and correction. The advantage of HALT is that it quickly finds failure modes that would not be brought out in DVT. A typical HALT will take only 3 to 5 days.

Because the purpose of the tests is so clearly different, HALT is not intended to replace DVT. It is true that HALT will find most, if not all, of the failure modes that would show up in DVT (along with many more). However, HALT will not provide you with the documented evidence that you often need to prove that your product meets specification. By doing HALT before DVT is started, you help ensure that your DVT will be completed in one pass, with no defects found. This will greatly speed your time to market, avoiding the slow process of repeating DVT until no more failures are precipitated and detected.

Choosing HALT stresses and equipment

The basic concept of HALT can be implemented using many different stresses. However, the stresses most often used are thermal extremes, extreme thermal rates of change, vibration and the combination of thermal and vibration.
Other stresses, such as voltage margining, frequency margining, power supply loading and power cycling can also be applied, resulting in additional valid failure modes being exposed.

It is worth remembering that HALT is not intended to demonstrate that a product will function in its intended environment. Consequently, the stresses do not attempt in any way to duplicate those expected in “real life”. Rather, the stresses are specifically designed to quickly bring out failure modes. This logic affects the choice of chamber used to apply the stresses as well as the type of vibration fixturing used and the routing of the air flow through the product. Given that extreme stresses are to be applied, the chamber must be capable of reaching both hot and cold thermal extremes, executing very fast thermal ramps and providing high vibrational energy that will quickly bring out failure modes. This, of course, precludes the use of mechanical refrigeration systems.

The vibration system that has been proven to be the most effective for HALT is a Repetitive Shock (RS) system with a wide frequency and acceleration range and 6 degree-of-freedom vibration. In order to rapidly and effectively bring out failure modes it is important to excite the product at the resonant frequency of all assemblies, sub-assemblies, components and leads and legs of components in the product, regardless of what that resonant frequency, or the orientation of the assembly or component, may be. An RS shaker designed to provide energy from 2 Hz to 10,000 Hz will do this most effectively.

Preparing for HALT and planning the test

Once the purpose of HALT is understood and accepted, the process and stresses used during the testing begin to make more sense. Because the stresses applied are increased until failure occurs, it is not necessary to test a large population of product to ensure that a failure mode will be found.
A relatively small sample, typically 4 to 6 units, is adequate. This number will allow verification of a failure mode in more than one unit as well as providing for a spare or two in the event of a catastrophic failure of a unit under test. In order to preserve these samples and get as much information as possible from them, the stresses are applied starting with the least destructive and going to the most destructive. For the thermal and vibration stresses, this means starting with cold step stressing, then hot, then rapid thermal extremes, then vibration, followed by a final combined thermal/vibration environment.

If the product being tested is more complex than simply a single board or small system, then one of the first questions to consider is what level of the product to test. In general, the goal of HALT is most effectively met by testing at the lowest possible subassembly. Card cages or other assemblies can dampen vibration and block air flow, reducing the stresses applied to subassemblies inside them. Of course, the trade-offs of functionality and testability must be considered. Also, there will be interconnect circuitry and connections that may not be tested at the subassembly level. An ideal HALT on a complex product would include HALT on all subassemblies, with a final HALT on the upper level assembly as well.

The functional test equipment used during HALT is extremely important. Since the value of HALT is the detection of failure modes induced, it is critical to be able to detect the failures when they happen. This means that the units under test must undergo complete diagnostics while they are being stressed. Much valuable information will be lost if the product is stressed without being monitored, then removed from the stress and tested at ambient. By testing under stress, you will be able to detect “soft” failures that only show up under a particular stress or combination of stresses. These soft failures define the operating limit of your product, and can be the source of troublesome “no defect found” failures when the product reaches the field.

The vibration fixturing used in HALT is very different from that used when testing with typical electrodynamic (ED) or hydraulic shakers. In HALT, the fixture is not designed to mimic the real-life mounting of the product. Instead, it is designed to maximize the transmission of energy into the product to speed the precipitation of failures. This results in simple, inexpensive fixturing with the goal of simply clamping the product to the vibration table as tightly as possible.

Figure 1 shows a typical product fixtured in a HALT chamber. To maximize air flow through the product as well as to improve the transmission of the low frequency energy, the product is set up on aluminum U-channel rather than being placed directly on the table top. The U-channel across the top of the product and the all-thread rod and nuts clamp the product to the table.

Air flow through the product is also planned with the HALT goal in mind. Using flexible air ducts, the air flow is routed to maximize the temperature rate of change on the thermally sensitive parts of the product and to ensure that all parts of the product experience maximum temperature extremes. The normal air flow through the product during use is not considered when the ducting is designed.
If necessary, holes should be cut in the product’s case to allow sufficient air flow across its components.

To aid in failure analysis and to ensure that the stresses are being coupled into the product effectively, it is important to instrument the product under test. Thermocouples should be placed at key points on the product, and accelerometers can be placed on boards and subassemblies to evaluate the transmission of energy into the product. However, the actual accelerometer placement should be delayed until after the thermal portion of the stressing is complete, since the accelerometers would otherwise be exposed to stress levels that may shorten their life.

A final, important part of the HALT setup is to clearly define what parameters in the product will be monitored, and what constitutes a failure. This fairly obvious step in the test process can be easily missed, making the interpretation of HALT findings more difficult.

Margin Discovery – the core of HALT

With the test set up, the process of Margin Discovery can begin. As mentioned above, HALT will uncover the operational and destruct limits of your product. During testing, the stress is steadily increased in a stepwise fashion, with a complete functional test done at each step. The operational limit is defined as the stress level at which the product malfunctions but returns to normal operation when the stress is removed. Essentially, it is the point of “soft” failure. The destruct limit, as you may guess, is the level of stress necessary to cause a permanent, or “hard”, failure to occur. The difference between these limits and your operating specifications is your margin for that particular stress. As the failure modes responsible for these limits are found and eliminated, you push the limits further and further out, maximizing your margins and increasing your product’s life and reliability.

Figure 2 graphically represents these limits. The stress applied is shown on the X axis, with number of failures shown on the Y axis.
The curve drawn around each of the limits represents the distribution of the failure that is responsible for that particular limit. The operating specifications and margins are also shown.

This figure can be helpful in gaining an intuitive understanding of the value of HALT. Consider a failure mode (say, a high ripple on the output of a power supply) that causes a unit under test to fail. If you were able to test hundreds of units, you could see and understand the distribution on that failure mode, as sketched on the graph. However, you do not typically have that luxury. If you increase the stress until the failure is seen, it doesn’t matter where in that distribution the unit under test falls: the failure mode will be detected. If the tail of that distribution happens to fall in the operating specifications, then the failure mode would have been an out-of-box failure mode on some fraction of your products. By doing HALT and stressing to failure, you will find the failure mode without having to hope that your sample size is big enough to exhibit the failure within operating specifications.

But what if the tails of the distribution are well outside of the product specification, as shown on the graph? Is the high ripple a failure mode that can be ignored? Consider for a moment what happens to this distribution and limits as your product ages in the field. Components fatigue and begin to drift out of specification, power cycles and lightning strikes stress the product, and these limits begin to creep in. If you have chosen to ignore the failure, then you will find that it is one of the first failures to begin showing up in warranty issues. By pushing the stress until the failure occurs, you have effectively accelerated time, precipitating a failure mode in just a few days that could have taken months to come up in the field.

As illustrated in the above example, a failure mode found beyond the operating limits of the product can, indeed, be a “valid” failure mode that could cause warranty problems in the future. However, it is also clear that you may find a failure mode that is completely due to the extreme stress applied, and would never occur in the field. Consider a failure mode precipitated by the softening of a plastic boss at high temperature. A brief failure analysis will reveal that the distribution on this failure mode is clearly understood, will never have a tail that is in the product specification, and will not shift with time and fatigue. Consequently, this failure mode can be safely ignored. Of course, the distribution on most failure modes is not that easily understood. This is one reason why a complete failure analysis is always necessary on HALT failures. In general, it is unusual when a HALT failure can be safely ignored. It is important to resist the urge to ignore a failure mode simply because it happened outside of the specification for the product.

As you test to higher and higher extremes of stress, pushing limits further and further, an obvious question comes up: when do I stop testing? The stopping point will be either the limit of the test equipment, or the fundamental limit of the technology [1]. This fundamental limit is the point where multiple failures begin to occur with small increases in stress. Failure analysis reveals fundamental and catastrophic failures across several devices, with corrective action being prohibitive or impossible. In vibration testing, for example, multiple components are coming off the board.

With this understanding of the margin discovery process, the stressing itself can begin. As described earlier, stresses are applied starting with the least destructive and progressing to the most destructive. This helps conserve samples. Cold step is done first. Cold step testing begins at ambient temperature. The temperature is dropped in 5 °C steps. At each step the temperature is allowed to stabilize for 10 minutes. This dwell helps ensure that the entire product is stabilized at this temperature, and makes the testing more repeatable. At the end of 10 minutes, a full functional test of the product is done. If the product passes, the temperature is dropped again, and the process repeated.
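The cold step procedure described so far can be sketched as a simple loop. This Python sketch is illustrative only: the function name `cold_step_stress`, the callback `passes_functional_test`, the 20 °C ambient value and the chamber floor are assumptions made for the example, not values from this paper, and the 10 minute dwell is noted in a comment rather than actually waited out.

```python
from typing import Callable, List, Optional, Tuple

STEP_C = 5.0     # step size from the text: 5 degree C steps
DWELL_MIN = 10   # dwell from the text: 10 minutes at each step


def cold_step_stress(
    passes_functional_test: Callable[[float], bool],
    ambient_c: float = 20.0,        # assumed ambient temperature
    chamber_min_c: float = -100.0,  # assumed chamber limit (hypothetical)
) -> Tuple[List[float], Optional[float]]:
    """Step down from ambient until the unit fails or the chamber limit is hit.

    Returns (temperatures that passed, first failing temperature or None).
    """
    passed: List[float] = []
    temp = ambient_c
    while temp >= chamber_min_c:
        # In a real test: stabilize for DWELL_MIN minutes, then run the
        # full functional test while the unit is still under stress.
        if not passes_functional_test(temp):
            return passed, temp  # stop here and investigate the failure
        passed.append(temp)
        temp -= STEP_C
    return passed, None  # reached the equipment limit with no failure


# Hypothetical unit under test that malfunctions below -40 C.
passed, failed_at = cold_step_stress(lambda t: t >= -40.0)
print(passed[-1], failed_at)  # -40.0 -45.0
```

The same loop with the sign of the step reversed would describe hot step testing; only the direction and the chamber limit change.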
When a failure occurs, the testing is stopped and an investigation into the failure is done. Often, once the failure mode is defined, it is possible to “work around” the failure with a quick patch and continue testing, saving the intensive failure evaluation for later. As described above, this step process is continued until you reach the limits of your test equipment or until you reach the fundamental limit of the technology.

After the cold step is completed, hot step testing is done in a similar manner. Again, testing is started at ambient, then the temperature is increased in 5 °C steps. The dwell and functional testing are identical to those done in cold step testing.

The third stress applied in HALT is rapid thermal extremes. Now, the product is functionally tested continuously while the product temperature is changed as rapidly as allowed by the chamber. The upper and lower limits of these ramps are determined by the results of the step stressing, and stay within the operating limits found there (there is no point in repeating failures that were found earlier). If the product cannot tolerate these maximum thermal ramps, then the ramp rate is decreased, and then increased in a stepwise fashion, similar to the thermal step stressing. When failures are encountered, they are addressed in a similar fashion as before.

With the thermal only portion of the testing completed, the product is now exposed to vibration. With accelerometers applied to the product to verify adequate energy transmission to the product, vibration testing is begun at a stress level of 3 to 5 Grms. Just like in the thermal phase, there is a 10 minute dwell, then a complete functional test of the product is executed. Again, the stress is stepped up, in 3 to 5 Grms increments, until the chamber limit is reached or you begin to see the catastrophic failures indicative of the fundamental limit of the technology.

The final environment is combined thermal and vibration. Now, the temperature is ramped as it was during the “rapid thermal extremes” portion of the testing, while the vibration is stepped up as it was during the vibration only portion.

It is important to remember that the HALT will be made more effective if additional stresses can be incorporated. By combining more and more stresses, you will bring out failure modes that may occur in the field only under a unique stress situation. This can eliminate a failure mode that could cause a lot of headaches if you were forced to look for it using traditional methods, after the product was released.

At the completion of the step stress testing, you will have found many valuable failure modes for your product. You will have a clear understanding of the margins in your product. You will know not only what your limits are, but WHY they are where they are, giving you a unique understanding of the weaknesses in your product.
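The margins mentioned here follow from the definition given under Margin Discovery: the difference between a limit found in HALT and the operating specification for that stress. The sketch below works that arithmetic for one product. All of the numbers, and the names `StressLimits` and `margins`, are invented for illustration and are not data from this paper.

```python
from dataclasses import dataclass


@dataclass
class StressLimits:
    """Limits for one stress direction (for example, hot temperature in C)."""
    spec: float         # operating specification for this stress
    operational: float  # stress at which a "soft" failure was seen in HALT
    destruct: float     # stress at which a "hard" failure was seen in HALT


def margins(limits: StressLimits) -> dict:
    """Return operating and destruct margins as distances from the spec."""
    return {
        "operating_margin": abs(limits.operational - limits.spec),
        "destruct_margin": abs(limits.destruct - limits.spec),
    }


# Hypothetical hot side: spec is +50 C, soft failure at +85 C, hard at +110 C.
hot = StressLimits(spec=50.0, operational=85.0, destruct=110.0)
# Hypothetical cold side: spec is 0 C, soft failure at -45 C, hard at -70 C.
cold = StressLimits(spec=0.0, operational=-45.0, destruct=-70.0)

print(margins(hot))   # {'operating_margin': 35.0, 'destruct_margin': 60.0}
print(margins(cold))  # {'operating_margin': 45.0, 'destruct_margin': 70.0}
```

Recomputing the same figures after a corrective action and a verification HALT gives a direct measure of how far a limit has been pushed out.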
After doing a root cause failure analysis on all failures found and implementing corrective action, you can do a verification HALT to test your fixes and make sure you have not introduced any new “weak links” in the design with your changes. In the end, you will have optimized the design of your product so that it will last as long as possible in the field.

HASS – maintaining optimization

After your design is ruggedized through HALT and you have completed DVT, you will begin production. As anyone who has seen a product into production knows, the production process can introduce many failure modes that are not related to a faulty design, and the sustaining process can certainly introduce new design problems. HASS is intended to catch these new failure modes more quickly and more effectively than burn-in or other ESS testing done in production.

Once again, an understanding of the purpose of the test is helpful. Burn-in is designed to weed out infant mortality in a product, aging it to induce early life failures before the product ships. HASS has a broader purpose. The goal in HASS is to verify that no new “weak link” has crept into the product since HALT that has shifted either the operational or destruct limits found in HALT.

An important first step to setting up HASS is the completion of HALT on the product. The HASS limits will be set based on the operational and destruct limits found in HALT. Prior to setting up HASS, it is important that corrective action has been implemented on all HALT failures and a verification HALT has been done.

The HASS process and equipment

The equipment used to do HASS is similar to that used in HALT, although often a larger chamber is used to accommodate production quantities. The fixturing can be quite different in HASS, simply to accommodate the production flow. The speed with which product can be fixtured in the chamber becomes important, as well as maximizing the number of products in the chamber. Quick release clamps are often used in lieu of nuts and bolts for securing the product.

An important part of designing a fixture for HASS is the mapping of the fixture. The goal is to ensure that the vibration and thermal stresses at each point in the fixture are roughly equal (although precise uniformity is not important). Mapping the fixture involves taking accelerometer and thermocouple readings on a product in each of the fixture locations. It is important that the fixture is completely loaded with product for the test, since the load will affect the vibration characteristics. Thermal inconsistencies can be corrected by changing air flow through baffling or other air distribution changes. Vibrational inconsistencies can be corrected through fixturing changes, with the introduction of dampening materials or changes in clamping mechanisms.

During HASS, the stresses are applied simultaneously. Typically, the product is subjected to continuous vibration while the temperature is ramped between its limits, with short dwells at the extremes.

Defining the screen

The levels of the stresses to be applied during the screen are based on the limits found during HALT. There are two parts to the screen [3]. The first part is the Precipitation screen. This screen stresses the product beyond the operational limits and near the destruct limits found in HALT. It is intended to precipitate failures in the product due to latent defects. Because the product is being stressed beyond its operational limit, you do not expect it to function properly, so no testing is done on the product at this point. The product should be powered, however, since applied power can be a significant stress for the product in itself when combined with the other stresses of HASS. The second part of the screen is the Detection screen. During the Detection screen the product is stressed to near the operational limit found in HALT. Now, the product is being functionally tested. Any hard failures induced during the Precipitation screen will be detected, as well as any soft failures that may be induced by the stresses.

Figure 3 can provide an overview of the purpose and limits of these screens. It shows the margin discovery curves, overlaid with the Precipitation and Detection screens. The limits on the screens are set so that they are outside of the tails of the distribution of the failure mode(s) that define the operational and destruct limits for the product. Consequently, product which has no new latent failure modes should pass the screen undamaged. Any new failure mode, however, will be exposed. Figure 4 illustrates a typical thermal profile for a HASS screen.

There is one key problem with setting up the limits on the screens from this data: the small sample size used in HALT means that you really have no idea what the distribution looks like on these limits or where the tails may be. Consequently, a more empirical method is used. A baseline for the stresses is derived by guardbanding the limits found in HALT. Typically, vibration is reduced by 50% and thermal excursions are reduced by 20% [1, 2]. These limits can be used as a starting point for the Proof of Screen process.

Proof of Screen (PoS) is a critical part of HASS implementation. The goal of PoS is to demonstrate that the screen will reliably find defects without inducing failures or significantly reducing the life of the product. The process of PoS is fairly straightforward. A sample of product, typically a full chamber load, is run through the proposed HASS multiple times. The sample includes some seeded failures, perhaps some “no defect found” failures from field trials. The final configuration of the screen will depend on two factors: the number of cycles through the screen necessary to precipitate the seeded failures, and the number of cycles good product is able to tolerate before exhibiting end-of-life failures.

Figure 5 demonstrates the logic behind PoS. Ideally, one or two cycles through the screen will precipitate all the seeded failures. This will yield a short, efficient screen, typically lasting less than 2 hours. As Figure 5 shows, if seeded failures are not precipitated until several passes through the screen, then the severity of the screen should be increased. This part of the PoS verifies that the screen will reliably find defects.

Multiple repetitions of the screen will demonstrate that the screen is not taking an unacceptable amount of life out of the product. Ideally, good product will tolerate 20 to 50 passes through the screen without exhibiting failures. If end-of-life failures are seen before 20 or more cycles are complete, the screen may need to be reduced in severity. A rough estimation can be made of the amount of life being removed from the product by the screen by simply comparing the number of cycles in the proposed production screen to the number of cycles necessary to cause end-of-life failures to occur. For example, if your production screen consists of 2 passes through the precipitation and detection screens, and your proof of screen showed that 20 cycles through the screen induced no end-of-life failures, then your screen is removing less than 2/20, or 10%, of the useful life of your product.

The stress levels can be adjusted, or the vibration duty cycle can be changed, to achieve the proper balance between the number of cycles necessary to bring out defects versus the amount of life being taken out of the product. If stresses are increased as a result of the PoS, the PoS must be repeated on new, unstressed samples.

In reality, it can often be difficult to seed failures sufficiently to accurately verify that the screen will find defective units. Consequently, it is typically necessary to make a conservative estimate of the number of passes through the screen that are necessary, then tune the screen after a reasonable population of product has been through it. If you find that all of your failures are being precipitated in the first one or two passes through the screen, then no more than two passes should be necessary. Conversely, if you are running 3 passes through the screen and are seeing equal failures in each pass, you should either make the screen more aggressive or increase the number of passes through the screen.

Once your HASS process is defined and proven, it is not necessarily “set in stone”. Product changes can bring acceptable changes in the limits, if they are understood. However, it is always important to base your decisions on a complete failure analysis and a thorough understanding of the impact of the change. Remember that a verification HALT is a useful tool when considering these changes.

Summary

A clear understanding of the unique goals of HALT and HASS provides the basis necessary for introducing the techniques into an R&D and production process. This understanding will also enable you to intelligently make changes in the process. If carefully executed, the end result will be increased product life and reliability, reduced warranty expenses, faster time to market and delighted customers.

1. Hopf, A.M., “Highly Accelerated Life Testing for Design and Process Improvement”, Sound and Vibration, November, 1993, pp. 20-24.
2. McLean, H., “Exceeding the Limits of Traditional Reliability Tests”, Medical Device & Diagnostic Industry, April, 1994.
3. Hobbs, G.K., “HALT and HASS, The New Quality and Reliability Paradigm”, 1996. Published by Hobbs Engineering Corporation, Available Upon Request.
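As a worked illustration of two pieces of arithmetic from the HASS discussion, the guardbanded baseline (vibration reduced by 50%, thermal excursions reduced by 20%) and the rough life-consumption estimate (2 passes out of 20 is 10%), consider the following sketch. The function names and the HALT limit values are hypothetical. Measuring the thermal excursion from an assumed 20 °C ambient is an interpretation made for this example; the paper does not state what the 20% reduction is taken relative to.

```python
def guardband_vibration(halt_limit_grms: float, reduction: float = 0.50) -> float:
    """Baseline vibration level: HALT limit reduced by 50% (per the text)."""
    return halt_limit_grms * (1.0 - reduction)


def guardband_temperature(
    halt_limit_c: float,
    ambient_c: float = 20.0,  # assumption: excursion is measured from ambient
    reduction: float = 0.20,
) -> float:
    """Baseline temperature: excursion from ambient reduced by 20%."""
    excursion = halt_limit_c - ambient_c
    return ambient_c + excursion * (1.0 - reduction)


def life_fraction_used(screen_passes: int, passes_tolerated: int) -> float:
    """Upper bound on life removed: screen passes / passes survived in PoS."""
    if passes_tolerated <= 0:
        raise ValueError("passes_tolerated must be positive")
    return screen_passes / passes_tolerated


# Hypothetical HALT limits: 40 Grms, +100 C hot, -60 C cold.
print(f"{guardband_vibration(40.0):.1f} Grms")      # 20.0 Grms
print(f"{guardband_temperature(100.0):.1f} C")      # 84.0 C
print(f"{guardband_temperature(-60.0):.1f} C")      # -44.0 C
print(f"{life_fraction_used(2, 20):.0%} of life")   # 10% of life
```

These values are only a starting point: as the text explains, the Proof of Screen process decides whether the baseline needs to be made more or less severe.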
