XMBS: A New UPS Bypass Architecture - EnerSys

Douglas Manness
EnerSys
Senior Systems Engineer
Burnaby, BC V5J 5M4 CANADA
doug.manness@alpha.ca

Abstract - UPS systems are designed to provide clean AC power for critical loads. In normal operation the utility input is “double converted” and supplied to the load free of harmonics and transients. There are also redundant systems in case of utility failure: a battery source can supply energy temporarily, and a backup generator can support long-term outages. Combined, these ensure a reliable energy source. To achieve very high availability the inverter itself also requires redundancy, usually accomplished by paralleling systems or modules, depending on whether it is a monolithic or modular system. Finally, some systems have static bypass circuits which can directly connect the utility to the load in case of inverter failure.

Despite these efforts, there are still single points of failure that can cause the load to be dropped. For example: power surges (lightning strikes) can cause widespread damage, inverter control systems can malfunction, inter-module communication can fail, or safety and protection mechanisms can experience nuisance trips. But most failures occur when transferring between power sources, especially through operator error during maintenance operations.

The ideal system includes an internal bypass switch for synchronized source transfer, an external static bypass switch in case of inverter failure, and an external bypass switch to isolate the inverter system for maintenance.

This architecture has several problems. Firstly, the cost is prohibitive, so most systems omit the static bypass, and many (especially smaller) systems exclude the external bypass switch. Secondly, the operation of the internal and external switches depends on a sequence of manual operations. Even with significant training this can still be challenging, because it is infrequently practiced and must be done urgently when the load power is interrupted. Finally, it turns out in practice that the Inverter and Static Transfer Switches are not truly independent, and sometimes a failure in one can cause a failure in the other. To address these issues a new bypass architecture, XMBS, was developed, to be trialed in the market starting in Q2 of 2019.

This presentation will include:

1. A practical review of the reliability of UPS systems.
2. A discussion of the challenges of providing reliable power in a manner that meets safety and regulatory requirements (EPO, back-feed, breaker configurations).
3. The design of the XMBS, including the technical factors, design decisions, and testing that were part of its development.
4. A summary of the current status, as well as expectations for how the XMBS fits into the UPS ecosystem.

I. INTRODUCTION

Reliable AC power is critical for many telecom, cable and datacenter systems. Suppliers have responded with a wide range of UPS equipment that can be configured and optimized for each application. While most equipment has continuously evolved, one critical component, the maintenance bypass, remains unchanged even though it is a single point of failure and, in practice, a significant factor in system reliability.

When engineers at Alpha Technologies (an EnerSys company) analyzed the reliability and availability of a typical AMPS UPS installation, the opportunity for the biggest improvement was determined to be the maintenance bypass component.

This document starts with an explanation of maintenance bypass, how it fits into the general system and, in particular, how it impacts reliability. A new patent-pending technology, the “Smart Bypass”, will then be introduced, including a summary of the features and benefits along with a description of the implementation and the testing done to validate the claims. We will then return to the system level and examine how the Smart Bypass compares with alternate products, and finally propose some potential future applications of the same technology to create the ideal reliable system.

A technical note for those experienced in Reliability Analysis: it is a challenge to collect accurate reliability data, either because it isn’t collected or because it is proprietary. The variety of possible architectures, each with complex causal interactions between components, compounds the challenge. Therefore, for both clarity and practicality, this document uses simplified designs which, while they omit components, are still believed to accurately reflect the relative impact of the components being discussed.

For those not familiar with Reliability Analysis, I provide a brief caveat about MTBF at the end of the document.

II. BACKGROUND

A typical UPS system with traditional Maintenance Bypass is shown in Figure 1. The key functions of the Maintenance Bypass are (1) making a connection between the AC source and the load and (2) isolating the UPS output from AC so that it can be safely repaired or replaced. Additionally, to maintain continuous power to the load, the switch must make the AC source connection first, momentarily allow the UPS, Utility, and Load to be connected all together, and subsequently disconnect the UPS output. In industry language, a make-before-break (MBB) transition is required to avoid “dropping the load”.

Figure 1: Critical Power System with Maintenance Bypass

The two common architectures of Maintenance Bypass, Rotary Switch and Breaker-Breaker, are shown in Figure 2 and Figure 3 respectively.

A Rotary Bypass consists of a rotor with cross connections and wipers for each electrical connection. As the rotor moves, the appropriate connection sequence is made. An internal spring mechanism prevents the switch from staying in an intermediate position, making the operation both simple and predictable. However, reasonable-cost rotary switches are only available at lower current levels (up to 250A) and they don’t have a high short-circuit current rating. Therefore, fuses or breakers are usually placed in series with each connection to act as protection devices.

Figure 2: Rotary Maintenance Bypass

Figure 3: Breaker-Breaker Maintenance Bypass

The Breaker-Breaker Maintenance Bypass design eliminates the need for separate protection simply by using breakers as the switches themselves; however, properly sequencing the breaker operation becomes a challenge. The standard solution is to use mechanical breaker locks (referred to as Kirk Key Interlocks after the major manufacturer), where the sequence is controlled by capture and release of keys. While this allows higher currents by using larger breakers, the disadvantage is that the operating complexity requires more training and can still be confusing for operators, especially when under the stress of trying to recover a load.

Neither the Rotary Switch nor the Breaker-Breaker bypass is actually suitable for arbitrarily switching the load, because connecting the inverter output to the AC source directly could create large transient currents and trip the protection devices. Instead the Inverter output is first switched to the AC source internally, either using a static bypass switch or another internal bypass.

Figure 4: UPS System exposing Internal Bypass

If you think having both an internal and an external bypass is unnecessary and reduces reliability, I would agree; however, historically it has been needed. With an internal bypass the manufacturer of the UPS takes responsibility for the AC source-to-inverter transition, which includes protecting the load and inverter from any transients that might cause service interruption or damage. The external bypass must then only switch between two sources which are in fact the same, and of course it must be external to provide the necessary isolation.

III. COMMON CAUSE EFFECTS

Significant efforts and advancements in UPS technology have led to systems with high Availability and Reliability, and theoretical analyses usually report 1M hours MTBF (114 years). Anecdotal information indicates a much lower practical result. The reason seems to be the under-estimated impact of problems which simultaneously affect multiple, supposedly independent, components. In reliability analysis these are categorized as “common cause” problems and they include catastrophic failures, cascading failures, design flaws, and most of all, human error. The Uptime Institute, a respected consulting firm focused on Datacenter Reliability, has reported on this regularly. In a 2016 Survey of Site Downtime they reported that 79% of electrical system failures were from UPS to load, and 49% of these were caused by humans. In their 2018 Data Center Survey Results they reported that 11% of surveyed datacenters had experienced a power failure outage. In their 2019 Webinar, “Data Center Outage Trends, Causes and Costs,” they listed the 7 main causes:

1. Lightning strikes, leading to surges and lost power. Back-up software/configuration failed.
2. Intermittent failures with transfer switches, leading to failure or transfer to second data center
3. UPS failures and failure to transfer to secondary system
4. Operator errors, turning off/misconfiguring power
5. Utility power loss and subsequent failure of generator or UPS
6. Damage to IT equipment caused by power surges
7. IT equipment not equipped with dual power supplies to secondary feed

Note that four of the leading causes were related to power transfer between, and configuration of, sources; two were related to lightning strikes and power surges; and only one was a lack of designed-in backup.

As a leading supplier of UPS systems to the cable TV market, we already understood the necessity for thorough operator training for using Maintenance Bypass switches, due to both the complexity of operation and the stressful conditions that operators can be under. After doing a Reliability Analysis of the current systems, the opportunity to significantly increase reliability led to a development program and eventually the “Smart Bypass”.

IV. DESIGN SPECIFICATION

The initial criteria set for the “Smart Bypass” design are all focused on reliability. Each criterion is listed with its reason:

- Condition Checking: The bypass must verify conditions are suitable before switching and never drop a load.
- Ease: Operation should be obvious without (or at least with minimal) training.
- Fewer components in Breaker-Breaker bypass architectures make them inherently more reliable, and they are suitable for all power levels.
- A bypass transfer must predictably complete once started, so the required energy must be stored in advance and must not be affected by operator interaction.
- The bypass must be as reliable as existing options and, due to the conservative nature of the market, demonstrably so.
- The cost must be similar to (within 20% of) existing bypass options due to the cost sensitivity of the market.

To meet these requirements a unique design was created, the “Smart Bypass”, as shown in Figure 5. At the core of the design is one breaker mounted right-side up beside another mounted upside down. As a result, when both handles are pushed in one direction, one of the breakers turns on while the other turns off. This allows a single actuator to perform the switching sequence with a single motion. Importantly, the actuating mechanism is a single separate assembly. Thus, when it is not operating, the reliability is equal to that of the breakers themselves, and should any problem arise, the entire assembly can be quickly replaced.

Figure 5: Transfer Mechanism Modular Assembly

In order to implement the Make-before-Break sequence the actuator needed a unique design. Rather than tightly coupling to the breaker handles, the actuator has openings within which the handles are free to move. The geometry is such that when the actuator moves, the edge of one window first contacts and moves one breaker to the “on” position, and only then does the second window contact the second breaker and turn it off. The design of the openings and the speed of the actuator must take into account the geometry of the breaker handle motion, the acceptable forces on the breakers, and the motion from the bi-stable mechanisms of the breakers themselves. A tolerance stack-up analysis as well as modelling and thorough testing confirmed there is enough margin (approximately 5mm) for a very reliable system.

To achieve the desired simple actuation and reliable bi-stable motion, two simple rotating linkages with an interconnected spring are used. One link is the operator handle which, extending through the case, provides an obvious indication of the desired state. The other link is the actuator, which rotates about the same axis as the breaker handles. Because the spring is between the two links, it stably holds the switch in position until the handle is moved beyond the center line of the actuator, and then the handle and actuator are driven in opposite directions to complete the motion.

If all that were desired were to simplify the actuation of a breaker-breaker bypass, the design would be complete at this stage; however, preventing mis-operation requires a way to prevent motion. Locking the handle position was considered but not chosen for two reasons. Firstly, the operator may try to force the handle and damage the switch. Secondly, if conditions were not suitable, the operator would have to keep trying to move the handle until they were. Therefore, the internal actuator was fitted with a latch instead. With this design the operator can move the handle to the desired state, and when conditions are suitable, the switch transfers.

To meet the reliability requirement, one final feature was added. Because the control system represented a source of error, an over-ride feature was needed. A sliding lever was added to each latch, with access through a small hole in the front cover. Although it circumvents the protections provided by the design, this eliminates any possible single point of failure.

Figure 6: 100A Smart Bypass designed into 19" Rack
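The handle, latch, and over-ride behaviour described above can be summarized as a small state model. The following is a minimal sketch only, not the product's control logic: the class and method names (`SmartBypassModel`, `move_handle`, `step`) are hypothetical, and the mechanical spring and make-before-break timing are reduced to a single state change.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BYPASS = "bypass"      # utility breaker on, inverter breaker off
    INVERTER = "inverter"  # inverter breaker on, utility breaker off


@dataclass
class SmartBypassModel:
    """Hypothetical model: the handle records the request, the latch gates the motion."""
    handle: Mode = Mode.BYPASS    # state requested by the operator
    actuator: Mode = Mode.BYPASS  # state the breakers are actually in

    def move_handle(self, requested: Mode) -> None:
        # The handle is never locked; moving it only charges the spring
        # and signals the desired state.
        self.handle = requested

    def step(self, conditions_ok: bool, override: bool = False) -> bool:
        """Release the latch if allowed. Returns True if a transfer occurred."""
        if self.handle == self.actuator:
            return False  # nothing requested
        if conditions_ok or override:
            # Latch released (by the controller, or by the manual sliding
            # lever): the spring drives the actuator to the requested state.
            self.actuator = self.handle
            return True
        return False  # latch holds; the request stays pending


switch = SmartBypassModel()
switch.move_handle(Mode.INVERTER)
pending = switch.step(conditions_ok=False)   # held by the latch
done = switch.step(conditions_ok=True)       # transfers once conditions are suitable
```

The point of the sketch is the design choice argued in the text: the request persists while the latch holds, so the operator does not need to retry, and the over-ride path works regardless of the condition check.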

V. OPERATION & CONTROLLER

After designing the mechanical system to be as reliable as a breaker-breaker bypass, the overall reliability and availability are increased through a fault-tolerant control system. Some of the key features are:

1. Mechanical over-ride features for operation without power or in the unlikely event the controller fails.
2. Redundant controller power supplies sourced separately from the Inverter source, the Utility source and a 48Vdc auxiliary supply.
3. Load Protection hardware which prevents transfer if any phase would experience a loss of power. This circuit is made fail-safe by only providing power to the latch release under the right conditions, and it over-rides all other checks.
4. Phase Comparator circuit which compares the desired and selected sources and prevents transfer unless they are within 1 degree. (Industry standard is a 15Vac difference, which corresponds to 7 degrees for 120Vac. 1 degree achieves a 2Vac difference and provides margin for changing phase.) Phase synchronization is initiated by a request signal from the bypass, manually by the operator at the UPS interface, or by opening the UPS input circuit breaker so that it free runs (known as an “opportunistic” transfer). In all cases the bypass tracks both the phase difference and the stability of the phase difference to make sure the transfer will complete in ideal conditions.
5. The Phase Synchronization check ensures that the angle between the voltages of the corresponding phases of the AC sources is within an acceptable range. Industry standard is that the voltage difference upon transfer should be less than 15Vac, corresponding to approximately 7 degrees for a 120Vac system. The controller monitors the zero crossings of corresponding phases from the Connected and Desired sources, and prevents transfer if the difference is above a set threshold, for example 1 degree. In the preferred embodiment, this check is implemented in hardware circuitry so that the synchronization check will function regardless of microprocessor state.
6. Auto-transfer circuit which detects power failure of the Inverter and releases the switch to Utility mode. (This restores power to the load after a brief interruption, similar to a transfer switch.) It includes a recovery feature for when AC power is lost and the Inverter batteries are depleted. In this case, the auto-transfer will hold off upon return of AC power to allow the Inverter time to initialize.
7. Wiring check which verifies that the phases of the Utility and Inverter are wired correctly (L2 Utility not wired to L3 Inverter, for example).
8. CAN bus-connected microprocessor controller which monitors voltages and currents of each port to provide near-revenue grade power metering capability through a touchscreen graphical interface.
9. Remote activation of the switch through the controller. This would be used if redundancy of the inverter were compromised and the alternate source were deemed more reliable until technicians could be dispatched for repairs.
10. Self-diagnostic/Predictive Maintenance. Every operating cycle of the bypass is recorded in real time and performance is compared to previous cycles. It has been determined that wear-out failures can be readily identified from changes in timing. In addition, the breaker position auxiliary contacts are wired to generate an alarm should both breakers ever be in the same position (on or off).

Starting position with switch in “Bypass” mode: The handle is up and the actuator is down, with the spring holding them stably in these positions. When the actuator is down the Utility breaker is on and the Inverter breaker is off.

Handle moved down to request Inverter Mode: The geometry of the handle and spring now pushes the actuator up; however, the latch on the opposite side (not visible in this view) prevents motion.

Transfer to Inverter Mode: The controller has released the latch and the actuator is travelling upward. Motion is deterministic and controls the make-before-break sequence to ensure a brief but guaranteed overlap.

Final position with switch in “Inverter Mode”: The switch is now in the Inverter position. Again the spring holds the handle and actuator in a stable position, which is also guaranteed by the latch.

Figure 7: Operation Sequence of Smart Bypass Transfer
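The relationship quoted in the phase-check items above (15Vac corresponding to about 7 degrees at 120Vac, and 1 degree to about 2Vac) follows from the difference between two sinusoids. The sketch below is illustrative and assumes two sources of equal RMS amplitude and equal frequency; the function names are hypothetical and are not taken from the controller.

```python
import math


def phase_voltage_difference(v_rms: float, phase_deg: float) -> float:
    """RMS voltage between two equal-amplitude sinusoids phase_deg apart.

    For phasors V and V*exp(j*theta), |V - V*exp(j*theta)| = 2*V*sin(theta/2).
    """
    return 2.0 * v_rms * math.sin(math.radians(phase_deg) / 2.0)


def max_phase_for_difference(v_rms: float, dv_rms: float) -> float:
    """Largest phase angle (degrees) that keeps the difference at or below dv_rms."""
    return math.degrees(2.0 * math.asin(dv_rms / (2.0 * v_rms)))


def transfer_permitted(phase_deg: float, threshold_deg: float = 1.0) -> bool:
    """Phase gate only; the 1 degree default is the example threshold from the text."""
    return abs(phase_deg) <= threshold_deg


print(f"7 deg at 120 Vac   -> {phase_voltage_difference(120.0, 7.0):.1f} Vac")
print(f"1 deg at 120 Vac   -> {phase_voltage_difference(120.0, 1.0):.1f} Vac")
print(f"15 Vac at 120 Vac  -> {max_phase_for_difference(120.0, 15.0):.1f} deg")
```

Under these assumptions a 7 degree offset gives roughly 14.7Vac and a 1 degree offset roughly 2.1Vac, consistent with the rounded figures in the text. A real source pair would also differ in amplitude, which this sketch ignores.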

VI. SMART BYPASS COMPARED TO TRADITIONAL BYPASS

A theoretical component-based analysis of the Smart Bypass in manual operation compared to a traditional bypass will yield very similar results, because it is dominated by the breaker reliability. Because the phase checking feature eliminates the need for the internal bypass, though, and because these elements are both in series with the inverter, the Reliability of the path between the Inverter and the load is improved. Even this ignores the benefit of preventing operator error. If it is accepted that 50% of failures are related to operator error and that operator error is eliminated by the protection features of the design, then combined with the improved inverter path, the failure rate should be reduced by 2/3rds.

In addition to the Reliability improvement, the Availability improves through the reduced time to repair when using auto-transfer (according to calculations, from about 5 x 9’s to more than 7 x 9’s). Typically, manually switching to bypass takes 5 minutes for an on-site technician; however, it can be much longer for off-site technicians, especially for remote sites. For distribution-type equipment like that used in cable, the difference in customer satisfaction can be enormous. A short interruption can be tolerated, but extended issues are quickly escalated through the management hierarchy.

VII. SMART BYPASS COMPARED TO STATIC BYPASS

A common alternative to the internal bypass, especially for larger (non-modular) systems, is the static bypass. A static bypass incorporates semiconductor devices to transfer from inverter to utility within a quarter-cycle of the AC waveform. The key benefit is no interruption to the critical load, which would seem preferable to the “Smart Bypass” one-second transfer delay.

Fundamentally, however, a static bypass serves a different function and has a different implementation than a maintenance bypass. Firstly, a static bypass is intended to increase Reliability of the inverter system by creating a redundant path. To do this, it must be completely in parallel and independent, and must include its own input and output breakers. For monolithic inverters this brings the MTBF from the 150,000 hours range to 1M hours, a huge benefit. However, a similar improvement is not possible for modular inverter systems, which already achieve similar levels of reliability using redundant modules. One modular UPS manufacturer does include a static bypass circuit within each module, but this is thought to be more for transient load surge capability. Compared to a monolithic inverter static bypass, it does not provide a parallel path for the main input and output breakers.

Historically there have also been concerns about the use of a static bypass. For very critical applications UPS systems are always run in double-convert mode, where the load is isolated from the AC source. Connecting the load to the utility to ride through transients is therefore not an ideal strategy. Secondly, there have been cases of a Static Bypass failure cascading to the inverter system, exposing the reality that no two systems connected in parallel are truly independent, regardless of the calculations. It would therefore be recommended for a modular system to increase reliability using more redundancy (for example from N+1 to N+2) rather than incorporate a static switch.

Regardless of the above discussion, any system requires an isolating maintenance bypass for service, and the operator-proof protections of the Smart Bypass would benefit any installation.

VIII. SMART BYPASS COMPARED TO AUTOMATIC TRANSFER SWITCH

An Automatic Transfer Switch is another alternative to the auto-transfer functionality of the Smart Bypass. These are typically used for switching the input source (for example from Generator to Utility) but could also be used for the output. The main advantage of an Automatic Transfer Switch is that it is motor driven and can be remotely controlled to cycle back and forth between two sources. The Smart Bypass in comparison is manually charged and can only perform the Inverter to Utility change once, after which it needs operator involvement. What the Smart Bypass offers in this trade-off is cost and Reliability. At ½ to 1/3rd the cost, and with far fewer components to fail, the incremental cost of a Smart Bypass over a Traditional Maintenance Bypass is simply more pragmatic.

IX. TESTING

Functional Testing

During the development process a sophisticated test platform was developed to monitor the precise timing of the moving parts, including breaker power contacts, breaker auxiliary contacts, actuator position, and handle position. This setup allowed for an exhaustive functional test protocol to identify and resolve potential causes of failure, and to optimize operating margin.

A typical mechanical transfer test is shown in Figure 8 below. The waveforms are: 60Hz AC (red), position transducer for actuator (cyan), actuator utility position (green), actuator inverter position (yellow). The total travel time of 40 milliseconds is typical for the 400A version of the Smart Bypass currently under development. The overlap time is not shown but is approximately 5 milliseconds.

Figure 8: Typical Transfer Waveforms 400A Bypass

The scope capture in Figure 9 shows a bypass to inverter transition. The blue trace is the Load Output (originally connected to Utility) while the cyan trace is the Inverter Output. Note that to make the transition visible, the Inverter voltage was intentionally set differently from the Utility voltage. At point (a) the Inverter (cyan) and Utility (blue) traces merge, indicating that the inverter breaker contacts have first touched, after which it takes approximately 600 microseconds for the contacts to settle as per breaker specifications. When the Utility breaker opens, 12 milliseconds later, there is no visible transition.

Figure 9: Make-before-Break Voltage Waveforms

The same transition is shown in Figure 10, this time with a focus on the current. The load voltage (pink) and current (yellow) remain constant. The Inverter current (cyan) is initially zero and starts when the Inverter breaker closes. The inverter current is irregular while both breakers are closed, as the inverter shares the load with the utility. After about 12 milliseconds the utility breaker opens and the Inverter supplies all of the load. When the utility disconnects, the load current takes about 0.5 milliseconds to settle while the inverter control loop stabilizes.

Figure 10: Make-before-Break Current Waveforms

Cycle Testing

After functional testing was complete, several systems were subjected to cycle testing using a pneumatic actuator. The anticipated required cycle life, assuming operation twice per month (including a switch to and from bypass each cycle) for 20 years, is 500 cycles. The threshold for a critical transfer switch, however, is 10,000 cycles, so this target was selected. Five (5) systems of both 100A and 250A capacity were assembled and tested, and all terminated due to the same breaker failure mode. The cycle life ranged from 8,800 to 16,000 cycles, giving a 3 Sigma confidence interval of 980 cycles.

While this far exceeds the real anticipated use, due to the critical nature of this component along with risk aversion in the industry, another test program with a larger number of units is underway.

A second goal of the cycle test was to develop predictive maintenance capability. For this, three gradual failure modes were detected: (1) the breaker reactive force decreases gradually as it nears end-of-life and suddenly upon failure, (2) the actuator friction decreases initially then increases very slowly, and (3) the actuator friction is much higher if improperly assembled. The controller monitors the actuator transit time (speed) and can predict/detect these failures.

Beta Testing

The technology is now being beta (field) tested:

- A 250A Smart Bypass system has been in use at the Alpha Burnaby location in Vancouver, Canada as part of the IT Backup system for 5 months.
- 250A and 100A Smart Bypass systems have been installed in the EnerSys demo center in Suwanee, Georgia for more than 8 months.
- A 100A Smart Bypass has been shipped to an external customer test facility for evaluation.

Figure 11: AMPS HP2 Load at EnerSys Burnaby Facility

A number of integrated systems, including AMPS HP2 UPS systems, DC disconnects, 100A Smart Bypass, and rectifier system in a self-contained 19” box bay, are under development, while in parallel a rigorous test program continues in our EnerSys Burnaby lab.
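The predictive maintenance idea described under Cycle Testing, comparing the actuator transit time of each cycle with earlier cycles, can be illustrated with a simple baseline check. This is a hedged sketch rather than the controller's algorithm: the function name, the 15% tolerance, and the minimum history length are assumptions chosen for illustration; only the 40 millisecond typical travel time comes from the text.

```python
from statistics import mean


def transit_time_alarm(history_ms, latest_ms, tolerance=0.15, min_history=5):
    """Flag a cycle whose transit time drifts from the recorded baseline.

    history_ms  : transit times of previous cycles, in milliseconds
    latest_ms   : transit time of the cycle just completed
    tolerance   : allowed fractional deviation from the baseline (assumed value)
    min_history : cycles required before a baseline is trusted (assumed value)
    """
    if len(history_ms) < min_history:
        return False  # not enough data to judge yet
    baseline = mean(history_ms)
    return abs(latest_ms - baseline) / baseline > tolerance


previous = [40.0, 41.0, 39.0, 40.0, 40.0]      # around the 40 ms typical travel time
normal = transit_time_alarm(previous, 41.0)    # small deviation, no alarm
slow = transit_time_alarm(previous, 50.0)      # e.g. increased friction, alarm
```

Both directions are checked because the failure modes listed in the text change the timing in different ways: decreasing breaker reactive force and changing actuator friction would not necessarily shift the transit time in the same direction.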

X. FUTURE

While the technology is still being proven as a direct replacement for the existing Maintenance Bypass, other opportunities are being explored. One example is an Automatic Transfer Switch for low-cost applications where manual reset would be acceptable.

XI. CONCLUSION

In this paper a pragmatic review of bypass systems has been presented and the Smart Bypass technology has been introduced. In order to safely repair any UPS system, it must be isolated from the utility source and load, which inherently makes the maintenance bypass a single point of failure. When reconfiguring power, whether for maintenance or, especially, to restore a dropped load, there is significant stress on the technician. Though theoretically easy, executing the right sequence under these conditions has historically proven to be prone to error. The proposed technology takes a minimalist approach by adding error checking without decreasing the reliability of the underlying architecture. Although it requires a manual operation to recharge the actuating spring after each use, the simple switch mechanism is low cost and reliable. Furthermore, by integrating independent phase checking, the normally required internal bypass can be eliminated, thus creating a total system that is both more reliable and lower in cost.

An advanced feature of the Smart Bypass is the semi-Automatic Transfer. Compared to a Static Bypass, it does not increase the system Reliability because there is a momentary interruption to the load. However, for remote sites, or any installation where a technician is not readily available, the automatic transfer can significantly improve Availability. Preliminary feedback from the market is that the most important benefit is reducing the stress on the repair technician, resulting in avoidance of the rapid call escalation that occurs when sites remain down.

Figure 12: 100A Smart Bypass, Armed for Auto Transfer with Handle Lockout Installed (wall mounted)

XII. CAVEATS ABOUT MTBF

In preparing this document I have fielded many questions about system Reliability and Availability. The biggest confusion is about MTBF, so I am including a few notes for the interested reader.

The common assumption is that MTBF is the expected time a product will last. There are three major problems with this assumption. Firstly, the MTBF only applies to stress-related failures and is only significant during the “Useful Life” of a product, which is after defective component failure and before wear-out. Something can have a very low failure rate yet a relatively short useful life, for example a cell phone on a single battery charge.

[Figure: Failure Rate versus Time, with the Useful Life region marked]

The second challenge with MTBF is that it only applies to a population of devices. In the population, as devices fail there are fewer remaining, so the population follows an exponential curve, and by the time 1 MTBF period elapses 63% of the product has failed.

[Figure: exponential survival curve, annotated: 10% failed at 0.1 MTBF; 50% failed at 0.7 MTBF; only 37% still working at 1.0 MTBF]
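The percentages quoted for the population curve can be checked with the standard constant-failure-rate model, in which the fraction failed by time t is 1 - exp(-t/MTBF). The short sketch below assumes that model, which applies only within the useful-life period described above; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def fraction_failed(t_over_mtbf: float) -> float:
    """Fraction of a population failed after t, with t expressed in MTBF units.

    Assumes a constant failure rate (exponential distribution).
    """
    return 1.0 - math.exp(-t_over_mtbf)


def mtbf_hours_to_years(mtbf_hours: float) -> float:
    """Express an MTBF given in hours as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


for t in (0.1, 0.7, 1.0):
    print(f"t = {t:.1f} MTBF: {fraction_failed(t):.1%} failed, "
          f"{1.0 - fraction_failed(t):.1%} still working")

print(f"1M hours MTBF = {mtbf_hours_to_years(1_000_000):.0f} years")
```

Under this model about 9.5% of units have failed at 0.1 MTBF, 50.3% at 0.7 MTBF, and 63.2% at 1.0 MTBF, which matches the rounded values in the figure. The 114 year figure for a 1M hour MTBF is therefore a statement about failure rate, not about how long any one unit will last.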

The third problem with MTBF is that it is rarely an accurate measured value. Often it is a calculated estimate based on standard parts, but this assumes that design quality is constant. Theory is no substitute for testing or experience in this case.

REFERENCES

[1] Uptime Institute Research. (Producer). (2019). Data Center Outage Trends, Causes and Costs [Video webinar]. Retrieved ata-center-outagetrends-causes-and-updates.
[2] Tom Gruzs, Emerson Network Power. Telecom & IT Power Telecom & IT Power [PowerPoint slides]. Retrieved from pdf.
[3] Uptime Institute Research. (Producer). (2018). 2018 Data Center Surv
