Improving Software Quality In SAP HANA - Undo

Transcription

Improving software quality in SAP HANA with Live Recorder from Undo

SAP is the market leader in enterprise application software. It helps 437,000 businesses across 180 countries manage their business operations and customer relations.

Its flagship product is SAP HANA - a scalable, heavily multi-threaded, feature-rich in-memory database built from millions of lines of highly-optimized Linux C code. SAP HANA forms the foundation of SAP’s technology stack and its product portfolio. It is the backbone of major businesses worldwide - making quality, stability and reliability a core requirement for the engineering team.

A comprehensive approach to testing

SAP HANA invests a considerable amount of resources into ensuring software quality and reliability. The SAP HANA engineering team uses continuous integration and fuzz-testing as part of its routine QA process.

Fuzz-testing is a technique in which randomized test behaviors are presented to the system under test, making it possible to catch corner case defects that were not anticipated by the system’s designers. Combined with internal and external consistency checks, this approach provides a means to discover errors that would not be revealed by more traditional testing approaches.
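The idea of fuzz-testing combined with a consistency check can be illustrated with a minimal sketch. Everything in it is hypothetical and unrelated to SAP HANA's actual test suite: `BuggyStore` is a toy key/value store with a deliberately planted corner-case bug, and a plain dictionary serves as the reference model that the store is checked against after every random operation.

```python
import random


class BuggyStore:
    """Hypothetical system under test: a key/value store with a planted bug."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        # Planted corner-case bug: deleting the empty key is silently ignored.
        if key == "":
            return
        self._data.pop(key, None)

    def get(self, key):
        return self._data.get(key)


def fuzz(seed, steps=200):
    """Apply random operations to the store and to a reference model.

    Returns the index of the first step at which the two disagree,
    or None if no inconsistency was found.
    """
    rng = random.Random(seed)  # seeded, so a failing run can be repeated
    store, model = BuggyStore(), {}
    keys = ["", "a", "b", "c"]
    for step in range(steps):
        op = rng.choice(["put", "delete", "get"])
        key = rng.choice(keys)
        if op == "put":
            value = rng.randint(0, 9)
            store.put(key, value)
            model[key] = value
        elif op == "delete":
            store.delete(key)
            model.pop(key, None)
        # Consistency check: the store must agree with the model on every key.
        for k in keys:
            if store.get(k) != model.get(k):
                return step
    return None


# Try a handful of seeds and collect the ones that expose an inconsistency.
failing_seeds = [seed for seed in range(20) if fuzz(seed) is not None]
```

In this toy the random generator is seeded, so a failing seed reproduces the failure exactly. In a heavily multi-threaded system, thread scheduling and other outside inputs are not controlled by a seed, which is why such failures are much harder to repeat.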

[Figure: test run status cards for Sept 03, Sept 07, Sept 14 and Sept 18, showing a mix of "success" and "failed, unknown cause" results]

The challenge

However, the resulting test failures proved challenging to diagnose, due to a set of factors that are familiar to modern software vendors:

- Complex control flow: difficult to make inferences about how a failure unfolded
- Huge code base: collaboration across team is essential to pinpoint a bug
- Non-deterministic failures: difficult to reproduce reliably in order to investigate the root cause

The non-deterministic nature of many of SAP HANA’s test failures means these failures could not be reliably reproduced on a developer’s machine for debugging.

Traditional software defect resolution methods were not satisfactory

Before approaching Undo, SAP HANA developers investigated test failures using three primary methods:

- Analyzing logs from failed runs: Logs helped to produce a partial picture as to why a failure happened, but often did not capture enough of the right information for the root cause to be easily identifiable.
- Reproducing failures on live systems: For complex problems that needed to be debugged within a running system, a developer had to reproduce the original failure on a live system - which for rare faults was a time-consuming and unproductive use of resources.
- Developer collaboration: When the above methods did not help, a group of developers with specialized knowledge would work together to figure out the source of the problem. But developers could not reliably reproduce test failures on more than one machine, so developers often did not see the same program behavior.

Solution: software flight recording technology

SAP HANA identified Live Recorder from Undo as a solution to make test failure results actionable by “closing the loop” between the defect manifesting itself and the root cause being understood. The aim was to reduce time to resolution of software defects and resolve the most challenging defects which could not be diagnosed any other way.

Live Recorder was used to record failed processes in test and capture failures ‘in the act’ - providing engineers with a standalone, reproducible test case in the form of a recording artifact.

Recording files were then loaded up in UndoDB (Live Recorder’s reverse debugging capability), and engineers used UndoDB to replay the recording and analyze execution history by inspecting the program state at any point in time.

The SAP HANA team was able to quickly home in on the root cause of defects by navigating to the point of interest using the full functionality expected of modern debuggers (such as scripting, conditional breakpoints and watchpoints, full inspection of globals and locals, etc.) in both forwards and reverse execution.
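The value of stepping backwards through execution history can be shown with a toy sketch. It is only an illustration of the idea and does not use or resemble UndoDB's interface: the function names, the `balance` example and the invariant are all hypothetical. A program's state is snapshotted after every step, and the search starts at the point of failure and walks in reverse until it reaches the last state that was still valid.

```python
def record_states(deltas):
    """Apply each delta to a running balance and snapshot it after every step."""
    balance = 0
    states = []
    for delta in deltas:
        balance += delta
        states.append(balance)
    return states


def reverse_to_last_good(states, failure_step, invariant):
    """Walk backwards from failure_step to the most recent state that
    still satisfies the invariant. Returns -1 if no earlier state does."""
    for step in range(failure_step, -1, -1):
        if invariant(states[step]):
            return step
    return -1


# Hypothetical run: the balance must never go negative.
deltas = [5, 3, -10, 1, -2]
states = record_states(deltas)

failure_step = len(states) - 1  # the failure is noticed at the end of the run
last_good = reverse_to_last_good(states, failure_step, lambda b: b >= 0)
culprit_step = last_good + 1    # the step right after the last valid state
culprit_delta = deltas[culprit_step]
```

Here the failure is only noticed at the final step, but the reverse search points at the earlier step that first broke the invariant, which is the kind of question a reverse watchpoint answers in a real debugger.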

1. Record test runs
2. Share recordings with development
3. Analyze recording down to instruction level

[Figure: code listing shown in record mode]

Live Recorder captures all non-deterministic data (down to instruction level) and recreates the process’ entire memory and register state - on demand and with minimal overhead.

Recordings can then be shared among engineers and analyzed on a different machine to the one on which the error occurred.
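The record, share and replay steps above can be sketched in a few lines. This is a conceptual toy under stated assumptions, not a description of how Live Recorder is implemented: the `Recorder` and `Replayer` classes and the `flaky_sum` program are hypothetical, and the only non-deterministic input here is an unseeded random number source. The point it illustrates is that once every non-deterministic input has been logged, the same execution can be recreated anywhere from the log alone.

```python
import json
import random


def flaky_sum(next_value, n=5):
    """Hypothetical program whose result depends on values from outside."""
    total = 0
    for _ in range(n):
        total += next_value()
    return total


class Recorder:
    """Wraps a non-deterministic source and logs every value it returns."""

    def __init__(self, source):
        self._source = source
        self.log = []

    def __call__(self):
        value = self._source()
        self.log.append(value)
        return value


class Replayer:
    """Feeds back previously logged values instead of calling the source."""

    def __init__(self, log):
        self._values = iter(log)

    def __call__(self):
        return next(self._values)


# 1. Record: run the program against a truly non-deterministic source.
rng = random.Random()  # unseeded on purpose, so each run differs
recorder = Recorder(lambda: rng.randint(0, 100))
original_result = flaky_sum(recorder)

# 2. Share: the log is plain data, so it can be serialized and sent elsewhere.
shared = json.dumps(recorder.log)

# 3. Replay: another machine recreates the same run from the log alone.
replayed_result = flaky_sum(Replayer(json.loads(shared)))
```

The replayed run produces the same result as the original because it consumes the same inputs in the same order, regardless of which machine it runs on.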

Debugging with Live Recorder

Live Recorder can be activated with one simple command - making it easy to use Undo’s technology with minimum changes to the SAP HANA team’s existing workflow. Live Recorder generates recordings of every test failure, helping engineers find and fix defects as the software is being written.

Trying to reproduce intermittent defects using traditional methods can take days, sometimes weeks or more, and often leads to dead ends. Instead, Live Recorder eliminates the guesswork in software defect diagnosis by capturing bugs in the act - turning sporadic failures into 100% reproducible test cases. The SAP HANA engineering team is able to get total visibility into what their program did before it failed and why it failed. It is allowing the team to significantly accelerate software defect resolution, while improving stability and code quality.

Failures no longer need to be replicated on the machine on which they originally occurred: by sharing recordings, engineers can analyze an identical copy of the original failure, while collaborating on a fix. With hundreds of developers working on the SAP HANA database across multiple countries, the SAP HANA engineering team can overcome language, communication and time-zone barriers when fixing software defects - further enhancing the team’s responsiveness to issues that appear in testing and speeding up the development cycle.

Outcomes

Live Recorder from Undo has helped SAP HANA accelerate software defect resolution by eliminating the guesswork in software failure diagnosis.

In addition to this, SAP HANA engineers managed to capture and fix 7 challenging high-priority bugs, including:

- a number of sporadic memory leaks and memory corruption defects
- incorrect flushing of a receive buffer
- incorrect parallel access to shared data-structure
- a race condition in the transaction management cache

SAP HANA is committed to delivering a reliable data management system their customers can trust; and its adoption of software flight recording technology allows SAP HANA to deliver their latest innovation to customers faster.

“With Live Recorder, we were able to dramatically cut down the analysis time that is required to understand the root cause of very complex software defects.”

Dr. Alexander Böhm
Chief Development Architect, SAP HANA

Learn more on https://undo.io
