Application Performance Management

Transcription

Institute of Software Technology, Reliable Software Systems
Application Performance Management: State of the Art and Challenges for the Future
André van Hoorn, Mario Mann, Dušan Okanović, Christoph Heger
Tutorial @ 8th ACM/SPEC International Conference on Performance Engineering, April 22, 2017, L'Aquila, Italy

Who Are These Guys?
- André van Hoorn, University of Stuttgart
- Mario Mann, NovaTec Consulting
- Dušan Okanović, University of Stuttgart
- Christoph Heger, NovaTec Consulting

Performance Problems are Omnipresent
- “An unexpected error occurred. Try again later.”
- “Temporarily not available”
- “We are experiencing heavy demand. Please visit us again later; more capacity is on the way.”
Application Performance Management: State of the Art and Challenges for the Future, 4/22/2017

Influence of Poor Performance on the Success of Businesses

“Application performance management (APM), as a core IT operations discipline, aims to achieve an adequate level of performance during operations. To achieve this, APM comprises methods, techniques, and tools for continuously monitoring the state of an application system and its usage, as well as for detecting, diagnosing, and resolving performance-related problems using the monitored data.”
C. Heger, A. van Hoorn, D. Okanović, M. Mann: Application performance management: State of the art and challenges for the future. In: Proc. 8th ACM/SPEC ICPE, ACM (2017)

Agenda
- Introduction to APM (45 min)
- APM Tools (45 + 15 min)
- APM Advanced (60 min)
- Discussion (15 min)

André van Hoorn and Stefan Siegl. Application Performance Management (APM). Continuous Monitoring of Application Performance. (Poster in ObjektSpektrum magazine; in German)
Order for free: http://www.sigs-datacom.de/wissen/fachposter.html

1. Collecting Data from All System Levels
- Agents collect data from all system levels
- On the application level, the agents are often technology-dependent
- Where? What? How?

Trace-based Metrics (Selection)
What?
- Response Time
- CPU Time
- Method Name
- Return Type
- Logging Level
- SQL Statement
- Error Message
Okanović, D., van Hoorn, A., Heger, C., Wert, A., Siegl, S.: Towards performance tooling interoperability: An open format for representing execution traces. In: Proc. EPEW ’16. LNCS, Springer (2016)

Monitoring (Measurement-based Performance ...)
[Figure: monitored servers with an example measurement of 4.9 s, feeding a Data Recorder, Data Analysis, and Visualization]

Measurement Approaches and Techniques (Examples)
How?
- Event-driven measurement
  - Event: a change in system state, e.g., incoming requests, access to HDD, operation execution, throwing exceptions
  - Calculate performance metrics whenever an event occurs
  - Simplest metric: counter
- Tracing (also event-driven)
  - Recording a certain part of the system state when an event occurs
  - Example: code adjustments
    - printf: easy, but with questionable maintainability
    - Bytecode engineering
    - Aspect-oriented Programming (AOP)
- Sampling
  - Measurement (state or counter) is performed in specified time intervals
- Through overhead and increased resource consumption, measurements influence the system!
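
The event-driven approach can be illustrated with a minimal sketch. It assumes a hand-written probe (the class `OperationProbe` and its methods are hypothetical and not taken from any APM tool) that updates a counter and an accumulated response time whenever an operation execution occurs.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper for illustration only, not part of any APM tool.
class OperationProbe {
    private final AtomicLong invocations = new AtomicLong(); // simplest metric: counter
    private final AtomicLong totalNanos = new AtomicLong();  // accumulated response time

    // Event: an operation execution. Metrics are updated whenever it occurs.
    <T> T measure(Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            invocations.incrementAndGet();
        }
    }

    long count() {
        return invocations.get();
    }

    // Mean response time in milliseconds; 0 if nothing has been measured yet.
    double meanMillis() {
        long n = invocations.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1e6 / n;
    }

    public static void main(String[] args) {
        OperationProbe probe = new OperationProbe();
        for (int i = 0; i < 3; i++) {
            probe.measure(() -> Math.sqrt(42.0));
        }
        System.out.println(probe.count() + " calls, mean " + probe.meanMillis() + " ms");
    }
}
```

The two `System.nanoTime()` calls and the atomic updates run on every event, which is a small instance of the measurement overhead mentioned on the slide.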

2. Reconstructing Information from Data
- Data is collected from the system,
- represented as time series and as detailed execution traces, and
- used to support problem analysis
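
As a sketch of how raw measurements might be turned into a time series, the following example groups timestamped values into fixed-width buckets and averages each bucket. The class name, the bucket width, and the choice of the mean as aggregate are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
class TimeSeriesBuilder {
    // Groups raw measurements into fixed-width time buckets and averages each bucket.
    // values[i] was observed at timestampsMillis[i]; the map key is the bucket start time.
    static TreeMap<Long, Double> meanPerBucket(long[] timestampsMillis, double[] values, long bucketMillis) {
        TreeMap<Long, double[]> sums = new TreeMap<>(); // bucket start -> {sum, count}
        for (int i = 0; i < timestampsMillis.length; i++) {
            long bucket = (timestampsMillis[i] / bucketMillis) * bucketMillis;
            double[] acc = sums.computeIfAbsent(bucket, k -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        TreeMap<Long, Double> series = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : sums.entrySet()) {
            series.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return series;
    }

    public static void main(String[] args) {
        long[] t = {0, 500, 1000, 1500, 2500};
        double[] v = {10, 20, 30, 50, 70};
        System.out.println(meanPerBucket(t, v, 1000));
    }
}
```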

Traces in an APM Tool

3. Visualization Through Navigable Views
- High quantity of information has to be pre-processed
- It has proven useful to use different views to show the data
- Views are navigable and can be categorized by both scope and detail level

Example: Application Topology Discovery and Visualization (AppDynamics)

4. Interpreting and Using the Information
Manual or automated conclusions and actions can be derived from the information, e.g.,
- Problem detection and alerting
  - E.g., increased response times and resource utilization
  - Detection, for instance, based on thresholds and baselines
- Problem diagnosis and root cause isolation
  - E.g., N+1 problem, too many remote calls, poor DB queries
  - Detection based on monitoring information
- System refactoring and adaptation
  - E.g., auto-scaling in cloud-based architectures
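
Detection based on thresholds and baselines can be sketched as follows. This is a minimal illustration under stated assumptions: the class name is hypothetical, and the baseline rule (mean plus k standard deviations of historical values) is one common choice, not necessarily what any particular APM tool implements.

```java
// Hypothetical helper for illustration only.
class ResponseTimeDetector {
    // Static threshold: alert if the value exceeds a fixed limit.
    static boolean exceedsThreshold(double valueMillis, double thresholdMillis) {
        return valueMillis > thresholdMillis;
    }

    // Baseline: alert if the value lies more than k standard deviations above
    // the mean of historical observations (population standard deviation).
    static boolean deviatesFromBaseline(double valueMillis, double[] history, double k) {
        double mean = 0.0;
        for (double h : history) {
            mean += h;
        }
        mean /= history.length;
        double variance = 0.0;
        for (double h : history) {
            variance += (h - mean) * (h - mean);
        }
        double stdDev = Math.sqrt(variance / history.length);
        return valueMillis > mean + k * stdDev;
    }

    public static void main(String[] args) {
        double[] history = {100, 110, 90, 105, 95};
        System.out.println("threshold alert: " + exceedsThreshold(130, 500));
        System.out.println("baseline alert:  " + deviatesFromBaseline(130, history, 3.0));
    }
}
```

With the invented numbers in `main`, a loosely chosen static threshold of 500 ms stays silent, while the baseline check flags 130 ms as unusual for a service that normally answers in about 100 ms.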

Dynamic Software Analysis and Application Performance Management
http://kieker-monitoring.net

Part 2 – APM Tools

Contents
- Commercial APM tools
  - Magic quadrant
  - Timeline of APM tools
  - Goals of APM tools
  - Architecture
  - EUM, Server, Database Monitoring
  - How APM tools can help you to identify problems
- Open Source APM tools

Commercial APM Tools

Magic Quadrant
Source: Gartner (2016)

Timeline of Tools
[Figure: timeline of commercial tools and open source tools up to 2017; surviving labels: Wily – CA 08, Kieker, Cisco]

Goals of APM Tools
- End-User Perspective
- End-to-End Monitoring
- Smart Analytics
- Lifecycle by Design
@Dynatrace

Architecture – Dynatrace
[Figure: architecture diagram; labels: Web Server / Node.js / NGINX / PHP; Java; .NET; Mainframe, Native; Agent/PurePath Collector Group; Monitoring Collector; Dynatrace Backend Server; Performance Warehouse Database; Dynatrace Client; Dynatrace Frontend Server; Web UI Dashboards; Dynatrace Sessions]
@Dynatrace

Sensors – How Do They Work?
- Every sensor (custom and OOTB) instruments Java/.NET methods
- Code gets added to each method that matches the sensor rule to
  - measure execution time
  - capture method arguments and return values
  - count method invocations
  - capture exceptions
[Figure: uninstrumented application vs. instrumented application, showing the execution time of the diagnosis code]
@Dynatrace
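
The effect of such added code can be approximated in plain Java. The sketch below uses a `java.lang.reflect.Proxy` as a stand-in for a sensor; real agents add code through bytecode instrumentation, which this example does not do. The interface `PriceService` and the class `SensorHandler` are hypothetical names introduced for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical service interface, used only for this illustration.
interface PriceService {
    int priceOf(String item);
}

// Stand-in for a sensor: wraps every interface call with measurement code.
class SensorHandler implements InvocationHandler {
    private final Object target;
    final List<String> records = new ArrayList<>();
    int invocations = 0;

    SensorHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        invocations++; // count method invocations
        long start = System.nanoTime();
        try {
            Object result = method.invoke(target, args);
            // capture method arguments and return value
            records.add(method.getName() + Arrays.toString(args) + " -> " + result);
            return result;
        } catch (InvocationTargetException e) {
            records.add(method.getName() + " threw " + e.getCause()); // capture exceptions
            throw e.getCause();
        } finally {
            // measure execution time (includes the overhead of this wrapper)
            records.add(method.getName() + " took " + (System.nanoTime() - start) / 1000 + " us");
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T instrument(Class<T> type, SensorHandler handler) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        SensorHandler sensor = new SensorHandler((PriceService) item -> item.length() * 10);
        PriceService service = instrument(PriceService.class, sensor);
        service.priceOf("book");
        sensor.records.forEach(System.out::println);
    }
}
```

The measured time includes the reflective call and the bookkeeping in the handler, which corresponds to the "execution time of diagnosis code" shown in the figure.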

Architecture – ...
[Figure: architecture diagram; source image: ...nload/attachments/34271888/dbmondplussingleapp appd architecture.png]

Sensors – How Do They Work?
[Figure: timeline with four intervals of 10 ms each]

Application Overview – Dynatrace
- Show the total Tier time with Process, Host, and Transaction
- UEM Visits grouped by Channel and Application
- Applications and Infrastructure Overview
- Analysis dashboards
- Deeper analysis
@Dynatrace

Application Overview – AppDynamics
[Figure: screenshot; surviving label: Tiers]

Real User Monitoring – AppDynamics
[Figure: screenshot; surviving labels: End User Monitoring, Application Performance ...]

Real User Monitoring – Dynatrace
Source: 9465d2cea98e055f606e.webp

End User Monitoring – ...

Server Monitoring – AppDynamics
- Availability
- CPU Usage
- Memory Usage
- Network
- Processes
- Volumes

Database Monitoring – AppDynamics

Alerting – AppDynamics
@AppDynamics

Dashboarding
@Dynatrace

How can APM tools help to identify problems?
- User Experience
- Business Transaction
- Potential Issues
@AppDynamics

How can APM tools help to identify problems?
- Top findings to optimize performance
- Filter
@Dynatrace

Challenges
- Share configurations between stages
- Identify business transactions and map them to the functional level

Open Source APM tools

Weaknesses of Commercial APM Tools
- Licensing costs
  - Typical model: x per agent; scaling?
  - Microservices [Figure: http://blog.wso2.com]
  - Mobile Revolution [Figure: https://uxmag.com]
  - Internet of Things [Figure: Christian Hinkelmann, http://nahverkehrhamburg.de]
- Vendor Lock-In
  - Aspects: Flexibility, Interoperability, Sustainability
  - Concerns: adaptation to own needs, sources, stop of development, tools for analysis, bug fixing, other APM tools, change of strategy of the APM vendor

Open Source APM tools
- Monitoring & Application Deep Dive
- Performance Modeling
- Real User Monitoring
- Load Testing
- System & Resources Monitoring
- Low-Level Performance Profiling: JRat, JMemProf
- Web Performance Analysis

What is inspectIT?
- Open-Source APM solution
  - Development since 2005
  - Open Source since 2015
- Platform principle
  - Open Source
  - Integration with tools
  - Extensibility
- Pareto principle
  - Focus on main functionality
  - Experience of APM projects

Gathering and Visualizing Timeseries Data
- Data Collectors, Custom Code
- Persist data: Time Series Databases
- Query & Visualize: Graphing Tools

Dashboards

Dashboards

The Flaw of Averages
- Gil Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
- The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. Sam L. Savage, with illustrations by Jeff Danziger: http://flawofaverages.com
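
The slide only cites the talk and the book. As a small illustration of the general point that an average can hide outliers, the following sketch compares the mean with nearest-rank percentiles. The class name and the sample data are invented for this example.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
class LatencySummary {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Nearest-rank percentile: the smallest observation such that at least
    // p percent of all observations are less than or equal to it (0 < p <= 100).
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented data: nine fast requests and one very slow request, in ms.
        double[] latencies = {10, 10, 10, 10, 10, 10, 10, 10, 10, 1000};
        System.out.println("mean = " + mean(latencies));
        System.out.println("p50  = " + percentile(latencies, 50));
        System.out.println("p99  = " + percentile(latencies, 99));
    }
}
```

For this data the mean is 109 ms, a value that no request actually experienced: the median is 10 ms and the 99th percentile is 1000 ms.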

Dashboards

Alerting

Anomaly Detection & Alerting

What is Web Performance Analysis about?
- End User Experience? Satisfied, Tolerating, Frustrated
- Performance: Performance issues, Functional Errors, Network (Local ISPs, Mobile network carriers), Third party content providers
- User Actions: First user action, Last user action, Did the visit convert? Did the visit bounce?
- Users: Device, Resolution, Browser Versions, Geolocation
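
The categories Satisfied, Tolerating, and Frustrated match the terminology of the Apdex score. The slide does not name Apdex, so treating it as the intended metric is an assumption. Under that assumption, a minimal sketch of the standard Apdex formula looks as follows; the class name and sample data are invented.

```java
// Hypothetical helper for illustration only.
class Apdex {
    // Apdex = (satisfied + tolerating / 2) / total, where a sample counts as
    // satisfied if time <= t, tolerating if t < time <= 4t, and frustrated otherwise.
    static double score(double[] responseTimesSeconds, double t) {
        int satisfied = 0;
        int tolerating = 0;
        for (double rt : responseTimesSeconds) {
            if (rt <= t) {
                satisfied++;
            } else if (rt <= 4 * t) {
                tolerating++;
            }
        }
        return (satisfied + tolerating / 2.0) / responseTimesSeconds.length;
    }

    public static void main(String[] args) {
        double[] times = {0.2, 0.4, 0.5, 1.0, 1.9, 2.5, 0.3, 0.1, 3.0, 0.45};
        System.out.println("Apdex(T = 0.5 s) = " + score(times, 0.5));
    }
}
```

With a target time of 0.5 s, the sample contains six satisfied, two tolerating, and two frustrated requests, giving a score of (6 + 2/2) / 10 = 0.7.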

Measuring End-User Experience

Testing and Monitoring Web Performance
- Real User Testing
  - External testing with all major browsers, operating systems, mobile devices, and real world data
- Synthetic Monitoring
  - Continuous external testing
  - Known testing nodes
  - Different Locations (ISP, Networks)
  - No baseline traffic required
  - Competitor Benchmark
  - Availability testing (incl. 3rd Party), e.g., every 30 mins from different nodes
- Real User Monitoring
  - Real End Users
  - Real Interaction
  - Real traffic
- Labels below the columns: QoS Optimization, QoS Validation, QoS Expectation, Anomaly

Server Monitoring
Source: Screen-Shot-2016-08-29-at-16.11.23.png

Benefits of Open Source Tools
- No licensing costs
- Growing community
- Pick the tools which work for your needs and combine them

Challenges
From the perspectives of industry and research

Part 3 – APM Advanced

Problem Diagnosis
- Detect problem
- Interpretation of measurements
- Component Deep-Dive

Problem: Too Many Traces to Analyze
Lots of data to analyze

diagnoseIT Overview
http://diagnoseit.github.io/
[Figure: architecture diagram; labels: SUT with Node 1 to Node n (JVM, App 1, App 2, Instrumentation); Monitoring Tool; Dynamic Instrumentation; API; Labeling; Traces; Instrumentation Refinement Request; Location Identification; Result; Result Query; Quality Manager]
C. Heger, A. van Hoorn, D. Okanović, S. Siegl, and A. Wert. Expert-guided automatic diagnosis of performance problems in enterprise applications. In Proc. EDCC ’16. IEEE, 2016.

Agenda
- APM interoperability
- Anti-patterns and trace-based detection
- Mobile
- Overhead control
- Results aggregation
- Business integration

APM Interoperability
[Figure: conversion between tool-specific formats; surviving labels: .dat, .itds, .map, convert]

APM ...
[Figure: surviving labels: convert, OPEN.xtrace]

Data Export in APM ...

Comparison table
- Tools compared (columns): ...cs, Dynatrace, New Relic
- Metrics compared (rows): Response Time, CPU Time, Method Name, Return Type, Logging Level, SQL Statement, Error Message
D. Okanović, A. van Hoorn, C. Heger, A. Wert, and S. Siegl. Towards performance tooling interoperability: An open format for representing execution traces. In Proc. EPEW ’16, pages 94–108, 2016.

OPEN.xtrace Traces
- An execution trace is a data structure that captures the control flow of method executions for a request served by the system (Ammons et al.)
- Callables
  - Method executions
  - DB calls
  - Remote calls
  - Logging
  - Errors
- SubTrace
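
To make the idea of a trace as a tree of callables concrete, here is a simplified sketch. The class `TraceCallable`, its fields, and its methods are hypothetical and do not reproduce the actual OPEN.xtrace API; only the kinds of callables are taken from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not the actual OPEN.xtrace API.
class TraceCallable {
    enum Kind { METHOD_EXECUTION, DB_CALL, REMOTE_CALL, LOGGING, ERROR }

    final Kind kind;
    final String label;          // e.g., method name or SQL statement
    final long durationMicros;   // response time including nested callables
    final List<TraceCallable> children = new ArrayList<>();

    TraceCallable(Kind kind, String label, long durationMicros) {
        this.kind = kind;
        this.label = label;
        this.durationMicros = durationMicros;
    }

    // Adds a nested callable and returns this node to allow chaining.
    TraceCallable add(TraceCallable child) {
        children.add(child);
        return this;
    }

    // Depth-first traversal that follows the recorded control flow.
    void collect(Kind wanted, List<TraceCallable> out) {
        if (kind == wanted) {
            out.add(this);
        }
        for (TraceCallable child : children) {
            child.collect(wanted, out);
        }
    }

    // Time spent in this callable itself, excluding its direct children.
    long exclusiveMicros() {
        long childTime = 0;
        for (TraceCallable child : children) {
            childTime += child.durationMicros;
        }
        return durationMicros - childTime;
    }

    public static void main(String[] args) {
        TraceCallable load = new TraceCallable(Kind.METHOD_EXECUTION, "loadCart", 3000)
                .add(new TraceCallable(Kind.DB_CALL, "SELECT * FROM cart", 2500));
        TraceCallable root = new TraceCallable(Kind.METHOD_EXECUTION, "handleRequest", 5000)
                .add(load)
                .add(new TraceCallable(Kind.REMOTE_CALL, "GET /prices", 1000));
        System.out.println("exclusive time of root: " + root.exclusiveMicros() + " us");
    }
}
```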

OPEN.xtrace
- What is available:
  - Trace model based on the available data in these tools
  - Adapters to convert the data between OPEN.xtrace and tools
  - Serialization
- Planned: OPEN.timeseries
- More on OPEN.xtrace
  - D. Okanović, A. van Hoorn, C. Heger, A. Wert, and S. Siegl. Towards performance tooling interoperability: An open format for representing execution traces. In Proc. EPEW ’16, pages 94–108, 2016.
  - https://research.spec.org/apm-interoperability/
  - https://github.com/spec-rgdevops

OpenTracing
- Track what is going on in complex, heterogeneous systems
- Capture important events during the processing of a client request
- Avoid vendor lock-in
- Wide industry support (http://opentracing.io)

diagnoseIT Overview
[Figure: architecture diagram; labels: SUT with Node 1 to Node n (JVM, App 1, App 2, Instrumentation); Monitoring Tool; Dynamic Instrumentation; API; Labeling; Traces; Instrumentation Refinement Request; Location Identification; Result; Result Query; Quality Manager]
C. Heger, A. van Hoorn, D. Okanović, S. Siegl, and A. Wert. Expert-guided automatic diagnosis of performance problems in enterprise applications. In Proc. EDCC ’16. IEEE, 2016.

(Performance) Antipatterns
Antipatterns are conceptually similar to patterns, but they describe recurrent solutions to design problems that may have a negative effect on different software quality attributes.
- Performance Antipatterns
Koenig, A. (1998). “Patterns and Antipatterns”. In: The Patterns Handbooks. Ed. by L. Rising. New York, NY, USA: Cambridge University Press

Example 1: One-Lane Bridge [Smith and Williams, 2001]
- Problem: “[...] a point in the execution where one, or only a few, processes may continue to execute concurrently. All other processes must wait. [...]”
- Cause: “It frequently occurs in applications that access a database. Here, a lock ensures that only one process may update the associated portion of the database at a time. It may also occur when a set of processes make a synchronous call to another process that is not multi-threaded; all of the processes making synchronous calls must take turns ‘crossing the bridge.’”
- Solution: “Shared Resources principle[:] responsiveness improves when we minimize the scheduling time plus the holding time. Holding time is reduced by reducing the service time for the One-Lane Bridge, and by rerouting the work.”

Example 1: One-Lane Bridge – Causes
- Synchronization in source code:
  public synchronized void buyItems(Collection<Item> items) { ... }
- Thread and connection pools in application servers:
  <Connector ... maxThreads="300" acceptCount="150" ... />
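
A synchronized method holds its lock for the entire method body. The following sketch shows one way to reduce the holding time, in the spirit of the Shared Resources principle: work that does not touch shared state is moved out of the critical section. The class `OrderService`, its methods, and the use of `String` items are hypothetical simplifications, not code from the tutorial.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical example class for illustration only.
class OrderService {
    private final Object stockLock = new Object();
    private int stock;

    OrderService(int initialStock) {
        this.stock = initialStock;
    }

    // One-Lane Bridge: validation and stock update both run while holding the lock.
    boolean buyItemsCoarse(Collection<String> items) {
        synchronized (stockLock) {
            if (!isValid(items)) {   // does not need the lock
                return false;
            }
            return reserve(items.size());
        }
    }

    // Shorter holding time: only the update of shared state is inside the lock.
    boolean buyItemsNarrow(Collection<String> items) {
        if (!isValid(items)) {       // done before entering the bridge
            return false;
        }
        synchronized (stockLock) {
            return reserve(items.size());
        }
    }

    // Placeholder for potentially expensive checks that touch no shared state.
    private static boolean isValid(Collection<String> items) {
        return items != null && !items.isEmpty();
    }

    // Must only be called while holding stockLock.
    private boolean reserve(int quantity) {
        if (quantity > stock) {
            return false;
        }
        stock -= quantity;
        return true;
    }

    int remainingStock() {
        synchronized (stockLock) {
            return stock;
        }
    }

    public static void main(String[] args) {
        OrderService service = new OrderService(5);
        System.out.println(service.buyItemsNarrow(Arrays.asList("a", "b")));
        System.out.println(service.remainingStock());
    }
}
```

Both variants produce the same results; they differ only in how long each caller blocks the others.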

Example 1: One-Lane Bridge – Symptoms
[Figure: two plots over the number of users (1 to 50): response time [ms] (axis 0 to 40) and CPU utilization [%] (axis 0 to 100)]

Example 2: N+1 Problem

Performance Antipatterns – Classifications
- Parsons, T. (2007): “Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems”. PhD thesis.
- Wert, A. (2015). Performance Problem Diagnostics by Systematic Experimentation. PhD thesis.

Analysis automation

List of Currently Detectable Antipatterns
- N+1 Query Problem
- Circuitous Treasure Hunt
- The Stifle
- The Ramp
- Traffic Jam
- More is Less
- Application Hiccups
- Garbage Collection Hiccups
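
As an illustration of trace-based detection, the sketch below flags a possible N+1 Query Problem when the same SQL statement, apart from literal numbers, is repeated many times within one trace. This is a simple heuristic assumed for this example; the slides do not describe the actual detection rules, and the class name and threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heuristic for illustration only.
class NPlusOneDetector {
    // Returns the statements that occur at least minRepetitions times within one trace.
    // Literal numbers are replaced by '?' so that queries differing only in an id are grouped.
    static List<String> suspiciousStatements(List<String> sqlStatementsOfTrace, int minRepetitions) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : sqlStatementsOfTrace) {
            String normalized = sql.replaceAll("\\b\\d+\\b", "?").trim();
            counts.merge(normalized, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minRepetitions) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
                "SELECT * FROM orders WHERE customer_id = 7",
                "SELECT * FROM items WHERE order_id = 1",
                "SELECT * FROM items WHERE order_id = 2",
                "SELECT * FROM items WHERE order_id = 3");
        System.out.println(suspiciousStatements(trace, 3));
    }
}
```

A real detector would also need to consider where in the call tree the repeated statements occur; this sketch only looks at the flat list of statements.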

Detecting Antipatterns in Mobile ...
[Figure: surviving labels: ...acing spans, Buffer, OPEN.xtrace, GUI]

Antipatterns in Mobile Environment
- Anti-patterns
  - Too many remote calls
  - Too many remote calls to the same URL
  - Too many remote calls to the same server
  - Hard-disk utilization too high
  - Hard-disk usage increases too fast (ramp)
  - RAM utilization too high
  - RAM usag
