Transcription
Application Performance Analysis
Mike Canney
Welcome to Sharkfest ’12
Mike Canney, Principal Network Analyst, Tektivity, s.com
Agenda
• So why focus on the application?
• Creating a CDA (Capture to Disk Appliance)
• Using Pilot for “back in time” troubleshooting with your CDA and Wireshark
• Application QA Lifecycle
• Top Causes for Application Performance issues
  – Application Turns
  – TCP
  – Layer 7 Issues
  – TCP Retransmissions
• Using Wireshark to create custom profiles to troubleshoot CIFS/SMB
So why focus on the Application?
• In many cases it is the Network Engineers who have the tool set to help pinpoint where the problem exists.
• “It’s not the Network!” - The Network is guilty until proven innocent.
• Application performance issues can impact your business’s or your customers’ ability to make money.
• User response time is “relative”.
• Intermittent performance issues are a moving target.
The “moving target”
• Analyzer placement: two options
  – Move the analyzers as needed
  – Capture anywhere and everywhere
• To defend the Network, capturing the problem at multiple points is the best solution.
Commercial vs. Free Capture
• Define your capture strategy
  – Data rates
  – What are my goals? Troubleshooting vs. statistical information.
  – Do I need to capture every packet?
Capture to Disk Appliance (on a budget)
• What is needed?
  – dumpcap, a command line utility included with the Wireshark download, to enable ring buffer captures.
  – An inexpensive PC or laptop (best to have 2 NICs or more).
  – A basic batch file to initiate capture.
  – Cascade Pilot (optional but recommended)
Dumpcap Example
    cd \program files (x86)\wireshark
    dumpcap -i 1 -s 128 -b files:100 -b filesize:2000000 -w c:\traces\internet\headersonly1.pcap
This is a basic batch file that captures from interface 1, slices each packet to 128 bytes, and writes a ring buffer of 100 trace files of about 2 Gigabytes each (dumpcap takes the filesize value in kilobytes), using c:\traces\internet\headersonly1.pcap as the base file name.
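Before committing to a ring buffer size, it can help to estimate how much disk it needs and how far "back in time" it reaches. The sketch below is only a rough planning aid: the function name and the 50 Mbit/s written-to-disk rate are hypothetical, and it assumes 1 kB = 1000 bytes for the filesize value.

```python
def ring_buffer_plan(files: int, filesize_kb: int, avg_rate_mbps: float) -> dict:
    """Estimate disk use and retention for a dumpcap ring buffer.

    avg_rate_mbps is the assumed average rate of data written to disk
    (after packet slicing), not the raw wire rate.
    """
    total_bytes = files * filesize_kb * 1000          # assumes 1 kB = 1000 bytes
    bytes_per_second = avg_rate_mbps * 1_000_000 / 8  # Mbit/s -> bytes/s
    retention_hours = total_bytes / bytes_per_second / 3600
    return {"total_gb": total_bytes / 1e9, "retention_hours": retention_hours}


# Values from the batch file above, with a hypothetical 50 Mbit/s write rate.
plan = ring_buffer_plan(files=100, filesize_kb=2_000_000, avg_rate_mbps=50)
print(f"{plan['total_gb']:.0f} GB on disk, about {plan['retention_hours']:.1f} hours of history")
```

With these assumed numbers the buffer needs about 200 GB and holds just under nine hours before the oldest file is overwritten.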
So why did I write multiple 2 Gig trace files?
• Pilot!
• Pilot can easily read HUGE trace files.
• This allows us to utilize our CDA in ways no other analyzer can.
• I personally have sliced and diced 50 GB trace files in Pilot in a matter of seconds.
So how does this all work together?
• A directory full of 2 GB trace files, all timestamped based on when they were written to disk.
• A user calls in and complains that “the network” is slow.
• Locate the trace file based on time and date and launch Pilot.
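Locating the right trace file can be scripted. The sketch below assumes the ring buffer files carry a sequence number and a start timestamp in their names (prefix_NNNNN_YYYYMMDDHHMMSS.pcap); the file names and the find_trace helper are hypothetical, so adjust the pattern to whatever is actually in your capture directory.

```python
import re
from datetime import datetime

# Assumed ring-buffer naming: <prefix>_<NNNNN>_<YYYYMMDDHHMMSS>.pcap
NAME_RE = re.compile(r"_(\d{5})_(\d{14})\.pcap$")


def find_trace(filenames, when):
    """Return the file with the latest start time at or before `when`, or None."""
    candidates = []
    for name in filenames:
        match = NAME_RE.search(name)
        if match:
            started = datetime.strptime(match.group(2), "%Y%m%d%H%M%S")
            if started <= when:
                candidates.append((started, name))
    return max(candidates)[1] if candidates else None


# Hypothetical directory listing.
files = [
    "headersonly1_00001_20120625090000.pcap",
    "headersonly1_00002_20120625093412.pcap",
    "headersonly1_00003_20120625101530.pcap",
]

# The user reports the slowdown at 09:45.
hit = find_trace(files, datetime(2012, 6, 25, 9, 45))
print(hit)
```

The file returned is the one that was being written at the reported time, which is the one to open in Pilot.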
Instructor Demo: Troubleshooting a user “Network Issue”
Think about the possibilities
• From a multi-GB trace file we were able to:
  – Look at the total Network throughput.
  – See what applications were consuming the bandwidth.
  – Identify the user that was responsible for consuming the bandwidth.
  – Identify the URIs the user was hitting and what the response times were.
  – Drill down to the packets involved in the slow web response time in Wireshark.
• All in a matter of a few seconds.
Why are there so many application issues?
• Applications are typically developed in a “golden” environment
  – Fastest PCs
  – High bandwidth/low latency
• When applications move from test (LAN) to production (WAN), the phone starts ringing with complaints.
The Application QA Lifecycle
• In most organizations, applications go through a QA process
• Typical QA/App developers test the following:
  – Functional tests
  – Regression tests
  – Stress tests (server)
  – Rinse and repeat
• What is often missing is “Networkability” testing
• All QA Lifecycles should include Networkability testing
Application Networkability Testing
• Identify key business transactions, number of users and network conditions the application will be deployed in.
• Simulation vs. Emulation
  – Simulation is very quick and often gives you rough numbers of how an application will perform over different network conditions.
  – Emulation is the only way to determine when an application will “fail” under those conditions.
• A combination of both is recommended.
Top Causes for Poor Application Performance
• Application Turns
• TCP
• Layer 7 Bottlenecks
• Congestion (network)
• Processing Delay
Causes for Slow Application Performance: Application Turns
Application Turns
• An Application Turn is a request/response pair
• For each “turn” the application must wait the full round trip delay.
• The greater the number of turns, the worse the application will perform over a WAN (latency bound).
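To make the definition concrete, here is a minimal sketch of counting turns in a single conversation. It is an illustration rather than how any particular analyzer does it: the packet list, the "c2s"/"s2c" direction labels and the function name are all made up, and a turn is counted each time server payload follows client payload.

```python
def count_app_turns(packets):
    """Count request/response pairs in one conversation.

    packets: sequence of (direction, payload_len) tuples, where direction
    is "c2s" (client to server) or "s2c" (server to client).
    Packets with no payload (pure ACKs) are ignored.
    """
    turns = 0
    awaiting_response = False
    for direction, payload_len in packets:
        if payload_len == 0:
            continue
        if direction == "c2s":
            awaiting_response = True           # a request is outstanding
        elif direction == "s2c" and awaiting_response:
            turns += 1                         # first response data closes the turn
            awaiting_response = False
    return turns


# Hypothetical conversation: two requests, each answered by the server.
conversation = [
    ("c2s", 120), ("s2c", 0), ("s2c", 1460), ("s2c", 800),
    ("c2s", 0), ("c2s", 95), ("s2c", 300),
]
print(count_app_turns(conversation))  # 2
```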
[Figure: an App Turn, from Begin to End]
Example in Wireshark
[Screenshot: display filter applied in Wireshark] 882 Application Turns in this trace
App Turns and Latency
• It is fairly easy to determine the impact of App Turns on end user response time
  – Multiply the number of App Turns by the round trip delay: 10,000 turns * 0.050 s (50 ms) delay = 500 seconds due to latency
• Note, this has nothing to do with Bandwidth or the Size of the WAN Circuit
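The same arithmetic as a small helper, useful for trying different turn counts and round trip delays. The function name is hypothetical, and the 50 ms delay applied to the 882-turn trace is an assumed value for illustration only.

```python
def latency_delay_seconds(app_turns: int, rtt_ms: float) -> float:
    """Time spent waiting on round trips: turns multiplied by round trip delay."""
    return app_turns * rtt_ms / 1000.0


print(latency_delay_seconds(10_000, 50))  # 500.0 seconds, the slide's example
print(latency_delay_seconds(882, 50))     # the 882-turn trace at an assumed 50 ms
```

Note that neither bandwidth nor circuit size appears anywhere in the calculation.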
So what causes all these App Turns?
• Size of a fetch in a database call
• Number of files that are being accessed
• Loading single images in a web page instead of using an image map
• Number of bytes being retrieved and how they are being retrieved (block size)
Causes for Slow Application Performance: TCP
TCP Window Size
• The TCP Window Size defines the host’s receive buffer.
• Large Window Sizes can sometimes help overcome the impact of latency.
• Depending on how the application was written, the advertised TCP Window Size may not have an impact at all (more on this later).
TCP Inflight Data
• The amount of unacknowledged TCP data that is on the wire at any given time.
• TCP inflight data is limited by the following:
  – TCP Retransmissions
  – TCP Window Size
  – Application block size
• The amount of TCP inflight data will never exceed the receiving device’s advertised TCP Window Size.
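As a rough illustration of what "inflight" means, the sketch below tracks unacknowledged bytes for one direction of a connection from a list of send and ACK events. The event format and function name are hypothetical, and sequence number wraparound and SACK are ignored to keep it short.

```python
def bytes_in_flight(events, initial_seq=0):
    """Return the unacknowledged byte count after each event.

    events: list of ("send", seq, length) or ("ack", ack_number) tuples.
    """
    next_seq = initial_seq   # one past the highest byte sent so far
    acked = initial_seq      # highest cumulative ACK received so far
    samples = []
    for event in events:
        if event[0] == "send":
            _, seq, length = event
            # A retransmission resends old bytes, so it adds no new inflight data.
            next_seq = max(next_seq, seq + length)
        else:
            _, ack = event
            acked = max(acked, ack)
        samples.append(next_seq - acked)
    return samples


# Hypothetical exchange: three full segments, an ACK, one more segment, a final ACK.
events = [
    ("send", 0, 1460), ("send", 1460, 1460), ("send", 2920, 1460),
    ("ack", 2920), ("send", 4380, 1460), ("ack", 5840),
]
print(bytes_in_flight(events))  # [1460, 2920, 4380, 1460, 2920, 0]
```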
TCP Inflight Data in Wireshark
[Screenshots] The Bytes in Flight column shows us how much payload is in each packet.
[Figure] TCP inflight data graphed in Excel
Easier way for SMB/CIFS
TCP Retransmissions
• Every time a TCP segment is sent, a retransmission timer is started.
• When the Acknowledgement for that segment is received, the timer is stopped.
• If the retransmission timer expires before the Acknowledgement is received, the TCP segment is retransmitted.
TCP Retransmissions
• Excessive TCP Retransmissions can have a huge impact on application performance.
• Not only does the data have to get resent, but TCP congestion control (Slow Start) kicks into action.
Application Performance: Layer 7 Bottlenecks
ULPs (upper layer protocols)
• TCP often gets blamed for the ULP’s problem.
  – The application hands down to TCP the amount of data to go retrieve (application block size)
  – TCP is then responsible for reliably getting that data back to the application layer
• TCP has certain parameters within which to work and can usually be tuned based on bandwidth and latency
• Many times too much focus is put on “tuning” TCP as the fix for poor performance in the network
• If the TCP advertised receive window is set to 64K and the application is only handing down to TCP requests for 16K, where is the bottleneck?
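One way to reason about the 64K versus 16K question is a simple ceiling: if the application waits for each block before asking for the next, at most one block (or one window, whichever is smaller) moves per round trip. The sketch below assumes exactly that behavior plus a hypothetical 50 ms round trip; the function name is made up.

```python
def throughput_ceiling_bps(window_bytes: int, app_block_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when one block is requested per round trip."""
    in_flight_limit = min(window_bytes, app_block_bytes)  # the smaller value is the bottleneck
    return in_flight_limit * 8 / (rtt_ms / 1000.0)


WINDOW = 64 * 1024  # advertised TCP receive window from the slide

for block in (16 * 1024, 64 * 1024):
    mbps = throughput_ceiling_bps(WINDOW, block, rtt_ms=50) / 1e6
    print(f"{block // 1024}K application block: about {mbps:.2f} Mbit/s")
```

Under these assumptions the 16K block caps the transfer at roughly a quarter of what the 64K window would allow, so enlarging the TCP window further changes nothing.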
ULPs (upper layer protocols)
Case in point: CIFS/SMB
Troubleshooting CIFS/SMB
• Arguably the most common file transfer method used in businesses today.
• SMB was NOT developed with the WAN in mind.
• One of the most “chatty” protocols/applications I run into (with the exception of poorly written SQL).
CIFS/SMB Quiz
• What is faster using MS File Sharing?
  – Pushing a file to a file server?
  – Pulling a file from a file server?
CIFS/SMB
• What is faster using MS File Sharing?
  – Pushing a file to a file server?
  – Pulling a file from a file server?
• SMB Write (pushing the file) can be almost 2X as fast as pulling (SMB Read)
• Depends on the latency
CIFS/SMB Tuning
• SMB Maximum Transmit Buffer Size
  – Negotiated MaxBufferSize in the Negotiate Protocol response
  – Default for Windows servers is typically 16644 (dependent upon physical memory)
  – Client default is typically 4356
CIFS/SMB Tuning
CIFS/SMB Tuning
• Caveat:
  – SMB is extremely dependent upon the API
• Even though you set the max buffer size to 64K, Windows “share” data will always get truncated to 60K (61440), even though the server can support 64K
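To get a feel for why the buffer size matters over a WAN, the sketch below estimates how many request/response turns a file transfer needs at each of the buffer sizes mentioned in these slides. It is a deliberate simplification: it assumes one outstanding request at a time, ignores SMB header overhead, and uses a hypothetical 10 MB file and 50 ms round trip.

```python
def smb_transfer_estimate(file_bytes: int, buffer_bytes: int, rtt_ms: float):
    """Return (turns, seconds of latency) for a one-request-at-a-time transfer."""
    turns = -(-file_bytes // buffer_bytes)   # integer ceiling division
    return turns, turns * rtt_ms / 1000.0


FILE_SIZE = 10 * 1024 * 1024  # hypothetical 10 MB file

for buf in (4356, 16644, 61440):  # client default, server default, 60K share limit
    turns, secs = smb_transfer_estimate(FILE_SIZE, buf, rtt_ms=50)
    print(f"buffer {buf:>5}: {turns:>4} turns, about {secs:.1f} s waiting on latency")
```

Under these assumptions the smallest buffer needs over two thousand turns for the same file that the 60K buffer moves in under two hundred.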
CIFS/SMB Tuning
• Custom SMB APIs
  – The Windows limitation can be exceeded by programs written to use SMB as their file transfer protocol
CIFS/SMB Tuning
[Screenshot] Note the SMB writes of 65,536. This is a file transfer using a custom API on a Windows XP machine.
CIFS/SMB Tuning (Preallocation)
Preallocation sets the file info for SMB Writes and can drastically reduce some of the “chattiness” of SMB.
Instructor Demo of SMB Profiles
Demo of SMB trace files
My personal SMB Profile
Take Away Points
• Building your own CDA is easy to do and may fit in a majority of the areas you need to capture from
• Pilot, Pilot, Pilot: it’s not just a fancy reporting engine for Wireshark!
• Test your application’s “Networkability” before it hits production.
• Use the Wireshark Profiles; they will save you a ton of time.
Mike Canney, Principal Network Analyst