Mike Canney - Wireshark

Transcription

Mike Canney
Application Performance Analysis

Welcome to Sharkfest ‘12
Mike Canney, Principal Network Analyst, Tektivity, s.com

Agenda
- So why focus on the application?
- Creating a CDA (Capture to Disk Appliance)
- Using Pilot for “back in time” troubleshooting with your CDA and Wireshark
- Application QA Lifecycle
- Top Causes for Application Performance issues
  – Application Turns
  – TCP
  – Layer 7 Issues
  – TCP Retransmissions
- Using Wireshark to create custom profiles to troubleshoot CIFS/SMB

So why focus on the Application?
- In many cases it is the Network Engineers that have the tool set to help pinpoint where the problem exists.
- “It’s not the Network!” The Network is guilty until proven innocent.
- Application performance issues can impact your business's or customers' ability to make money.
- User response time is “relative”.
- Intermittent performance issues (moving target).

The “moving target”
- Analyzer placement: two options
  – Move the analyzers as needed
  – Capture anywhere and everywhere
- To defend the Network, capturing the problem at multiple points is the best solution.

Commercial vs. Free Capture
- Define your capture strategy
  – Data rates
  – What are my goals? Troubleshooting vs. statistical information.
  – Do I need to capture every packet?

Capture to Disk Appliance (on a budget)
- What is needed?
  – dumpcap is a command line utility included with the Wireshark download to enable ring buffer captures.
  – Use an inexpensive PC or laptop (best to have 2 NICs or more).
  – Basic batch file to initiate capture.
  – Cascade Pilot (optional but recommended)

Dumpcap Example
    cd \program files (x86)\wireshark
    dumpcap -i 1 -s 128 -b files:100 -b filesize:2000000 -w c:\traces\internet\headersonly1.pcap
This is a basic batch file that will capture off of interface 1, slice the packets to 128 bytes, keep a ring buffer of 100 trace files of 2 gigabytes each, and write the trace files out in pcap format.
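The ring buffer settings in the batch file also decide how much disk the CDA needs. A minimal Python sketch of that arithmetic, assuming dumpcap counts `filesize` in kilobytes of 1000 bytes (some releases may count 1024). `build_dumpcap_args` and `ring_buffer_bytes` are hypothetical helper names for illustration, not part of Wireshark:

```python
def build_dumpcap_args(interface, snaplen, files, filesize_kb, out_path):
    """Return the dumpcap argument list used in the batch file above."""
    return [
        "dumpcap",
        "-i", str(interface),             # capture interface index
        "-s", str(snaplen),               # slice each packet to this many bytes
        "-b", f"files:{files}",           # ring buffer: keep this many files
        "-b", f"filesize:{filesize_kb}",  # roll to the next file at this size
        "-w", out_path,                   # base name of the trace files
    ]


def ring_buffer_bytes(files, filesize_kb, bytes_per_kb=1000):
    """Upper bound on disk used once the ring buffer is full.

    bytes_per_kb=1000 is an assumption; pass 1024 if your dumpcap counts that way.
    """
    return files * filesize_kb * bytes_per_kb


args = build_dumpcap_args(1, 128, 100, 2_000_000, r"c:\traces\internet\headersonly1.pcap")
print(" ".join(args))
print(ring_buffer_bytes(100, 2_000_000) / 1e9, "GB of disk for a full ring buffer")
```

With the values from the slide, plan for roughly 200 GB of free disk before starting the capture.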

So why did I write multiple 2 Gig trace files?
- Pilot!
- Pilot can easily read HUGE trace files.
- This allows us to utilize our CDA in ways no other analyzer can.
- I personally have sliced and diced 50 GB trace files in Pilot in a matter of seconds.

So how does this all work together?
- Directory full of 2 GB trace files, all time stamped based on when they were written to disk.
- User calls in and complains that “the network” is slow.
- Locate that trace file based on time and date and launch Pilot.
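Locating the right trace file can be scripted. A minimal Python sketch, assuming the ring buffer files are named `<base>_<5-digit sequence>_<YYYYMMDDHHMMSS>.pcap` (check the names your dumpcap version actually writes). `find_trace` and the sample file names are hypothetical:

```python
import re
from datetime import datetime

# Assumed name layout: <base>_<5-digit sequence>_<YYYYMMDDHHMMSS>.pcap
NAME_RE = re.compile(r"_(\d{5})_(\d{14})\.pcap$")


def find_trace(filenames, complaint_time):
    """Return the file whose capture started most recently before complaint_time."""
    best_name, best_start = None, None
    for name in filenames:
        match = NAME_RE.search(name)
        if not match:
            continue  # not a ring buffer file
        start = datetime.strptime(match.group(2), "%Y%m%d%H%M%S")
        if start <= complaint_time and (best_start is None or start > best_start):
            best_name, best_start = name, start
    return best_name


# Made-up sample names for illustration only.
files = [
    "headersonly1_00001_20120625090000.pcap",
    "headersonly1_00002_20120625101500.pcap",
    "headersonly1_00003_20120625113000.pcap",
]
print(find_trace(files, datetime(2012, 6, 25, 10, 40)))
```

If the names on your system carry no timestamp, the file modification time is a reasonable fallback for the same lookup.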

Instructor Demo: Troubleshooting a user “Network Issue”

Think about the possibilities
- From a multi-GB trace file we were able to:
  – Look at the total Network throughput.
  – See what applications were consuming the bandwidth.
  – Identify the user that was responsible for consuming the bandwidth.
  – Identify the URIs the user was hitting and what the response times were.
  – Drill down to the packets involved in the slow web response time in Wireshark.
- All in a matter of a few seconds.

Why are there so many application issues?
- Applications are typically developed in a “golden” environment
  – Fastest PCs
  – High bandwidth/low latency
- When applications move from test (LAN) to production (WAN) the phone starts ringing with complaints coming in.

The Application QA Lifecycle
- In most organizations, applications go through a QA process
- Typical QA/App developers test the following:
  – Functional tests
  – Regression tests
  – Stress tests (server)
  – Rinse and Repeat
- What is often missing is “Networkability” testing
- All QA Lifecycles should include Networkability testing

Application Networkability Testing
- Identify key business transactions, number of users and network conditions the application will be deployed in.
- Simulation vs. Emulation
  – Simulation is very quick and often gives you rough numbers of how an application will perform over different network conditions.
  – Emulation is the only way to determine when an application will “fail” under those conditions.
- A combination of both is recommended.

Top Causes for Poor Application Performance
- Application Turns
- TCP
- Layer 7 Bottlenecks
- Congestion (network)
- Processing Delay

Causes for Slow Application Performance: Application Turns

Application Turns
- An Application Turn is a request/response pair
- For each “turn” the application must wait the full round trip delay.
- The greater the number of turns, the worse the application will perform over a WAN (latency bound).
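To make the definition concrete, here is a minimal Python sketch that counts request/response pairs in a single conversation. It is a simplification for illustration, not the display filter used in the talk and not how Wireshark itself counts anything; `count_app_turns` and the sample packets are hypothetical:

```python
def count_app_turns(packets, client):
    """Count request/response pairs in one conversation.

    packets: iterable of (source, payload_bytes) in capture order.
    A turn is counted when the server first answers with data after
    the client has sent data.
    """
    turns = 0
    awaiting_response = False
    for source, payload_bytes in packets:
        if payload_bytes == 0:
            continue  # pure ACKs carry no application data
        if source == client:
            awaiting_response = True
        elif awaiting_response:
            turns += 1
            awaiting_response = False
    return turns


# Made-up conversation: two requests, each answered by the server.
sample = [
    ("client", 120), ("server", 0), ("server", 1460), ("server", 800),
    ("client", 0), ("client", 130), ("server", 512),
]
print(count_app_turns(sample, "client"))  # 2
```

Note that a multi-packet response still counts as one turn, which is why block size matters so much later on.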

App Turn [screenshot: one App Turn marked from Begin to End]

Example in Wireshark
[Screenshot: display filter applied] 882 Application Turns in this trace

App Turns and Latency
- It is fairly easy to determine the impact of App Turns on end user response time
  – Multiply the number of App Turns by the round trip delay:
  – 10,000 turns * 0.050 seconds (50 ms) delay = 500 seconds due to latency
- Note, this has nothing to do with Bandwidth or the Size of the WAN Circuit
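The same multiplication as a small Python sketch, with the round trip given in milliseconds (50 ms is 0.050 seconds). `latency_delay_seconds` is a hypothetical helper, and pairing the 882-turn trace from the earlier slide with a 50 ms round trip is an assumption for illustration:

```python
def latency_delay_seconds(app_turns, round_trip_ms):
    """Time spent waiting on round trips, independent of bandwidth."""
    return app_turns * round_trip_ms / 1000


print(latency_delay_seconds(10_000, 50))  # 500.0 seconds
print(latency_delay_seconds(882, 50))     # the 882-turn trace at an assumed 50 ms
```

Doubling the circuit size changes neither number; only fewer turns or a shorter round trip does.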

So what causes all these App Turns?
- Size of a fetch in a database call
- Number of files that are being accessed
- Loading single images in a Web Page instead of using an image map
- Number of bytes being retrieved and how they are being retrieved (block size)

Causes for Slow Application Performance: TCP

TCP Window Size
- The TCP Window Size defines the host’s receive buffer.
- Large Window Sizes can sometimes help overcome the impact of latency.
- Depending on how the application was written, advertised TCP Window Size may not have an impact at all (more on this later).
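Why a larger window helps with latency: a sender that can have at most one window outstanding per round trip is capped at window divided by round trip time. A minimal Python sketch of that ceiling; the window sizes and the 50 ms round trip are illustrative assumptions, and `window_limited_throughput_bps` is a hypothetical helper:

```python
def window_limited_throughput_bps(window_bytes, round_trip_ms):
    """Ceiling on throughput when one window is sent per round trip."""
    return window_bytes * 8 * 1000 / round_trip_ms


for window in (16_384, 65_535, 262_144):
    mbps = window_limited_throughput_bps(window, 50) / 1e6
    print(f"{window:>7} byte window, 50 ms RTT: {mbps:.2f} Mbit/s")
```

This is only an upper bound; loss, congestion control, and the application's own behavior can keep the real figure well below it.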

TCP Inflight Data
- The amount of unacknowledged TCP data that is on the wire at any given time.
- TCP inflight data is limited by the following:
  – TCP Retransmissions
  – TCP Window Size
  – Application block size
- The amount of TCP inflight data will never exceed the receiving device's advertised TCP Window Size.
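Inflight data can be tracked by hand from sequence and acknowledgement numbers. A minimal Python sketch, similar in spirit to a bytes-in-flight calculation but not Wireshark's implementation; it ignores retransmissions, SACK, and sequence wraparound, and the event format and `bytes_in_flight` name are hypothetical:

```python
def bytes_in_flight(events, initial_seq=1):
    """Return unacknowledged bytes after each event.

    events: ("send", seq, length) for a data segment from the sender, or
            ("ack", ack_number) for an acknowledgement from the receiver.
    """
    next_seq = initial_seq  # one past the highest byte sent so far
    acked = initial_seq     # highest acknowledgement number received
    result = []
    for event in events:
        if event[0] == "send":
            _, seq, length = event
            next_seq = max(next_seq, seq + length)
        else:
            acked = max(acked, event[1])
        result.append(next_seq - acked)
    return result


events = [
    ("send", 1, 1460), ("send", 1461, 1460), ("send", 2921, 1460),
    ("ack", 2921), ("send", 4381, 1460), ("ack", 5841),
]
print(bytes_in_flight(events))  # [1460, 2920, 4380, 1460, 2920, 0]
```

Comparing the peak of this series against the advertised window shows whether the window or something else (such as the application block size) is the limit.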

TCP Inflight Data in Wireshark
[Screenshots] The Bytes in Flight column shows us how much payload is in each packet.
[Chart] TCP inflight data graphed in Excel

Easier way for SMB/CIFS

TCP Retransmissions
- Every time a TCP segment is sent, a retransmission timer is started.
- When the Acknowledgement for that segment is received the timer is stopped.
- If the retransmission timer expires before the Acknowledgement is received, the TCP segment is retransmitted.
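The timer logic above can be sketched in a few lines of Python. This follows the per-segment description on the slide; real TCP stacks compute the timeout adaptively from measured round trips and typically manage it differently, so the fixed `rto` value, the data layout, and the `segments_to_retransmit` name are all assumptions for illustration:

```python
def segments_to_retransmit(sent, acked, now, rto):
    """Return sequence numbers whose timer expired without an ACK.

    sent:  {sequence_number: time_sent_in_seconds}
    acked: set of sequence numbers already acknowledged
    rto:   retransmission timeout in seconds (fixed here for simplicity)
    """
    return sorted(
        seq for seq, time_sent in sent.items()
        if seq not in acked and now - time_sent >= rto
    )


sent = {1: 0.00, 1461: 0.01, 2921: 0.02}
acked = {1}
# At t=1.015 s only the segment sent at t=0.01 s has waited a full second.
print(segments_to_retransmit(sent, acked, now=1.015, rto=1.0))  # [1461]
```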

TCP Retransmissions
- Excessive TCP Retransmissions can have a huge impact on application performance.
- Not only does the data have to get resent, but TCP flow control (Slow Start) kicks into action.
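A rough feel for the Slow Start penalty, as a minimal Python sketch: assume the congestion window restarts at one segment after a timeout and doubles every round trip. This ignores the slow start threshold and congestion avoidance, so treat the result as a lower bound; `round_trips_to_recover` is a hypothetical helper:

```python
def round_trips_to_recover(target_segments, initial_cwnd=1):
    """Round trips for slow start to double cwnd back up to target_segments."""
    cwnd, round_trips = initial_cwnd, 0
    while cwnd < target_segments:
        cwnd *= 2  # slow start roughly doubles cwnd each round trip
        round_trips += 1
    return round_trips


rtts = round_trips_to_recover(44)  # about 64 KB of 1460-byte segments
print(rtts, "round trips,", rtts * 50, "ms at an assumed 50 ms RTT")
```

On a WAN, every retransmission timeout therefore costs several round trips of reduced throughput on top of the timeout itself.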

Application Performance: Layer 7 Bottlenecks

ULPs (upper layer protocols)
- TCP often gets blamed for the ULP's problem.
  – The application hands down to TCP the amount of data to go retrieve (application block size)
  – TCP then is responsible for reliably getting that data back to the application layer
- TCP has certain parameters to work with and can usually be tuned based on bandwidth and latency
- Many times too much focus is put on “tuning” TCP as the fix for poor performance in the network
- If the TCP advertised receive window is set to 64K and the application is only handing down to TCP requests for 16K, where is the bottleneck?
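The closing question can be worked through numerically. A minimal Python sketch, assuming the application keeps only one request outstanding at a time and a 50 ms round trip; both are assumptions for illustration, and `effective_inflight` and `throughput_mbps` are hypothetical helpers:

```python
def effective_inflight(window_bytes, app_block_bytes):
    """The application cannot keep more in flight than it asks for per request."""
    return min(window_bytes, app_block_bytes)


def throughput_mbps(window_bytes, app_block_bytes, round_trip_ms):
    """Throughput ceiling when one block (or window) moves per round trip."""
    inflight = effective_inflight(window_bytes, app_block_bytes)
    return inflight * 8 / (round_trip_ms / 1000) / 1e6


print(throughput_mbps(65_536, 16_384, 50))  # 16K requests under a 64K window
print(throughput_mbps(65_536, 65_536, 50))  # requests sized to the window
```

With 16K requests the 64K window is never filled, so raising the window further changes nothing; the bottleneck is the application block size.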

ULPs (upper layer protocols)
Case in point: CIFS/SMB

Troubleshooting CIFS/SMB
- Arguably the most common file transfer method used in businesses today.
- SMB was NOT developed with the WAN in mind.
- One of the most “chatty” protocols/applications I run into (with the exception of poorly written SQL).

CIFS/SMB Quiz
- What is faster using MS File Sharing?
  – Pushing a file to a file server?
  – Pulling a file from a file server?

ULPs (upper layer protocols) [screenshots]

CIFS/SMB
- What is faster using MS File Sharing?
  – Pushing a file to a file server?
  – Pulling a file from a file server?
- SMB Write (pushing the file) can be almost 2X as fast as pulling (SMB Read)
- Depends on the latency

CIFS/SMB Tuning
- SMB Maximum Transmit Buffer Size
  – Negotiated MaxBufferSize in the Negotiate Protocol response
  – Default for Windows servers is typically 16644 (dependent upon physical memory)
  – Client default typically 4356

CIFS/SMB Tuning

CIFS/SMB Tuning
- Caveat:
  – SMB is extremely dependent upon the API
- Even though you set the max buffer size to 64K, Windows “share” data will always get truncated to 60K (61440) even though the server can support 64K
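The buffer sizes quoted on the tuning slides translate directly into App Turns. A minimal Python sketch, assuming each turn moves one full buffer of file data and ignoring SMB header overhead, pipelining, and the other SMB commands in a real transfer. The 10 MB file, the 50 ms round trip, and the helper names are hypothetical:

```python
import math


def smb_transfer_turns(file_bytes, buffer_bytes):
    """Request/response pairs needed if each turn moves one buffer of data."""
    return math.ceil(file_bytes / buffer_bytes)


def latency_seconds(file_bytes, buffer_bytes, round_trip_ms):
    """Time spent waiting on round trips for the whole transfer."""
    return smb_transfer_turns(file_bytes, buffer_bytes) * round_trip_ms / 1000


file_size = 10 * 1024 * 1024  # hypothetical 10 MB file
for buffer_size in (4356, 16644, 61440):
    turns = smb_transfer_turns(file_size, buffer_size)
    delay = latency_seconds(file_size, buffer_size, 50)
    print(f"buffer {buffer_size:>5}: {turns:>4} turns, {delay:.2f} s of latency")
```

Going from the client default to the 60K ceiling cuts the turn count by more than a factor of ten for the same file.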

CIFS/SMB Tuning
- Custom SMB APIs
  – The Windows limitation can be exceeded by programs written to use SMB as their file transfer protocol

CIFS/SMB Tuning
Note the SMB writes of 65,536. This is a file transfer using a custom API on a Windows XP machine.

CIFS/SMB Tuning (Preallocation)
Preallocation sets the file info for SMB Writes and can drastically reduce some of the “chattiness” of SMB.

Instructor Demo of SMB Profiles
Demo of SMB Tracefiles

My personal SMB Profile

Take Away Points
- Building your own CDA is easy to do and may fit in a majority of the areas you need to capture from.
- Pilot, Pilot, Pilot: it’s not just a fancy reporting engine for Wireshark!
- Test your application's “Networkability” before it hits production.
- Use the Wireshark Profiles, they will save you a ton of time.

Mike Canney
Principal Network Analyst
