Syslog Connector Performance Tuning

Transcription

Syslog Connector Performance Tuning
Girish Mantry, Moehadi Liang
Technical Solutions Consultants
Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Agenda

In this session we will take a look at:
– Syslog connector variants
– Connector components and operation
– Stages in the event flow
– Performance bottlenecks and tuning at each stage
– Out of memory problems and tuning
– Customer cases
– General recommendations

Syslog connector variants, components, operation and event flow

Syslog Connector Variants

Network Listeners (supported on all platforms; configurable interfaces and ports)
– Syslog Daemon: UDP, Raw TCP; default port 514
– Syslog NG Daemon: UDP, Raw TCP, TLS; default port 1999
– ArcSight CEF Encrypted Syslog (UDP): UDP with symmetric key encryption; default port 514; only CEF format

File Readers (supported only on Unix platforms; work in conjunction with the native syslog daemon)
– Syslog Pipe: Unix pipe
– Syslog File: regular file

Syslog Connector Components

[Diagram. Labels: Device Type 1, Device Type 2, ... Device Type N; Raw Events; Queue; Main Flow; Subagent; Parsed Events; Processed Events; Destination Flow; Cache; Transport; ESM; Logger]

Note: Queuing applies only to network listeners, not to file readers

Event Flow

Event Reception
– Receives network packets on UDP/TCP sockets
– Extracts human-readable syslog raw events from network packets

Event Queuing
– Raw events are written to a queue of files on the file system in the order in which they are received

Event Parsing
– Raw events are picked up from the file queue in a FIFO manner and parsed using regular expressions
– Information from device log formats is normalized into ArcSight event format

Event Processing
– Normalized events are categorized and processed in many ways useful for correlation and asset modeling
– Events are batched, filtered or aggregated as required for efficiency

Event Transport
– Enriched ArcSight events are sent to the ESM/Logger destination
– Events are cached when destinations are down and resent when they are back up

Performance Bottlenecks in the Event Flow and Tuning

Event Reception

Choice of Transport Protocol
– UDP performs better on reliable networks
– Use Raw TCP on unreliable networks
– Use TLS for encrypted transport with Syslog NG

Bottleneck (when dealing with Raw TCP or TLS)
– Java applications do not know when a client closes the connection with a FIN
– Connections remain idle in a CLOSE_WAIT state until closed explicitly by the application
– Idle connections can grow over a period of time and can exceed the connector limit or OS limit
– Happens faster with a large number of devices or with devices that create new connections frequently

Tuning parameters (name, default value: description)
– [...]checktimeout, -1: Set it to 30000 msec or higher to tell the connector to proactively check for connections closed by the peer and close them on the connector side as well
– tcpmaxsockets, 1000: Increase it as required to accommodate simultaneous connections from a large number of devices
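One way to watch for the idle-connection build-up described on this slide is to count CLOSE_WAIT sockets on the listener port. The following is only a minimal sketch: it assumes Linux-style `netstat -ant` text (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name and sample addresses are hypothetical.

```python
def count_close_wait(netstat_output, local_port):
    """Count TCP sockets on local_port that sit in the CLOSE_WAIT state."""
    suffix = ":" + str(local_port)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            if fields[3].endswith(suffix) and fields[5] == "CLOSE_WAIT":
                count += 1
    return count


# Hypothetical sample; real input would come from the operating system.
SAMPLE = """\
tcp        0      0 10.0.0.5:514      10.0.0.21:40112     ESTABLISHED
tcp        1      0 10.0.0.5:514      10.0.0.22:40518     CLOSE_WAIT
tcp        1      0 10.0.0.5:514      10.0.0.23:41777     CLOSE_WAIT
tcp        0      0 10.0.0.5:22       10.0.0.9:51234      ESTABLISHED
"""

print(count_close_wait(SAMPLE, 514))  # 2
```

A count that keeps growing towards the tcpmaxsockets value would suggest that the peer-closed check is worth enabling.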

Event Queuing

Raw events received over the network are written to a file queue consisting of a certain number of files of fixed size

Bottleneck
– With high event volumes, the file queue can build up faster, leading to significant delays
– When the file queue becomes full, the connector starts dropping events

Tuning
– Enable syslog parser multithreading (may need to follow up with a memory increase if required)
– Increase the file queue [...]

Tuning parameters (name, default value: description)
– [...]ltithreading.enabled, false: Set it to ‘true’ to enable multithreading
– syslog.parser.threadcount, -1: Set it to a specific number on a single processor machine. You can do the same on a multiprocessor machine or leave it for the connector to decide based on the number of processors
– syslog.parser.threadsperprocessor, 1: Takes effect only when the threadcount is set to -1. Leave it at 1 or increase it as required. Total number of threads = number of processors * [...]
– [...]: [...]e this parameter to increase the number of files in the file queue
– filequeuemaxfilesize, 100000: Specified in bytes. Increase this parameter to increase the size of each file in the file queue
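The arithmetic behind these settings can be sketched as follows. This is an illustration, not connector code: it assumes that the truncated factor in the thread formula is the threads-per-processor value, and that total queue capacity is simply file count times file size. The function names and the file count of 100 are hypothetical.

```python
import os


def effective_parser_threads(threadcount, threads_per_processor, processors=None):
    """Thread count implied by the two settings (assumed formula)."""
    if processors is None:
        processors = os.cpu_count() or 1
    if threadcount != -1:
        # An explicit thread count wins over the per-processor setting.
        return threadcount
    return processors * threads_per_processor


def file_queue_capacity_bytes(max_file_count, max_file_size):
    """Total bytes the file queue can hold before events are dropped."""
    return max_file_count * max_file_size


print(effective_parser_threads(-1, 1, processors=8))  # 8
print(effective_parser_threads(4, 2, processors=8))   # 4
# 100 files is a made-up count; 100000 bytes is the default file size above.
print(file_queue_capacity_bytes(100, 100000))         # 10000000
```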

Event Parsing

Inspection and Device Type Detection
– Multiple subagents, one per device type, each with a parser that has a regex to match something unique in the log
– Subagent parsers are ordered such that specific regexes come ahead of generic ones to detect device types accurately
– The connector inspects messages from senders, applying the regexes in order to detect the device type, and associates the subagent with the sender when a match is found. A single sender could be associated with multiple device types and subagents
– Associated subagent parsers are used to parse messages from a sender, and the inspection process is not reapplied unless a message from a new device type is encountered from the same sender
– Syslog senders and their associated subagent types can be seen in current/user/agent/syslog.properties

Bottleneck
– The inspection process involving regex matching could be expensive because the connector has more than 100 subagents

Tuning
– If you are sure of the device types in your environment, you can restrict the subagent list by following [...]

Tuning parameters (name, default value: description)
– [...]ubagentlist, false: Set it to true to make the connector consider the customized subagent list
– customsubagentlist, list of subagents (> 100): Set it to the restricted subagent list based on device types in your environment. Preserve the original relative order of subagents so as not to affect the accuracy of subagent detection
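Restricting the list while preserving the original relative order can be sketched like this. The subagent names are invented for illustration, and the delimiter the connector expects in customsubagentlist is not covered by the slide, so the sketch returns a plain list.

```python
def restrict_subagents(default_order, wanted):
    """Keep only the wanted subagents, in their original relative order."""
    wanted = set(wanted)
    unknown = wanted - set(default_order)
    if unknown:
        raise ValueError("not in the default list: " + ", ".join(sorted(unknown)))
    # Iterating over default_order (not over wanted) is what preserves the order.
    return [name for name in default_order if name in wanted]


# Hypothetical names; specific parsers come ahead of the generic one.
DEFAULT_ORDER = ["vendor_a_firewall", "vendor_b_router", "vendor_c_proxy", "generic_syslog"]

print(restrict_subagents(DEFAULT_ORDER, ["generic_syslog", "vendor_b_router"]))
# ['vendor_b_router', 'generic_syslog']
```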

Event Parsing - continued

Regular expressions in parsers
– Bottleneck: A badly written regular expression in the parser can be a big performance hit on the connector
– Optimization: For supported device types, development went through optimizing the regular expressions in the respective parsers. If you are authoring your own syslog flex connector parsers, consider the following guidelines
  – Make your regexes generic only as much as needed. Specific regular expressions perform better than generic ones
  – Use generic greedy expressions like .* and .+ at the end and not at the beginning or in the middle of a regular expression. Replace them with non-greedy equivalents like .*? and .+? with a clear character or token marking the boundary
  – Use of greedy expressions with more specific characters or meta characters is okay, e.g. \s+ for a continuous string of whitespace characters, \d+ for a continuous string of numerals or \w+ for a continuous string of alphanumeric characters

Maximum number of devices
– Bottleneck: The connector allows up to a maximum of 5000 devices and does not process events from newer devices once this limit is [...]

Tuning parameters (name, default value: description)
– [...]og.max.device.count, 5000: Increase it as required to match the number of devices in your environment
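The greedy versus non-greedy guideline can be seen on a small made-up log line. The sketch below uses Python's re module rather than the connector's own regex engine, so it only illustrates the matching behaviour, not connector performance.

```python
import re

# Hypothetical log fragment with three space-separated key=value tokens.
line = "src=10.0.0.1 dst=10.0.0.2 proto=tcp"

# Greedy .* in the middle runs to the end of the line and backtracks to the
# LAST space, so it swallows the dst token as well.
greedy = re.search(r"src=(.*) ", line).group(1)

# Non-greedy .*? stops at the first space, the intended boundary.
lazy = re.search(r"src=(.*?) ", line).group(1)

# A specific class (\S+ = run of non-whitespace) needs no backtracking here.
specific = re.search(r"src=(\S+)", line).group(1)

print(greedy)    # 10.0.0.1 dst=10.0.0.2
print(lazy)      # 10.0.0.1
print(specific)  # 10.0.0.1
```

The greedy form is slower because of the backtracking, and on this input it also captures the wrong value.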

Event Processing

Agent Batching
– Batch size controls how many events go together from component to component in the event flow and eventually to the destination
– Doubling or tripling the default size of 100 could help improve performance internally as well as over networks with latency
– Do not increase beyond that because it could have a negative impact by increasing the memory required to hold the batches

Categorization
– Categorization files for different device types are loaded into memory and some of those can be big
– Connector base memory usage can be high when dealing with a large number of device types
– Java heap space may need to be increased

External Map File Processing
– The external map file query is executed for every batch of events
– If you are using this feature, make sure the query is simple and returns fast

Connector Filtering
– Make sure that the filter condition is optimized and not extremely complex

Event Processing - continued

Field Based Aggregation
– Groups events with the same values in specified fields into buckets and produces aggregated events on time interval expiry or on reaching the event threshold
– Restrict the field set to the minimum required and choose an optimal event threshold value to keep the number of event field comparisons low
– Choose an optimal time interval so as not to block the event flow for too long
– Avoid using the ‘preserve common fields’ setting in a high event volume environment

Name Resolution
– Name resolutions are done in background threads and the event flow is not normally blocked while waiting for the answers to come back
– If the ‘Wait For Name Resolution’ feature is enabled, the event flow is blocked for a certain timeout period while waiting for the answers to come back
– Do not enable the ‘Wait For Name Resolution’ feature in an environment requiring frequent resolutions

Event Transport

Event caching can occur for a number of reasons. Network latency and problems in the destination are the common ones

Bottleneck
– Excessive caching can cause delays in events reaching their destination
– When the cache becomes full, the connector starts dropping events

Tuning
– Enable transport multithreading (except when the root cause is a problem in the destination)
– For the logger smart message transport, turn on the HTTPS persistent connection feature
– Increase the cache size to hold events for longer in the cache and prevent loss of [...]

Tuning parameters (name, default value: description)
– [...].threadcount, 1: Applies only to the ESM transport. Increase it by small increments as [...]
– loggersecure.connection.persistent, false: Applies only to the logger secure transport. Change it to true to reuse the existing HTTPS connections instead of tearing them down for every batch of events
– Cache Size, 1GB: Increase it as required up to a limit of 50GB. This is a destination setting which can be configured using the ESM console, the connector appliance GUI or the local connector setup wizard
– [...]: Applies only to the logger secure transport. Increase it by small increments as required

Out of Memory Problems and Tuning

Java Process Memory and Management

Memory allocated to a Java process consists of Heap Space and Native Memory

Heap space is allocated as instructed by Java runtime parameters
– -Xms (initial heap size), -Xmx (maximum heap size), 256 MB by default on connectors

Native memory size = Process memory size - Size of heap space

Garbage collection reclaims the memory of unused objects
– Minor collections (GC) reclaim memory in the YOUNG generation and move survivors into OLD
– Major collections (Full GC) reclaim memory in all of the heap space and take much longer
– The JVM stops the application threads during GC or Full GC
– Frequent Full GCs affect application performance severely. They are a clear indicator of the need to increase the maximum heap size

Out of memory errors can happen in any of these memory areas

Memory limitations in the 32-bit connector build
– Total addressable space is 4GB; kernel space ranges from 1GB to 2GB depending on OS
– User space available for a process is 2GB to 3GB depending on OS
– Limits exist on max heap space: 1GB (connector appliances), 1.5 GB (Windows), 2 GB (Unix)

Use the 64-bit connector build for higher memory

[Diagram: Process Memory, divided into Heap Space and Native Memory. Labels: YOUNG Generation (newly created objects); OLD Generation (old objects surviving minor GCs); PERMANENT Generation (classes, methods, etc.); Code Generation; Socket Buffers; Thread Stacks; Direct Memory Space; JNI Code; Garbage Collection; JNI Allocated memory]

Dealing with Java out-of-memory errors

Error: java.lang.OutOfMemoryError: Java heap space
Error: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Root cause and recommendation: Garbage collection is unable to free up more space and memory could not be allocated for new objects
– Increase the maximum heap size using the -Xmx option in increments as required, up to the limit
– If this still does not help, there could be a potential memory leak or a bug. Open a support incident supplying the logs and heap dumps

Error: java.lang.OutOfMemoryError: PermGen space
Root cause and recommendation: The permanent generation area has become full due to loading many classes statically, creating dynamic classes or creating too many interned strings
– Default max size of PermGen space is 64 MB. Increase it in small increments using the -XX:MaxPermSize option

Error: java.lang.OutOfMemoryError: Unable to create a new native thread
Root cause and recommendation: The JVM is low on native memory and unable to create a new VM thread. Make more native memory available by
– Reducing the heap space using the -Xms and -Xmx options
– Reducing the stack space using the -Xss option

Error: Out of Memory Error (allocation.cpp:211), pid 16950, tid 1855142800
Root cause and recommendation: Displayed in the fatal error logs when the JVM crashes due to a malloc failure. The system could be out of physical RAM or swap space, or the process size limit was hit on a 32-bit system. Take one or more of the following actions
– Reduce memory load on the system or increase physical memory or swap space
– Decrease the number of application threads, reduce the Java heap space and stack space

Adjusting memory options

On a software connector, add or edit settings in a file under the current/user/agent folder
– agent.wrapper.conf when running as a service
– setmem.sh (Unix) or setmem.bat (Windows) when running as a standalone application. This file may have to be created if it does not already exist
    set ARCSIGHT_MEMORY_OPTIONS="-Xms256m -Xmx256m"   (Example only. Add or remove options as required inside the double quotes)
    export ARCSIGHT_MEMORY_OPTIONS   (only on Unix)

On a connector appliance
– Only heap space can be changed using the container command ‘Configure Memory Settings’
– Other settings can be changed using SSH or the diagnostic tools file editor, using the same mechanisms as for a software connector

Settings by memory type

Heap Space
– Running as service: wrapper.java.initmemory=256 (initial heap size), wrapper.java.maxmemory=256 (maximum heap size)
– Running as a standalone application: "-Xms256m -Xmx1024m"
– It is recommended to increase only the max heap size

Perm Gen Space
– Running as service: add additional java parameters with adjusted indexes, wrapper.java.additional.7=-XX:PermSize=64m and wrapper.java.additional.8=-XX:MaxPermSize=128m
– Running as a standalone application: -XX:PermSize=64m -XX:MaxPermSize=128m
– It is recommended to increase only the max perm size

Stack space
– Running as service: add an additional java parameter with adjusted index, wrapper.java.additional.9=-Xss64k
– Running as a standalone application: -Xss64k
– Default stack size is OS dependent. Adjust and observe. Too low a value can cause StackOverflowError
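The "adjusted indexes" step for the service configuration can be sketched as a small helper that finds the highest wrapper.java.additional.<n> index already in use and appends new options after it. This is an illustration under assumptions: the helper name, the sample file content and the two -D options are made up, and it uses the key=value form of Java Service Wrapper configuration files.

```python
import re


def append_java_options(conf_text, options):
    """Append options as wrapper.java.additional.<n> lines with unused indexes."""
    used = [int(m.group(1)) for m in
            re.finditer(r"^wrapper\.java\.additional\.(\d+)=", conf_text, re.MULTILINE)]
    next_index = max(used, default=0) + 1
    lines = conf_text.splitlines()
    for offset, option in enumerate(options):
        lines.append(f"wrapper.java.additional.{next_index + offset}={option}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt; a real agent.wrapper.conf holds many more settings.
SAMPLE_CONF = """\
wrapper.java.initmemory=256
wrapper.java.maxmemory=256
wrapper.java.additional.5=-Dhypothetical.option.a=1
wrapper.java.additional.6=-Dhypothetical.option.b=2
"""

updated = append_java_options(SAMPLE_CONF, ["-XX:PermSize=64m", "-XX:MaxPermSize=128m"])
print(updated.splitlines()[-2])  # wrapper.java.additional.7=-XX:PermSize=64m
print(updated.splitlines()[-1])  # wrapper.java.additional.8=-XX:MaxPermSize=128m
```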

Customer Cases

Customer Case 1 - Problem

CEF Syslog TLS destination was caching at only 200 eps, while ESM and Logger destinations did not cache for the same event rate

[Diagram. Labels: Source Connector; ESM; Logger; CEF Syslog; TLS; Syslog NG Source 1; Syslog NG Source 2; Syslog NG Connector]

Troubleshooting
– Changing the transport protocol to UDP or Raw TCP did not help
– Could not reproduce the problem in house
– Customer captured packets with tcpdump and analyzed them using Wireshark
  – Large number of “TCP Window Full” messages
  – SEQ/ACK analysis showed that at times there was more than 10KB of data in flight, indicating that the receiver was too slow to process the incoming flood of packets
  – TCP receive buffer and window sizes got reduced over time, which contributed to the slow reception
– Further enquiries revealed that the Syslog NG connector was receiving TLS data from 2 other sources
– With this new discovery about the customer environment, the problem could also be reproduced in house
– Observed high memory usage and increased the heap space to 1024 MB, but it did not help

Root Cause
– The destination Syslog NG connector did not close TCP connections when sources closed connections
– The growing number of TCP connections forced the receive buffer size to be reduced, causing slower reception

Solution
– Set the ‘tcppeerclosedchecktimeout’ parameter to 30000 msec (half a minute)
– This parameter tells the connector to proactively check and close any TCP sockets [...]

Customer Case 2 - Problem

Huge difference in event counts found between Fortigate Firewall and Logger via Syslog connector

Observations
– Incoming event rate was much higher than the processing rate and the connector was queuing heavily
– During peak hours, queuing exceeded the size limit and a huge number of events were dropped
– Caching was observed during peak hours and some events were dropped when the cache size limit was exceeded
– High memory usage and frequent Full GCs were observed, affecting the performance of the [...]

[Charts. Titles: [...]eue Rate(SLC) vs Events/Sec(SLC); Queue Drop Count; Memory usage (Total vs [...]; Events/Sec(SLC) vs Throughput(SLC); Cache Size and Current Drop count]

[...]pper.log (INFO), showing frequent Full GCs:
jvm 1 2012/12/05 11:35:29 [Full GC
jvm 1 2012/12/05 11:37:08 [Full GC
jvm 1 2012/12/05 11:38:52 [Full GC
jvm 1 2012/12/05 11:40:30 [Full GC
jvm 1 2012/12/05 11:42:06 [Full GC
jvm 1 2012/12/05 11:43:47 [Full GC
jvm 1 2012/12/05 11:45:32 [Full GC
jvm 1 2012/12/05 11:47:10 [Full GC
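How frequent the Full GCs in this log excerpt are can be quantified by measuring the gaps between them. The sketch below is a hypothetical helper, not an ArcSight tool; it assumes each Full GC line carries a timestamp in the yyyy/mm/dd hh:mm:ss form shown on the slide.

```python
import re
from datetime import datetime

# Timestamps copied from the log excerpt on the slide.
LOG = """\
jvm 1 2012/12/05 11:35:29 [Full GC
jvm 1 2012/12/05 11:37:08 [Full GC
jvm 1 2012/12/05 11:38:52 [Full GC
jvm 1 2012/12/05 11:40:30 [Full GC
jvm 1 2012/12/05 11:42:06 [Full GC
jvm 1 2012/12/05 11:43:47 [Full GC
jvm 1 2012/12/05 11:45:32 [Full GC
jvm 1 2012/12/05 11:47:10 [Full GC
"""

# A timestamp followed, on the same line, by the Full GC marker.
FULL_GC = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*?\[Full GC")


def full_gc_intervals(log_text):
    """Return the seconds elapsed between consecutive Full GC entries."""
    times = [datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
             for m in FULL_GC.finditer(log_text)]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


intervals = full_gc_intervals(LOG)
print(intervals)  # [99, 104, 98, 96, 101, 105, 98]
```

A Full GC roughly every 100 seconds over this window matches the slide's reading of frequent Full GCs.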
