How Analytics Can Help Backup Administrators


HOW ANALYTICS CAN HELP BACKUP ADMINISTRATORS
Balaji Panchanathan
EMC - Avamar - Engineer
balaji.panchanathan@emc.com

Table of Contents
Introduction
Data collection
Data Protection Advisor
Backup and Recovery Manager
Avamar
Enterprise Manager
MCGUI
Data Analysis
Range
Variation
Coefficient of Variation
Time series Analysis
Regression Analysis
Storage Usage trend
Input data
Regression output
CPU usage/Disk I/O
Visualization
Conclusion
Reference

Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation's views, processes, or methodologies.

2014 EMC Proven Professional Knowledge Sharing

Introduction

This article focuses on how analytics can solve problems and make a backup administrator's job easier and more fruitful.

Backup administrators typically face a couple of problems:
1. Backup failures
2. Ever-increasing datasets and the need for a longer backup window

Their jobs will become more fruitful if they:
1. Improve backup efficiency and robustness
2. Improve reliability
3. Periodically report to management which types of systems/databases are backed up. (This will help management determine the percentage usage of each department and whether the appropriate things are backed up.)

A backup administrator's life will be made easier by doing an analytics project, which usually has three stages:
1. Data Collection
2. Data Analysis
3. Data Reporting

Data collection

Customers using EMC Avamar backup products can collect backup data from:
1. Data Protection Advisor (DPA)
2. Backup and Recovery Manager (BRM)
3. Enterprise Manager (EM) in Avamar
4. MCGUI - Avamar Administrator GUI

Data Protection Advisor

Data Protection Advisor monitors, analyzes, and reports on the backup environment, can manage multiple backup products, and lists the details in the data set.

DPA features that help improve the backup administrator's life include:
1. Backup reports: how many clients are backed up, backup failures across clients, etc.
2. Capacity planning: capacity reports
3. Utilization: CPU/memory. This report helps find bottlenecks when problems occur. If there are backup failures or the backup speed is low during a particular time, the CPU/memory utilization at that time can be checked using Data Protection Advisor.

Backup and Recovery Manager

Backup and Recovery Manager (BRM) can be considered a miniature version of Data Protection Advisor. DPA can monitor backup environments, storage devices, and also backups from different vendors, whereas BRM can monitor EMC backup devices: Avamar, NetWorker, and Data Domain.

The BRM tool can be used to forecast capacity usage. In BRM, this is found under the Reports tab, in the System Summary report.

The reports section has options to run the backup summary report, from which analysis can be done. The backup report has details about the time zone, duration, dataset, and domain, which can be used for further analysis of max, range, variance, etc., as explained in the Data Analysis section of this article. Below is a snapshot of the backup summary report exported to Excel.

[Snapshot: backup summary report exported to Excel]


A snapshot of where the capacity forecast can be done is shown below.

Avamar

Enterprise Manager

Enterprise Manager can be used to manage multiple Avamar servers and provide capacity forecast reports. Below is a snapshot of one of the UI windows where reports can be exported.

MCGUI

MCGUI is an administrative tool provided by Avamar for managing the backup environment. In MCGUI, capacity reports can be run to check when capacity will be reached. Below is a snapshot of where the capacity report can be run.

Similarly, in Data Domain, if the AutoSupport feature is enabled, a report can be sent to a central server where regression analysis is done to predict when capacity will be reached.

Data Analysis

In this section we will see how the data collected in the previous section can be put to use. We will start with simple analytics functions and gradually move on to more complex analytical tools and how they can be used to solve problems faced by backup administrators.

Range

Suppose that we measure the throughput of different backups and check the maximum and minimum throughput. If the range is very narrow, e.g. the minimum is 100Mb/hr and the maximum is 105Mb/hr, the scope of the analysis, and therefore its benefit, will be limited. If the range is very wide, e.g. the minimum is 10Mb/hr and the maximum is 1Gb/hr, it makes sense to analyze further. The next part of the analysis starts with measuring variation.

Variation

The range could be very high if just one of the backups took a long time or one of the backups took a very short time to complete. Hence, calculating the variance gives more information about the variability of the backup throughput at various times. If the variance is high, further analysis is needed to find which factors cause it. If we export the report to Excel, variance can be calculated easily. The relevant Excel functions are displayed below.

Coefficient of Variation

Looking at the variance or standard deviation alone might be misleading. A better approach is to find the coefficient of variation, which is standard deviation/mean. The variance or standard deviation might be misleading because, depending on the unit of measurement or the range of values, it can give a wrong picture. For example, if backup speed is measured in Kbps and the values are in the range 1000Kbps-1500Kbps, the standard deviation can be a few hundred. If the same speeds are expressed in Mbps (1Mbps-1.5Mbps), the standard deviation will be a few tenths. Clearly, we cannot come to a conclusion directly from the value. However, a conclusion can be easily reached from the value of the coefficient of variation. The snapshot below of an Excel sheet with both standard deviation and coefficient of variation makes it clear why the coefficient of variation is a better measure.

[Snapshot: Excel sheet showing variance, standard deviation, and coefficient of variation for two columns of values]

As shown, even though the variation in column 1 is low, its standard deviation is high compared to column 2, due to its higher values. However, the coefficient of variation reflects the variation properly. Thus, with the coefficient of variation we can correctly conclude that the variation is greater in column 2 than in column 1.
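The unit argument above can be checked with a minimal Python sketch. The speed values are hypothetical, not taken from the article's spreadsheet, and `coefficient_of_variation` is an illustrative helper name. The sample variance and sample standard deviation are used, which is what Excel's VAR and STDEV compute.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (Excel: STDEV/AVERAGE)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical backup speeds in Kbps, and the same measurements in Mbps.
speeds_kbps = [1000, 1100, 1250, 1400, 1500]
speeds_mbps = [v / 1000 for v in speeds_kbps]

range_kbps = max(speeds_kbps) - min(speeds_kbps)
variance_kbps = statistics.variance(speeds_kbps)
stdev_kbps = statistics.stdev(speeds_kbps)
stdev_mbps = statistics.stdev(speeds_mbps)
cov_kbps = coefficient_of_variation(speeds_kbps)
cov_mbps = coefficient_of_variation(speeds_mbps)

print("range (Kbps):", range_kbps)
print("variance (Kbps^2):", variance_kbps)
print("stdev Kbps vs Mbps:", stdev_kbps, stdev_mbps)  # differ by a factor of 1000
print("CoV Kbps vs Mbps:", cov_kbps, cov_mbps)        # same up to rounding
```

The standard deviation changes with the unit, while the coefficient of variation does not, which is why it is the safer number to compare across reports.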

To measure the variation of a range of values, the best approach is to measure the coefficient of variation. The Excel formulas that can be used to calculate these values are shown below.

  =VAR(D4:D9)
  =STDEV(D4:D9)
  =STDEV(D4:D9)/AVERAGE(D4:D9)

First, filter the backup speed of the various backups by different factors, e.g. client, time, geography, etc., then calculate the variance under each category to get more clues.

Step 1 Calculate the average backup speed by categories such as client, time period, geography, etc.

Step 2 Compare the average backup speeds among the clients and find the coefficient of variation of those averages. If the coefficient of variation is high, look for the outliers, i.e. the clients for which the speed is low. In a similar fashion, take the average backup speed for each time period (9AM-10AM, 10AM-11AM, etc.) and find the coefficient of variation among these averages. If the coefficient of variation is high, conduct further analysis. This calculation is done for each category: client, time period, geography, etc.

Step 3 For each category where the coefficient of variation is high, look at the members of that category where the average backup speed is very low and frame rules so that the average backup speed increases. For instance, first look at a time period (9AM-10AM) to determine whether the backup speed is slow during that particular time.

Step 4 Repeat Step 2 within that time period: take all the backups in it and compute the coefficient of variation. If it is very low, backups from this time period for all clients will be moved to another time period where the backup speeds are high. This is one rule.

The second rule is to perform the steps below if the coefficient of variation is high:
- For each client, check the backup speed.
- For clients with lower backup speed, check whether the backup speed is better for the same client in another time period.
- If it is better, frame a rule such that the backups for that client are triggered only during the second time period, where the backup speeds were found to be better.

Step 5 After framing a set of rules in Step 4, the Avamar server backup scheduler will schedule the backups using those rules and monitor the backup speeds for a (configurable) period of time.

Step 6 After monitoring, return to Step 1 and continue. The ideal is to have a very low coefficient of variation across all categories.

Next, we will look at some simple analytic methods that can be used to analyze backup errors.
- Sort the backup errors by error code. Then look at the error codes that contribute the most errors and start analyzing those backup failures.
- After the first step, determine whether the majority of backup errors occur for a particular client, time zone, geography, etc. This type of analysis can be done easily if the data is exported to an Excel spreadsheet.
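Steps 1 and 2 and the error-code sort described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the records, client names, time periods, and error codes are hypothetical, and the tuple layout does not reflect any actual DPA, BRM, or Avamar export format.

```python
from collections import Counter, defaultdict
import statistics

# Hypothetical records: (client, time period, backup speed in MB/hr).
backups = [
    ("client-a", "09-10", 95), ("client-a", "22-23", 410),
    ("client-b", "09-10", 110), ("client-b", "22-23", 400),
    ("client-c", "09-10", 105), ("client-c", "22-23", 395),
]

def category_means(records, index):
    """Step 1: average backup speed per category (index 0 = client, 1 = period)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[index]].append(record[2])
    return {name: statistics.mean(speeds) for name, speeds in groups.items()}

def cov(values):
    """Coefficient of variation: sample standard deviation / mean."""
    values = list(values)
    return statistics.stdev(values) / statistics.mean(values)

# Step 2: coefficient of variation of the per-category averages.
by_client = category_means(backups, 0)
by_period = category_means(backups, 1)
cov_client = cov(by_client.values())
cov_period = cov(by_period.values())
slowest_period = min(by_period, key=by_period.get)  # candidate for a scheduling rule

# Hypothetical failure log: (error code, client).
failures = [("E1", "client-a"), ("E1", "client-b"), ("E2", "client-a"),
            ("E1", "client-a"), ("E3", "client-c")]

# Sort errors by code, most frequent first, then break the top code down by client.
errors_by_code = Counter(code for code, _ in failures).most_common()
top_code = errors_by_code[0][0]
clients_for_top_code = Counter(c for code, c in failures if code == top_code)

print("CoV across clients:", cov_client, "across periods:", cov_period)
print("slowest period:", slowest_period)
print("errors by code:", errors_by_code)
print("clients affected by", top_code, ":", dict(clients_for_top_code))
```

In this made-up data the averages barely differ between clients but differ widely between time periods, so the time period is the category worth investigating first.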

Flow chart

1. Analyse the data using statistical functions, i.e. range/standard deviation
2. Pick the outliers (where standard deviation is greater)
3. Derive a hypothesis from the outliers based on time/client/domain, etc.
4. Test the hypothesis

Time series Analysis

This type of analysis helps predict backup speeds and dataset growth going forward, which helps backup administrators with planning. Time series analysis also helps predict when storage capacity will be exhausted.

Time series analysis can be done using Excel. First, we will focus on capacity management.

To predict backup speed over a period of time, find the trend of the average backup speed over that period. The trend could be decreasing linearly or exponentially. If it is going down, further analysis can determine whether the backup speed has gone down for all clients/time periods or only for a particular set of time periods/clients. Based on that, appropriate action can be taken.
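A linear trend of the kind described above can be sketched in Python with an ordinary least-squares line. The daily figures and the total capacity are hypothetical, and `linear_trend` is an illustrative helper. The sketch covers only the linear case, not the exponential one.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical observations: day number, used capacity (GB), average speed (MB/hr).
days = [1, 2, 3, 4, 5, 6]
used_gb = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]
avg_speed = [400.0, 390.0, 385.0, 370.0, 360.0, 350.0]
capacity_gb = 100.0  # hypothetical total capacity

# Capacity trend: extrapolate the fitted line to the day it reaches capacity.
intercept, slope = linear_trend(days, used_gb)
day_full = (capacity_gb - intercept) / slope if slope > 0 else None

# Speed trend: a negative slope means average backup speed is going down.
_, speed_slope = linear_trend(days, avg_speed)

print("growth per day (GB):", slope)
print("projected day capacity is reached:", day_full)
print("speed change per day (MB/hr):", speed_slope)
```

If the speed slope is negative, the same fit can be repeated per client or per time period to see whether the slowdown is general or limited to a subset.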

Regression Analysis

Regression analysis is used to find the factors on which a result depends. There are several independent variables and one dependent variable. In our case, the dependent variable is the result and the factors are the independent variables. The results of interest for the backup administrator could be:
- Backup speed for a client/domain, etc.
- Backup failures in a time period
- Storage usage trend
- CPU usage or average disk I/O

In this section, we will look at results for backup failures, storage usage trend, and CPU usage/disk I/O.

Storage Usage trend

Storage usage depends on:
- Number of clients
- Retention policy
- Time for which the system is up

An equation can be framed like the one below.

Y (storage usage trend) = a + b * no of clients + c * time for which system is up

Excel contains functions to perform this analysis. A sample analysis is shown below:

Input data

[Table: input data with columns Time in days (1 to 10), No of clients, and Capacity; surviving capacity values: 11.5, 11.6, 11.7, 11.8, 12]

Regression output

SUMMARY OUTPUT (recoverable values from the Excel snapshot)

Regression Statistics
  Multiple R      0.991765894
  R Square        0.983599588

  MS              1.919986, 0.009147
  F               209.9093

                  Standard Error    Upper 95.0%
  Intercept       0.530661          9.633217
  X Variable 1    0.025118          0.203085
  X Variable 2    0.004855          0.026308

In the above output, first look at the value of R Square. Only if it is greater than 0.8 should the regression model be considered a good fit; in other words, the prediction error is low.

The equation would be:

capacity required = 8.3 + 0.14 * no of days + 0.014 * no of clients

Now you can predict the storage required if you expect the number of clients to be 200 by the end of 50 days. The storage required according to the above equation would be 8.3 + 0.14 * 50 + 0.014 * 200 = 18.1GB. Thus, the backup administrator would be able to determine when the capacity might be exhausted and plan accordingly.
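The regression and the prediction above can be reproduced outside Excel with a short Python sketch. The article's input table did not survive extraction, so the data here is synthetic: it is generated from the article's rounded equation with a hypothetical client count per day, which means the fit recovers 8.3, 0.14, and 0.014 and R Square is 1 by construction. The least-squares fit uses the normal equations, one standard approach and not necessarily how Excel computes it, and the function names are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, ys):
    """Least-squares [a, b, c] for y = a + b*x1 + c*x2, via the normal equations."""
    design = [[1.0, x1, x2] for x1, x2 in rows]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(3)]
    return solve(xtx, xty)

def predict(coef, x1, x2):
    return coef[0] + coef[1] * x1 + coef[2] * x2

def r_squared(rows, ys, coef):
    """Share of the variance in ys explained by the fitted model."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coef, x1, x2)) ** 2 for (x1, x2), y in zip(rows, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic input: (day, number of clients); client counts are hypothetical.
rows = list(zip(range(1, 11), [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]))
ys = [8.3 + 0.14 * d + 0.014 * c for d, c in rows]  # article's rounded equation

coef = fit(rows, ys)
r2 = r_squared(rows, ys, coef)
needed_gb = predict(coef, 50, 200)  # 200 clients by the end of 50 days

print("a, b, c:", coef)
print("R Square:", r2)
print("capacity required (GB):", needed_gb)
```

With real exported data the coefficients and R Square will differ; the R Square check described above is what decides whether the prediction should be trusted.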

CPU usage/Disk I/O

The factors on which CPU usage and disk I/O might depend are:
1. Backed up data per day
2. Number of clients

The equation would be:

CPU usage = a + b * backed up data per day (in GB) + c * number of clients

If we follow the steps in the section above under Storage Usage trend, the administrator will be able to predict the CPU usage or disk I/O usage over a period of time.

This data will be helpful in the scenarios below:
1. Backup speed is decreasing over a period of time
2. Backup failures are increasing

If we see the above symptoms and disk or CPU usage is very high or has increased dramatically, that is a likely cause. Corrective steps can then be taken, e.g. adding capacity (adding more disks will lower the disk I/O and most likely increase the backup speed).

Visualization

There are a number of tools which can be used to visualize the data we have. One such popular tool is Tableau. Using Tableau software, one can connect to different databases, import data from Excel spreadsheets, and then do visualization. The screenshots below list some of what can be achieved using Tableau.

Tableau enables graphs to be plotted seamlessly from an Excel spreadsheet or any database. Other features of the Tableau software include options to forecast and to calculate the variation and standard deviation.

Conclusion

With the set of data analyses shown above, backup administrators can perform the following activities in a better way:
1. Discover why backups are failing and take corrective actions
2. Forecast capacity and budget
3. Report the backup data used by department or by domain

Findings and benefits accrued from these activities can be presented to management in a visual format with tools such as Tableau.


EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
