HP StoreOnce Catalyst and HP Data Protector 7


HP StoreOnce Catalyst and HP Data Protector 7 – Implementation and Best Practice Guide – Release 2

Executive Summary

This guide is intended to help the reader understand the basic technology of HP StoreOnce Catalyst and design a Data Protector solution. It is not intended to be a full guide to HP Data Protector 7, as extensive documentation on this software already exists. However, this guide provides additional best-practice information for a StoreOnce B6200 implementation using StoreOnce Catalyst technology.

Contents

Executive Summary
Introduction to HP StoreOnce Catalyst
HP StoreOnce Catalyst – the basics
Introduction to the StoreOnce B6200 Architecture
Terminology StoreOnce B6200
HP StoreOnce B6200 Catalyst Store specifications
HP StoreOnce Catalyst best practice – What information is required for planning?
HP StoreOnce Catalyst and Data Protector 7 integration
Terminology HP Data Protector
Data Protector Gateways
Creating a Data Protector 7 specification for backup to a StoreOnce Catalyst Store
Sizing the Backup Server
Reclaiming disk space from ‘expired’ backups
Using Data Protector 7 ‘Object Copy’ with StoreOnce Catalyst
Restoring Data from a StoreOnce Catalyst backup & replication
Recovering from total loss – ‘Disaster Recovery’ planning (DR)
Networking best practice
Housekeeping
HP StoreOnce Catalyst Licensing
HP StoreOnce B6200 Autonomic Restart
Conclusion and key learning points

Introduction to HP StoreOnce Catalyst

HP StoreOnce Catalyst brings the HP StoreOnce vision of a single, integrated, enterprise-wide deduplication algorithm a step closer. It allows deduplicated data to be moved seamlessly across the enterprise to other StoreOnce Catalyst systems without rehydration. This means that you can benefit from:

- Simplified management of data movement from a single pane of glass: tighter integration with your backup application to centrally manage file replication across the enterprise.
- Seamless control across complex environments: support for a range of flexible configurations that enable the concurrent movement of data from one site to multiple sites, and the ability to cascade data around the enterprise (sometimes referred to as multi-hop).
- Enhanced performance: distributed deduplication processing using StoreOnce Catalyst stores on the B6200 and on multiple servers can optimize the loading and utilization of backup hardware, network links and backup servers for faster deduplication and backup performance.
- Faster time to backup to meet shrinking backup windows: up to 100TB/hour* aggregate throughput, 4x faster than backup to a NAS target.

*Actual performance is dependent upon configuration, data set type, compression levels, number of data streams, number of devices emulated and number of concurrent tasks, such as housekeeping or replication.

HP StoreOnce Catalyst is currently available on the HP B6200 Backup System and also as a software component of HP Data Protector 7. In addition to HP Data Protector 7, HP StoreOnce Catalyst is also supported by Symantec NetBackup 7.x and Backup Exec 12.
The HP B6200 can support Catalyst stores, Virtual Tape and NAS (CIFS/NFS) on the same system, so it is ideal for customers who have legacy requirements for VTL and NAS but wish to move to StoreOnce Catalyst.

HP StoreOnce Catalyst requires a separate license. VTL/NAS emulations do not require licenses unless they are used as replication target devices. If VTL/NAS replication is used in addition to StoreOnce Catalyst, then both licenses are required.
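The licensing rules above reduce to a small decision function. This is a minimal Python sketch; the function name `required_licenses` and the license labels it returns are hypothetical and do not correspond to HP part numbers or any HP tool.

```python
def required_licenses(uses_catalyst: bool, vtl_nas_replication_target: bool) -> set[str]:
    """Return the license types implied by the B6200 licensing rules.

    Hypothetical helper: the returned labels are illustrative, not HP product names.
    """
    licenses: set[str] = set()
    if uses_catalyst:
        # StoreOnce Catalyst always needs its own separate license.
        licenses.add("catalyst")
    if vtl_nas_replication_target:
        # VTL/NAS emulations only need a license when they act as replication targets.
        licenses.add("vtl_nas_replication")
    return licenses


# Catalyst plus VTL/NAS replication: both licenses are required.
print(sorted(required_licenses(True, True)))  # prints ['catalyst', 'vtl_nas_replication']
```

A plain VTL or NAS device that is only a backup target (not a replication target) adds nothing to the returned set.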

HP StoreOnce Catalyst – the basics

HP StoreOnce Catalyst is a new type of storage that is more closely integrated with the data protection software. In the case of Data Protector 7, the application programming interface is embedded within the Data Protector media agent (Fig 1). Data and commands are transferred over a standard IP connection, and the HP B6200 offers both 1GbE and 10GbE connections (10GbE is recommended for performance). HP StoreOnce Catalyst offers advanced features such as deduplication at the backup server and movement of backups between systems under the control of HP Data Protector.

Very importantly, HP StoreOnce Catalyst also allows the Data Protector 7 software to release disk space occupied by ‘expired’ backups. This feature is not available with virtual tape. Customers normally develop a scheme in which backups are kept for varying periods of time. For example, a full backup might be made weekly, with an incremental backup performed every day. The incremental backups are expired when the next full backup is taken, as they are no longer required. The weekly full backups could be kept for 4 weeks, after which a monthly full backup is created; the weekly backups can then be expired, and so on. The exact scheme is customer dependent and varies according to the data.

HP StoreOnce Catalyst has the additional advantage that backups can then be moved offsite to another Catalyst store, all under the control of the software. The data is moved without rehydration, i.e. only new data ‘chunks’ are moved between stores. Data can be moved to multiple stores: duplication of backups uses the HP Data Protector ‘object copy’ functionality and can replicate data to multiple HP StoreOnce Catalyst stores.

Figure 1 shows the data paths between the B6200 and the backup server equipped with Data Protector software (media agent). The B6200 is shown as a 2-node, single-couplet system. Only Catalyst stores are shown, but VTL and NAS can co-exist.
The network connection is shown as a WAN or LAN because the HP StoreOnce Catalyst protocol is designed from the outset to accommodate the latency differences between a local network and a wide area network. Only a Data Protector media server is shown; it could back up either data contained on directly attached disks or data from clients which only have a media agent. Networking is covered later in a separate section. The HP StoreOnce Catalyst API is embedded in the Data Protector media agent.
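The example rotation scheme in the basics section (weekly fulls, daily incrementals) can be modelled to see which backups become eligible for expiry, and therefore whose space a Catalyst store could release. This is a minimal Python sketch under stated assumptions: a full backup every 7 days, incrementals on the other days, an incremental expiring once a newer full exists, and weekly fulls kept for 28 days. Monthly fulls are not modelled. The `Backup` class and `expired` function are hypothetical; in practice retention is configured in Data Protector, not in code.

```python
from dataclasses import dataclass

FULL_RETENTION_DAYS = 28  # assumption: weekly full backups are kept for 4 weeks


@dataclass(frozen=True)
class Backup:
    day: int   # day the backup was taken (day 0 = first backup)
    kind: str  # "full" or "incr"


def expired(backups: list[Backup], today: int) -> list[Backup]:
    """Return the backups the example scheme would expire on `today`."""
    full_days = [b.day for b in backups if b.kind == "full"]
    result = []
    for b in backups:
        if b.kind == "incr":
            # An incremental is no longer required once a later full backup exists.
            if any(d > b.day for d in full_days):
                result.append(b)
        elif today - b.day >= FULL_RETENTION_DAYS:
            # A weekly full expires once it reaches the retention age.
            result.append(b)
    return result


# Five weeks of history: a full every 7 days, incrementals in between.
history = [Backup(d, "full" if d % 7 == 0 else "incr") for d in range(36)]
gone = expired(history, today=35)
print([b.day for b in gone if b.kind == "full"])  # prints [0, 7]
```

With virtual tape the expired backups would still occupy space until the cartridge is overwritten; with a Catalyst store, Data Protector can return that space for re-use.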

Fig.1: StoreOnce B6200 data paths to Data Protector backup server

Key Points:
- HP StoreOnce Catalyst is a unique interface and is fundamentally different from virtual tape or NAS.
- Optional deduplication at the backup server enables greater performance and reduced bandwidth requirements. This can be controlled at backup session/job level.
- Enables advanced features such as duplication of backups between appliances in a network-efficient manner, under control of the backup application.
- The HP StoreOnce Catalyst protocol runs on a standard IP network.
- Enables space occupied by ‘expired’ backups to be returned for re-use.
- Enables asymmetric expiry of data.
- Enables store creation from within Data Protector, if required.
- Backup jobs can restart automatically if the B6200 has a node failover condition (requires a restart script).
- Scalable licensing: pay as you grow.

Introduction to the HP StoreOnce B6200 Backup System Architecture

The B6200 Backup System is designed for high availability and consists of up to 8 ‘nodes’ arranged in pairs known as ‘couplets’. Each node consists of an HP ProLiant server with dual hex-core Intel Xeon CPUs, linked to disk arrays for data storage. The ‘master disk’ unit (one per node) contains dual 6Gbps SAS interfaces, and each array is also connected to the ‘partner’ node in the couplet. This means that in the case of node failure the other node can access all the data stores. A basic architecture layout is shown in Fig 2.

There is an internal 10GbE network for system management, complete with dual ProCurve network switches. A 1GbE network manages the network switches and connects to the iLO port on each server; the iLO port is used to shut down nodes in the case of malfunction. There is no shared storage between couplets. Each node has separate dual 8Gb Fibre Channel connections, dual 10GbE ports and dual 1GbE ports for connection to the customer’s SAN and network infrastructure.
All network connections are ‘bonded’ and are accessed by a ‘virtual’ IP address, or VIF.

Each node has a service set, which consists of the software modules that run the virtual tape, NAS and StoreOnce Catalyst deduplication devices. In failover, the whole service set runs on the partner node in addition to that node’s own service set. The shared storage is accessed through the additional 6Gbps SAS connection (disk controllers have dual 6Gbps SAS interfaces, and the master disk unit has 2 controllers for resilience). The virtual IP addresses of the service sets remain the same, so no reconfiguration is necessary. The management console is always active on one node; if that node fails, another node activates its management console to take over. The management console network connection is maintained throughout failover.
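The couplet failover behaviour described above can be sketched as a small model: each node normally runs one service set, and on failure its whole service set moves to the partner node in the same couplet, never to another couplet. This is a minimal Python sketch; the node names, the pairing of node1/node2, node3/node4 and so on, and the helper functions are hypothetical and only illustrate the rules in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    service_sets: list[str] = field(default_factory=list)
    up: bool = True


def build_cluster(num_nodes: int) -> dict[str, Node]:
    """Create a cluster of couplets; each node starts with its own service set."""
    if num_nodes % 2 or not 2 <= num_nodes <= 8:
        raise ValueError("nodes come in couplets: 2, 4, 6 or 8 nodes")
    return {f"node{i}": Node(f"node{i}", [f"ss{i}"]) for i in range(1, num_nodes + 1)}


def partner(name: str) -> str:
    """Hypothetical pairing: (node1, node2), (node3, node4), ..."""
    i = int(name.removeprefix("node"))
    return f"node{i + 1}" if i % 2 else f"node{i - 1}"


def fail_node(cluster: dict[str, Node], name: str) -> None:
    """Move the failed node's service sets to its partner (never across couplets)."""
    failed, target = cluster[name], cluster[partner(name)]
    if not target.up:
        raise RuntimeError(f"{target.name} is also down; the couplet cannot fail over")
    # The partner keeps running its own service set and adds the failed node's set.
    # Each service set keeps its virtual IP address (not modelled here).
    target.service_sets.extend(failed.service_sets)
    failed.service_sets = []
    failed.up = False


cluster = build_cluster(4)
fail_node(cluster, "node3")
print(cluster["node4"].service_sets)  # prints ['ss4', 'ss3']
```

The model also shows why node1 and node2 are unaffected by a failure in the second couplet: there is no shared storage between couplets, so their service sets cannot move there.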

There is an interruption of service during failover, and arrangements have to be made to restart backup jobs. This procedure varies according to the ISV application. Because the StoreOnce Catalyst protocol takes regular ‘checkpoints’, backups can normally resume from the last checkpoint. This feature is known as ‘Autonomic Restart’.

Power supplies in the B6200 are all dual (n+1), and it is highly recommended to arrange for dual mains power supplies.

Each node has a usable capacity of 64TB. If this is exceeded, a node can use storage from its partner node in the couplet. However, performance is compromised in this overflow mode, so best practice is to avoid using it; there may also be no space available on the partner node.

Fig 2. B6200 Architecture

Key Points:
- Maximum configuration is 64TB per node, up to a maximum of 8 nodes.
- Nodes operate in pairs known as ‘couplets’.
- Autonomic restart operates at a couplet level.
- There is NO user access to the internal networking or iLO ports.
- User networks connect to the node network ports.

- The ProCurve switches are for the internal network only.
- Customer data and management access is directly to a node.
- All network connections are ‘bonded’. No special switch configuration is required.
- 1GbE and 10GbE network connections are available for user management and/or data.
- HP StoreOnce Catalyst uses the network connections only.
- Fibre Channel is for Virtual Tape emulation only.
- Dual mains supplies are required, and there are 4 power connectors per rack.
- Use the HP B6200 planning guide to select the correct power connection and to plan the networking.
- 10GbE is essential for large configurations which require maximum performance. 10GbE supports copper or fibre connections (10GbE SFPs are NOT supplied and need to be ordered separately).

Terminology – HP StoreOnce B6200

Terminology is important to understand and can be confusing. This section covers the terms used in the HP B6200.

Term: Details
Cluster: Generally used for the whole HP B6200 appliance.
Couplet: Consists of 2 interconnected HP ProLiant servers, each with disk storage. Failover occurs within a couplet.
Node: HP ProLiant server hardware with SAS-attached disk arrays.
Management console: Software residing on all nodes, managing the cluster and monitoring for node failure. Accessed via a virtual IP address. Based on Ibrix Fusion Manager.
Bonded network ports: A method of combining network ports for either resilience or performance. Each port has a physical IP address, but the ports share a common virtual IP address and the MAC address appears the same. This is sometimes known as link aggregation or, in Windows, ‘teaming’.
VIF: Virtual Interface. A VIF has an IP address for access to the management GUI.
Service set: HP term for the group of software modules which provide the Virtual Tape, NAS or StoreOnce Catalyst storage devices.
Failover: The process in which a node is shut down (or fails) and its software modules all move over to the partner node. Failover can only occur within a couplet.
Failback: Manual process which is the reverse of failover. Used following a failover.
Chunk: The unit of data into which the HP StoreOnce deduplication process divides a data stream. The average chunk size is 4KB.
Low bandwidth backup: A StoreOnce Catalyst backup where deduplication takes place at the client (in the case of Data Protector, the media agent). After the first backup, only new data is sent over the network to the B6200 Catalyst store.
High bandwidth backup: A StoreOnce Catalyst backup where there is no deduplication at the client (Data Protector media agent).

HP B6200 Catalyst Store Configuration Specifications

                                                      Per node***   Per couplet   Per cluster (max config)
Maximum number of Catalyst stores*                    48            96            384
Number of ‘clients’ supported per store*              no limit      no limit      no limit
Maximum number of inbound backup/restore/copy jobs**  192           384           1536
Maximum number of outbound copy jobs                  48            96            384

* This maximum includes any NAS shares or VTLs configured.
** If Data Protector backups are using multiple streams, each stream counts as a job.
*** More accurately per service set, as a node in failover will run 2 service sets and could support 96 Catalyst stores. Of course, performance is reduced in failover.

HP StoreOnce Catalyst best practice – What information is required for planning?

In order to achieve success with HP StoreOnce Catalyst, it is important to ask the correct questions at the planning stage. Customers deploy many different data protection scenarios, and the important variables are best described in several sections. This paper does not go into depth and should be regarded as a starting point; however, it highlights the next steps and the tools that are available.

Topology and geographical location of data

Firstly, it is important to determine the number of servers which require data protection and where they are located. The tendency in recent years has been to centralize servers in order to simplify management and maintain much higher service levels, which increased WAN speeds have made possible. However, some business models, particularly in the retail sector, still maintain servers at multiple sites for greater resilience while keeping central data warehouses. Any server located on a remote site still needs backup, and the preference is not to require staff on that site to be involved. A table of sites and the size of their local data storage is a useful starting point. Try to build a table similar to the example below. For an enterprise customer this will of course be much larger, and there may be multiple central locations.

[Example site table: columns Site, Location, Data volume, Retention period, Data type and WAN links. Rows cover two main data center sites holding database/file data (retention up to 6 months) and seven remote/ROBO sites holding file data over 2Mbps WAN links (retention 1 to 2 weeks).]

Working from this data, start to plan the solution. In the example above the customer has 2 data centers and a number of remote sites. The WAN links may already be in place, in which case the sizing will dictate the backup window. It will also be necessary to determine how many servers will have their own media agent and back up directly to the HP StoreOnce B6200.

The key decision is whether the local sites need to keep data onsite for faster recovery, or whether all backups are held in the main data centers. Before StoreOnce Catalyst it was necessary to install a local D2D system and replicate back to the main data centre. This is still an option with Catalyst, but HP Data Protector 7 with Catalyst support in software changes the options. Any server with a media agent loaded can perform a low-bandwidth backup over a WAN link; only new data is transferred at each subsequent backup. There is no additional charge for Data Protector media agents.
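The idea behind a low-bandwidth backup, that after the first backup only chunks the store has not already seen cross the network, can be sketched in a few lines. This is a minimal Python illustration, not the StoreOnce algorithm: it assumes fixed-size 4KB chunks (StoreOnce divides the stream into variable-size chunks averaging 4KB), uses SHA-1 purely as an illustrative chunk fingerprint, represents the store's chunk index as a Python set, and ignores the small amount of metadata sent for chunks that already exist. The function name `low_bandwidth_send` is hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024  # simplification: fixed 4KB chunks instead of variable-size chunking


def low_bandwidth_send(data: bytes, store_index: set[str]) -> int:
    """Return the number of payload bytes sent; record new chunk fingerprints in the index."""
    sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in store_index:
            # Only chunks the store has never seen are transferred.
            store_index.add(fingerprint)
            sent += len(chunk)
    return sent


store: set[str] = set()
first = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # four distinct chunks
second = first[: 3 * CHUNK_SIZE] + bytes([9]) * CHUNK_SIZE    # last chunk changed
print(low_bandwidth_send(first, store))   # prints 16384
print(low_bandwidth_send(second, store))  # prints 4096
```

The same principle explains why copying between Catalyst stores moves only new chunks rather than rehydrated data. Note that a restore has no such saving, because the full data must be sent back to the client.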
It is also possible to back up locally using software deduplication and then object copy the data to the central data centers.

Key Points:
- If fast local recovery is required, then the last backup should be held locally, as restores are NOT ‘low-bandwidth’.
- Although supported, backup over the HP StoreOnce Catalyst protocol is best restricted to national WAN connections; replication, however, has been designed for international WAN links with higher latency values.

Type of data

Deduplication performance varies with the type of data and how it is stored. Data falls into 2 general classes: structured and unstructured. Structured data would be database files, which are application specific. Unstructured data is normally stored in a standard file system and can vary in content. Some data, such as files which already have a degree of compression and encrypted data, deduplicates poorly. The most common database applications are Microsoft SQL Server and Oracle, and Data Protector has agents for both these products. Although not a database as such, Microsoft Exchange is structured data and has a dedicated Data Protector ‘agent’. Note that the data type is unimportant to HP StoreOnce Catalyst technology itself, but it has performance implications and implications for the Data Protector software.

Key Points:
- Best practice is to keep similar data in the same Catalyst store, e.g. dedicate one B6200 Catalyst ‘store’ to Oracle backups and a different store to SQL Server.
- The number of Data Protector ‘client’ systems which can be backed up per Data Protector cell depends on the number of unique file names. Servers with large, complex file systems put a greater load on the cell manager’s internal database. Guidelines are around 300 ‘clients’ per cell manager. Multiple cell managers can be controlled by the Data Protector Manager of Managers (MoM) option.
- Incremental backups should ideally be in a separate store from full backups. (This is not always possible in certain customer rotation schemes.)

Virtual Machines

Customers are likely to have many virtual machines to back up. These normally achieve very high deduplication ratios, and Data Protector 7 is well equipped to back up virtual machines. Keep virtual machine backups together in their own store.

How the data is stored also has implications, because it affects how multiple streams can be set up for performance, which is discussed later in this document.

Data Volumes, change rate and retention periods

These are all essential parameters in designing the best solution. The B6200 has a maximum practical capacity of 64TB per node. Do NOT exceed this limit in the design phase.
The system can use storage on its partner node if available, but this could lead to loss of performance. The retention period is important because deduplication works best when multiple backups are retained. General best practice is not to retain data on disk for longer than 12 months; Data Protector can easily migrate data to tape for longer storage periods. Change rate is more difficult to estimate. The incremental backup sizes give some idea, but for accurate information a trial run with an actual D2D system is recommended. Customers typically increase their data volumes annually, and it is recommended to estimate this growth. Once this data has been gathered, the HP Sizer tool will make the appropriate hardware recommendation.

Replication over the WAN link can also be sized with the Sizer tool, which can give either the replication time window for a fixed WAN speed, or the link speed needed for a given replication ‘window’ requirement.

In terms of sizing, HP StoreOnce Catalyst differs very little from VTL and NAS. The only additional consideration is backup from clients directly to the Catalyst store over the WAN.
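Before running the HP Sizer tool, a quick back-of-envelope check can flag designs that break the 64TB-per-node rule or that would need an unrealistic WAN link. This is a minimal Python sketch, not the Sizer's method: the function names are hypothetical, sizes use decimal units (1GB = 10^9 bytes), the data passed to the replication functions is assumed to be new (post-deduplication) data only, and the 80% usable-link `efficiency` default is an assumption, not an HP figure.

```python
NODE_CAPACITY_TB = 64                          # practical capacity per node, from the text
MAX_NODES = 8
MAX_CLUSTER_TB = NODE_CAPACITY_TB * MAX_NODES  # 512TB for a full 8-node cluster


def over_capacity(planned_tb: dict[str, float]) -> dict[str, float]:
    """Return the planned overflow (TB) for each node that exceeds 64TB."""
    if len(planned_tb) > MAX_NODES:
        raise ValueError("a B6200 has at most 8 nodes")
    return {node: tb - NODE_CAPACITY_TB
            for node, tb in planned_tb.items() if tb > NODE_CAPACITY_TB}


def replication_hours(new_data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to send `new_data_gb` of new (deduplicated) data over a WAN link."""
    bits = new_data_gb * 8e9
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def required_link_mbps(new_data_gb: float, window_hours: float, efficiency: float = 0.8) -> float:
    """Link speed (Mbps) needed to replicate `new_data_gb` within `window_hours`."""
    bits = new_data_gb * 8e9
    return bits / (window_hours * 3600) / efficiency / 1e6


print(over_capacity({"node1": 70, "node2": 40}))            # prints {'node1': 6}
print(round(replication_hours(1.8, 2, efficiency=1.0), 2))  # prints 2.0
```

For example, 1.8GB of new data over a 2Mbps link at the full nominal rate needs about 2 hours; real links deliver less than their nominal rate, which is what `efficiency` approximates. Any non-empty result from `over_capacity` means the design relies on the partner-node overflow mode that best practice says to avoid.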

Mixed StoreOnce Catalyst, VTL and NAS environments

Unless this is a completely new project, customers are highly likely to be using VTL and NAS at the same time as StoreOnce Catalyst. This is perfectly possible and causes minimum disruption. The number of VTLs, NAS shares and Catalyst stores per service set (a node) is limited to 48, and the different devices can exist in any combination. Replication limits for VTL and NAS devices are separate from Catalyst job limits.

Key Points:
- For VTL and NAS replication it is necessary to purchase a replication license for
