Hitachi Content Platform v8.0 with Apache Hadoop v2.8.1


Hitachi Content Platform v8.0 with Apache Hadoop v2.8.1
Support of Amazon Simple Storage Service S3A
Lab Validation Report

By Federick Brillantes
February 2018

Feedback

Hitachi Vantara welcomes your feedback. Please share your thoughts by sending an email message to SolutionLab@HitachiVantara.com. To assist the routing of this message, use the paper number in the subject and the title of this white paper in the text.

Revision History

Revision     Changes          Date
SL-030-00    Initial release  February 9, 2018

Table of Contents

Product Features
    Hitachi Content Platform
    Apache Hadoop
Test Environment Configuration
    Hardware Components
    Software Components
Test Methodology
Analysis
    Apache Hadoop S3A Configuration
    Apache Hadoop S3A Mapping of Hitachi Content Platform Using Hitachi API for Amazon S3
    Apache Hadoop S3A Write Operation to Hitachi Content Platform
    Apache Hadoop S3A Read or Retrieval Operation from Hitachi Content Platform
    Apache Hadoop S3A Delete Operation from Hitachi Content Platform
Test Results
    Apache Hadoop S3A Configuration
    Apache Hadoop S3A Mapping of Hitachi Content Platform with Hitachi API for Amazon S3
    Apache Hadoop S3A Write Operation to Hitachi Content Platform
    Apache Hadoop S3A Read Operation from Hitachi Content Platform
    Apache Hadoop S3A Delete Operation from Hitachi Content Platform

Hitachi Content Platform v8.0 with Apache Hadoop v2.8.1 Support of Amazon Simple Storage Service S3A
Lab Validation Report

This lab validation report provides an integration evaluation of Apache Hadoop version 2.8.1 support of Amazon Simple Storage Service S3A when used with Hitachi Content Platform v8.0 (HCP) using Hitachi API for Amazon Simple Storage Service to provide software-defined storage.

This guide is written for IT professionals with cloud storage responsibilities who are one or more of the following:

- Hitachi Content Platform administrators
- Storage administrators and implementers
- Those who implement and administer Apache Hadoop

It is expected that you have a basic knowledge of SAN concepts, compute, and networking, and have working experience in managing and administering Apache Hadoop.

Testing showed that the integration of Apache Hadoop v2.8.1 S3A with Hitachi API for Amazon S3 works without any issues encountered, providing file storage and secure file sharing.

Note — Testing of this configuration was in a lab environment. Many things affect production environments beyond prediction or duplication in a lab environment. Follow the recommended practice of conducting proof-of-concept testing for acceptable results in a non-production, isolated test environment that otherwise matches your production environment before your production implementation of this solution.

Product Features

This solution was tested using these products.

Hitachi Content Platform

Hitachi Content Platform is a distributed object store that provides advanced storage and data management capabilities. This helps you address challenges posed by ever-growing volumes of unstructured data. Divide a single Content Platform into multiple virtual object stores, secure access to each store, and uniquely configure each store for a particular workload. Eliminate storage silos using Content Platform with a single object storage infrastructure that supports a wide range of data types, applications, and users with different service level needs in enterprise and cloud environments.

Content Platform is designed to enable archiving of fixed content in a manner that does the following:

- Ensures content integrity, authenticity, security, completeness, and accessibility over the long term, in accordance with relevant laws and regulations
- Offers fast, online access to content
- Allows integrated searching and indexing of the archive, including search of file contents
- Supports business continuity, data recovery, compliance search, and retention needs

Content Platform scales horizontally to support multiple applications and content types. It scales vertically to support continued data growth.

Apache Hadoop

The Apache Hadoop project develops open-source software for reliable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject.

The Apache Hadoop project includes these modules:

- Hadoop Common — The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS) — A distributed file system that provides high-throughput access to application data.
- Hadoop YARN — This is a framework for job scheduling and cluster resource management.
- Hadoop MapReduce — This is a YARN-based system for parallel processing of large data sets.

Test Environment Configuration

Figure 1 shows the test environment for Apache Hadoop v2.8.1 S3A with Hitachi Content Platform v8.0.

Figure 1. Test Environment for Apache Hadoop v2.8.1 S3A with Hitachi Content Platform v8.0

Hardware Components

This is the environment used to test Apache Hadoop v2.8.1 with Hitachi Content Platform v8.0:

- Rack optimized server for solutions, 2U one node, to host VMware ESX 6.0 Server:
  - Intel Xeon E5-2680 v3 processors @ 2.50 GHz, 2 sockets, 12 cores per socket, 24 CPUs
  - 256 GB RAM
  - 300 GB HDD
- Rack optimized server for solutions, 2U one node, to host VMware vCenter clients; runs Microsoft Windows clients as virtual machines:
  - Intel E5620 processors (2 x 4-core), 2.4 GHz
  - 4 GB RAM
  - 40 GB disk
- Rack optimized server for solutions, 2U one node, to host Hitachi Content Platform v8.0 as a single-node virtual machine:
  - Intel Xeon E5-2680 v3 processors @ 2.50 GHz (dual-core)
  - 12 GB RAM
- Rack optimized server for solutions, 2U one node, to host Apache Hadoop v2.8.1 as a virtual machine:
  - 4-core Intel Xeon E5-2680 v3 processor @ 2.50 GHz
  - 12 GB RAM
- Brocade VDX 6740 switch (1 unit), 24-port 1 GbE, for the front-end network
- Brocade VDX 6740 switch (1 unit), 24-port 1 GbE, for the back-end network

Software Components

- Hitachi Content Platform v8.0 software
- Apache Hadoop v2.8.1
- VMware ESX Server 6.0
- VMware vCenter Client 6.0
- Microsoft Windows Server 2012 R2 Standard (64-bit), running as a VMware virtual machine client
- Ubuntu Linux v17.04

Test Methodology

The goal of this testing was to perform compatibility testing of the Apache Hadoop S3A client with Hitachi Content Platform to ensure the following:

- Data can be written from Hadoop S3A to Content Platform.
- Data can be read or retrieved from Content Platform using Hadoop S3A.

The tested environment had the following:

- Three virtual machines
- An Apache Hadoop server
- An Apache Hadoop S3A client

Integration testing covered the following:

1. Apache Hadoop S3A configuration
   - Install Java
   - Configure Hadoop configuration files
2. Apache Hadoop S3A mapping of Hitachi Content Platform using Hitachi API for Amazon Simple Storage Service
3. Apache Hadoop S3A write operation to Content Platform
4. Apache Hadoop S3A read or retrieval operation from Content Platform
5. Apache Hadoop S3A delete operation from Content Platform

The following was not evaluated:

- Apache Hadoop S3 and S3N direct integration with Content Platform using Hitachi API for Amazon Simple Storage Service
- Apache Hadoop S3A multi-node cluster integration

Analysis

This analysis includes observations and summarizes the results of integration testing.

Apache Hadoop S3A Configuration

Configuration of Apache Hadoop S3A (single node cluster) found no issues. For details, see Test Results.

Apache Hadoop S3A Mapping of Hitachi Content Platform Using Hitachi API for Amazon S3

Integration of Apache Hadoop S3A with Hitachi Content Platform v8.0 using Hitachi API for Amazon S3 worked, with no issues found. For details, see Test Results.

Apache Hadoop S3A Write Operation to Hitachi Content Platform

The integration test of the Apache Hadoop S3A write operation to Hitachi Content Platform v8.0 using Hitachi API for Amazon S3 worked, with no issues found. For details, see Test Results.

Apache Hadoop S3A Read or Retrieval Operation from Hitachi Content Platform

The integration test of the Apache Hadoop S3A read operation from Hitachi Content Platform v8.0 using Hitachi API for Amazon S3 worked, with no issues found. For details, see Test Results.

Apache Hadoop S3A Delete Operation from Hitachi Content Platform

The integration test of the Apache Hadoop S3A delete operation from Hitachi Content Platform v8.0 using Hitachi API for Amazon S3 worked, with no issues found. For details, see Test Results.

Test Results

The integration test covered the following:

- Apache Hadoop S3A configuration
  - Install and configure Java
  - Configure Hadoop configuration files
- Apache Hadoop S3A mapping of Hitachi Content Platform with Hitachi API for Amazon S3
- Apache Hadoop S3A write operation to Hitachi Content Platform
- Apache Hadoop S3A read/retrieval operation from HCP
- Apache Hadoop S3A delete operation from HCP

Apache Hadoop S3A Configuration

Table 1 contains test results of the Apache Hadoop S3A configuration.

TABLE 1. APACHE HADOOP S3A CONFIGURATION

Test Cases                                 Results
1. Install and configure Java.             Pass
2. Configure Hadoop configuration files.
   2a. core-site.xml                       Pass
   2b. hdfs-site.xml                       Pass
   2c. yarn-site.xml                       Pass
   2d. mapred-site.xml                     Pass
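To make the configuration test cases in Table 1 concrete, the following is a minimal sketch of how an Apache Hadoop 2.8.1 S3A client might be installed and pointed at a Content Platform tenant. The fs.s3a.* property names are standard Hadoop 2.8 S3A settings; the endpoint, paths, and credential values shown are hypothetical placeholders, not values from the tested environment.

    # Install Java 8 and unpack Hadoop 2.8.1 (Ubuntu; paths are illustrative).
    sudo apt-get install -y openjdk-8-jdk
    tar -xzf hadoop-2.8.1.tar.gz -C /opt
    export HADOOP_HOME=/opt/hadoop-2.8.1

    # Point S3A at the Content Platform tenant in core-site.xml.
    # "tenant1.hcp.example.com" is a placeholder HCP tenant endpoint; the access
    # key and secret key come from an HCP data access account and are shown here
    # only as placeholders.
    cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
    <configuration>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>tenant1.hcp.example.com</value>
      </property>
      <property>
        <name>fs.s3a.access.key</name>
        <value>PLACEHOLDER_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>PLACEHOLDER_SECRET_KEY</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>true</value>
      </property>
    </configuration>
    EOF

With settings along these lines in place, an s3a:// URI resolves to a bucket, which Hitachi API for Amazon S3 maps to a Content Platform namespace within the tenant.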

Apache Hadoop S3A Mapping of Hitachi Content Platform with Hitachi API for Amazon S3

Table 2 contains test results of the Apache Hadoop S3A mapping of Hitachi Content Platform with Hitachi API for Amazon S3.

TABLE 2. APACHE HADOOP S3A MAPPING OF HITACHI CONTENT PLATFORM WITH HITACHI API FOR AMAZON S3

Test Cases                                                                                          Results
1. Configure a Content Platform tenant and namespace with Hitachi API for Amazon S3.                Pass
2. Configure JetS3t mapping of Hitachi Content Platform with Hitachi API for Amazon S3 as target.   Pass
3. Configure Apache Hadoop S3A mapping targeting Content Platform with Hitachi API for Amazon S3.   Pass
4. Verify Apache Hadoop S3A has mapped HCP S3.                                                      Pass

Apache Hadoop S3A Write Operation to Hitachi Content Platform

Table 3 contains test results of the Apache Hadoop S3A write operation to Hitachi Content Platform.

TABLE 3. APACHE HADOOP S3A WRITE OPERATION TO HITACHI CONTENT PLATFORM

Test Cases                                                     Results
1. Load data from a local source to Apache Hadoop S3A.         Pass
2. Verify data is loaded successfully to Apache Hadoop S3A.    Pass
3. Copy data from Apache Hadoop S3A to Content Platform.       Pass
4. Verify data is written to Content Platform.                 Pass

Apache Hadoop S3A Read Operation from Hitachi Content Platform

Table 4 contains test results of the Apache Hadoop S3A read operation from Hitachi Content Platform.

TABLE 4. APACHE HADOOP S3A READ OPERATION FROM HITACHI CONTENT PLATFORM

Test Cases                                         Results
1. List objects in the HCP namespace (bucket).     Pass
2. Open an object/file written in HCP.             Pass
3. Verify the object/file opened successfully.     Pass
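The write and read test cases in Tables 3 and 4 correspond to ordinary Hadoop filesystem commands issued against the s3a:// scheme. As a sketch, assuming the placeholder configuration shown earlier and a hypothetical namespace (bucket) named ns1:

    # Write: load a local file into the HCP namespace through S3A.
    hadoop fs -mkdir -p s3a://ns1/testdata
    hadoop fs -put /tmp/sample.txt s3a://ns1/testdata/

    # Read: list the namespace, then retrieve the object just written.
    hadoop fs -ls s3a://ns1/testdata/
    hadoop fs -cat s3a://ns1/testdata/sample.txt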

Apache Hadoop S3A Delete Operation from Hitachi Content Platform

Table 5 contains test results of the Apache Hadoop S3A delete operation from Hitachi Content Platform.

TABLE 5. APACHE HADOOP S3A DELETE OPERATION FROM HITACHI CONTENT PLATFORM

Test Cases                                                                                           Results
1. Verify the object exists in the HCP namespace (bucket).                                           Pass
2. Run the delete operation for the loaded object in the HCP namespace (bucket).                     Pass
3. Verify the delete operation completed successfully and the files/objects were removed from HCP.   Pass
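The delete test cases in Table 5 map to the same command-line interface; for example, with the hypothetical namespace used above:

    # Verify the object exists, delete it, then confirm it is gone.
    hadoop fs -ls s3a://ns1/testdata/sample.txt
    hadoop fs -rm s3a://ns1/testdata/sample.txt
    hadoop fs -ls s3a://ns1/testdata/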

For More Information

Hitachi Vantara Global Services offers experienced storage consultants, proven methodologies, and a comprehensive services portfolio to assist you in implementing Hitachi products and solutions in your environment. For more information, see the Services website.

Demonstrations and other resources are available for many Hitachi products. To schedule a live demonstration, contact a sales representative or partner. To view online informational resources, see the Resources website.

Hitachi Academy is your education destination to acquire valuable knowledge and skills on Hitachi products and solutions. Our Hitachi Certified Professional program establishes your credibility and increases your value in the IT marketplace. For more information, see the Hitachi Vantara Training and Certification website.

For more information about Hitachi products and services, contact your sales representative, partner, or visit the Hitachi Vantara website.

Hitachi Vantara
Corporate Headquarters
2845 Lafayette Street
Santa Clara, CA 95050-2639 USA
www.HitachiVantara.com | community.HitachiVantara.com

Regional Contact Information
Americas: 1 408 970 1000 or info@hitachivantara.com
Europe, Middle East and Africa: 44 (0) 1753 618000 or info.emea@hitachivantara.com
Asia Pacific: 852 3189 7900 or hds.marketing.apac@hitachivantara.com

© Hitachi Vantara Corporation 2018. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. All other trademarks, service marks, and company names are properties of their respective owners.

Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Vantara.

SL-030-00, February 2018
