Integrating Apache Hadoop With NoSQL Database - Oracle


In this tutorial, you will start an Oracle NoSQL Database instance that holds radio data, load the data into its schemas, and generate the top 10 most-streamed tracks.

Pre-requisites:
Hardware Requirements: Disk space, RAM above 2 GB
Software Requirements: Linux-based operating system, NoSQL Database, Oracle Java Development Kit 1.6 or later, Hadoop 2.2

Steps to run a Hadoop operation in NoSQL Database:
1. Start KVLite.
2. Load data into NoSQL Database.
3. Start the Hadoop interface.
4. Run the MapReduce job.
5. Display the output.

Follow the steps below:
Step 1: Log in as the oracle user.

Step 2: Navigate to the radio directory, where the files for this demo are located.
[oracle@kvhost u02] cd radio
Step 3: Start KVLite.
[oracle@kvhost radio] ./startKvlite.sh
KVLite started.
Step 4: Identify the schemas and load data into NoSQL Database.
Step 5: Before inserting records, observe each schema (customer, plays and songs).

Step 6: Navigate to tmpdata and observe the data that will be loaded. These text files are the initial input for the Top10.java program, which calculates the top 10 songs.
Step 7: Navigate to the appropriate directory and run load.sh.

The data is loaded.
Step 8: Show the number of playlists, songs and users that were loaded:
aggregate -count -key /PL
Step 9: Open another terminal, log in as the hadoop user and start Hadoop:
./start.sh
Step 10: Create a directory called lib under KVHOME and set the classpath.
Step 11: Observe the program Top10.java, which calculates the top 10 songs:
gedit Top10.java
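Step 8's aggregate -count -key /PL reports how many records sit under the /PL key, which, going by the step's wording, holds the playlists. The plain-Java sketch below only illustrates that prefix-counting idea over an in-memory list. It does not use the Oracle NoSQL client API, and the key paths (/PL/1, /SONG/1001, /USER/1) and the class name KeyPrefixCount are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of counting records under a key prefix;
// it does not connect to Oracle NoSQL Database.
public class KeyPrefixCount {

    // Counts key paths equal to the prefix or nested beneath it, so "/PL"
    // matches "/PL/1" but not an unrelated key such as "/PLAYER/1".
    static int countUnder(List<String> keyPaths, String prefix) {
        int count = 0;
        for (String key : keyPaths) {
            if (key.equals(prefix) || key.startsWith(prefix + "/")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical key paths; the demo's real key layout may differ.
        List<String> keys = Arrays.asList(
                "/PL/1", "/PL/2", "/PL/3",
                "/SONG/1001", "/SONG/1002",
                "/USER/1");
        System.out.println("Playlists: " + countUnder(keys, "/PL")); // Playlists: 3
        System.out.println("Songs: " + countUnder(keys, "/SONG"));   // Songs: 2
    }
}
```

Matching on prefix + "/" rather than a bare startsWith keeps sibling keys that merely share leading characters out of the count.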

An example demonstrating a small chain of Map/Reduce processes based on data originating from Oracle NoSQL Database. This job first counts the number of streams of each song in an online radio database. It then sorts the results and reports the top 10 most-streamed songs. The KVAvroInputFormat and related classes are located in the lib/kvclient.jar file, so this file must be included in the Hadoop classpath at runtime. The arguments to the program are the kvstore name, the helperHost:port pair and the HDFS output path.
Step 12: Run the MapReduce operation:
hadoop jar myjar.jar hadoop.Top10 -libjars $HADOOP_CLASSPATH kvstore localhost:5000 /output
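The count-then-sort chain that Top10.java runs can be sketched in plain Java without Hadoop. This is a minimal in-memory illustration, not the Top10.java source: it assumes each stream event is represented only by a track ID, and the class and method names (Top10Sketch, countPlays, topN) and the sample IDs are hypothetical. In the real job the input comes from the store through KVAvroInputFormat; here a list of track IDs stands in for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of a count-then-top-N chain; not the Top10.java source.
public class Top10Sketch {

    // Stage 1 ("count"): number of streams per track ID.
    static Map<String, Integer> countPlays(List<String> playedTrackIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String trackId : playedTrackIds) {
            counts.merge(trackId, 1, Integer::sum);
        }
        return counts;
    }

    // Stage 2 ("sort"): highest count first, ties broken by track ID so the
    // result is deterministic; keeps at most n entries.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Comparator<Map.Entry<String, Integer>> byCountDesc = (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };
        entries.sort(byCountDesc);
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Hypothetical stream log: one track ID per play event.
        List<String> plays = Arrays.asList(
                "trackA", "trackB", "trackA", "trackC", "trackB", "trackA");
        // Prints trackA 3, trackB 2, trackC 1 (tab-separated), one per line.
        for (Map.Entry<String, Integer> e : topN(countPlays(plays), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a MapReduce chain like the one described, the first stage would typically be a mapper emitting (trackId, 1) with a summing reducer, and the second stage would order those totals. A bounded min-heap of size 10 is a common alternative to the full sort used here for brevity.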

Open http://localhost:50070/nn_browsedfscontent.jsp and click output.
Step 13: Click part-r-00000.

The output lists the 10 songs streamed most often by users. Here, track ID 733143757 was played 154 times, the highest count, which makes it the most-streamed song.
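To work with the result outside the browser, the lines of part-r-00000 can be fetched from HDFS (for example with hdfs dfs -cat /output/part-r-00000) and parsed. Hadoop's default TextOutputFormat writes each record as key, a tab, then value. Whether Top10 emits the track ID as the key and the play count as the value is an assumption here, as are the class and method names (ReadTop10Output, parse, mostPlayed). The sample line is built from the result reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch for reading job output lines, ASSUMING the format "trackId<TAB>playCount".
public class ReadTop10Output {

    static final class Result {
        final String trackId;
        final long plays;

        Result(String trackId, long plays) {
            this.trackId = trackId;
            this.plays = plays;
        }
    }

    // Parses one result per non-blank line; a line without exactly two
    // tab-separated fields, or with a non-numeric count, raises
    // IllegalArgumentException (NumberFormatException is a subclass).
    static List<Result> parse(List<String> lines) {
        List<Result> results = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            results.add(new Result(fields[0].trim(), Long.parseLong(fields[1].trim())));
        }
        return results;
    }

    // Returns the entry with the highest play count, or null if there is none.
    static Result mostPlayed(List<Result> results) {
        Result best = null;
        for (Result r : results) {
            if (best == null || r.plays > best.plays) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One line in the assumed format, built from the reported result;
        // real input would be the lines of part-r-00000.
        List<String> lines = Arrays.asList("733143757\t154");
        Result top = mostPlayed(parse(lines));
        // Prints: Most-streamed track: 733143757 (154 plays)
        System.out.println("Most-streamed track: " + top.trackId + " (" + top.plays + " plays)");
    }
}
```

mostPlayed scans for the maximum instead of trusting line order, so it still works if the output is not already sorted by count.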
