Introduction To Apache Kafka: Hands-On Exercises


Table of Contents

Hands-On Exercise: Using Kafka from the Command Line
Hands-On Exercise: Using Kafka as a Flume Sink (Flafka)
Hands-On Exercise: Using Kafka as a Flume Source (Flafka)

Hands-On Exercise: Using Kafka from the Command Line

Files and Data Used in This Exercise

Exercise directory: /training_materials/add/exercises/kafka

In this exercise you will use Kafka's command line tool to create and subscribe to Kafka topics. You will also use the command line producer and consumer clients to publish and read messages.

Create a Kafka Topic

1. Open a new terminal window and create a Kafka topic named device_alerts that will contain alert messages about devices on Loudacre's network. Since this is a single-node cluster running on a virtual machine, we will use a replication factor of 1 and a single partition.

    kafka-topics --create \
        --zookeeper localhost:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic device_alerts

You will see the message: Created topic "device_alerts". You may also see a warning that using both underscores and periods in a topic name can cause conflicts.

2. Display all Kafka topics to confirm that the new topic you just created is listed:

    kafka-topics --list \
        --zookeeper localhost:2181
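If you also want to confirm the topic's partition count and replication factor, kafka-topics supports a --describe option. This is an optional check, sketched here under the assumption that the topic was created exactly as in step 1:

    # Show partition, replication factor, and leader details for the new topic
    kafka-topics --describe \
        --zookeeper localhost:2181 \
        --topic device_alerts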

Produce and Consume Messages

You will now use Kafka command line utilities to start producers and consumers for the topics created earlier.

3. Start a Kafka producer for the device_alerts topic:

    kafka-console-producer \
        --broker-list localhost:9092 \
        --topic device_alerts

After running the command above, you may see a warning that states, "Property topic is not valid (kafka.utils.VerifiableProperties)." This is due to a bug in Kafka [KAFKA-1711] and can safely be ignored.

4. Publish a message to the device_alerts topic by pasting or typing the following text into the terminal window running the producer you started:

    Model: unknown, Device ID: 1234, Alert: High, Description: Unauthorized device detected

Be sure to press the Enter key after you input the message text.

5. Open a new terminal window and position it beneath the producer window.

Tip: This exercise involves using multiple terminal windows. To avoid confusion, you may wish to set a different title for each one by selecting Set Title on the Terminal menu.

6. In the new terminal window, start a Kafka consumer that will read from the beginning of the device_alerts topic:

    kafka-console-consumer \
        --zookeeper localhost:2181 \
        --topic device_alerts \
        --from-beginning

7. You should see the alert message you sent using the producer displayed on the consumer's console:

    Model: unknown, Device ID: 1234, Alert: High, Description: Unauthorized device detected
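As an aside, the console producer treats each line it reads from standard input as one message, so you can also publish messages non-interactively by piping a file into it. The sketch below is optional; alerts.txt is a hypothetical file containing one message per line, not part of the exercise materials.

    # Publish every line of a (hypothetical) alerts.txt file as a separate message
    kafka-console-producer \
        --broker-list localhost:9092 \
        --topic device_alerts < alerts.txt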

8. Type CTRL-C to stop the device_alerts consumer, then restart it, this time omitting the --from-beginning option. You should see that no messages are displayed.

9. Switch back to the producer window and type the following message into the terminal, followed by the Enter key:

    Device: Sorrento F5, Device ID: 6735, Alert: High, Description: Registration of duplicate device attempted

10. Return to the consumer window and verify that it now displays the alert message you published from the producer in the previous step.

Cleaning Up

11. Press CTRL-C in the consumer window to end its process.

12. Press CTRL-C in the producer window to end its process.

13. Close all remaining terminal windows.

This is the end of the exercise.

Hands-On Exercise: Using Kafka as a Flume Sink (Flafka)

Files and Data Used in This Exercise

Exercise directory: /training_materials/add/exercises/flafsink

In this exercise, you will use Flume's Kafka sink to write data received by Flume to a Kafka topic. You will use a Kafka consumer to read the data as it is sent by Flume to Kafka.

Create a Kafka Topic

1. Open a new terminal window and create a Kafka topic named app_events that will contain messages about user behavior events in our e-commerce application. Since this is a single-node cluster running on a virtual machine, we will use a replication factor of 1 and a single partition.

    kafka-topics --create \
        --zookeeper localhost:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic app_events

2. Confirm that the new topic you just created is listed among all available topics:

    kafka-topics --list \
        --zookeeper localhost:2181

Create the Flume Configuration

In the /training_materials/add/exercises/flafsink/ directory, you will find a partially configured flume.conf file. It uses a netcat source to read event data sent by the Web application to TCP port 44444. You will edit this flume.conf file to complete the implementation, based on the following characteristics (a sketch of the finished sink definition appears after this list):

- The sink type is Kafka (org.apache.flume.sink.kafka.KafkaSink)
- The sink writes messages to the app_events topic
- The sink uses a single Kafka broker: localhost:9092
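For reference, the completed sink section might look something like the sketch below. The agent name (agent1) matches the one passed to flume-ng later in this exercise, but the sink and channel names (kafka-sink, memory-channel) are assumptions; use whatever names already appear in your flume.conf.

    # Kafka sink definition (sink and channel names are assumptions)
    agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafka-sink.topic = app_events
    agent1.sinks.kafka-sink.brokerList = localhost:9092
    agent1.sinks.kafka-sink.channel = memory-channel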

Run the Flume Agent and Kafka Consumers

3. Open a terminal and change to the exercise directory:

    cd /training_materials/add/exercises/flafsink

4. Start the Flume agent:

    flume-ng agent \
        --conf /etc/flume-ng/conf \
        --conf-file flume.conf \
        --name agent1 \
        -Dflume.root.logger=INFO,console

5. Next, launch a separate terminal window and use the command line tool to launch a consumer that will display the content of new messages received in the app_events topic:

    kafka-console-consumer \
        --zookeeper localhost:2181 \
        --topic app_events

6. Launch a third terminal window and run the following command to have the netcat client connect to the TCP port where the Flume agent is listening for incoming event data:

    nc localhost 44444

7. Each line you type into this terminal window will be received by the Flume source, converted to a new Flume event, and written to the Kafka sink. Type the following lines, which represent event data that might be generated by an e-commerce application, into this window:

    20160115,12:35,jsmith,LOGIN_SUCCESS
    20160115,12:37,jsmith,VIEW_PRODUCT
    20160115,12:41,jsmith,BUY_PRODUCT

8. Switch to the terminal window in which you started the Kafka consumer. You should observe that it displays the event data you sent to the Flume source, indicating that it was successfully published to the app_events topic.
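Because nc reads from standard input, you can also send test events non-interactively. The sketch below pipes the same three lines to the listener in one shot; it is optional and assumes the Flume agent is still listening on port 44444.

    # Send three sample events to the Flume netcat source without an interactive session
    printf '20160115,12:35,jsmith,LOGIN_SUCCESS\n20160115,12:37,jsmith,VIEW_PRODUCT\n20160115,12:41,jsmith,BUY_PRODUCT\n' \
        | nc localhost 44444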

Cleanup

9. Press CTRL-C in the Flume agent window to end its process.

10. Press CTRL-C in the netcat (nc) window to end its process.

11. Press CTRL-C in the consumer window to end its process.

12. Close all remaining terminal windows.

This is the end of the exercise.

Hands-On Exercise: Using Kafka as a Flume Source (Flafka)

Files and Data Used in This Exercise

Exercise directory: /training_materials/add/exercises/flafsrc

In this exercise, you will use Flume's Kafka source to read data published to a Kafka topic and write it to an HDFS directory.

Create a Kafka Topic

1. Open a new terminal window and create a Kafka topic named calls_placed that will contain messages about calls placed through one of Loudacre's telephone switches.

    kafka-topics --create \
        --zookeeper localhost:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic calls_placed

2. Confirm that the new topic you just created is listed among all available topics:

    kafka-topics --list \
        --zookeeper localhost:2181

Create the Flume Configuration

In the /training_materials/add/exercises/flafsrc/ directory, you will find a partially configured flume.conf file. You will edit this file to complete the implementation, based on the following characteristics (a sketch of the finished source definition appears after this list):

- The source type is Kafka (org.apache.flume.source.kafka.KafkaSource)
- The source's ZooKeeper connection string is localhost:2181
- The source receives messages from the calls_placed topic
- The HDFS sink will write files to the following path: /user/training/calls_placed
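As in the previous exercise, here is a sketch of what the completed source definition might look like. The agent name (agent1) matches the flume-ng command used below, but the source and channel names (kafka-source, memory-channel) are assumptions; match them to the names already used in your flume.conf.

    # Kafka source definition (source and channel names are assumptions)
    agent1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
    agent1.sources.kafka-source.zookeeperConnect = localhost:2181
    agent1.sources.kafka-source.topic = calls_placed
    agent1.sources.kafka-source.channels = memory-channel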

Create the HDFS Path

3. Create the HDFS path to hold data from the Kafka topic:

    hdfs dfs -mkdir /user/training/calls_placed

Run the Flume Agent and Kafka Producer

4. Open a new terminal window and change to the exercise directory:

    cd /training_materials/add/exercises/flafsrc

5. Start the Flume agent:

    flume-ng agent \
        --conf /etc/flume-ng/conf \
        --conf-file flume.conf \
        --name agent1 \
        -Dflume.root.logger=INFO,console

6. Next, launch a separate terminal window and use the command line tool to launch a producer that will publish messages to the calls_placed topic:

    kafka-console-producer \
        --broker-list localhost:9092 \
        --topic calls_placed
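Because a Kafka topic can be read by multiple consumer groups independently, you can also watch the messages with a console consumer in yet another terminal without taking them away from Flume. This optional check assumes the console consumer's default behavior of joining its own automatically generated consumer group.

    # Watch calls_placed alongside Flume; each consumer group gets its own copy of the stream
    kafka-console-consumer \
        --zookeeper localhost:2181 \
        --topic calls_placed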

7. Each line you type into this terminal window will be published as a message in the calls_placed topic, received by the Kafka source, and written to the HDFS sink. Type the following lines, which represent data that might be collected from the telephone switching system about calls placed by customers, into the producer's terminal:

    065554157

8. Open a new terminal window and list the contents of the /user/training/calls_placed directory in HDFS:

    hdfs dfs -ls /user/training/calls_placed

9. You should observe one or more files in this directory. Run the following command to display the content that Flume has written to HDFS:

    hdfs dfs -cat /user/training/calls_placed/FlumeData.*

Cleanup

10. Press CTRL-C in the Flume agent window to end its process.

11. Press CTRL-C in the Kafka producer window to end its process.

12. Close all remaining terminal windows.

This is the end of the exercise.
