Apache: Big Data 2015 Kafka Architecture The Best Of Apache

Transcription

Apache: Big Data 2015. The Best of Apache Kafka Architecture. Ranganathan Balashanmugam, @ran_than

Helló Budapest

About Me: Graduated as Civil Engineer. <dev> 10 years </dev>. Thoughtworker from “India”. Organizer of Hyderabad Scalability Meetup with 2000 members.

“Form follows function.”- Louis Sullivan

Gravity Dam: Indirasagar Dam, India (img src: http://www.montanhydraulik.in)

Forces on a gravity dam: head water, dam weight, tail water, uplift

Publish-subscribe messaging service; distributed commit/write-ahead log. “producers produce, consumers consume, in large distributed reliable way -- real time”

Why Kafka? DBs, Logs, Brokers, HDFS. “For highly distributed messages, Kafka stands out.”

Kafka Vs (src: https://softwaremill.com/mqperf/)

Timeline (2011, 2012, 2013, 2014, 2015): Open sourced by LinkedIn, as version 0.6. Graduated from the Apache Incubator. Several engineers who built Kafka create Confluent. Latest stable: 0.8.2.1.

A Kafka message: length, message content (kafka.message.Message). Change requested: KAFKA-2511
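A minimal sketch of how one message could be packed, assuming the layout documented for the 0.8 message format: CRC32, magic byte, attributes byte, key length and key, value length and value. The helper name `encode_message` is hypothetical and is not part of kafka.message.Message.

```python
import struct
import zlib

def encode_message(key, value, magic=0, attributes=0):
    """Pack: crc | magic | attributes | key_len | key | value_len | value."""
    def sized(blob):
        # A length of -1 marks a null key or value.
        if blob is None:
            return struct.pack(">i", -1)
        return struct.pack(">i", len(blob)) + blob

    body = struct.pack(">bb", magic, attributes) + sized(key) + sized(value)
    crc = zlib.crc32(body) & 0xFFFFFFFF  # CRC covers every byte after the CRC field
    return struct.pack(">I", crc) + body

msg = encode_message(b"user-1", b"hello")
```

The fixed overhead in this sketch is 14 bytes per message, which is one reason small messages benefit from being sent in sets.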

Producers - push
Request: RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
Kafka Broker
Response: [TopicName [Partition ErrorCode ducer

Topic: Remove messages based on number of messages, time, size. kafka.common.Topic

Partitions: Serves horizontal scaling and parallel consumer reads. kafka.cluster.Partition
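A rough sketch of how keyed messages might be spread over partitions so that load scales horizontally while each key stays in one partition. The function `choose_partition` is hypothetical, and CRC32 is used here only as a stand-in hash; it is not claimed to be the hash Kafka's own partitioner uses.

```python
import zlib

def choose_partition(key, num_partitions):
    """Map a message key to a partition index; the same key always lands in the same partition."""
    return zlib.crc32(key) % num_partitions

partitions = {p: [] for p in range(4)}
for i in range(100):
    key = f"user-{i % 10}".encode()
    partitions[choose_partition(key, 4)].append((key, i))
```

Because every message for a given key ends up in one partition, separate consumers can read separate partitions in parallel without splitting a key's messages between them.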

Consumers - pull: Consumer 2, Consumer impleConsumer

Consumer offsets: committing and fetching consumer offsets (img src: resion-offset1.jpg)
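The pull model with committed offsets can be sketched with an in-memory list standing in for one partition. The names `poll` and `commit` and the group names are illustrative assumptions, not Kafka API calls.

```python
log = ["m0", "m1", "m2", "m3", "m4"]   # one partition, in memory
committed = {}                          # consumer group -> next offset to read

def poll(group, max_messages):
    """Pull up to max_messages starting at the group's committed offset."""
    start = committed.get(group, 0)
    return start, log[start:start + max_messages]

def commit(group, next_offset):
    """Record where the group should resume."""
    committed[group] = next_offset

start, batch = poll("billing", 2)
commit("billing", start + len(batch))
_, resumed = poll("billing", 2)   # continues after the committed offset
_, other = poll("audit", 2)       # another group reads from its own position
```

The log itself is never modified by reads; each group only moves its own offset.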

kafka:// - protocol: “Binary protocol over TCP”. Metadata, Send, Fetch, Offsets, Offset commit, Offset fetch
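As an illustration of the binary protocol, the sketch below frames a request the way the 0.8 protocol guide describes it: a 4-byte size, then API key, API version, correlation id and client id, followed by the request body. The helper `frame_request` is hypothetical, the body is left to the caller, and "Produce" is the protocol's name for what the slide calls Send.

```python
import struct

# API keys as listed in the 0.8 protocol guide.
API_KEYS = {"Produce": 0, "Fetch": 1, "Offsets": 2, "Metadata": 3,
            "OffsetCommit": 8, "OffsetFetch": 9}

def frame_request(api, correlation_id, client_id, body=b"", api_version=0):
    """Build: size | api_key | api_version | correlation_id | client_id | body."""
    client = client_id.encode("utf-8")
    header = struct.pack(">hhih", API_KEYS[api], api_version,
                         correlation_id, len(client)) + client
    payload = header + body
    return struct.pack(">i", len(payload)) + payload  # size prefix excludes itself

frame = frame_request("Metadata", correlation_id=1, client_id="demo")
```

The correlation id is echoed back in the response, which lets a client match replies to requests on one TCP connection.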

Mechanical Sympathy: "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Petroski (Image source: http://www.theguide2surrey.com)

Persistence: “Everything is faster till the disk IO.”

Disk faster than RAM: sequential disk access can outperform random memory access (src: http://queue.acm.org/detail.cfm?id=1563874)

Linear Reads & Writes. On a high level there are only two operations: append to the end of the log; fetch messages from a partition beginning from a particular message id. Both are sequential file I/O.
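The two operations can be sketched as an append-only file of length-prefixed records. This is a simplified illustration: the class `FileLog` is hypothetical, and it uses byte positions as message ids, which is an assumption made to keep the example short.

```python
import os
import struct
import tempfile

class FileLog:
    """Append-only log of length-prefixed records in a single file."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()              # create the file if it is missing

    def append(self, payload):
        """Write one record at the end of the file; return its byte position."""
        with open(self.path, "ab") as f:
            position = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">i", len(payload)) + payload)
        return position

    def read_from(self, position, max_records=100):
        """Read records sequentially, starting at a known record position."""
        records = []
        with open(self.path, "rb") as f:
            f.seek(position)
            while len(records) < max_records:
                header = f.read(4)
                if len(header) < 4:
                    break                      # reached the end of the log
                (size,) = struct.unpack(">i", header)
                records.append(f.read(size))
        return records

path = os.path.join(tempfile.mkdtemp(), "partition-0.log")
log = FileLog(path)
positions = [log.append(m) for m in (b"a", b"bb", b"ccc")]
tail = log.read_from(positions[1])
```

Neither operation seeks backwards or rewrites existing bytes, so the access pattern stays sequential.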

“Let us play pictionary”

Linux Page Cache: “Kafka ate my RAM”

ZeroCopy (src: py/)
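A small sketch of the zero-copy idea using Python's `socket.sendfile`, which uses the operating system's sendfile call where available and falls back to ordinary sends otherwise. The function name `send_file_zero_copy` and the file name are illustrative; a local socket pair stands in for a broker-to-consumer connection.

```python
import os
import socket
import tempfile

def send_file_zero_copy(sock, path):
    """Send a whole file over a socket without reading it into a Python buffer."""
    with open(path, "rb") as segment:
        return sock.sendfile(segment)          # returns the number of bytes sent

path = os.path.join(tempfile.mkdtemp(), "segment.log")
with open(path, "wb") as f:
    f.write(b"x" * 1000)

sender, receiver = socket.socketpair()
sent = send_file_zero_copy(sender, path)
sender.close()

received = b""
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break                                  # sender closed: all data received
    received += chunk
receiver.close()
```

When the file's pages are already in the page cache, this path lets the kernel move them to the socket without copying them through the application.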

Batching: small latency to improve throughput (img src: 15/05/tirupati.jpg)
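The trade-off can be sketched with a size-based accumulator: messages wait briefly in memory so that one request carries several of them. The class `Batcher` is hypothetical, and a time-based flush is left out to keep the example short.

```python
class Batcher:
    """Collect messages and hand them to `send` in groups of `batch_size`."""

    def __init__(self, send, batch_size):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def add(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is waiting, even if the batch is not full."""
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()

requests = []                                  # each entry stands for one network request
batcher = Batcher(requests.append, batch_size=3)
for i in range(7):
    batcher.add(f"m{i}")
batcher.flush()
```

Seven messages go out in three requests here instead of seven, at the cost of the first message in each batch waiting for the others.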

Compression: bandwidth is more expensive per-byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility. kafka.message.CompressionCodec
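A quick sketch of why compressing a batch tends to beat compressing messages one at a time: similar messages share structure that the compressor can only exploit when it sees them together. The sample messages are made up, and `zlib` is used here as a stand-in codec.

```python
import json
import zlib

messages = [
    json.dumps({"user": f"user-{i}", "event": "page_view", "page": "/home"}).encode()
    for i in range(200)
]

raw = sum(len(m) for m in messages)                        # uncompressed bytes
one_by_one = sum(len(zlib.compress(m)) for m in messages)  # each message on its own
as_batch = len(zlib.compress(b"\n".join(messages)))        # whole batch at once

restored = zlib.decompress(zlib.compress(b"\n".join(messages))).split(b"\n")
```

Comparing `as_batch` with `one_by_one` and `raw` shows the saving for this kind of repetitive payload.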

Log compaction: kafka.log.LogCleaner, LogCleanerManager (img src: http://kafka.apache.org/083/documentation.html)
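The idea behind compaction, keeping only the most recent record for each key, can be sketched as below. This is an illustration of the concept with a hypothetical `compact` function, not a description of how kafka.log.LogCleaner is implemented.

```python
def compact(records):
    """Keep the latest (offset, key, value) per key; surviving records keep their offsets."""
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)     # later records overwrite earlier ones
    return sorted(latest.values())             # back in offset order

log = [(0, "k1", "a"), (1, "k2", "b"), (2, "k1", "c"), (3, "k3", "d"), (4, "k2", "e")]
compacted = compact(log)
```

After compaction the log still answers "what is the latest value for this key?" while older updates are dropped.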

Message Delivery: at least once, at most once, exactly once
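The difference between the first two guarantees comes down to whether a consumer commits its offset before or after processing. The simulation below is a sketch under that assumption; `run_consumer` and its `crash_at` parameter are hypothetical names used to model a crash in the middle of handling one message.

```python
def run_consumer(log, committed, handled, commit_first, crash_at=None):
    """Consume from the committed offset; return the committed offset that survives."""
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1             # at most once: commit, then process
            if offset == crash_at:
                return committed               # crashed before processing
            handled.append(log[offset])
        else:
            handled.append(log[offset])        # at least once: process, then commit
            if offset == crash_at:
                return committed               # crashed before committing
            committed = offset + 1
        offset += 1
    return committed

log = ["a", "b", "c"]

at_least = []
resume = run_consumer(log, 0, at_least, commit_first=False, crash_at=1)
run_consumer(log, resume, at_least, commit_first=False)    # restart after the crash

at_most = []
resume = run_consumer(log, 0, at_most, commit_first=True, crash_at=1)
run_consumer(log, resume, at_most, commit_first=True)      # restart after the crash
```

In the first run "b" is handled twice; in the second it is never handled. Exactly once needs more than ordering these two steps, for example storing the offset together with the processing result.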

Replication: un-replicated = replication factor of one

Quorum based: Better latency. To tolerate “f” failures, need “2f+1” replicas.

Primary-backup replication (diagram: Topic 1, Topic 2 and Topic 3, each replicated three times across Broker 1, Broker 2, Broker 3 and Broker 4)
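The replica counts for the two approaches can be compared with a little arithmetic. The 2f+1 figure for quorum-based replication is from the slide; the f+1 figure for primary-backup replication is taken from Kafka's replication design documentation rather than from this deck, and the function names are illustrative.

```python
def quorum_replicas(f):
    """Majority quorum: 2f + 1 replicas are needed to tolerate f failures."""
    return 2 * f + 1

def primary_backup_replicas(f):
    """Primary-backup: f + 1 replicas are needed to tolerate f failures."""
    return f + 1

comparison = {f: (quorum_replicas(f), primary_backup_replicas(f)) for f in (1, 2, 3)}
```

For two tolerated failures this gives five replicas under a quorum scheme against three under primary-backup, which is the storage cost side of the trade-off against the quorum approach's better latency.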

ZooKeeper: cluster coordinator

THANK YOU. For questions or suggestions: Ran.ga.na.than B, ranganab@thoughtworks.com, @ran_than
