Transcription
Apache: Big Data 2015. The Best of Apache Kafka Architecture. Ranganathan Balashanmugam, @ran_than
Helló Budapest
About Me: Graduated as a civil engineer. Developer for 10 years. ThoughtWorker from India. Organizer of the Hyderabad Scalability Meetup, with 2000 members.
“Form follows function.” - Louis Sullivan
Gravity Dam: Indirasagar Dam, India (img src: http://www.montanhydraulik.in)
Forces on a gravity dam: head water, dam weight, tail water, uplift
Publish-subscribe messaging service. Distributed commit/write-ahead log. “producers produce, consumers consume, in large distributed reliable way -- real time”
Why Kafka? DBs, Logs, Brokers, HDFS. “For highly distributed messages, Kafka stands out.”
Kafka Vs (src: https://softwaremill.com/mqperf/)
Timeline, 2011 to 2015: Open sourced by LinkedIn, as version 0.6. Graduated from the Apache Incubator. Several engineers who built Kafka create Confluent. Latest stable - 0.8.2.1
A Kafka message: … length, message content. kafka.message.Message. Change requested: KAFKA-2511
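The field list on this slide is cut off in the transcript, so the following is only a sketch of how such a message could be laid out in bytes. It assumes the 0.8-era layout described in the Kafka protocol guide (crc, magic byte, attributes, key length, key, value length, value); the function names `encode_message` and `decode_message` are hypothetical and are not part of `kafka.message.Message`.

```python
import struct
import zlib


def _sized(data):
    # A null field is written as length -1 with no bytes following it.
    if data is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(data)) + data


def encode_message(key, value, magic=0, attributes=0):
    """Assumed layout: crc | magic | attributes | key length | key | value length | value."""
    body = struct.pack(">bb", magic, attributes) + _sized(key) + _sized(value)
    crc = zlib.crc32(body) & 0xFFFFFFFF  # checksum over everything after the crc field
    return struct.pack(">I", crc) + body


def decode_message(buf):
    """Verify the checksum and return (key, value)."""
    (crc,) = struct.unpack_from(">I", buf, 0)
    if zlib.crc32(buf[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("corrupt message")
    pos = 6  # skip crc (4 bytes), magic (1 byte), attributes (1 byte)
    fields = []
    for _ in range(2):
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if n < 0:
            fields.append(None)
        else:
            fields.append(buf[pos:pos + n])
            pos += n
    return fields[0], fields[1]
```

The checksum lets a broker or consumer detect a message damaged on disk or in transit before it is handed to application code.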
Producers - push. Producer to Kafka Broker. Request: RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]. Response: [TopicName [Partition ErrorCode …
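The request grammar on this slide can be turned into bytes roughly as follows. This is a sketch that assumes the primitive sizes given in the Kafka protocol guide (int16 for RequiredAcks, int32 for Timeout, Partition and MessageSetSize, int16-length-prefixed strings, int32-count-prefixed arrays). `encode_produce_body` is a hypothetical helper, and the message set is treated as opaque bytes.

```python
import struct


def encode_string(text):
    # Strings are assumed to be an int16 length followed by UTF-8 bytes.
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_produce_body(required_acks, timeout_ms, topics):
    """topics maps topic name -> {partition id: message set bytes}."""
    out = struct.pack(">hi", required_acks, timeout_ms)
    out += struct.pack(">i", len(topics))              # array of topics
    for name, partitions in topics.items():
        out += encode_string(name)
        out += struct.pack(">i", len(partitions))      # array of partitions
        for partition, message_set in partitions.items():
            out += struct.pack(">ii", partition, len(message_set))
            out += message_set
    return out
```

Nesting partitions inside topics lets a single request carry data for many partitions, so a producer makes one round trip per broker rather than one per partition.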
Topic: Remove messages based on number of messages, time, size. kafka.common.Topic
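The three removal criteria named on this slide can be sketched as a retention pass over a topic's stored data. The code below assumes data is dropped oldest first in whole segments; `Segment`, `apply_retention` and the limit parameters are hypothetical names for illustration, not Kafka configuration keys.

```python
from collections import namedtuple

# Hypothetical unit of stored data: creation time, size in bytes, message count.
Segment = namedtuple("Segment", "created_at size_bytes message_count")


def apply_retention(segments, now, max_age=None, max_bytes=None, max_messages=None):
    """Return the segments that survive. Input is assumed ordered oldest to newest."""
    kept = list(segments)
    if max_age is not None:
        kept = [s for s in kept if now - s.created_at <= max_age]
    if max_bytes is not None:
        while kept and sum(s.size_bytes for s in kept) > max_bytes:
            kept.pop(0)  # drop the oldest segment first
    if max_messages is not None:
        while kept and sum(s.message_count for s in kept) > max_messages:
            kept.pop(0)
    return kept
```

Note that removal here never depends on whether a message has been consumed, only on age, size, or count.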
Partitions. Serves: horizontal scaling, parallel consumer reads. kafka.cluster.Partition
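One common way to spread a topic across partitions is to hash each message key, sketched below. The choice of `zlib.crc32` as the hash is an assumption made so the example is deterministic; it is not claimed to be the hash Kafka's own partitioner uses, and both function names are hypothetical.

```python
import zlib


def choose_partition(key, num_partitions):
    # crc32 is a stand-in hash, chosen because it is stable across runs.
    return zlib.crc32(key) % num_partitions


def split_by_partition(records, num_partitions):
    """records is a list of (key bytes, value); returns one list per partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions
```

Because a given key always maps to the same partition, records for that key keep their relative order, while different partitions can be written and read in parallel.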
Consumers - pull. Consumer 2, Consumer … SimpleConsumer
Consumer offsets: committing and fetching consumer offsets (img src: resion-offset1.jpg)
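Committing and fetching offsets can be pictured as a small key-value store keyed by consumer group, topic and partition. The sketch below is an in-memory stand-in with hypothetical names (`OffsetStore`, `poll`); it assumes the committed value is the next offset to read and that a group with no commit starts at 0.

```python
class OffsetStore:
    """In-memory stand-in for wherever committed offsets are kept."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group, topic, partition):
        # Assumption: a group with no commit starts from offset 0.
        return self._committed.get((group, topic, partition), 0)


def poll(log, store, group, topic, partition, max_messages):
    """Pull up to max_messages from a partition (a plain list here), then commit."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch
```

Since each group tracks its own position, two groups can read the same partition independently without the broker keeping per-message state.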
kafka:// - protocol. “Binary protocol over TCP”. APIs: Metadata, Send, Fetch, Offsets, Offset commit, Offset fetch
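To make "binary protocol over TCP" concrete, here is a sketch of how one request could be framed. It assumes the envelope described in the Kafka protocol guide (int32 size, then int16 API key, int16 API version, int32 correlation id, and an int16-length-prefixed client id), with the API key numbers taken from that guide; "Send" on the slide is assumed to correspond to the Produce API. `frame_request` is a hypothetical helper.

```python
import struct

# API key numbers as listed in the Kafka protocol guide (assumed here).
API_KEYS = {
    "Produce": 0,
    "Fetch": 1,
    "Offsets": 2,
    "Metadata": 3,
    "OffsetCommit": 8,
    "OffsetFetch": 9,
}


def frame_request(api, correlation_id, client_id, body=b"", api_version=0):
    """Return size-prefixed bytes ready to be written to a TCP socket."""
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", API_KEYS[api], api_version, correlation_id, len(cid)) + cid
    payload = header + body
    return struct.pack(">i", len(payload)) + payload
```

The size prefix tells the receiver how many bytes to read for one request, and the correlation id lets a client match responses to requests on a shared connection.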
Mechanical Sympathy. "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Petroski (image source: http://www.theguide2surrey.com)
Persistence. “Everything is faster till the disk IO.”
Disk faster than RAM (src: http://queue.acm.org/detail.cfm?id=1563874)
Linear Reads & Writes. At a high level there are only two operations: append to the end of the log; fetch messages from a partition beginning from a particular message id. Sequential file I/O.
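The two operations on this slide can be sketched with an ordinary file. This is a toy model, not Kafka's log implementation: `PartitionLog` is a hypothetical class, records are assumed to be length-prefixed, message ids are assumed to be sequence numbers, and the position index is kept in memory only.

```python
import struct


class PartitionLog:
    """Toy append-only log: length-prefixed records in a single file."""

    def __init__(self, path):
        self.path = path
        self.positions = []       # byte position of each message, indexed by message id
        self.end = 0              # current file length in bytes
        open(path, "wb").close()  # this sketch always starts from an empty file

    def append(self, payload):
        with open(self.path, "ab") as f:
            f.write(struct.pack(">i", len(payload)) + payload)
        self.positions.append(self.end)
        self.end += 4 + len(payload)
        return len(self.positions) - 1  # id of the message just written

    def fetch(self, message_id, max_messages):
        out = []
        if message_id >= len(self.positions):
            return out
        with open(self.path, "rb") as f:
            f.seek(self.positions[message_id])  # one seek, then sequential reads
            while len(out) < max_messages:
                head = f.read(4)
                if len(head) < 4:
                    break
                (n,) = struct.unpack(">i", head)
                out.append(f.read(n))
        return out
```

Writes only ever touch the end of the file and reads scan forward from one position, which is the access pattern disks handle best.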
“Let us play pictionary”
Linux Page Cache. “Kafka ate my RAM”
ZeroCopy (src: py/)
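Zero copy means handing file bytes to the network without first copying them into application buffers. Kafka runs on the JVM, so the following Python is only an analogy: it assumes `socket.sendfile`, which uses the operating system's `sendfile` call where available and falls back to ordinary sends otherwise. `send_file_zero_copy` is a hypothetical helper name.

```python
import socket


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # Where the platform supports it, the kernel moves data from the
        # page cache to the socket without a round trip through user space.
        return sock.sendfile(f)
```

This pairs naturally with the page cache: data that was recently written is often still in memory, so a consumer fetch may never touch the disk or the application heap.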
Batching: small latency to improve throughput (img src: 15/05/tirupati.jpg)
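The trade described here, a little added latency in exchange for throughput, can be sketched as a producer that buffers messages and ships them in groups. `BatchingProducer` is a hypothetical class; it flushes on batch size only, and a time-based flush, which a real client would likely also need, is left out to keep the sketch short.

```python
class BatchingProducer:
    """Buffer messages and hand them to `send` in groups."""

    def __init__(self, send, max_batch_size):
        self.send = send                  # callable that ships a list of messages
        self.max_batch_size = max_batch_size
        self.pending = []

    def produce(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)
            self.pending = []             # start a fresh buffer for the next batch
```

A message may wait until its batch fills, but each network request and each disk write then carries many messages instead of one.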
Compression: bandwidth is more expensive per-byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility. kafka.message.CompressionCodec
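Compression pays off most when it is applied to a batch rather than to single messages, because similar messages share redundancy. The sketch below illustrates that with gzip from the Python standard library; the sample message shape and the function name `batch_vs_individual` are made up for the demonstration, and joining messages with newlines is a simplification of a real message set.

```python
import gzip


def batch_vs_individual(messages):
    """Return (bytes if each message is gzipped alone, bytes if gzipped as one batch)."""
    one_by_one = sum(len(gzip.compress(m)) for m in messages)
    as_batch = len(gzip.compress(b"\n".join(messages)))
    return one_by_one, as_batch


def sample_messages(count):
    # Hypothetical event payloads that differ only in one field.
    return [
        ('{"user": %d, "event": "page_view", "page": "/home"}' % i).encode("utf-8")
        for i in range(count)
    ]
```

Calling `batch_vs_individual(sample_messages(200))` should show the batch form to be far smaller, which is why batching and compression are usually discussed together.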
Log compaction. kafka.log.LogCleaner, LogCleanerManager (img src: http://kafka.apache.org/083/documentation.html)
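The idea behind log compaction is to keep only the most recent record for each key. The function below is a minimal sketch of that idea, not the algorithm in `kafka.log.LogCleaner`: `compact` is a hypothetical name, the log is a plain list, and records whose value is `None` are treated as delete markers that are dropped immediately, which is a simplification.

```python
def compact(log):
    """log is a list of (offset, key, value) in offset order.

    Keeps the latest record per key and preserves original offsets.
    """
    latest = {}
    for offset, key, _value in log:
        latest[key] = offset  # later offsets overwrite earlier ones
    return [
        (offset, key, value)
        for offset, key, value in log
        if latest[key] == offset and value is not None
    ]
```

After compaction the log still holds the final state of every live key, so a consumer that replays it from the start can rebuild current state without reading every historical update.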
Message Delivery: at least once, at most once, exactly once
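On the consumer side, the difference between the first two guarantees comes down to whether the offset is committed before or after a message is processed. The simulation below is a sketch with hypothetical names (`run`, `Crash`, `crash_at`); it injects a crash between the two steps for one offset and then restarts from the committed offset.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def run(messages, start, processed, commit_first, crash_at=None):
    """Consume from `start`; return the committed offset (the next offset to read)."""
    committed = start
    try:
        for offset in range(start, len(messages)):
            if commit_first:
                committed = offset + 1               # at most once: commit, then process
                if offset == crash_at:
                    raise Crash()
                processed.append(messages[offset])
            else:
                processed.append(messages[offset])   # at least once: process, then commit
                if offset == crash_at:
                    raise Crash()
                committed = offset + 1
    except Crash:
        pass
    return committed
```

With a crash at offset 1, process-then-commit handles that message twice after restart, while commit-then-process skips it. Exactly once generally needs something more, such as deduplication or storing the offset together with the output, which this sketch does not attempt.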
Replication: un-replicated = replication factor of one
Quorum based: better latency. To tolerate “f” failures, need “2f + 1” replicas
Primary-backup replication (diagram: Topic 1, Topic 2 and Topic 3, each with three replicas spread across Broker 1 to Broker 4)
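The replica counts behind the two approaches can be compared with a little arithmetic. The quorum figure of 2f + 1 comes from the slide above; the figure of f + 1 for primary-backup is the usual textbook number and is an assumption here, since the slide does not state it. Both function names are hypothetical.

```python
def replicas_for_quorum(failures):
    # A majority must survive, so tolerating f failures needs 2f + 1 replicas.
    return 2 * failures + 1


def replicas_for_primary_backup(failures):
    # Assumption: every surviving replica holds all acknowledged data,
    # so one survivor is enough and f + 1 replicas tolerate f failures.
    return failures + 1


def comparison(max_failures):
    """Rows of (failures tolerated, quorum replicas, primary-backup replicas)."""
    return [
        (f, replicas_for_quorum(f), replicas_for_primary_backup(f))
        for f in range(1, max_failures + 1)
    ]
```

For two tolerated failures this gives five replicas under a quorum scheme against three under primary-backup, which is the storage and throughput cost weighed against the quorum approach's better latency.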
ZooKeeper: cluster coordinator
THANK YOU. For questions or suggestions: Ran.ga.na.than B, ranganab@thoughtworks.com, @ran_than