Kafka Distributed Streaming Platform
Apache Kafka® is a distributed streaming platform.
What exactly does that mean?
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant, durable way.
- Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably move data between systems or applications.
- Building real-time streaming applications that transform or react to streams of data.
To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.
First, a few concepts:
- Kafka is run as a cluster on one or more servers that can span multiple datacenters.
- The Kafka cluster stores streams of records in categories called topics.
- Each record consists of a key, a value, and a timestamp.
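The concepts above can be sketched as a small in-memory model. This is only an illustration of a record (key, value, timestamp) stored in a named topic, not the Kafka client API; the names `Record`, `TopicLog`, `append` and `read_from` are hypothetical and exist only for this example.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One record: a key, a value, and a timestamp (milliseconds)."""
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int


class TopicLog:
    """Toy stand-in for a single topic: an append-only list of records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: List[Record] = []

    def append(self, key: Optional[bytes], value: bytes,
               timestamp_ms: Optional[int] = None) -> int:
        """Store a record and return its position in the log."""
        if timestamp_ms is None:
            # Default to the current wall-clock time.
            timestamp_ms = int(time.time() * 1000)
        self._records.append(Record(key, value, timestamp_ms))
        return len(self._records) - 1

    def read_from(self, position: int) -> List[Record]:
        """Return every record at or after the given position."""
        return self._records[position:]
```

Because records are only ever appended, a reader can come back later and continue from the position where it stopped.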
Kafka has four core APIs:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
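To make the roles of the four APIs concrete, here is a toy, in-memory sketch of how they relate to each other. It does not use or imitate the real Kafka client libraries; `ToyCluster`, `produce`, `consume`, `stream_process` and `connector_capture` are hypothetical names, and timestamps are left out for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value)


class ToyCluster:
    """Holds topics as plain lists; stands in for a Kafka cluster."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Record]] = defaultdict(list)


def produce(cluster: ToyCluster, topic: str, key: str, value: str) -> None:
    """Producer role: publish one record to a topic."""
    cluster.topics[topic].append((key, value))


def consume(cluster: ToyCluster, topics: Iterable[str],
            positions: Dict[str, int]) -> List[Record]:
    """Consumer role: return unread records and advance the positions."""
    out: List[Record] = []
    for topic in topics:
        start = positions.get(topic, 0)
        new = cluster.topics[topic][start:]
        out.extend(new)
        positions[topic] = start + len(new)
    return out


def stream_process(cluster: ToyCluster, in_topic: str, out_topic: str,
                   transform: Callable[[str], str],
                   positions: Dict[str, int]) -> None:
    """Streams role: read an input topic, write transformed records out."""
    for key, value in consume(cluster, [in_topic], positions):
        produce(cluster, out_topic, key, transform(value))


def connector_capture(cluster: ToyCluster, topic: str,
                      table_changes: Iterable[Record]) -> None:
    """Connector role: publish each change to a (pretend) table."""
    for row_id, row in table_changes:
        produce(cluster, topic, row_id, row)
```

A short usage example: `connector_capture(c, "orders", [("1", "widget")])` followed by `stream_process(c, "orders", "orders-upper", str.upper, {})` leaves `("1", "WIDGET")` in the `orders-upper` topic.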
In Kafka, clients and servers communicate over a simple, high-performance, language-agnostic TCP protocol.
This protocol is versioned and maintains backwards compatibility with older versions.
We provide a Java client for Kafka, but clients are available in many languages.
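One common way for a versioned protocol to stay backwards compatible is for each side to advertise the range of versions it supports and then use the highest version both share. The sketch below illustrates that general idea only; it is not Kafka's wire format, and `pick_version` is a hypothetical helper written for this example.

```python
from typing import Optional, Tuple

VersionRange = Tuple[int, int]  # (lowest supported, highest supported)


def pick_version(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if none overlap."""
    low = max(client[0], server[0])
    high = min(client[1], server[1])
    return high if low <= high else None
```

With this rule, an older client that supports versions 0 to 3 can still talk to a newer server that supports 0 to 7: both settle on version 3.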
Your Virtual Machine Specs
Your Kafka Server will be running on an isolated and secure Virtual Machine with the following configuration (see Note 1):
- CPU: 1 vCPU on 7th Generation Intel® Core™ i5-7260U Physical Processor(s)
  - Base Frequency: 2.20 GHz
  - Max Turbo Frequency: 3.40 GHz
- Memory: 1024 MB on 32 GB DDR4-2133 Physical Memory Chip(s)
  - DDR4-2133 1.2V SO-DIMM
  - Max Memory Bandwidth: 34.1 GB/s
- Disk Size: 16.06 GB on 1 TB M.2 SSD Physical Storage Chip(s)
  - M.2 Solid-State Drive (SSD)
  - Sequential Read: 530 MB/s
  - Sequential Write: 510 MB/s
  - Random Read IOPS: 92 K
  - Random Write IOPS: 83 K
Note 1: Virtual Machine resources are already optimized for performance. Under extreme usage, more resources can be acquired via our Add-ons section.
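As a rough sanity check on the figures listed above, the short calculation below reproduces the memory bandwidth number and estimates how quickly the disk allocation could fill at the listed sequential write speed. It assumes dual-channel memory with a 64-bit (8-byte) bus per channel and decimal units (1 GB = 1000 MB); neither assumption is stated in the spec list. The results are ceilings for the physical hardware, which the Virtual Machine shares, so real throughput may well be lower.

```python
def peak_bandwidth_gb_s(transfers_per_s: float, bus_bytes: int = 8,
                        channels: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (assumed dual-channel)."""
    return transfers_per_s * bus_bytes * channels / 1e9


def seconds_to_write(size_gb: float, write_mb_s: float) -> float:
    """Time to write size_gb at a sustained sequential rate, in seconds."""
    return size_gb * 1000 / write_mb_s


# DDR4-2133 performs 2133 million transfers per second.
bandwidth = peak_bandwidth_gb_s(2133e6)

# 16.06 GB of disk at the listed 510 MB/s sequential write speed.
fill_time = seconds_to_write(16.06, 510)

print(f"Peak memory bandwidth: {bandwidth:.1f} GB/s")
print(f"Time to fill the disk at full write speed: {fill_time:.1f} s")
```

The second figure is a reminder that a busy topic can use up a small disk allocation quickly, so it is worth keeping an eye on how much data is stored.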