Amazon Kinesis and Apache Kafka are both popular data streaming platforms. They offer similar features, but there are also some pretty noteworthy differences. These platforms cater to the burgeoning demand for real-time analysis in surveillance camera operations and IoT data streaming. Here’s what you should know about Kinesis vs. Kafka, their use cases, and how they compare to Nabto, which specializes in secure P2P (Peer-to-Peer) IoT streaming.

What is Kinesis?

Amazon Kinesis is a fully managed data streaming service that makes it easy to collect, process, move, and store data at any scale. Data that is generated and moved continuously like this is called streaming data. For example, you can use Kinesis to continuously collect and process data from Internet of Things (IoT) devices and then stream that data to a dashboard so you can view it, or to a data warehouse for storage.

Kinesis is easy to use and manage, since it’s backed by Amazon Web Services (AWS), and the service operates on a pay-as-you-go basis so you don’t pay for capacity you don’t use. You just set up and configure your stream and leave it up to AWS to handle the infrastructure, maintenance, and scaling. The service offers real-time data processing, allowing you to get near-instant insights from data, and easy deployment that lets you get started in just minutes. Some of the most common use cases for Kinesis include real-time analytics and reporting, streaming data to machine learning models, and loading data into data warehouses for analysis or storage.

Let’s take a closer look at those use cases one at a time. If you use Kinesis for real-time analytics, you can, for example, power a dashboard that gives sales teams a real-time view into buyer habits and product insights. The dashboard can update every time new information streams in, allowing business decision makers to stay fully informed.

Machine learning is one of the top use cases for the big data and data streaming industries as a whole. Data streaming allows you to update and train machine learning models based on the latest information, allowing them to instantly adjust to any important changes in, for example, seasonal buying habits. Then lastly, you can simply use Kinesis to load data into data warehouses or data lakes to be used or analyzed at a later date.

So how exactly does Kinesis move data around? Think of Kinesis as the management system for a series of digital post offices, called nodes, that handle a stream of packages full of information. Those packages are called “shards,” which hold any data you want to stream. Each piece of information in a package, called a data record, receives a sequence number that keeps the records within that package in order. The records in those packages are then streamed to a data lake, data warehouse, or other repository. If a single shard gets too full, the stream needs another “package”: in on-demand mode Kinesis adds shards automatically, while in provisioned mode you add them yourself. On the other hand, if the data stream slows down and leaves multiple packages only half-full, shards can be merged so the data flows through fewer of them.
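To make the shard idea more concrete, here is a minimal sketch of how a record could be routed to a shard. Kinesis hashes each record's partition key with MD5 into a 128-bit number, and each shard owns a range of that number space. The function names and device IDs below are hypothetical, and the assumption that every shard owns an equal range is a simplification, since ranges can become uneven after shards are split or merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_ranges(shard_count):
    """Split the 128-bit hash key space into equal, contiguous ranges (assumed layout)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash fell outside every shard range")


ranges = shard_ranges(4)
for device_id in ["camera-1", "camera-2", "thermostat-7"]:  # hypothetical device IDs
    print(device_id, "-> shard", shard_for_key(device_id, ranges))
```

Because the same key always hashes to the same value, all records from one device land in the same shard, which is what lets the sequence numbers keep that device's records in order.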

Kinesis stores data records in shards for 24 hours by default. After all, it’s a streaming platform, not a data warehouse, so it’s not designed to be your permanent data store. The 24 hours simply offers enough time to process data directly in the stream if needed so you can visualize it in a dashboard or use it for powering machine learning models. The service does, however, let you extend the retention period to as long as 365 days if you need to do some extra processing, though longer storage incurs an additional cost.

What is Kafka?

Now that you understand a little about Kinesis, let’s move on to Kafka. Apache Kafka is a data streaming platform originally developed by LinkedIn. LinkedIn later open-sourced the software and made it publicly available. Kafka is a distributed message system. That means that it’s a system of multiple post offices, or nodes. Each one operates independently but is part of a bigger network. So, even if one post office is too busy or temporarily closes, others can still function, and the mail system as a whole keeps running.

You can run Kafka on-premises or in the cloud. It is more flexible and configurable than Kinesis because you can customize the code, but it also requires more manual setup and maintenance, unless you opt for a managed service from a vendor. For example, Amazon offers Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Kafka is a good choice for applications that need high throughput and low latency, because it’s highly scalable and very fast. Specifically, Kafka’s low latency makes it perfect for something called event-driven applications.

To understand event-driven applications, imagine you’re at home and you’ve set up a doorbell with a camera. Every time someone presses the doorbell, the camera takes a photo and sends it to your phone. This way, even if you’re in the backyard or upstairs, you instantly know who’s at the door. Think of the action of someone pressing the doorbell as an event. Your system (doorbell with a camera) is set up to react to this event by taking a photo, and the result is you receiving a photo on your phone.

An event-driven application works in a similar way. Instead of continuously checking for changes or updates, it waits and listens for specific events, like the doorbell ringing. When those events occur, the application reacts and performs a certain action or task, like sending you a photo. In the case of Kafka, Kafka is the system that informs the application that the doorbell just rang. It gets data from the doorbell and forwards that information to the event-driven application so the application can act based on that event.
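The doorbell example can be sketched in a few lines of code. This is only an in-process illustration, not how Kafka itself works: the `EventBus` class, the `doorbell_pressed` event name, and the `send_photo` handler are all hypothetical names chosen for the example, and a real deployment would put a broker such as Kafka between the doorbell and the application.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process stand-in for a message broker (illustration only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a function to call whenever this type of event occurs."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the event to every handler registered for this type."""
        return [handler(payload) for handler in self._handlers[event_type]]


def send_photo(event):
    """React to a doorbell press by 'sending' a photo to the phone."""
    return f"Photo from {event['camera']} sent to phone"


bus = EventBus()
bus.subscribe("doorbell_pressed", send_photo)

# Nothing runs until the event actually occurs.
results = bus.publish("doorbell_pressed", {"camera": "front-door"})
print(results[0])
```

The application never polls the doorbell. It registers its interest once and then only does work when an event is published.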

To do this, Kafka sorts data into groups based on characteristics like where the data needs to be sent or where it comes from. Those groups are called topics, and each topic is split into one or more partitions. Within a partition, the data is stored in a log. Kafka simply adds each new record at the end of the log and gives it a sequential position number called an offset, much like the sequence numbers Kinesis assigns within a shard. Once the data is written down, it can’t be changed or written over, so Kafka tends to be a very reliable source for streaming data.
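Here is a minimal sketch of the append-only log idea, assuming a single partition held in memory. The `PartitionLog` class is a hypothetical name for illustration. Real Kafka writes its logs to disk and spreads them across servers, but the core contract is similar: records are only ever added at the end, and readers ask for records starting from a position.

```python
class PartitionLog:
    """Toy append-only log; a record's position in the list is its offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Add a record at the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return tuple(self._records[offset:offset + max_records])


log = PartitionLog()
first = log.append("doorbell pressed")
second = log.append("photo captured")

# A consumer that has already processed offset 0 resumes from offset 1.
print(first, second, log.read(offset=1))
```

Note that the class deliberately has no update or delete method. A consumer that falls behind or restarts can simply re-read from the last offset it handled.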

Kafka can add as many nodes, or “post offices,” as is necessary, so it can scale as much as you want, meaning it can handle even more data than Kinesis. You can configure it to retain data for as much or as little time as you want. Plus, Kafka comes with a lot of other community-contributed resources and other software like Kafka Streams to help process the data, or Kafka Connect so you can integrate Kafka with data sources and data repositories.

The Kafka software itself is free, since it’s open source, but you’ll have to consider the infrastructure and maintenance costs since you’ll need to set up all of the data sources and manage maintenance and security yourself.

Understanding Kinesis vs. Kafka

That was a lot of information, and none of it really answers the fundamental question: between Kinesis and Kafka, how do you know which one you should use? Here are some of the main differences in terms of scaling, availability, ease of use and deployment, security, and cost.

When I talk about scaling, I mean how a piece of software changes to meet increased demand. Amazon Kinesis scales vertically while Kafka scales horizontally. Vertical scaling is a way to increase the throughput of a system, meaning the amount of data it can process, by adding more computing resources to a single node to increase its processing power. It’s like adding more workers to a single post office so the post office can handle more packages. There are limits in the system, though, because a node can only add a certain amount of resources, just like a post office can only hold so many workers.

By contrast, the horizontal scaling that Kafka offers is a way to increase the throughput of a system by adding more nodes to the system. In other words, it’s like adding more post offices so the postal system can handle more packages instead of just adding more workers within a particular post office. Horizontal scaling means you can increase Kafka’s capacity as much as you need to whenever you want.
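To see why adding post offices helps, here is a rough sketch that spreads a fixed number of partitions across a growing number of nodes. The round-robin assignment and the function names are simplifying assumptions for illustration, not Kafka's actual placement logic. In a real cluster, existing partitions generally have to be reassigned before a newly added node takes on load.

```python
def assign_partitions(partition_count, broker_count):
    """Spread partitions across nodes round-robin (simplified assumption)."""
    assignment = {broker: [] for broker in range(broker_count)}
    for partition in range(partition_count):
        assignment[partition % broker_count].append(partition)
    return assignment


def busiest_broker_load(partition_count, broker_count):
    """Number of partitions handled by the most heavily loaded node."""
    assignment = assign_partitions(partition_count, broker_count)
    return max(len(partitions) for partitions in assignment.values())


for brokers in (1, 3, 6):
    load = busiest_broker_load(12, brokers)
    print(brokers, "nodes ->", load, "partitions on the busiest node")
```

Each node handles less of the total work as nodes are added, which is the whole point of scaling out rather than up.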

The next important difference is availability, meaning how reliably your data stays accessible when part of the system fails. In our post office analogy, if one post office shuts down in the middle of its operations for some reason, all of the packages inside are unavailable until you get the post office up and running again. But Kinesis and Kafka both have features that keep packages of data available when that happens.

In the case of Kinesis, the software automatically replicates data across three availability zones (AZs). In other words, it copies the data into multiple locations so that if one part of the system goes down, the rest of the system can keep running and the data is readily available at any given time. You can easily increase the number of availability zones or use availability zones in different geographical regions to ensure that your data is protected no matter what happens.

Kafka also replicates data, but across brokers rather than availability zones, and the difference is that it’s not automatic. A broker is one of the servers in a Kafka cluster (the nodes, or post offices, from earlier), and each broker hosts many partitions at the same time. Replication isn’t set up for you out of the box, which means that users have to choose a replication factor and manage the replication themselves. Kafka can have multiple brokers working concurrently, and you can configure the software to replicate each partition to more than one broker so the data stays available even if a broker goes down.
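The effect of replicating to more than one broker can be shown with a small simulation. This is a sketch under simplifying assumptions: replicas are placed on consecutive brokers, and the function names are hypothetical. Kafka's real placement and leader election are more involved, but the availability argument is the same.

```python
def replica_placement(partition_count, broker_count, replication_factor):
    """Place each partition's replicas on consecutive brokers (simplified)."""
    if replication_factor > broker_count:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        partition: [(partition + i) % broker_count for i in range(replication_factor)]
        for partition in range(partition_count)
    }


def available_partitions(placement, failed_brokers):
    """Partitions that still have at least one replica on a live broker."""
    return [
        partition
        for partition, brokers in placement.items()
        if any(broker not in failed_brokers for broker in brokers)
    ]


unreplicated = replica_placement(6, 3, replication_factor=1)
replicated = replica_placement(6, 3, replication_factor=3)

# Broker 0 goes down in both scenarios.
print(available_partitions(unreplicated, failed_brokers={0}))
print(available_partitions(replicated, failed_brokers={0}))
```

With a single copy of each partition, every partition that lived on the failed broker becomes unreachable. With three copies, all six partitions stay available, at the cost of storing the data three times.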

Next you have ease of use, which is thankfully a much simpler concept. Kinesis is easy to set up, especially for businesses that already use AWS. It’s a managed service, and you don’t have to do all that much work to get it up and running in a short period of time. By contrast, Kafka will require a lot of configuration and effort to manage.

That same distinction applies to security for Kinesis and Kafka. They actually have very similar security features and access controls available, but in Kinesis, those features are enabled by default, whereas with Kafka you have to set them up yourself.

When it comes to cost, here’s where things get complex again. Since Kafka is open-source, the software itself is technically free, but you’ll likely have to pay engineers to configure and maintain it, and those costs can be high. After that, both options are pay-as-you-go. You’ll also have to pay to retain data, so if you keep data in Kinesis beyond the default 24-hour retention period, you’ll wind up with even higher costs.

As a result of all of these differences, businesses often go for Kinesis if they’re already operating within the AWS ecosystem and want a fully managed, easy-to-deploy solution. On the other hand, companies may go for Kafka to support higher-throughput scenarios, on-premises deployments, and event-driven applications due to Kafka’s great scalability and low latency.

Here’s your summary of Kinesis vs. Kafka:

| Feature | Apache Kafka | Amazon Kinesis |
| --- | --- | --- |
| Open source | Yes | No |
| Managed service | Third-party managed services available | Yes |
| Ease of setup | More difficult | Easier |
| Throughput | Higher | Lower |
| Latency | Lower | Higher |
| Flexibility | More flexible | Less flexible |
| Scalability | Horizontally scalable | Vertically scalable |
| Availability | Requires manual configuration | Highly available by default |
| Architecture | Sorts data based on partitions | Sorts data based on shards |
| Replication | Requires manual configuration | Replicates data across multiple AZs by default |
| Security | Requires manual configuration | Supports encryption and access controls by default |
| Cost | Pay-as-you-go, may be more expensive | Pay-as-you-go |
| Community support | Large and active community | Smaller community |
| Ecosystem | Large ecosystem of tools and plugins | Smaller ecosystem of tools and plugins |
| Use cases | Real-time streaming analytics, data pipelines, event-driven applications | Real-time streaming analytics, data pipelines, data warehousing |

Kinesis vs. Kafka vs. Nabto

So, now we can discuss how those two options differ from Nabto. Nabto is a remote connectivity platform for IoT devices. It is based on peer-to-peer (P2P) technology, which allows IoT devices to connect to each other directly without the need for a cloud server. By contrast, Kinesis and Kafka are data streaming platforms. They allow you to collect, process, and store streaming data in real time. In addition, Kinesis and Kafka route data through central servers, usually in the cloud, while Nabto bypasses the cloud to only store data at the device level.

Though these systems have different purposes, they have some similar use cases. For example, they can all be used as part of an IoT ecosystem. Nabto lets you set up a secure, low-latency connection between, say, a mobile app and an IoT device, for instance in a smart industrial HVAC system. Through that connection, you can stream small amounts of data to be stored at the device level, bypassing the cloud entirely. Kinesis and Kafka, on the other hand, can be used to collect, process, and store large amounts of data in the cloud from those same IoT devices.

So Kinesis and Kafka will more commonly be used for situations in which a lot of data processing is necessary. For example, a smart weather forecasting system has to collect and process enormous amounts of data to predict the weather. Amazon Kinesis Video Streams, which is one of the services offered through Kinesis, is also perfect for streaming live video or other time-encoded data or sending pre-recorded files for on-demand processing to power, for example, a computer vision system that watches for any smoke in a manufacturing plant and can send alerts to nearby systems.

By contrast, you can use Nabto to stream smaller amounts of data, like P2P video streaming data or temperature data, between IoT devices and a client device in situations in which no additional data processing or cloud storage is necessary. Some examples of client devices include smartphones or laptops. You could use Nabto to send basic commands to a smart HVAC system or to set up a connection between your phone and a smart video camera so you can view the video.

Nabto’s simplified model means you don’t have to pay for an expensive cloud server and you can predict all of your costs up front with ease so there are no surprises. It prevents you from wasting resources while also providing the lowest level of latency, since data goes straight from the client device to the IoT device without a detour through the cloud.

Final thoughts

Kinesis vs. Kafka is a complex topic with a lot of nuances that might make it tough to decide which option is right for you. Hopefully you now have enough information to make a smooth transition to either one depending on your needs. Or, if you want to know more about how Nabto’s P2P connectivity can provide an alternative to expensive cloud streaming, book a consultation with one of our experts.

Read our other resources

We’ve also published a range of IoT device resources for our community, including:

Want to learn more about P2P IoT? Please visit the P2P IoT Academy.

Want to take a deep dive into our documentation? Please visit the Nabto Platform Overview.

Want to try our demo for video surveillance? Please visit the Nabto Edge Video Cam Demo.
