The appeal of processing data in real-time is on the rise. Historically, organizations adopting the streaming data paradigm were driven by use cases such as application monitoring, log aggregation and data transformation (ETL).
Organizations like Netflix have been early adopters of the streaming data paradigm. Today, there are more drivers to growing adoption. In Lightbend’s 2019 survey, Streaming Data and the Future Tech Stack, new capabilities in artificial intelligence (AI) and machine learning (ML), integration of multiple data streams and analytics are starting to rival these historical use cases.
The streaming analytics market (which, depending on definitions, may be just one segment of the streaming data market) is projected to grow from $15.4 billion in 2021 to $50.1 billion in 2026, at a compound annual growth rate (CAGR) of 26.5% during the forecast period, according to Markets and Markets.
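As a quick sanity check on that projection, the growth rate implied by the two dollar figures can be recomputed directly. The sketch below uses only the numbers quoted above; the function name `cagr` is an illustrative choice, not something from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article: $15.4B in 2021 growing to $50.1B in 2026.
implied_rate = cagr(15.4, 50.1, 2026 - 2021)
print(f"Implied CAGR: {implied_rate:.1%}")
```

The result lands within a fraction of a percentage point of the quoted 26.5%, so the two endpoints and the stated growth rate are consistent with each other, allowing for rounding.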
Again, historically, there has been a sort of de facto standard for streaming data: Apache Kafka. Kafka and Confluent, the company that commercializes it, are an ongoing success story, with Confluent confidentially filing for an IPO in 2021.
In 2018, more than 90% of respondents to a Confluent survey deemed Kafka mission-critical to their data infrastructure, and queries on Stack Overflow grew more than 50% during the year. As successful as Confluent may be and as widely adopted as Kafka may be, however, the fact remains: Kafka’s foundations were laid in 2008.
A multitude of streaming data alternatives, each with a specific focus and approach, have emerged in the last few years. One of those alternatives is Apache Pulsar. In 2021, Pulsar ranked as a Top 5 Apache Software Foundation project and surpassed Apache Kafka in monthly active contributors.
StreamNative, a company founded by the original developers of Apache Pulsar and Apache BookKeeper, just released a report comparing Apache Pulsar to Apache Kafka regarding performance benchmarks. StreamNative offers a fully managed Pulsar-as-a-service cloud and enables enterprises to “access data as real-time event streams.”
Pulsar vs. Kafka
StreamNative isn’t the first company founded around Pulsar. Streamlio, another company founded by Pulsar core committers, was acquired by Splunk in 2019. Today, two of Streamlio’s founders, Sijie Guo and Matteo Merli, serve as StreamNative’s CEO and CTO, respectively.
As Addison Higham, StreamNative’s chief architect and head of cloud engineering, shared, the company is focused on a bottom-up, community-driven approach and aspects like technical development, documentation and training. Pulsar is used at the likes of Tencent, Verizon, Intuit and Flipkart, with the latter two also being StreamNative clients.
StreamNative grew significantly in 2021. It raised $23.7 million in series A funding, grew its team from 30 to more than 60 across North America, EMEA and Asia, and saw 6X growth in revenue and 3X growth in adoption, accelerated by AWS Marketplace integration, SQL support and other updates. Its community also doubled in size, and Pulsar surpassed the 10,000-star mark on GitHub.
Higham said that the question of how Pulsar compares to Kafka is one they get a lot. The last widely published Pulsar versus Kafka benchmark was performed in 2020 and a lot has changed since then. This is why the engineering team at StreamNative performed a benchmark study using the Linux Foundation Open Messaging benchmark.
According to StreamNative’s benchmarks, Pulsar can achieve 2.5 times the maximum throughput of Kafka. Pulsar provides consistent single-digit-millisecond publish latency that is 100 times lower than Kafka’s at P99.99. Low publish latency is important because it enables systems to hand off messages to a message bus quickly.
With a historical read rate that is 1.5 times faster than Kafka, applications using Pulsar as their messaging system can catch up after an unexpected interruption in half the time. That said, we should note that the benchmark, like all benchmarks and especially those coming from vendors, should be seen as indicative.
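How a 1.5 times faster read rate could translate into half the catch-up time is easier to see with a toy model. The sketch below is a back-of-the-envelope illustration, not part of StreamNative's report: it assumes producers keep writing while a consumer drains its backlog, so the backlog shrinks at the read rate minus the ingest rate. All rates and the backlog size are hypothetical.

```python
def catch_up_seconds(backlog_msgs: float, read_rate: float, ingest_rate: float) -> float:
    """Time for a consumer to drain a backlog while new messages keep arriving.

    The backlog shrinks at (read_rate - ingest_rate) messages per second,
    so the consumer only catches up if it reads faster than producers write.
    """
    if read_rate <= ingest_rate:
        raise ValueError("consumer never catches up: read rate <= ingest rate")
    return backlog_msgs / (read_rate - ingest_rate)


# Hypothetical numbers chosen for illustration only.
backlog = 600_000_000        # messages accumulated during an outage
ingest = 500_000             # messages/second still being produced
baseline_read = 1_000_000    # messages/second historical read rate

slow = catch_up_seconds(backlog, baseline_read, ingest)
fast = catch_up_seconds(backlog, 1.5 * baseline_read, ingest)
print(f"baseline: {slow:.0f}s, 1.5x read rate: {fast:.0f}s, ratio: {fast / slow:.2f}")
```

Under these assumed rates, the 1.5 times faster reader needs half the time. With no ongoing ingest, the same speedup would only cut the catch-up time to two-thirds, so the size of the benefit depends on the workload.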
In addition, as StreamNative also notes, the report focuses purely on comparing technical performance. While clearly important, that’s not all that matters in evaluating alternatives, as Higham also acknowledged. Many third parties have embarked on a Pulsar vs. Kafka comparison.
Higham said that in many situations, Pulsar and Kafka can behave similarly. Where StreamNative tries to differentiate with Pulsar is in the areas of management and developer experience.
Pulsar’s architecture and positioning
Higham referred to Pulsar’s legacy as a messaging-oriented platform, which later evolved to address streaming and events as well. This is reflected in Pulsar’s API and, Higham thinks, makes for easier adoption among developers. While Pulsar does not have direct compatibility with Kafka, a feature called Protocol Handler enables it to interoperate with other system APIs, with a Kafka implementation featured prominently.
Higham said StreamNative regularly interacts with companies that use Kafka and has found that they often have a large sprawl of hundreds or even thousands of Kafka clusters, almost one per application, which ends up being not very cost-effective. Pulsar’s built-in multi-tenancy is designed to safely share workloads and that’s extremely valuable at scale, Higham added, while also emphasizing features such as geo-replication.
Pulsar also offers SQL access to streaming data via Trino, as well as data transformation via Pulsar Functions in languages such as Go, Java and Python. Pulsar’s latest version is 2.9.1. When version 2.8 was released, the Pulsar team published a technical blog detailing Pulsar’s architecture, and we refer interested readers there.
StreamNative claims that its Protocol Handler framework offers not just a clear migration path from Kafka, but also integration with other systems and protocols such as RocketMQ, AMQP and MQTT. Higham noted that this is coming soon to StreamNative Cloud, with emphasis on support for the Kafka API.
StreamNative Cloud is StreamNative’s main revenue driver. In addition to its managed cloud offering, StreamNative offers value-adds to Apache Pulsar for security and integration functionalities, including with platforms such as Flink, Spark and Delta Lake.
As far as comparing Pulsar to other offerings in that space such as Apache Flink or Spark Streaming, Higham said that Pulsar is not really focused on trying to build something similar to one of those streaming compute engines.
What they are focused on is “a great integration story of building [the] best of breed connector that’s very flexible, ease of use and the simple 80% use cases of single message transformation”, Higham said. Pulsar has more in common with Redpanda, as they aim at solving some of those core pain points, but some of those pain points sit not just in the implementation, but also in the underlying protocol, Higham claims.