Organizations increasingly require instantaneous analysis and decision-making capabilities. To achieve this, they're turning to stream processing technologies, where data is viewed as a constantly generated “stream of events.”
In April 2021, Confluent, a stream processing vendor backed by Sequoia Capital, raised $250M at a $4.5B valuation, while also confidentially filing for an IPO.
Confluent, founded by the team that worked on the open-source Apache Kafka framework at LinkedIn, has become one of the leading vendors in this space, and its IPO filing comes at a time when the stream processing market is heating up.
Stream processing captures and analyzes data “on the fly,” instead of storing it for later batch analysis, which can result in missed opportunities for enterprises that need to respond to events as they happen.
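To make the contrast concrete, here is a minimal, vendor-neutral Python sketch of the two models. The event shape, sensor names, and alert rule are all hypothetical.

```python
# Illustrative contrast between batch and stream processing; the
# event fields and the alert rule below are hypothetical.

def needs_attention(reading: dict) -> bool:
    """Toy rule: flag sensor readings above a temperature threshold."""
    return reading["temp_c"] > 90

# Batch model: readings accumulate in storage and are analyzed later,
# possibly long after the window to act has closed.
def batch_job(stored_readings: list[dict]) -> list[dict]:
    return [r for r in stored_readings if needs_attention(r)]

# Stream model: every reading is evaluated the moment it arrives,
# so a response can be triggered immediately.
def on_event(reading: dict) -> None:
    if needs_attention(reading):
        print(f"alert: sensor {reading['sensor_id']} running hot")

if __name__ == "__main__":
    readings = [{"sensor_id": "s1", "temp_c": 70},
                {"sensor_id": "s2", "temp_c": 95}]
    print(batch_job(readings))   # after-the-fact analysis
    for r in readings:
        on_event(r)              # on-the-fly analysis
```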
Consider fraud prevention, for instance. Stream processing would enable vendors to catch suspicious activity as soon as a card is swiped, instead of analyzing data only after a transaction is complete. Similarly, in factories, analyzing real-time IoT data can help identify and fix problematic machinery instantly.
As the number of real-time data sources grows with the proliferation of IoT, big tech companies and cloud solutions providers are building stream processing or analytics capabilities in order to cater to all of an enterprise’s data ingestion and processing needs.
Below, we look at the latest developments in stream processing that CIOs and data science teams need to keep abreast of.
WHAT YOU NEED TO KNOW:
- Stream processing technology has benefited from a strong open-source ecosystem. The Apache Software Foundation hosts several distributed streaming platforms under its umbrella, lowering the barrier to entry for developers who want to integrate streaming analytics into their enterprise workflows. Popular Apache stream processing projects include Kafka, Samza, Storm, Heron, Spark, and Flink. Many of these were developed or contributed to by major tech players like LinkedIn, Intel, Twitter, and Databricks. (A minimal Kafka example appears after this list.)
- In-memory computing tech is enabling streaming analytics. In-memory computing pools together parallel distributed random access memory (RAM) from multiple computers to store and process data at high speeds, making analysis orders of magnitude faster than disk-based processing. The advantages of in-memory computing make it a natural enabler of stream processing. In-memory computing tech vendors include Gigaspaces, GridGain, and Hazelcast. (A conceptual windowing sketch appears after this list.)
- Companies are developing streaming SQL to query data streams. While structured query language (SQL) is popular for querying data in relational or tabular databases, companies are now building “streaming SQL” to interact with real-time streams of data. Vendors like Eventador (acquired by Cloudera in Q4’20) and Striim are working on this. (A short streaming SQL example, using the ksqlDB syntax discussed below, appears after this list.)
- Confluent developed a new event streaming database architecture. In Q4’19, Confluent effectively created a new category of infrastructure when it released ksqlDB, an event streaming database. ksqlDB makes it easier for developers to build apps that use streaming data by incorporating features similar to traditional databases that developers are familiar with.
- AI technologies like machine learning can enable streaming analytics. For example, data integration vendor Informatica’s CLAIRE AI engine automatically recognizes the structure of incoming data and parses it. To strengthen CLAIRE’s capabilities, Informatica acquired AI startup GreenBay Technologies in Q3’20.
- Popular cloud vendors and big tech companies have built their own stream processing capabilities. By now, many of the largest cloud vendors offer support for an enterprise’s streaming needs, from Amazon’s Kinesis to IBM’s Event Streams to Microsoft’s Azure Stream Analytics. Enterprises can choose to leverage their existing cloud partners for stream processing.
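To give a flavor of the open-source ecosystem above, the sketch below uses the widely used kafka-python client to publish and consume events on a Kafka topic. It assumes a Kafka broker running at localhost:9092; the topic name and event fields are placeholders.

```python
# Minimal event streaming sketch with the open-source kafka-python client.
# Assumes a Kafka broker at localhost:9092; topic name is a placeholder.
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "card-swipes"  # hypothetical topic

# Produce: each card swipe is appended to the stream as it happens.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"card_id": "c2", "amount": 25_000})
producer.flush()

# Consume: a downstream service reads events continuously and can
# react (e.g., flag fraud) the moment each one arrives.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    print("received:", msg.value)  # per-event processing happens here
```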
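The role of in-memory computing can be illustrated with a deliberately simplified, single-process Python sketch: a sliding-window counter held in RAM that a fraud check consults on every event. Products like GridGain and Hazelcast distribute this kind of state across a cluster’s pooled RAM; the window length and threshold here are arbitrary.

```python
# Conceptual sketch of why in-memory state enables streaming analytics:
# per-event checks against recent history must run at RAM speed, not
# disk speed. Window length and threshold are arbitrary illustrations;
# real systems shard this state across many machines' memory.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_SWIPES_PER_WINDOW = 3

recent_swipes: dict[str, deque] = defaultdict(deque)  # card_id -> timestamps

def on_swipe(card_id: str, now: float) -> bool:
    """Return True if this swipe looks suspicious (too many in the window)."""
    window = recent_swipes[card_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # expire events that fell out of the window
    return len(window) > MAX_SWIPES_PER_WINDOW

if __name__ == "__main__":
    t = time.time()
    for i in range(5):
        print(f"swipe {i}: suspicious={on_swipe('c1', t + i)}")
```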
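And to show what streaming SQL looks like in practice, here is a hedged sketch that submits statements to a ksqlDB server over its REST API (the documented /ksql and /query endpoints). The server URL, topic, stream, and column names are illustrative assumptions.

```python
# Sketch: registering a stream and a continuous query with ksqlDB's REST API.
# The server URL, topic, and column names are illustrative assumptions.
import requests

KSQLDB = "http://localhost:8088"  # assumed local ksqlDB server

# DDL: declare a stream over an existing Kafka topic.
create_stmt = """
    CREATE STREAM swipes (card_id VARCHAR, amount DOUBLE)
    WITH (KAFKA_TOPIC='card-swipes', VALUE_FORMAT='JSON');
"""
requests.post(f"{KSQLDB}/ksql",
              json={"ksql": create_stmt, "streamsProperties": {}})

# Push query: unlike a one-shot SELECT, EMIT CHANGES keeps the query
# open and streams new matching rows as events arrive.
push_query = "SELECT card_id, amount FROM swipes WHERE amount > 10000 EMIT CHANGES;"
with requests.post(f"{KSQLDB}/query",
                   json={"ksql": push_query, "streamsProperties": {}},
                   stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print("match:", line.decode("utf-8"))
```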