With the introduction of Event Grid, Microsoft Azure now offers an even greater choice of messaging infrastructure options. The expanded messaging service fleet consists of the Service Bus message broker, the Event Hubs streaming platform, and the new Event Grid event distribution service. Those services, which are all focused on moving datagrams, are complemented by the Azure Relay that enables bi-directional, point-to-point network connection bridging.
At first glance, it may appear that Service Bus, Event Hubs, and the Event Grid compete with each other. They all accept datagrams, either called events or messages, and they enable consumers to obtain and handle them.
A closer look at the kind of information that is published, and at how that information is consumed and processed, reveals, however, that the usage scenarios of the three services overlap very little and that the services are highly complementary. A single application might use all three services in combination, each for different functional aspects, and we expect a lot of Azure solutions to do so.
To understand the differences, let’s first explore the intent of the publisher.
If a publisher has a certain expectation of how the published information item ought to be handled, and what audience should receive it, it’s issuing a command, assigning a job, or handing over control of a collaborative activity, any of which is expressed in a message.
Message exchanges
Messages often carry information that passes the baton of handling certain steps in a workflow or a processing chain to a different role inside a system. Those messages, like a purchase order or a monetary account transfer record, may express significant inherent monetary value. That value may be lost, or be very difficult to recover, if such a message were somehow lost in transfer. The transfer of such messages may be subject to certain deadlines, might have to occur at certain times, and may have to be processed in a certain order. Messages may also express outright commands to perform a specific action. The publisher may also expect that the receiver(s) of a message report back the outcome of the processing, and will make a path available for those reports to be sent back.
This kind of contractual message handling is quite different from a publisher offering facts to an audience without having any specific expectations of how they ought to be handled. Such published facts are best called events.
Event distribution and streaming
Events are also messages, but they don’t generally convey a publisher intent, other than to inform. An event captures a fact and conveys that fact. A consumer of the event can process the fact as it pleases and doesn’t fulfill any specific expectations held by the publisher.
Events largely fall into two big categories: They either hold information about specific actions that have been carried out by the publishing application, or they carry informational data points as elements of a continuously published stream.
Let’s first consider an example for an event sent based on an activity. Once a sales support application has created a data record for a new sales lead, it might emit an event that makes this fact known. The event will contain some summary information about the new lead that is thought to be sufficient for a receiver to decide whether it is interested in more details, and some form of link or reference that allows the obtaining of those details.
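As a rough illustration of the "summary plus reference" shape described above, the following sketch models such an activity event in Python. The field names, the event type string, and the example.com URL are all assumptions made for this illustration; they are not a schema defined by any Azure service.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ActivityEvent:
    """Hypothetical shape of an activity event: a summary plus a reference."""
    event_type: str    # what happened
    subject: str       # which record the event is about
    summary: dict      # just enough data for a receiver to decide on interest
    details_url: str   # where an authorized receiver can fetch the full record


def new_lead_event(lead_id: str, region: str, interests: list) -> ActivityEvent:
    # The URL is a placeholder; a real system would point at its own API.
    return ActivityEvent(
        event_type="Sales.LeadCreated",
        subject=f"leads/{lead_id}",
        summary={"region": region, "interests": interests},
        details_url=f"https://sales.example.com/leads/{lead_id}",
    )


event = new_lead_event("42", "EMEA", ["analytics"])
payload = json.dumps(asdict(event))  # what would travel on the wire
```

The event deliberately carries no instruction for the receiver. It states what happened and where the details can be found, and leaves the reaction to whoever subscribes.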
The ability to subscribe to the source and therefore obtain the event, and to subsequently get the referenced information will obviously be subject to access control checks. Any such authorized party may react to the event with their own logic and therefore extend the functionality of the overall system.
A subscriber to the “new sales lead” event may, for instance, be an application that handles newsletter distribution, and signs up the prospective customer to newsletters that match their interests and which they agreed to receive. Another subscriber to the same event may put the prospective customer on an invitation list for a trade show happening in their home region in the following month and initiate sending an invitation letter via regular mail. The latter system extension may be a function that’s created and run just on behalf of the regional office for the duration of a few weeks before the event, and subsequently removed.
The core of the sales support application isn’t telling those two subscribers what to do and isn’t even aware of them. They are authorized consumers of events published by the source application, but the coupling is very loose, and removing these consumers doesn’t impact the source application’s functional integrity. Creating transparency into the state changes of the core application allows for easy functional extension of the overall system functionality, either permanent or temporary.
Events that inform about discrete “business logic activity” are different from events that are emitted with an eye on statistical evaluation and where the value of emitting those events lies in the derived insights. Such statistics may be used for application and equipment health and load monitoring, for user experience and usage metrics, and many other purposes. Events that support the creation of statistics are emitted very frequently, and usually capture data point observations made at a certain instant.
The most common examples of events carrying data points are log data events as they are produced by web servers or, in a different realm, by environmental sensors in field equipment. Typically, an application wouldn’t trigger actions based on such point observations, but rather on a derived trend. If a temperature sensor near a door indicates a chilly temperature for a moment, instantly turning up the heating is likely an overreaction. But if the temperature remains low for a minute or two, turning up the heat a notch and also raising an alert about that door possibly not being shut are good reactions. Both reactions are based on the temperature trend calculated over a series of events, not on a single point observation.
The analysis of such streams of events carrying data points, especially when near real-time insights are needed, requires data to be accumulated in a buffer that spans a desired time window and a desired number of events, and then processed using some statistical function or some machine-trained algorithm. The best pattern to acquire events for such a buffer is to pull them towards the buffer, do the calculation, move the time window, pull the next batch of events to fill the time window, do the next calculation, and so forth.
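The buffer-and-window pattern can be sketched in a few lines of Python. This is a minimal in-memory illustration of the door sensor example, not code for any particular analytics product; the class name, the 60-second span, and the 18-degree threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean


class TimeWindow:
    """Buffer of (timestamp, value) readings covering the last `span` seconds."""

    def __init__(self, span: float) -> None:
        self.span = span
        self._items = deque()

    def add(self, timestamp: float, value: float) -> None:
        self._items.append((timestamp, value))
        # Move the window: evict readings that are older than the span.
        while self._items[0][0] < timestamp - self.span:
            self._items.popleft()

    def is_full(self) -> bool:
        # "Full" means the buffered readings cover the whole time span.
        if not self._items:
            return False
        return self._items[-1][0] - self._items[0][0] >= self.span

    def average(self) -> float:
        return mean(value for _, value in self._items)


def should_raise_heat(window: TimeWindow, threshold: float) -> bool:
    # React to the trend over the window, never to a single reading.
    return window.is_full() and window.average() < threshold


window = TimeWindow(span=60)
for t in range(0, 70, 10):
    window.add(t, 21.0)           # a minute of normal readings
window.add(70, 15.0)              # one momentary dip
dip_triggers = should_raise_heat(window, threshold=18.0)
```

A single chilly reading barely moves the window average, so `dip_triggers` stays false. Only when low readings fill the window does the average fall below the threshold.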
The Azure Messaging services fleet
Applications emit action events and data point events as messages to provide insights into what work they do and how that work is progressing. Other messages are used to express commands, work jobs, or transfers of control between collaborating parties. While these are all messages, the usage scenarios are so different that Microsoft Azure provides a differentiated, and yet composable, portfolio of services.
Azure Event Hubs
Azure Event Hubs is designed with a focus on the data point event scenario. An Event Hub is an “event ingestor” that accepts and stores event data, and makes that event data available for fast “pull” retrieval. A stream analytics processor tasked with a particular analytics job can “walk up” to the Event Hub, pick a time offset, and replay the ingested event sequence at the required pace and with full control; an analytics task that requires replaying the same sequence multiple times can do so. Because most modern systems handle many data streams from different publishers in parallel, Event Hubs supports a partitioning model that allows keeping related events together while enabling fast and highly parallelized processing of the individual streams that are multiplexed through the Event Hub. Each Event Hub instance is typically used for events of very similar shape and data content from the same kind of publishers, so that analytics processors get the right content in a timely fashion, and without skipping.
Example: A set of servers that make up a web farm may push their traffic log data into one Event Hub, and the partition distribution of those events may be anchored on the respective client IP address to keep related events together. That Event Hub capturing traffic log events will be distinct from another Event Hub that receives application tracing events from the same set of servers because the shape and context of those events differ.
The temperature sensors discussed earlier each emit a distinct stream that will be kept together using the Event Hub partitioning model, using the identity of the device as the partitioning key. The same partitioning logic and a compatible consumption model is also used in Azure IoT Hubs.
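The idea behind a partitioning key can be illustrated with a small sketch: a stable hash of the key selects the partition, so all events from one device land in the same partition and keep their relative order. The hash function used here (SHA-256) and the helper names are assumptions for illustration only; the article does not describe the function Event Hubs uses internally.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a partitioning key to a partition index in a stable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribute(events: list, partition_count: int) -> dict:
    """Group events by partition, preserving arrival order per partition."""
    partitions = defaultdict(list)
    for event in events:
        index = partition_for(event["device_id"], partition_count)
        partitions[index].append(event)
    return partitions


readings = [
    {"device_id": "door-1", "temp": 21.0},
    {"device_id": "hall-2", "temp": 22.5},
    {"device_id": "door-1", "temp": 15.0},
]
by_partition = distribute(readings, partition_count=4)
```

Because the mapping depends only on the key, a consumer that reads one partition sees the complete, ordered stream of each device assigned to it, and several consumers can work on different partitions in parallel.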
The Event Hubs Capture feature automatically writes batches of captured events into either Azure Storage blob containers or into Azure Data Lake and enables timely batch-oriented processing of events as well as “event sourcing” based on full raw data histories.
Azure Event Grid
Azure Event Grid is the distribution fabric for discrete “business logic activity” events that stand alone and are valuable outside of a stream context. Because those events are not as strongly correlated and also don’t require processing in batches, the model for how those events are being dispatched for processing is very different.
The first assumption made for the model is that there’s a very large number of different events for different contexts emitted by an application or platform service, and a consumer may be interested in just one particular event type or just one particular context. This motivates a filtered subscriber model where a consumer can select a precise subset of the emitted events to be delivered.
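A filtered subscriber model can be sketched as follows. The filter fields (a set of event types and a subject prefix), the dictionary keys, and the subscription names are hypothetical and serve only to show the principle of selecting a precise subset of events; they are not presented as the Event Grid subscription schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription with two simple filters."""
    name: str
    event_types: frozenset = frozenset()  # empty set means "all event types"
    subject_prefix: str = ""              # empty prefix means "all subjects"

    def matches(self, event: dict) -> bool:
        type_ok = not self.event_types or event["eventType"] in self.event_types
        return type_ok and event["subject"].startswith(self.subject_prefix)


def route(event: dict, subscriptions: list) -> list:
    """Return the names of all subscriptions that should receive the event."""
    return [s.name for s in subscriptions if s.matches(event)]


subscriptions = [
    Subscription("newsletter", frozenset({"Sales.LeadCreated"})),
    Subscription("tradeshow-emea", frozenset({"Sales.LeadCreated"}), "leads/emea/"),
]
emea_lead = {"eventType": "Sales.LeadCreated", "subject": "leads/emea/42"}
apac_lead = {"eventType": "Sales.LeadCreated", "subject": "leads/apac/7"}
```

Here `route(emea_lead, subscriptions)` selects both subscribers, while `route(apac_lead, subscriptions)` selects only the newsletter service. The publisher emits the same events either way and does not need to know who is listening.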
The second assumption is that independent events can generally be processed in a highly parallelized fashion using Web service calls or “serverless” functions. The most efficient model for dispatching events to those handlers is to “push” them out, and have the existing auto-scaling capabilities of the Web site, Azure Functions, or Azure Logic Apps manage the required processing capacity. If Azure Event Grid gets errors indicating that the target is too busy, it will back off for a little time, which allows for more resources to be spun up. This composition of Azure Event Grid with existing service capabilities in the platform ensures that customers don’t need to pay for running “idle” functionality, such as a custom VM or container that hosts the aforementioned newsletter service and does nothing but wait for the next event, while still having processing capacity ready within milliseconds when such an event occurs.
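The "push, and back off when the target is busy" behavior might look roughly like the sketch below. The status codes treated as "busy", the doubling delay, and the attempt limit are assumptions for this illustration; the article does not specify the actual retry schedule that Event Grid applies.

```python
import time
from typing import Callable

BUSY_STATUSES = {429, 503}  # assumed signals for "target is too busy"


def push_with_backoff(
    event: dict,
    handler: Callable[[dict], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Push an event to a handler, waiting longer after each busy response.

    `handler` returns an HTTP-like status code. Returns True once delivered.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        status = handler(event)
        if 200 <= status < 300:
            return True            # positively acknowledged
        if status not in BUSY_STATUSES:
            return False           # not a capacity problem; retrying won't help
        if attempt < max_attempts - 1:
            sleep(delay)           # give the target time to scale out
            delay *= 2
    return False
```

The `sleep` parameter is injectable so that the schedule can be observed without real waiting. With a handler that reports "busy" twice and then succeeds, the dispatcher waits 0.5 and then 1.0 seconds before the third, successful attempt.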
Azure Service Bus
Azure Service Bus is the “Swiss Army Knife” service for all other generic messaging tasks. While Azure Event Grid and Azure Event Hubs have a razor-sharp focus on the collection and distribution of events at great scale, and with great velocity, an Azure Service Bus namespace is a host for queues holding jobs of critical business value. It allows for the creation of routes for messages that need to travel between applications and application modules. It is a solid platform for workflow and transaction handling and has robust facilities for dealing with many application fault conditions.
A sale recorded in a point-of-sale solution is both a financial record and an inventory tracking record, and not a mere event. It’s recorded in a ledger, which will eventually be merged into a centralized accounting system, often via several integration bridges, and the information must not be lost on the way. The sales information, possibly expressed as separate messages to keep track of the stock levels at the point of sale, and across the sales region, may be used to initiate automated resupply orders with order status flowing back to the point of sale.
A particular strength of Service Bus is also its function as a bridge between elements of hybrid cloud solutions and systems that include branch-office or work-site systems. Systems that sit “behind the firewall”, are roaming across networks, or are occasionally offline can’t be reached directly via “push” messaging, but require messages to be sent to an agreed pickup location from where the designated receiver can obtain them.
Service Bus queues or topic subscriptions are ideal for this use-case, where the core of the business application lives in the cloud or even in an on-site datacenter, while branch offices, work sites, or service tenants are spread across the world. This model is particularly popular with SaaS providers in health care, tax and legal consulting, restaurant services, and retail.
Composition
Because it’s often difficult to draw sharp lines between the various use-cases, the three services can also be composed. (Note that Event Grid is still in early preview; some of the composition capabilities described here will be made available in the coming months.)
First, both Service Bus and Event Hub will emit events into Event Grid that will allow applications to react to changes quickly, while not wasting resources on idle time. When a queue or subscription is “activated” by a message after sitting idle for a period of time, it will emit a Grid event. The Grid event can then trigger a function that spins up a job processor.
This addresses the case where high-value messages flow only very sporadically, maybe at rates of a handful of messages per day, and where keeping a service alive on an idle queue would be unnecessarily costly. Even if the processing of said messages were to require substantial resources, the spin-up of those resources can be anchored on the Event Grid event trigger. The available queue messages are then processed, and the resources can again be spun down.
Event Hub will emit a Grid event when a Capture package has been dropped into an Azure Storage container, and this can trigger a function to process or move the package.
Second, Event Grid will allow subscribers to drop events into Service Bus queues or topics, and into Event Hubs.
If there’s an on-premises service in a hybrid cloud solution that is interested in specific files appearing in an Azure Storage container so that it can promptly download them, it can reach out to the cloud through NAT and Firewall and listen on a Service Bus queue that is subscribed to that event on Azure Storage.
The same model of routing a Grid event to a queue is applicable if reacting to the event is particularly costly in terms of time and/or resources, or if there’s a high risk of failure. The Event Grid will wait for, at most, 60 seconds for an event to be positively acknowledged. If there’s any chance that the processing will take longer, it’s better to turn to the Service Bus pull model that allows for processing to take much longer while maintaining a lock on the message.
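The pull model with a message lock can be illustrated with a small in-memory queue. This is a conceptual sketch only: the class and method names are invented for the example and are not the Service Bus client API, and the clock is injected so the behavior can be followed without real waiting.

```python
import itertools
from typing import Callable, Optional, Tuple


class LockingQueue:
    """Toy queue where a received message stays locked until completed."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._messages = {}              # id -> [body, locked_until]
        self._ids = itertools.count(1)

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = [body, 0.0]

    def receive(self, lock_seconds: float) -> Optional[Tuple[int, str]]:
        """Lock and return the first message that is not currently locked."""
        now = self._clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + lock_seconds
                return msg_id, entry[0]
        return None

    def renew_lock(self, msg_id: int, lock_seconds: float) -> None:
        # Long-running work extends its lock instead of losing the message.
        self._messages[msg_id][1] = self._clock() + lock_seconds

    def complete(self, msg_id: int) -> None:
        # Only an explicit completion removes the message from the queue.
        del self._messages[msg_id]
```

A receiver that needs more time keeps calling `renew_lock` while it works. If it crashes instead, the lock simply expires and the message becomes available to another receiver, which is why this model suits costly or failure-prone processing.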
Since many Event Grid events are also worth analyzing statistically and projecting over time, you can route them selectively into an Event Hub, and from there into the many different analytics channels that are Event Hub enabled, without writing any extra code. Event Hub Capture is also a great archive facility for Grid events through this easily configured path.
Summary
Azure Messaging provides a fleet of services that allows application builders to pick a fully-managed service that best fits their needs for a particular scenario. The services follow common principles and provide composability that doesn’t force developers into hard decisions choosing between the services. The core messaging fleet that consists of Event Hubs, Event Grid, Service Bus, and the Relay is complemented by further messaging-based or message-driven Azure services for more specific scenarios, such as Logic Apps, IoT Hub and Notification Hubs.
It’s quite common for a single application to rely on multiple messaging services in composition, and we hope this article has provided some orientation around which of the core services is most appropriate for each scenario.