Webhooks are a supplement to many APIs. With a webhook system in place, System B can register to receive notifications about certain changes to System A. When a change occurs, System A pushes the change to System B, usually by making an HTTP POST request.
Webhooks are intended to eliminate or reduce the need to constantly poll for data. But in my experience, webhooks come with a few challenges.
In general, you can't rely on webhooks alone to keep two systems consistent. Every integration I've ever worked on has realized this fact by eventually augmenting webhooks with polling. This is due to a few problem areas.
First, there are risks when you go down. Yes, senders typically retry undelivered webhooks with some exponential back-off. But the guarantees are often loose or unclear. And the last thing your system probably needs after recovering from a disaster is a deluge of backed-up webhooks to handle.
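To make the retry behavior concrete, here is a minimal sketch of the kind of exponential back-off schedule a sender might use. The function name, base delay, cap, and attempt count are all assumptions for illustration; real providers publish their own policies, which are often looser than this.

```python
def backoff_schedule(attempts: int, base_seconds: float = 1.0,
                     cap_seconds: float = 3600.0) -> list[float]:
    """Return the delay before each retry: base * 2**n, capped at cap_seconds."""
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(attempts)]


# Hypothetical policy: 5 retries starting at 30 seconds.
delays = backoff_schedule(attempts=5, base_seconds=30.0)

# Total time the sender keeps trying before it (hypothetically) gives up.
give_up_after = sum(delays)
```

With these made-up numbers the sender stops retrying after roughly 15 minutes, so any outage longer than that drops events unless the sender offers some other recovery path.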
Second, webhooks are ephemeral. They are too easy to mishandle or lose. If you realize after deploying a code change that you fat-fingered a JSON field and are inserting nulls into your database, there is no way to play the webhooks back. Or, you might handle part of the webhook processing pipeline out-of-band with the webhook request, like the database insert. But then you risk that step failing and losing the webhook.
To mitigate both of these issues, many developers end up buffering webhooks onto a message bus system like Kafka, which feels like a cumbersome compromise.
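The buffering pattern can be sketched roughly as follows. This is only an illustration: an in-memory `queue.Queue` stands in for a durable bus like Kafka, and `receive_webhook` and `process_next` are hypothetical names, not part of any framework.

```python
import json
import queue

# Stand-in for a durable message bus such as Kafka (assumption: in-memory only).
buffer: "queue.Queue[bytes]" = queue.Queue()


def receive_webhook(raw_body: bytes) -> int:
    """Hypothetical HTTP handler: store the raw payload, then acknowledge."""
    buffer.put(raw_body)  # keep the untouched bytes; no parsing in the request path
    return 200  # acknowledge quickly so the sender does not retry


def process_next(store: dict) -> None:
    """Hypothetical worker: parse one buffered payload and upsert it by id."""
    event = json.loads(buffer.get())
    store[event["id"]] = event


db: dict = {}
status = receive_webhook(b'{"id": "evt_1", "status": "canceled"}')
process_next(db)
```

The point of the split is that a parsing or database bug only affects the worker. A log-based bus would also retain the raw payloads for replay after a fix, which this in-memory queue does not.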
Consider the architecture for a sophisticated webhook pipeline between two parties:
We have two message buses, one on the sending end and one on the receiving end. The complexity is apparent and the stages where things can go wrong are many. For example: On the receiving end, even if your system is tight you're still subject to sender deliverability failure. If the sender's queue starts to experience back-pressure, webhook events will be delayed, and it may be very difficult for you to know that this slippage is occurring.
Adding to the complexity, the security layer between the two is usually some HTTP request signing scheme, such as an HMAC signature over the request body. This is robust and avoids sending a secret over the wire with every request. But it's also far less familiar to your average developer and therefore more prone to headache and error. (HTTP request signing and verification is one of those tasks I feel one does just infrequently enough to never fully commit to memory.)
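For reference, the core of HMAC-based verification looks something like the sketch below. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body under a shared secret. The secret value and function names are made up, and real providers typically add their own header format and a signed timestamp on top of this.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)


secret = b"whsec_example"  # hypothetical shared secret
body = b'{"id": "evt_1"}'
header_value = sign(secret, body)  # what the sender would attach to the request

ok = verify(secret, body, header_value)
tampered = verify(secret, body + b" ", header_value)
```

Two details are easy to get wrong: the signature must be computed over the exact bytes received (not a re-serialized JSON object), and the comparison should use `hmac.compare_digest` rather than `==` to avoid leaking timing information.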
So, not only do webhooks leave you open to eventual inconsistency, they're also a lot more work for everybody.
What else can we use to keep two systems in sync, then?
/events endpoint
For inspiration on keeping two data sets in harmony, we need look no further than databases. Consider Postgres' replication slots: you create a replication slot for each follower database, and the followers subscribe to that replication slot for updates.
The two key components are:
If the follower goes down, when it comes back it can page through the history at its leisure. There is no queue, nor workers on each end trying to pass events along as a bucket brigade.
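The catch-up behavior can be illustrated with a toy model: an append-only log plus a cursor that the follower owns. This is not how Postgres implements replication slots; `ChangeLog`, `read_after`, and the sample changes are all hypothetical names chosen for the sketch.

```python
class ChangeLog:
    """Append-only log; each entry gets a monotonically increasing position."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, change: dict) -> int:
        self._entries.append(change)
        return len(self._entries)  # 1-based position of the entry just written

    def read_after(self, cursor: int, limit: int = 100) -> list[tuple[int, dict]]:
        """Return up to `limit` (position, change) pairs written after `cursor`."""
        window = self._entries[cursor:cursor + limit]
        return [(cursor + i + 1, change) for i, change in enumerate(window)]


log = ChangeLog()
for name in ["insert a", "update a", "delete a"]:
    log.append({"op": name})

cursor = 1  # the follower had applied the first change before going down
applied = []
while page := log.read_after(cursor, limit=1):
    for position, change in page:
        applied.append(change["op"])
        cursor = position  # advance only after the change has been applied
```

Because the follower decides when to read and how far to advance, a crash mid-page just means it re-reads from its last saved cursor.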
APIs can follow from this model as well. Take Stripe. They have an /events endpoint that contains all creates, updates, and deletes to a Stripe account over the last 30 days. Each event object contains the full payload of the entity that was acted upon. Here's an example of an event for a subscription object:
"id": "evt_1J7rE6DXGuvRIWUJM7m6q5ds","object": "event","created": 1625012666,"data": {"object": {"id": "sub_JgFEscIjO0YEHN","object": "subscription","canceled_at": 1625012666,"customer": "cus_Jff7uEN4dVIeMQ","items": {"object": "list","data": [// ...],"url": "/v1/subscription_items?subscription=sub_JgFEscIjO0YEHN""start_date": 1623826800,"status": "canceled","type": "customer.subscription.deleted"