Nikola Buhinicek for Productive

Posted on Dec 15, 2023 • Edited on Jan 17 • Originally published at productive.io

A Close Call with Real-Time: How Rethinking Pub-Sub Saved the Day

#api #rails #development #programming

All of this started while I was working on our new feature - Automations 🤖. In a nutshell, Automations allow customers to set up actions that are triggered under specific conditions. Some of the currently implemented actions are sending Slack messages, creating and updating tasks, and posting comments on objects.

I wanted the actions that modify tasks and comments to send a message over our real-time system so that our frontend clients (browser, mobile, desktop app) could pick it up and show those changes as they occur.

Currently, our app’s real-time updates are tied to POST/PATCH/DELETE requests. We have a controller extension, Extensions::Broadcastable, that hooks into the save_form and destroy_resource methods and sends a real-time event if the action was successful. However, this approach wasn't suitable for my automation actions, as they don't go through controllers.

module Api::V2::Extensions::Broadcastable
  extend ActiveSupport::Concern

  private

  def save_form
    super && broadcast_event(action_name == 'create' ? 'created' : 'updated')
  end

  def destroy_resource
    super && broadcast_event('deleted')
  end

  ...
end

As I was digging into this topic, I somehow changed the scope of my task from making the automation actions “live” to revamping our whole broadcasting architecture. I wanted to move that logic out of controllers to a place where I could catch all the changes - which would also catch the automation actions.

📖 Sidestory: A Recent Development in Pub-Sub

Just recently, Stef implemented a Publish-Subscribe architecture in our Rails app. He made it while revamping our search feature. That was a pretty cool moment for the team and it sounded really useful. After the presentation about it, we immediately started thinking about which parts of our current code could've been done with Pub-Sub. Still, no one had actually revamped anything by the time I was working on this real-time topic.

🧐 Exploring Different Approaches

1. Callbacks

Yeah... although we all know that callbacks are evil, I couldn’t help but think about them, just for a moment...

2. Forms instead of Controllers

Both the controller actions and my automation actions use the same forms to handle our data. So, why wouldn't I hook into forms and send my real-time messages from there?
This approach was bugging me a bit, as we don't use Form objects in all the places where we actually change our data. So this wouldn't make the whole app feel "live", but I would cover more places than we do currently. That sounded promising to me.

I wanted to pitch my thoughts to the rest of the Core team. Through that discussion, a lightbulb moment happened 💡

3. Embracing Pub/Sub

I was mind-blown 🤯 That is exactly what I wanted!!

Events are published for every change. Of course, that excludes the places where we use methods that skip callbacks (update_column, update_all, ...), as Pub/Sub publishing is essentially hooked to callbacks - but that’s a topic for itself 🫠
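
To make the callback caveat concrete, here is a minimal plain-Ruby sketch, not the code from our app. TinyPubSub, FakeTask and the methods on them are hypothetical stand-ins that only mimic how ActiveRecord behaves: save runs callbacks (and therefore publishes), while update_column writes directly and skips them.

```ruby
# Hypothetical, minimal in-memory pub/sub registry (not a real library).
module TinyPubSub
  @subscribers = Hash.new { |hash, key| hash[key] = [] }

  class << self
    # Register a block to run whenever `event` is published.
    def subscribe(event, &handler)
      @subscribers[event] << handler
    end

    # Call every handler registered for `event` with the payload.
    def publish(event, payload)
      @subscribers[event].each { |handler| handler.call(payload) }
    end
  end
end

# Stand-in for a model; it only mimics the callback behaviour described above.
class FakeTask
  attr_reader :title

  def initialize(title)
    @title = title
  end

  # Like ActiveRecord's `save`: changes the record, then runs the callback.
  def save(title:)
    @title = title
    after_commit
  end

  # Like ActiveRecord's `update_column`: writes directly, skips callbacks.
  def update_column(title)
    @title = title
  end

  private

  # The callback is the only place where publishing happens.
  def after_commit
    TinyPubSub.publish('task.upsert', self)
  end
end

received = []
TinyPubSub.subscribe('task.upsert') { |task| received << task.title }

task = FakeTask.new('Draft')
task.save(title: 'Write post')      # goes through the callback, so it is published
task.update_column('Silent change') # skips the callback, so nothing is published

puts received.inspect
```

Only the change made through save reaches the subscriber; the update_column change is applied to the object but never published.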

Making a PoC

As with all big changes in our codebase, and generally in our product, I put this code behind a Feature Flag (FF).
Simply put, when an organization had the pubSubBroadcasting FF enabled, I would skip sending the real-time events from the controller actions and handle the published events instead. Without that FF, nothing changed.

class Api::V2::Tenanted::TasksController < ApiController
  include Extensions::Broadcastable

  ...

  def should_broadcast?
    return false if FeatureFlags.enabled?('pub-sub-broadcasting')

    super
  end

  ...
end

I made a few subscribers that listen for all the task- and comment-related changes and handle them by sending a real-time event.

class Core::Tasks::Broadcaster < Realtime::Broadcaster
  PubSub.subscribe('task.upsert', self, :on_upsert)
  PubSub.subscribe('task.delete', self, :on_delete)
  PubSub.subscribe('comment.upsert', self, :on_comment_upsert)
  PubSub.subscribe('comment.delete', self, :on_comment_delete)

  ...
end

And of course, I added RSpec tests and set up some widgets on New Relic to track the difference in the number of events we send - as we knew that we were going to send more events now.

Basically that was it. The next step was to slowly roll this FF out to all organizations, check our New Relic metrics and make sure nothing breaks. Once that change was released to our whole user base, we would cover the remaining objects and make subscribers for their Pub/Sub events too.

Showing off with it

As I was pretty proud of this solution, I kinda talked a lot about it and so naturally it popped up in a 1on1 meeting with my EM Lucin. He wanted to discuss that a bit more so I told him the same thing I wrote here. My vibe was like "Isn't that great? Our APP will be LIVE, all the data would be in sync."

His response was "But do we really want that?".

That wasn't the response that I was looking for 🙃

What I didn't know was that our frontend client, once it gets socket messages, often has to make additional API calls, depending on the screen the user is on, so that all the required data can be fetched again. So, a great deal of my real-time events would actually end up generating additional requests to our server and, in a way, we would just be generating a lot more traffic (self-DDoSing?). As I didn't know about this, I wasn't even paying attention to those metrics along the way.

One step forward, two steps back

Let's get better data so that we can make a better call on this.

I took a period of one week from our logs and compared the number of POST/PATCH/DELETE requests with the number of dispatched publish events. These two numbers are roughly the number of events we send over the socket in the current way and in the new way, respectively.

Tasks endpoint
controller POST/PATCH/DELETE actions - 211k
task.upsert + task.delete publishes  - 263k
> that's ~25% more real-time events

Companies endpoint
controller POST/PATCH/DELETE actions      -  5.2k
company.upsert + company.delete publishes - 20.1k
> that's ~4 times as many real-time events

Deals endpoint
controller POST/PATCH/DELETE actions -    31k
deal.upsert + deal.delete publishes  - 1_728k
> that's ~55 times as many real-time events!!!
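
For reference, the ratios can be reproduced from the raw counts with a few lines of Ruby. The counts are the ones listed above (in thousands); the hash layout and variable names are just for illustration.

```ruby
# Weekly counts from the logs, in thousands, as listed above.
counts = {
  'tasks'     => { requests: 211.0, publishes: 263.0 },
  'companies' => { requests: 5.2,   publishes: 20.1 },
  'deals'     => { requests: 31.0,  publishes: 1_728.0 }
}

# Ratio of published events to controller requests, per endpoint.
ratios = counts.transform_values do |c|
  (c[:publishes] / c[:requests]).round(2)
end

ratios.each do |endpoint, ratio|
  puts format('%-10s %.2fx as many events', endpoint, ratio)
end
```

A ratio of about 1.25 for tasks is the "25% more" from above, while deals land at well over 55 times the number of controller requests.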

I wasn't really aware of how badly this could end up. I was making the PoC only for tasks and comments, and you can see here that the difference there wasn't that big. 25% more events was okay, I knew it would be more.

But look at our deals endpoint for example - roughly 55 times as many events would be sent. That would add to the traffic we send over sockets and to our infrastructure bill for those services, and I don't want to imagine the number of API calls generated as a result of this - by ourselves...

Deals have that much traffic because a lot of other objects update financial data on deals. Once we saw the numbers, that explanation came to us immediately...

Back to the drawing board

1. Pub/Sub shouldn't be a bad call for this

As this is the part of our code that sees all the changes in our data, it should be a good call when we want to make a frontend client as live as it gets. The solution would be to not send all the changes over sockets, but to filter them into relevant and not relevant ones. This way we would surely see a drop in those mad company and deal numbers.
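
One way to picture this filtering is a subscriber that forwards an event only when the change touched attributes the clients actually display. The sketch below is a rough illustration under assumptions: RELEVANT_ATTRIBUTES, relevant_change?, the attribute names and the payload shape (a list of changed attribute names) are all hypothetical and not taken from our codebase.

```ruby
require 'set'

# Hypothetical whitelist: attributes whose changes are worth pushing to clients.
RELEVANT_ATTRIBUTES = {
  'deal.upsert' => Set['name', 'status', 'owner_id'],
  'task.upsert' => Set['title', 'assignee_id', 'closed_at']
}.freeze

# Returns true when at least one changed attribute is on the whitelist.
# Events without a whitelist entry are treated as not relevant.
def relevant_change?(event, changed_attributes)
  relevant = RELEVANT_ATTRIBUTES.fetch(event, Set.new)
  changed_attributes.any? { |attribute| relevant.include?(attribute) }
end

# Made-up sample events; `budget_used` stands for recalculated financial data.
events = [
  { event: 'deal.upsert', changed: ['budget_used', 'updated_at'] },
  { event: 'deal.upsert', changed: ['status'] },
  { event: 'task.upsert', changed: ['title', 'updated_at'] }
]

to_broadcast = events.select { |e| relevant_change?(e[:event], e[:changed]) }
puts "Broadcasting #{to_broadcast.size} of #{events.size} events"
```

The hard part in practice is deciding what belongs on such a whitelist, since relevance depends on which screen the user is looking at.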

2. Send all the needed data to the frontend?

Each real-time message already contains the object that was changed. The issue here is that there are a lot of screens in our client and we can't know all the contexts our users are in and what additional data should be sent - which is exactly why our client makes API calls when receiving some socket messages.

3. Why didn't I just resolve the problem I had?

To make those actions "live" in our frontend clients (browser, mobile, desktop app), I wanted to plug them into our real-time system.
So, why didn't I just add a few explicit calls to the code of my automation actions, where I would simply call the class that sends the real-time message?
But no, I wanted to play smart and fix a problem that wasn't really there in the first place - to make everything more live while no one was asking for it.

The Aftermath

So yeah, the Pub/Sub usage in our real-time part of the app is put on pause until we leverage things up.

I went on with the 3rd solution and added 5 lines of code - a call to the Broadcaster class is one line and I have 5 events over 4 classes to send...
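
In spirit, the explicit approach looks roughly like the sketch below. FakeBroadcaster, UpdateTaskAction and their method names are hypothetical stand-ins, not our real classes; the point is only that the action itself makes one visible call after a successful change.

```ruby
# Hypothetical stand-in for the class that sends real-time messages.
class FakeBroadcaster
  # Events recorded so far, as [resource id, event name] pairs.
  def self.sent
    @sent ||= []
  end

  # Records the event instead of pushing it over a socket.
  def self.broadcast(resource, event)
    sent << [resource[:id], event]
  end
end

# Hypothetical automation action that updates a task (a plain hash here).
class UpdateTaskAction
  def initialize(task)
    @task = task
  end

  def call(attributes)
    @task.merge!(attributes)
    # The one explicit line: tell clients about the change.
    FakeBroadcaster.broadcast(@task, 'updated')
    @task
  end
end

task = { id: 42, title: 'Old title' }
UpdateTaskAction.new(task).call(title: 'New title')

puts FakeBroadcaster.sent.inspect
```

Nothing is hidden behind a hook here: whoever reads the action sees exactly when a real-time message goes out.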

This was a nice learning opportunity for me and I would say that:

- When you have a concrete problem, stick to it and resolve it first
- I had the data the whole time in New Relic, I should've prepared better
- It's not bad to be explicit in code, not everything should be an abstraction, generalization, metaprogramming, ... I hope to write on this point a bit more soon

The good thing in this story is that we didn't actually do any damage with this and we didn't lose a lot of time.

Anyone faced similar problems? If yes, how did you deal with them?

Top comments (3)

HencoBurger • Dec 16 '23

You could have used something like NoLag.

There is nothing wrong with Pub/Sub. For example with NoLag, you can "query" your real-time data.

The Publisher will send data to a specific Topic and specify who can receive that data using NQL identifiers. The Subscriber on the other hand will then do a "reverse/query" using the same NQL identifiers.

This gives the developer fine-grained control over which user can receive what information, eliminating the need to broadcast messages to all devices listening to a Topic and the risk of DDoSing your systems.

Nikola Buhinicek • Dec 17 '23

Will check this out, thanks!

I agree with you on "There is nothing wrong with Pub/Sub" - that's what I said in the "Back to the drawing board" section under "1. Pub/Sub shouldn't be a bad call for this".

HencoBurger • Dec 19 '23

Full disclosure, I am the founder of NoLag. Unfortunately we do not have a Ruby on Rails SDK yet.

We only have a TypeScript SDK, and we will have the C# SDK out soon.

If you do have a chance to have a look at NoLag, any feedback will be great!
