In the early days of my software engineering journey, I came across a library called Socket.io. You’ve probably heard of it. The idea that I could enable real-time communication directly from my Node.js applications was an exciting proposition. Like many developers, I immediately started thinking of all the ways I could use this technology to build innovative apps and services.
Like so many before me, I started with a chat app—who hasn’t? I built it with Backbone on the client side, hosted the backend on a tiny DigitalOcean droplet, and used MySQL as the persistent data store.
It had rooms, persistent messages, file uploads, and user avatars. I sat back, admiring my creation, wondering if this might be the next internet sensation. Spoiler: it definitely wasn’t.
Reality check...
I had no idea that scaling my "groundbreaking" application would be my first major challenge. It meant dealing with adapters, load balancers, sticky sessions, and careful management of socket connection lifecycles.
After tackling the scaling challenges, the next hurdle was managing application state. WebSocket connections can be unreliable. If a connection drops, how do I make sure the user is reconnected to their rooms, so the experience stays smooth even when the connection isn’t?
So, I managed to hack together a very basic scalable(ish) solution with application state persistence. But at this point, anyone with the URL could connect to my WebSocket server—it was basically a free-for-all! I realized I needed at least an application firewall, some rate limiting, and secure access, probably using token-based authentication.
This was a nightmare! I should have been adding new features to my chat app, not stuck building out backend infrastructure.
I eventually set the app aside and focused on a career in software engineering. But I kept getting drawn back to projects involving real-time functionality, where I’d always run into the same challenges.
Cut to now...
Recently, I started thinking about another project with some real-time features, so I looked into platforms that could help me get up and running quickly. Naturally, I came across the big names like Supabase, Firebase, and Appwrite. They all offer excellent support for real-time functionality, but the buy-in is huge. Running an entire platform on a BaaS offering just didn’t make sense for me. Plus, the costs could skyrocket, even with self-hosting, which comes with its own overhead. I needed something I could drop into my application to handle real-time functionality at scale while still fitting smoothly into my existing setup. Here’s what I was looking for:
- Secure connection authentication
- Seamless socket lifecycle management
- Reliable application state handling
- User presence tracking
- Guaranteed message delivery
- The ability to push messages from the backend to connected clients
- Ideally, a way to run everything locally during the development phase
I struggled to find a solution and started wondering if others were facing the same challenges. I needed a simple service that could handle the real-time functionality—without the extras or the hefty price tag. Suddenly, my original project idea took a backseat. Maybe I could use the lessons I’d learned throughout my career to build a platform that solved these problems once and for all.
After some research, my initial thought was to rekindle my relationship with Socket.io. It’s a superb library with plenty of great features, but it has some memory management issues at scale, and its extra features can bloat a system that needs to stay lean. I needed something that could handle anything I threw at it (within reason). The system had to deliver high performance and extremely low latency. That’s when I started tinkering with uWebSockets.js, and I was blown away by its performance benchmarks, even on modest hardware. It was a no-brainer—this would become the core of the system.
Up first...scalability
The first challenge was scalability—distributing messages across horizontally scaled servers without relying on sticky sessions. I needed a centralized broker to deliver messages at scale, and RabbitMQ with AMQP seemed like the perfect fit.
I would assign a set number of queues to be consumed by each server instance. Then, using a topic exchange, I could route messages based on room identifiers. However, having a binding for each room would result in high churn, especially in applications with many rooms. Consistent hashing solved this by allowing me to assign each room to a specific queue based on its hash value. After lots of testing and fine-tuning, I had a scalable message broker at the system’s core.
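To make the room-to-queue idea concrete, here is a minimal sketch of a consistent hash ring. It is an illustration under stated assumptions, not RelayBox's actual code: the names `HashRing`, `queueFor`, `fnv1a`, and the `replicas` setting are all hypothetical, and the FNV-1a hash is used only because it needs no dependencies.

```typescript
// FNV-1a 32-bit hash. Chosen for this sketch because it is tiny and
// dependency-free; a production system might prefer a stronger hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Hypothetical ring: each queue is placed at several points ("replicas")
// on a 32-bit circle, and a room belongs to the first point at or after
// the room's own hash, wrapping around at the end.
class HashRing {
  private ring: { point: number; queue: string }[] = [];
  private replicas: number;

  constructor(queues: string[], replicas: number = 64) {
    this.replicas = replicas;
    for (const queue of queues) this.add(queue);
  }

  add(queue: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ point: fnv1a(`${queue}#${i}`), queue });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(queue: string): void {
    this.ring = this.ring.filter((entry) => entry.queue !== queue);
  }

  queueFor(roomId: string): string {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const target = fnv1a(roomId);
    // Binary search for the first ring point >= target.
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < target) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].queue;
  }
}
```

The property that matters here is stability: removing a queue only reassigns the rooms that were mapped to it, and every other room keeps its queue. That is what avoids the per-room binding churn described above.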
What about secure connection and socket lifecycle management?
Next up was socket lifecycle management and secure authentication. Auth is often a pain point in app development, but it’s crucial to get it right. To handle this, I built a robust, token-based system to authenticate connections before a WebSocket connection could be established.
The result was a feature-rich live auth system with MFA, OAuth, and built-in token lifecycle management. This system could even be used as an isomorphic authentication library in its own right, but most importantly, it was integrated into the WebSocket server’s connection handshake, ensuring only authenticated connections were upgraded.
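As a rough illustration of checking a token before a connection is upgraded, here is a minimal HMAC-signed token sketch built on Node's `crypto` module. The token format, the `TokenClaims` shape, and the `issueToken` and `verifyToken` helpers are assumptions made for this example. They are not RelayBox's real token scheme, and a real system would more likely use an established format such as JWT.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claims carried by the token.
interface TokenClaims {
  clientId: string;
  exp: number; // expiry, in seconds since the Unix epoch
}

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Token layout for this sketch: base64url(JSON claims) + "." + HMAC signature.
function issueToken(claims: TokenClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the claims if the signature is valid and the token has not
// expired, otherwise null. The handshake would refuse the upgrade on null.
function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): TokenClaims | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [payload, signature] = parts;

  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }

  const claims = JSON.parse(
    Buffer.from(payload, "base64url").toString("utf8")
  ) as TokenClaims;
  return claims.exp > nowSeconds ? claims : null;
}
```

The point of verifying during the handshake is that an unauthenticated request is rejected while it is still plain HTTP, so no socket resources are allocated for it.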
Next, application state management
State management is crucial for any app, especially one using WebSockets. A lost connection needs to be able to re-establish its session seamlessly for the user. I didn’t want to slow down the blazing-fast socket server with costly asynchronous database calls, but I still needed persistence to handle reconnections.
Redis was the perfect solution—its speed for reads and writes, plus Lua scripting for atomic operations, made it an ideal fit. Each connection gets a unique ID, allowing its state to be restored during reconnection.
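The reconnection flow can be sketched without a running Redis instance. The example below uses an in-memory `Map` purely as a stand-in for Redis so that it runs on its own. The `SessionStore` class, its method names, and the TTL behaviour are assumptions for illustration. In Redis, something similar could plausibly be built from a set of rooms per connection ID plus an expiry, with a Lua script keeping the read-and-refresh step atomic.

```typescript
// Hypothetical per-connection state: joined rooms plus an expiry time.
interface Session {
  rooms: string[];
  expiresAt: number; // milliseconds since the Unix epoch
}

// In-memory stand-in for Redis. Not suitable for multiple server
// instances; it only demonstrates the save-and-restore flow.
class SessionStore {
  private sessions = new Map<string, Session>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the session if it exists and has not expired.
  private live(connectionId: string, now: number): Session | undefined {
    const session = this.sessions.get(connectionId);
    if (session && session.expiresAt <= now) {
      this.sessions.delete(connectionId);
      return undefined;
    }
    return session;
  }

  // Record a room membership and refresh the session's expiry.
  joinRoom(connectionId: string, room: string, now: number = Date.now()): void {
    const session = this.live(connectionId, now);
    const rooms = session ? session.rooms : [];
    if (rooms.indexOf(room) === -1) rooms.push(room);
    this.sessions.set(connectionId, { rooms, expiresAt: now + this.ttlMs });
  }

  // On reconnect, return the rooms to rejoin (empty if the session expired).
  restore(connectionId: string, now: number = Date.now()): string[] {
    const session = this.live(connectionId, now);
    if (!session) return [];
    session.expiresAt = now + this.ttlMs;
    return session.rooms.slice();
  }
}
```

A client that reconnects with the same connection ID inside the TTL window gets its rooms back. After the window passes, it starts from a clean slate.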
WOW!!
If you’ve made it this far, thank you! I know there’s a lot to unpack, and I’m not the most captivating writer, so I really appreciate your attention. Ready to keep going?
The Core Services
Next, I wanted to introduce some core services, starting with user presence tracking, message history for guaranteed delivery, metrics, and a webhook system to let third-party services subscribe to real-time events. Since these services need to scale independently as demand fluctuates, a distributed system made the most sense. The WebSocket server processes events as they happen, enqueuing named events that are routed to downstream handlers for persistence and dispatch. Given that I was already using Redis for state management, BullMQ was the perfect fit to handle this. The five core services are…
Each service runs its own BullMQ worker, allowing it to scale independently based on system load. By deploying to a container orchestration service like ECS, I can automate scaling as demand grows, while ensuring that at least one container instance is always running.
With these five services forming the backbone of the system, things were really starting to take shape. The next step was to provide a way for developers and users to interact with the system—now called RelayBox.
OK, that's great, but how do you interact with the system?
The platform needed to be accessible from both the browser and the server. To enable this, I wrote two libraries, a client SDK and a REST SDK, to separate concerns. The client SDK handles WebSocket connections, room management, and interactions with the authentication service. The REST SDK issues tokens (for self-authentication) and publishes server-side events to connected clients. My goal was to abstract as much functionality as possible into simple APIs, making it easy to build real-time apps on RelayBox.
For the client library, I chose an event emitter pattern, similar to Socket.io, but with added functionality for socket lifecycle management, rooms, authentication, metrics, and more.
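For readers unfamiliar with the event emitter pattern, here is a small typed sketch of the idea. It is not the RelayBox client SDK: `RoomEmitter`, the `ChatEvents` map, and the event names are hypothetical and exist only to show how subscribing and emitting fit together.

```typescript
type Handler<T> = (payload: T) => void;

// Minimal typed emitter: `Events` maps each event name to its payload type.
class RoomEmitter<Events> {
  private handlers = new Map<keyof Events, Set<Handler<any>>>();

  // Subscribe to an event. Returns a function that removes the handler.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    this.handlers.set(event, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver a payload to every handler registered for the event.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.handlers.get(event);
    if (!set) return;
    set.forEach((handler) => handler(payload));
  }
}

// Hypothetical event map for a chat room.
type ChatEvents = {
  message: { from: string; text: string };
  "presence:join": { clientId: string };
};

const room = new RoomEmitter<ChatEvents>();
const received: string[] = [];

const unsubscribe = room.on("message", (m) => {
  received.push(`${m.from}: ${m.text}`);
});

room.emit("message", { from: "ada", text: "hello" });
unsubscribe();
room.emit("message", { from: "ada", text: "this one is not delivered" });
```

Typing the event map means a misspelled event name or a payload of the wrong shape is caught at compile time rather than at runtime.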
The REST SDK provides wrappers for token and request signature verification, along with the ability to publish events to clients connected to your application. This allows real-time events to be emitted from server-side Node applications. For example, a database interaction or file upload could trigger an event for clients subscribed to a specific table or directory. The possibilities are endless!
At this stage, the core of the platform is nearly complete. From the outset, my goal was to make real-time functionality accessible to developers without forcing them to rely on managed cloud infrastructure right away. I wanted developers to be able to experiment, prototype, and build locally until their application was ready for production.
Working Offline
Given the modular design and containerized structure of the platform, making this possible was straightforward. I decided to create a simple CLI toolkit that developers can use to spin up the entire platform locally with a single command, thanks to Docker Compose. NGINX handles the role of proxy for service discovery within the local network, meaning developers only need to connect to a single exposed endpoint.
And when your application is ready to go live? No problem—just use the dashboard to generate an API key and seamlessly switch to the managed cloud infrastructure. This allows your app to scale globally with just a few clicks, giving you the flexibility to develop locally and deploy at scale when you’re ready. And, for those who prefer self-hosting, the open-source model remains flexible, offering full support for that approach too.
Conclusion
Throughout this article, I’ve highlighted some of the core features of RelayBox, but there’s so much more to explore. From its flexible architecture, which supports both local development and cloud-scale deployments, to features like token-based authentication and user presence tracking, RelayBox is designed to make real-time application development not only powerful but also easy to manage.
Beyond the core services, there are numerous possibilities for customization and scaling to fit the unique needs of your project. Whether you’re building a chat app, a live dashboard, or anything else that relies on real-time data, RelayBox is designed to handle the load and keep latency low at scale.
Plus, the platform is completely open source, meaning you have full transparency and the ability to contribute to, modify, or extend any of its services however you see fit. It’s built with developers in mind, so you can focus on building features for your users, not managing infrastructure.
There’s a lot more under the hood, and I can’t wait for you to dig in and see what RelayBox can do for your next project.
What’s Next?
I’m still exploring the next steps for RelayBox, but I’m leaning towards integrating a serverless function invocation environment. This would allow specific events to trigger workflows as they happen, adding even more flexibility and power to the platform. Imagine automating complex tasks in response to real-time events without managing additional infrastructure—that’s the kind of functionality I’m excited about.
What do you think? I’d love to hear your feedback and ideas on where to take this next!
If you’re excited about simplifying real-time app development as much as I am, give RelayBox a try! It’s open-source and designed to be easy to integrate into your existing projects, whether you're building locally or deploying at scale. Head over to GitHub to get started, or check out the documentation for more details on how to spin it up for your next project.