Monitoring uptime statuses can be a complicated task, and SaaS offerings typically target enterprise users and have enterprise pricing. For small teams and personal projects, uptime monitoring is often overlooked.
Still, uptime monitoring is a valuable tool for incident response and tracking problems. If your servers go down at 2:00 AM on Saturday morning, it might be hard to catch unless you're a night owl like me. By the time you find out your critical service isn't functioning, it may be Monday or whenever customers or clients start sending support requests. That's no fun for anyone and makes it harder to gain visibility into what is broken.
That's where uptime monitoring tools come into play. A health check is run every few seconds or minutes to see if the server or endpoint is behaving as expected. The history of these checks and metrics on response speed is logged in a database for future review. If there's a problem, alarms or notifications can be triggered to ensure the right people know there's a problem as soon as it happens.
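To make that concrete, here's a minimal sketch of what a single health check and an alert rule could look like, using only Python's standard library. This is not how any particular tool implements it. The names `CheckResult`, `check`, and `should_alert`, the 2xx-means-healthy rule, and the three-failures threshold are all assumptions for illustration.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    url: str
    ok: bool
    status: int | None  # HTTP status code, or None if no response arrived
    elapsed_ms: float   # how long the check took, for response-speed history


def check(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> CheckResult:
    """Run one health check: request the URL and time the response."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None      # DNS failure, refused connection, timeout, ...
    elapsed_ms = (time.monotonic() - start) * 1000
    ok = status is not None and 200 <= status < 300  # assumption: only 2xx is healthy
    return CheckResult(url, ok, status, elapsed_ms)


def should_alert(history: list[CheckResult], failures_needed: int = 3) -> bool:
    """Alert only after several consecutive failures, so one blip doesn't page anyone."""
    recent = history[-failures_needed:]
    return len(recent) == failures_needed and not any(r.ok for r in recent)
```

A real monitor would call `check` on a timer, append each result to a database instead of a list, and send a notification whenever `should_alert` flips to true.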
Tricities Media Group, my consultancy, is currently a single-person business, so I don't have a large team (or large budget) to configure enterprise monitoring solutions. But my client projects and my personal projects can still benefit from a comprehensive status monitoring system.
My Solution
I decided on a tool called Uptime Kuma, an open-source, self-hosted monitoring tool that can be run with Docker. I'm using Portainer on my Synology NAS to host the container and Cloudflare Zero Trust's Tunnels tool to securely expose it to the internet. This combo works great because it reuses hardware I already have running and keeps administration and maintenance easy.
Overall, I'm super happy with how it's working and plan to expand my use of this tool in the future. This tool is open-source, MIT-licensed, and community-supported, so while there are a few potential cons, they are opportunities for improvement rather than complaints.
💡 This tutorial by Marius Hosting is an excellent resource for installing Uptime Kuma on a Synology NAS.
Pros of Uptime Kuma
- So. many. notification. options. This screenshot only shows about a quarter of the options. From Discord and Slack to Twilio and PagerTree, there are few notification tools that I can think of that don't exist. If, for some odd reason, the one you use isn't there, it also supports Webhooks with a custom body and headers, letting you send a request pretty much anywhere.
- There are lots of options for customizing health check requests. You can change the HTTP method, encoding, body, and headers, and specify authentication (basic auth, OAuth2, NTLM, and mTLS). With that many options, there are few things you can't do. You could even use it to trigger HTTP cron jobs.
- A variety of check types. In addition to standard HTTP request and ping checks, there's also support for TCP, DNS, and Docker containers. Many of these options support response validation to check if expected results are returned. One handy feature is a push URL that your application calls every x seconds. On top of that, there's support for a variety of database and server types, such as Postgres, MySQL, RADIUS, and MQTT.
- The biggest feature for me is integration with Cloudflare Tunnels for reverse proxy. It's as easy as pasting in the tunnel token.
- Status pages let you create custom statuses with groups of services you select. This allows for creating a status page for each client, for each application, or one for all of your applications. The sky is the limit. In addition, you can pin custom error or warning messages to highlight incident statuses as you correct any issues.
- Custom domains for status pages. You can point custom domains at your status pages, allowing a single instance of Uptime Kuma to handle a variety of products and services.
- If you run the tool on-prem in your home or office, like I'm doing, you can set up checks for systems on your local network. This lets you ping computers, security cameras, or anything really to see if they're running and awake. This opens up a lot of options, including using statuses as triggers for automation. For example, if all of the office computers are offline and it's after hours, schedule the lights to turn off in 5 minutes. Or if the smart refrigerator stops responding to health checks, it may be worth checking in on. Just an idea. 🤔
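The push URL check from the list above flips the usual direction: instead of the monitor calling your app, your app (or a cron job) calls the monitor. Here's a small sketch of what that call could look like from Python. I'm assuming the push URL has the shape `https://<your-host>/api/push/<token>?status=up&msg=OK&ping=`, so copy the real one from your monitor's settings rather than trusting this. `build_push_url` and `send_heartbeat` are hypothetical helper names, and `status.example.com` is a placeholder.

```python
from __future__ import annotations

import urllib.request
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_push_url(push_url: str, status: str = "up", msg: str = "OK",
                   ping_ms: float | None = None) -> str:
    """Replace the query string of a push URL with fresh heartbeat fields."""
    parts = urlsplit(push_url)
    query = {
        "status": status,
        "msg": msg,
        # Assumption: an empty ping value is accepted, as in the copied URL.
        "ping": "" if ping_ms is None else f"{ping_ms:.0f}",
    }
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def send_heartbeat(push_url: str, msg: str = "OK", ping_ms: float | None = None) -> bool:
    """Send one heartbeat; returns True if the monitor answered with HTTP 200."""
    url = build_push_url(push_url, msg=msg, ping_ms=ping_ms)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status == 200
```

Calling `send_heartbeat(...)` as the last step of a nightly backup script, for example, means the monitor raises an alarm when the backup stops reporting in, not just when a server stops responding.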
Cons of Uptime Kuma
- The base page of the app directs to login rather than a status page. For example, when a user visits status.example.com you would expect to be taken to a status page, but instead, you're taken to the status page admin login. A great change would be the ability to set a default status page to live at the app's base. A potential workaround for this is to host your app at a different URL and use custom domains for your status pages. This one might just be user error, but I'd appreciate the option either way.
- Creating an incident isn't intuitive, especially at first. To create an incident message/update, you have to go to the status page's editor as if you were changing the page's title or style. A dedicated page within the main dashboard would be more intuitive and feel like a more seamless solution.
- Lack of REST HTTP API. An API would superpower the capabilities of this project and make things like incidents much more programmable. Being able to send updates directly from an incident management system or form would help make automation much more seamless.
- Automation integrations. This is probably something held back by not having an API, but it would be great to see integrations with tools like n8n or Zapier to trigger automation outside of the existing notification destinations. This can kinda be done using the webhooks feature, but still, there's an opportunity to let automation tools do things like set maintenance windows and incident messages.
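In the meantime, the webhook notification is the main hook for automation. Here's a rough sketch of the lights-off idea from earlier, written as plain functions you could call from whatever receives the webhook. The payload shape (a JSON body with `heartbeat.status` and `monitor.name`, where 0 means down and 1 means up) is my assumption about the default webhook body, so check a real request before relying on it. The function names, the office hours, and the monitor names are made up for the example.

```python
from __future__ import annotations

import json
from datetime import time as clock

DOWN, UP = 0, 1  # assumed heartbeat status codes


def apply_webhook(state: dict[str, bool], body: str) -> dict[str, bool]:
    """Return a new {monitor name: is_up} map updated from one webhook body."""
    payload = json.loads(body)
    monitor = payload.get("monitor") or {}      # may be null, e.g. on a test message
    heartbeat = payload.get("heartbeat") or {}
    name = monitor.get("name")
    if name is None or heartbeat.get("status") not in (DOWN, UP):
        return state  # nothing usable in this payload, keep the old state
    return {**state, name: heartbeat["status"] == UP}


def should_turn_lights_off(state: dict[str, bool], office_pcs: list[str], now: clock,
                           closes: clock = clock(18, 0), opens: clock = clock(7, 0)) -> bool:
    """True when it's after hours and every listed PC is known to be offline."""
    after_hours = now >= closes or now < opens
    # A PC we've never heard from counts as "unknown", not "offline".
    all_offline = all(state.get(pc) is False for pc in office_pcs)
    return after_hours and bool(office_pcs) and all_offline
```

Keeping the decision logic separate from the HTTP handling makes it easy to test, and it means the same functions work whether the webhook lands in a small script, n8n, or a serverless function.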
Resiliency and Disaster Readiness
To wrap up, I have to add a quick note about resiliency. Many companies have made the mistake of hosting their uptime monitoring and error visibility tooling with the same infrastructure dependencies as the services they monitor. It's easy to see how this can go wrong. If there's a major problem and your systems go down, it's likely your uptime monitor will go down as well, making it useless.
For my purposes, I'm self-hosting it locally and using Cloudflare as the tunneling service. I, of course, do not host any of my services or sites locally, so as far as ISP, server, power, and network resiliency go, I should be fairly good. But I do use Cloudflare extensively. If Cloudflare's DNS or Tunneling went down, my entire infrastructure would be taken out. There are always going to be risks, but it's important to identify those risks and consider them when planning a solution. For my purposes, that's within my acceptable tolerance of risk, so I'm not going to spend the resources required to go a different route. If you're a large company, or a dependency in your infrastructure is less reliable or more critical, I would suggest getting as much separation as possible between your core application services and your uptime solution. (Read: host on a different continent, use a different DNS provider and a different domain name registrar, and use an alternative payment method to pay for it.)
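One simple way to do that risk identification is to write down what the monitor depends on, what each service depends on, and look at the overlap. Here's a tiny sketch of that idea. The function name and the dependency labels are illustrative, not an inventory of any real setup.

```python
from __future__ import annotations


def shared_dependencies(monitor_deps: set[str],
                        service_deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each service, list the dependencies it shares with the uptime monitor."""
    return {
        service: deps & monitor_deps
        for service, deps in service_deps.items()
        if deps & monitor_deps  # skip services with no overlap
    }


# Illustrative labels only: a self-hosted monitor behind a tunnel,
# watching two hypothetical services.
monitor = {"home ISP", "home power", "NAS", "Cloudflare DNS", "Cloudflare Tunnel"}
services = {
    "client-site": {"cloud host", "Cloudflare DNS"},
    "internal-api": {"cloud host", "managed database"},
}
overlap = shared_dependencies(monitor, services)
```

Anything that shows up in `overlap` is a failure that could take down a service and the thing watching it at the same time, so it's a risk to either accept on purpose or design around.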
If you enjoyed this post or want to chat directly, let's connect on LinkedIn, X/Twitter, or contact me about business opportunities at tricitiesmediagroup.com/contact.