Ogedi001
From Logs to Metrics: My Journey in Building a Robust Monitoring System

As I begin my journey with the HNG Internship Backend Track, I would love to share a personal experience that truly tested my problem-solving skills as a backend engineer. In this article, I’ll take you through how I tackled a challenging issue and the valuable lessons I learned along the way.

In this project, I was tasked with developing a centralized logging system that collects logs from different applications using the Elastic Stack. This system is part of a broader application monitoring solution that collects logs and metrics and visualizes them in Grafana. It also incorporates Prometheus for metrics scraping, and Loki with Promtail for log aggregation.

One of the critical challenges was that these technologies were relatively new to me; I did not have a deep understanding of them when the task was assigned. Compounding this was the fact that my team was not fully fluent in English, which made effective communication difficult. I also had limited experience with Linux environments and with configuring the YAML files of the various services. Connecting Kibana to Elasticsearch for visualization, building custom metrics to meet specific application requirements, and meeting tight deadlines were other significant blockers I faced.

To work through these challenges, I took the following steps:

  1. Understanding Key Concepts:
    I began by thoroughly studying key concepts such as Elasticsearch, Kibana, Prometheus, and Loki. This involved reading official documentation and watching YouTube videos. Understanding the basics of logs and metrics, as well as what constitutes a centralized logging system, was crucial for laying a strong foundation.

  2. Learning Linux Basics:
    I spent time learning essential Linux commands and concepts, including installing software, managing permissions and ownership, editing files with nano, and configuring services. I practiced these skills using the Windows Subsystem for Linux (WSL), which I had installed before being assigned the task.

  3. Splitting the Task Solution:
    To manage the complexity, I divided the task into two main components: metrics and a centralized logging service. This helped me focus on each aspect individually and ensure thorough implementation.

  4. Installing and Configuring Services:
    I started by installing and configuring the services needed to build a centralized logging system with the Elastic Stack. After setting up Elasticsearch, Logstash, and Kibana, I configured Prometheus for metrics scraping and Loki and Promtail for log aggregation.

  5. Building the Logging Service API:
    After configuring and connecting Kibana to Elasticsearch and ensuring all services were running locally, I incorporated the Elasticsearch client API into a Node.js and TypeScript project. I built an endpoint that collects and stores logs in Elasticsearch. The API logic was straightforward: it checks if the index exists, creates a new log document if it does, or creates the index and then the log document if it doesn't.
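The index-check logic described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the `EsClient` interface mirrors only the handful of Elasticsearch client methods the logic needs, and the index name and log shape are illustrative assumptions.

```typescript
// Minimal shape of the Elasticsearch client methods this sketch relies on,
// so the logic can be read (and exercised) without a live cluster. With the
// real @elastic/elasticsearch v8 client, the same calls apply.
interface EsClient {
  indices: {
    exists(params: { index: string }): Promise<boolean>;
    create(params: { index: string }): Promise<unknown>;
  };
  index(params: { index: string; document: object }): Promise<unknown>;
}

// Store a log document, creating the index first if it does not exist yet.
async function storeLog(client: EsClient, indexName: string, log: object): Promise<void> {
  const exists = await client.indices.exists({ index: indexName });
  if (!exists) {
    await client.indices.create({ index: indexName });
  }
  await client.index({ index: indexName, document: log });
}
```

Keeping the existence check inside one function means every caller gets the same "create-if-missing" behavior without duplicating it at each endpoint.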

  6. Building Custom Metrics:
    I added custom metrics to an existing project, including total HTTP requests, response time, system uptime and downtime, system availability, and disk and CPU usage. The application exposes these metrics on a dedicated endpoint, and I pointed Prometheus at the application's server address in its YAML configuration file so that Prometheus could scrape the endpoint and the metrics could be visualized effectively.
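The Prometheus configuration for this step looks roughly like the fragment below. It is a sketch only: the job name, metrics path, and target address are illustrative assumptions, not the project's actual values.

```yaml
# prometheus.yml (excerpt) - tell Prometheus where to scrape the app's metrics
scrape_configs:
  - job_name: 'node-app'            # illustrative job name
    scrape_interval: 15s
    metrics_path: /metrics          # the endpoint the application exposes
    static_configs:
      - targets: ['localhost:3000'] # the application's host:port (assumed)
```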

  7. Configuring Loggers:
    I configured the Winston logger to send logs directly to Loki and exposed it as a shared utility throughout my application.
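Under the hood, a transport such as `winston-loki` ships log lines to Loki's HTTP push API (`POST /loki/api/v1/push`), which expects streams of labels plus `[timestamp-in-nanoseconds, line]` pairs. As a rough sketch of that payload shape (the label names and the helper function are illustrative assumptions, not part of the transport's API):

```typescript
// Build the JSON body Loki's push endpoint expects: each stream carries a
// label set plus an array of [timestamp-in-nanoseconds-as-string, log line].
function buildLokiPayload(
  labels: Record<string, string>,
  lines: { timestampMs: number; message: string }[],
) {
  return {
    streams: [
      {
        stream: labels,
        // Loki timestamps are unix epoch nanoseconds, encoded as strings.
        values: lines.map((l) => [String(l.timestampMs * 1_000_000), l.message]),
      },
    ],
  };
}
```

Seeing the raw payload made it easier for me to debug label mismatches between what the logger sent and what the Grafana queries expected.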

  8. Creating Grafana Dashboard:
    I installed Grafana locally, connected multiple data sources, and built custom dashboards for monitoring logs and metrics for my application.
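Grafana can also provision data sources from a YAML file instead of adding them by hand in the UI. A minimal sketch, assuming the default local ports for Prometheus (9090) and Loki (3100); the project may well have wired these up through the UI instead:

```yaml
# /etc/grafana/provisioning/datasources/datasources.yml (sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090   # default Prometheus port
    access: proxy
  - name: Loki
    type: loki
    url: http://localhost:3100   # default Loki port
    access: proxy
```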

  9. Documentation:
    Once the system was up and running and had been properly tested, I documented the entire setup process, including configurations and troubleshooting steps. I also created a training guide for my team to help them understand and maintain the system.

Throughout the process, I encountered several challenges, including:

  • Configuration and Integration Issues: Configuring the stack on a machine with limited memory was challenging, as was ensuring seamless integration between the different components of the Elastic Stack.

  • Team Communication: Overcoming language barriers within the team to ensure effective explanation and collaboration.

The implementation of the centralized logging and monitoring system resulted in:

  • Improved Visibility: Enhanced visibility into application performance and health through comprehensive logs and metrics.

  • Operational Efficiency: Streamlined operations and quicker response times to incidents due to real-time monitoring and alerting.

This experience taught me valuable lessons, including:

Importance of Documentation: Keeping thorough documentation is crucial for troubleshooting and onboarding new team members.

Continuous Learning: Adapting to new technologies requires a positive mindset and a proactive approach to learning and problem-solving.

Collaboration: Effective communication and collaboration are vital, especially in diverse teams.

Solving this backend challenge not only benefited our project but also fueled my passion for tackling complex technical problems. I am excited about the opportunity to contribute my skills and learn from industry experts through the HNG Internship. I look forward to the new challenges and opportunities that lie ahead in this internship.

If you're interested in exploring the code behind this project, you can find the repository on GitHub: Logging System, monitoring service
