As a developer working with the tonutils-go library to interact with the TON blockchain, I’ve spent a considerable amount of time building various side projects. Over time, I noticed a troubling pattern: the service I had built with the Echo framework was consuming more and more resources, even as I made minor tweaks and optimizations.
Time to Investigate: Detective Mode ON 🕵🏻‍♂️
I started my investigation by trying different strategies, making small adjustments to pinpoint the root cause. Was there a goroutine leak? I rarely used goroutines in my code, except in one small section. Could the issue lie in my queries? Perhaps GORM was messing something up?
I was stumped. So, I decided to dive deeper into debugging using pprof, the profiling tool from Google, to get to the bottom of it.
It didn’t take long before I uncovered something unexpected: a goroutine leak. But how could that be? I wasn’t even using goroutines directly!
I dug deeper and deeper, and eventually, I stumbled upon the culprit: time.NewTimer().
Wait… what? I wasn’t using any timers in my code! After spending more time researching, I realized that the problem wasn’t with my own code — it was stemming from the third-party libraries I was using. While it was a relief to know I wasn’t at fault, I couldn’t help but feel a little frustrated. How could I fix an issue with something I had no control over?
The Root Cause: Leaks from time.After()
Upon further investigation, I discovered that this is a well-known gotcha in Go: the timer created by time.After() is not garbage collected until it fires, even if the surrounding select has already returned. With long durations on frequently hit code paths, those pending timers pile up into a memory leak.
select {
case <-time.After(time.Second):
    // Do something after one second.
case <-ctx.Done():
    // Do something when the context is finished.
    // The timer created by time.After() above will not be garbage
    // collected until it fires, even though this branch already returned.
}
Thankfully, I found a recommended solution from other developers: use time.NewTimer() together with the context, so the timer is explicitly stopped and cleaned up:
delay := time.NewTimer(time.Second)
select {
case <-delay.C:
    // Do something after one second.
case <-ctx.Done():
    // Do something when the context is finished, and stop the timer.
    if !delay.Stop() {
        // Stop reports false when the timer has already fired,
        // so drain the channel to release it.
        <-delay.C
    }
}
A Step Toward a Solution: Contributing to tonutils-go
Determined to resolve the issue, I submitted a pull request to the tonutils-go repository. To my relief, the project author reviewed and accepted the fix! You can find the pull request here: PR #297 on tonutils-go.
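The actual diff is in the PR, but the pattern is the same as the snippet above: wherever a timeout or retry path selects on time.After(), swap in a timer that gets stopped when the context wins. A hypothetical waitOrDone helper (my own illustration, not code from tonutils-go) looks like this:

package main

import (
    "context"
    "fmt"
    "time"
)

// waitOrDone is a hypothetical helper showing the pattern from the fix:
// wait for d, but return early, and release the timer, when ctx is cancelled.
func waitOrDone(ctx context.Context, d time.Duration) error {
    delay := time.NewTimer(d)
    select {
    case <-delay.C:
        return nil
    case <-ctx.Done():
        if !delay.Stop() {
            // Stop reports false when the timer already fired; drain the channel.
            <-delay.C
        }
        return ctx.Err()
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()
    // The one-second wait is cut short by the context, and no timer is left pending.
    fmt.Println(waitOrDone(ctx, time.Second))
}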
Does the New Golang Version Fix It?
Interestingly, I also tested whether a newer version of Go addresses the issue on its own, but I found that the behavior remained unchanged. It seems that, despite some updates, this problem persists in certain cases. You can check it here.
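If you want to check this on your own Go version, here is a rough, self-contained experiment I would use (the 100k iteration count and the one-hour duration are arbitrary; it only shows whether pending time.After() timers keep heap memory alive until they fire):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    var before, after runtime.MemStats

    runtime.GC()
    runtime.ReadMemStats(&before)

    // Recreate the leaky pattern: the select returns immediately through the
    // closed channel, but every time.After call leaves a one-hour timer pending.
    done := make(chan struct{})
    close(done)
    for i := 0; i < 100_000; i++ {
        select {
        case <-time.After(time.Hour):
        case <-done:
        }
    }

    runtime.GC()
    runtime.ReadMemStats(&after)

    fmt.Printf("HeapInuse before: %d KiB, after: %d KiB\n",
        before.HeapInuse/1024, after.HeapInuse/1024)
}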
Exposing pprof for Production Monitoring and Visualizing Memory Leaks
Since identifying and resolving the issue required monitoring the application over time, I decided to expose the pprof endpoint through NGINX. This let me keep track of memory usage and any potential leaks arising under actual production traffic. The endpoint should not stay exposed permanently, only while debugging.
I set up the pprof endpoint on my server and then started watching the application under real-world load. I knew that some leaks could take time to manifest, so it was important to observe the app during normal production requests.
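For reference, this is roughly how the endpoint can be wired up in an Echo service; the ports here are placeholders for my setup, and the blank import of net/http/pprof is what registers the /debug/pprof/ handlers:

package main

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux

    "github.com/labstack/echo/v4"
)

func main() {
    // Serve pprof only on loopback; NGINX proxies /debug/pprof/ to it
    // (behind auth or an IP allow-list), and this block goes away after debugging.
    go func() {
        _ = http.ListenAndServe("127.0.0.1:6060", nil)
    }()

    e := echo.New()
    // ... application routes ...
    e.Logger.Fatal(e.Start(":8081"))
}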
To visualize the performance and memory usage, I installed Graphviz on my Mac using Homebrew:
brew install graphviz
With Graphviz installed, I could generate helpful visualizations of the memory allocations and goroutine activity, which greatly aided in pinpointing where resources were being consumed and not released.
Next, I used go tool pprof to capture and analyze the memory and goroutine data. Here’s how I did it:
For memory allocations:
go tool pprof -http=:8080 https://production.com/debug/pprof/allocs
And for goroutine profiling:
go tool pprof -http=:8080 https://production.com/debug/pprof/goroutine
By running these commands, I was able to open up a web interface that allowed me to visually explore the application’s memory allocation and goroutine states. This gave me valuable insights into where things were going wrong and where I could optimize further.
By exposing pprof in this manner, I was able to monitor production traffic and track down leaks that would’ve otherwise been hard to catch in a local environment. It’s a simple yet effective technique for diagnosing subtle performance issues that only appear under real-world conditions.
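One small trick that helped with slow leaks: grab a goroutine snapshot early, grab another after the service has handled real traffic for a while, and diff them with pprof's -base flag so only the growth shows up (the file names below are just examples):

curl -s https://production.com/debug/pprof/goroutine -o goroutine-before.pb.gz
# ...wait while the app serves normal production traffic...
curl -s https://production.com/debug/pprof/goroutine -o goroutine-after.pb.gz
go tool pprof -http=:8080 -base goroutine-before.pb.gz goroutine-after.pb.gz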
Before the fix (screenshot):
After more monitoring, I realized there are more leaks like this to find and fix! But I hope this post helps you learn more about the issue.
The code examples are from this blog.
My GitHub: https://Github.com/iw4p