Concurrency allows us to handle multiple tasks independently of each other, and goroutines are a simple way to do exactly that. In this post we progressively enhance an HTTP handler that accepts files, and explore various concurrency patterns in Go using channels and the sync package.
Setup
Before getting into concurrency patterns, let's set the stage. Imagine we have an HTTP handler that accepts multiple files via a form and processes the files in some way.
```go
func processFile(file multipart.File) {
	// do something with the file
	fmt.Println("Processing file...")
	time.Sleep(100 * time.Millisecond) // simulate file processing time
}
```
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
	// limit to 10 MB
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	// iterate over all files and process them sequentially
	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}
		processFile(f)
		f.Close()
	}
}
```
In the example above we receive files from a form and process them sequentially. If 10 files are uploaded, it takes one second before the handler responds to the client. When handling many files this becomes a bottleneck, but with Go's concurrency support we can easily address it.
Wait Groups
To solve this we can process the files concurrently. To spawn a new goroutine we prefix a function call with the `go` keyword, e.g. `go processFile(f)`. However, since the `go` statement does not block, the handler might return before processing is finished, leaving files unprocessed or reporting an incorrect state. To wait for the processing of all files we can use `sync.WaitGroup`.
A `WaitGroup` waits for a number of goroutines to finish. For each goroutine we spawn, we should increment the counter in the `WaitGroup` using the `Add` method. When a goroutine finishes, `Done` should be called so the counter is decremented by one. Before returning from the handler we call `Wait`, which blocks until the counter of the `WaitGroup` reaches zero.
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	// create WaitGroup
	var wg sync.WaitGroup

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}
		// increment the WaitGroup counter; Add should be called before starting the goroutine
		wg.Add(1)
		// process the file concurrently
		go func(file multipart.File) {
			defer wg.Done() // decrement the counter; defer guarantees Done is called
			defer file.Close()
			processFile(file)
		}(f)
	}

	// wait for all goroutines to complete
	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}
```
Now a new goroutine is spawned for each uploaded file, which could overwhelm the system. One solution is to limit the number of goroutines running at once.
Limiting Concurrency With A Semaphore
A semaphore is a variable we can use to control access to common resources by multiple threads, or in this case goroutines. In Go we can use a buffered channel to implement a semaphore.
Channels
Before getting into the implementation let's look at what channels are and the difference between buffered and unbuffered channels.
Channels are pipes through which we can send and receive data to communicate safely between goroutines. Channels must be created with the `make` function.
```go
ch := make(chan int)
```
Channels have a special operator, `<-`, which is used to send to or receive from a channel. When the arrow points at the channel, `ch <- 1` sends data into it; when the arrow points away from the channel, `<-ch` receives a value. Send and receive operations are blocking by default: each operation waits until the other side is ready.
As an example, picture a producer sending the value 1 through an unbuffered channel while a consumer reads from it: neither operation completes until both sides are ready.
If the producer can send values faster than the consumer can handle them, we can use a buffered channel, which queues up multiple messages without blocking the producer until the buffer is full. The consumer, in turn, handles the messages at its own pace.
```go
ch := make(chan int, 2)
```
In this example the producer can send up to two values without blocking. Once the buffer's capacity is reached, the producer blocks until the consumer has handled at least one message.
Back to the initial problem: we want to limit the number of goroutines processing files concurrently. To do this we can use a buffered channel.
```go
const maxConcurrentUploads = 5

func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	semaphore := make(chan struct{}, maxConcurrentUploads) // buffered channel used as a semaphore
	var wg sync.WaitGroup

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}
		wg.Add(1)
		go func(file multipart.File) {
			defer wg.Done()
			defer file.Close()
			semaphore <- struct{}{}        // acquire a slot by sending to the channel
			defer func() { <-semaphore }() // release the slot by receiving from the channel
			processFile(file)
		}(f)
	}

	// wait for all goroutines to complete
	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}
```
In this example we added a buffered channel with a capacity of 5, which lets us process 5 files concurrently while limiting the strain on the system. But what if not all files are equal? We might reliably predict that certain file types or file sizes require more resources to process. In that case we can use a weighted semaphore.
Weighted Semaphore
Simply put, a weighted semaphore lets us assign more resources to a single task. Go already provides an implementation in the extended sync package, golang.org/x/sync/semaphore.
```go
const maxConcurrentUploads = 5

func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	// create a semaphore with a capacity of maxConcurrentUploads slots
	sem := semaphore.NewWeighted(int64(maxConcurrentUploads))
	var wg sync.WaitGroup

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}
		wg.Add(1)
		go func(file multipart.File, filename string) {
			defer wg.Done()
			defer file.Close()
			slotsNeeded := getSlotsForFile(filename)
			// acquire the semaphore with the needed number of slots;
			// http.ResponseWriter is not safe for concurrent writes,
			// so log instead of calling http.Error from the goroutine
			if err := sem.Acquire(r.Context(), int64(slotsNeeded)); err != nil {
				log.Printf("failed to acquire semaphore: %v", err)
				return
			}
			defer sem.Release(int64(slotsNeeded)) // release the slots after processing
			processFile(file)
		}(f, file.Filename)
	}

	// wait for all goroutines to finish
	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}

func getSlotsForFile(filename string) int {
	ext := strings.ToLower(getFileExtension(filename))
	if ext == "pdf" {
		return 2
	}
	return 1
}

func getFileExtension(filename string) string {
	if idx := strings.LastIndex(filename, "."); idx != -1 {
		return filename[idx+1:]
	}
	return ""
}
```
In this version we created a weighted semaphore with 5 slots. If only images are uploaded, for example, 5 of them are processed concurrently; a PDF, however, acquires 2 slots, reducing the number of files that can be handled at the same time.
Conclusion
We explored a few concurrency patterns in Go, using `sync.WaitGroup` and semaphores to control the number of concurrent tasks. There are more tools available, though: we could use channels to build a worker pool, add timeouts, or apply the fan-in/fan-out pattern.
Additionally, error handling is an important aspect which was mostly left out for simplicity.
One way to handle errors is to use a channel to aggregate them, and deal with them after all goroutines are done.
Go also provides `errgroup.Group`, which is related to `sync.WaitGroup` but adds handling of tasks that return errors. It can be found in the extended sync package, golang.org/x/sync/errgroup.