Hi, I’m Walid, a backend developer currently learning Go and sharing my journey by writing about it along the way.
Resources:
- The Go Programming Language by Alan A. A. Donovan & Brian W. Kernighan
- Matt Holiday's Go course

## Introduction
Concurrency is one of the standout features of Go, allowing developers to write highly efficient, parallelized programs with minimal complexity. In this article, we explore key lessons from a practical Go program that scans directories, hashes files using MD5, and identifies duplicate files. We'll discuss Go’s concurrency model, channels, synchronization, and best practices for managing goroutines efficiently. At the end, we provide a full example demonstrating these concepts in action.
## Go’s Concurrency Model
Go’s concurrency model revolves around goroutines and channels. A goroutine is a lightweight thread managed by the Go runtime, and channels facilitate safe communication between goroutines.
## Channels as Messaging and Synchronization Tools
Channels in Go serve two primary purposes:
- Messaging Tool: They allow goroutines to communicate by passing values safely between them.
- Synchronization Tool: Channels can be used to control execution flow and coordinate goroutines, ensuring proper sequencing and avoiding race conditions.
## Unidirectional Channels
By default, channels allow bidirectional communication, but Go also supports unidirectional channels, which can only send or receive data:
```go
func sendData(ch chan<- int) { // send-only channel
	ch <- 42
}

func receiveData(ch <-chan int) { // receive-only channel
	fmt.Println(<-ch)
}
```
Unidirectional channels help enforce safe communication patterns and make the code easier to reason about.
## Deadlocks in Go
A deadlock occurs when all goroutines are waiting on a channel operation that will never happen, effectively causing the program to freeze. For example:
```go
ch := make(chan int)
ch <- 1 // Deadlock: no goroutine is receiving
```
When every goroutine is blocked on a channel operation, the Go runtime detects the situation and aborts the program with a fatal error ("all goroutines are asleep - deadlock!").
## Unbuffered and Buffered Channels
### Unbuffered Channels (Rendezvous Model)
An unbuffered channel requires a sender and a receiver to be ready at the same time, making it a synchronization mechanism. This ensures data is passed immediately without any intermediate storage.
```go
ch := make(chan int) // Unbuffered
go func() {
	ch <- 42 // This will block until a receiver is ready
}()
fmt.Println(<-ch) // Receives immediately
```
### Buffered Channels
A buffered channel allows sending multiple values without requiring an immediate receiver. This can improve performance but also introduce subtle race conditions if not managed carefully.
```go
ch := make(chan int, 3) // Buffered with capacity 3
ch <- 1
ch <- 2
ch <- 3
// A 4th send would block until a receiver frees a slot
```
Buffered channels can hide race conditions by delaying the need for a receiver. If the buffer is large enough, senders may not block immediately, leading to unpredictable behavior in concurrent programs.
## Counting Semaphore with Buffered Channels
A counting semaphore is a concurrency control pattern used to limit the number of simultaneous operations. A buffered channel can act as a semaphore to control resource usage efficiently.
```go
sem := make(chan struct{}, 5) // Limit to 5 concurrent operations

for i := 0; i < 10; i++ {
	sem <- struct{}{} // Acquire a slot (blocks when 5 are in flight)
	go func(i int) {
		defer func() { <-sem }() // Release the slot after completion
		fmt.Println("Processing", i)
	}(i)
}
```
## Managing Goroutines When Performing I/O
When performing I/O operations such as file reads/writes or network requests, we must limit the number of goroutines to prevent overwhelming system resources. This is achieved using semaphores or worker pools.
Example:
```go
workers := 4 * runtime.GOMAXPROCS(0) // Scale the limit with available CPUs
limits := make(chan bool, workers)

for _, file := range fileList {
	limits <- true // Acquire slot
	go func(file string) {
		defer func() { <-limits }() // Release slot after completion
		processFile(file)
	}(file)
}
```
This prevents excessive goroutines from causing resource exhaustion.
## Full Example: File Hashing with Concurrency in Go
Below is the complete example demonstrating efficient file processing using concurrency:
```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// pair couples a file's MD5 hash with its path.
type pair struct {
	hash string
	path string
}

type fileList []string
type results map[string]fileList

// hashFile computes the MD5 checksum of the file at path.
func hashFile(path string) pair {
	file, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	hash := md5.New()
	if _, err := io.Copy(hash, file); err != nil {
		log.Fatal(err)
	}
	return pair{fmt.Sprintf("%x", hash.Sum(nil)), path}
}

// processFile hashes one file, respecting the concurrency limit.
func processFile(path string, pairs chan<- pair, wg *sync.WaitGroup, limits chan bool) {
	defer wg.Done()

	limits <- true              // acquire a slot before doing I/O
	defer func() { <-limits }() // release it when done

	pairs <- hashFile(path)
}

// collectHashes gathers all pairs into a map from hash to file paths.
func collectHashes(pairs <-chan pair, result chan<- results) {
	hashes := make(results)
	for p := range pairs {
		hashes[p.hash] = append(hashes[p.hash], p.path)
	}
	result <- hashes
}

// searchTree walks dir, spawning goroutines for subdirectories and files.
func searchTree(dir string, pairs chan<- pair, wg *sync.WaitGroup, limits chan bool) error {
	defer wg.Done()

	visit := func(p string, fi os.FileInfo, err error) error {
		if err != nil {
			if os.IsNotExist(err) {
				return nil // the entry vanished mid-walk; skip it (fi is nil here)
			}
			return err
		}
		if fi.Mode().IsDir() && p != dir {
			wg.Add(1)
			go searchTree(p, pairs, wg, limits) // errors from subtree walks are dropped
			return filepath.SkipDir             // this walk stops here; the goroutine takes over
		}
		if fi.Mode().IsRegular() && fi.Size() > 0 {
			wg.Add(1)
			go processFile(p, pairs, wg, limits)
		}
		return nil
	}

	limits <- true              // walking a directory also counts against the limit
	defer func() { <-limits }()

	return filepath.Walk(dir, visit)
}

func run(dir string) results {
	workers := 4 * runtime.GOMAXPROCS(0)
	limits := make(chan bool, workers)
	pairs := make(chan pair)
	result := make(chan results)
	wg := new(sync.WaitGroup)

	go collectHashes(pairs, result)

	wg.Add(1)
	if err := searchTree(dir, pairs, wg, limits); err != nil {
		log.Fatal(err)
	}

	wg.Wait()    // all walkers and hashers have finished...
	close(pairs) // ...so it is safe to close pairs and let collectHashes send its result
	return <-result
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Missing parameter, provide dir name!")
	}
	if hashes := run(os.Args[1]); hashes != nil {
		for hash, files := range hashes {
			if len(files) > 1 {
				fmt.Println(hash[len(hash)-7:], len(files))
				for _, file := range files {
					fmt.Println("  ", file)
				}
			}
		}
	}
}
```
## Conclusion
Go’s concurrency model, when used properly, allows highly efficient parallel execution. Understanding channels, deadlocks, synchronization, and goroutine management is key to writing performant and safe Go programs.