In the rush to get to market, it's easy to overlook small details that turn out not to be small. When we launched Vectopus.com, our multi-vendor vector image marketplace, we omitted WebP image support. To be honest, as the technical lead and architect, this was on me. I was so focused on getting the site up and running that I was not thinking about the full range of performance issues. I considered the performance of the API code, EC2 instance sizes and configuration, ALBs, the CDN, and so on, but did not consider the full breadth of factors that affect performance and SEO.

Once we launched, however, it became clear that WebP support was crucial for faster page loads, better SEO, and a better user experience. This meant retroactively converting more than 500,000 SVG files to WebP, a sizable task that demanded efficiency, scalability, and accuracy.

Fixing the problem for future uploads was simple: we added a step to our image ingestion pipeline to include WebP images. This is the part where I'm kicking myself because it was an easy problem to avoid. Hindsight is 20/20, right?

Our Image Ingestion Pipeline

Our images are stored in S3 with a unique prefix for each vendor, with nested prefixes for family and set. SVG images live in a private, protected bucket, and PNG previews live in a public bucket that serves as the origin for our CloudFront CDN. The image metadata and object key are stored in a Postgres database table, cross-referenced with the family, set, and item via polymorphic associations.
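For illustration, the object keys follow a layout roughly like the one below (the bucket names here are hypothetical placeholders, not our real bucket names):

s3://vectopus-svg-private/{vendor}/{family}/{set}/{item}.svg
s3://vectopus-previews-public/{vendor}/{family}/{set}/{item}.png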

The ingestion pipeline consists of a couple of SQS queues and Lambda functions that handle the image conversion and storage. I covered the details of the process in the first post in this series, A Multi-Vendor Marketplace Architecture on AWS Cloud Services.

CLI Solution

The solution was to build a command-line interface (CLI) tool to handle the batch conversion. The CLI would do the following (a rough sketch of the per-image step appears after the list):

  1. Download SVG files from S3.
  2. Convert them to PNG.
  3. Optionally watermark the PNGs.
  4. Convert the PNGs to WebP.
  5. Upload the WebP files back to S3.
  6. Enter the new image metadata into the database.
  7. Delete the temporary files.
  8. Repeat 500,000 times.
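As a rough sketch, the per-image step might look something like this in Go. It shells out to rsvg-convert (librsvg) and cwebp (libwebp) for the two conversions - an assumption about what is installed on the host, not necessarily the tools our CLI uses - and leaves the S3 transfers, watermarking, and database insert as comments:

package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"
    "strings"
)

// processOne runs steps 1-7 for a single image. The S3 download and
// upload and the database insert are elided for brevity; rsvg-convert
// and cwebp are assumed to be installed on the host.
func processOne(svgPath string) error {
    base := strings.TrimSuffix(svgPath, ".svg")
    pngPath := base + ".png"
    webpPath := base + ".webp"

    // 1. Download the SVG from S3 to svgPath (elided).

    // 2. Convert SVG -> PNG.
    if err := exec.Command("rsvg-convert", "-o", pngPath, svgPath).Run(); err != nil {
        return fmt.Errorf("svg to png: %w", err)
    }

    // 3. Optionally watermark the PNG (elided).

    // 4. Convert PNG -> WebP.
    if err := exec.Command("cwebp", pngPath, "-o", webpPath).Run(); err != nil {
        return fmt.Errorf("png to webp: %w", err)
    }

    // 5. Upload webpPath back to S3 (elided).
    // 6. Insert the new image metadata into Postgres (elided).

    // 7. Delete the temporary PNG; the WebP is cleaned up after upload.
    return os.Remove(pngPath)
}

func main() {
    if err := processOne("image1.svg"); err != nil {
        log.Fatal(err)
    }
}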

Single-threaded vs. Multi-threaded

500,000 images is a lot of files to process. To put it in perspective: there are 86,400 seconds in a day. So, if each image takes just 1 second to process, it would take a total of 5.8 days to convert all 500,000 images. Clearly, relying on a single-threaded process was not going to cut it.

How Node.js, Python, and Go Concurrency Models Work

The three languages I have used the most are Node.js, Python, and Go. I use Node.js and Go every day, and I coded in Python full-time at Iconfinder for three years. This image conversion task was a great opportunity to compare the way each language handles concurrency.

Node.js Concurrency Model

Node.js is single-threaded by default and uses an event-driven, non-blocking I/O model. This means it can handle many connections simultaneously without creating a new thread for each connection. Tasks are sent to an event queue, and the Node.js runtime processes them one at a time. This is great for I/O-bound tasks like web servers, but it can lead to performance bottlenecks for CPU-bound tasks that require heavy computation. For example, if you have a CPU-bound task that takes 10 seconds to complete, and you have 10 tasks, Node.js will take 100 seconds in total to process them. This is not ideal for batch processing tasks like image conversion.

Worker Threads Example in Node.js

const { Worker, isMainThread, parentPort } = require('worker_threads');

function downloadImage(file) {
  return `Downloaded image: ${file}`;
}

function processImage(file) {
  return `Processed image: ${file}`;
}

function uploadImage(file) {
  return `Uploaded image: ${file}`;
}

if (isMainThread) {
  const images = ['image1.svg', 'image2.svg', 'image3.svg'];

  // Spawn one worker per image; each worker runs the full
  // download -> process -> upload pipeline for its image.
  for (const image of images) {
    const worker = new Worker(__filename);
    worker.on('message', (results) => {
      results.forEach((line) => console.log(line));
      worker.terminate(); // The worker is done; release its thread.
    });
    worker.on('error', (err) => console.error(err));
    worker.postMessage(image);
  }
} else {
  // Worker thread: receive an image, run each stage in order,
  // and send the results back to the main thread.
  parentPort.on('message', (image) => {
    parentPort.postMessage([
      downloadImage(image),
      processImage(image),
      uploadImage(image),
    ]);
  });
}

In Node.js v10.5 and later, the worker_threads module allows you to create true parallel threads. This enables you to run CPU-bound tasks in parallel. Each worker thread operates in its own V8 instance, meaning it can run JavaScript independently in its own memory space. However, this comes with some trade-offs: it increases memory usage and adds complexity in managing communication between threads. While you can offload tasks to workers, you must handle message passing and error management between the main thread and the workers yourself. This added complexity can lead to more potential bugs, but it does provide a way to implement parallelism in Node.js.

Python Concurrency Model

Python uses a Global Interpreter Lock (GIL) that prevents multiple threads from executing Python bytecode simultaneously. This means that when using threading, only one thread can execute at a time. However, similar to Node’s worker_threads, Python offers the multiprocessing module, which allows you to create separate processes, each with its own Python interpreter and memory space. This allows you to bypass the GIL and take full advantage of multiple CPU cores.

That said, using multiprocessing comes at the cost of increased memory usage and the complexity of managing inter-process communication (IPC), especially in apps that require sharing state or data between processes. For simple tasks like batch processing images, IPC is often unnecessary, as the main process can communicate directly with the worker processes.

One caveat with Python's multiprocessing is that its parallelism is bounded by the number of CPU cores on the machine; a machine with only a few cores will see correspondingly little speedup.

On the positive side, I love how concise and readable Python code is; that is one of the truly beautiful things about the language, in my opinion.

Multiprocessing Example in Python

import multiprocessing

def download_image(file):
    return f"Downloaded image: {file}"

def process_image(file):
    return f"Processed image: {file}"

def upload_image(file):
    return f"Uploaded image: {file}"

if __name__ == '__main__':
    images = ['image1.svg', 'image2.svg', 'image3.svg']

    # Create a pool of workers; each map() call fans the stage out
    # across the pool and blocks until every image has completed it.
    with multiprocessing.Pool(processes=3) as pool:
        download_results = pool.map(download_image, images)
        process_results = pool.map(process_image, images)
        upload_results = pool.map(upload_image, images)

    # Collect and print results
    for result in download_results + process_results + upload_results:
        print(result)

Go Concurrency Model

Go is designed for concurrency from the ground up. It uses goroutines, which are lightweight threads managed by the Go runtime, and channels, which are used for communication between goroutines. Goroutines are much lighter than OS threads, and the Go runtime can schedule thousands of them on a single OS thread. This allows Go to handle a large number of concurrent tasks with minimal overhead. The Go runtime manages the scheduling and resource allocation for goroutines, making it easy to scale the number of concurrent tasks without worrying about the underlying implementation details. This is one area where Go really stands out.

The batch conversion of hundreds of thousands of SVG images to WebP is a CPU-bound task, and while Node.js and Python can handle it, they are not only a lot slower, but also require a lot more setup and orchestration, creating more opportunities for bugs - something I would like to avoid.

Goroutines Example in Go

package main

import (
    "fmt"
    "sync"
    "time"
)

type ImageFile struct {
    Path string
}

func downloadImage(file ImageFile) string {
    time.Sleep(1 * time.Second) // Simulate downloading the image
    return fmt.Sprintf("Downloaded image: %s", file.Path)
}

func processImage(file ImageFile) string {
    time.Sleep(2 * time.Second) // Simulate processing the image
    return fmt.Sprintf("Processed image: %s", file.Path)
}

func uploadImage(file ImageFile) string {
    time.Sleep(1 * time.Second) // Simulate uploading the image
    return fmt.Sprintf("Uploaded image: %s", file.Path)
}

// downloadWorker downloads each image, reports the result, and
// forwards the file to the processing stage.
func downloadWorker(downloadQueue <-chan ImageFile, processQueue chan<- ImageFile, resultChan chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    for file := range downloadQueue {
        resultChan <- downloadImage(file)
        processQueue <- file
    }
}

// processWorker converts each downloaded image and forwards it
// to the upload stage.
func processWorker(processQueue <-chan ImageFile, uploadQueue chan<- ImageFile, resultChan chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    for file := range processQueue {
        resultChan <- processImage(file)
        uploadQueue <- file
    }
}

// uploadWorker uploads each processed image.
func uploadWorker(uploadQueue <-chan ImageFile, resultChan chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    for file := range uploadQueue {
        resultChan <- uploadImage(file)
    }
}

func main() {
    images := []ImageFile{
        {"image1.svg"},
        {"image2.svg"},
        {"image3.svg"},
    }

    // Buffered channels connect the three pipeline stages.
    downloadQueue := make(chan ImageFile, len(images))
    processQueue := make(chan ImageFile, len(images))
    uploadQueue := make(chan ImageFile, len(images))
    resultChan := make(chan string, len(images)*3)

    // One WaitGroup per stage, so each queue is closed only after
    // the stage feeding it has finished.
    var downloadWG, processWG, uploadWG sync.WaitGroup

    downloadWG.Add(1)
    go downloadWorker(downloadQueue, processQueue, resultChan, &downloadWG)

    processWG.Add(1)
    go processWorker(processQueue, uploadQueue, resultChan, &processWG)

    uploadWG.Add(1)
    go uploadWorker(uploadQueue, resultChan, &uploadWG)

    // Queue up the images, then close each stage's channel as the
    // stage upstream of it drains.
    for _, img := range images {
        downloadQueue <- img
    }
    close(downloadQueue)

    go func() {
        downloadWG.Wait()
        close(processQueue)
    }()
    go func() {
        processWG.Wait()
        close(uploadQueue)
    }()
    go func() {
        uploadWG.Wait()
        close(resultChan)
    }()

    // Print results as they arrive.
    for result := range resultChan {
        fmt.Println(result)
    }
}

Comparison Summary

| Feature | Python | Go | Node.js |
| --- | --- | --- | --- |
| Concurrency model | Threads (limited by GIL) | Goroutines (no GIL) | Event loop (single-threaded) |
| Parallel CPU usage | Not with threads | Yes, across all cores | Not natively; needs worker threads |
| True parallelism | Via multiprocessing | With goroutines | Limited to I/O tasks; needs workers for CPU tasks |
| Threading overhead | High (processes) | Low (goroutines) | Low (event loop; no threads by default) |
| Scalability | Limited | Extremely high | Moderate (with worker threads or clustering) |

The only way to truly determine which language performs best is to run identical tests in each. However, due to time constraints, I had to make an educated guess. I chose Go for the CLI tool because of its efficient concurrency model, where you can run as many goroutines as needed with minimal overhead. In contrast, Node requires a new V8 instance for each worker thread, leading to higher memory consumption. Python’s multiprocessing spawns full processes with their own memory space, resulting in even more memory usage and system load. Go minimizes these issues, making it far more scalable and efficient for CPU-bound tasks, even when processing large numbers of tasks concurrently.

For a baseline, I ran an initial test with a single-threaded Go program, which took 7 minutes and 17 seconds to process 4,500 images. Extrapolating that to 500,000 images, it would take approximately 13 hours—a solid benchmark for demonstrating the performance benefits of Go’s concurrency model.

I started with just 10 goroutines, expecting the run to take several minutes. In reality, it took only 27 seconds to process 4,500 images. At that rate, processing 500,000 images would take about 50 minutes. With 50 worker goroutines, the processing time could be reduced to around 10 minutes. (Disclosure: I have not run the full test yet, as I was diverted by other priorities.)

| Metric | Value |
| --- | --- |
| Worker Pool Size | 10 |
| Start Time | 2025-05-02 12:12:39.815977 -0400 EDT |
| End Time | 2025-05-02 12:13:07.200567 -0400 EDT |
| Elapsed Time | 27.38 seconds |
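
For reference, the worker-pool pattern behind these numbers looks roughly like the minimal sketch below. Here convertImage is a stand-in for the real per-image pipeline, and poolSize is the knob being tuned above:

package main

import (
    "fmt"
    "sync"
)

// convertImage stands in for the full per-image step
// (S3 download, conversion, upload, database insert).
func convertImage(key string) string {
    return fmt.Sprintf("converted %s", key)
}

func main() {
    const poolSize = 10 // Raising this scales throughput, up to a point.

    jobs := make(chan string)
    var wg sync.WaitGroup

    // Start a fixed pool of workers, all draining the same job channel.
    for i := 0; i < poolSize; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for key := range jobs {
                fmt.Println(convertImage(key))
            }
        }()
    }

    // Feed the pool; in the real tool the keys come from the database.
    for i := 1; i <= 100; i++ {
        jobs <- fmt.Sprintf("image%d.svg", i)
    }
    close(jobs)

    wg.Wait()
}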

Conclusion

Tackling the 500,000-image conversion problem highlighted some key insights about concurrency, performance, and system scalability. Go's goroutines provided an efficient solution, enabling high concurrency and fast processing times. At the measured rate, Go's concurrency model lets us process all 500,000 images in under an hour. This experience reinforced the importance of carefully evaluating your tools against specific performance needs, and of considering every aspect of system design when building a solution.

The Source Code

You can find the source code for the CLI tool on GitHub.


Up Next: Secure API Access with Platform-Level Authentication

In the next blog post, I will cover how I handled secure API access with platform-level authentication. Also in the works is a post on the custom event-driven plugin architecture that the site uses.