Ever wondered what really happens when you upload a file to Amazon S3 or a similar object storage system? 🤔

Let’s break it down with clear concepts and step-by-step architecture.


🏷️ Key Concepts to Know

🔹 Bucket: A logical container for objects, identified by a globally unique name. You can think of it as a top-level folder.

🔹 Object — A piece of data stored inside a bucket. It consists of:

  • 📄 Object Data: The actual content (file, image, video, etc.)
  • 🧾 Metadata: Key-value pairs describing the object (e.g., name, ID, content type)

🛑 Note:

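  • A bucket holds no payload bytes itself; it exists only as a record in the metadata store
  • Objects contain both metadata and data
  • Object data is immutable: to change it, you upload a new object or version. In this design, metadata lives in a separate store and can be updated on its own (S3 itself requires copying the object to change its metadata)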

📍Example path: /bucket-to-share/script.txt

Here, bucket-to-share is the bucket (a metadata record only), and script.txt is the object name that identifies your data inside it.
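
To make the bucket/object split concrete, here is a minimal in-memory sketch. The names metadata_store, data_store, and the attributes field are assumptions made for illustration; they are not S3's real internals.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # A bucket is only a metadata record; it holds no payload bytes.
    bucket_id: str
    name: str


@dataclass
class ObjectMetadata:
    # Describes an object and points at its payload via object_id.
    object_id: str
    bucket_id: str
    object_name: str
    attributes: dict = field(default_factory=dict)


# Hypothetical in-memory stand-ins for the two stores.
metadata_store = {}  # "/bucket/name" path -> ObjectMetadata
data_store = {}      # object_id -> immutable bytes

bucket = Bucket(bucket_id=str(uuid.uuid4()), name="bucket-to-share")

# The payload is stored under a generated ID, separate from its metadata.
payload = b"hello\n"
object_id = str(uuid.uuid4())
data_store[object_id] = payload

path = f"/{bucket.name}/script.txt"
metadata_store[path] = ObjectMetadata(
    object_id=object_id,
    bucket_id=bucket.bucket_id,
    object_name="script.txt",
    attributes={"content_type": "text/plain"},
)

# Resolving a path: look up metadata first, then fetch bytes by object_id.
meta = metadata_store[path]
print(meta.object_name, data_store[meta.object_id])  # script.txt b'hello\n'
```

Notice that the Bucket record has no data field at all: the bytes are reachable only through an object's metadata entry.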


🔁 The Upload Flow (Step-by-Step)

Here’s a simplified view of what happens when you upload a file to an S3-style object store:

1️⃣ Client creates a bucket

→ The client sends an HTTP PUT request to create bucket-to-share.

→ The request is forwarded to the API service.

2️⃣ Authorization

→ API service verifies permissions via IAM (Identity and Access Management).

3️⃣ Bucket metadata creation

→ API service records the new bucket in the metadata store.

✅ Success response is returned.
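
Steps 1 to 3 can be sketched as a single request handler. This is a toy model under stated assumptions: iam_permissions, bucket_table, the CREATE_BUCKET permission name, and handle_create_bucket are hypothetical names, and the status codes simply mirror common HTTP conventions.

```python
import uuid

# Hypothetical IAM table: user -> set of granted permissions.
iam_permissions = {"alice": {"CREATE_BUCKET"}}

# Hypothetical metadata store: bucket name -> bucket record.
bucket_table = {}


def handle_create_bucket(user, bucket_name):
    """Handle `PUT /<bucket_name>` and return (status_code, message)."""
    # Step 2: authorization via the IAM lookup.
    if "CREATE_BUCKET" not in iam_permissions.get(user, set()):
        return 403, "AccessDenied"
    # Bucket names are globally unique, so a duplicate is rejected.
    if bucket_name in bucket_table:
        return 409, "BucketAlreadyExists"
    # Step 3: record the new bucket in the metadata store.
    bucket_table[bucket_name] = {
        "bucket_id": str(uuid.uuid4()),
        "name": bucket_name,
        "owner": user,
    }
    return 200, "OK"


print(handle_create_bucket("alice", "bucket-to-share"))  # (200, 'OK')
print(handle_create_bucket("alice", "bucket-to-share"))  # (409, 'BucketAlreadyExists')
print(handle_create_bucket("bob", "another-bucket"))     # (403, 'AccessDenied')
```

No object data is touched at any point here: creating a bucket is purely a metadata operation.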

4️⃣ File upload begins

→ Client uploads script.txt via another PUT request.

5️⃣ Validation again

→ The API service checks that the user has WRITE access to the target bucket.

6️⃣ Object data storage

→ The API service sends the payload to the data store, which persists it.

→ The data store generates a UUID for the object and returns it.

7️⃣ Object metadata registration

→ API creates an entry in the metadata database with details:

  • object_id (UUID)
  • bucket_id
  • object_name, and more.
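
Steps 4 to 7 can be sketched the same way. Everything below is a hypothetical in-memory model: acl, bucket_table, data_store, object_table, and the two functions are illustrative names, not a real S3 API.

```python
import uuid

# Hypothetical in-memory stand-ins for IAM and the two stores.
acl = {("alice", "bucket-to-share"): {"WRITE"}}
bucket_table = {"bucket-to-share": {"bucket_id": "b-001"}}
data_store = {}    # object_id -> payload bytes
object_table = {}  # (bucket_id, object_name) -> metadata row


def data_store_put(payload):
    """Persist the payload and return a freshly generated UUID for it."""
    object_id = str(uuid.uuid4())
    data_store[object_id] = bytes(payload)
    return object_id


def handle_put_object(user, bucket_name, object_name, payload):
    """Handle `PUT /<bucket_name>/<object_name>`; return (status, object_id)."""
    # Step 5: the user needs WRITE access to the target bucket.
    if "WRITE" not in acl.get((user, bucket_name), set()):
        return 403, None
    bucket = bucket_table.get(bucket_name)
    if bucket is None:
        return 404, None
    # Step 6: store the payload; the data store hands back a UUID.
    object_id = data_store_put(payload)
    # Step 7: register the object in the metadata database.
    object_table[(bucket["bucket_id"], object_name)] = {
        "object_id": object_id,
        "bucket_id": bucket["bucket_id"],
        "object_name": object_name,
    }
    return 200, object_id


status, object_id = handle_put_object(
    "alice", "bucket-to-share", "script.txt", b"echo hi\n"
)
print(status, data_store[object_id])  # 200 b'echo hi\n'
```

The metadata row never holds the bytes themselves; it only carries the object_id that points into the data store.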

🧠 Why It Matters

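Separating metadata from object data, authorizing every request, and keeping object data immutable gives object storage:

  • 📈 Scalability: the metadata store and the data store can grow independently
  • 🔐 Strong security controls: every request is checked before it touches data
  • 🔄 Easy versioning & lifecycle management: a new version is just a new immutable blob
  • ☁️ A cloud-native architecture suited to cold data, backups, and media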
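
To see why immutability makes versioning and lifecycle rules simple, consider this rough sketch. The versions table and the functions put_version, get_object, and expire_old_versions are made-up names for illustration, not how any particular service implements it.

```python
import uuid

data_store = {}  # object_id -> immutable bytes
versions = {}    # object_name -> list of object_ids, oldest first


def put_version(object_name, payload):
    """Write a new immutable blob and append it to the version list."""
    object_id = str(uuid.uuid4())
    data_store[object_id] = bytes(payload)
    versions.setdefault(object_name, []).append(object_id)
    return object_id


def get_object(object_name, version=-1):
    """Return the bytes of one version (the latest by default)."""
    return data_store[versions[object_name][version]]


def expire_old_versions(object_name, keep=1):
    """Lifecycle rule sketch: drop all but the newest `keep` versions."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ids = versions[object_name]
    for object_id in ids[:-keep]:
        del data_store[object_id]
    versions[object_name] = ids[-keep:]


put_version("script.txt", b"v1")
put_version("script.txt", b"v2")
print(get_object("script.txt"))     # b'v2'
print(get_object("script.txt", 0))  # b'v1'
expire_old_versions("script.txt", keep=1)
print(len(versions["script.txt"]))  # 1
```

Because existing blobs are never modified, an "update" only appends a pointer in the metadata, and cleanup only deletes blobs that no pointer references anymore.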

💬 Have you worked with S3 or any S3-compatible services like MinIO, DigitalOcean Spaces, or Backblaze B2?

Would love to hear about your experiences or pain points!