I recently worked on improving chunking strategies for a RAG (Retrieval-Augmented Generation) system built on Slack conversations, and I wanted to share my approach, especially for those dealing with chaotic, real-world conversational data.

When you’re trying to retrieve relevant context from Slack conversations, naive chunking can lead to fragmented or unhelpful responses. So I combined three different chunking strategies to make the data much richer and improve retrieval quality.

By doing this, I saw roughly a 5–6% increase in retrieval accuracy, and interestingly, the system gets even more accurate as more data is added. 📈

Let’s dive in! 🧩


🧩 The Problem: Slack Conversations Are Messy

Slack messages are fast-paced and fragmented:

  • Conversations happen across multiple channels.
  • Threads are scattered.
  • Messages are often short and informal.
  • Context gets lost easily if you chunk blindly.

My goal was to feed high-quality chunks into the vector store for better context retrieval, so I experimented with multiple chunking techniques to capture as much context as possible in each chunk.


🧠 Strategy 1: Token-Based Chunking (Contextual Enrichment)

The first thing I implemented was token-based chunking.

Instead of chunking by a fixed number of messages, I chunked by token count (e.g., ~500 tokens per chunk). This ensured:

  • Each chunk was dense with meaningful information.
  • I avoided splitting messages awkwardly.
  • I could control the input size for my LLM efficiently.

Bonus: Token-based chunking allowed me to enrich each chunk with metadata (timestamps, user IDs, thread info) while staying within token limits.
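
Here's a minimal sketch of the idea, not my production code: it assumes messages arrive as dicts roughly shaped like the Slack API payload (with "text", "user", and "ts" fields) and uses tiktoken purely as a token counter.

```python
# Minimal sketch of token-based chunking with inline metadata.
# Assumes Slack-style message dicts: {"text": ..., "user": ..., "ts": ...}.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_TOKENS = 500  # target chunk size; tune for your embedding/LLM limits

def chunk_by_tokens(messages, max_tokens=MAX_TOKENS):
    chunks, current, current_tokens = [], [], 0
    for msg in messages:
        # Prepend lightweight metadata so each chunk carries its own context.
        line = f'[{msg["ts"]}] {msg["user"]}: {msg["text"]}'
        n_tokens = len(enc.encode(line))
        # Close the current chunk before it overflows; never split a message.
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks
```

The key detail: a message never gets split across chunks; the chunk just closes early instead.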

📝 Why it matters:

Token limits are very real when you’re dealing with LLMs. Efficient token-based chunking helps maximize signal while respecting those limits.


⏱️ Strategy 2: Timestamp-Based Chunking (5-Minute Windows)

Slack conversations often happen in bursts.

To capture that natural rhythm, I implemented timestamp-based chunking, grouping all messages within a 5-minute window.

This helped me capture:

  • Natural conversation flow.
  • Real-time back-and-forth.
  • Standalone short discussions.
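
A rough sketch of the windowing logic, assuming each message carries a Slack-style Unix timestamp string in its "ts" field (e.g. "1714000000.000100"):

```python
# Minimal sketch of 5-minute-window chunking over Slack-style messages.
WINDOW_SECONDS = 5 * 60

def chunk_by_time_window(messages, window=WINDOW_SECONDS):
    chunks, current, window_start = [], [], None
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        ts = float(msg["ts"])
        if window_start is None:
            window_start = ts
        # Start a new chunk once a message falls outside the current window.
        if ts - window_start > window:
            chunks.append(current)
            current, window_start = [], ts
        current.append(msg)
    if current:
        chunks.append(current)
    return chunks
```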

📝 Why it matters:

By keeping chunks within natural conversational timeframes, retrieval felt more human. When the model retrieved context, it got the full flow of that moment in time.


🧵 Strategy 3: Thread-Based Chunking

Slack threads are goldmines of context.

To avoid fragmenting them, I kept each entire thread together as a single chunk.

This way:

  • Every reply and reaction in a thread stayed together.
  • I avoided splitting up follow-up questions and answers.
  • Models could "read" the whole conversation without gaps.
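
Here's a sketch of how that grouping can work, assuming threaded replies carry a "thread_ts" field matching the parent message's "ts" (as in the Slack API); the helper names are my own, not a library API.

```python
# Minimal sketch of thread-based chunking: one chunk per thread,
# with non-threaded messages handed off to the other strategies.
from collections import defaultdict

def chunk_by_thread(messages):
    threads = defaultdict(list)
    standalone = []
    for msg in messages:
        thread_key = msg.get("thread_ts")
        if thread_key:
            threads[thread_key].append(msg)   # parent + replies share thread_ts
        else:
            standalone.append(msg)            # handle via timestamp/token chunking
    # One chunk per thread, replies kept in chronological order.
    thread_chunks = [
        sorted(msgs, key=lambda m: float(m["ts"])) for msgs in threads.values()
    ]
    return thread_chunks, standalone
```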

📝 Why it matters:

Thread-based chunking keeps related ideas intact, which is critical for meaningful retrieval in Q&A scenarios.


📊 The Impact: 5–6% Accuracy Boost (And It Scales!)

By combining these three strategies, my Slack RAG system became noticeably smarter:

  • ✅ More relevant context retrieved.
  • ✅ Better grounding for generation tasks.
  • ✅ Less noise in retrieval results.

I measured about a 5–6% increase in retrieval accuracy, and I noticed something exciting:

The accuracy improves even further as the dataset grows.

This makes sense:

  • The richer the chunks, the better your embeddings.
  • As you add more data, there’s a higher chance of finding meaningful matches.
  • Effective chunking compounds its benefits over time.

If you’re scaling your data ingestion, this is an optimization that keeps giving back!


🚀 Takeaways for Your RAG System

If you’re building any RAG system, especially with noisy chat data, I highly recommend combining chunking strategies.

Here’s your actionable playbook:

  • Token-based chunking to manage LLM input limits efficiently.
  • Timestamp chunking to preserve natural conversation flow.
  • Thread chunking to keep full discussions intact.
  • ✅ And remember: the bigger your dataset, the more these strategies shine! 📈

Experiment and find the right balance for your use case.


💡 Pro Tip

Consider layering these strategies together:

  • First, chunk by thread.
  • Then, within threads, chunk by token count if they’re too big.
  • For non-threaded conversations, use timestamp-based chunking to group messages naturally.

It’s a multi-step process, but the quality of your retrieval will thank you.
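
Stitched together, the layering might look like this sketch, reusing the hypothetical helpers from earlier (chunk_by_thread, chunk_by_time_window, chunk_by_tokens):

```python
# Sketch of the layered pipeline: threads first, token-split oversized threads,
# timestamp windows (plus a token cap) for everything non-threaded.
def build_chunks(messages, max_tokens=500):
    thread_chunks, standalone = chunk_by_thread(messages)

    final_chunks = []
    # 1. Threads stay whole unless they blow past the token budget,
    #    in which case they get re-split by token count.
    for thread in thread_chunks:
        final_chunks.extend(chunk_by_tokens(thread, max_tokens=max_tokens))

    # 2. Non-threaded chatter is grouped into 5-minute windows,
    #    then each window is capped by token count as a safety net.
    for window in chunk_by_time_window(standalone):
        final_chunks.extend(chunk_by_tokens(window, max_tokens=max_tokens))

    return final_chunks
```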


💬 What’s Next?

I’m thinking about pushing this even further by exploring:

  • Hybrid chunking (e.g., timestamp + thread + token cap).
  • Sentiment-aware chunking (grouping emotional bursts together).
  • Speaker role-based chunking (grouping moderator/admin messages separately).

Would love to hear your thoughts — how are you handling chunking in your RAG systems? Drop a comment below! 🚀