I recently worked on improving chunking strategies for a RAG (Retrieval-Augmented Generation) system built on Slack conversations, and I wanted to share my approach, especially for those dealing with chaotic, real-world conversational data.
When you’re trying to retrieve relevant context from Slack conversations, naive chunking can lead to fragmented or unhelpful responses. So I combined three different chunking strategies to make the data much richer and improve retrieval quality.
By doing this, I saw about a 5–6% increase in retrieval accuracy, and interestingly, the system gets even more accurate as more data is added. 📈
Let’s dive in! 🧩
🧩 The Problem: Slack Conversations Are Messy
Slack messages are fast-paced and fragmented:
- Conversations happen across multiple channels.
- Threads are scattered.
- Messages are often short and informal.
- Context gets lost easily if you chunk blindly.
My goal was to feed high-quality chunks into the vector store so retrieval would surface better context for the RAG pipeline. So I experimented with multiple chunking techniques to capture as much context as possible in each chunk.
🧠 Strategy 1: Token-Based Chunking (Contextual Enrichment)
The first thing I implemented was token-based chunking.
Instead of chunking by a fixed number of messages, I chunked by token count (e.g., ~500 tokens per chunk). This ensured:
- Each chunk was dense with meaningful information.
- I avoided splitting messages awkwardly.
- I could control the input size for my LLM efficiently.
Bonus: Token-based chunking allowed me to enrich each chunk with metadata (timestamps, user IDs, thread info) while staying within token limits.
📝 Why it matters:
Token limits are very real when you’re dealing with LLMs. Efficient token-based chunking helps maximize signal while respecting those limits.
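For illustration, here's a minimal sketch of the idea in Python. It assumes tiktoken for token counting and Slack-export-style message dicts with ts, user, and text fields; the chunk_by_tokens helper and field names are illustrative, so adapt them to however you pull data from the Slack API.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(messages, max_tokens=500):
    """Group Slack messages into roughly max_tokens-sized chunks,
    never splitting an individual message, and prefix each line
    with lightweight metadata (timestamp + user)."""
    chunks, current, current_tokens = [], [], 0
    for msg in messages:
        line = f"[{msg['ts']}] {msg['user']}: {msg['text']}"
        n_tokens = len(enc.encode(line))
        # Close the current chunk if adding this message would overflow it.
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks
```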
⏱️ Strategy 2: Timestamp-Based Chunking (5-Minute Windows)
Slack conversations often happen in bursts.
To capture that natural rhythm, I implemented timestamp-based chunking, grouping all messages within a 5-minute window.
This helped me capture:
- Natural conversation flow.
- Real-time back-and-forth.
- Standalone short discussions.
📝 Why it matters:
By keeping chunks within natural conversational timeframes, retrieval felt more human. When the model retrieved context, it got the full flow of that moment in time.
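A rough sketch of the 5-minute windowing, assuming each message carries Slack's ts field (a Unix timestamp string) and using an illustrative chunk_by_time helper:

```python
def chunk_by_time(messages, window_seconds=5 * 60):
    """Open a new chunk whenever a message falls outside the 5-minute
    window started by the chunk's first message, so each chunk covers
    one conversational burst."""
    chunks, current, window_start = [], [], None
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        ts = float(msg["ts"])
        if window_start is None or ts - window_start > window_seconds:
            if current:
                chunks.append(current)
            current, window_start = [], ts
        current.append(msg)
    if current:
        chunks.append(current)
    return chunks
```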
🧵 Strategy 3: Thread-Based Chunking
Slack threads are goldmines of context.
To avoid fragmenting them, I kept each entire thread together as a single chunk.
This way:
- Every reply and reaction in a thread stayed together.
- I avoided splitting up follow-up questions and answers.
- Models could "read" the whole conversation without gaps.
📝 Why it matters:
Thread-based chunking keeps related ideas intact, which is critical for meaningful retrieval in Q&A scenarios.
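Slack marks replies with a thread_ts field pointing at the parent message's timestamp, which makes this grouping straightforward. A sketch (again assuming export-style message dicts):

```python
from collections import defaultdict

def chunk_by_thread(messages):
    """Group messages by their thread root so each thread becomes one chunk.
    Messages without a thread_ts are treated as their own root."""
    threads = defaultdict(list)
    for msg in messages:
        threads[msg.get("thread_ts", msg["ts"])].append(msg)
    # Keep each thread in chronological order.
    return [sorted(msgs, key=lambda m: float(m["ts"]))
            for msgs in threads.values()]
```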
📊 The Impact: 5–6% Accuracy Boost (And It Scales!)
By combining these three strategies, my Slack RAG system became noticeably smarter:
- ✅ More relevant context retrieved.
- ✅ Better grounding for generation tasks.
- ✅ Less noise in retrieval results.
I measured about a 5–6% increase in retrieval accuracy, and I noticed something exciting:
The accuracy improves even further as the dataset grows.
This makes sense:
- The richer the chunks, the better your embeddings.
- As you add more data, there’s a higher chance of finding meaningful matches.
- Chunking effectively compounds its benefits over time.
If you’re scaling your data ingestion, this is an optimization that keeps giving back!
🚀 Takeaways for Your RAG System
If you’re building any RAG system, especially with noisy chat data, I highly recommend combining chunking strategies.
Here’s your actionable playbook:
- ✅ Token-based chunking to manage LLM input limits efficiently.
- ✅ Timestamp chunking to preserve natural conversation flow.
- ✅ Thread chunking to keep full discussions intact.
- ✅ And remember: the bigger your dataset, the more these strategies shine! 📈
Experiment and find the right balance for your use case.
💡 Pro Tip
Consider layering these strategies together:
- First, chunk by thread.
- Then, within threads, chunk by token count if a thread runs past your token budget.
- For non-threaded conversations, use timestamp-based chunking to group messages naturally.
It’s a multi-step process, but the quality of your retrieval will thank you.
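Putting the three sketches above together, the layered pipeline might look roughly like this. It's a sketch rather than a drop-in implementation, and chunk_by_thread, chunk_by_time, and chunk_by_tokens are the illustrative helpers from earlier:

```python
def build_chunks(messages):
    """Layered chunking: keep threads whole, token-split oversized ones,
    and time-window the rest of the channel chatter."""
    threaded = [m for m in messages if "thread_ts" in m]
    unthreaded = [m for m in messages if "thread_ts" not in m]

    chunks = []
    # 1. Each thread stays together; long threads get split by token budget.
    for thread in chunk_by_thread(threaded):
        chunks.extend(chunk_by_tokens(thread))
    # 2. Non-threaded messages are grouped into 5-minute bursts first.
    for window in chunk_by_time(unthreaded):
        chunks.extend(chunk_by_tokens(window))
    return chunks
```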
💬 What’s Next?
I’m thinking about pushing this even further by exploring:
- Hybrid chunking (e.g., timestamp + thread + token cap).
- Sentiment-aware chunking (grouping emotional bursts together).
- Speaker role-based chunking (grouping moderator/admin messages separately).
Would love to hear your thoughts — how are you handling chunking in your RAG systems? Drop a comment below! 🚀