At Digdep, our goal is to help people find supplements that actually work — backed not just by claims, but by scientific research and user-reported outcomes.
The catch? We had more than 30,000 product–condition combinations (e.g. Vitamin A for acne, Omega-3 for ADHD) and needed to generate trustworthy, dynamic, evolving pages — without hiring a hundred content writers.
So we did what any backend-leaning team would do:
We built a pipeline-first, AI-assisted content system, structured around research data, user reviews, and intent-based modules.
🧱 Architecture Overview
We split the problem into three systems:
- Content Orchestration Layer: a scheduled ETL engine (Airflow + custom workers; a DAG sketch follows this list) that:
  - Fetches new research data from PubMed, clinical trial APIs, and internal annotations
  - Pulls structured review data from reputable sellers
  - Normalizes supplement metadata (dosage, source, purity, etc.)
- ML/NLP Layer: this is where the raw data gets meaning:
  - Clinical research is chunked, embedded (SBERT), and summarized using a hybrid of GPT-4 and in-house fine-tuned classifiers
  - Reviews are clustered by condition and sentiment, scored, and tagged (e.g. “2-week results”, “used with zinc”)
  - FAQ candidates are extracted from natural-language queries, Reddit, Quora, and Digdep’s internal search logs
- Headless CMS + API Delivery: the processed content lives in a GraphQL-accessible store (we use Strapi, heavily extended)
  - Each page is assembled dynamically on the frontend via metadata-driven composition: which sections to show, in what order, and how they’re prioritized
  - Content updates are non-destructive and versioned — users get fresh insights without pages losing their SEO/indexing
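To make the orchestration layer concrete, here is a minimal sketch of what a daily DAG could look like, assuming Airflow 2.4+ and the TaskFlow API. The task names and stubbed payloads are illustrative, not our production code.

```python
# Minimal sketch of a daily orchestration DAG (illustrative only).
# Assumes Airflow 2.4+ with the TaskFlow API; task bodies are stubs.
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def supplement_content_pipeline():

    @task
    def fetch_research():
        # Fetch new abstracts and trial records (PubMed, clinical-trial APIs).
        return [{"pmid": "12345", "abstract": "...", "cohort_size": 120}]

    @task
    def fetch_reviews():
        # Pull structured review data from seller feeds.
        return [{"product": "omega-3", "text": "Helped my focus in 2 weeks"}]

    @task
    def normalize(research, reviews):
        # Normalize supplement metadata (dosage, source, purity) into one payload.
        return {"research": research, "reviews": reviews}

    @task
    def publish(payload):
        # Hand the normalized payload to the ML/NLP layer and the CMS store.
        print(f"publishing {len(payload['research'])} research records")

    publish(normalize(fetch_research(), fetch_reviews()))


supplement_content_pipeline()
```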
🧠 AI Where It Makes Sense
We were careful not to overfit with LLMs. Here’s how we actually use them:
- Summarization: input = abstract + results + cohort size; output = a two-sentence summary with risk qualifiers
- Semantic clustering: we embed every user review and map it into symptom categories and conditions (some users don’t say “acne” — they say “skin bumps”); a sketch follows this list
- Question synthesis: LLMs turn query logs into human-readable FAQs, then we pass them through filters for duplication, bias, and hallucination
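For the clustering step, here is a rough sketch of mapping free-text reviews onto condition labels with SBERT embeddings via sentence-transformers. The model name, condition list, and similarity floor are illustrative, not our tuned values.

```python
# Sketch: map free-text reviews onto condition categories via SBERT embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

conditions = ["acne", "ADHD", "insomnia", "joint pain"]
reviews = [
    "My skin bumps cleared up after three weeks",   # never says "acne"
    "Finally sleeping through the night",
]

cond_emb = model.encode(conditions, convert_to_tensor=True)
rev_emb = model.encode(reviews, convert_to_tensor=True)

# Cosine similarity between every review and every condition label.
scores = util.cos_sim(rev_emb, cond_emb)

for review, row in zip(reviews, scores):
    best = int(row.argmax())
    score = float(row[best])
    if score >= 0.35:  # illustrative confidence floor
        print(f"{review!r} -> {conditions[best]} ({score:.2f})")
    else:
        print(f"{review!r} -> unmapped")
```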
We built a confidence scoring layer to decide when to show or suppress LLM output. If the model’s not sure, it defers to rules or hides the result.
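In practice the gate looks something like the sketch below; the thresholds, score composition, and fallback logic are simplified placeholders rather than our actual values.

```python
# Illustrative sketch of the gate that decides whether an LLM-generated block
# is rendered, replaced by a rule-based fallback, or hidden entirely.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMBlock:
    text: str
    confidence: float                     # composite confidence score
    rule_based_fallback: Optional[str] = None


SHOW_THRESHOLD = 0.80      # illustrative cut-offs, tuned per module type
FALLBACK_THRESHOLD = 0.50


def resolve_block(block: LLMBlock) -> Optional[str]:
    """Return the content to render, or None to suppress the module."""
    if block.confidence >= SHOW_THRESHOLD:
        return block.text
    if block.confidence >= FALLBACK_THRESHOLD and block.rule_based_fallback:
        return block.rule_based_fallback
    return None  # model isn't sure and no safe fallback: hide the result


print(resolve_block(LLMBlock("Vitamin A may reduce mild acne in adults.", 0.91)))
```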
📦 How Pages Are Built
Each product page is made of composable modules, injected via API:
- modules from the ML pipeline
- modules from review tagging
- modules from research weighting
- modules generated dynamically
- modules based on the co-purchase graph
The backend controls what renders, and the frontend just assembles.
We also exposed a JSON manifest for each page so QA/devs can debug pipeline decisions without inspecting raw DB rows.
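Here is roughly what such a manifest and the composition step could look like; the field names, module types, and confidence values are hypothetical, for illustration only.

```python
# Illustrative per-page manifest (hypothetical field names) plus the
# composition step that filters and orders modules before rendering.
import json

manifest = {
    "page": "vitamin-a-for-acne",
    "version": 17,
    "modules": [
        {"type": "research_summary", "source": "ml_pipeline", "priority": 1, "confidence": 0.88},
        {"type": "review_highlights", "source": "review_tagging", "priority": 2, "confidence": 0.91},
        {"type": "faq", "source": "query_synthesis", "priority": 3, "confidence": 0.74},
        {"type": "related_products", "source": "co_purchase_graph", "priority": 4, "confidence": 0.95},
    ],
}


def compose(manifest: dict, min_confidence: float = 0.7) -> list[dict]:
    """Drop low-confidence modules and return the render order for the frontend."""
    visible = [m for m in manifest["modules"] if m["confidence"] >= min_confidence]
    return sorted(visible, key=lambda m: m["priority"])


print(json.dumps(compose(manifest), indent=2))
```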
📊 Feedback Loops
This system let us do things we couldn’t before:
- Trigger model re-training when new research changes a supplement’s score
- Use search and review logs to automatically discover emerging use-cases (e.g. berberine + PCOS suddenly rising); a sketch follows this list
- Log anonymized click paths to see which modules drive trust, then tune the page structure accordingly
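For the emerging use-case discovery mentioned above, a simplified version of the idea looks like this; the query counts, growth factor, and volume floor are made up for illustration.

```python
# Sketch: flag rising supplement+condition query pairs from search logs
# by comparing week-over-week counts against an illustrative growth threshold.
from collections import Counter

last_week = Counter({("berberine", "pcos"): 40, ("omega-3", "adhd"): 310})
this_week = Counter({("berberine", "pcos"): 180, ("omega-3", "adhd"): 325})

MIN_QUERIES = 100      # ignore tiny volumes
GROWTH_FACTOR = 2.0    # illustrative threshold for "suddenly rising"

emerging = [
    pair
    for pair, count in this_week.items()
    if count >= MIN_QUERIES and count >= GROWTH_FACTOR * max(last_week[pair], 1)
]

print(emerging)  # [('berberine', 'pcos')] -> candidate for a new page/module
```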
🚀 Results & Takeaways
- We scaled to thousands of pages within two weeks, without bottlenecks
- Pages adapt over time as new data, reviews, and research arrive
- Everything is traceable, explainable, and testable — no “black box content”
If you’re building content at scale in a high-trust domain (health, legal, finance), the combination of structured pipelines and LLM-assisted augmentation is a sweet spot. It’s not sexy, but it’s robust.
💬 Curious how we handle edge cases (e.g. conflicting research, multi-supplement effects), cold-start products, or data validation? Drop a question below — always happy to nerd out.