📌 Problem Statement: Video Overload, Information Lost

In today’s digital age, we consume a massive amount of video content—tutorials, lectures, interviews, podcasts, and more. But revisiting a 30-minute YouTube video just to find one key idea? That’s inefficient.

Wouldn’t it be powerful if you could just ask a question and instantly get an answer from a video?

That’s exactly the problem I tackled in my recent Kaggle notebook using Generative AI + Embeddings + vector search.


🧠 The Solution: Gen AI + Embedding Search on YouTube Transcripts

The goal was to build a pipeline that:

  1. Fetches a YouTube video's transcript
  2. Splits it into manageable chunks
  3. Embeds those chunks using a Gemini embedding model
  4. Stores them in ChromaDB for fast similarity search
  5. Uses a Generative AI model to answer natural language queries from the user

📄 Implementation Breakdown

1. Fetching the Transcript

We use youtube_transcript_api to extract the transcript directly from a YouTube video:

from youtube_transcript_api import YouTubeTranscriptApi

video_url = 'https://www.youtube.com/watch?v=pTB0EiLXUC8'
video_id = video_url.split('v=')[1]  # naive parse; assumes a .../watch?v=<id> URL
# Returns a list of caption segments, each a dict with 'text', 'start', and 'duration'
transcript_text = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])

2. Chunking the Transcript

To keep each piece small enough to embed and process, we break the transcript into chunks of roughly 500 characters:

def chunk_transcript(transcript, chunk_size=500):
    """Greedily merge caption segments into chunks of at most ~chunk_size characters."""
    chunks, current_chunk = [], ""
    for item in transcript:
        text = item['text']
        if len(current_chunk) + len(text) <= chunk_size:
            current_chunk += " " + text
        else:
            chunks.append(current_chunk.strip())
            current_chunk = text
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
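
Applying the chunker to the transcript fetched in step 1 gives the chunks list used in the next step:

chunks = chunk_transcript(transcript_text)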

3. Embedding and Storing in ChromaDB
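
The snippets in this step and the next use a Gemini client (client) and a ChromaDB collection (collection) without showing how they are created. A minimal setup sketch, assuming the google-genai and chromadb packages, with an illustrative collection name:

from google import genai
import chromadb

# Assumed setup: the names client and collection match the snippets below
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
chroma_client = chromadb.Client()  # in-memory ChromaDB instance
collection = chroma_client.get_or_create_collection(name="youtube_transcript_chunks")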

We embed each chunk using Google’s Gemini embedding model and store them in a ChromaDB collection for retrieval later:

# Embed all chunks in a single batch with Gemini's text embedding model
embedding_response = client.models.embed_content(
    model="models/text-embedding-004",
    contents=chunks
)

embeddings = [e.values for e in embedding_response.embeddings]

# Store each chunk alongside its embedding in the ChromaDB collection
for i, (emb, chunk) in enumerate(zip(embeddings, chunks)):
    collection.add(
        ids=[str(i)],
        embeddings=[emb],
        metadatas=[{'chunk': chunk}],
        documents=[chunk]
    )

4. Querying with Generative AI

When a user enters a question, we:

  • Embed the query
  • Search ChromaDB for the most relevant chunks (see the sketch below)
  • Feed the results into Gemini to generate a concise answer
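
The retrieval helper isn't shown above, so here is a minimal sketch of what get_relevant_chunks (called in step 5) could look like: it embeds the question with the same Gemini embedding model and asks ChromaDB for the nearest stored chunks (top_k is an illustrative parameter):

def get_relevant_chunks(query, collection, top_k=3):
    # Embed the user's question with the same model used for the transcript chunks
    query_embedding = client.models.embed_content(
        model="models/text-embedding-004",
        contents=query
    ).embeddings[0].values
    # Ask ChromaDB for the closest stored chunks to the query embedding
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
    return results['documents'][0] if results['documents'] else []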

The retrieved chunks are then stitched into a prompt and passed to Gemini:

def generate_answer(query, relevant_chunks):
    # Ground the prompt in the retrieved transcript chunks
    context = " ".join(relevant_chunks)
    prompt = f"Question: {query}\n\nContext: {context}"
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt
    )
    return response.candidates[0].content.parts[0].text

5. User Query Execution

This final step runs the full RAG pipeline and returns an LLM-generated answer that is directly grounded in the original transcript content:

user_query = 'what problem did object oriented programming come to solve?'
relevant_chunks = get_relevant_chunks(user_query, collection)
if relevant_chunks:
    answer = generate_answer(user_query, relevant_chunks)
    print("Answer:", answer)
else:
    print("No relevant chunks found to generate an answer.")

🔮 Future Possibilities

  • Multilingual support: Handle transcripts in multiple languages, with a translation layer where needed.
  • Transcript availability: Not all YouTube videos have transcripts or English subtitles; for those, the audio could be extracted and converted into a transcript with a speech-to-text model.
  • Summarization layer: Automatically summarize full videos.

🏁 Conclusion

This project demonstrates how Gen AI and vector databases can work together to transform passive video content into an interactive knowledge base.

With just a YouTube link, you can now ask intelligent questions and get answers backed by the video transcript—all powered by embeddings and Gemini.

Check out the full notebook on Kaggle and try it out with your favorite videos!
If you found it helpful, please upvote the notebook!
Kaggle notebook link