Welcome to my AI-powered multilingual translation project, where I integrated state-of-the-art technologies for transcription, translation, and voice synthesis. This project includes two main components: a Kaggle notebook and a Telegram bot, both powered by Whisper, Transformers, and gTTS. Let’s dive into each of them!

📊 Kaggle Project: AI Translator Notebook

The Kaggle Notebook serves as the backbone of this project, providing an end-to-end transcription and translation pipeline that can be easily replicated and adapted.

Purpose
The Kaggle notebook demonstrates how to:

  1. Transcribe audio files using Whisper (OpenAI’s model)
  2. Translate transcribed text into multiple languages using Hugging Face’s M2M100 model
  3. Generate speech in the translated language using gTTS (Google Text-to-Speech) or other TTS libraries
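
The three steps above can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: the function names, the `facebook/m2m100_418M` checkpoint, and the `base` Whisper size are assumptions chosen for illustration. The heavy libraries are imported inside each function so the helpers can be defined before the packages are installed.

```python
from typing import Callable, Dict, Tuple


def transcribe_audio(path: str, model_size: str = "base") -> Tuple[str, str]:
    """Return (text, detected language code) for an audio file using Whisper."""
    import whisper  # pip install openai-whisper (ffmpeg must be available)

    model = whisper.load_model(model_size)
    result = model.transcribe(path)
    return result["text"].strip(), result["language"]


def translate_text(text: str, src_lang: str, tgt_lang: str,
                   model_name: str = "facebook/m2m100_418M") -> str:
    """Translate text with M2M100. The checkpoint name is an assumption."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    # Loading on every call is slow; a real notebook would load once and reuse.
    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


def synthesize_speech(text: str, lang: str, out_path: str = "translation.mp3") -> str:
    """Write the spoken translation to an MP3 file with gTTS and return its path."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def run_pipeline(audio_path: str, tgt_lang: str,
                 transcribe: Callable = transcribe_audio,
                 translate: Callable = translate_text,
                 speak: Callable = synthesize_speech) -> Dict[str, str]:
    """Chain the three steps. Each step can be swapped out, e.g. for testing."""
    text, src_lang = transcribe(audio_path)
    # Assumes the language code Whisper reports is also valid for M2M100.
    translated = translate(text, src_lang, tgt_lang)
    audio_out = speak(translated, tgt_lang)
    return {
        "source_text": text,
        "source_lang": src_lang,
        "translated_text": translated,
        "audio_path": audio_out,
    }
```

Calling `run_pipeline("lecture.mp3", "fr")` (with a hypothetical file name) would transcribe the clip, translate it into French, and save the spoken result as an MP3. Whisper and M2M100 mostly share two-letter language codes, but the overlap is not guaranteed for every language, so a mapping step may be needed.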

This project aims to provide an easy-to-use solution for transcription and translation, making it useful for language learners, travelers, or anyone who needs quick language support.

Features

  1. Audio Transcription: Uses Whisper to convert audio files (MP3, WAV, OGG) into text.
  2. Multilingual Translation: Translates transcribed text from one language to another using the M2M100 model.
  3. Text-to-Speech: Converts translated text back into audio, making it accessible in the target language.
  4. PDF Text Translation: Extracts text from PDF documents and translates it.
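
For the PDF feature, one possible approach is to extract the text page by page and split it into sentence-sized chunks before translation, since translation models accept only a limited input length. The sketch below assumes the `pypdf` library for extraction (the PDF library actually used is not stated here), and `split_into_chunks` with its `max_chars` limit is a hypothetical helper.

```python
import re
from typing import List


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page. pypdf is an assumed choice."""
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def split_into_chunks(text: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the translation step separately and the results joined back together.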

How to Use It

  1. Load your dataset: If you want to test with your own audio, PDF, or text, upload them directly into the notebook.
  2. Run the transcription cell: Use Whisper to transcribe the audio to text.
  3. Run the translation cell: Translate the transcription into the target language using the M2M100 model.
  4. Generate audio: Use gTTS to convert the translated text into speech.
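
Uploaded datasets on Kaggle appear under `/kaggle/input`. As a small sketch of the first step, the hypothetical helper below collects uploaded files by type so the later cells know what to process. The audio extensions follow the formats listed under Features; treating `.txt` as the plain-text format is an assumption.

```python
from pathlib import Path
from typing import Dict, List

AUDIO_EXTS = {".mp3", ".wav", ".ogg"}


def find_inputs(root: str = "/kaggle/input") -> Dict[str, List[Path]]:
    """Walk the input directory and group files into audio, pdf, and text."""
    found: Dict[str, List[Path]] = {"audio": [], "pdf": [], "text": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower()  # match extensions case-insensitively
        if ext in AUDIO_EXTS:
            found["audio"].append(path)
        elif ext == ".pdf":
            found["pdf"].append(path)
        elif ext == ".txt":
            found["text"].append(path)
    return found
```

The paths in `find_inputs()["audio"]` can then be fed to the transcription cell one at a time.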

Who Can Use It?

  1. Students: To transcribe and translate audio notes, lectures, and podcasts.
  2. Travelers: To easily convert voice messages into the local language.
  3. Language Enthusiasts: To learn new languages by transcribing, translating, and listening to sentences in the target language.
  4. Researchers: To process audio datasets or translate academic materials automatically.

🤖 Telegram Bot: AI-Powered Multilingual Translation

The Telegram bot offers the same powerful multilingual translation capabilities in a user-friendly format. The goal of the bot is to allow anyone to easily transcribe, translate, and listen to translated text/audio from within Telegram.

Purpose
The Telegram bot is designed to:

  1. Transcribe voice messages: Convert speech to text automatically.
  2. Translate text/audio: Convert transcribed or typed text into a different language.
  3. Generate speech: Read the translated text aloud in the target language.

Features

  1. Voice Message Transcription: The bot accepts voice messages, transcribes them using Whisper, and returns the transcription.
  2. Multilingual Translation: Translates the transcription to the target language (supports 10+ languages).
  3. Text-to-Speech: Generates speech in the translated language using gTTS or pyttsx3.
  4. PDF Parsing: Allows users to upload PDF documents, extracts the text, and translates it.
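
A voice-message handler along these lines could look like the sketch below. It assumes the `python-telegram-bot` library (version 20 or later), which is not named in this README, and a hypothetical `pipeline(audio_path, target_lang)` callable that returns a dict with `translated_text` and `audio_path`. The `target_lang` key in `user_data` is also an assumption about how the chosen language might be stored.

```python
import asyncio
import os
import tempfile

WELCOME = "Send me a voice message, some text, or a PDF and I will translate it."


async def start(update, context):
    """Reply to /start with a short usage hint."""
    await update.message.reply_text(WELCOME)


async def handle_voice(update, context):
    """Download a voice message, run the pipeline, and send back text and audio."""
    pipeline = context.bot_data["pipeline"]  # hypothetical callable, see lead-in
    target_lang = context.user_data.get("target_lang", "en")
    voice_file = await update.message.voice.get_file()
    with tempfile.TemporaryDirectory() as tmp:
        ogg_path = os.path.join(tmp, "voice.ogg")
        await voice_file.download_to_drive(ogg_path)
        # Run the blocking model code in a worker thread so the bot stays responsive.
        result = await asyncio.to_thread(pipeline, ogg_path, target_lang)
        await update.message.reply_text(result["translated_text"])
        # gTTS writes MP3, so the file is sent as an audio message here.
        with open(result["audio_path"], "rb") as audio:
            await update.message.reply_audio(audio=audio)


def build_app(token: str, pipeline):
    """Wire the handlers into a python-telegram-bot Application."""
    from telegram.ext import Application, CommandHandler, MessageHandler, filters

    app = Application.builder().token(token).build()
    app.bot_data["pipeline"] = pipeline
    app.add_handler(CommandHandler("start", start))
    app.add_handler(MessageHandler(filters.VOICE, handle_voice))
    return app
```

With a real bot token, `build_app(token, pipeline).run_polling()` would start the bot. Keeping the pipeline in `bot_data` means the handler does not need to know which models sit behind it.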

How to Use It

  1. Start the Bot: Open the bot in Telegram and send /start.
  2. Send a Voice Message: Record a voice message and the bot will transcribe and translate it automatically.
  3. Send Text: You can type or paste any text into the bot, and it will translate it for you.
  4. Send a PDF: Upload a PDF document, and the bot will extract text and translate it into the chosen language.
  5. Get Speech Back: After translation, the bot will provide the translated text as a voice message, so you can hear the translation.
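
The steps above imply that the bot tells voice messages, PDFs, commands, and plain text apart. A minimal sketch of that routing decision is shown below. `message_kind` is a hypothetical helper, and it relies only on the `voice`, `document`, and `text` fields that Telegram messages carry.

```python
def message_kind(message) -> str:
    """Classify an incoming message so it can be sent to the right handler."""
    if getattr(message, "voice", None):
        return "voice"
    document = getattr(message, "document", None)
    if document is not None and getattr(document, "mime_type", None) == "application/pdf":
        return "pdf"
    text = getattr(message, "text", None)
    if text:
        # Commands such as /start begin with a slash.
        return "command" if text.startswith("/") else "text"
    return "unsupported"
```

If the bot is built on `python-telegram-bot` (an assumption), the same routing is usually expressed with its built-in filters, such as `filters.VOICE`, `filters.Document.PDF`, and `filters.TEXT & ~filters.COMMAND`, rather than a hand-written function.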

Who Can Use It?

  1. Travelers: If you’re traveling to a country where you don’t speak the language, this bot can help translate your speech and provide an audio translation instantly.
  2. Students and Teachers: Easily translate lecture notes or class discussions. Teachers can use it for multilingual classroom support.
  3. Language Learners: Great for practicing pronunciation in different languages by hearing the translated speech.
  4. Professionals: For those who need quick translations in the field, whether in meetings, calls, or interviews.

🔮 Future Plans

  1. Speech Improvement: Implement advanced speech synthesis models for more natural-sounding voice outputs (e.g., ElevenLabs).
  2. Mobile App: Create a mobile version of the bot to help users access translations on the go.
  3. Customizable Voice Profiles: Allow users to choose from multiple voices and accents for translations.

🧑‍💻 About Me

Hi! I’m Aksel, a 16-year-old self-taught developer from Armenia 🇦🇲
I’m passionate about building useful tools with AI, back-end tech, and modern software engineering.

🔧 Skills & Interests

  1. 👨‍💻 Backend Dev: Python, PHP, Laravel, C++, MySQL
  2. 🤖 AI & NLP: Whisper, Transformers, LLMs
  3. 📱 Telegram Bots, Automation, Web Development
  4. 🎮 Game Dev: Unreal Engine
  5. 📚 Lifelong Learner | Passionate about building with purpose

🌐 Connect with Me

This project combines the power of AI with real-world accessibility. By integrating tools like Whisper, Hugging Face Transformers, and gTTS, it enables seamless transcription, translation, and voice synthesis across multiple languages. Whether you’re using the Telegram bot for quick translations or exploring the full pipeline on Kaggle, it’s designed to help break language barriers. Built with care by a passionate young developer, it’s a step toward smarter, more inclusive global communication. 🌍✨