Requirements

  • ffmpeg
  • Whisper (OpenAI's speech-to-text tool, installed as the openai-whisper package)
  • Python 3.10+ (for Whisper)
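
Before installing anything, it can help to see which of these tools are already on your PATH. The following is a minimal sketch for macOS/Linux shells; check_tools is a hypothetical helper name, not part of ffmpeg or Whisper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns non-zero if any
# tool is missing.
check_tools() {
  local tool
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools ffmpeg whisper python3 && echo "All requirements found"
```

Any tool reported as missing can be installed with the platform-specific steps in the Installation section.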

Installation

macOS

# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install ffmpeg
brew install ffmpeg

# Install Python (if needed)
brew install python

# Install Whisper
pip3 install --upgrade pip
pip3 install git+https://github.com/openai/whisper.git

Windows

# Install Chocolatey if you don't have it
# Run in PowerShell as administrator:
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install ffmpeg
choco install ffmpeg

# Install Python (from python.org)
# Make sure to check "Add Python to PATH" during installation

# Install Whisper
pip install -U openai-whisper

Linux (Debian/Ubuntu)

# Install ffmpeg
sudo apt update && sudo apt install ffmpeg

# Install Python, pip, and git (pip needs git for the git+https install below)
sudo apt install python3 python3-pip git

# Install Whisper
pip3 install git+https://github.com/openai/whisper.git
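
On recent Debian and Ubuntu releases, pip may refuse to install into the system Python with an "externally-managed-environment" error. One common workaround is a virtual environment. The sketch below assumes the python3-venv package is installed (sudo apt install python3-venv); install_whisper_venv and the ~/whisper-venv directory are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install Whisper into a dedicated virtual environment
# instead of the system Python.
# The default directory (~/whisper-venv) is an assumption; pass your own path.
install_whisper_venv() {
  local venv_dir="${1:-$HOME/whisper-venv}"

  # Create the environment, then use its own pip so nothing touches system packages.
  python3 -m venv "$venv_dir" || return 1
  "$venv_dir/bin/pip" install --upgrade pip || return 1
  "$venv_dir/bin/pip" install -U openai-whisper
}

# Usage:
#   install_whisper_venv ~/whisper-venv
```

After this, the whisper command lives at ~/whisper-venv/bin/whisper; either call it by that path or run source ~/whisper-venv/bin/activate first.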

Transcription Steps

  1. Extract audio from video using ffmpeg
ffmpeg -i input_video.mp4 -vn -acodec mp3 output.mp3
  2. Transcribe audio with Whisper
whisper output.mp3 --language English --model small --output_format txt
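
The two steps above can be chained in a small shell function. This is a sketch rather than a definitive script: transcribe_video is a hypothetical name, the audio file name is derived from the video name, and the flags are the same ones used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the audio track from a video, then transcribe it.
#   $1 = video file (required)
#   $2 = Whisper model size (optional, defaults to "small")
transcribe_video() {
  local video="$1"
  local model="${2:-small}"
  local audio="${video%.*}.mp3"   # input_video.mp4 -> input_video.mp3

  # Step 1: drop the video stream (-vn) and encode the audio as MP3.
  ffmpeg -i "$video" -vn -acodec mp3 "$audio" || return 1

  # Step 2: transcribe the extracted audio to a plain-text file.
  whisper "$audio" --language English --model "$model" --output_format txt
}

# Usage:
#   transcribe_video input_video.mp4
#   transcribe_video lecture.mkv medium
```

The function stops after step 1 if ffmpeg fails, so Whisper is never run on a missing or partial audio file.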

Model Options

  • tiny: Fastest, lowest accuracy (~1 GB VRAM)
  • base: Fast, decent accuracy (~1 GB VRAM)
  • small: Balanced speed/accuracy (~2 GB VRAM)
  • medium: Good accuracy (~5 GB VRAM)
  • large: Best accuracy, slowest (~10 GB VRAM)
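
As a rough illustration of how the memory figures above might guide the choice, the sketch below picks the largest model whose listed footprint fits in a given number of gigabytes. pick_model is a hypothetical helper, and the thresholds simply mirror the approximate figures in the list; actual usage varies with hardware and audio length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the largest model whose approximate memory
# footprint (taken from the list above) fits in the given whole gigabytes.
pick_model() {
  local gb="$1"
  if   [ "$gb" -ge 10 ]; then echo large
  elif [ "$gb" -ge 5 ];  then echo medium
  elif [ "$gb" -ge 2 ];  then echo small
  elif [ "$gb" -ge 1 ];  then echo base
  else                        echo tiny   # fallback: smallest model available
  fi
}

# Usage:
#   whisper output.mp3 --model "$(pick_model 8)" --output_format txt
```

With 8 GB available, pick_model 8 prints medium, since large is listed at roughly 10 GB.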

Output Formats

  • txt: Plain text transcript
  • srt: Standard subtitle format
  • vtt: Web Video Text Tracks format
  • json: Detailed JSON with timestamps
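
For example, subtitle files can be generated for a whole folder of recordings by switching --output_format to srt. The sketch below assumes the MP3 files sit in the current directory; make_subtitles and the default "subtitles" directory are hypothetical names, while --output_dir is Whisper's option for choosing where output files are written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write SRT subtitles for every MP3 in the current
# directory into a separate output directory (default: "subtitles").
make_subtitles() {
  local out_dir="${1:-subtitles}"
  local audio
  mkdir -p "$out_dir"
  for audio in ./*.mp3; do
    # If no MP3 files exist, the glob stays literal; skip it.
    [ -e "$audio" ] || continue
    whisper "$audio" --model small --output_format srt --output_dir "$out_dir"
  done
}

# Usage:
#   make_subtitles            # writes into ./subtitles
#   make_subtitles captions   # writes into ./captions
```

Replacing srt with vtt or json in the whisper call produces the other formats in the same way.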

Additional Options

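  • --task translate: Translates non-English speech into English text (the default task, transcribe, keeps the original language)
  • --language en: Sets the source language so Whisper skips automatic detection, which saves a little time and avoids misdetection; full names such as English are also accepted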
  • --model: Selects the model size (tiny/base/small/medium/large)
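
These options can be combined in one command. The sketch below wraps a translation run in a hypothetical translate_audio function; the source language is passed as a code such as ja or de, and the medium default is only an illustrative choice, not a recommendation from the Whisper project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate speech in a non-English recording into
# English text.
#   $1 = audio file, $2 = source language code, $3 = model (default: medium)
translate_audio() {
  if [ "$#" -lt 2 ]; then
    echo "usage: translate_audio AUDIO_FILE LANGUAGE [MODEL]" >&2
    return 2
  fi
  whisper "$1" --task translate --language "$2" --model "${3:-medium}" --output_format txt
}

# Usage:
#   translate_audio interview.mp3 ja
#   translate_audio interview.mp3 de small
```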
