Building a Gemini AI Assistant for macOS

I created the first macOS AI-powered Agent:

Meet the AI-Agent, a native macOS application that integrates with Google's Gemini AI to provide a seamless assistant experience. This project is open-source, and we encourage you to test it, modify it, and contribute to its development.

SchBenedikt / ai-agent

Testing macOS AI Agent with Google Gemini Live Web API

Gemini Assistant macOS App

A native macOS application that connects to Google's Gemini AI. The app automatically accesses your camera and microphone to provide a seamless AI assistant experience.

Features

  • Audio input through your microphone
  • Visual context through your camera
  • Text responses displayed in the app
  • Audio responses played through your speakers

Setup

Prerequisites

  1. Python 3.8+
  2. A Google Gemini API key
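
Before installing anything, it can help to confirm that the environment meets these prerequisites. The following is a minimal preflight sketch, not part of the repository: `python_ok` and `missing_modules` are hypothetical helper names, and the module list assumes the app imports every package named in the pip command in the Installation section.

```python
import importlib.util
import sys

# Import names for the pip packages listed under Installation.
# Assumption: the app actually imports all of them.
REQUIRED_MODULES = [
    "google.generativeai",  # google-generativeai
    "cv2",                  # opencv-python
    "pyaudio",              # pyaudio
    "PIL",                  # pillow
    "mss",
    "PyQt5",
    "pynput",
    "dotenv",               # python-dotenv
]


def python_ok(version=None, minimum=(3, 8)):
    """Return True when the interpreter is at least the minimum version."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `python_ok()` and `missing_modules()` from a Python prompt shows whether the interpreter is new enough and which packages still need to be installed.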

Installation

  1. Install the required dependencies:

    pip install google-generativeai opencv-python pyaudio pillow mss PyQt5 pynput python-dotenv pyinstaller
    
  2. Set your Gemini API key as an environment variable (optional):

    export GEMINI_API_KEY="your-api-key-here"
    

    If not set as an environment variable, the app will ask for it on startup.
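
The fallback described above (environment variable first, then ask the user) could look roughly like the sketch below. This is an illustration only: `resolve_api_key` is a hypothetical name, and the real app most likely asks through its graphical interface rather than a terminal prompt.

```python
import os


def resolve_api_key(environ=None, prompt=input):
    """Return the key from GEMINI_API_KEY, or ask for it when unset or blank."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY", "").strip()
    if key:
        return key
    # Assumption: a plain text prompt stands in for the app's startup dialog.
    key = prompt("Enter your Gemini API key: ").strip()
    if not key:
        raise ValueError("A Gemini API key is required.")
    return key
```

Passing `environ` and `prompt` as parameters keeps the function easy to try out without touching the real environment or blocking on input.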

Building the macOS App

There are two ways to build the app:

Method 1: Using PyInstaller (Recommended)

PyInstaller creates a more reliable standalone application that better handles dependencies:

  1. Make sure PyInstaller is installed:

    pip install pyinstaller
    
  2. Run the build process:

    # First clean any previous builds

What is AI-Agent?

The Gemini Assistant is a macOS application designed to:

  • Capture audio input through your microphone.
  • Use your camera for visual context.
  • Provide AI-powered responses via text.

The app leverages Google's Gemini AI for natural language understanding and response generation, making it a powerful tool for productivity and interaction.

Features

  • Audio Input: Speak to the assistant using your microphone.
  • Visual Context: The app uses your camera to gather additional context.
  • Text Responses: Get responses displayed in the app.
  • Customizable: Modify the code to add new features or improve existing ones.

How It Works

The application is built using Python and integrates several libraries:

  • PyQt5: For the user interface.
  • OpenCV: For camera access and visual processing.
  • PyAudio: For capturing and playing audio.
  • Google Generative AI: For natural language processing.
  • Python-dotenv: For managing environment variables.
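
To make the division of labor between these libraries concrete, here is a rough sketch of how captured input might flow to the model and back to the interface. It is not the project's actual code: `Turn`, `run_assistant`, `ask_model`, and `show_text` are all hypothetical names, and the capture and model calls are replaced by plain callables so the flow can be read without any hardware.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Turn:
    """One unit of user input (hypothetical structure)."""
    audio: bytes            # raw microphone samples; PyAudio would supply these
    frame: Optional[bytes]  # encoded camera image; OpenCV would supply this


def run_assistant(
    turns: Iterable[Turn],
    ask_model: Callable[[Turn], str],
    show_text: Callable[[str], None],
) -> int:
    """Send each non-empty turn to the model, display the reply, count turns handled."""
    handled = 0
    for turn in turns:
        if not turn.audio:
            # Nothing was recorded for this turn, so there is nothing to send.
            continue
        reply = ask_model(turn)   # stands in for the Gemini request
        show_text(reply)          # stands in for updating the PyQt5 window
        handled += 1
    return handled
```

In a real app the `turns` iterable would be fed by capture threads and `show_text` would update a widget, but the order of operations (capture, ask, display) stays the same.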

The app uses a .env file to store your Google Gemini API key securely. If the file doesn't exist, the app will create one for you.
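
The read-or-create behavior can be sketched with the standard library alone. The app itself depends on python-dotenv, so treat this as an illustration of the idea rather than the project's implementation; `load_or_create_env` is a hypothetical name.

```python
from pathlib import Path


def load_or_create_env(path=".env", default_key=""):
    """Read GEMINI_API_KEY from a .env file, creating the file when it is missing."""
    env_path = Path(path)
    if not env_path.exists():
        # Assumption: a new file starts with an empty (or supplied) key entry.
        env_path.write_text(f"GEMINI_API_KEY={default_key}\n", encoding="utf-8")
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            return value.strip().strip('"').strip("'")
    return ""
```

A .env file is plain text, so what protects the key in practice is keeping the file out of version control, for example by listing it in .gitignore.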

Getting Started

Prerequisites

  1. Python 3.8+
  2. A Google Gemini API key

Installation

  1. Clone the repository:

    git clone https://github.com/SchBenedikt/ai-agent.git
    cd ai-agent

  2. Install the required dependencies:

    pip install -r requirements.txt

  3. Set your Gemini API key in the .env file:

    echo "GEMINI_API_KEY=your-api-key" > .env

Running the App

To run the app directly without building:

    python app.py

Building the App

You can build a standalone macOS application using PyInstaller:

    pyinstaller gemini.spec

The app will be created in the dist folder as Gemini Assistant.app.
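
If you rebuild often, the build step can be wrapped in a small script. The sketch below is an assumption-laden convenience, not something shipped with the project: `clean_previous_builds` and `build_app` are hypothetical names, and the script assumes PyInstaller's default `build` and `dist` output folders.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_previous_builds(project_dir="."):
    """Delete PyInstaller's default output folders (build/ and dist/) if present."""
    removed = []
    for name in ("build", "dist"):
        folder = Path(project_dir) / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(name)
    return removed


def build_app(project_dir=".", spec="gemini.spec"):
    """Run PyInstaller on the spec file and return the path of the built bundle."""
    clean_previous_builds(project_dir)
    # "python -m PyInstaller" uses the PyInstaller installed for this interpreter.
    subprocess.run(
        [sys.executable, "-m", "PyInstaller", spec],
        cwd=project_dir,
        check=True,
    )
    bundle = Path(project_dir) / "dist" / "Gemini Assistant.app"
    if not bundle.exists():
        raise FileNotFoundError(f"Expected bundle not found: {bundle}")
    return bundle
```

Calling `build_app()` from the repository root removes stale output first, so a failed build cannot be mistaken for a fresh one.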

Contributing

We welcome contributions! Here are some ways you can help:

  • Test the App: Run the app and report any issues.
  • Improve the Code: Add new features or optimize existing ones.
  • Documentation: Help us improve the documentation.

Feedback

We'd love to hear your thoughts! Share your feedback, suggestions, or issues in the GitHub repository.

Conclusion

The Gemini Assistant is a powerful example of how AI can be integrated into everyday applications.

I hope you find this project as useful and enjoyable as I do!

Thanks for reading,
techtech