In this tutorial, we'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways.
What We're Building
Our application will:
- Allow users to upload Excel files
- Read and display a preview of the data
- Use OpenAI or Anthropic APIs to generate insights about the data
- Present summaries and key takeaways in a user-friendly interface
Prerequisites
- Basic knowledge of Python
- A computer with Python installed (3.8 or higher recommended)
- An OpenAI API key or Anthropic API key (you'll need to have funds in your account)
- A GitHub account (for deployment)
Step 1: Setting Up Your Development Environment
Let's start by creating a dedicated project folder and setting up a virtual environment:
# Create project directory
mkdir excel_summarizer
cd excel_summarizer
# Create a virtual environment
# For Windows
python -m venv venv
# For macOS/Linux
python3 -m venv venv
# Activate the virtual environment
# For Windows
venv\Scripts\activate
# For macOS/Linux
source venv/bin/activate
Next, open the project in your favorite code editor. If you're using Visual Studio Code:
code .
Step 2: Installing Required Packages
Install the necessary packages:
pip install karo streamlit python-dotenv pandas openpyxl chromadb
Note: We install chromadb because Karo's base agent imports it, even though this app doesn't use Karo's memory features directly.
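Before moving on, it's worth confirming your interpreter meets the version requirement from the prerequisites. A quick standard-library check:

```python
import sys

def meets_requirement(version_info=sys.version_info, minimum=(3, 8)) -> bool:
    """Return True if the running interpreter satisfies the minimum version."""
    return tuple(version_info[:2]) >= minimum

# The tutorial assumes Python 3.8 or higher.
print(meets_requirement())
```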
Step 3: Creating the Excel Reader Tool
Create a new file called excel_tool_reader.py with the following code:
import pandas as pd
import os
import logging
from typing import List, Dict, Any, Optional, Union
from pydantic import Field, FilePath
from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema
logger = logging.getLogger(__name__)
class ExcelReaderInput(BaseToolInputSchema):
"""Input schema for the ExcelReader tool."""
file_path: FilePath = Field(..., description="Path to the Excel file.")
    sheet_name: Optional[Union[str, int]] = Field(None, description="Name or index of the sheet to read; defaults to the first sheet.")
max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.")
max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.")
class ExcelReaderOutput(BaseToolOutputSchema):
"""Output schema for the ExcelReader tool."""
file_path: str = Field(..., description="The path of file that was read")
sheet_name_read: str = Field(..., description="The name of the sheet that was actually read")
data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data")
row_count: Optional[int] = Field(None, description="The total number of rows read")
column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)")
class ExcelReader(BaseTool):
"""Tool to read data from an Excel file."""
name = "excel_reader"
description: str = "Reads data from an Excel file and returns a preview of the data."
input_schema = ExcelReaderInput
output_schema = ExcelReaderOutput
    def __init__(self, config: Optional[Any] = None):
        """Initialize the ExcelReader tool."""
        logger.info("ExcelReaderTool initialized.")
def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput:
"""Reads the specificied Excel file and returns a data preview."""
# Read the specified sheet from the Excel file
try:
import openpyxl
except ImportError:
logger.error("openpyxl is not installed. Please install it to read Excel files.")
return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A")
        if not isinstance(input_data, self.input_schema):
            return self.output_schema(success=False, error_message="Invalid input data format.", file_path="N/A", sheet_name_read="N/A")
file_path_str = str(input_data.file_path)
if not os.path.exists(file_path_str):
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
try:
            excel_file = pd.ExcelFile(file_path_str, engine="openpyxl")
sheet_names = excel_file.sheet_names
sheet_to_read: Union[str, int] = 0
sheet_name_read: str = sheet_names[0]
if input_data.sheet_name is not None:
if isinstance(input_data.sheet_name, int):
if 0 <= input_data.sheet_name < len(sheet_names):
sheet_to_read = input_data.sheet_name
sheet_name_read = sheet_names[sheet_to_read]
else:
return self.output_schema(success=False, error_message=f"Sheet index out of range: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
elif isinstance(input_data.sheet_name, str):
if input_data.sheet_name in sheet_names:
sheet_to_read = input_data.sheet_name
sheet_name_read = input_data.sheet_name
else:
return self.output_schema(success=False, error_message=f"Sheet name not found: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
header_df = pd.read_excel(excel_file, sheet_name=sheet_to_read, nrows=0)
all_columns = header_df.columns.tolist()
cols_to_use = all_columns[:input_data.max_cols] if input_data.max_cols else all_columns
df = pd.read_excel(excel_file, sheet_name=sheet_to_read, usecols=cols_to_use, nrows=input_data.max_rows)
preview_rows = min(len(df), 10)
data_preview_str = df.head(preview_rows).to_markdown(index=False)
logger.info(f"Successfully read {len(df)} rows and {len(df.columns)} columns from sheet '{sheet_name_read}' in '{file_path_str}'.")
return self.output_schema(
success=True,
file_path=file_path_str,
sheet_name_read=sheet_name_read,
data_preview=data_preview_str,
row_count=len(df),
column_names=df.columns.tolist()
)
except FileNotFoundError:
logger.error(f"File not found: {file_path_str}")
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
except Exception as e:
logger.error(f"Error reading Excel file '{file_path_str}': {e}", exc_info=True)
return self.output_schema(success=False, error_message=f"Error reading Excel file: {e}", file_path=file_path_str, sheet_name_read="N/A")
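The row and column limits in run() boil down to simple slicing that you can try in isolation. Here's a minimal sketch of the truncation logic on a hypothetical in-memory DataFrame (no Excel file needed), assuming pandas is installed:

```python
import pandas as pd

def truncate_preview(df: pd.DataFrame, max_rows: int = 100, max_cols: int = 20) -> pd.DataFrame:
    """Mimic the tool's limits: keep at most max_cols columns and max_rows rows."""
    cols_to_use = df.columns.tolist()[:max_cols]
    return df[cols_to_use].head(max_rows)

# Hypothetical sample data: 30 rows, 8 columns.
sample = pd.DataFrame({f"col_{i}": range(30) for i in range(8)})
preview = truncate_preview(sample, max_rows=10, max_cols=5)
print(preview.shape)  # (10, 5)
```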
Step 4: Creating the Streamlit Application
Create a file called streamlit_app.py:
import streamlit as st
import pandas as pd
import os
import tempfile
from dotenv import load_dotenv
from karo.prompts.system_prompt_builder import SystemPromptBuilder
from rich.console import Console
from pydantic import Field
from typing import List, Dict, Any
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
dotenv_path = os.path.join(os.path.dirname(__file__), '.env')
load_dotenv(dotenv_path=dotenv_path)
from karo.core.base_agent import BaseAgent, BaseAgentConfig
from karo.providers.openai_provider import OpenAIProvider, OpenAIProviderConfig
from karo.providers.anthropic_provider import AnthropicProvider, AnthropicProviderConfig
from karo.schemas.base_schemas import BaseInputSchema, BaseOutputSchema, AgentErrorSchema
from excel_tool_reader import ExcelReaderInput, ExcelReaderOutput, ExcelReader
console = Console()
st.set_page_config(page_title="Excel Reader Tool", page_icon="📊", layout="wide", initial_sidebar_state="expanded")
st.title("Excel Reader Tool")
st.markdown("This tool reads data from an Excel file and returns a preview of the data.")
class SummarizationOutput(BaseOutputSchema):
summary: str = Field(..., description="The summary of the data read from the Excel file.")
key_takeaways: List[str] = Field(default_factory=list, description="Key takeaways from the data.")
with st.sidebar:
st.header("About Karo Framework")
st.info("Karo is a framework for building and deploying AI agents.")
st.header("Settings")
    provider_type = st.selectbox("Provider", ["OpenAI", "Anthropic"], help="Select the provider for the agent.")
if provider_type == "OpenAI":
api_key = st.text_input("OpenAI API Key", type="password", help="Enter your OpenAI API key.")
if api_key:
os.environ["OPENAI_API_KEY"] = api_key
model_choice = st.selectbox("AI Model", ["gpt-4-turbo", "gpt-4"], help="Select the model for summarization.")
else:
api_key = st.text_input("Anthropic API Key", type="password", help="Enter your Anthropic API key for summarization.")
if api_key:
os.environ["ANTHROPIC_API_KEY"] = api_key
model_choice = st.selectbox("AI Model", ["claude-3-opus-20240229", "claude-3-sonnet-20240229"], help="Select the model for summarization.")
    max_rows = st.slider("Max Rows to process", 10, 100, 50, help="Limit the number of rows to process (higher = more complete but slower)")
max_cols = st.slider("Max Columns to process", 1, 20, 5, help="Limit the number of columns to process")
show_debug = st.checkbox("Show Debug Info", value=False, help="Show system prompts and messages being sent to the LLM")
def run_summarization(file_path, max_rows, max_cols, provider_type="OpenAI", model_choice="gpt-4-turbo", show_debug=False):
"""Function to summarize the data read from the Excel file."""
if provider_type == "OpenAI":
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
st.error("OpenAI API key is not set. Please enter your API key in the sidebar.")
return None
else:
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
st.error("Anthropic API key is not set. Please enter your API key in the sidebar.")
return None
with st.spinner("Initializing Excel Reader Tool..."):
excel_reader_tool = ExcelReader()
st.success("Excel Reader Tool initialized successfully.")
with st.spinner("Reading Excel file..."):
tool_input = ExcelReaderInput(file_path=file_path, max_rows=max_rows, max_cols=max_cols)
tool_output = excel_reader_tool.run(tool_input)
if not tool_output.success:
st.error(f"Error reading Excel file: {tool_output.error_message}")
return None
if not tool_output.data_preview:
st.warning("No data found in the Excel file.")
return None
    st.success(f"Successfully read sheet '{tool_output.sheet_name_read}': {tool_output.row_count} rows, {len(tool_output.column_names)} columns.")
st.markdown("### Data Preview")
st.markdown(tool_output.data_preview)
with st.spinner(f"Initializing {provider_type} provider..."):
if provider_type == "OpenAI":
provider_config = OpenAIProviderConfig(model=model_choice)
provider = OpenAIProvider(config=provider_config)
else:
provider_config = AnthropicProviderConfig(model=model_choice)
provider = AnthropicProvider(config=provider_config)
        st.success(f"{provider_type} provider initialized successfully (Model: {model_choice}).")
system_prompt_content = "You are an expert data analyst. Your task is to analyze Excel data and provide accurate summaries and key takeaways."
system_prompt_builder = SystemPromptBuilder(role_description=system_prompt_content)
    data_message = (
        "Please analyze this Excel data:\n\n"
        "Data preview:\n"
        "```markdown\n"
        f"{tool_output.data_preview}\n"
        "```\n\n"
        f"Column names: {', '.join(tool_output.column_names)}\n"
        f"Sheet name: {tool_output.sheet_name_read}\n"
        f"(Note: Only the first {tool_output.row_count} rows are shown in the preview.)\n\n"
        "Generate a summary that specifically analyzes the data patterns. "
        "Include insights about key metrics and patterns visible in the data."
    )
if show_debug:
st.markdown("### debug information")
st.markdown("#### System Prompt")
st.code(system_prompt_content)
st.markdown("#### User Message")
st.code(data_message)
with st.spinner("Configuring Agent..."):
agent_config = BaseAgentConfig(
provider_config=provider_config,
system_prompt=system_prompt_builder,
output_schema=SummarizationOutput,
)
summarization_agent = BaseAgent(config=agent_config)
st.success("Agent configured successfully.")
    external_history = [{"role": "user", "content": data_message}]
with st.spinner(f"Generating summary using {provider_type} Agent..."):
simple_input = BaseInputSchema(chat_message="")
result = summarization_agent.run(
input_data=simple_input,
history=external_history,
)
return result
uploaded_file = st.file_uploader("Upload Excel file", type=["xlsx", "xls"], help="Upload an Excel file to read data from.")
if uploaded_file is not None:
with tempfile.NamedTemporaryFile(delete=False, suffix='.xlsx') as temp_file:
temp_file.write(uploaded_file.getvalue())
tmp_filepath = temp_file.name
try:
st.success(f"File uploaded: {uploaded_file.name}")
api_var = "OPENAI_API_KEY" if provider_type == "OpenAI" else "ANTHROPIC_API_KEY"
if api_var not in os.environ or not os.environ[api_var]:
st.warning(f"{provider_type} API key is not set. Please enter your API key in the sidebar.")
else:
if st.button("Generate Summary"):
result = run_summarization(tmp_filepath, max_rows, max_cols, provider_type, model_choice, show_debug)
if result is None:
pass
elif isinstance(result, SummarizationOutput):
st.subheader("Summary")
st.write(result.summary)
st.subheader("Key Takeaways")
for i, takeaway in enumerate(result.key_takeaways, start=1):
st.markdown(f"**{i}.** {takeaway}")
elif isinstance(result, AgentErrorSchema):
st.error(f"Error from Karo Agent: {result.error_message}")
else:
st.warning(f"Unexpected result type from agent: {type(result)}")
finally:
if os.path.exists(tmp_filepath):
os.unlink(tmp_filepath)
else:
st.info("Please upload an Excel file to read data from.")
st.subheader("Agent summary will appear here")
st.text("Upload a file and click 'Generate Summary' to analyze your data.")
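The prompt sent to the agent is plain string formatting. As a standalone sketch with hypothetical sample values, this helper assembles a message of the same shape, which makes it easy to eyeball what the LLM actually receives:

```python
from typing import List

def build_data_message(data_preview: str, column_names: List[str], sheet_name: str, row_count: int) -> str:
    """Assemble an analysis prompt from an Excel preview, in the style used by the app."""
    return (
        "Please analyze this Excel data:\n\n"
        "Data preview:\n"
        f"```markdown\n{data_preview}\n```\n\n"
        f"Column names: {', '.join(column_names)}\n"
        f"Sheet name: {sheet_name}\n"
        f"(Note: Only the first {row_count} rows are shown in the preview.)\n\n"
        "Generate a summary that specifically analyzes the data patterns. "
        "Include insights about key metrics and patterns visible in the data."
    )

# Hypothetical one-row preview table.
msg = build_data_message("| a | b |\n|---|---|\n| 1 | 2 |", ["a", "b"], "Sheet1", 1)
print(msg.splitlines()[0])  # Please analyze this Excel data:
```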
Step 6: Setting Up Environment Variables
Create a .env file to store your API keys (this is optional, as the app also allows entering them via the UI):
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
Step 7: Testing Locally
Run your application locally:
streamlit run streamlit_app.py
The first time you run Streamlit, you'll be asked for an email address. Use the same email you plan to use when creating your Streamlit Cloud account.
In the app settings (sidebar), enter your API key; if you configured a .env file, the app will fall back to it.
Step 8: Getting API Keys
OpenAI API Key:
- Go to platform.openai.com
- Create an account or sign in
- Navigate to API Keys in your account settings
- Create a new secret key and copy it
- Add funds to your account (required for API access)
Anthropic API Key:
- Go to console.anthropic.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key and copy it
- Add payment method to your account (required for API access)
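The app reads keys from environment variables, so once you have a key you can confirm it will be found. A small standard-library check (the variable names match those used in streamlit_app.py):

```python
import os

def key_status(var_name: str) -> str:
    """Report whether an API key environment variable is set and non-empty."""
    return "set" if os.environ.get(var_name) else "missing"

for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(var, key_status(var))
```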
Step 9: Preparing for Deployment
Create a requirements.txt file for Streamlit Cloud:
pip freeze > requirements.txt
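Note that pip freeze captures everything in your virtual environment, including transitive dependencies pinned to exact versions. If the generated file causes install conflicts on Streamlit Cloud, a hand-written requirements.txt listing just the packages from Step 2 is a reasonable fallback:

```text
karo
streamlit
python-dotenv
pandas
openpyxl
chromadb
```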
Step 10: Creating a GitHub Repository
If you don't have a GitHub account:
- Go to github.com and sign up
- Verify your email address
- Set up your profile
Creating a new repository:
- Click the "+" icon in the top right and select "New repository"
- Name your repository (e.g., "excel-summarizer")
- Add a description (optional)
- Choose "Public" visibility
- Initialize with a README file (optional)
- Click "Create repository"
Step 11: Pushing Your Code to GitHub
Initialize a Git repository and push your code:
# Initialize Git repository
git init
# Add your files
git add .
# Commit changes
git commit -m "Initial commit"
# Add your GitHub repository as remote
git remote add origin https://github.com/yourusername/excel-summarizer.git
# Push to GitHub
git push -u origin main
Note: GitHub now uses main instead of master as the default branch name. If your Git is configured differently, use master instead.
Step 12: Deploying to Streamlit Cloud
- Go to share.streamlit.io and create an account or sign in
- Click "New app"
- In the deployment form:
- Select your GitHub repository from the dropdown or paste the repository URL
- Select the main branch
- Enter the path to your main file (streamlit_app.py)
- Click "Deploy"
Step 13: Creating a Patch to Bypass ChromaDB
When you deploy, you will encounter an error because Streamlit Cloud ships an SQLite version that is incompatible with ChromaDB. To work around this, create a new file called patch.py with the following code:
import sys
import types
class MockModule(types.ModuleType):
def __init__(self, name):
super().__init__(name)
def __getattr__(self, name):
return None
sys.modules['chromadb'] = MockModule('chromadb')
sys.modules['karo.memory.services.chromadb_service'] = MockModule('karo.memory.services.chromadb_service')
This patch stubs out the ChromaDB dependency. Add the line import patch at the very top of streamlit_app.py (before any Karo imports), then commit and push the changes as before (git add ., git commit, git push). Streamlit Cloud redeploys on push, so refresh your deployed application to see it live.
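To see why this works: Python consults sys.modules before searching for a module on disk, so pre-seeding it with a stub means any later import receives the stub instead of loading the real package. A standard-library-only demonstration of the same trick, using a hypothetical module name:

```python
import sys
import types

class MockModule(types.ModuleType):
    """Stub module whose every attribute lookup returns None."""
    def __getattr__(self, name):
        return None

# Register the stub BEFORE anything tries to import the real package.
sys.modules["some_heavy_dependency"] = MockModule("some_heavy_dependency")

import some_heavy_dependency  # resolved from sys.modules, no disk lookup
print(some_heavy_dependency.Client)  # prints: None
```

This is why the tutorial later insists the import patch line sits at the very top of streamlit_app.py: the stub must be registered before Karo's imports run.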
Step 14: Using Your Deployed App
Once deployed, you can:
- Upload Excel files (under 10MB)
- Enter your API key in the settings
- Adjust the maximum rows and columns to process
- Click "Generate Summary" to see the AI analyze your data
Troubleshooting Deployment Issues
If you encounter errors related to SQLite or ChromaDB when deploying to Streamlit Cloud, make sure that:
- The patch.py file is correctly set up
- The import patch line is at the very top of your streamlit_app.py file
- Your requirements.txt file is correctly generated and includes all necessary dependencies
Conclusion
You've now built and deployed a powerful Excel file summarizer that uses AI to analyze and extract insights from your data. This application demonstrates how to combine Streamlit's user-friendly interface with powerful LLMs through the Karo framework.
This approach can be extended to create more sophisticated data analysis tools, dashboards, or even complete AI-powered applications. The possibilities are endless!
If you want to see this tutorial in action, check out the Excel Summarizer Agent.
Happy coding!