In this tutorial, we'll show you how to build a Streamlit application that can read Excel files and generate summaries using large language models (LLMs) like GPT-4 or Claude. We'll use the Karo framework to simplify our interaction with these AI models, creating a tool that reads Excel data and produces insightful summaries and key takeaways.
What We're Building
Our application will:
- Allow users to upload Excel files
- Read and display a preview of the data
- Use OpenAI or Anthropic APIs to generate insights about the data
- Present summaries and key takeaways in a user-friendly interface
Prerequisites
- Basic knowledge of Python
- A computer with Python installed (3.8 or higher recommended)
- An OpenAI API key or Anthropic API key (you'll need to have funds in your account)
- A GitHub account (for deployment)
Step 1: Setting Up Your Development Environment
Let's start by creating a dedicated project folder and setting up a virtual environment:
# Create project directory
mkdir excel_summarizer
cd excel_summarizer
# Create a virtual environment
# For Windows
python -m venv venv
# For macOS/Linux
python3 -m venv venv
# Activate the virtual environment
# For Windows
venv\Scripts\activate
# For macOS/Linux
source venv/bin/activate
Next, open the project in your favorite code editor. If you're using Visual Studio Code:
code .
Step 2: Installing Required Packages
Install the necessary packages:
pip install karo streamlit python-dotenv pandas openpyxl chromadb
Note: We install chromadb because Karo's base agent imports it, even though this app doesn't use Karo's memory features directly.
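Before moving on, it's worth confirming your interpreter meets the version requirement from the prerequisites. A quick standard-library check:

```python
import sys

def meets_requirement(version_info=sys.version_info, minimum=(3, 8)) -> bool:
    """Return True if the running interpreter satisfies the minimum version."""
    return tuple(version_info[:2]) >= minimum

# The tutorial assumes Python 3.8 or higher.
print(meets_requirement())
```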
Step 3: Creating the Excel Reader Tool
Create a new file called excel_tool_reader.py with the following code:
import pandas as pd
import os
import logging
from typing import List, Dict, Any, Optional, Union
from pydantic import Field, FilePath
from karo.tools.base_tool import BaseTool, BaseToolInputSchema, BaseToolOutputSchema
logger = logging.getLogger(__name__)
class ExcelReaderInput(BaseToolInputSchema):
"""Input schema for the ExcelReader tool."""
file_path: FilePath = Field(..., description="Path to the Excel file.")
    sheet_name: Optional[Union[str, int]] = Field(None, description="Name or index of the sheet to read; defaults to the first sheet.")
max_rows: Optional[int] = Field(100, description="Maximum number of rows to read from each sheet.")
max_cols: Optional[int] = Field(20, description="Maximum number of columns to read from each sheet.")
class ExcelReaderOutput(BaseToolOutputSchema):
"""Output schema for the ExcelReader tool."""
file_path: str = Field(..., description="The path of file that was read")
sheet_name_read: str = Field(..., description="The name of the sheet that was actually read")
data_preview: Optional[str] = Field(None, description="A string representation of the first few rows/columns of the data")
row_count: Optional[int] = Field(None, description="The total number of rows read")
column_names: Optional[List[str]] = Field(None, description="List of column names read (up to max_cols)")
class ExcelReader(BaseTool):
"""Tool to read data from an Excel file."""
name = "excel_reader"
description: str = "Reads data from an Excel file and returns a preview of the data."
input_schema = ExcelReaderInput
output_schema = ExcelReaderOutput
    def __init__(self, config: Optional[Any] = None):
        """Initialize the ExcelReader tool."""
        logger.info("ExcelReaderTool initialized.")
def run(self, input_data: ExcelReaderInput) -> ExcelReaderOutput:
"""Reads the specificied Excel file and returns a data preview."""
# Read the specified sheet from the Excel file
try:
import openpyxl
except ImportError:
logger.error("openpyxl is not installed. Please install it to read Excel files.")
return self.output_schema(success=False, error_message="openpyxl is not installed.", file_path=str(input_data.file_path), sheet_name_read="N/A")
        if not isinstance(input_data, self.input_schema):
            return self.output_schema(success=False, error_message="Invalid input data format.", file_path="N/A", sheet_name_read="N/A")
file_path_str = str(input_data.file_path)
if not os.path.exists(file_path_str):
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
try:
            excel_file = pd.ExcelFile(file_path_str, engine="openpyxl")
sheet_names = excel_file.sheet_names
sheet_to_read: Union[str, int] = 0
sheet_name_read: str = sheet_names[0]
if input_data.sheet_name is not None:
if isinstance(input_data.sheet_name, int):
if 0 <= input_data.sheet_name < len(sheet_names):
sheet_to_read = input_data.sheet_name
sheet_name_read = sheet_names[sheet_to_read]
else:
return self.output_schema(success=False, error_message=f"Sheet index out of range: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
elif isinstance(input_data.sheet_name, str):
if input_data.sheet_name in sheet_names:
sheet_to_read = input_data.sheet_name
sheet_name_read = input_data.sheet_name
else:
return self.output_schema(success=False, error_message=f"Sheet name not found: {input_data.sheet_name}", file_path=file_path_str, sheet_name_read="N/A")
header_df = pd.read_excel(excel_file, sheet_name=sheet_to_read, nrows=0)
all_columns = header_df.columns.tolist()
cols_to_use = all_columns[:input_data.max_cols] if input_data.max_cols else all_columns
df = pd.read_excel(excel_file, sheet_name=sheet_to_read, usecols=cols_to_use, nrows=input_data.max_rows)
preview_rows = min(len(df), 10)
data_preview_str = df.head(preview_rows).to_markdown(index=False)
logger.info(f"Successfully read {len(df)} rows and {len(df.columns)} columns from sheet '{sheet_name_read}' in '{file_path_str}'.")
return self.output_schema(
success=True,
file_path=file_path_str,
sheet_name_read=sheet_name_read,
data_preview=data_preview_str,
row_count=len(df),
column_names=df.columns.tolist()
)
except FileNotFoundError:
logger.error(f"File not found: {file_path_str}")
return self.output_schema(success=False, error_message=f"File not found: {file_path_str}", file_path=file_path_str, sheet_name_read="N/A")
except Exception as e:
logger.error(f"Error reading Excel file '{file_path_str}': {e}", exc_info=True)
return self.output_schema(success=False, error_message=f"Error reading Excel file: {e}", file_path=file_path_str, sheet_name_read="N/A")
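The row and column limits in run() boil down to simple slicing that you can try in isolation. Here's a minimal sketch of the truncation logic on a hypothetical in-memory DataFrame (no Excel file needed), assuming pandas is installed:

```python
import pandas as pd

def truncate_preview(df: pd.DataFrame, max_rows: int = 100, max_cols: int = 20) -> pd.DataFrame:
    """Mimic the tool's limits: keep at most max_cols columns and max_rows rows."""
    cols_to_use = df.columns.tolist()[:max_cols]
    return df[cols_to_use].head(max_rows)

# Hypothetical sample data: 30 rows, 8 columns.
sample = pd.DataFrame({f"col_{i}": range(30) for i in range(8)})
preview = truncate_preview(sample, max_rows=10, max_cols=5)
print(preview.shape)  # (10, 5)
```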
Step 4: Creating the Streamlit Application
Create a file called streamlit_app.py:
import streamlit as st
import pandas as pd
import os
import tempfile
from dotenv import load_dotenv
from karo.prompts.system_prompt_builder import SystemPromptBuilder
from rich.console import Console
from pydantic import Field
from typing import List, Dict, Any
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
dotenv_path = os.path.join(os.path.dirname(__file__), '.env')
load_dotenv(dotenv_path=dotenv_path)
from karo.core.base_agent import BaseAgent, BaseAgentConfig
from karo.providers.openai_provider import OpenAIProvider, OpenAIProviderConfig
from karo.providers.anthropic_provider import AnthropicProvider, AnthropicProviderConfig
from karo.schemas.base_schemas import BaseInputSchema, BaseOutputSchema, AgentErrorSchema
from excel_tool_reader import ExcelReaderInput, ExcelReaderOutput, ExcelReader
console = Console()
st.set_page_config(page_title="Excel Reader Tool", page_icon="📊", layout="wide", initial_sidebar_state="expanded")
st.title("Excel Reader Tool")
st.markdown("This tool reads data from an Excel file and returns a preview of the data.")
class SummarizationOutput(BaseOutputSchema):
summary: str = Field(..., description="The summary of the data read from the Excel file.")
key_takeaways: List[str] = Field(default_factory=list, description="Key takeaways from the data.")
with st.sidebar:
st.header("About Karo Framework")
st.info("Karo is a framework for building and deploying AI agents.")
st.header("Settings")
    provider_type = st.selectbox("Provider", ["OpenAI", "Anthropic"], help="Select the provider for the agent.")
if provider_type == "OpenAI":
api_key = st.text_input("OpenAI API Key", type="password", help="Enter your OpenAI API key.")
if api_key:
os.environ["OPENAI_API_KEY"] = api_key
model_choice = st.selectbox("AI Model", ["gpt-4-turbo", "gpt-4"], help="Select the model for summarization.")
else:
api_key = st.text_input("Anthropic API Key", type="password", help="Enter your Anthropic API key for summarization.")
if api_key:
os.environ["ANTHROPIC_API_KEY"] = api_key
model_choice = st.selectbox("AI Model", ["claude-3-opus-20240229", "claude-3-sonnet-20240229"], help="Select the model for summarization.")
    max_rows = st.slider("Max Rows to process", 10, 100, 50, help="Limit the number of rows to process (higher = more complete but slower)")
max_cols = st.slider("Max Columns to process", 1, 20, 5, help="Limit the number of columns to process")
show_debug = st.checkbox("Show Debug Info", value=False, help="Show system prompts and messages being sent to the LLM")
def run_summarization(file_path, max_rows, max_cols, provider_type="OpenAI", model_choice="gpt-4-turbo", show_debug=False):
"""Function to summarize the data read from the Excel file."""
if provider_type == "OpenAI":
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
st.error("OpenAI API key is not set. Please enter your API key in the sidebar.")
return None
else:
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
st.error("Anthropic API key is not set. Please enter your API key in the sidebar.")
return None
with st.spinner("Initializing Excel Reader Tool..."):
excel_reader_tool = ExcelReader()
st.success("Excel Reader Tool initialized successfully.")
with st.spinner("Reading Excel file..."):
tool_input = ExcelReaderInput(file_path=file_path, max_rows=max_rows, max_cols=max_cols)
tool_output = excel_reader_tool.run(tool_input)
if not tool_output.success:
st.error(f"Error reading Excel file: {tool_output.error_message}")
return None
if not tool_output.data_preview:
st.warning("No data found in the Excel file.")
return None
    st.success(f"Successfully read sheet '{tool_output.sheet_name_read}': {tool_output.row_count} rows, {len(tool_output.column_names)} columns.")
st.markdown("### Data Preview")
st.markdown(tool_output.data_preview)
with st.spinner(f"Initializing {provider_type} provider..."):
if provider_type == "OpenAI":
provider_config = OpenAIProviderConfig(model=model_choice)
provider = OpenAIProvider(config=provider_config)
else:
provider_config = AnthropicProviderConfig(model=model_choice)
provider = AnthropicProvider(config=provider_config)
        st.success(f"{provider_type} provider initialized successfully (Model: {model_choice}).")
system_prompt_content = "You are an expert data analyst. Your task is to analyze Excel data and provide accurate summaries and key takeaways."
system_prompt_builder = SystemPromptBuilder(role_description=system_prompt_content)
    data_message = (
        "Please analyze this Excel data:\n\n"
        "Data preview:\n"
        "```markdown\n"
        f"{tool_output.data_preview}\n"
        "```\n\n"
        f"Column names: {', '.join(tool_output.column_names)}\n"
        f"Sheet name: {tool_output.sheet_name_read}\n"
        f"(Note: Only the first {tool_output.row_count} rows are shown in the preview.)\n\n"
        "Generate a summary that specifically analyzes the data patterns. "
        "Include insights about key metrics and patterns visible in the data."
    )
if show_debug:
st.markdown("### debug information")
st.markdown("#### System Prompt")
st.code(system_prompt_content)
st.markdown("#### User Message")
st.code(data_message)
with st.spinner("Configuring Agent..."):
agent_config = BaseAgentConfig(
provider_config=provider_config,
system_prompt=system_prompt_builder,
output_schema=SummarizationOutput,
)
summarization_agent = BaseAgent(config=agent_config)
st.success("Agent configured successfully.")
    external_history = [{"role": "user", "content": data_message}]
with st.spinner(f"Generating summary using {provider_type} Agent..."):
simple_input = BaseInputSchema(chat_message="")
result = summarization_agent.run(
input_data=simple_input,
history=external_history,
)
return result
uploaded_file = st.file_uploader("Upload Excel file", type=["xlsx", "xls"], help="Upload an Excel file to read data from.")
if uploaded_file is not None:
with tempfile.NamedTemporaryFile(delete=False, suffix='.xlsx') as temp_file:
temp_file.write(uploaded_file.getvalue())
tmp_filepath = temp_file.name
try:
st.success(f"File uploaded: {uploaded_file.name}")
api_var = "OPENAI_API_KEY" if provider_type == "OpenAI" else "ANTHROPIC_API_KEY"
if api_var not in os.environ or not os.environ[api_var]:
st.warning(f"{provider_type} API key is not set. Please enter your API key in the sidebar.")
else:
if st.button("Generate Summary"):
result = run_summarization(tmp_filepath, max_rows, max_cols, provider_type, model_choice, show_debug)
if result is None:
pass
elif isinstance(result, SummarizationOutput):
st.subheader("Summary")
st.write(result.summary)
st.subheader("Key Takeaways")
for i, takeaway in enumerate(result.key_takeaways, start=1):
st.markdown(f"**{i}.** {takeaway}")
elif isinstance(result, AgentErrorSchema):
st.error(f"Error from Karo Agent: {result.error_message}")
else:
st.warning(f"Unexpected result type from agent: {type(result)}")
finally:
if os.path.exists(tmp_filepath):
os.unlink(tmp_filepath)
else:
st.info("Please upload an Excel file to read data from.")
st.subheader("Agent summary will appear here")
st.text("Upload a file and click 'Generate Summary' to analyze your data.")
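The prompt sent to the agent is plain string formatting. As a standalone sketch with hypothetical sample values, this helper assembles a message of the same shape, which makes it easy to eyeball what the LLM actually receives:

```python
from typing import List

def build_data_message(data_preview: str, column_names: List[str], sheet_name: str, row_count: int) -> str:
    """Assemble an analysis prompt from an Excel preview, in the style used by the app."""
    return (
        "Please analyze this Excel data:\n\n"
        "Data preview:\n"
        f"```markdown\n{data_preview}\n```\n\n"
        f"Column names: {', '.join(column_names)}\n"
        f"Sheet name: {sheet_name}\n"
        f"(Note: Only the first {row_count} rows are shown in the preview.)\n\n"
        "Generate a summary that specifically analyzes the data patterns. "
        "Include insights about key metrics and patterns visible in the data."
    )

# Hypothetical one-row preview table.
msg = build_data_message("| a | b |\n|---|---|\n| 1 | 2 |", ["a", "b"], "Sheet1", 1)
print(msg.splitlines()[0])  # Please analyze this Excel data:
```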
Step 6: Setting Up Environment Variables
Create a .env file to store your API keys (this is optional, as the app also allows entering them via the UI):
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
Step 7: Testing Locally
Run your application locally:
streamlit run streamlit_app.py
The first time you run Streamlit, you'll be asked for an email address. Use the same email you plan to use when creating your Streamlit Cloud account.
In the app settings (sidebar), enter your API key; if you configured a .env file, the app will fall back to it.
Step 8: Getting API Keys
OpenAI API Key:
- Go to platform.openai.com
- Create an account or sign in
- Navigate to API Keys in your account settings
- Create a new secret key and copy it
- Add funds to your account (required for API access)
Anthropic API Key:
- Go to console.anthropic.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key and copy it
- Add payment method to your account (required for API access)
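The app reads keys from environment variables, so once you have a key you can confirm it will be found. A small standard-library check (the variable names match those used in streamlit_app.py):

```python
import os

def key_status(var_name: str) -> str:
    """Report whether an API key environment variable is set and non-empty."""
    return "set" if os.environ.get(var_name) else "missing"

for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(var, key_status(var))
```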
Step 9: Preparing for Deployment
Create a requirements.txt file for Streamlit Cloud:
pip freeze > requirements.txt
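Note that pip freeze captures everything in your virtual environment, including transitive dependencies pinned to exact versions. If the generated file causes install conflicts on Streamlit Cloud, a hand-written requirements.txt listing just the packages from Step 2 is a reasonable fallback:

```text
karo
streamlit
python-dotenv
pandas
openpyxl
chromadb
```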
Step 10: Creating a GitHub Repository
If you don't have a GitHub account:
- Go to github.com and sign up
- Verify your email address
- Set up your profile
Creating a new repository:
- Click the "+" icon in the top right and select "New repository"
- Name your repository (e.g., "excel-summarizer")
- Add a description (optional)
- Choose "Public" visibility
- Initialize with a README file (optional)
- Click "Create repository"
Step 11: Pushing Your Code to GitHub
Initialize a Git repository and push your code:
# Initialize Git repository
git init
# Add your files
git add .
# Commit changes
git commit -m "Initial commit"
# Add your GitHub repository as remote
git remote add origin https://github.com/yourusername/excel-summarizer.git
# Push to GitHub
git push -u origin main
Note: GitHub now uses main instead of master as the default branch name. If your Git is configured differently, use master instead.
Step 12: Deploying to Streamlit Cloud
- Go to share.streamlit.io and create an account or sign in
- Click "New app"
- In the deployment form:
- Select your GitHub repository from the dropdown or paste the repository URL
- Select the main branch
- Enter the path to your main file (streamlit_app.py)
- Click "Deploy"
Step 13: Creating a Patch to Bypass ChromaDB
When you deploy, you will encounter an error because Streamlit Cloud ships an SQLite version that is incompatible with ChromaDB. To work around this, create a new file called patch.py with the following code:
import sys
import types
class MockModule(types.ModuleType):
def __init__(self, name):
super().__init__(name)
def __getattr__(self, name):
return None
sys.modules['chromadb'] = MockModule('chromadb')
sys.modules['karo.memory.services.chromadb_service'] = MockModule('karo.memory.services.chromadb_service')
This patch stubs out the ChromaDB dependency. Add the line import patch at the very top of streamlit_app.py (before any Karo imports), then commit and push the changes as before (git add ., git commit, git push). Streamlit Cloud redeploys on push, so refresh your deployed application to see it live.
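To see why this works: Python consults sys.modules before searching for a module on disk, so pre-seeding it with a stub means any later import receives the stub instead of loading the real package. A standard-library-only demonstration of the same trick, using a hypothetical module name:

```python
import sys
import types

class MockModule(types.ModuleType):
    """Stub module whose every attribute lookup returns None."""
    def __getattr__(self, name):
        return None

# Register the stub BEFORE anything tries to import the real package.
sys.modules["some_heavy_dependency"] = MockModule("some_heavy_dependency")

import some_heavy_dependency  # resolved from sys.modules, no disk lookup
print(some_heavy_dependency.Client)  # prints: None
```

This is why the tutorial later insists the import patch line sits at the very top of streamlit_app.py: the stub must be registered before Karo's imports run.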
Step 14: Using Your Deployed App
Once deployed, you can:
- Upload Excel files (under 10MB)
- Enter your API key in the settings
- Adjust the maximum rows and columns to process
- Click "Generate Summary" to see the AI analyze your data
Troubleshooting Deployment Issues
If you encounter errors related to SQLite or ChromaDB when deploying to Streamlit Cloud, make sure that:
- The patch.py file is correctly set up
- The import patch line is at the very top of your streamlit_app.py file
- Your requirements.txt file is correctly generated and includes all necessary dependencies
Conclusion
You've now built and deployed a powerful Excel file summarizer that uses AI to analyze and extract insights from your data. This application demonstrates how to combine Streamlit's user-friendly interface with powerful LLMs through the Karo framework.
This approach can be extended to create more sophisticated data analysis tools, dashboards, or even complete AI-powered applications. The possibilities are endless!
If you want to see this tutorial in action, check out the Excel Summarizer Agent.
Happy coding!