Introduction
In my previous article, "Unlocking Image Analysis with Snowflake Cortex COMPLETE Multimodal Function", I introduced how this newly available feature in Snowflake's Cortex AI can process images. Today, I'll show you how to build a practical application on top of it: an AI Flowchart Diagram Cleanup Tool.
This application allows users to upload hand-drawn flowcharts, have the AI analyze them, generate Graphviz DOT code, and display a clean, formatted diagram. It's a handy tool for transforming whiteboard sketches from meetings or hand-drawn concept diagrams into professional-looking flowcharts with minimal effort.
I frequently draw architecture diagrams on whiteboards at work, so a tool that automatically produces clean versions of them is a real productivity boost for me!
If you're interested in learning more about using Graphviz, check out my article "Building a Flow Diagram Auto-Generation App with Streamlit in Snowflake (SiS) and Cortex AI". For more on file uploads in Streamlit in Snowflake, see "File Upload and Download with Streamlit in Snowflake".
Note: This article represents my personal views and not those of Snowflake.
Application Features
This "AI Flowchart Diagram Cleanup Tool" includes the following features:
- Image Upload: Users can upload hand-drawn flowchart images
  - Original images are stored in a Snowflake stage
  - Oversized images are automatically resized for more efficient processing
- AI Analysis: Snowflake's Cortex COMPLETE Multimodal function analyzes the images
- DOT Code Generation: The AI generates Graphviz DOT code based on the image
- Graph Visualization: The application renders a clean diagram from the generated DOT code
We'll implement this using Streamlit in Snowflake, which allows us to build interactive web applications directly within the Snowflake environment without connecting to external services.
Prerequisites
- A Snowflake account
  - With access to Cortex LLM (cross-region inference has removed most cloud/region constraints)
- Packages for the Streamlit in Snowflake app
  - Python 3.11 or later
  - snowflake-ml-python 1.8.0 or later
  - python-graphviz 0.20.1 or later
See Cortex LLM Region Availability (Snowflake Documentation)
A Note on Stage Setup
The application automatically creates the required stage when it starts, so manual preparation is unnecessary. For more details on stages and file uploading, refer to my previous article:
File Upload and Download with Streamlit in Snowflake
That article explains how to use st.file_uploader for file uploads and how to generate presigned URLs. We'll use the same approach in this app to store files in an internal stage for AI analysis.
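The app's ensure_stage_exists function runs DESC STAGE and only creates the stage when that query fails. As a hedged alternative, the sketch below builds an idempotent CREATE STAGE IF NOT EXISTS statement (same DIRECTORY and ENCRYPTION options as the app) and the GET_PRESIGNED_URL query as plain strings, so the logic can be checked without a Snowflake connection. The helper names normalize_stage, create_stage_sql and presigned_url_sql are hypothetical; in the app you would pass each string to session.sql(...).collect().

```python
def normalize_stage(stage_name: str) -> tuple[str, str]:
    """Return (name_without_at, name_with_at) for a user-entered stage name."""
    bare = stage_name.strip().lstrip("@")
    return bare, f"@{bare}"


def create_stage_sql(stage_name: str) -> str:
    """Idempotent stage creation: a no-op if the stage already exists."""
    bare, _ = normalize_stage(stage_name)
    return (
        f"CREATE STAGE IF NOT EXISTS {bare} "
        "DIRECTORY = (ENABLE = TRUE) "
        "ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')"
    )


def presigned_url_sql(stage_name: str, file_name: str, expiry_seconds: int = 3600) -> str:
    """Query returning a temporary URL for a staged file (used for st.image previews)."""
    _, with_at = normalize_stage(stage_name)
    # Double single quotes so a file name like it's.png stays a valid SQL literal
    safe_file = file_name.replace("'", "''")
    return f"SELECT GET_PRESIGNED_URL('{with_at}', '{safe_file}', {expiry_seconds})"
```

With IF NOT EXISTS, the existence check and the creation collapse into a single statement, so there is no need to use an exception as a control-flow signal.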
Implementation Steps
Create a New Streamlit in Snowflake App
Click on "Streamlit" in the left panel of Snowsight, then click the "+ Streamlit" button to create a new SiS app.
Run the Streamlit in Snowflake App
In the Streamlit in Snowflake app editor, install snowflake-ml-python and python-graphviz, then copy and paste the following code.
```python
import os
import io
import streamlit as st
import graphviz
from snowflake.snowpark.context import get_active_session
from snowflake.cortex import Complete as CompleteText  # not used below; COMPLETE is called via SQL
from PIL import Image

# Application settings
st.set_page_config(
    layout="wide",
    initial_sidebar_state="expanded"
)

# -------------------------------------
# Get Snowflake session
# -------------------------------------
session = get_active_session()

# -------------------------------------
# Constants and settings
# -------------------------------------
# Supported image file extensions
SUPPORTED_IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp', '.gif', '.bmp']
# Stage name (without @)
DEFAULT_STAGE_NAME = "IMAGE_FLOW_STAGE"
# Default maximum image size for resizing (pixels)
MAX_IMAGE_SIZE = 1024
# Maximum image file size (MB)
MAX_FILE_SIZE_MB = 10.0

# -------------------------------------
# Stage existence check and creation
# -------------------------------------
def ensure_stage_exists(stage_name_no_at: str):
    """
    Creates the stage if it doesn't exist. Does nothing if it already exists.
    Enables directory table and encryption.
    """
    try:
        # Check if stage exists
        session.sql(f"DESC STAGE {stage_name_no_at}").collect()
    except Exception:
        # Create stage if it doesn't exist
        try:
            session.sql(f"""
                CREATE STAGE {stage_name_no_at}
                DIRECTORY = ( ENABLE = true )
                ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' )
            """).collect()
            st.sidebar.success(f"Stage @{stage_name_no_at} has been created.")
        except Exception as e:
            st.sidebar.error(f"Failed to create stage: {str(e)}")
            st.stop()

# -------------------------------------
# Get list of files in stage
# -------------------------------------
def get_stage_files(stage_name: str):
    """
    Get list of files in the specified stage
    """
    try:
        # Add @ prefix if missing
        stage_name_with_at = stage_name
        if not stage_name.startswith('@'):
            stage_name_with_at = f"@{stage_name}"
        stage_files = session.sql(f"LIST {stage_name_with_at}").collect()
        if stage_files:
            # LIST returns "<stage>/<file>"; keep only the part after the first "/"
            file_names = [
                row['name'].split('/', 1)[1] if '/' in row['name'] else row['name']
                for row in stage_files
            ]
            return file_names
        else:
            return []
    except Exception as e:
        st.error(f"Failed to get file list from stage: {str(e)}")
        return []

# -------------------------------------
# Image resizing
# -------------------------------------
def resize_image_if_needed(img_data: bytes, filename: str, max_size: int = MAX_IMAGE_SIZE) -> tuple:
    """
    Load image data and resize if needed
    Parameters:
    - img_data: Binary image data
    - filename: Original filename
    - max_size: Maximum size (in pixels)
    Returns:
    - resize_needed: Whether resizing was needed
    - resized_data: Resized image data or original data
    - resized_filename: Resized image filename or original filename
    - dimensions: Image dimensions (width, height) after processing
    """
    try:
        # Open image with PIL
        img = Image.open(io.BytesIO(img_data))
        width, height = img.size
        # Check if resizing is needed
        if width > max_size or height > max_size:
            # Resizing needed: one ratio for both sides preserves the aspect ratio
            resize_ratio = min(max_size / width, max_size / height)
            new_width = int(width * resize_ratio)
            new_height = int(height * resize_ratio)
            # Resize using high-quality algorithm
            resized_img = img.resize((new_width, new_height), Image.LANCZOS)
            # Convert resized image to byte stream
            buffered = io.BytesIO()
            # Keep original format
            file_ext = os.path.splitext(filename)[1].lower()
            if file_ext in ['.jpg', '.jpeg']:
                resized_img.save(buffered, format="JPEG", quality=95)
            elif file_ext == '.png':
                resized_img.save(buffered, format="PNG")
            elif file_ext == '.webp':
                resized_img.save(buffered, format="WEBP")
            elif file_ext == '.gif':
                resized_img.save(buffered, format="GIF")
            elif file_ext == '.bmp':
                resized_img.save(buffered, format="BMP")
            else:
                # Default to PNG
                resized_img.save(buffered, format="PNG")
            # Change filename to indicate resizing
            name, ext = os.path.splitext(filename)
            resized_filename = f"{name}_resized{ext}"
            return True, buffered.getvalue(), resized_filename, (new_width, new_height)
        else:
            # Resizing not needed
            return False, img_data, filename, (width, height)
    except Exception as e:
        # Return original data if error occurs
        st.warning(f"Error during image resizing: {str(e)}. Continuing with original size.")
        return False, img_data, filename, (0, 0)

# -------------------------------------
# Verify file on stage
# -------------------------------------
def verify_image_file(stage_name: str, file_name: str):
    """
    Verify that the specified file exists on stage and is a valid image format
    """
    try:
        # Add @ prefix if missing
        stage_name_with_at = stage_name
        if not stage_name.startswith('@'):
            stage_name_with_at = f"@{stage_name}"
        # Check if file exists
        file_check = session.sql(f"LIST {stage_name_with_at}/{file_name}").collect()
        if not file_check:
            return False, "File does not exist on stage"
        # Check file extension
        file_extension = os.path.splitext(file_name)[1].lower()
        if file_extension not in SUPPORTED_IMAGE_EXTENSIONS:
            return False, f"Unsupported file format: {file_extension}"
        # Check file size
        file_size_bytes = file_check[0]['size']
        file_size_mb = file_size_bytes / (1024 * 1024)
        if file_size_mb > MAX_FILE_SIZE_MB:
            return False, f"File size too large ({file_size_mb:.2f}MB). Must be less than {MAX_FILE_SIZE_MB}MB."
        return True, "Validation successful"
    except Exception as e:
        return False, f"Error during file validation: {str(e)}"

# -------------------------------------
# AI Analysis Functions
# -------------------------------------
def analyze_image_with_ai(stage_name: str, file_name: str, model_name: str, prompt: str):
    """
    Analyze image file on stage using AI
    """
    query = "Query not yet executed"  # Default value for exception handling
    try:
        # Validate file
        is_valid, validation_message = verify_image_file(stage_name, file_name)
        if not is_valid:
            return validation_message
        # Escape single quotes for the SQL string literal
        sql_safe_prompt = prompt.replace("'", "''")
        # Ensure stage name has @ prefix
        if not stage_name.startswith('@'):
            stage_name_with_at = f"@{stage_name}"
        else:
            stage_name_with_at = stage_name
        # SQL syntax following Snowflake documentation
        query = f"SELECT SNOWFLAKE.CORTEX.COMPLETE('{model_name}', '{sql_safe_prompt}', TO_FILE('{stage_name_with_at}', '{file_name}'))"
        # Execute query
        result = session.sql(query).collect()
        # Return result
        if result and len(result) > 0:
            return result[0][0]
        else:
            return "No results from AI analysis."
    except Exception as e:
        # Get detailed error message
        error_str = str(e)
        return f"Error during AI analysis: {error_str}\nQuery: {query}"

# -------------------------------------
# DOT code generation prompt
# -------------------------------------
def generate_dot_code_prompt():
    """
    Create prompt for AI to generate DOT code
    """
    return """
    Generate a Graphviz DOT code representation of the provided flowchart or diagram, following these rules:
    1. Analyze the image and faithfully represent its structure in the DOT code.
    2. Choose appropriate shapes and colors for nodes and edges that match the diagram.
    3. Add labels to nodes and edges wherever possible.
    4. Use readable colors where the original doesn't specify any.
    5. Ensure the generated DOT code is renderable by Graphviz.
    6. Enclose the code in "digraph G {" and "}" tags.
    7. Return ONLY the DOT code without any additional explanation.
    """

# -------------------------------------
# Format DOT code
# -------------------------------------
def format_dot_code(dot_code: str):
    """
    Format the DOT code generated by AI
    """
    # Remove code block markers if present
    dot_code = dot_code.strip()
    if dot_code.startswith("```dot"):
        dot_code = dot_code[6:]
    elif dot_code.startswith("```"):
        dot_code = dot_code[3:]
    if dot_code.endswith("```"):
        dot_code = dot_code[:-3]
    return dot_code.strip()

# -------------------------------------
# Initialize session state
# -------------------------------------
if 'dot_code' not in st.session_state:
    st.session_state['dot_code'] = ""
if 'ai_output' not in st.session_state:
    st.session_state['ai_output'] = ""
if 'analyzed_image' not in st.session_state:
    st.session_state['analyzed_image'] = None
if 'original_image' not in st.session_state:
    st.session_state['original_image'] = None
if 'image_dimensions' not in st.session_state:
    st.session_state['image_dimensions'] = None
if 'active_tab' not in st.session_state:
    st.session_state['active_tab'] = "upload"

# -------------------------------------
# Main application
# -------------------------------------
def main():
    st.title("AI Flowchart Diagram Cleanup Tool")
    st.markdown("Upload a hand-drawn flowchart, generate Graphviz DOT code with AI, and display a clean diagram.")

    # Sidebar settings
    st.sidebar.header("Settings")
    # Stage settings
    stage_name = st.sidebar.text_input(
        "Enter stage name (e.g., IMAGE_FLOW_STAGE)",
        DEFAULT_STAGE_NAME
    )
    # Create stage if it doesn't exist
    stage_name_no_at = stage_name.lstrip('@')
    ensure_stage_exists(stage_name_no_at)

    # AI model selection
    st.sidebar.header("AI Model Settings")
    model_name = st.sidebar.selectbox(
        "Select AI model to use",
        [
            "claude-3-5-sonnet",
            "pixtral-large"
        ],
        index=0
    )

    # Image resizing settings
    st.sidebar.header("Image Processing Settings")
    resize_enabled = st.sidebar.checkbox("Automatically resize large images", value=True)
    resize_max_size = st.sidebar.slider("Maximum size for resizing (pixels)", 512, 2048, MAX_IMAGE_SIZE)

    # Create tabs
    tab_upload, tab_analyze, tab_visualize = st.tabs([
        "Upload Image",
        "AI Analysis",
        "Graphviz Visualization"
    ])

    # Tab 1: Image Upload
    with tab_upload:
        st.header("Upload Flowchart Image")
        st.write("Upload a flowchart or architecture diagram image.")
        uploaded_file = st.file_uploader(
            "Supported formats: JPG, PNG, WEBP, GIF, BMP",
            type=['jpg', 'jpeg', 'png', 'webp', 'gif', 'bmp']
        )
        if uploaded_file:
            # Check file extension
            file_extension = os.path.splitext(uploaded_file.name)[1].lower()
            if file_extension in SUPPORTED_IMAGE_EXTENSIONS:
                try:
                    # Read file as binary
                    file_data = uploaded_file.getvalue()
                    # Resize if needed
                    if resize_enabled:
                        resized, resized_data, resized_filename, img_dimensions = resize_image_if_needed(
                            file_data, uploaded_file.name, resize_max_size
                        )
                    else:
                        resized = False
                        resized_data = file_data
                        resized_filename = uploaded_file.name
                        # Get size info
                        img = Image.open(io.BytesIO(file_data))
                        img_dimensions = img.size
                    # Upload to stage
                    stage_name_with_at = f"@{stage_name_no_at}"
                    # Upload original and resized file if needed
                    if resized:
                        # Upload original file
                        session.file.put_stream(
                            io.BytesIO(file_data),
                            f"{stage_name_with_at}/{uploaded_file.name}",
                            auto_compress=False,
                            overwrite=True
                        )
                        # Upload resized file
                        session.file.put_stream(
                            io.BytesIO(resized_data),
                            f"{stage_name_with_at}/{resized_filename}",
                            auto_compress=False,
                            overwrite=True
                        )
                        st.success(f"File '{uploaded_file.name}' uploaded and resized to '{resized_filename}' for AI analysis!")
                        st.session_state['analyzed_image'] = resized_filename
                        st.session_state['original_image'] = uploaded_file.name
                    else:
                        # Upload only original file if no resizing needed
                        session.file.put_stream(
                            io.BytesIO(file_data),
                            f"{stage_name_with_at}/{uploaded_file.name}",
                            auto_compress=False,
                            overwrite=True
                        )
                        st.success(f"File '{uploaded_file.name}' uploaded successfully!")
                        st.session_state['analyzed_image'] = uploaded_file.name
                        st.session_state['original_image'] = uploaded_file.name
                    # Save image dimensions
                    st.session_state['image_dimensions'] = img_dimensions
                    # Image preview
                    st.subheader("Image Preview")
                    st.image(uploaded_file, caption=f"{uploaded_file.name} ({img_dimensions[0]}x{img_dimensions[1]} pixels)")
                    if resized:
                        st.info(f"Image was resized to {img_dimensions[0]}x{img_dimensions[1]} pixels for AI analysis.")
                except Exception as e:
                    st.error(f"Error during file upload: {str(e)}")
            else:
                st.error("Unsupported file format.")

    # Tab 2: AI Analysis
    with tab_analyze:
        st.header("Cortex COMPLETE Multimodal Analysis")
        # List of files on stage
        st.subheader("Files on Stage")
        files = get_stage_files(stage_name_no_at)
        if files:
            selected_file = st.selectbox("Select image to analyze", files)
            if selected_file:
                # Save selected file to session
                st.session_state['analyzed_image'] = selected_file
                # Check if it's a resized file
                if "_resized" in selected_file:
                    # Guess original filename
                    original_name = selected_file.replace("_resized", "")
                    if original_name in files:
                        st.session_state['original_image'] = original_name
                    else:
                        st.session_state['original_image'] = selected_file
                else:
                    st.session_state['original_image'] = selected_file
                # Ensure stage name has @ prefix
                if not stage_name.startswith('@'):
                    stage_name_with_at = f"@{stage_name}"
                else:
                    stage_name_with_at = stage_name
                # Image preview
                try:
                    # Generate presigned URL to display the file
                    url_result = session.sql(f"""
                        SELECT GET_PRESIGNED_URL(
                            '{stage_name_with_at}',
                            '{selected_file}',
                            3600
                        )
                    """).collect()
                    signed_url = url_result[0][0]
                    st.image(signed_url, caption=selected_file)
                except Exception as e:
                    st.error(f"Error retrieving image: {str(e)}")
                # DOT code generation button
                if st.button("Generate DOT Code"):
                    with st.spinner(f"AI is analyzing image '{selected_file}'..."):
                        prompt = generate_dot_code_prompt()
                        ai_output = analyze_image_with_ai(
                            stage_name,
                            selected_file,
                            model_name,
                            prompt
                        )
                        # Save results to session
                        st.session_state['ai_output'] = ai_output
                        st.session_state['dot_code'] = format_dot_code(ai_output)
                # Display DOT code
                st.subheader("Generated DOT Code")
                st.text_area("DOT Code", st.session_state['dot_code'], height=300)
        else:
            st.info("No image files found on stage. Please upload an image in the 'Upload Image' tab.")

    # Tab 3: Graphviz Visualization
    with tab_visualize:
        st.header("Graphviz Visualization")
        if st.session_state['dot_code']:
            # Editable DOT code
            dot_code = st.text_area("DOT Code (editable)", st.session_state['dot_code'], height=300)
            st.session_state['dot_code'] = dot_code
            try:
                # Render DOT code with Graphviz
                graph = graphviz.Source(dot_code)
                st.subheader("Generated Diagram")
                st.graphviz_chart(graph)
                # Export instructions
                st.info("To export the diagram, copy the DOT code above and save it to a file.")
            except Exception as e:
                st.error(f"Error parsing DOT code: {str(e)}")
        else:
            st.info("No DOT code generated yet. Go to the 'AI Analysis' tab to generate DOT code.")

    # Help information
    with st.expander("Help & Usage Guide"):
        st.markdown("""
        ### How to Use
        1. In the **Upload Image** tab, upload a flowchart or architecture diagram image.
        2. In the **AI Analysis** tab, select the image from the stage and generate DOT code.
        3. In the **Graphviz Visualization** tab, view and edit the generated diagram.
        ### Image Resizing
        Large images are automatically resized for efficient AI processing.
        You can enable/disable resizing and adjust the maximum size in the sidebar settings.
        ### Supported File Formats
        - JPG/JPEG
        - PNG
        - WEBP
        - GIF
        - BMP (Pixtral Large model only)
        ### About AI Models
        - **Claude 3.5 Sonnet**: Anthropic's multimodal model with advanced visual processing and language understanding
        - **Pixtral Large**: Mistral AI's model optimized for visual reasoning tasks
        ### About Graphviz
        Graphviz is a tool for visualizing graph structures. It uses the DOT language to represent various diagrams.
        The generated DOT code is editable and can be manually modified as needed.
        """)

# Run the application
if __name__ == "__main__":
    main()
```
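format_dot_code only strips a code fence when it sits at the very start and the very end of the response. If a model wraps the DOT code in a sentence or two of prose despite rule 7 of the prompt, that prose is passed to Graphviz and the diagram fails to render. A more tolerant sketch, using the hypothetical name extract_dot_code, first pulls out a fenced block if there is one, then keeps everything from the first line that opens a graph or digraph header through the last closing brace. It assumes the opening brace is on the same line as the graph/digraph keyword; otherwise it falls back to returning the (fence-stripped) text unchanged.

```python
import re

# A fenced block such as ```dot ... ``` or ``` ... ``` (language tag optional)
_FENCE_RE = re.compile(r"```[a-zA-Z]*\s*\n?(.*?)```", re.DOTALL)

# A line starting with "[strict] digraph|graph <name> {" through the LAST "}" in the text
_GRAPH_RE = re.compile(
    r"^[ \t]*(?:strict[ \t]+)?(?:di)?graph\b[^{\n]*\{.*\}",
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)


def extract_dot_code(ai_output: str) -> str:
    """Best-effort extraction of a DOT graph from a chatty LLM response."""
    text = ai_output.strip()
    fence = _FENCE_RE.search(text)
    if fence:
        text = fence.group(1).strip()
    graph = _GRAPH_RE.search(text)
    return graph.group(0).strip() if graph else text
```

Because the graph pattern only matches at the start of a line, a preamble such as "Here is the graph:" is skipped rather than mistaken for the DOT header.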
Key Technical Highlights
1. Image Resizing
Image processing is computationally expensive for AI models, so I've implemented automatic resizing for large images. This not only reduces processing time but also helps manage costs.
The implementation uses Python's PIL (Pillow) library with the LANCZOS algorithm for high-quality resizing, which works particularly well for diagrams with text and line art:
```python
# High-quality resizing using LANCZOS algorithm
resized_img = img.resize((new_width, new_height), Image.LANCZOS)
```
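The new width and height come from a single scale factor, min(max_size / width, max_size / height), so the longer side lands on max_size and the aspect ratio is preserved. A dependency-free sketch of just that arithmetic follows; fit_within is a hypothetical name. Unlike the app, it uses round() instead of int() so floating-point error cannot shave a pixel off the long side, and it clamps each side to at least 1 pixel.

```python
def fit_within(width: int, height: int, max_size: int) -> tuple[int, int]:
    """Scale (width, height) so neither side exceeds max_size, keeping the aspect ratio."""
    if width <= max_size and height <= max_size:
        # Already small enough: no resize
        return width, height
    # One ratio for both sides; the tighter constraint wins
    ratio = min(max_size / width, max_size / height)
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

For example, a 4032x3024 photo with max_size=1024 becomes 1024x768, while an 800x600 image is left alone. If you would rather not compute the size yourself, Pillow's Image.thumbnail((max_size, max_size), Image.LANCZOS) resizes the image in place and also preserves the aspect ratio.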
2. Prompt Engineering
To generate appropriate DOT code from images, I've designed a prompt with a set of explicit rules. Feel free to customize this prompt to improve results for your specific use cases:
```python
# -------------------------------------
# DOT code generation prompt
# -------------------------------------
def generate_dot_code_prompt():
    """
    Create prompt for AI to generate DOT code
    """
    return """
    Generate a Graphviz DOT code representation of the provided flowchart or diagram, following these rules:
    1. Analyze the image and faithfully represent its structure in the DOT code.
    2. Choose appropriate shapes and colors for nodes and edges that match the diagram.
    3. Add labels to nodes and edges wherever possible.
    4. Use readable colors where the original doesn't specify any.
    5. Ensure the generated DOT code is renderable by Graphviz.
    6. Enclose the code in "digraph G {" and "}" tags.
    7. Return ONLY the DOT code without any additional explanation.
    """
```
Conclusion
This "AI Flowchart Diagram Cleanup Tool" demonstrates a practical application of Snowflake's Cortex COMPLETE Multimodal function. By combining Snowflake's data platform capabilities with generative AI, we can turn rough inputs such as photos of whiteboard sketches into clean, editable diagrams without leaving Snowflake.
This application can be useful in various scenarios:
- Digitizing whiteboard diagrams from meetings
- Converting hand-drawn concept sketches to digital form
- Cleaning up architecture diagrams and flowcharts
- Saving time in diagram creation
Snowflake's Cortex COMPLETE Multimodal function can be applied to many other image processing tasks beyond this example. I hope to continue exploring and sharing new use cases in the future.
Promotion
Snowflake What's New Updates on X
I'm sharing updates on Snowflake's What's New on X. I'd be happy if you could follow:
English Version
Snowflake What's New Bot (English Version)
Japanese Version
Snowflake's What's New Bot (Japanese Version)
Change Log
(20250417) Initial post