Introduction

On April 14, 2025, Snowflake announced the public preview of Cortex COMPLETE Multimodal, a powerful new capability within its Cortex AI suite.

While multimodal LLMs have been available for a while, the integration of image analysis capabilities into Snowflake's COMPLETE function represents more than just convenient access to multimodal LLMs. It enables you to add new value to your existing data workflows. This article introduces the multimodal capabilities of the COMPLETE function and how it can transform your data pipelines.

Note (2025/4/15): Cortex COMPLETE Multimodal is currently in public preview, so features may change significantly in future updates.

Note: This article represents my personal views and not those of Snowflake.

What is Cortex COMPLETE Multimodal?

Cortex COMPLETE Multimodal allows you to analyze images using the COMPLETE function. With just SQL or Python, you can perform powerful image processing tasks such as:

  • Comparing images
  • Generating image captions
  • Classifying images
  • Extracting entities from images
  • Answering questions based on data in graphs and charts

Previously, image processing typically required external APIs or services, or complex implementations with Python libraries such as OpenCV. With this feature, you can process images directly within your data workflows, which simplifies your pipelines and makes it easy to add image analysis to existing data workloads.

Available Models

At the time of writing, the multimodal capabilities of the COMPLETE function support two models:

  • Claude 3.5 Sonnet: An Anthropic multimodal model with advanced visual processing and language understanding capabilities

    • Model name parameter: claude-3-5-sonnet
    • Context window: 200,000 tokens
    • Supported file types: .jpg, .jpeg, .png, .webp, .gif
    • Maximum file size: 3.75MB
  • Pixtral Large: A Mistral AI model that excels at visual reasoning tasks and supports multiple languages

    • Model name parameter: pixtral-large
    • Context window: 128,000 tokens
    • Supported file types: .jpg, .jpeg, .png, .webp, .gif, .bmp
    • Maximum file size: 10MB
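To catch unsupported files before calling COMPLETE, the per-model limits above can be encoded in a small lookup table. The following is a minimal pure-Python sketch. The names `MODEL_LIMITS` and `models_accepting` are hypothetical, and the limits are copied from the list above as of this writing, so check the current documentation before relying on them. The sketch also assumes 1 MB = 1,000,000 bytes, which errs on the conservative side if the real limit is measured in MiB.

```python
from pathlib import PurePath

# Limits copied from the list above (public preview, April 2025); verify against current docs.
# Assumption: 1 MB = 1,000,000 bytes, which is the stricter interpretation.
MB = 1_000_000
MODEL_LIMITS = {
    "claude-3-5-sonnet": {
        "extensions": {".jpg", ".jpeg", ".png", ".webp", ".gif"},
        "max_bytes": int(3.75 * MB),
    },
    "pixtral-large": {
        "extensions": {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"},
        "max_bytes": 10 * MB,
    },
}


def models_accepting(filename: str, size_bytes: int) -> list[str]:
    """Return the model names whose file-type and size limits this image satisfies."""
    ext = PurePath(filename).suffix.lower()  # case-insensitive extension check
    return sorted(
        model
        for model, limits in MODEL_LIMITS.items()
        if ext in limits["extensions"] and size_bytes <= limits["max_bytes"]
    )


print(models_accepting("dog.JPG", 2_000_000))    # both models
print(models_accepting("chart.bmp", 2_000_000))  # pixtral-large only (.bmp)
print(models_accepting("scan.png", 5_000_000))   # too large for claude-3-5-sonnet
```

An empty list means that neither model can take the file as-is, for example a .tiff image or anything over 10 MB.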

Notably, Pixtral Large, Mistral AI's latest multimodal LLM, became generally available on April 14, 2025 and shows promise particularly when combined with data analysis.

Cortex COMPLETE Multimodal is natively available only in specific clouds and regions. However, if you enable cross-region inference, you can also use the feature from accounts in other clouds and regions. For the current list, see Cortex COMPLETE Multimodal Regional Availability in the Snowflake documentation.

Preparation: Creating a Stage for Images

First, create a stage to store your images. The stage must have server-side encryption and a directory table enabled. You can create it with a SQL query, as shown below, or through the Snowsight UI:

-- Create an internal stage
CREATE OR REPLACE STAGE image_stage
    DIRECTORY = ( ENABLE = true )
    ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );

Next, upload images to the stage. Supported image formats are .jpg, .jpeg, .png, .webp, .gif, and .bmp (.bmp is only supported by Pixtral Large):

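-- Upload images (PUT is typically run from SnowSQL or a client driver)
PUT file:///path/to/your/image.jpg @image_stage
    AUTO_COMPRESS = FALSE;
-- Refresh the directory table metadata so the uploaded file is listed
ALTER STAGE image_stage REFRESH;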
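When you upload many images from SnowSQL or a client driver, a small script can generate one PUT statement per file. The following is a pure-Python sketch in which the helper `put_statement` is hypothetical and local paths are assumed to be POSIX-style. The helper does the following:

- It wraps the file URI in single quotes so that paths containing spaces are accepted.
- It rejects relative paths and unsupported extensions.
- It keeps `AUTO_COMPRESS = FALSE`, so the staged file retains its original image extension instead of gaining a `.gz` suffix.

```python
from pathlib import PurePosixPath

# Extensions listed above; .bmp is only usable with pixtral-large.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"}


def put_statement(local_path: str, stage: str = "@image_stage") -> str:
    """Build a PUT command for one local image file (hypothetical helper)."""
    path = PurePosixPath(local_path)
    if not path.is_absolute():
        raise ValueError(f"PUT needs an absolute path: {local_path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image type: {path.suffix!r}")
    if "'" in local_path:
        raise ValueError("Paths containing single quotes are not handled by this sketch")
    # Single-quoting the file URI lets paths with spaces through;
    # AUTO_COMPRESS = FALSE stores the file uncompressed under its original name.
    return f"PUT 'file://{path}' {stage} AUTO_COMPRESS = FALSE;"


for p in ["/data/images/dog.jpg", "/data/images/living room.png"]:
    print(put_statement(p))
```

If every file in a folder has the same type, a single PUT with a wildcard in the file URI may be simpler than generating one statement per file.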

Practical Examples: Image Analysis

Example 1: Image Caption Generation

Generating captions for images is straightforward: you pass an image file from your stage to the COMPLETE function as one additional argument, wrapped in TO_FILE:

-- Image caption generation with Cortex COMPLETE Multimodal
-- <model_name>: 'claude-3-5-sonnet' or 'pixtral-large'
SELECT SNOWFLAKE.CORTEX.COMPLETE('<model_name>',
    'Please provide a concise description of this image.',
    TO_FILE('@<stage_name>', '<file_name>'));

For example, with a photo of a Chihuahua, each model might return output like:

-- claude-3-5-sonnet output
This is a close-up photo of a cream/light-colored Chihuahua dog resting on what appears to be a blue quilted blanket or comforter. The dog has the characteristic large, alert eyes and prominent pointed ears typical of the Chihuahua breed. The lighting appears natural, possibly from a nearby window, and the photo captures the dog's sweet, attentive expression.
-- pixtral-large output
The image shows a small, light-colored dog, likely a Chihuahua, resting on a blue quilted blanket. The dog has large, erect ears and a short, smooth coat. Its eyes are open, and it appears to be calmly looking at something. The background includes a window with a view of the outdoors, and there are some cushions or pillows visible behind the dog. The overall setting seems to be a cozy indoor environment.
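COMPLETE is an ordinary SQL function, so the caption query above can also run over every file in the stage instead of a single named image. The following pure-Python sketch builds such a query. The helper names are hypothetical, and the sketch assumes two things:

- The directory table returned by `DIRECTORY(@image_stage)` exposes a `RELATIVE_PATH` column.
- That column can be passed to TO_FILE together with the stage name.

The prompt is escaped as a Snowflake single-quoted literal by doubling backslashes and single quotes. Only plain stage names are accepted, because the stage is inserted into the SQL text verbatim.

```python
import re

ALLOWED_MODELS = {"claude-3-5-sonnet", "pixtral-large"}
# Accept only simple stage references such as @image_stage or @db.schema.stage.
STAGE_PATTERN = re.compile(r"@[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}")


def sql_literal(text: str) -> str:
    """Quote text as a single-quoted Snowflake string literal."""
    # Backslashes start escape sequences inside single-quoted literals, so double them
    # first, then double any single quotes.
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def caption_all_images_sql(model: str, prompt: str, stage: str = "@image_stage") -> str:
    """Build one query that runs COMPLETE on every file listed in the stage's directory table."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Unknown multimodal model: {model}")
    if not STAGE_PATTERN.fullmatch(stage):
        raise ValueError(f"Unexpected stage reference: {stage}")
    return (
        "SELECT relative_path,\n"
        f"       SNOWFLAKE.CORTEX.COMPLETE({sql_literal(model)},\n"
        f"           {sql_literal(prompt)},\n"
        f"           TO_FILE({sql_literal(stage)}, relative_path)) AS caption\n"
        f"FROM DIRECTORY({stage});"
    )


print(caption_all_images_sql("pixtral-large", "Describe this image's main subject."))
```

Each row triggers a separate model call, so cost grows with the number of files. For trial runs, adding a `WHERE` filter on `relative_path` keeps the query small. If newly uploaded files do not show up, refreshing the directory table with `ALTER STAGE image_stage REFRESH` may help.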

Example 2: Multiple Image Comparison

You can also compare multiple images and explain how they relate. The query structure differs slightly from Example 1: the prompt text and the image files are combined with the PROMPT function. You can pass up to 100 images, but at the time of writing only claude-3-5-sonnet supports multiple images:

SELECT SNOWFLAKE.CORTEX.COMPLETE('claude-3-5-sonnet',
    PROMPT('Extract the similarities between the two images {0} and {1}.',
    TO_FILE('@<stage_name>', '<file_name_1>'),
    TO_FILE('@<stage_name>', '<file_name_2>')));

When comparing two images, you might get output like:

-- claude-3-5-sonnet output
These images show what appears to be the same Chihuahua dog in different settings. The similarities between the images include:

1. The dog's physical characteristics:
- Cream/light tan and white colored fur
- Pointed, erect ears
- Large, dark eyes
- Small, compact body typical of a Chihuahua
- Similar facial features and expression

2. The dog's grooming:
- Well-maintained, clean coat
- Similar fur length and texture

3. The photography style:
- Both are clear, well-lit photos
- Both are taken from a slightly elevated angle
- Both show the dog in a relaxed, casual setting

The main difference is that in the first image, the dog is resting on what appears to be a blue quilted surface, while in the second image, the dog is on a carpeted floor with a toy.

Within the PROMPT function, the prompt text must include a placeholder for each image file. In this example, {0} refers to the first image file and {1} refers to the second.
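When prompts are generated programmatically, the placeholders and the list of TO_FILE arguments can easily drift apart. The following pure-Python sketch uses hypothetical helpers that are not part of any Snowflake library. It uses the standard library's `string.Formatter` to collect the positional placeholders in a template and checks that they are exactly `{0}` through `{n-1}`. The 100-image cap is the limit stated for Example 2.

```python
from string import Formatter

MAX_IMAGES = 100  # per-call limit stated in Example 2


def placeholder_indices(template: str) -> set[int]:
    """Collect the positional placeholders ({0}, {1}, ...) used in a PROMPT template."""
    indices = set()
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field is None:  # plain text segment with no placeholder
            continue
        if not field.isdigit():
            raise ValueError(f"Expected a numbered placeholder, got {{{field}}}")
        indices.add(int(field))
    return indices


def check_prompt(template: str, n_images: int) -> None:
    """Raise unless the template uses exactly the placeholders {0} .. {n_images - 1}."""
    if not 1 <= n_images <= MAX_IMAGES:
        raise ValueError(f"Expected between 1 and {MAX_IMAGES} images, got {n_images}")
    expected = set(range(n_images))
    found = placeholder_indices(template)
    if found != expected:
        raise ValueError(f"Placeholders {sorted(found)} do not match images {sorted(expected)}")


# The template from Example 2 passes with two images.
check_prompt("Extract the similarities between the two images {0} and {1}.", 2)
```

A placeholder may appear more than once, for example "compare {0} with {1}, then describe {0}". The check only requires every image to be referenced at least once and forbids references to images that were not passed.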

Example 3: Information Extraction from Images

You can extract information from handwritten diagrams or charts:

SELECT SNOWFLAKE.CORTEX.COMPLETE('<model_name>',
    'This image shows a hand-drawn architecture diagram. Please explain the structure in detail.',
    TO_FILE('@<stage_name>', '<file_name>'));

For a hand-drawn architecture diagram, you might get output like:

-- claude-3-5-sonnet output
This diagram illustrates a data pipeline architecture between AWS and Snowflake. Here's a detailed breakdown:

1. AWS Side:
- Contains an S3 bucket, which is AWS's object storage service
- Located on the left side of the diagram

2. Middle Component:
- Shows "Snowpipe" which is a continuous data ingestion service
- Represented by a connecting line between AWS and Snowflake
- Acts as the data transfer mechanism

3. Snowflake Side (right):
- Contains two main components:
  1. A "Stage" area where data initially lands
  2. A "Table" where the data is ultimately stored
- Shows a "Copy Into" operation that moves data from the Stage to the Table

The flow appears to be:
1. Data is stored in AWS S3
2. Snowpipe automatically detects and transfers this data
3. Data lands in a Snowflake Stage
4. A "Copy Into" command loads the data into the final Snowflake Table

This is a common architecture for automated, continuous data loading from AWS into Snowflake, typically used for real-time or near-real-time data integration scenarios.
-- pixtral-large output
 The hand-drawn architecture diagram illustrates a data flow between Amazon Web Services (AWS) and Snowflake, a cloud-based data warehousing service. Here is a detailed explanation of the structure:

1. **AWS (Amazon Web Services)**:
   - **S3 (Simple Storage Service)**: This is a scalable storage service provided by AWS. In the diagram, S3 is represented as a bucket, which is a container for storing objects (files).

2. **Snowpipe**:
   - Snowpipe is a continuous data ingestion service provided by Snowflake. It enables loading data into Snowflake as soon as it becomes available in a stage.
   - The diagram shows data being transferred from the S3 bucket in AWS to Snowflake via Snowpipe.

3. **Snowflake**:
   - **Stage**: This is a temporary storage area in Snowflake where data files are staged before being loaded into a table. The diagram shows data from S3 being moved into the stage.
   - **Table**: This is the final destination in Snowflake where the data is stored in a structured format. The diagram indicates that data from the stage is copied into the table.
   - **Copy Into**: This is a command used in Snowflake to load data from the stage into the table. The diagram shows an arrow labeled "Copy Into" pointing from the stage to the table, indicating the process of loading data.

In summary, the diagram depicts a data pipeline where data stored in an S3 bucket in AWS is continuously ingested into Snowflake using Snowpipe. The data is first staged in Snowflake and then copied into a table for structured storage and further analysis.

Business Ideas

The multimodal functionality can be leveraged in various business scenarios, including:

  1. E-commerce product image management: Automatically generate descriptions and tags from product images
  2. Real estate photo analysis: Extract floor plans and features from property photos
  3. Document image data extraction: Obtain structured data from images of invoices and contracts
  4. Medical image organization and search: Automatically add metadata to medical images
  5. Social media image analysis: Analyze social media image content for marketing purposes

By combining these capabilities with other Snowflake features like Streamlit in Snowflake, you can further expand the possibilities for data applications. I encourage you to implement your ideas using Cortex COMPLETE Multimodal.
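For the document image extraction idea, a common pattern is to ask COMPLETE to answer in JSON and validate the result before loading it into a table. Model output is not guaranteed to be valid JSON, and models sometimes wrap it in a Markdown code fence, so the parsing should be defensive. In the following pure-Python sketch, the field names in `REQUIRED_KEYS` are a hypothetical invoice schema and the sample response is made up for illustration.

```python
import json
import re

# A Markdown code fence (three backticks, optional "json" tag) that models sometimes add.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)
# Hypothetical schema: the fields the prompt asked the model to return.
REQUIRED_KEYS = {"invoice_number", "invoice_date", "total_amount"}


def parse_invoice_json(response: str) -> dict:
    """Extract and validate the JSON object from a COMPLETE response."""
    match = FENCE.search(response)
    payload = match.group(1) if match else response.strip()
    data = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return data


# Made-up response for illustration only.
sample = '{"invoice_number": "INV-001", "invoice_date": "2025-04-01", "total_amount": 1200}'
print(parse_invoice_json(sample)["total_amount"])
```

Rows that fail validation can be routed to a review table instead of being dropped. This keeps the pipeline running when the model occasionally returns malformed or incomplete output.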

Conclusion

The multimodal capabilities of the COMPLETE function make sophisticated image processing available out of the box through a standard SQL function. The key advantage is that these capabilities integrate directly into existing Snowflake workflows, letting you extract more value from your data. This article covered only the basic functionality, but I'll be sharing more advanced use cases for Cortex COMPLETE Multimodal in the near future.

Promotion

Snowflake What's New Updates on X

I share updates from Snowflake's What's New page on X. I'd be happy if you followed the accounts below:

English Version

Snowflake What's New Bot (English Version)

Japanese Version

Snowflake's What's New Bot (Japanese Version)

Change Log

(20250415) Initial post
(20250416) Added supported cloud and region availability information
(20250421) Updated Claude 3.5 Sonnet max file size to 3.75MB and added PROMPT placeholder explanation

Original Japanese Article

https://zenn.dev/tsubasa_tech/articles/167f2c3826dc02