Tinybird is an analytics backend for software applications, and as LLM usage and AI features become more commonplace, developers are increasingly using Tinybird to track LLM usage, costs, and performance, both internally and in user-facing features.

We recently open sourced an app template, the LLM Performance Tracker, that includes a frontend + backend to capture LLM calls from your AI apps and analyze your LLM usage in real time.

The template is full of cool features (kudos to my coworker @alrocar), but I want to focus on one in particular because I think it's going to be the new normal for real-time data visualization.

If you check out the live demo of the app, you'll notice a button in the top right corner: AI Cost Calculator.

Clicking this button opens a modal where you can define how you want to visualize your LLM costs. For example:

Show me cost for OpenAI provider by organization

Filter by organization quantum_systems and show costs for last week

Cost for OpenAI provider in production environment in last month group by model

How would costs change if we use Claude 3 Opus at $0.00003 per prompt token and $0.00015 per completion token?

You can see it in action here:

A quick summary of what is happening under the hood:

  1. The user input is passed to an API
  2. The API uses an LLM to generate structured parameters for a Tinybird data API
  3. The component determines what kind of chart to show by analyzing the user input
  4. The component fetches the Tinybird API with the LLM-supplied filters and hydrates the chart

Let me walk you through how we built this feature. If you're interested in building dynamic, user-generated data visualizations in your application, you can use this as inspiration.

By the way, all of the code snippets I share below are gleaned from the open source LLM Performance Tracker repo.

The components

There are 4 core components to this feature:

  1. A Tinybird datasource called llm_events.datasource
  2. A Tinybird pipe called llm_usage.pipe
  3. A React component called CostPredictionModal.tsx
  4. An extract-cost-parameters API route

In addition, there are some utilities and services to simplify fetching the Tinybird APIs from the frontend.

Let's take a look at each of these components sequentially to understand how to create user-generated real-time data visualizations.

Storing and processing LLM calls with Tinybird

The basic primitives in Tinybird are data sources and pipes. Data sources store data, pipes transform it.

The llm_events data source in this project is designed to store time series data: LLM call events and all the metadata associated with the call.

Here's the table schema:

DESCRIPTION >
    'Store LLM events with usage metrics, costs, and metadata'

SCHEMA >
    `timestamp` DateTime `json:$.start_time`,
    `organization` String `json:$.proxy_metadata.organization` DEFAULT '',
    `project` String `json:$.proxy_metadata.project` DEFAULT '',
    `environment` String `json:$.proxy_metadata.environment` DEFAULT '',
    `user` String `json:$.user` DEFAULT 'unknown',
    `chat_id` String `json:$.proxy_metadata.chat_id` DEFAULT '',
    `message_id` String `json:$.message_id`,
    `model` LowCardinality(String) `json:$.model` DEFAULT 'unknown',
    `prompt_tokens` UInt16 `json:$.response.usage.prompt_tokens` DEFAULT 0,
    `completion_tokens` UInt16 `json:$.response.usage.completion_tokens` DEFAULT 0,
    `total_tokens` UInt16 `json:$.response.usage.total_tokens` DEFAULT 0,
    `response_time` Float32 `json:$.standard_logging_object_response_time` DEFAULT 0,
    `duration` Float32 `json:$.duration` DEFAULT 0,
    `cost` Float32 `json:$.cost` DEFAULT 0,
    `exception` String `json:$.exception` DEFAULT '',
    `traceback` String `json:$.traceback` DEFAULT '',
    `response_status` LowCardinality(String) `json:$.standard_logging_object_status` DEFAULT 'unknown',
    `messages` Array(Map(String, String)) `json:$.messages[:]` DEFAULT [],
    `response_choices` Array(String) `json:$.response.choices[:]` DEFAULT [],
    `proxy_metadata` String `json:$.proxy_metadata` DEFAULT '',
    `provider` LowCardinality(String) `json:$.provider` DEFAULT 'unknown',
    `llm_api_duration_ms` Float32 `json:$.llm_api_duration_ms` DEFAULT 0,
    `end_time` DateTime `json:$.end_time`,
    `id` String `json:$.id` DEFAULT '',
    `stream` Bool `json:$.stream` DEFAULT false,
    `call_type` LowCardinality(String) `json:$.call_type` DEFAULT 'unknown',
    `api_key` String `json:$.api_key` DEFAULT '',
    `log_event_type` LowCardinality(String) `json:$.log_event_type` DEFAULT 'unknown',
    `cache_hit` Bool `json:$.cache_hit` DEFAULT false,
    `response` String `json:$.response` DEFAULT '',
    `response_id` String `json:$.response.id`,
    `response_object` String `json:$.response.object` DEFAULT 'unknown',
    `embedding` Array(Float32) `json:$.embedding[:]` DEFAULT []

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, organization, project, environment, user, chat_id"
ENGINE_PRIMARY_KEY "timestamp, organization, project"
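
Events land in this data source through Tinybird's Events API. The template handles this for you via its own instrumentation, but to make the shape of the data concrete, here's a minimal hand-rolled sketch in TypeScript; the field values and environment variable names are illustrative:

// Minimal sketch: append one LLM call event to the llm_events data source through
// the Tinybird Events API. Field values and env var names here are illustrative;
// the template's own instrumentation does this for you.
const TINYBIRD_API_URL = process.env.TINYBIRD_API_URL ?? 'https://api.tinybird.co';
const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!; // token with APPEND scope on llm_events

export async function logLLMEvent(): Promise<void> {
  const now = new Date().toISOString().replace('T', ' ').slice(0, 19); // 'YYYY-MM-DD HH:MM:SS'
  const event = {
    start_time: now,
    end_time: now,
    message_id: 'msg_123', // hypothetical id
    model: 'gpt-3.5-turbo',
    provider: 'openai',
    user: 'user_42',
    proxy_metadata: {
      organization: 'quantum_systems',
      project: 'llm-tracker',
      environment: 'production',
      chat_id: 'chat_123',
    },
    response: {
      id: 'resp_123',
      object: 'chat.completion',
      usage: { prompt_tokens: 120, completion_tokens: 80, total_tokens: 200 },
    },
    duration: 0.42,
    cost: 0.00031,
  };

  // The Events API ingests newline-delimited JSON; a single object per request works too
  const res = await fetch(`${TINYBIRD_API_URL}/v0/events?name=llm_events`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${TINYBIRD_TOKEN}` },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`Ingestion failed: ${res.status}`);
}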

The llm_usage pipe defines a SQL query to select from the llm_events table. This pipe gets deployed as an API Endpoint, with query parameters defined using the Tinybird templating language:

TOKEN "read_pipes" READ

NODE endpoint
SQL >
    %
    {% if defined(column_name) and column_name not in ['model', 'provider', 'organization', 'project', 'environment', 'user'] %}
        {{ error('column_name (String) query param must be one of the following: model, provider, organization, project, environment, user') }}
    {% end %}
    SELECT
        toDate(timestamp) as date,
        {% if defined(column_name) %}
            toString({{column(column_name, 'model')}}) as category,
        {% end %}
        count() as total_requests,
        countIf(exception != '') as total_errors,
        sum(total_tokens) as total_tokens,
        sum(completion_tokens) as total_completion_tokens,
        sum(prompt_tokens) as total_prompt_tokens,
        sum(cost) as total_cost,
        avg(duration) as avg_duration,
        avg(response_time) as avg_response_time
    FROM llm_events
    WHERE 1
        {% if defined(organization) and organization != [''] %}
        AND organization IN {{Array(organization)}}
        {% end %}
        {% if defined(project) and project != [''] %}
        AND project IN {{Array(project)}}
        {% end %}
        {% if defined(environment) and environment != [''] %}
        AND environment IN {{Array(environment)}}
        {% end %}
        {% if defined(provider) and provider != [''] %}
        AND provider IN {{Array(provider)}}
        {% end %}
        {% if defined(user) and user != [''] %}
        AND user IN {{Array(user)}}
        {% end %}
        {% if defined(model) and model != [''] %}
        AND model IN {{Array(model)}}
        {% end %}
        {% if defined(start_date) %}
        AND timestamp >= {{DateTime(start_date)}}
        {% else %}
        AND timestamp >= now() - interval 7 day
        {% end %}
        {% if defined(end_date) %}
        AND timestamp < {{DateTime(end_date)}}
        {% else %}
        AND timestamp < now()
        {% end %}
    GROUP BY 
    {% if defined(column_name) %}
    category,
    {% end %} 
    date
    order by total_cost desc

TYPE endpoint

A quick explanation of what is happening in this pipe definition:

  • Aggregates LLM usage data (e.g. cost, tokens, requests, etc.) by date and, optionally, by a specified category (e.g. model)
  • Is secured by a read_pipes token
  • Includes dynamic filtering for optionally supplied parameters like model, organization, project, environment, etc.

Once deployed (tb --cloud deploy), we can access this API via HTTP and supply parameters in the URL, for example:

curl "https://api.tinybird.co/v0/pipes/llm_usage.json?token=$READ_PIPES_TOKEN&column_name=model&provider=openai"

This returns a JSON object of time series data containing all of the aggregate metrics grouped by model, filtered to OpenAI calls only.

This API is designed for scalability and speed, and should easily respond in milliseconds even as the number of LLM calls logged grows into the millions.

The time series chart in our dynamic UI fetches data from this API.
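
The frontend utilities mentioned earlier are little more than a thin wrapper around this endpoint. Here's a minimal sketch of a fetchLLMUsage-style helper; the repo's actual utility may differ in the details, but it takes a token, an API URL, and a map of filters, as you'll see further down:

// Sketch of a helper for calling the llm_usage endpoint. The row type mirrors
// the columns selected by the pipe above.
interface LLMUsageRow {
  date: string;
  category?: string;          // present only when column_name is supplied
  total_requests: number;
  total_errors: number;
  total_tokens: number;
  total_completion_tokens: number;
  total_prompt_tokens: number;
  total_cost: number;
  avg_duration: number;
  avg_response_time: number;
}

async function fetchLLMUsage(
  token: string,
  apiUrl: string,
  filters: Record<string, string> = {}
): Promise<{ data: LLMUsageRow[] }> {
  const params = new URLSearchParams({ token, ...filters });
  const res = await fetch(`${apiUrl}/v0/pipes/llm_usage.json?${params}`);
  if (!res.ok) throw new Error(`Tinybird request failed: ${res.status}`);
  return res.json(); // the response also includes meta, rows, and statistics
}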

Defining an API route to generate structured parameters from user input

The extract-cost-parameters API route is the key piece of AI functionality. The LLM's job is to take the free-text user input, analyze it, and produce a set of structured parameters that can be passed to the Tinybird API.

To do that, it implements the following logic.

First, it fetches the pipe definition for the llm_usage pipe and the available dimensions (from another Tinybird API endpoint, llm_dimensions):

TOKEN read_pipes READ

NODE llm_dimensions_node
SQL >
%
    SELECT
        groupUniqArray(organization) as organizations,
        groupUniqArray(project) as project,
        groupUniqArray(environment) as environment,
        groupUniqArray(model) as model,
        groupUniqArray(provider) as provider
      FROM 
      (
        SELECT organization, project, environment, model, provider
        FROM 
        llm_events WHERE timestamp > now() - interval '1 month' 
        {% if defined(organization) and organization != [''] %}
            AND organization IN {{Array(organization)}}
        {% end %}
      )

TYPE endpoint

The available dimensions are used to define the system prompt for the LLM, so it knows which dimensions are available for filtering:

// System prompt
const systemPromptText = `
    You are a parameter extractor for an LLM cost calculator. Extract parameters from natural language queries about AI model cost predictions.

    Available dimensions and unique values:
    ${availableDimensions?.meta?.map((meta: { name: string, type: string }) => {
        const name = meta.name;
        const values = dimensionValues[name]?.values || [];
        return `- ${name.charAt(0).toUpperCase() + name.slice(1)}: ${JSON.stringify(values)}`;
    }).join('\n  ') || ''}

    Look for phrases like "filter by", "for", "in", "with", etc. to identify filtering parameters, guess the parameter name based on the available dimensions. Fix typos when necessary.
`;

Then, the request is made to the LLM provider (in this case OpenAI, using the gpt-3.5-turbo model):

const openai = createOpenAI({ apiKey: apiKey });
const wrappedOpenAI = wrapModelWithTinybird(
  openai('gpt-3.5-turbo'),
  process.env.NEXT_PUBLIC_TINYBIRD_API_URL!,
  process.env.TINYBIRD_JWT_SECRET!,
  {
    event: 'ai_cost_calculator',
    environment: process.env.NODE_ENV,
    project: 'llm-tracker',
    organization: 'tinybird',
    chatId: generateRandomChatId(),
    user: hashApiKeyUser(apiKey),
    systemPrompt: systemPromptText,
  }
);

const result = await generateObject({
  model: wrappedOpenAI,
  schema: costParametersSchema,
  prompt: query,
  systemPrompt: systemPromptText,
// eslint-disable-next-line @typescript-eslint/no-explicit-any
} as any);

Note that we're using a wrapped model, which is how we instrument the Vercel AI SDK to send LLM call events to Tinybird for usage tracking (so we're both using this app to analyze LLM calls and analyzing the calls made by this AI app :mindblown:).
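
The costParametersSchema passed to generateObject() above is a Zod schema describing the structure we want the LLM to fill in. Here's a simplified sketch, with fields mirroring the parameters processed below; the repo's schema may differ in details such as field descriptions and defaults:

import { z } from 'zod';

// Simplified sketch of the structured-output schema; field names mirror
// the parameters handled further down.
const costParametersSchema = z.object({
  model: z.string().nullable(),
  promptTokenCost: z.number().nullable(),      // e.g. 0.00003 $ per prompt token
  completionTokenCost: z.number().nullable(),  // e.g. 0.00015 $ per completion token
  discount: z.number().nullable(),             // percentage discount, if mentioned
  timeframe: z.string().nullable(),            // e.g. 'last week', 'last month'
  volumeChange: z.number().nullable(),         // expected % change in request volume
  start_date: z.string().nullable(),
  end_date: z.string().nullable(),
  group_by: z.string().nullable(),
  organization: z.string().nullable(),
  project: z.string().nullable(),
  environment: z.string().nullable(),
  provider: z.string().nullable(),
  user: z.string().nullable(),
});

type CostParameters = z.infer<typeof costParametersSchema>;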

Finally, the backend does some type checking and applies defaults for missing parameters, returning the structured parameters in the API response:

// Type assertion to handle the result object
const extractedParams = result.object as CostParameters;
console.log('Extracted parameters:', extractedParams);

// Ensure timeframe is correctly processed
const timeframe = extractedParams.timeframe || 'last month';
start_date = extractedParams.start_date || start_date;
end_date = extractedParams.end_date || end_date;

// Apply defaults for missing parameters
const processedResult = {
  model: extractedParams.model || null,
  promptTokenCost: extractedParams.promptTokenCost || null,
  completionTokenCost: extractedParams.completionTokenCost || null,
  discount: extractedParams.discount || 0,
  timeframe: timeframe,
  volumeChange: extractedParams.volumeChange || 0,
  start_date: start_date,
  end_date: end_date,
  group_by: extractedParams.group_by || null,
  organization: extractedParams.organization || null,
  project: extractedParams.project || null,
  environment: extractedParams.environment || null,
  provider: extractedParams.provider || null,
  user: extractedParams.user || null,
  pipeDefinition: pipeDefinition,
  availableDimensions: dimensionValues
};

return NextResponse.json(processedResult);
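
To make that concrete: for the example query "Show me cost for OpenAI provider by organization", the route would return something along these lines (the dates shown are hypothetical computed defaults):

// Illustrative output for "Show me cost for OpenAI provider by organization"
const exampleResponse = {
  model: null,
  promptTokenCost: null,
  completionTokenCost: null,
  discount: 0,
  timeframe: 'last month',            // default when the query doesn't mention one
  volumeChange: 0,
  start_date: '2025-03-01 00:00:00',  // hypothetical computed default
  end_date: '2025-04-01 00:00:00',    // hypothetical computed default
  group_by: 'organization',
  organization: null,
  project: null,
  environment: null,
  provider: 'openai',
  user: null,
  // ...plus pipeDefinition and availableDimensions
};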

Gathering user input and displaying the chart in the UI

The core UI component is CostPredictionModal.tsx, which handles receiving user input, getting structured parameters from the LLM, fetching data from Tinybird with the structured parameters, and defining the type of chart to use based on the query.

First, the component handles the user input:

// State for the query input
const [query, setQuery] = useState('');

// Input field in the form
<input
  type="text"
  placeholder="Ask AI..."
  value={query}
  onChange={(e) => setQuery(e.target.value)}
  onKeyDown={(e) => {
    if (e.key === 'Enter') {
      e.preventDefault();
      handleSubmit(e);
    }
  }}
/>

On submit, it determines the type of query based on heuristics (this could easily be handled by an additional LLM for more complex use cases):

// Check if this is a prediction query
const queryLower = query.toLowerCase();
const isPrediction = queryLower.includes('predict') || 
                     queryLower.includes('what if') || 
                     queryLower.includes('would cost') ||
                     queryLower.includes('change if') ||
                     queryLower.includes('switch to');

setIsPredictionQuery(isPrediction);

// Check if we're grouping by any dimension (including model)
const isGrouped = !!params.group_by;

And determines what type of chart to use (AreaChart vs BarChart, multiple categories, etc.) based on this analysis:

// The visualization type is determined by these state variables
const [isPredictionQuery, setIsPredictionQuery] = useState(false);
const [isGroupedData, setIsGroupedData] = useState(false);

// Different chart types are rendered based on these conditions
{isPredictionQuery ? (
  // Dual area chart for predictions
  <AreaChart
    categories={['actualCost', 'predictedCost']}
    // ... other props
  />
) : isGroupedData ? (
  // Stacked bar chart for grouped data
  <BarChart
    categories={chartCategories}
    stack={true}
    // ... other props
  />
) : (
  // Single area chart for regular cost analysis
  <AreaChart
    categories={['actualCost']}
    // ... other props
  />
)}

It then passes the user input to the extract-cost-parameters API route and sets the parameters based on the response:

const handleSubmit = async (e: React.FormEvent) => {
  e.preventDefault();
  if (!query.trim()) return;

  setIsLoading(true);
  try {
    …

    // Extract parameters from natural language query
    const response = await fetch('/api/extract-cost-parameters', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-custom-tinybird-token': token || '',
        'x-custom-tinybird-api-url': apiUrl || '',
      },
      body: JSON.stringify({ query, apiKey: openaiKey }),
    });

    const extractedParams = await response.json();
    setParameters(extractedParams);
  } catch (error) {
    console.error('Error generating prediction:', error);
  } finally {
    setIsLoading(false);
  }
};

Once it receives the parameters, it parses and cleans those parameters, fetches the Tinybird API, and runs the calculateCosts() function with the result:

// Effect that triggers when parameters change
useEffect(() => {
  async function fetchUsageData() {
    if (!parameters || !token) return;

    try {
      const columnName = parameters.group_by || 'model';

      // Build filters object with all possible filter parameters
      const filters: Record<string, string> = {
        ...currentFilters,
        start_date: formatDateForTinybird(parameters.start_date),
        end_date: formatDateForTinybird(parameters.end_date),
        column_name: columnName
      };

      // Add model filter if specified
      if (parameters.model) {
        const normalizedModel = normalizeModelName(parameters.model);
        if (normalizedModel) {
          filters.model = normalizedModel;
        }
      }

      // Add other filters...

      const response = await fetchLLMUsage(token, apiUrl!, filters);

      if (response && response.data && response.data.length > 0) {
        calculateCosts(response.data, parameters);
      } else {
        // Generate sample data if no real data is available
        const sampleData = generateSampleData(
          new Date(parameters.start_date),
          new Date(parameters.end_date),
          parameters.model
        );
        calculateCosts(sampleData, parameters);
      }
    } catch (error) {
      console.error("Error fetching usage data:", error);
      // Handle error case...
    }
  }

  fetchUsageData();
}, [parameters, token, apiUrl, currentFilters]);
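
The formatDateForTinybird() helper used above isn't shown in the snippet; its only job is to normalize whatever date string the LLM produced into the 'YYYY-MM-DD HH:MM:SS' format that the pipe's DateTime parameters expect. A minimal sketch (the repo's helper may handle more edge cases):

// Minimal sketch: normalize a date string to the format Tinybird DateTime
// query parameters accept ('YYYY-MM-DD HH:MM:SS').
function formatDateForTinybird(value: string): string {
  const d = new Date(value);
  if (isNaN(d.getTime())) return value; // pass through values we can't parse
  return d.toISOString().replace('T', ' ').slice(0, 19);
}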

The calculateCosts() function is responsible for setting the React states for the data and categories. For example, to process non-grouped data:

// Process data for visualization
const processRegularData = (data: UsageDataItem[], params: CostParameters) => {
  // Group data by date
  const dateMap = new Map<string, DateAggregatedData>();

  // Calculate costs and prepare data for chart
  data.forEach(day => {
    const date = day.date;
    if (!dateMap.has(date)) {
      dateMap.set(date, {
        prompt_tokens: 0,
        completion_tokens: 0,
        total_cost: 0
      });
    }

    const entry = dateMap.get(date)!;
    entry.prompt_tokens += day.total_prompt_tokens || 0;
    entry.completion_tokens += day.total_completion_tokens || 0;
    entry.total_cost += day.total_cost || 0;
  });

  // Transform data for chart visualization
  const dailyCostData: DailyCost[] = Array.from(dateMap.entries()).map(([date, data]) => ({
    date: new Date(date).toLocaleDateString('en-US', { 
      month: 'short', 
      day: '2-digit' 
    }),
    actualCost: data.total_cost || 0,
    predictedCost: calculatePredictedCost(data, params)
  }));

  // Set the processed data for chart rendering
  setDailyCosts(dailyCostData);
  setChartCategories(['actualCost', 'predictedCost']);
  setIsGroupedData(false);
};
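
The calculatePredictedCost() helper referenced above isn't shown either. Conceptually, it reprices each day's token volumes with the per-token costs extracted from the query, then applies any discount or volume change. A rough sketch, assuming the parameter names from the extraction step:

// Rough sketch of repricing a day's usage with hypothetical per-token costs.
// Falls back to the actual cost when the query didn't specify new pricing.
const calculatePredictedCost = (data: DateAggregatedData, params: CostParameters): number => {
  if (params.promptTokenCost == null || params.completionTokenCost == null) {
    return data.total_cost;
  }
  const repriced =
    data.prompt_tokens * params.promptTokenCost +
    data.completion_tokens * params.completionTokenCost;

  const withVolume = repriced * (1 + (params.volumeChange || 0) / 100);
  return withVolume * (1 - (params.discount || 0) / 100);
};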

Finally, the component renders the chart with the data/categories stored in state (using chart components from the Tremor charts library). 

The result: A dynamic chart that matches user intent

This is a relatively simple implementation of a dynamic, user-generated data visualization. It uses heuristic analysis to define the type of chart to generate, but this could easily be outsourced to LLMs as well for a more flexible/dynamic implementation.

Here are the important takeaways:

  1. We can use LLMs to generate structured data snippets from free-text input.
  2. We need a performant analytics backend (e.g. Tinybird) that can accept those structured parameters and return the data we want to visualize in real time.
  3. We can define the type of visualization to create heuristically (as we did here) or using LLMs.

This pattern opens up a bunch of possibilities for letting end users generate their own data visualizations. All we have to do is give LLMs enough contextual understanding of the underlying data to create structured filters, aggregations, and groupings.

Discussion: Why not use LLMs for everything?

In this demo, we used the LLM to take a free-text user input and return structured parameters that we could pass to our real-time analytics API.

Things we didn't use LLMs for:

  1. Determining what type of chart to produce
  2. Generating a SQL query to fetch the data

Why didn't we use LLMs?

Well, for 1), we certainly could have. The use case was simple enough that it didn't seem necessary, but it could easily be augmented: add something to the LLM system prompt asking it to determine what kind of query it is, and include that in the LLM's structured output. Easy.

2) is a little more nuanced. Yes, we could ask the LLM to generate the SQL for us, and then ask the LLM to generate the chart component based on the results of the SQL.

Here's why we used a dynamic API endpoint instead:

  1. Encapsulated logic and best practices. If we're repeatedly delivering the same kind of analysis, having a somewhat static endpoint (with dynamic parameters) can both simplify and improve performance. We can encapsulate good data engineering practices into our query, rather than relying on the LLM to produce something good.
  2. Authentication, security, and multi-tenancy. Instructing an LLM to query a raw table of multi-tenant data carries a significant security risk. What if the wrong customer's data gets exposed? We could isolate each customer's data into a separate table, but that isn't always feasible. Using an API secured by tokens/JWTs guarantees security and data privacy in multi-tenant architectures (see the sketch after this list).
  3. Rate limiting. Related to the above. We can add rate limits to user tokens for the API to ensure it isn't abused.
  4. Better observability. If LLMs are generating SQL queries willy nilly, it becomes much more challenging to monitor performance and debug. While LLM observability is getting better, this scenario would add a lot of complexity we don't want to deal with.
  5. More deterministic output and resource usage. LLMs are great. But they do not supply deterministic responses with deterministic resource usage. As a SaaS/product builder, I would be wary of outsourcing too much functionality to LLMs, especially that which can use considerable compute resources.
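
On point 2, here's roughly what a tenant-scoped token looks like: Tinybird endpoints accept signed JWTs whose scopes pin query parameters to fixed values, so a given user can only ever read their own organization's rows. A rough sketch with jsonwebtoken; double-check the exact claim names against Tinybird's JWT docs:

import jwt from 'jsonwebtoken';

// Rough sketch: sign a JWT that can only read the llm_usage endpoint with
// `organization` pinned to one tenant. Claim names here are an assumption;
// verify them against Tinybird's JWT documentation.
function tenantToken(workspaceId: string, adminToken: string, organization: string): string {
  return jwt.sign(
    {
      workspace_id: workspaceId,
      name: `llm_usage_${organization}`,
      exp: Math.floor(Date.now() / 1000) + 60 * 60, // 1 hour
      scopes: [
        {
          type: 'PIPES:READ',
          resource: 'llm_usage',
          fixed_params: { organization }, // filter enforced server-side
        },
      ],
    },
    adminToken // JWTs are signed with the workspace admin token
  );
}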

LLMs will get better and better at writing good SQL. For now, we're sticking with tried-and-true APIs that give us speed, simplicity, and predictability in our analysis.

Get started

If you want to see the full implementation of this feature, check out the components mentioned in the LLM Performance Tracker repo.

If you're new to Tinybird, you can sign up for free (no time limit) and create real-time LLM analysis API endpoints in a few minutes using the template:

curl -L tinybird.co | sh
tb login
tb --cloud deploy --template https://github.com/tinybirdco/llm-performance-tracker/tree/main/tinybird