Modern Data Stack Updates: April 2025

※This article is an English translation of my original Japanese post.

As a consultant specializing in the Modern Data Stack, I see a constant stream of new releases and announcements in this field. In this article, I summarize the Modern Data Stack updates from the past two weeks that I found most interesting.

Note: This article doesn't cover every recent update for the products mentioned. It only includes information that I personally found interesting.

Modern Data Stack General

"DeepWiki" - Automatically Generate Wiki for GitHub Repositories

Cognition Labs, the company behind Devin, has announced "DeepWiki," a service that generates a wiki for a GitHub repository when you simply replace github.com in the repository URL with deepwiki.com.

https://x.com/cognition_labs/status/1915816544480989288

As shared by @satoshihirose on X, when applied to Modern Data Stack repositories like dbt Labs' jaffle-shop sample repository, DeepWiki automatically generates data flow diagrams and ER diagrams. This is truly impressive...

https://deepwiki.com/dbt-labs/jaffle-shop

You can also generate comprehensive wikis for other open-source services like DuckDB, Airbyte, Airflow, and Dagster by inputting their repositories into DeepWiki.

https://deepwiki.com/duckdb/duckdb

https://deepwiki.com/airbytehq/airbyte

https://deepwiki.com/apache/airflow

https://deepwiki.com/dagster-io/dagster
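
The domain swap described above can be scripted. Here is a minimal sketch in Python; the function name `to_deepwiki_url` is my own, and it assumes the input is a standard `https://github.com/<owner>/<repo>` URL. It only rewrites the URL and does not check whether DeepWiki has actually indexed the repository.

```python
from urllib.parse import urlsplit, urlunsplit


def to_deepwiki_url(github_url: str) -> str:
    """Swap the github.com host of a repository URL for deepwiki.com."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url}")
    # Keep scheme and path (owner/repo) as-is; only the host changes.
    return urlunsplit(parts._replace(netloc="deepwiki.com"))


if __name__ == "__main__":
    for repo in ("dbt-labs/jaffle-shop", "duckdb/duckdb", "apache/airflow"):
        print(to_deepwiki_url(f"https://github.com/{repo}"))
```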

Bauplan Announces $7.5M Funding Round

Bauplan, a company I hadn't heard of before, has announced a $7.5 million funding round.

https://www.bauplanlabs.com/blog/ai-needs-better-data-infrastructure

Their documentation describes the product as follows:

"Bauplan is a Pythonic data platform that provides functions as a service for large-scale data pipelines and git-for-data over S3 data lakes. Bauplan handles tasks that would typically require an entire infrastructure team. Our goal is to allow you and your team to run large-scale ML workflows, AI applications and data transformation pipelines in the cloud without managing any data infrastructure."

https://docs.bauplanlabs.com/en/latest/

The following article walks through an example pipeline built with Bauplan and Orchestra:

https://www.getorchestra.io/blog/this-pattern-is-a-rude-awakening-for-the-modern-data-stack

Data Extract/Load

Airbyte

Released S3 Data Lake Destination with Iceberg Format Output

Airbyte has released a new destination, S3 Data Lake, that writes data to S3 in Iceberg format.

https://airbyte.com/blog/build-once-and-query-anywhere-with-airbytes-data-lake-connector

https://docs.airbyte.com/integrations/destinations/s3-data-lake

dlt

Article Summarizing dlt's 2025 Roadmap

Marcin, founder and CTO of dltHub, published an article summarizing dlt's roadmap for 2025.

According to the article, they will focus on:

  • Increasing quality of life and enabling LLM-assisted coding
  • Accessing and transforming loaded data
  • Support for nested types
  • Unifying data normalizers and making them faster
  • Pipeline state and schema storage abstraction
  • Full data lineage and schema abstraction

https://dlthub.com/blog/2025-whats-next

Data Warehouse/Data Lakehouse

Snowflake

ML Jobs Released

Snowflake has released "ML Jobs," a new feature that makes it easy to run Python workloads on Snowpark Container Services (SPCS) compute pools.

https://docs.snowflake.com/en/release-notes/2025/other/2025-04-16-snowflake-ml-jobs

https://docs.snowflake.com/en/developer-guide/snowflake-ml/ml-jobs/snowflake-ml-jobs

Mr. Takada, an engineer at Snowflake, has also published a helpful article on using ML Jobs.

https://zenn.dev/tf_takada/articles/d00d1587b288f7

Passkeys and TOTP Support for MFA Coming in Release 9.12 (Early May)

The release notes for the upcoming 9.12 release, scheduled for early May, state that Snowflake will support passkeys and TOTP (time-based one-time passwords) as authentication methods for MFA. Many people have been waiting for this!

https://docs.snowflake.com/release-notes/2025/9_12#new-authentication-methods-for-multi-factor-authentication-mfa-general-availability
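
For anyone unfamiliar with TOTP, here's a minimal sketch of how a code is derived from a shared secret and the current time. It follows the generic RFC 6238 algorithm (HMAC-SHA1, 30-second steps) using only the Python standard library. It's for illustration only and says nothing about how Snowflake implements the feature; the function name and defaults are my own choices.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Generic RFC 6238 TOTP (HMAC-SHA1); an illustration, not Snowflake's code."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Secret taken from the RFC 6238 test vectors; a real secret comes from MFA enrollment.
    print(totp(b"12345678901234567890"))
```

Because both the server and the authenticator app hold the same secret and roughly the same clock, they compute the same code independently, so nothing needs to be sent to the device at login time.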

terraform-provider-snowflake v2.0.0 Released and Now Officially Supported

terraform-provider-snowflake v2.0.0 has been released and is now generally available with official support. This means that you can now open support tickets for v2.0.0 and later.

https://github.com/snowflakedb/terraform-provider-snowflake/blob/main/ROADMAP.md

https://docs.snowflake.com/en/user-guide/terraform

Databricks

GA4 Raw Data Connector Announced (Public Preview)

Databricks has announced a new connector for GA4 Raw Data (Public Preview).

It appears to ingest the GA4 raw data that has been exported to BigQuery.

https://docs.databricks.com/aws/ja/release-notes/product/2025/april#google-analytics-raw-data-connector-public-preview

https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/google-analytics-source-setup

MotherDuck/DuckDB

"Instant SQL" Released - Real-time Query Result Preview While Editing

"Instant SQL," a new feature that previews query results in real time as you edit a query, has been released for MotherDuck and the DuckDB Local UI.

Traditional SQL development typically follows a "write → execute → wait → modify" cycle, and this feature aims to eliminate the "wait" step.

It's not yet clear how much data can be previewed without a noticeable wait, but this feels like a unique feature not found in other products!

https://motherduck.com/blog/introducing-instant-sql/

Onehouse

"Open Engines" Announced - Launch Any OSS Engine on Onehouse Platform

Onehouse has announced "Open Engines," a new feature that allows launching any OSS engine on the Onehouse platform.

https://www.onehouse.ai/blog/announcing-open-engines-tm-flipping-defaults-to-open-for-both-data-and-compute

They will initially release three engines: Apache Flink™ (stream processing), Trino (BI and analytics), and Ray (AI/ML, data science). As shown in the image below, you can select which OSS engine resources to launch from Onehouse.

[Images: Onehouse screens for selecting which OSS engine to launch]

Data Transform

dbt

Official dbt Labs MCP Server Released

dbt Labs has released an official dbt MCP (Model Context Protocol) server, which is available on GitHub. It's currently an experimental release.

https://docs.getdbt.com/blog/introducing-dbt-mcp-server

https://github.com/dbt-labs/dbt-mcp/tree/main

We've been testing it at our company. It can currently list models, analyze the SQL written in those models, and list metrics.

https://dev.classmethod.jp/articles/using-dbt-mcp/

This MCP Server will also be discussed at the dbt Launch Showcase to be held on May 28-29, 2025.

https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase

Business Intelligence

Looker

New Measure Type "period_over_period" Released (Preview)

Looker has released a new measure type called "period_over_period." This is a feature I've been waiting for!

https://cloud.google.com/looker/docs/release-notes#April_29_2025

https://cloud.google.com/looker/docs/period-over-period
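
To show what a period-over-period comparison computes, here's a minimal sketch in plain Python. This is not LookML and not Looker's implementation; the function and field names (`previous`, `difference`, `relative_change`) are my own, and it assumes the period keys sort chronologically with no gaps.

```python
from typing import Dict, List, Optional


def period_over_period(series: Dict[str, float], offset: int = 1) -> List[dict]:
    """For each period, attach the value `offset` periods earlier and the change."""
    periods = sorted(series)  # keys such as "2025-03" sort chronologically
    rows = []
    for i, period in enumerate(periods):
        current = series[period]
        previous: Optional[float] = series[periods[i - offset]] if i >= offset else None
        difference = None if previous is None else current - previous
        # Relative change is undefined when there is no previous value or it is zero.
        relative = None if not previous else difference / previous
        rows.append({
            "period": period,
            "current": current,
            "previous": previous,
            "difference": difference,
            "relative_change": relative,
        })
    return rows


if __name__ == "__main__":
    monthly_sales = {"2025-02": 100.0, "2025-03": 120.0, "2025-04": 90.0}
    for row in period_over_period(monthly_sales):
        print(row)
```

Until now this kind of comparison had to be built by hand, which is why having it available as a measure type is so welcome.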

Omni

Databricks Ventures Invests in Omni

Databricks Ventures has announced an investment in Omni. According to the article below, this is Databricks' first investment in the business intelligence space.

https://omni.co/blog/databricks-invests-in-omni

Data Catalog

OpenMetadata

OpenMetadata 1.7 and Managed Version Collate 1.7 Released

The latest version of OpenMetadata, 1.7, and its managed version, Collate 1.7, have been released.

I'm particularly interested in AutoPilot and Reverse Metadata, which were added in Collate 1.7.

  • AutoPilot: a collective term for four agent functions
    • Metadata Ingestion Agent: Automatically extracts comprehensive metadata from data sources
    • Documentation Agent: Automatically generates descriptions based on data shapes and generates SQL queries from natural language requests
    • Tiering Agent: Analyzes table usage and lineage to determine business importance of data assets
    • Data Quality Agent: Validates table patterns and constraints and creates data quality tests
  • Reverse Metadata
    • Allows sending descriptions, tags, and ownership information collected in Collate back to data sources
    • Supported systems include Athena, BigQuery, ClickHouse, Databricks, Microsoft SQL Server, MySQL, Oracle, Postgres, Redshift, Snowflake, Unity Catalog, and more

https://blog.open-metadata.org/announcing-openmetadata-1-7-9f9778579704

https://blog.getcollate.io/announcing-collate-17

Data Quality & Data Observability

Metaplane

Datadog Announces Acquisition of Metaplane

Datadog has announced the acquisition of Metaplane.

For current Metaplane customers, there will be no immediate changes, and the product will continue to be offered as "Metaplane by Datadog."

https://www.metaplane.dev/blog/metaplane-by-datadog

Recce

"Recce" Releases v1.0 and Beta SaaS Version for dbt-Focused Data Change Detection

"Recce," a dbt-focused tool for detecting data changes and analyzing their impact, has released v1.0 along with a beta SaaS version.

https://datarecce.io/blog/2025-04-22_announcing-recce-1-0-with-cloud-beta/

https://github.com/datarecce/recce