Modern Data Stack Updates: April 2025
※This article is an English translation of my original Japanese post.
As a consultant specializing in the Modern Data Stack, I see a constant stream of new releases and announcements in this field. In this article, I'll summarize the Modern Data Stack updates from the past two weeks that I found most interesting.
Note: This article doesn't cover all the latest updates for the mentioned products. It only includes information that I personally found interesting.
Modern Data Stack General
"DeepWiki" - Automatically Generate Wiki for GitHub Repositories
Cognition Labs, the company behind Devin, has announced "DeepWiki," a service that generates a wiki for a GitHub repository simply by changing the domain part of the repository URL to deepwiki.
https://x.com/cognition_labs/status/1915816544480989288
As shared by @satoshihirose on X, when applied to Modern Data Stack repositories like dbt Labs' jaffle-shop sample repository, DeepWiki automatically generates data flow diagrams and ER diagrams. This is truly impressive...
https://deepwiki.com/dbt-labs/jaffle-shop
You can also generate comprehensive wikis for other open-source services like DuckDB, Airbyte, Airflow, and Dagster by inputting their repositories into DeepWiki.
https://deepwiki.com/duckdb/duckdb
https://deepwiki.com/airbytehq/airbyte
https://deepwiki.com/apache/airflow
https://deepwiki.com/dagster-io/dagster
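The URL rewrite described above (swap the GitHub domain for deepwiki.com and keep the owner/repository path) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the announcement: the function name `to_deepwiki_url` is hypothetical, and the sketch assumes the input is a plain `github.com` repository URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_deepwiki_url(github_url: str) -> str:
    """Rewrite a github.com repository URL to its deepwiki.com counterpart.

    Hypothetical helper for illustration; it only swaps the host and keeps
    the owner/repository path unchanged.
    """
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    # Drop any trailing slash, query string, and fragment.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, "deepwiki.com", path, "", ""))


if __name__ == "__main__":
    for repo in (
        "https://github.com/dbt-labs/jaffle-shop",
        "https://github.com/duckdb/duckdb",
    ):
        print(to_deepwiki_url(repo))
```

Run against the jaffle-shop repository, the helper produces the same deepwiki.com/dbt-labs/jaffle-shop address linked above.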
Bauplan Announces $7.5M Funding Round
Bauplan, a company I hadn't heard of before, has announced a $7.5 million funding round.
https://www.bauplanlabs.com/blog/ai-needs-better-data-infrastructure
Their documentation describes the product as follows:
"Bauplan is a Pythonic data platform that provides functions as a service for large-scale data pipelines and git-for-data over S3 data lakes. Bauplan handles tasks that would typically require an entire infrastructure team. Our goal is to allow you and your team to run large-scale ML workflows, AI applications and data transformation pipelines in the cloud without managing any data infrastructure."
https://docs.bauplanlabs.com/en/latest/
An example of building a pipeline using Bauplan and Orchestra is explained in the following article:
https://www.getorchestra.io/blog/this-pattern-is-a-rude-awakening-for-the-modern-data-stack
Data Extract/Load
Airbyte
Released S3 Data Lake Destination with Iceberg Format Output
Airbyte has released a new destination that outputs data to S3 Data Lake in Iceberg format.
https://airbyte.com/blog/build-once-and-query-anywhere-with-airbytes-data-lake-connector
https://docs.airbyte.com/integrations/destinations/s3-data-lake
dlt
Article Summarizing dlt's 2025 Roadmap
Marcin, founder and CTO of dltHub, published an article summarizing dlt's roadmap for 2025.
According to the article, they will focus on:
- Increasing Quality of Life, enabling LLM assisted coding
- Accessing and transforming loaded data
- Support for nested types
- Unifying data normalizers and making them faster
- Pipeline state and schema storage abstraction
- Full data lineage and schema abstraction
https://dlthub.com/blog/2025-whats-next
Data Warehouse/Data Lakehouse
Snowflake
ML Jobs Released
Snowflake has released "ML Jobs," a new feature that makes it easy to run Python workloads on Snowpark Container Services (SPCS) compute pool resources.
https://docs.snowflake.com/en/release-notes/2025/other/2025-04-16-snowflake-ml-jobs
https://docs.snowflake.com/en/developer-guide/snowflake-ml/ml-jobs/snowflake-ml-jobs
Mr. Takada, an engineer at Snowflake, has also published a useful article on using ML Jobs.
https://zenn.dev/tf_takada/articles/d00d1587b288f7
Passkeys and TOTP Support for MFA Coming in Release 9.12 (Early May)
According to the release notes for the upcoming 9.12 release, scheduled for early May, Snowflake will support passkeys and TOTP (Time-based One-Time Password) as authentication methods for MFA. Many people have been waiting for this!
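For readers unfamiliar with TOTP, here is a minimal sketch of how a time-based one-time password is derived under RFC 6238 (the HMAC-SHA1 variant). It is not Snowflake's implementation, and says nothing about how Snowflake configures MFA; the function name `totp` and its parameters are illustrative, and the secret is the test key published in the RFC.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Derive a TOTP code per RFC 6238 using HMAC-SHA1 (illustrative sketch)."""
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    rfc_test_key = b"12345678901234567890"  # test secret from RFC 6238, Appendix B
    print(totp(rfc_test_key, 59, digits=8))  # RFC 6238 test vector: 94287082
```

Because both sides compute the code from a shared secret and the current time, the server can verify it without any message being sent to the device.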
terraform-provider-snowflake v2.0.0 Released and Now Officially Supported
terraform-provider-snowflake v2.0.0 has been released and is now generally available with official support. This means that you can now open support tickets for v2.0.0 and later.
https://github.com/snowflakedb/terraform-provider-snowflake/blob/main/ROADMAP.md
https://docs.snowflake.com/en/user-guide/terraform
Databricks
GA4 Raw Data Connector Announced (Public Preview)
Databricks has announced a new connector for GA4 Raw Data (Public Preview).
It appears to be designed to retrieve data exported to BigQuery.
https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/google-analytics-source-setup
MotherDuck/DuckDB
"Instant SQL" Released - Real-time Query Result Preview While Editing
MotherDuck and the DuckDB Local UI now offer "Instant SQL," a new feature that previews query results in real time while you edit a query.
Traditional SQL development typically follows a "write → execute → wait → modify" cycle, but this feature eliminates the "wait" step.
While it's not yet clear how much data can be previewed without a noticeable wait, this feels like a unique feature not found in other products!
https://motherduck.com/blog/introducing-instant-sql/
Onehouse
"Open Engines" Announced - Launch Any OSS Engine on Onehouse Platform
Onehouse has announced "Open Engines," a new feature that allows launching any OSS engine on the Onehouse platform.
They will initially release three engines: Apache Flink™ (stream processing), Trino (BI and analytics), and Ray (AI/ML, data science). You can select which OSS engine resources to launch from within Onehouse.
Data Transform
dbt
Official dbt Labs MCP Server Released
dbt Labs has released an official dbt MCP Server, which is available on GitHub. It's currently an experimental release.
https://docs.getdbt.com/blog/introducing-dbt-mcp-server
https://github.com/dbt-labs/dbt-mcp/tree/main
We've been testing it at our company. It can currently retrieve the list of models, analyze the SQL written in models, and retrieve the list of metrics.
https://dev.classmethod.jp/articles/using-dbt-mcp/
This MCP Server will also be discussed at the dbt Launch Showcase to be held on May 28-29, 2025.
https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase
Business Intelligence
Looker
New Measure Type "period_over_period" Released (Preview)
Looker has released a new measure type called "period_over_period." This is a feature I've been waiting for!
https://cloud.google.com/looker/docs/release-notes#April_29_2025
https://cloud.google.com/looker/docs/period-over-period
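As a reminder of what a period-over-period comparison computes, the sketch below derives the relative change of each period against the one before it. This is plain Python to illustrate the concept, not Looker's LookML syntax; the function name `period_over_period` and the sample figures are made up for the example.

```python
from typing import Optional


def period_over_period(
    series: list[tuple[str, float]],
) -> list[tuple[str, Optional[float]]]:
    """Return (period, relative change vs. the previous period) pairs.

    The first period, and any period following a zero value, has no
    meaningful comparison and yields None.
    """
    result: list[tuple[str, Optional[float]]] = []
    previous: Optional[float] = None
    for period, value in series:
        if previous is None or previous == 0:
            change = None
        else:
            change = (value - previous) / previous
        result.append((period, change))
        previous = value
    return result


if __name__ == "__main__":
    # Hypothetical monthly revenue figures.
    monthly_revenue = [("2025-02", 80.0), ("2025-03", 100.0), ("2025-04", 120.0)]
    for period, change in period_over_period(monthly_revenue):
        print(period, "n/a" if change is None else f"{change:+.1%}")
```

Having this available as a built-in measure type means the comparison no longer has to be hand-built for each metric.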
Omni
Databricks Ventures Invests in Omni
Databricks Ventures has announced an investment in Omni. According to the article below, this is Databricks' first investment in the business intelligence space.
https://omni.co/blog/databricks-invests-in-omni
Data Catalog
OpenMetadata
OpenMetadata 1.7 and Managed Version Collate 1.7 Released
The latest version of OpenMetadata, 1.7, and its managed version, Collate 1.7, have been released.
I'm particularly interested in AutoPilot and Reverse Metadata, which were added in Collate 1.7.
- AutoPilot
  - A collective term for four agent functions:
    - Metadata Ingestion Agent: Automatically extracts comprehensive metadata from data sources
    - Documentation Agent: Automatically generates descriptions based on data shapes and generates SQL queries from natural language requests
    - Tiering Agent: Analyzes table usage and lineage to determine business importance of data assets
    - Data Quality Agent: Validates table patterns and constraints and creates data quality tests
- Reverse Metadata
  - Allows sending descriptions, tags, and ownership information collected in Collate back to data sources
  - Supported systems include Athena, BigQuery, Clickhouse, Databricks, Microsoft SQL Server, MySQL, Oracle, Postgres, Redshift, Snowflake, Unity Catalog, and more
https://blog.open-metadata.org/announcing-openmetadata-1-7-9f9778579704
https://blog.getcollate.io/announcing-collate-17
Data Quality & Data Observability
Metaplane
Datadog Announces Acquisition of Metaplane
Datadog has announced the acquisition of Metaplane.
For current Metaplane customers, there will be no immediate changes, and the product will continue to be offered as "Metaplane by Datadog."
https://www.metaplane.dev/blog/metaplane-by-datadog
Recce
"Recce" Releases v1.0 and Beta SaaS Version for dbt-Focused Data Change Detection
"Recce," a tool for detecting data changes and analyzing impact areas specifically for dbt, has released v1.0 along with a beta SaaS version.
https://datarecce.io/blog/2025-04-22_announcing-recce-1-0-with-cloud-beta/