If you follow trends in data engineering, you may have noticed a big shift in the tools we use.

For years, Java (plus Scala) and Python have been the go-to languages for data engineers, and that is still true. However, there is a growing trend towards using YAML/JSON configuration to control data pipelines. Will this be a big change for us?

"Configuration" on top

config-to-pipeline

Declarative Pipelines

One of the primary reasons data engineers are leaning towards YAML/JSON is that major platform (or cloud) service providers now offer declarative pipelines. These platforms allow, and often encourage, engineers to define data workflows using only config files instead of building them in code.

This approach simplifies setting up and managing complex data pipelines, making them more accessible and leaving fewer edge cases to handle.
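To make the idea concrete, here is a minimal sketch of how a config file might drive a pipeline. The config layout (`steps`, `op`, `args`), the operation names, and the sample data are all made up for illustration and do not match any particular platform. JSON is used instead of YAML so that the example needs only the standard library.

```python
import json

# Hypothetical pipeline definition. On a real platform this would live
# in a YAML/JSON file; the keys used here are assumptions for this sketch.
CONFIG = """
{
  "name": "daily_orders",
  "steps": [
    {"op": "filter", "args": {"field": "status", "equals": "paid"}},
    {"op": "select", "args": {"fields": ["order_id", "amount"]}}
  ]
}
"""


def op_filter(rows, field, equals):
    """Keep only the rows whose `field` equals the given value."""
    return [row for row in rows if row.get(field) == equals]


def op_select(rows, fields):
    """Keep only the listed fields in each row."""
    return [{name: row.get(name) for name in fields} for row in rows]


# The "engine": a lookup from operation names in the config to Python code.
OPERATIONS = {"filter": op_filter, "select": op_select}


def run_pipeline(config, rows):
    """Apply each configured step to the rows, in order."""
    for step in config["steps"]:
        operation = OPERATIONS[step["op"]]
        rows = operation(rows, **step.get("args", {}))
    return rows


orders = [
    {"order_id": 1, "status": "paid", "amount": 30.0, "customer": "a"},
    {"order_id": 2, "status": "cancelled", "amount": 12.5, "customer": "b"},
    {"order_id": 3, "status": "paid", "amount": 8.0, "customer": "c"},
]

result = run_pipeline(json.loads(CONFIG), orders)
print(result)
```

The engineer who writes the config never touches `run_pipeline`; changing what the pipeline does means editing the `steps` list, not the code.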

Cloud services and simplified pipeline development

As adoption of cloud services such as AWS, Azure, and GCP increases, companies are building their data pipelines on these platforms. These providers offer various functions and services that simplify data pipeline creation and management.

While these services come at an additional cost, they significantly reduce the complexity of pipeline development. They allow engineers to focus on designing the main tasks at a high level rather than struggling with low-level coding.

From comfort to trend

What started as a move towards greater comfort and ease of use has now become a trend due to the flexibility these tools offer. YAML/JSON allow for quick adjustments and iterations, making them ideal for the fast-paced environment of data engineering. This flexibility is particularly valuable in a field where requirements can change rapidly, and the ability to adapt is crucial.
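As a rough illustration of what a "quick adjustment" can look like, the sketch below layers a small override on top of a base config, so a dev variant of a pipeline is a few changed values rather than a code change. The config keys, the bucket path, and the table names are hypothetical, and `deep_merge` is a helper written for this example, not a function from any specific tool.

```python
import copy

# Hypothetical base pipeline config (names and paths are placeholders).
BASE_CONFIG = {
    "source": {"path": "s3://example-bucket/orders/", "format": "parquet"},
    "schedule": "daily",
    "sink": {"table": "analytics.orders"},
}

# A quick adjustment for a dev run: only the values that differ.
DEV_OVERRIDE = {
    "schedule": "hourly",
    "sink": {"table": "dev.orders"},
}


def deep_merge(base, override):
    """Return a new dict where values from `override` replace those in `base`.

    Nested dicts are merged key by key; the inputs are left unmodified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


dev_config = deep_merge(BASE_CONFIG, DEV_OVERRIDE)
print(dev_config)
```

Keys the override does not mention, such as `source`, carry over from the base config unchanged.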

Is this shift beneficial for engineers?

The shift towards using config files brings both opportunities and challenges for data engineers. These tools can increase productivity by simplifying complex tasks and reducing the need for extensive coding. On the other hand, they may limit the depth of customization and optimization that can be achieved with traditional programming languages like Java and Python.
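The customization limit can be sketched in a few lines. In this toy example, a config can only name operations that the engine already knows; anything else is rejected until someone writes code for it. The operation names and the `mask_email` function are hypothetical and exist only for this illustration.

```python
# Operations that the (hypothetical) config vocabulary supports out of the box.
SUPPORTED_OPS = {
    "strip": lambda value: value.strip(),
    "uppercase": lambda value: value.upper(),
}


def apply_steps(steps, value, ops=SUPPORTED_OPS):
    """Apply the named operations in order, rejecting unknown names."""
    for name in steps:
        if name not in ops:
            raise ValueError(f"unsupported operation in config: {name!r}")
        value = ops[name](value)
    return value


# Everything the config vocabulary covers works without writing code.
cleaned = apply_steps(["strip", "uppercase"], "  hello ")


# A requirement the vocabulary did not anticipate needs real code.
def mask_email(value):
    """Hide all but the first character of the local part of an address."""
    user, _, domain = value.partition("@")
    return user[:1] + "***@" + domain


try:
    apply_steps(["strip", "mask_email"], " jane@example.com ")
    rejected = False
except ValueError:
    rejected = True

# Only after registering the custom function does the same config run.
extended_ops = {**SUPPORTED_OPS, "mask_email": mask_email}
masked = apply_steps(["strip", "mask_email"], " jane@example.com ", ops=extended_ops)
print(cleaned, rejected, masked)
```

Whether such an extension point exists at all depends on the platform, which is part of the trade-off described above.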

Conclusion

The trend towards using configuration files in data engineering reflects a broader shift towards simplicity and flexibility in the field. While this shift offers many benefits, it's essential for data engineers to maintain a balance, ensuring they have the skills and knowledge to leverage both traditional programming languages and modern configuration tools effectively. In other words, configuration simplifies development, but it also requires engineers to understand more deeply how that configuration affects the pipeline under the hood.
