Parquet files offer significant advantages over traditional formats like CSV or JSON, especially for analytical workloads and processing.

Tools like parquet-tools and DuckDB make it easy to create, manipulate, and query these files.

1) parquet-tools

Display data in the terminal as JSON (default), JSONL, or CSV:

parquet-tools cat data_file.parquet | jq .

or

Display only two records in JSONL format:

parquet-tools cat --format jsonl --limit 2 data_file.parquet

Get metadata about the Parquet file:

parquet-tools meta data_file.parquet

2) DuckDB

DuckDB is an embedded SQL database that supports reading and writing Parquet files.

Example:

Generate a Parquet file:

COPY (SELECT 'example' AS col1) TO 'data_file.parquet' (FORMAT 'parquet');

Read a Parquet file:

SELECT * FROM read_parquet('data_file.parquet');

Lately, I have been using DuckDB for most of my analytics (dealing with gigabytes of data), and it handles both local and cloud-based files efficiently.
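
With the httpfs extension, DuckDB can query Parquet files in object storage directly. A minimal sketch (the bucket and path below are hypothetical, and S3 access normally requires credentials to be configured):

INSTALL httpfs;
LOAD httpfs;

-- Count rows across remote Parquet files without downloading them first
SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet');

The same read_parquet call also accepts https:// URLs and local glob patterns.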

What is the Parquet file format?

Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.
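
As a rough illustration with DuckDB, the compression codec can be chosen when writing a Parquet file (in DuckDB's COPY the codec applies to the whole file; the codec and file name below are just examples):

COPY (SELECT 'example' AS col1) TO 'data_file_zstd.parquet' (FORMAT 'parquet', COMPRESSION 'zstd');

Swapping the codec (snappy, gzip, zstd, and so on) lets you trade file size against read/write speed for a given dataset.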