Overview

Purpose

parquet2postgres-go is a small CLI for copying analytics-style data from object storage into Postgres. You point it at a bucket and key prefix where one or more .parquet files live; it loads their rows into a target schema and table.

Typical use cases:

  • Landing zone or warehouse files in MinIO, AWS S3, or another S3 API–compatible store
  • One-off or scheduled loads without a separate ETL framework
  • Quick hydration of a Postgres table that should mirror Parquet layout

What you can do

Capability            Description
--------------------  ------------------------------------------------------------
List Parquet objects  Reads all objects under the configured S3 prefix (--path).
Infer DDL             Reads the first Parquet file and derives column names and types for Postgres.
Create table          With --table-create, runs CREATE TABLE IF NOT EXISTS when the table is missing.
Truncate              With --table-truncate, truncates the table before load if it already exists.
Batch load            Streams rows in chunks of --batch-size and inserts via the Postgres data loader (batched copy-style pipeline).
Parallel files        Processes multiple Parquet files concurrently; concurrency is capped at --db-pool-size so it stays within the connection pool limit.
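
The batching and bounded parallelism described above can be pictured with a minimal Go sketch. It is not the tool's actual implementation: chunk, loadAll, the file names, and the placeholder rows are all hypothetical, and the load callback only prints what it would insert. The sketch assumes a buffered channel used as a semaphore, which is one common way to cap concurrency at a pool size.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk splits rows into consecutive batches of at most size elements,
// standing in for the role of --batch-size. A non-positive size puts
// all rows into a single batch.
func chunk[T any](rows []T, size int) [][]T {
	if size <= 0 {
		size = len(rows)
	}
	var batches [][]T
	for start := 0; start < len(rows); start += size {
		end := start + size
		if end > len(rows) {
			end = len(rows)
		}
		batches = append(batches, rows[start:end])
	}
	return batches
}

// loadAll calls load once per file with at most poolSize calls in flight,
// mirroring how file concurrency is capped at --db-pool-size.
// It returns the first error recorded, if any.
func loadAll(files []string, poolSize int, load func(file string) error) error {
	if poolSize < 1 {
		poolSize = 1
	}
	sem := make(chan struct{}, poolSize) // one slot per pooled connection
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks while poolSize loads are already running
		go func(file string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this file is done
			if err := load(file); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(f)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// Hypothetical object keys; a real run would list them from the S3 prefix.
	files := []string{"part-0.parquet", "part-1.parquet", "part-2.parquet"}
	err := loadAll(files, 2, func(file string) error {
		rows := []int{1, 2, 3, 4, 5} // placeholder for decoded Parquet rows
		for _, batch := range chunk(rows, 2) {
			fmt.Printf("%s: would insert %d rows\n", file, len(batch))
		}
		return nil
	})
	if err != nil {
		fmt.Println("load failed:", err)
	}
}
```

Because each in-flight file holds one semaphore slot, the number of simultaneous loads never exceeds the pool size, which is the property the --db-pool-size cap is described as providing. The order of the printed lines varies between runs.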

Constraints

  • Same schema for all files: Every Parquet file under the prefix must share the same schema. The tool does not merge incompatible layouts.
  • Secrets via environment: The database user and password and the S3 access keys must be supplied through P2PG_* environment variables (see Configuration).
  • S3-compatible endpoint: The client expects a MinIO/S3-style endpoint and credentials.
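
To make the same-schema constraint concrete, here is a small Go sketch of a pre-load check that compares every file's columns against those of the first file. The Column type, the type names, and the in-memory schemas map are hypothetical simplifications for illustration; the source does not say whether or how the tool validates schemas.

```go
package main

import "fmt"

// Column is a hypothetical, simplified view of one Parquet column.
type Column struct {
	Name string
	Type string // for example "INT64" or "BYTE_ARRAY"
}

// sameSchema reports whether two schemas have identical columns
// (name and type) in the same order.
func sameSchema(a, b []Column) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// checkSchemas compares each file's schema against the first file's
// and returns an error naming the first file that differs.
func checkSchemas(files []string, schemas map[string][]Column) error {
	if len(files) == 0 {
		return nil
	}
	ref := schemas[files[0]]
	for _, f := range files[1:] {
		if !sameSchema(ref, schemas[f]) {
			return fmt.Errorf("schema of %s differs from %s", f, files[0])
		}
	}
	return nil
}

func main() {
	// Hypothetical schemas; a real check would read them from Parquet footers.
	schemas := map[string][]Column{
		"a.parquet": {{Name: "id", Type: "INT64"}, {Name: "name", Type: "BYTE_ARRAY"}},
		"b.parquet": {{Name: "id", Type: "INT64"}, {Name: "name", Type: "BYTE_ARRAY"}},
		"c.parquet": {{Name: "id", Type: "INT32"}, {Name: "name", Type: "BYTE_ARRAY"}},
	}
	if err := checkSchemas([]string{"a.parquet", "b.parquet", "c.parquet"}, schemas); err != nil {
		fmt.Println("refusing to load:", err)
	}
}
```

Running a check like this before a load surfaces a mismatched file up front rather than partway through the inserts.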

Command

The main subcommand is dataloader:

parquet2postgres-go dataloader [flags]

See Quick start and Examples for concrete invocations.