Overview
Purpose
parquet2postgres-go is a small CLI for copying analytics-style data from object storage into Postgres. You point it at a bucket and key prefix where one or more .parquet files live; it loads their rows into a target schema and table.
Typical use cases:
- Landing zone or warehouse files in MinIO, AWS S3, or another S3 API–compatible store
- One-off or scheduled loads without a separate ETL framework
- Quick hydration of a Postgres table that should mirror Parquet layout
What you can do
| Capability | Description |
|---|---|
| List Parquet objects | Lists all objects under the configured S3 prefix (--path). |
| Infer DDL | Reads the first Parquet file and derives column names and types for Postgres. |
| Create table | With --table-create, runs CREATE TABLE IF NOT EXISTS when the table is missing. |
| Truncate | With --table-truncate, truncates the table before load if it already exists. |
| Batch load | Streams rows in chunks of --batch-size and inserts via the Postgres data loader (batched copy-style pipeline). |
| Parallel files | Processes multiple Parquet files concurrently; concurrency is capped at --db-pool-size so it stays within the connection pool limit. |
Constraints
- Same schema for all files: Every Parquet file under the prefix must share the same schema. The tool does not merge incompatible layouts.
- Secrets via environment: Database password and user, and S3 access keys, must be supplied through P2PG_* environment variables (see Configuration).
- S3-compatible endpoint: The client expects a MinIO/S3-style endpoint and credentials.
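
The same-schema constraint follows from how the DDL is derived: only the first file is inspected, so every later file has to match it. The sketch below illustrates that idea under stated assumptions. The `Column` type, the `pgTypes` mapping from Parquet physical types to Postgres types, and both function names are hypothetical and are not taken from the tool; its real type mapping may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified view of one Parquet column.
type Column struct {
	Name string
	Type string // Parquet physical type, e.g. "INT64"
}

// pgTypes is an assumed mapping for illustration, not the tool's actual one.
var pgTypes = map[string]string{
	"BOOLEAN":    "boolean",
	"INT32":      "integer",
	"INT64":      "bigint",
	"FLOAT":      "real",
	"DOUBLE":     "double precision",
	"BYTE_ARRAY": "text", // assumes UTF-8 strings
}

// quoteIdent double-quotes a Postgres identifier, escaping embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// createTableSQL builds a CREATE TABLE IF NOT EXISTS statement from the
// columns of the first file.
func createTableSQL(schema, table string, cols []Column) (string, error) {
	defs := make([]string, 0, len(cols))
	for _, c := range cols {
		pgType, ok := pgTypes[c.Type]
		if !ok {
			return "", fmt.Errorf("column %q: unsupported Parquet type %q", c.Name, c.Type)
		}
		defs = append(defs, quoteIdent(c.Name)+" "+pgType)
	}
	return fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s.%s (%s)",
		quoteIdent(schema), quoteIdent(table), strings.Join(defs, ", ")), nil
}

// checkSameSchema reports the first difference between two column lists.
func checkSameSchema(first, other []Column) error {
	if len(first) != len(other) {
		return fmt.Errorf("column count differs: %d vs %d", len(first), len(other))
	}
	for i := range first {
		if first[i] != other[i] {
			return fmt.Errorf("column %d differs: %+v vs %+v", i, first[i], other[i])
		}
	}
	return nil
}

func main() {
	first := []Column{{"id", "INT64"}, {"name", "BYTE_ARRAY"}}
	later := []Column{{"id", "INT32"}, {"name", "BYTE_ARRAY"}}

	ddl, err := createTableSQL("public", "events", first)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ddl)

	if err := checkSameSchema(first, later); err != nil {
		fmt.Println("schema mismatch:", err)
	}
}
```

Checking each file against the first one before loading makes a mismatch fail early with a clear message instead of surfacing as an insert error partway through a batch.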
Command
The main subcommand is dataloader:
parquet2postgres-go dataloader [flags]
See Quick start and Examples for concrete invocations.