
Quick start

Prerequisites

  • Go 1.25 (see go.mod)
  • Docker (optional but recommended for a local Postgres + MinIO stack)

Build the CLI

From the repository root:

go build -o parquet2postgres-go .

The resulting binary's root Cobra command is parquet2postgres-go; see the main and cmd packages.

Local Postgres and MinIO

The repo includes development/docker-compose.yml, which starts:

  • Postgres 17 on port 5432 (user/password/database: postgres)
  • MinIO on ports 9000 (API) and 9001 (console)
  • A MinIO client sidecar that creates the warehouse bucket and sets a public policy on it

Start the stack:

cd development
docker compose up -d

Wait until the Postgres container reports healthy (docker compose ps shows its status), then upload Parquet files to the bucket under a key prefix, for example s3://warehouse/myschema/mytable/. You can do this from the MinIO console at http://localhost:9001 by logging in with admin / password.

Required secrets (environment)

Export these variables before running the dataloader command:

Variable                    Purpose
P2PG_DB_PASSWORD            Postgres password
P2PG_DB_USER                Postgres user
P2PG_S3_ACCESS_KEY          S3 access key
P2PG_S3_SECRET_ACCESS_KEY   S3 secret key

For the compose stack, MinIO credentials are admin / password.

Minimal run

Replace public, my_table, the myschema/mytable/ prefix, and the host and endpoint values to match your environment.

export P2PG_DB_PASSWORD=postgres
export P2PG_DB_USER=postgres
export P2PG_S3_ACCESS_KEY=admin
export P2PG_S3_SECRET_ACCESS_KEY=password

./parquet2postgres-go dataloader \
  --schema public \
  --table my_table \
  --bucket warehouse \
  --path myschema/mytable/ \
  --db-host localhost \
  --db-port 5432 \
  --db-name postgres \
  --s3-endpoint localhost:9000 \
  --s3-region none \
  --s3-secure=false

  • --schema, --table, and --bucket are required.
  • --path is the key prefix inside the bucket. Ending it with / is recommended for clarity; if the trailing slash is missing, the tool adds it.
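
The trailing slash matters because S3 prefixes are plain string prefixes: myschema/mytable without a slash also matches keys under myschema/mytable_old/. The sketch below, with hypothetical normalizePrefix and keysUnder helpers, illustrates that normalization over an in-memory key list; it is not the tool's actual listing code.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePrefix (hypothetical) appends "/" when it is missing, so
// "myschema/mytable" and "myschema/mytable/" select the same keys.
func normalizePrefix(p string) string {
	if p == "" || strings.HasSuffix(p, "/") {
		return p
	}
	return p + "/"
}

// keysUnder (hypothetical) keeps only keys that start with the normalized prefix.
func keysUnder(keys []string, prefix string) []string {
	prefix = normalizePrefix(prefix)
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"myschema/mytable/part-0.parquet",
		"myschema/mytable/part-1.parquet",
		"myschema/mytable_old/part-0.parquet", // would match a raw "myschema/mytable" prefix
	}
	fmt.Println(keysUnder(keys, "myschema/mytable"))
}
```

Running it prints only the two part files under myschema/mytable/; without normalizePrefix, the mytable_old key would match as well.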

Optional: add --table-create to create the table if it does not already exist, and/or --table-truncate to truncate it before loading. See Examples for both.
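
This page does not say which SQL --table-truncate issues. Purely as an illustration, a schema-qualified TRUNCATE built from the --schema and --table values should quote each identifier so mixed-case or unusual names are handled safely; quoteIdent and truncateSQL are hypothetical helpers, not the tool's API.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent (hypothetical) wraps a Postgres identifier in double quotes and
// doubles any embedded double quote, per Postgres quoted-identifier rules.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// truncateSQL (hypothetical) builds a schema-qualified TRUNCATE statement.
func truncateSQL(schema, table string) string {
	return fmt.Sprintf("TRUNCATE TABLE %s.%s", quoteIdent(schema), quoteIdent(table))
}

func main() {
	// Values taken from the minimal run above.
	fmt.Println(truncateSQL("public", "my_table"))
}
```

With the minimal-run values it prints TRUNCATE TABLE "public"."my_table". Quoted identifiers are case-sensitive in Postgres, so "My_Table" and my_table name different tables.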

Next steps

  • Configuration: reference for every flag and environment variable
  • Examples: copy-paste commands, including higher parallelism with --db-pool-size