Examples

Full load with create, truncate, and parallelism

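Load every Parquet file under s3://warehouse/schema/table/ (--bucket warehouse plus --path schema/table/) into public.all_types_copy. Because of --table-create, the table is created if it is missing; because of --table-truncate, it is then truncated before loading. Up to five files are processed concurrently, one per connection in the five-connection pool set by --db-pool-size 5.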

export P2PG_DB_PASSWORD=postgres
export P2PG_DB_USER=postgres
export P2PG_S3_ACCESS_KEY=admin
export P2PG_S3_SECRET_ACCESS_KEY=password

./parquet2postgres-go dataloader \
  --schema public \
  --table all_types_copy \
  --bucket warehouse \
  --batch-size 10000 \
  --path schema/table/ \
  --db-host localhost \
  --db-port 5432 \
  --db-name postgres \
  --db-pool-size 5 \
  --s3-endpoint localhost:9000 \
  --s3-region none \
  --table-create \
  --table-truncate
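The concurrency limit behaves like any bounded worker pool: at most --db-pool-size files are in flight at once, and the rest wait for a free slot. The shell analogy below is not how parquet2postgres-go is implemented. It only mimics that limit with `xargs -P`, using hypothetical file names, so eight inputs are handled at most five at a time.

```shell
#!/usr/bin/env bash
# Analogy only: xargs -P caps the number of concurrent workers, much as
# --db-pool-size 5 caps how many Parquet files are loaded at the same time.
# The file names below are hypothetical placeholders, not objects in the bucket.
POOL_SIZE=5

simulate_load() {
  # printf reuses its format for each argument: part-000.parquet .. part-007.parquet
  printf 'part-%03d.parquet\n' 0 1 2 3 4 5 6 7 |
    xargs -P "$POOL_SIZE" -I{} sh -c 'echo "loaded {}"'
}

# Completion order is nondeterministic, so sort for stable output.
simulate_load | sort
```

With more files than pool slots, the extra files simply queue; with fewer, only as many workers run as there are files.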

Load into an existing table (no DDL)

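Omit --table-create and --table-truncate when the table already exists and its current rows should be kept. No DDL or truncation is run, so the existing table's columns must already be compatible with the Parquet files. This example also omits --batch-size and --db-pool-size, so the tool's defaults apply.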

export P2PG_DB_PASSWORD=postgres
export P2PG_DB_USER=postgres
export P2PG_S3_ACCESS_KEY=admin
export P2PG_S3_SECRET_ACCESS_KEY=password

./parquet2postgres-go dataloader \
  --schema public \
  --table existing_table \
  --bucket warehouse \
  --path imports/daily/ \
  --db-host localhost \
  --db-port 5432 \
  --db-name postgres \
  --s3-endpoint localhost:9000 \
  --s3-region none
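Since this mode runs no DDL, it can help to confirm the target table exists and inspect its columns before starting the load. The sketch below assumes the standard `psql` client is installed and maps the P2PG_DB_* variables onto libpq's PGUSER and PGPASSWORD; the helper name `check_target_table` is hypothetical and not part of parquet2postgres-go.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of parquet2postgres-go).
# Host, port and database match the example command above.
# Schema and table names are interpolated directly into SQL, so pass trusted values only.
check_target_table() {
  local schema="$1" table="$2" exists
  # to_regclass returns NULL when the relation does not exist; -At prints t/f bare.
  exists="$(PGUSER="$P2PG_DB_USER" PGPASSWORD="$P2PG_DB_PASSWORD" \
    psql -h localhost -p 5432 -d postgres -At \
    -c "SELECT to_regclass('${schema}.${table}') IS NOT NULL")"
  if [ "$exists" != "t" ]; then
    echo "table ${schema}.${table} not found" >&2
    return 1
  fi
  # List "column type" pairs in table order, for comparison with the Parquet schema.
  PGUSER="$P2PG_DB_USER" PGPASSWORD="$P2PG_DB_PASSWORD" \
    psql -h localhost -p 5432 -d postgres -At \
    -c "SELECT column_name || ' ' || data_type FROM information_schema.columns WHERE table_schema = '${schema}' AND table_name = '${table}' ORDER BY ordinal_position"
}
```

For example, `check_target_table public existing_table` prints one line per column, or exits non-zero if the table is missing.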

TLS object store

For HTTPS endpoints, set --s3-secure=true and point --s3-endpoint at your provider’s host and port (often 443).
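Providers usually publish their endpoint as a URL, while every example here passes --s3-endpoint as a bare host:port. The hypothetical helper below (not part of the tool) converts a URL into that form plus a matching --s3-secure value, assuming the scheme-less host:port form shown in these examples is what the tool expects and that --s3-secure accepts `=false` as well as `=true`. IPv6 literals are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: https://s3.example.com -> "--s3-endpoint s3.example.com:443 --s3-secure=true"
s3_flags_from_url() {
  local url="$1" scheme rest host port secure
  scheme="${url%%://*}"     # text before "://"
  rest="${url#*://}"        # text after "://"
  rest="${rest%%/*}"        # drop any path after the host
  case "$scheme" in
    https) secure=true;  port=443 ;;   # default HTTPS port
    http)  secure=false; port=80 ;;    # default HTTP port
    *) echo "unsupported URL (expected http:// or https://): $url" >&2; return 1 ;;
  esac
  case "$rest" in
    *:*) host="${rest%%:*}"; port="${rest##*:}" ;;  # explicit port wins
    *)   host="$rest" ;;
  esac
  # "--" stops printf from reading the format string as an option.
  printf -- '--s3-endpoint %s:%s --s3-secure=%s\n' "$host" "$port" "$secure"
}
```

For instance, `s3_flags_from_url http://localhost:9000/` yields the values used by the local MinIO examples above.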

./parquet2postgres-go dataloader \
  --schema analytics \
  --table events \
  --bucket my-bucket \
  --path curated/events/ \
  --s3-endpoint s3.example.com:443 \
  --s3-region us-east-1 \
  --s3-secure=true \
  --db-host db.internal \
  --db-port 5432 \
  --db-name warehouse \
  --db-pool-size 4

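(As in the earlier examples, the four P2PG_* variables for the database user and password and for the S3 access and secret keys must still be set, either exported in the shell or supplied by your process manager.)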
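A missing variable is easier to diagnose before the loader starts than after. The sketch below checks the four variable names used in these examples; the function name `require_p2pg_env` is hypothetical, and it relies on bash's indirect expansion `${!name}`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every unset or empty P2PG_* variable and
# return non-zero if any is missing.
require_p2pg_env() {
  local missing=0 name
  for name in P2PG_DB_USER P2PG_DB_PASSWORD P2PG_S3_ACCESS_KEY P2PG_S3_SECRET_ACCESS_KEY; do
    # ${!name} expands to the value of the variable whose name is in $name.
    if [ -z "${!name}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Chain it in front of the command, e.g. `require_p2pg_env && ./parquet2postgres-go dataloader ...`, so the load only starts when all four are present.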