
BQ2PG

  • Querying BigQuery can get really expensive, and you can't run it locally.
  • So I created BQ2PG, because “no such thing as a high-priced question.”

Main Points

[Image: ./resources/images/intro.png]

  • Migrate data from BigQuery to PostgreSQL easily.
  • Migrate data into existing tables or create new ones automatically.
  • Append rows or replace the whole table.
  • Run export and import individually.
    • If there is no new export, no import will run.
    • Data is loaded from small GZIP files, keeping your disk space free!

How it works

[Image: ./resources/images/bq2pg.jpg]

Config

  • This is the main configuration file.
  • The only mandatory part is :integrations; everything else can be provided by environment variables (see the minimal sketch below).
  • Check example_run.sh or the Environment variables section.
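
In practice this means the config can shrink to just :integrations when everything else comes from env vars. A minimal sketch (values are illustrative; see Example 1 for a full config):

    {:integrations [{:name            "test"
                     :query           "SELECT 1 AS one"
                     :location        "EU"
                     :target-pg-table "public.one"
                     :method          "append"
                     :timeout         120}]}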

GCS and BQ clients

  • Both clients are initialized from the $BQ2PGSA environment variable if it is provided.
  • You can also set the path in the main configuration file under the :sa-path key.
  • If you do not provide a service-account key, it will try to use your gcloud tools login.

Integrations

  • Integrations must be created under the :integrations key in the main config file.
  • Take a look at the example integration in ./example/pokemon_integration.edn, or at the sketch after this list.
  • These are the mandatory keys:
    • :name - transfer name, also used as the folder name on the bucket - must be unique across all configurations!
    • :query - SQL query defining the data extraction
    • :location - location of the BigQuery dataset
    • :target-pg-table - target PostgreSQL table in the format schema_name.table_name
    • :method - PostgreSQL update mode - append (default) or replace (truncates the target table first)
    • :timeout - timeout in seconds; the import polls the bucket for a new export from BQ (if no new export exists after the timeout is exceeded, bq2pg continues anyway)
  • BigQuery fields are converted to text:
    • If your :target-pg-table already exists, bq2pg will try to convert each value to its column type.
    • If that table doesn't exist, it will be created and every field will be imported as text.
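
Put together, an integration map looks roughly like this (a sketch in the spirit of the Pokémon example; all values are illustrative):

    {:name            "pokemon"
     :query           "SELECT * FROM `project.test.pokemons`"
     :location        "EU"
     :target-pg-table "public.pokemons"
     :method          "append"
     :timeout         120}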

Export?

  • You can set the :export? key to false if you don't want bq2pg to export your data from BigQuery to Google Cloud Storage.
  • This value can be configured in the main config file under the :export? key or via the BQEXPORT env variable.

Import?

  • You can set the :import? key to false if you don't want bq2pg to import your data into a PostgreSQL database.
  • Keep in mind: bq2pg records the last import of every integration in .last_import.edn, so an export you have already imported won't be imported twice.
    • That means it won't be imported again until a new export to the target folder finishes.
  • This value can be configured in the main config file under the :import? key or via the PGIMPORT env variable (see the sketch below).
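
Together, these two flags give you the export-once / import-many pattern mentioned under Environment variables. A sketch (only the relevant keys shown; merge them into your full config):

    ;; Run 1: export from BigQuery to GCS, skip the Postgres import.
    {:export? true, :import? false}
    ;; Later runs: reuse the existing export, import only.
    {:export? false, :import? true}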

Stream GZIP files

  • BigQuery data is exported as a batch of GZIP files, which are loaded one by one to save space on your disk (illustrated below).
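
To illustrate the idea (a hypothetical sketch, not bq2pg's actual code): each GZIP part can be decompressed as a stream, so no part is ever fully unpacked on disk:

    (require '[clojure.java.io :as io])
    (import '(java.util.zip GZIPInputStream))

    ;; Hypothetical sketch: stream one exported GZIP part line by line,
    ;; handing each line to a loader fn without unpacking it to disk.
    (defn process-gzip-part! [gz-file handle-line!]
      (with-open [rdr (io/reader (GZIPInputStream. (io/input-stream gz-file)))]
        (doseq [line (line-seq rdr)]
          (handle-line! line))))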

Examples

Example 1 - easy and quick

  1. Download bq2pg-1.jar from the releases page
  2. Create your integration.edn like this:
    {:export? true
     :import? true
     :gcs-name "my-bucket"
     :gcs-folder "bq2pg"
     :db {:dbtype "postgres"
          :dbname "postgres"
          :user "postgres"
          :host "somehost"
          :port 5432
          :password "my_password"}
     :integrations [{:name "test"
                     :query  "SELECT * FROM `project.test.pokemons` LIMIT 1000"
                     :location "EU"
                     :target-pg-table "public.pokemons"
                     :method "replace"
                     :timeout 120}]}
        
  3. Run java -jar bq2pg.jar integration.edn

Example 2 - clojure developer, installation

  1. Run ./install.sh
  2. bq2pg will be compiled and you can run it via ~/bin/bq2pg
    • or add $HOME/bin to your PATH.

Example 3 - clojure developer, interactive use

  1. Run a REPL and switch to the user namespace.
  2. There you will find an example configuration and integration that you must update before running integrate-dev!; see the sketch below.
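
A typical session might look like this (assuming integrate-dev! takes no arguments; adjust to what you actually find in the user namespace):

    ;; from a REPL started in this repo
    (in-ns 'user)
    ;; edit the example configuration + integration bound there, then:
    (integrate-dev!)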

Example 4 - real use-case

  1. Place bq2pg.jar or a personalized example_run.sh somewhere in your PATH (see Example 2 - clojure developer, installation)
  2. Configure every environment variable so that your config file contains only :integrations.
  3. Schedule bq2pg with the same env vars but different configs to synchronize BigQuery and PostgreSQL on a daily basis.

Environment variables

  • If you are into 12factor ;) - see the sketch after this list for how these map onto config keys.
  • $PGHOST = Name / address of the PostgreSQL database server you will connect to.
  • $PGDATABASE = Database name
  • $PGPORT = Database port
  • $PGUSER = Database user
  • $PGPASSWORD = Database password
    • This is OPTIONAL - I would recommend using .pgpass instead, which should be parsed.
  • $BQEXPORT = true [default] / false
    • Disable this only if you want to export once but import multiple times.
  • $PGIMPORT = true / false
    • Disable this only if you only want to export data from the environment.
  • $GCSNAME = Bucket name
  • $GCSFOLDER = Directory name on the bucket - for bq2pg purposes
  • $BQ2PGSA = Path to a service-account JSON key
    • This is an optional parameter; you can also log in via gcloud auth login.
  • If you happen to use a proxy, don't forget to set one up as well:
    • $PROXYHOST = Address of the proxy server
    • $PROXYUSER = Proxy user
    • $PROXYPASSWORD = Proxy password
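
For reference, my reading of how these env vars map onto the config keys (BQ2PGSA → :sa-path is stated above; the rest is inferred from the key names, values illustrative):

    {:export?    true                 ; $BQEXPORT
     :import?    true                 ; $PGIMPORT
     :gcs-name   "my-bucket"          ; $GCSNAME
     :gcs-folder "bq2pg"              ; $GCSFOLDER
     :sa-path    "/path/to/sa.json"   ; $BQ2PGSA
     :db {:host     "somehost"        ; $PGHOST
          :dbname   "postgres"        ; $PGDATABASE
          :port     5432              ; $PGPORT
          :user     "postgres"        ; $PGUSER
          :password "secret"}}        ; $PGPASSWORD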

Caveats

  1. Your BigQuery dataset and Google Cloud Storage bucket must exist in the same location, or you will get an error.
  2. You must have bigquery.job.create rights so you can export data.
    • Plus read/write rights on the configured GCS bucket.

Development

  • There is a Malli function schema for every fn, and linting is prepared in the user namespace - this makes the debugging experience much better (illustrated below).
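
For readers unfamiliar with the pattern, a Malli function schema looks like this (a generic illustration, not code from this repo):

    (require '[malli.core :as m])

    (defn greet
      "Toy fn used to illustrate the pattern."
      [name]
      (str "Hello, " name))

    ;; Declare the function schema; instrumentation and linting can then
    ;; flag calls that don't match [:=> [:cat :string] :string].
    (m/=> greet [:=> [:cat :string] :string])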
