Google Cloud Storage output plugin for Embulk.
- Plugin type: file output
- Load all or nothing: no
- Resume supported: yes
- Cleanup supported: no
- Connector do not support retry in case we have any problem with streaming chanel. In this case, we need to run the job again.
- bucket: Google Cloud Storage bucket name (string, required)
- path_prefix: Prefix of output keys (string, required)
- file_ext: Extention of output file (string, required)
- sequence_format: Format of the sequence number of the output files (string, default value is ".%03d.%02d")
- content_type: content type of output file (string, optional, default value is "application/octet-stream")
- auth_method: Authentication method
private_key
,json_key
orcompute_engine
(string, optional, default value is "private_key") - service_account_email: Google Cloud Platform service account email (string, required when auth_method is private_key)
- p12_keyfile: Private key file fullpath of Google Cloud Platform service account (string, required when auth_method is private_key)
- json_keyfile fullpath of json_key (string, required when auth_method is json_key)
- application_name: Application name, anything you like (string, optional, default value is "embulk-output-gcs")
- max_connection_retry: Number of connection retries to GCS (number, default value is 10)
- delete_in_advance: Delete Bucket/Prefix matched files in advance (boolean, default value is false)
out:
type: gcs
bucket: your-gcs-bucket-name
path_prefix: logs/out
file_ext: .csv
auth_method: `private_key` #default
service_account_email: '[email protected]'
p12_keyfile: '/path/to/private/key.p12'
formatter:
type: csv
encoding: UTF-8
There are three methods supported to fetch access token for the service account.
- Public-Private key pair of GCP(Google Cloud Platform)'s service account
- JSON key of GCP(Google Cloud Platform)'s service account
- Pre-defined access token (Google Compute Engine only)
You first need to create a service account (client ID), download its private key and deploy the key with embulk.
out:
type: gcs
auth_method: private_key
service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
p12_keyfile: /path/to/p12_keyfile.p12
You first need to create a service account (client ID), download its json key and deploy the key with embulk.
out:
type: gcs
auth_method: json_key
json_keyfile: /path/to/json_keyfile.json
You can also embed contents of json_keyfile at config.yml.
out:
type: gcs
auth_method: json_key
json_keyfile:
content: |
{
"private_key_id": "123456789",
"private_key": "-----BEGIN PRIVATE KEY-----\nABCDEF",
"client_email": "..."
}
On the other hand, you don't need to explicitly create a service account for embulk when you run embulk in Google Compute Engine. In this third authentication method, you need to add the API scope "https://www.googleapis.com/auth/devstorage.read_write" to the scope list of your Compute Engine VM instance, then you can configure embulk like this.
Setting the scope of service account access for instances
out:
type: gcs
auth_method: compute_engine
$ ./gradlew gem
$ ./gradlew test # -t to watch change of files and rebuild continuously
To run unit tests, we need to configure the following environment variables.
When environment variables are not set, skip almost test cases.
GCP_EMAIL
GCP_P12_KEYFILE
GCP_JSON_KEYFILE
GCP_BUCKET
GCP_BUCKET_DIRECTORY(optional, if needed)
If you're using Mac OS X El Capitan and GUI Applications(IDE), like as follows.
$ vi ~/Library/LaunchAgents/environment.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>my.startup</string>
<key>ProgramArguments</key>
<array>
<string>sh</string>
<string>-c</string>
<string>
launchctl setenv GCP_EMAIL ABCXYZ123ABCXYZ123.gserviceaccount.com
launchctl setenv GCP_P12_KEYFILE /path/to/p12_keyfile.p12
launchctl setenv GCP_JSON_KEYFILE /path/to/json_keyfile.json
launchctl setenv GCP_BUCKET my-bucket
launchctl setenv GCP_BUCKET_DIRECTORY unittests
</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
$ launchctl load ~/Library/LaunchAgents/environment.plist
$ launchctl getenv GCP_EMAIL //try to get value.
Then start your applications.