synthesized-io/sdk-bigquery

Overview

Synthesized Scientific Data Kit (SDK) is a comprehensive framework for generative modelling of structured data (tabular, time-series and event-based data). The SDK helps you create compliant, statistics-preserving data snapshots for BI/Analytics and ML/AI applications. Right-size your data with AI-supported data transformations.

Available on the GCP Cloud Marketplace: https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-sdk-service

Architecture

architecture.png

Installation

deployment.png

Deploy Kubernetes resources

Quick install with Google Cloud Marketplace

To install Synthesized SDK Service to a Google Kubernetes Engine cluster via Google Cloud Marketplace, follow the on-screen instructions.

Command-line instructions

Kubernetes CLI installation

Viewing your app in the Google Cloud Console

To get the Cloud Console URL for your app, run the following command:

echo "https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"

To view the app, open the URL in your browser.
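The command above assumes that ZONE, CLUSTER, NAMESPACE, and APP_INSTANCE_NAME are already set in your shell. A minimal sketch of preparing them follows; the zone and cluster values are placeholders chosen for illustration, and the namespace and instance name reuse the values from the uninstall section of this README.

```shell
# Placeholder values (assumptions): replace with your own cluster details.
export ZONE=us-west1-a
export CLUSTER=my-cluster
export APP_INSTANCE_NAME=synthesized-sdk
export NAMESPACE=synthesized-sdk

# Fail early with a clear message if any variable is unset or empty.
: "${ZONE:?set ZONE}" "${CLUSTER:?set CLUSTER}"
: "${NAMESPACE:?set NAMESPACE}" "${APP_INSTANCE_NAME:?set APP_INSTANCE_NAME}"

# Build and print the Cloud Console URL for the app.
APP_URL="https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"
echo "${APP_URL}"
```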

Deploy cloud resources

Cloud resources installation

Using the app

How to use the SDK service

Navigate to BigQuery and make sure that the synthesize and check_synthesized routines exist under the dataset you specified during installation.

The created functions look like this:

bigquery_functions.png
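If you prefer the command line to the BigQuery console, the routines can also be listed with the bq CLI. This is a sketch that assumes bq is installed and authenticated; the helper name list_routines is hypothetical, and BQ_FUNCTION_DATASET is the same variable this README uses in the cleanup section.

```shell
# Dataset that holds the synthesize and check_synthesized routines
# ("dataset" is the placeholder name used throughout this README).
export BQ_FUNCTION_DATASET=dataset

# Hypothetical helper: list the routines defined in the dataset.
list_routines() {
  bq ls --routines "${BQ_FUNCTION_DATASET}"
}
```

Running list_routines should show both synthesize and check_synthesized if the cloud resources were deployed successfully.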

Change the dataset name, table names, and config to match your setup, then run the following SQL script:

SELECT dataset.synthesize('project.dataset.input_table', 'project.dataset.output_table', '{"synthesize": {"num_rows": 1000, "produce_nans": true}}');

The output should look similar to this:

{"status":"success","task_id":"d15d63f5-d476-47e2-814f-f8323ca844fb"}

You can check the status of the task with the following script:

SELECT dataset.check_synthesized('d15d63f5-d476-47e2-814f-f8323ca844fb');
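The same status check can be scripted from a terminal. The sketch below assumes the bq CLI is installed and authenticated, and that you have the raw JSON response in the form shown above; the helper names extract_task_id and check_task are hypothetical, and "dataset" is the placeholder dataset name from the SQL examples.

```shell
# Hypothetical helper: pull the task_id field out of a JSON response such as
# {"status":"success","task_id":"..."} using sed (no jq dependency).
extract_task_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: run check_synthesized for a task ID through the bq CLI.
check_task() {
  bq query --use_legacy_sql=false \
    "SELECT dataset.check_synthesized('$1')"
}
```

For example, `check_task "$(extract_task_id "$RESPONSE")"` checks the task whose response is stored in a RESPONSE variable.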

Scaling

To increase the number of SDK Celery workers, raise the value of the worker.replicas property.

App metrics

The application does not currently export Prometheus metrics and does not include an exporter.

Uninstalling the app

Delete Kubernetes resources

Using the Google Cloud Console

  1. In the Cloud Console, open Kubernetes Applications.

  2. From the list of apps, choose your app installation.

  3. On the Application Details page, click Delete.

Using the command line

Preparing your environment

Set your installation name and Kubernetes namespace:

export APP_INSTANCE_NAME=synthesized-sdk
export NAMESPACE=synthesized-sdk

Deleting your resources

NOTE: We recommend using a kubectl version that is the same as the version of your cluster. Using the same version for kubectl and the cluster helps to avoid unforeseen issues.

Deleting the deployment with the generated manifest file

Run kubectl on the expanded manifest file:

kubectl delete -f ${APP_INSTANCE_NAME}_manifest.yaml --namespace ${NAMESPACE}

Deleting the deployment by deleting the Application resource

If you don't have the expanded manifest file, delete the resources by using types and a label:

kubectl delete application,deployment,secret,service,statefulset \
  --namespace ${NAMESPACE} \
  --selector name=${APP_INSTANCE_NAME}
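To confirm that nothing was left behind, you can list the same resource types with the same label. This is a sketch rather than an official step; the helper name list_remaining_resources is hypothetical, and it assumes NAMESPACE and APP_INSTANCE_NAME are set as above.

```shell
# Hypothetical helper: list any resources that still carry the app's label.
# An empty result suggests the deletion completed.
list_remaining_resources() {
  kubectl get application,deployment,secret,service,statefulset \
    --namespace "${NAMESPACE}" \
    --selector "name=${APP_INSTANCE_NAME}"
}
```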

Delete cloud resources

Set the GCP project ID and region, changing the values to match your deployment:

export REGION=us-west1

Set the BigQuery dataset that contains the functions:

export BQ_FUNCTION_DATASET=dataset

Run the script and confirm deletion:

./cloud/clean.sh