Synthesized Scientific Data Kit (SDK) is a comprehensive framework for generative modelling of structured data (tabular, time-series and event-based data). The SDK helps you create compliant, statistics-preserving data snapshots for BI/Analytics and ML/AI applications. Right-size your data with AI-supported data transformations.
Available on the GCP Cloud Marketplace: https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-sdk-service
To install Synthesized SDK Service to a Google Kubernetes Engine cluster via Google Cloud Marketplace, follow the on-screen instructions.
To get the Cloud Console URL for your app, run the following command:
echo "https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"
To view the app, open the URL in your browser.
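The variables in the command above must be set before the URL can be printed. A minimal sketch, assuming placeholder values for the zone and cluster (replace all four values with those of your own deployment):

```shell
# Placeholder values (assumptions): replace with your own deployment details.
ZONE="us-west1-a"
CLUSTER="my-cluster"
NAMESPACE="synthesized-sdk"
APP_INSTANCE_NAME="synthesized-sdk"

# Build the Cloud Console URL for a Kubernetes application from
# zone, cluster, namespace and application instance name.
console_url() {
  printf 'https://console.cloud.google.com/kubernetes/application/%s/%s/%s/%s\n' \
    "$1" "$2" "$3" "$4"
}

console_url "$ZONE" "$CLUSTER" "$NAMESPACE" "$APP_INSTANCE_NAME"
```

The helper console_url is hypothetical and only wraps the echo command shown above, so that the four inputs are explicit.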
Navigate to BigQuery and make sure that the synthesize and check_synthesized routines exist under the specified dataset.
Change the dataset, table names and config, then run the following SQL script:
SELECT dataset.synthesize('project.dataset.input_table', 'project.dataset.output_table', '{"synthesize": {"num_rows": 1000, "produce_nans": true}}');
The output should be similar to:
{"status":"success","task_id":"d15d63f5-d476-47e2-814f-f8323ca844fb"}
You can check the status of the task with the following script:
SELECT dataset.check_synthesized('d15d63f5-d476-47e2-814f-f8323ca844fb');
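The two steps above can be chained: take the task_id from the synthesize response and build the status-check statement from it. A minimal sketch, assuming the response has the shape shown above; the helper names extract_task_id and check_sql are hypothetical and not part of the SDK:

```shell
# Extract the task_id value from a response such as
# {"status":"success","task_id":"..."} using sed (no jq dependency).
extract_task_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Compose the status-check statement for a dataset ($1) and a task ID ($2).
check_sql() {
  printf "SELECT %s.check_synthesized('%s');\n" "$1" "$2"
}

# Sample response copied from the example output above.
response='{"status":"success","task_id":"d15d63f5-d476-47e2-814f-f8323ca844fb"}'
task_id="$(extract_task_id "$response")"

# Prints the statement to run in BigQuery.
check_sql "dataset" "$task_id"
```

The printed statement can be pasted into the BigQuery console. If the bq command-line tool is installed, it could also be passed to bq query --use_legacy_sql=false, although that path is not covered by this guide.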
The number of SDK Celery workers can be increased with the property worker.replicas.
At the moment, the application does not support exporting Prometheus metrics and does not have any exporter.
- In the Cloud Console, open Kubernetes Applications.
- From the list of apps, choose your app installation.
- On the Application Details page, click Delete.
Set your installation name and Kubernetes namespace:
export APP_INSTANCE_NAME=synthesized-sdk
export NAMESPACE=synthesized-sdk
NOTE: We recommend using a kubectl version that is the same as the version of your cluster. Using the same version for kubectl and the cluster helps to avoid unforeseen issues.
Run kubectl on the expanded manifest file:
kubectl delete -f ${APP_INSTANCE_NAME}_manifest.yaml --namespace ${NAMESPACE}
If you don't have the expanded manifest file, delete the resources by using types and a label:
kubectl delete application,deployment,secret,service,statefulset \
--namespace ${NAMESPACE} \
--selector name=${APP_INSTANCE_NAME}
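Both delete commands above depend on APP_INSTANCE_NAME and NAMESPACE being set; if either is empty, the command would target something other than what you intended. A minimal sketch of a guard to run first, where the helper require_vars is hypothetical and not part of kubectl or the SDK:

```shell
# Return non-zero and name the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name (portable across sh and bash).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

export APP_INSTANCE_NAME=synthesized-sdk
export NAMESPACE=synthesized-sdk

if require_vars APP_INSTANCE_NAME NAMESPACE; then
  echo "ok: will delete resources labelled name=${APP_INSTANCE_NAME} in ${NAMESPACE}"
  # Run the kubectl delete command from above here.
fi
```

Running the guard before the delete keeps a forgotten export from turning into a delete against the wrong namespace or selector.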
Set the GCP project ID and region, changing the values as needed:
export REGION=us-west1
Set the BigQuery dataset:
export BQ_FUNCTION_DATASET=dataset
Run the script and confirm deletion:
./cloud/clean.sh