Merge branch 'main' of gist.github.com:dbt-labs/dbt-spark into mcknight/fix-test-store-test
McKnight-42 committed Oct 13, 2023
2 parents 87e2bd3 + 7ac4a7e commit e2bdb27
Showing 35 changed files with 374 additions and 157 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.7.0a1
+current_version = 1.8.0a1
 parse = (?P<major>[\d]+)  # major version number
     \.(?P<minor>[\d]+)  # minor version number
     \.(?P<patch>[\d]+)  # patch version number
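The `parse` pattern is a plain regex with named groups; the tail that matches the `a1` prerelease suffix is cut off above. A minimal sketch of how such a pattern picks `1.8.0a1` apart — the `prekind`/`pre` groups are a guess at the truncated continuation, not copied from the config:

```python
import re

# Named groups mirror the bumpversion config shown above; the
# (?P<prekind>...)(?P<pre>...) tail is an assumed prerelease matcher.
VERSION_RE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:(?P<prekind>[a-z]+)(?P<pre>\d+))?"
)

m = VERSION_RE.match("1.8.0a1")
assert m is not None
assert m.group("major", "minor", "patch") == ("1", "8", "0")
assert m.group("prekind", "pre") == ("a", "1")
```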
The following 13 unreleased changelog entries were deleted (6 changes: 0 additions & 6 deletions each):

  • .changes/unreleased/Dependencies-20230424-230630.yaml
  • .changes/unreleased/Dependencies-20230424-230645.yaml
  • .changes/unreleased/Dependencies-20230501-231003.yaml
  • .changes/unreleased/Dependencies-20230501-231035.yaml
  • .changes/unreleased/Dependencies-20230510-230725.yaml
  • .changes/unreleased/Dependencies-20230803-224622.yaml
  • .changes/unreleased/Dependencies-20230803-224626.yaml
  • .changes/unreleased/Dependencies-20230803-224629.yaml
  • .changes/unreleased/Dependencies-20230804-225232.yaml
  • .changes/unreleased/Features-20230707-104150.yaml
  • .changes/unreleased/Features-20230707-113337.yaml
  • .changes/unreleased/Features-20230707-114650.yaml
  • .changes/unreleased/Under the Hood-20230724-165508.yaml

41 changes: 21 additions & 20 deletions .circleci/config.yml
@@ -11,23 +11,24 @@ jobs:
       - run: tox -e flake8,unit

   # Turning off for now due to flaky runs of tests will turn back on at later date.
-  # integration-spark-session:
-  #   environment:
-  #     DBT_INVOCATION_ENV: circle
-  #   docker:
-  #     - image: godatadriven/pyspark:3.1
-  #   steps:
-  #     - checkout
-  #     - run: apt-get update
-  #     - run: python3 -m pip install --upgrade pip
-  #     - run: apt-get install -y git gcc g++ unixodbc-dev libsasl2-dev
-  #     - run: python3 -m pip install tox
-  #     - run:
-  #         name: Run integration tests
-  #         command: tox -e integration-spark-session
-  #         no_output_timeout: 1h
-  #     - store_artifacts:
-  #         path: ./logs
+  integration-spark-session:
+    environment:
+      DBT_INVOCATION_ENV: circle
+    docker:
+      - image: godatadriven/pyspark:3.1
+    steps:
+      - checkout
+      - run: apt-get update
+      - run: conda install python=3.10
+      - run: python3 -m pip install --upgrade pip
+      - run: apt-get install -y git gcc g++ unixodbc-dev libsasl2-dev libxml2-dev libxslt-dev
+      - run: python3 -m pip install tox
+      - run:
+          name: Run integration tests
+          command: tox -e integration-spark-session
+          no_output_timeout: 1h
+      - store_artifacts:
+          path: ./logs

   integration-spark-thrift:
     environment:
@@ -116,9 +117,9 @@ workflows:
   test-everything:
     jobs:
       - unit
-      # - integration-spark-session:
-      #     requires:
-      #       - unit
+      - integration-spark-session:
+          requires:
+            - unit
       - integration-spark-thrift:
           requires:
             - unit
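The re-enabled job runs `tox -e integration-spark-session`, which drives dbt against an in-process Spark session. As a rough local stand-in, a smoke test along these lines — assuming `pyspark` is installed; the frame contents are illustrative — checks that session mode works at all:

```python
from pyspark.sql import SparkSession

# Session mode runs Spark in-process, the same connection method the
# integration-spark-session tox environment exercises.
spark = SparkSession.builder.master("local[*]").appName("dbt-spark-smoke").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
assert df.count() == 2
spark.stop()
```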
43 changes: 43 additions & 0 deletions .github/workflows/docs-issues.yml
@@ -0,0 +1,43 @@
# **what?**
# Open an issue in docs.getdbt.com when a PR is labeled `user docs`

# **why?**
# To reduce barriers for keeping docs up to date

# **when?**
# When a PR is labeled `user docs` and is merged. Runs on pull_request_target to run off the workflow already merged,
# not the workflow that existed on the PR branch. This allows old PRs to get comments.


name: Open issues in docs.getdbt.com repo when a PR is labeled
run-name: "Open an issue in docs.getdbt.com for PR #${{ github.event.pull_request.number }}"

on:
  pull_request_target:
    types: [labeled, closed]

defaults:
  run:
    shell: bash

permissions:
  issues: write # opens new issues
  pull-requests: write # comments on PRs


jobs:
  open_issues:
    # We only want to run this when the PR has been merged or the label in the labeled event is `user docs`. Otherwise we
    # risk duplicate issues being created when merge and label both trigger this workflow and neither has
    # generated the comment before the other runs. This lives here instead of the shared workflow because this is where we
    # decide whether it should run or not.
    if: |
      (github.event.pull_request.merged == true) &&
      ((github.event.action == 'closed' && contains(github.event.pull_request.labels.*.name, 'user docs')) ||
      (github.event.action == 'labeled' && github.event.label.name == 'user docs'))
    uses: dbt-labs/actions/.github/workflows/open-issue-in-repo.yml@main
    with:
      issue_repository: "dbt-labs/docs.getdbt.com"
      issue_title: "Docs Changes Needed from ${{ github.event.repository.name }} PR #${{ github.event.pull_request.number }}"
      issue_body: "At a minimum, update body to include a link to the page on docs.getdbt.com requiring updates and what part(s) of the page you would like to see updated."
    secrets: inherit
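The `if:` expression is doing double duty and is easy to misread. A small sketch of the same gate in plain Python, using hypothetical field names that stand in for the event payload:

```python
# Hypothetical mirror of the workflow's `if:` gate, not the GitHub API itself.
def should_open_issue(action: str, merged: bool, pr_labels: list, event_label: str) -> bool:
    closed_with_label = action == "closed" and "user docs" in pr_labels
    labeled_user_docs = action == "labeled" and event_label == "user docs"
    # `merged` must be true either way, so labeling an open PR does nothing yet;
    # the issue is opened once, by whichever event first observes the merged state.
    return merged and (closed_with_label or labeled_user_docs)

assert should_open_issue("closed", True, ["user docs"], "")
assert not should_open_issue("labeled", False, [], "user docs")  # not merged yet
```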
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -79,7 +79,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10", "3.11"]

     env:
       TOXENV: "unit"
@@ -177,7 +177,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10", "3.11"]

     steps:
       - name: Set up Python ${{ matrix.python-version }}
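Both matrices gain Python 3.11, so the tested range is now 3.8 through 3.11. For local work, the equivalent guard is a few lines against `sys.version_info` — a sketch, not part of the repo:

```python
import sys

# Mirrors the CI matrix above: unit and build jobs now run on 3.8 through 3.11.
TESTED = {(3, 8), (3, 9), (3, 10), (3, 11)}
if sys.version_info[:2] not in TESTED:
    raise SystemExit("this Python version is outside dbt-spark's tested matrix")
```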
1 change: 1 addition & 0 deletions Makefile
@@ -9,6 +9,7 @@ dev: ## Installs adapter in develop mode along with development dependencies
 dev-uninstall: ## Uninstalls all packages while maintaining the virtual environment
 ## Useful when updating versions, or if you accidentally installed into the system interpreter
 	pip freeze | grep -v "^-e" | cut -d "@" -f1 | xargs pip uninstall -y
+	pip uninstall -y dbt-spark

 .PHONY: mypy
 mypy: ## Runs mypy against staged changes for static type checking.
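The added line matters because `grep -v "^-e"` skips editable installs, so an adapter installed with `pip install -e .` survives the bulk uninstall; the explicit `pip uninstall -y dbt-spark` catches it. A rough Python rendering of the same pipeline, assuming `pip` is on `PATH`:

```python
import subprocess

frozen = subprocess.run(["pip", "freeze"], capture_output=True, text=True, check=True).stdout
# Drop editable ("-e ...") lines, as grep -v "^-e" does, and trim
# "pkg @ file://..." direct references to the package name, as cut -d "@" -f1 does.
names = [line.split("@")[0].strip() for line in frozen.splitlines() if not line.startswith("-e")]
if names:
    subprocess.run(["pip", "uninstall", "-y", *names], check=True)
# The editable dbt-spark checkout was filtered out above, so remove it explicitly.
subprocess.run(["pip", "uninstall", "-y", "dbt-spark"], check=True)
```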
12 changes: 7 additions & 5 deletions README.md
@@ -26,18 +26,20 @@ more information, consult [the docs](https://docs.getdbt.com/docs/profile-spark)

 ## Running locally
 A `docker-compose` environment starts a Spark Thrift server and a Postgres database as a Hive Metastore backend.
-Note: dbt-spark now supports Spark 3.1.1 (formerly on Spark 2.x).
+Note: dbt-spark now supports Spark 3.3.2.

-The following command would start two docker containers
-```
+The following command starts two docker containers:
+
+```sh
 docker-compose up -d
 ```

 It will take a bit of time for the instance to start; you can check the logs of the two containers.
 If the instance doesn't start correctly, try the complete reset command listed below and then try starting again.

 Create a profile like this one:

-```
+```yaml
 spark_testing:
   target: local
   outputs:
@@ -60,7 +62,7 @@ Connecting to the local spark instance:

 Note that the Hive metastore data is persisted under `./.hive-metastore/`, and the Spark-produced data under `./.spark-warehouse/`. To completely reset your environment, run the following:

-```
+```sh
 docker-compose down
 rm -rf ./.hive-metastore/
 rm -rf ./.spark-warehouse/
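Once the containers are up, port 10000 is the usual Thrift endpoint for this compose setup; a quick connectivity check with PyHive — one client option, endpoint assumed — might look like:

```python
from pyhive import hive

# Assumes the docker-compose Thrift server listens on localhost:10000.
conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())
conn.close()
```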
2 changes: 1 addition & 1 deletion dbt/adapters/spark/__version__.py
@@ -1 +1 @@
-version = "1.7.0a1"
+version = "1.8.0a1"
5 changes: 2 additions & 3 deletions dbt/adapters/spark/column.py
@@ -3,13 +3,12 @@

 from dbt.adapters.base.column import Column
 from dbt.dataclass_schema import dbtClassMixin
-from hologram import JsonDict

 Self = TypeVar("Self", bound="SparkColumn")


 @dataclass
-class SparkColumn(dbtClassMixin, Column):  # type: ignore
+class SparkColumn(dbtClassMixin, Column):
     table_database: Optional[str] = None
     table_schema: Optional[str] = None
     table_name: Optional[str] = None
@@ -63,7 +62,7 @@ def convert_table_stats(raw_stats: Optional[str]) -> Dict[str, Any]:
                 table_stats[f"stats:{key}:include"] = True
         return table_stats

-    def to_column_dict(self, omit_none: bool = True, validate: bool = False) -> JsonDict:
+    def to_column_dict(self, omit_none: bool = True, validate: bool = False) -> Dict[str, Any]:
         original_dict = self.to_dict(omit_none=omit_none)
         # If there are stats, merge them into the root of the dict
         original_stats = original_dict.pop("table_stats", None)
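With the `hologram` import gone, `to_column_dict` is typed as a plain `Dict[str, Any]`; behavior is unchanged. A short sketch of the stats flattening it performs, with illustrative values (not taken from the test suite):

```python
from dbt.adapters.spark.column import SparkColumn

# convert_table_stats parses Spark's "<value> <label>" pairs into flat keys.
stats = SparkColumn.convert_table_stats("1024 bytes, 3 rows")
col = SparkColumn(column="id", dtype="bigint", table_stats=stats)

flat = col.to_column_dict(omit_none=True)
# The stats keys are merged into the root of the dict:
assert flat["stats:rows:value"] == 3
assert flat["stats:bytes:label"] == "bytes"
```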
(Remaining file diffs were not loaded in this view.)
