Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADAP-1167: Add table format telemetry reporting to Athena adapter #403

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion .github/workflows/_integration-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,10 @@ on:
description: "Choose the Python version to test against"
type: string
default: "3.9"
is-flaky-test-run:
description: "What kind of tests to execute in this run (true for flaky tests, false for the rest)"
type: boolean
default: false
workflow_dispatch:
inputs:
package:
Expand All @@ -49,6 +53,10 @@ on:
description: "Choose the Python version to test against"
type: choice
options: ["3.9", "3.10", "3.11", "3.12"]
is-flaky-test-run:
description: "What kind of tests to execute in this run (true for flaky tests, false for the rest)"
type: boolean
default: false

permissions:
id-token: write
Expand All @@ -67,6 +75,7 @@ env:

jobs:
integration-tests-athena:
name: integration-tests-athena-${{ inputs.is-flaky-test-run && 'flaky' || 'not-flaky' }}
if: ${{ inputs.package == 'dbt-athena' || inputs.package == 'dbt-athena-community' }}
runs-on: ${{ inputs.os }}
environment:
Expand Down Expand Up @@ -94,7 +103,8 @@ jobs:
with:
role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/${{ secrets.ASSUMABLE_ROLE_NAME }}
aws-region: ${{ vars.DBT_TEST_ATHENA_REGION_NAME }}
- run: hatch run integration-tests
# run flaky tests command if is-flaky-test-run flag is true. Otherwise, run the other tests
- run: ${{ inputs.is-flaky-test-run && 'hatch run integration-tests -m flaky -n1' || 'hatch run integration-tests -m "not flaky"' }}
working-directory: ./${{ inputs.package }}

integration-tests-bigquery:
Expand Down
25 changes: 23 additions & 2 deletions .github/workflows/pull-request-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ jobs:
os: ${{ matrix.os }}
python-version: ${{ matrix.python-version }}

integration-tests:
integration-tests-not-flaky:
uses: ./.github/workflows/_integration-tests.yml
strategy:
fail-fast: false
Expand All @@ -83,8 +83,29 @@ jobs:
repository: ${{ github.event.pull_request.head.repo.full_name }}
os: ${{ matrix.os }}
python-version: ${{ matrix.python-version }}
is-flaky-test-run: false
secrets: inherit
# Special case here for Athena, since we have two identical packages -- dbt-athena and dbt-athena-community
# We don't want certain tests within this adapter to run in parallel
integration-tests-flaky-athena:
uses: ./.github/workflows/_integration-tests.yml
strategy:
fail-fast: false
max-parallel: 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did these workflow changes so that TestIcebergRetriesDisabled is isolated in a separate job and so that it's forced to run in series. This is to rule out that parallel test runs write to the same place, causing the test to fail.

However, I ended up disabling that test (see comment below). I think it's debatable whether to still keep the workflow changes here (as scaffolding). I'm leaning towards yes, but happy to remove this stuff, in case you disagree.

matrix:
package:
- "dbt-athena"
- "dbt-athena-community"
os: [ ubuntu-22.04 ]
python-version: [ "3.9", "3.10", "3.11", "3.12" ]
with:
package: ${{ matrix.package }}
branch: ${{ github.event.pull_request.head.ref }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
os: ${{ matrix.os }}
python-version: ${{ matrix.python-version }}
is-flaky-test-run: true
secrets: inherit

# This job does nothing and is only used for branch protection
results:
name: "Pull request checks" # keep this name, branch protection references it
Expand Down
6 changes: 4 additions & 2 deletions dbt-athena/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -45,11 +45,13 @@ Repository = "https://github.com/dbt-labs/dbt-athena.git#subdirectory=dbt-athena
Issues = "https://github.com/dbt-labs/dbt-athena/issues"
Changelog = "https://github.com/dbt-labs/dbt-athena/blob/main/dbt-athena/CHANGELOG.md"

[tool.pytest]
[tool.pytest.ini_options]
testpaths = ["tests/unit", "tests/functional"]
color = true
csv = "results.csv"
filterwarnings = [
"ignore:.*'soft_unicode' has been renamed to 'soft_str'*:DeprecationWarning",
"ignore:unclosed file .*:ResourceWarning",
]
markers = [
"flaky: marks tests as flaky so they run one at a time (de-select with `-m 'not flaky'`)"
]
14 changes: 14 additions & 0 deletions dbt-athena/src/dbt/adapters/athena/impl.py
Original file line number Diff line number Diff line change
Expand Up @@ -1452,3 +1452,17 @@ def _run_query(self, sql: str, catch_partitions_limit: bool) -> AthenaCursor:
LOGGER.debug(f"CAUGHT EXCEPTION: {e}")
raise e
return cursor

@classmethod
def _get_adapter_specific_run_info(cls, config: RelationConfig) -> Dict[str, Any]:
table_format: Optional[str] = None
if (
config
and hasattr(config, "_extra")
and (table_type := config._extra.get("table_type"))
):
table_format = table_type
return {
"adapter_type": "athena",
"table_format": table_format,
}
4 changes: 4 additions & 0 deletions dbt-athena/tests/functional/adapter/test_retries_iceberg.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,10 @@
seeds__expected_target_post = "id,status\n" + "\n".join([f"{i},{i}" for i in range(PARALLELISM)])


@pytest.mark.flaky
@pytest.mark.skip(
reason="Unreliable test in practice. No apparent reliable way to trigger a retry, which causes the test to fail. Disabled until we get a resolution here https://github.com/dbt-labs/dbt-athena/pull/657/files#r1922043351"
Copy link
Contributor Author

@VanTudor VanTudor Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Drawing attention to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to fail randomly, irrespective of my changes. E.g. https://github.com/dbt-labs/dbt-athena/actions/runs/12756721311/job/35555467442#step:6:595

)
class TestIcebergRetriesDisabled:
@pytest.fixture(scope="class")
def dbt_profile_target(self):
Expand Down
28 changes: 28 additions & 0 deletions dbt-athena/tests/unit/test_adapter_telemetry.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
from unittest import mock

import dbt.adapters.athena.__version__

from dbt.adapters.athena.impl import AthenaAdapter
from dbt.adapters.base.relation import AdapterTrackingRelationInfo


def test_telemetry_with_athena_details():
mock_model_config = mock.MagicMock()
mock_model_config._extra = mock.MagicMock()
mock_model_config._extra = {
"adapter_type": "athena",
"table_type": "iceberg",
}

res = AthenaAdapter.get_adapter_run_info(mock_model_config)

assert res.adapter_name == "athena"
assert res.base_adapter_version == dbt.adapters.__about__.version
assert res.adapter_version == dbt.adapters.athena.__version__.version

assert res.model_adapter_details == {
"adapter_type": "athena",
"table_format": "iceberg",
}

assert type(res) is AdapterTrackingRelationInfo
Loading