Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add workflow to generate DBT documentation and deploy to GitHub Pages #73

Merged
merged 1 commit into from
Oct 5, 2024

Conversation

lawrenceadams
Copy link
Collaborator

This PR adds automatic dbt docs generation for the dbt-synthea project.

Why bother?

The docs are a nice place to start for those not used to either dbt / OMOP to explore the models/DAG/etc (IMHO - feel free to disagree!); is easier for us to point people to our project!

This approach requires little/no effort of maintainers once setup.

How does it work

  • runs when a relevant file in github has been modified in the main branch (as per paths filter)
  • installs python + requirements (caches as possible)
  • uses duckdb rendering of the project (lighterweight/friendly flavour of sql compared to others)
  • uploads to GitHub pages (this needs to be enabled by an admin)

Next steps

  • Enable pages (if approved/agreed as useful)
  • Add other docs to make prettier/complete - e.g. an overview page, etc.

… Pages

 - runs when a relevant file in github has been modified in the main branch (as per paths filter)
 - installs python + requirements (caches as possible)
 - uploads to GitHub pages (this needs to be enabled by an admin)
@katy-sadowski
Copy link
Collaborator

I do agree with adding this! Thanks for the initiative! I think I enabled Pages:

Screenshot 2024-10-04 at 8 32 33 PM

If this looks right, go ahead and merge to test it out!

@lawrenceadams lawrenceadams merged commit b675383 into main Oct 5, 2024
@katy-sadowski
Copy link
Collaborator

katy-sadowski commented Oct 5, 2024

It works!!! https://ohdsi.github.io/dbt-synthea

For some reason, the data types are not showing up in the GH Pages site, though they do show up when I build it locally. Quick Google says this might be a fix? dbt-labs/dbt-spark#988

@lawrenceadams
Copy link
Collaborator Author

lawrenceadams commented Oct 6, 2024

Excellent!! 🥳

For some reason, the data types are not showing up in the GH Pages site, though they do show up when I build it locally. Quick Google says this might be a fix? dbt-labs/dbt-spark#988

Interesting... I have can reproduce the same thing. The documentation on the docs is ironically thin on how you should do this.

I can confirm that if you use type then the docs works. Buttt - the reason we don't have this issue locally is that if you have dbt run / dbt build your project and have your warehouse available then dbt will grab the data type metadata from the database. I.e. if we update the workflow to do dbt build then it will work fine.


Going down the rabbit hole, I am torn as to what to do:

  • The dbt yaml schema declares that data_type is correct (type is not valid)
  • data_type exists so that if you enable/enforce contracts then dbt will check the we have matched our expected schema on project build.
    • I think it would be nice to be able to enforce checking of this if we want in the future - help to prove our code is CDM compliant etc.
    • I tried enabling this for care_site and failed as the ID uses bigint:
image

Interestingly, this should be handled by the adapter (unless this is for something else...):

https://github.com/duckdb/dbt-duckdb/blob/41794e3edc2dfb4237bee0167e4df07f581e5ba6/dbt/adapters/duckdb/column.py#L18-L24

But does not seem to work!

What do you think @katy-sadowski ? I think doing the dbt run step before building the docs would be a way around this, and allow us to enforce schema checking in the future - but just using type is another valid solution

@lawrenceadams
Copy link
Collaborator Author

lawrenceadams commented Oct 6, 2024

I've gone too deep into the weeds, but essentially if we want contract enforcement to work then the dbt-duckdb adapter needs to be updated to overload the DuckDBColumn class here with an updated TYPE_LABELS dict which would include mappings for 'BIGINT': 'INTEGER'.

I have tested it as the below and confirm this builds and the contract enforcer code is happy when this is present:

image

I do think it would be cool to have contractually bound type checking using the CDM docs as the source but this would require upstream dbt-duckdb changes, and I think BIGINT is thought to be a distinctly different type to INT (although I am not sure why, the contract feature allows type aliasing to be enabled or disabled at will.)


@katy-sadowski I will pull this out into a new issue and we can discuss there! 🧠

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants