dbt-fuzzy-text

Macros that help with fuzzy text matching, with the aim of keeping dbt models agnostic of data warehouses.

Current coverage:

Algorithm	Snowflake	BigQuery	Azure Synapse (Dedicated SQL Pool)
Edit distance based
Levenshtein Distance	✔️	✔️	✔️
Jaro-Winkler Similarity	✔️	❌	❌

Installation instructions

New to dbt packages? Read more about them here.

Include this package in your packages.yml file — check here for the latest version number:

packages:
  - package: calum-mcg/fuzzy_text
    version: X.X.X ## update to latest version here

Run dbt deps to install the package.

Macros

levenshtein_distance (source)

This macro generates the levenshtein distance between two strings.

Arguments

str1 (required): First string to compare
str2 (required): Second string to compare
max (optional, default=none): Maximum distance to compute (integer)

Usage:

Copy the macro into a statement tab in the dbt Cloud IDE, or into a model, and compile your code

 ... {{ fuzzy_text.levenshtein_distance('input_string_column', 'comparison_string_column') }} as levenshtein_distance ...

jaro_winkler (source)

This macro generates the Jaro-Winkler similarity between two strings.

Arguments

str1 (required): First string to compare
str2 (required): Second string to compare

Usage:

Copy the macro into a statement tab in the dbt Cloud IDE, or into a model, and compile your code

 ... {{ fuzzy_text.jaro_winkler('input_string_column', 'comparison_string_column') }} as jaro_winkler ...

Contribution Guidelines

Pull requests are the best way to propose changes to the codebase. Steps required:

Create an issue in the repo with a description of the problem / bug / improvement required
Clone the main branch with a suitable branch name, e.g. feature/add-cool-thing
Add tests for supported adaptors in the integration_tests folder
If required, update the README documentation to include usage and an example.
Issue a pull request, provide:
- a description of changes
- add a reviewer
- reference original issue (from step 1)

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
integration_tests		integration_tests
macros		macros
.dbtignore		.dbtignore
.gitignore		.gitignore
README.md		README.md
dbt_project.yml		dbt_project.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dbt-fuzzy-text

Installation instructions

Macros

levenshtein_distance (source)

Arguments

Usage:

jaro_winkler (source)

Arguments

Usage:

Contribution Guidelines

About

Releases 5

Packages

Languages

calum-mcg/dbt-fuzzy-text

Folders and files

Latest commit

History

Repository files navigation

dbt-fuzzy-text

Installation instructions

Macros

levenshtein_distance (source)

Arguments

Usage:

jaro_winkler (source)

Arguments

Usage:

Contribution Guidelines

About

Resources

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages