MINOR: improve query log lineage docs (open-metadata#16413)
ulixius9 authored May 27, 2024
1 parent eb88dc1 commit cb8f4c6
Showing 4 changed files with 35 additions and 6 deletions.
@@ -33,13 +33,20 @@ A standard CSV should be comma separated, and each row represented as a single l
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
-that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes
+that your query has commas `,` inside. Then, wrap each query in quotes to not have any clashes
with the comma as a separator.
- **database_name (optional):** Enter the database name on which the query was executed.
- **schema_name (optional):** Enter the schema name to which the query is associated.

Checkout a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).

```csv
query_text,database_name,schema_name
"select * from sales",default,information_schema
"select * from marketing",default,information_schema
"insert into marketing select * from sales",default,information_schema
```

## Lineage Workflow
In order to run a Lineage Workflow we need to make sure that Metadata Ingestion Workflow for corresponding service has already been executed. We will follow the steps to create a JSON configuration able to collect the query log file and execute the lineage workflow.
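
As a quick check of the quoting rule for `query_text` described in this hunk, the following is a minimal sketch that parses a query log with Python's standard `csv` module. It is not OpenMetadata's own parser. The helper name `read_query_log` is hypothetical, and the first sample row is a made-up variant that contains a comma so the quoting actually matters.

```python
import csv
import io

# Sample rows in the layout shown in the diff. The first query is an
# illustrative variant (an assumption) that contains a comma.
SAMPLE_LOG = '''query_text,database_name,schema_name
"select id, name from sales",default,information_schema
"insert into marketing select * from sales",default,information_schema
'''


def read_query_log(text):
    """Parse query log CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_query_log(SAMPLE_LOG)
for row in rows:
    # The quoted query stays in one field even though it contains a comma.
    print(row["query_text"], "|", row["database_name"], "|", row["schema_name"])
```

If the quotes around the first query were dropped, the same reader would split it at the comma and shift the remaining columns, which is the clash the doc text warns about.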

@@ -34,8 +34,9 @@ A standard CSV should be comma separated, and each row represented as a single l
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
-that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes
-with the comma as a separator.- **user_name (optional):** Enter the database user name which has executed this query.
+that your query has commas `,` inside. Then, wrap each query in quotes to not have any clashes
+with the comma as a separator.
+- **user_name (optional):** Enter the database user name which has executed this query.
- **start_time (optional):** Enter the query execution start time in YYYY-MM-DD HH:MM:SS format.
- **end_time (optional):** Enter the query execution end time in YYYY-MM-DD HH:MM:SS format.
- **aborted (optional):** This field accepts values as true or false and indicates whether the query was aborted during execution
@@ -44,6 +45,12 @@ A standard CSV should be comma separated, and each row represented as a single l

Checkout a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).

```csv
query_text,database_name,schema_name
"create table sales_analysis as select id, name from sales",default,information_schema
"insert into marketing select * from sales",default,information_schema
```

## Usage Workflow
In order to run a Usage Workflow we need to make sure that Metadata Ingestion Workflow for corresponding service has already been executed. We will follow the steps to create a JSON configuration able to collect the query log file and execute the usage workflow.
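
For the usage fields listed above, a small sketch of producing a log row with Python's standard `csv` and `datetime` modules may help. The column order, the helper names `format_row` and `write_query_log`, and the sample user name and timestamps are all assumptions made for illustration. Only the field names and the `YYYY-MM-DD HH:MM:SS` format come from the doc text.

```python
import csv
import io
from datetime import datetime

# Field names come from the doc text; their order here is an assumption.
FIELDS = ["query_text", "user_name", "start_time", "end_time", "aborted"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS


def format_row(query, user, start, end, aborted=False):
    """Build one log row, rendering times and the aborted flag as text."""
    return {
        "query_text": query,
        "user_name": user,
        "start_time": start.strftime(TIME_FORMAT),
        "end_time": end.strftime(TIME_FORMAT),
        "aborted": str(aborted).lower(),  # "true" or "false"
    }


def write_query_log(rows):
    """Serialize rows to CSV text; fields containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = format_row(
    "create table sales_analysis as select id, name from sales",
    "etl_user",  # hypothetical user name
    datetime(2024, 5, 27, 10, 0, 0),
    datetime(2024, 5, 27, 10, 0, 5),
)
log_text = write_query_log([row])
print(log_text)
```

Letting the CSV writer decide the quoting avoids hand-wrapping queries: with the default minimal quoting, only the field that contains a comma is wrapped in quotes.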

@@ -33,13 +33,20 @@ A standard CSV should be comma separated, and each row represented as a single l
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
-that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes
+that your query has commas `,` inside. Then, wrap each query in quotes to not have any clashes
with the comma as a separator.
- **database_name (optional):** Enter the database name on which the query was executed.
- **schema_name (optional):** Enter the schema name to which the query is associated.

Checkout a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).

```csv
query_text,database_name,schema_name
"select * from sales",default,information_schema
"select * from marketing",default,information_schema
"insert into marketing select * from sales",default,information_schema
```

## Lineage Workflow
In order to run a Lineage Workflow we need to make sure that Metadata Ingestion Workflow for corresponding service has already been executed. We will follow the steps to create a JSON configuration able to collect the query log file and execute the lineage workflow.

@@ -34,8 +34,9 @@ A standard CSV should be comma separated, and each row represented as a single l
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
-that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes
-with the comma as a separator.- **user_name (optional):** Enter the database user name which has executed this query.
+that your query has commas `,` inside. Then, wrap each query in quotes to not have any clashes
+with the comma as a separator.
+- **user_name (optional):** Enter the database user name which has executed this query.
- **start_time (optional):** Enter the query execution start time in YYYY-MM-DD HH:MM:SS format.
- **end_time (optional):** Enter the query execution end time in YYYY-MM-DD HH:MM:SS format.
- **aborted (optional):** This field accepts values as true or false and indicates whether the query was aborted during execution
@@ -44,6 +45,13 @@ A standard CSV should be comma separated, and each row represented as a single l

Checkout a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).

```csv
query_text,database_name,schema_name
"select * from sales",default,information_schema
"select * from marketing",default,information_schema
"insert into marketing select * from sales",default,information_schema
```

## Usage Workflow
In order to run a Usage Workflow we need to make sure that Metadata Ingestion Workflow for corresponding service has already been executed. We will follow the steps to create a JSON configuration able to collect the query log file and execute the usage workflow.
