Add support to store and fetch dbt ls cache in remote stores (#1147)
This PR introduces the functionality to store and retrieve the `dbt ls` output cache in remote storage systems. This enhancement improves the efficiency and scalability of cache management for Cosmos dbt projects that use the `dbt ls` cache option (enabled by default) introduced in PR #1014.

## Key Changes

1. **`dbt ls` Cache Storage in Remote Stores**: Added support to store the `dbt ls` cache as a JSON file in remote storage paths configured in the Airflow settings under the `cosmos` section. The cache is saved in the specified remote storage path and includes the `cosmos_cache__` prefix.
2. **Cache Retrieval from Remote Stores**: Implemented logic to check for the cache in the remote storage path before falling back to the Variable cache. If `remote_cache_dir` is specified and exists in the remote store, the cache is read from it and used; if the path does not exist, Cosmos tries to create it.
3. **Backward Compatibility**: Maintained backward compatibility by allowing users to continue using local cache storage through Airflow Variables if a `remote_cache_dir` is not specified.

## Impact

1. **Scalability**: Enables the use of remote, scalable storage systems for dbt cache management.
2. **Performance**: Reduces the load on Airflow's metadata database by offloading cache storage to external systems.
3. **Flexibility**: Provides users with the option to choose between local (Airflow metadata using Variables) and remote cache storage based on their infrastructure needs.

## Configuration

To leverage this feature, users need to set `remote_cache_dir` in the `cosmos` section of their Airflow settings. This path should point to a compatible remote storage location. You can also specify `remote_cache_dir_conn_id`, the Airflow connection used to connect to your remote store. If it is not specified, Cosmos will try to identify the scheme of the specified path and use the default Airflow connection ID for that scheme.

## Testing

1. Tested with various remote storage backends (AWS S3 and GCP GS) to ensure compatibility and reliability.
2. Verified that cache retrieval falls back to the Variable-based caching approach if `remote_cache_dir` is not configured.

## Documentation

Updated the documentation to include instructions on configuring `remote_cache_dir`.

## Limitations

1. Users must be on Airflow version 2.8 or higher because the underlying Airflow Object Store feature we utilise to access remote stores was introduced in this version. If users attempt to specify a `remote_cache_dir` on an older Airflow version, they will encounter an error indicating the version requirement.
2. Users would observe a slight delay for tasks in the queued state (approx. 1-2 seconds queued duration vs. the 0-1 seconds previously seen with the Variable approach) due to the remote storage calls needed to retrieve the cache.

Closes: #1072
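
## Illustrative sketch

The selection rules described above (Variable fallback, the Airflow 2.8 requirement, and scheme-based connection lookup) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `choose_cache_backend` and `resolve_conn_id` and the `DEFAULT_CONN_IDS` mapping are hypothetical, and the mapping only assumes Airflow's usual default connection IDs for S3 and GCS.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed scheme -> default Airflow connection ID mapping, for illustration only.
# The lookup Cosmos actually performs may differ.
DEFAULT_CONN_IDS = {"s3": "aws_default", "gs": "google_cloud_default"}


def choose_cache_backend(
    remote_cache_dir: Optional[str], airflow_version: Tuple[int, int]
) -> str:
    """Return "remote" or "variable" following the rules in this description."""
    if not remote_cache_dir:
        # Backward compatible: no remote dir configured, keep using Variables.
        return "variable"
    if airflow_version < (2, 8):
        # The Object Store feature used for remote access needs Airflow 2.8+.
        raise RuntimeError("remote_cache_dir requires Airflow 2.8 or higher")
    return "remote"


def resolve_conn_id(
    remote_cache_dir: str, conn_id: Optional[str] = None
) -> Optional[str]:
    """Prefer the explicit connection ID, else infer one from the path scheme."""
    if conn_id:
        return conn_id
    scheme = urlsplit(remote_cache_dir).scheme
    return DEFAULT_CONN_IDS.get(scheme)
```

With these hypothetical helpers, `choose_cache_backend(None, (2, 9))` selects the Variable cache, while a path such as `gs://my-bucket/cosmos-cache` with no explicit connection ID resolves to the default connection for the `gs` scheme. Following Airflow's usual `AIRFLOW__{SECTION}__{KEY}` convention, the setting itself would typically be supplied as `AIRFLOW__COSMOS__REMOTE_CACHE_DIR` or under `[cosmos]` in `airflow.cfg`.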
1 parent e1ff924, commit 41053ed
Showing 9 changed files with 315 additions and 11 deletions.