[Bug] /tmp/ File Not Found Error Causing Task Failure for dbt Cosmos Tasks #1075
Comments
To resolve the intermittent /tmp file-not-found error, here are some steps to address this issue:

By ensuring the temporary directory is properly managed and configuring the
Hi, @oliverrmaa. I'm still trying to reproduce this issue, but I have a few hypotheses, and we can try a few things out. We'll soon have an alpha release to validate one of these hypotheses.

The errors you are seeing are raised around the temporary directories Cosmos creates: we create a temporary directory for every dbt command that will be run. This was done to overcome two main issues:
Some follow-up questions:

There are two low-hanging fruits I can see:

a) Assuming the issue may be with the creation of symbolic links, we can create an alpha version of Cosmos that avoids creating those, runs the commands from the original folder, and uses environment variables for dbt-core JSON artifacts and dbt-core logs (a sketch of this idea is below).

b) We could check whether this error message ("[Errno 2] No such file or directory") happened within the dbt task run and attempt to re-run it as part of the same task run in Cosmos itself. However, it does feel that configuring task retries in Airflow may be the most suitable place for this.

Before we implement (a) and (b), could you confirm whether reducing the concurrency in smaller Airflow/Astro instances leads to the same error?

Another path we can follow to improve the concurrent execution of dbt tasks is not to use dbt to run the SQL, but to have Airflow operators run the compiled SQL. This is a strategy that companies who had to scale dbt in Airflow (e.g., Monzo) used in their proprietary solutions, quite successfully. One downside is that some macros may not work as expected, but this can be a good compromise. The other downside is that it requires some effort per data warehouse, but it may pay off.
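For illustration, here is a minimal sketch of what option (a) could look like. This is not Cosmos's actual implementation: the `run_dbt_in_project` helper and the example paths are hypothetical, and it assumes dbt-core honours the `DBT_TARGET_PATH` and `DBT_LOG_PATH` environment variables.

```python
import os
import subprocess
import tempfile


def run_dbt_in_project(project_dir: str, dbt_args: list[str]) -> subprocess.CompletedProcess:
    """Run a dbt command from the original project folder, redirecting artifacts and
    logs to a per-command temporary directory instead of symlinking the project."""
    with tempfile.TemporaryDirectory() as scratch:
        env = os.environ.copy()
        # Keep target/ artifacts and logs out of the shared project folder so concurrent
        # dbt processes do not step on each other's files.
        env["DBT_TARGET_PATH"] = os.path.join(scratch, "target")
        env["DBT_LOG_PATH"] = os.path.join(scratch, "logs")
        return subprocess.run(
            ["dbt", *dbt_args],
            cwd=project_dir,  # run from the original folder, no symbolic links
            env=env,
            check=True,
            capture_output=True,
            text=True,
        )


# Hypothetical usage:
# run_dbt_in_project("/usr/local/airflow/dags/dbt_bfp", ["run", "--select", "my_model"])
```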
This issue seems to be affecting several users from the open-source community as well as Astro customers.
Feedback from @oliverrmaa:
On (ii), I'd recommend the following steps:
After a few hours of analyzing and troubleshooting this issue, I have a few conclusions.

Cause

Either dbt itself or one of its plugins/adaptors is creating and deleting these temporary files.

Mitigations

There are a few workarounds to the problem:

a) Use Airflow retries.
b) Reduce task concurrency at the Airflow worker nodes to 1 in the node pools running Cosmos tasks, and use very small worker nodes with a larger autoscaling upper limit.
c) Use
d) Use Cosmos

Follow-ups

We were not able to reproduce this problem, but it would be great, @oliverrmaa, if we could narrow down "who" is creating these files (e.g. "/usr/local/airflow/dags/dbt_bfp/models/paid_media/audiences/om_20240401_fl_scotus_abortion_showbuy_person/._om_20240401_fl_scotus_abortion_showbuy_person.yml.DGBDnb"); a sketch of one way to capture when they appear is at the end of this comment. We could also make the original folder read-only; there are two options:
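On the follow-up about narrowing down when these files appear: one way to gather evidence is to watch the models folder and log every create/delete of a `._*` file, so the timestamps can be correlated with running tasks. This is only a sketch using the third-party `watchdog` package (not something Cosmos provides), and the watched path is simply the one from the example above.

```python
import logging
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

# Path taken from the example file name above; adjust to your deployment.
MODELS_DIR = "/usr/local/airflow/dags/dbt_bfp/models"

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")


class DotUnderscoreHandler(FileSystemEventHandler):
    """Log every create/delete event for files whose name starts with '._'."""

    def on_created(self, event):
        if not event.is_directory and Path(event.src_path).name.startswith("._"):
            logging.info("created: %s", event.src_path)

    def on_deleted(self, event):
        if not event.is_directory and Path(event.src_path).name.startswith("._"):
            logging.info("deleted: %s", event.src_path)


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(DotUnderscoreHandler(), MODELS_DIR, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()
```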
From an implementation perspective in Cosmos itself, we could adopt one of the following strategies to mitigate the problem:
I'm having these same problems with Airflow, using version 1.7.1 of Cosmos and 2.10.3 of Airflow.
I am using VIRTUALENV mode.
@oliverrmaa After analysing this issue, I considered a few possible strategies to overcome it. I'll first summarise what seems to be the cause and then give suggestions.

Cause

This is not a "bug" in Cosmos per se, since this happens to users of other solutions that run dbt in Airflow (and potentially to anyone trying to run multiple dbt commands on the same host, regardless of the orchestration tool). This issue happens intermittently, and I was not able to reproduce it. dbt, or some plugin/adaptor, is creating/deleting these temporary files.

Although Cosmos creates a new temporary folder per dbt command run, the new temporary folder contains a symbolic link to the original dbt project models' folder. In other words, multiple dbt processes on the same host will reference the same original models' folder (see the sketch at the end of this comment). In the past, Cosmos would copy the entire dbt project folder per process, but this caused lots of disk usage and delayed dbt commands when dbt projects were large.

Mitigations

Astro customers reported that by setting the task retries
Another mitigation I thought about is to leverage the dbt
Create/update the root directory of your dbt project to contain a
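To make the cause described above more concrete, here is a minimal sketch of the kind of layout involved. It is illustrative only, not the actual Cosmos code; the helper name is hypothetical.

```python
import os
import subprocess
import tempfile


def run_dbt_with_symlinked_project(project_dir: str, dbt_args: list[str]) -> None:
    """Illustration only: each dbt command gets its own temporary folder, but the
    entries inside it are symbolic links back to the one shared project on disk."""
    with tempfile.TemporaryDirectory() as tmp_project:
        for entry in os.listdir(project_dir):
            # Symlink instead of copying the project, avoiding the disk usage and
            # delays that copying large dbt projects used to cause.
            os.symlink(os.path.join(project_dir, entry), os.path.join(tmp_project, entry))
        # Every concurrent process launched this way still reads (and can collide on)
        # the same underlying models folder.
        subprocess.run(["dbt", *dbt_args], cwd=tmp_project, check=True)
```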
Possible code changes to Cosmos

I see two possible changes we could make to the Cosmos code base to handle this problem:
Next steps

We should only consider these two changes to Cosmos if
@arodrber0, your problem differs from the original one reported in the current ticket. Please create a dedicated ticket with the steps to reproduce.
Yes, I managed to solve it by changing the execution configuration in these newer versions:

`dbt_executable = Path(f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt")`
`venv_execution_config = ExecutionConfig(`
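The quoted snippet is cut off, but based on the virtualenv example in the Cosmos documentation, a completed version might look roughly like this (the exact arguments used in the original comment are unknown):

```python
import os
from pathlib import Path

from cosmos import ExecutionConfig

# dbt installed into a virtualenv baked into the Airflow image
dbt_executable = Path(f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt")

venv_execution_config = ExecutionConfig(
    dbt_executable_path=str(dbt_executable),
)
```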
This seems to attempt to fix the issue at the root of the problem.
Astronomer Cosmos Version
Other Astronomer Cosmos version (please specify below)
If "Other Astronomer Cosmos version" selected, which one?
1.4.3
dbt-core version
1.7.17
Versions of dbt adapters
dbt-bigquery==1.7.4
dbt-core==1.7.17
dbt-extractor==0.5.1
dbt-semantic-interfaces==0.4.4
LoadMode
DBT_LS
ExecutionMode
LOCAL
InvocationMode
SUBPROCESS
airflow version
apache-airflow==2.9.2+astro.1
Operating System
Debian GNU/Linux 11 (bullseye)
If you think it's a UI issue, what browsers are you seeing the problem on?
No response
Deployment
Astronomer
Deployment details
We have a main deployment in Astro Cloud which we consider production. We also do local development via `astro dev start`. We have continuous deployment set up through CircleCI, which deploys merged PRs on our master branch to our production deployment via `astro deploy --dags`. For authentication to our data warehouse (Google BigQuery) in production we use `GoogleCloudServiceAccountDictProfileMapping`, and for local development we use `ProfileConfig`, where our dbt profiles.yml has a hardcoded path to a service account JSON file which is at the same path for each developer.
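For context, the two profile setups described above might look roughly like this. It is only a sketch: the profile, connection, project, dataset names, and the local profiles.yml path are placeholders, not the reporter's actual values.

```python
import os

from cosmos import ProfileConfig
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping

# Production on Astro: BigQuery credentials come from an Airflow connection
# via a Cosmos profile mapping.
production_profile_config = ProfileConfig(
    profile_name="my_dbt_project",
    target_name="prod",
    profile_mapping=GoogleCloudServiceAccountDictProfileMapping(
        conn_id="google_cloud_default",
        profile_args={"project": "my-gcp-project", "dataset": "analytics"},
    ),
)

# Local development: point at a profiles.yml that references a service-account JSON
# file checked out at the same path on every developer's machine.
local_profile_config = ProfileConfig(
    profile_name="my_dbt_project",
    target_name="dev",
    profiles_yml_filepath=f"{os.environ['AIRFLOW_HOME']}/dbt/profiles.yml",
)
```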
What happened?

We are still intermittently seeing `FileNotFoundError: [Errno 2] No such file or directory` for /tmp files every few hours or so, across multiple DAGs, ever since the inception of our Astronomer/Cosmos setup. The error appears on Cosmos-created dbt model run tasks. This issue affects our on-call personnel because they have to manually clear and re-run these tasks in order for the model to run successfully (the re-run usually succeeds). Some model runs must be manually re-run for the task to succeed, and some recover on their own.

Relevant log output
How to reproduce
I believe having our setup should reproduce it, as the issue happens quite often (once every few hours). We also have 1000+ dbt models, so an example project of that size may be needed for reproducibility. If underlying deployment infrastructure information is required, here it is:
Anything else :)?
No response
Are you willing to submit PR?
Contact Details
[email protected]