Add pyiron based example #73
Conversation
Thanks a lot for the merge request. The tests are running through (there is a problem with cwl, but that is unrelated to your merge request). There is one main question about how the workflow steps are connected, in addition to some minor details.
- What is not clear to me is how the dependencies are defined. The inputs are specified for each workflow step, but not the outputs, so I am not sure how the execution graph of the workflow is built (see the sketch after this list). This is also relevant in the example, where a hardcoded number is used instead of the output of a previous step (num_dofs).
- The current implementation seems to always execute the complete Python script with all steps, whereas most workflow tools first build a graph and then execute that graph. This would also be important when building a provenance graph of an execution/run to follow all processing steps and the information that was transferred.
- This also relates to the question of avoiding the recomputation of already computed steps, e.g. when working only on the postprocessing. It seems that the complete workflow is re-executed here. Not all tools provide an option to avoid this, but if pyiron does, it would be nice to showcase that here. In a realistic scenario we would often define the function in a workflow step (e.g. the postprocessing script) as an additional input, and thus automatically trigger a recomputation of only that step (and subsequent ones) whenever that input file changes.
- Finally, a last question that is not directly related to the merge request, but a general pyiron question. We often assume that a workflow step is completely defined by its inputs and the function call, i.e. there are no hidden static variables that are set in one workflow step and then reused in subsequent steps. With file-based interfaces this is relatively easy to ensure, since no internal variables are attached to a workflow step. How would that be ensured in a system where complete objects/jobs are stored in memory? Related to that: is there, in your opinion, a difference between a workflow system and a complete programming language?
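For illustration, here is a minimal sketch of how the steps appear to be wired together with pyiron_base. The call pr.wrap_executable is taken from the traceback below, but the parameter names (executable_str, input_file_lst, conda_environment_name), file names and executable strings are assumptions for illustration, not a verified copy of workflow.py. The point is that only input files are declared, so the dependency of the second step on the first is implicit in which files are passed along, not in an explicitly declared output.

from pyiron_base import Project

pr = Project("workflow")

# Step 1: mesh generation; only the input files are declared explicitly.
gmsh = pr.wrap_executable(
    job_name="gmsh",
    executable_str="gmsh -2 unit_square.geo -o square.msh",
    input_file_lst=["unit_square.geo"],
    conda_environment_name="preprocessing",
)

# Step 2: mesh conversion; the dependency on step 1 is implied only by
# reusing the file produced in the gmsh job's working directory as an
# input file here, not by a declared output of the gmsh step.
meshio = pr.wrap_executable(
    job_name="meshio",
    executable_str="meshio convert square.msh square.xdmf",
    input_file_lst=[gmsh.working_directory + "/square.msh"],
    conda_environment_name="processing",
)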
Hi @jan-janssen ,
thank you very much for the PR!
I added my review (mostly questions to clarify). I agree with @joergfunger that not explicitly defining the outputs/targets of each job is not ideal (and probably something that should be changed in a future version).
I was not able to run the pyiron workflow on my machine. The error seems to be related to mpi4py and conda_subprocess.
pyiron_base ❯ python workflow.py
The job gmsh was saved and received the ID: 1
The job meshio was saved and received the ID: 2
The job poisson was saved and received the ID: 3
2024-03-12 11:54:53,752 - pyiron_log - WARNING - Job aborted
2024-03-12 11:54:53,752 - pyiron_log - WARNING - ERROR: could not import mpi4py!
Traceback (most recent call last):
File "/home/pdiercks/repos/nfdi4ing/pyiron_example/exemplary_workflow/pyiron/workflow/poisson_hdf5/poisson/poisson.py", line 6, in <module>
import dolfin as df
File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/__init__.py", line 142, in <module>
from .fem.assembling import (assemble, assemble_system, assemble_multimesh,
File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/fem/assembling.py", line 34, in <module>
from dolfin.fem.form import Form
File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/fem/form.py", line 12, in <module>
from dolfin.jit.jit import dolfin_pc, ffc_jit
File "/home/pdiercks/mambaforge/envs/processing/lib/python3.9/site-packages/dolfin/jit/jit.py", line 121, in <module>
def compile_class(cpp_data, mpi_comm=MPI.comm_world):
RuntimeError: Error when importing mpi4py
Traceback (most recent call last):
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 665, in execute_job_with_external_executable
out = conda_subprocess.run(
^^^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/conda_subprocess/interface.py", line 175, in run
raise CalledProcessError(
subprocess.CalledProcessError: Command '['/bin/bash', '/tmp/tmprzp6zlv0']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pdiercks/repos/nfdi4ing/pyiron_example/exemplary_workflow/pyiron/workflow.py", line 51, in <module>
poisson = pr.wrap_executable(
^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/project/generic.py", line 436, in wrap_executable
job.run()
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/utils/deprecate.py", line 171, in decorated
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 763, in run
self._run_if_new(debug=debug)
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 1281, in _run_if_new
run_job_with_status_initialized(job=self, debug=debug)
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 92, in run_job_with_status_initialized
job.run()
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/utils/deprecate.py", line 171, in decorated
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 765, in run
self._run_if_created()
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 1292, in _run_if_created
return run_job_with_status_created(job=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 115, in run_job_with_status_created
job.run_static()
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/generic.py", line 799, in run_static
execute_job_with_external_executable(job=self)
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 610, in wrapper
func(job)
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 676, in execute_job_with_external_executable
out, job_crashed = handle_failed_job(job=job, error=e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pdiercks/mambaforge/envs/pyiron_base/lib/python3.12/site-packages/pyiron_base/jobs/job/runfunction.py", line 732, in handle_failed_job
raise RuntimeError("Job aborted")
RuntimeError: Job aborted
The conda environments were created successfully. Here, I would also suggest using a local prefix for the installation instead of a name and a global installation.
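For example, something along the following lines, where the file name environment.yml and the prefix path are only placeholders for illustration:

conda env create --file environment.yml --prefix ./preprocessing_env

This keeps the environment inside the example directory instead of registering it globally by name.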
Best,
Philipp
Co-authored-by: Philipp Diercks <[email protected]>
I was not able to reproduce your error. Is it possible that you already had an environment named …
@pdiercks and @joergfunger thank you both for the careful review. I updated the pull request and tried to address the majority of the points individually. For the remaining points, here is a brief summary:
I think the first two points can be addressed rather quickly (days to weeks), while the first point most likely still takes more time (weeks to months). So I guess my question is: What is the minimal amount of changes to merge this pull request?
The changes are well addressed and in principle we could merge. Just a couple of remarks.
I will investigate, but it's probably a problem with my local setup rather than the workflow. Also, the actions/tests were passing before, so no need for changes on your end.
I would also be okay to merge. Thanks @jan-janssen for incorporating the changes and your explanations!
As pyiron copies all files to the working directory of each job step, we basically define the files on the input level rather than the output level. For a remote setup this means we copy the working directory, including all the required files for a given task, via SSH to a remote computer and potentially even submit it to the scheduler there. Once the task gets run time it is executed, and afterwards the results are compressed into one tar archive. This tar archive is then copied back to the local computer, where the output can be reused in the following job steps.
Yes, in pyiron each job receives a unique ID, so if a job was calculated successfully it is simply reloaded and no additional execution is necessary. If a calculation failed, the user has to manually remove the failed calculation before the workflow can be executed again. We added this blocker to remind users to look at their failed calculation and check whether it was just a random error or whether a modification of the input is necessary.
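A minimal sketch of what this reload-or-block behavior looks like from the user side. The job name "poisson" is taken from the log above; the calls Project.list_nodes, Project.load and Project.remove_job reflect my understanding of pyiron_base and are not taken from the PR itself:

from pyiron_base import Project

pr = Project("workflow")

# A job that finished successfully is reloaded instead of being executed again.
if "poisson" in pr.list_nodes():
    poisson = pr.load("poisson")
    if str(poisson.status) == "aborted":
        # A failed job blocks re-execution of the workflow until it is
        # removed manually, forcing the user to inspect the failure first.
        pr.remove_job("poisson")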
Based on this feedback, we made …
Thanks, nice work. If you update pyiron, it would be great to integrate the updated versions here as well.
This pull request implements the example workflow using the pyiron_base workflow manager, which is developed as part of the pyiron project. The workflow in the workflow.py file can be executed as shown below. In addition, the pull request also adds the corresponding GitHub action and a short Readme to explain the usage of the example.
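Judging from the log output quoted earlier in this thread, the invocation is presumably:

python workflow.py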