
Features/add logging #60

Open · wants to merge 44 commits into dev from features/add_logging
Conversation

MaGering (Collaborator) commented Mar 13, 2023

Fixes #21.

With this PR logging is added to the pipeline.

Before merging into the dev branch, please make sure that:

  • if data flow was adjusted: data pipeline run finishes successfully.
  • new and adjusted code is formatted using black and isort.
  • the CHANGELOG.rst was updated.
  • the docs were updated.

@MaGering MaGering self-assigned this Mar 13, 2023
@MaGering MaGering force-pushed the features/add_logging branch from f5e1a3b to fccb356 Compare March 27, 2023 15:46
@MaGering MaGering marked this pull request as ready for review March 28, 2023 09:40
@MaGering MaGering requested a review from nesnoj March 28, 2023 09:40
MaGering (Collaborator, Author) commented:

Logging is ready! :)
I was wondering whether I should adapt mastr.py so that logging is used where we currently have print calls, for instance here. I would need to pass the logger to the respective function.

nesnoj (Member) left a comment

Thanks for this feature including all different ways of running tasks - works well so far! 😸

Some thoughts from my side:

  • I like the idea of putting the log into the dataset's folder, where it belongs. However, I find it a bit messy when it is saved to the data folder: the user wants to see the data quite often, whereas the log is only checked if something goes wrong. What about a separate log folder?
  • I'm a bit curious about the file name. You use the actual file's name (without extension), but what if there are multiple files?
    What about log/<DATASET_NAME>.log (in the folder I proposed above)? Or just something like log/dataset.log, which would save us implementing an additional function to obtain the dataset's name. We'd have DATASET_PATH / "log" / "dataset.log" everywhere, which is sufficient to my mind. Are you fine with that?
    • Sidenote: If we use wildcards and 10 jobs are working on 10 parallel tasks of the same dataset, everything will be logged to the same file, but I propose not to tackle this now.
  • Currently there's no logging of Linux shell commands' output. That can easily be achieved by redirecting the standard output (stdout) and standard error (stderr) streams by adding > {log} 2>&1 (it's also described in the snakemake docs). I can do this.
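The redirection from the last bullet could look like this in a rule (file paths and the command are placeholders, not from the actual snakefiles; note the order: the file redirection must come before 2>&1 so that stderr follows stdout into the log file):

```snakemake
rule create:
    input: "raw/dataset.gpkg"
    output: "data/dataset.gpkg"
    log: "log/dataset.log"
    shell: "some_command {input} {output} > {log} 2>&1"
```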

One more thing: I'm getting all output twice at the moment (regular out from snakemake and additionally from the logger):

[Screenshot: the same log lines appear twice in the console output]

Could you suppress snakemake's log output? Or the output from our logger? Ours is richer than snakemake's, though, so I'd prefer to see ours.
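One plausible cause of such duplication, sketched below, is that records from the pipeline's logger also propagate to the root logger, where another handler (e.g. snakemake's) prints them a second time; disabling propagation keeps only our handler's output. The logger names and setup here are illustrative, not the project's actual configuration:

```python
import io
import logging

# Root logger with its own handler (standing in for snakemake's handler).
root_stream = io.StringIO()
logging.basicConfig(stream=root_stream, level=logging.INFO, force=True)

# Our pipeline logger with a separate, richer handler.
our_stream = io.StringIO()
logger = logging.getLogger("digipipe_demo")
handler = logging.StreamHandler(our_stream)
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("duplicated")   # emitted by our handler AND the root handler
logger.propagate = False    # stop records from bubbling up to the root
logger.info("only ours")    # emitted by our handler only
```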

nesnoj (Member) commented Apr 20, 2023

Logging is ready! :) I was wondering whether I should adapt mastr.py so that logging is used where we currently have print calls, for instance here. I would need to pass the logger to the respective function.

Oh, I missed this one. Yes, that makes much sense!

nesnoj (Member) commented Apr 20, 2023

One more question:
In snakefiles where the task is called via python <script>, you pass the log file. Example:
https://github.com/rl-institut-private/digipipe/blob/c19ceb2c7162809c745a6d7cbf05843e691269f7/digipipe/store/datasets/bkg_vg250_region/create.smk#L20-L23
But you do not actually use the 4th argument:
https://github.com/rl-institut-private/digipipe/blob/c19ceb2c7162809c745a6d7cbf05843e691269f7/digipipe/store/datasets/bkg_vg250_region/scripts/create.py#L33-L37
How does it work out that the log is written to the right file?

MaGering (Collaborator, Author) commented:

  • I like the idea of putting the log into the dataset's folder, where it belongs. However, I find it a bit messy when it is saved to the data folder: the user wants to see the data quite often, whereas the log is only checked if something goes wrong. What about a separate log folder?

  • I'm a bit curious about the file name. You use the actual file's name (without extension), but what if there are multiple files?
    What about log/<DATASET_NAME>.log (in the folder I proposed above)? Or just something like log/dataset.log, which would save us implementing an additional function to obtain the dataset's name. We'd have DATASET_PATH / "log" / "dataset.log" everywhere, which is sufficient to my mind. Are you fine with that?

    • Sidenote: If we use wildcards and 10 jobs are working on 10 parallel tasks of the same dataset, everything will be logged to the same file, but I propose not to tackle this now.

Thanks for your thoughts! I like the idea of putting the log output into a separate directory! Since I wanted to keep the name of the data file in the log file, I decided to go with individual log files in the directory '.log' instead of the '.log/dataset.log' variant. The latter would require more adjustment and time to retain the name of the corresponding dataset folder. Are you fine with that?
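The per-file scheme described here could be captured in a small helper like the following (the function name and exact layout are assumptions based on this thread, not the actual implementation):

```python
from pathlib import Path

def log_path_for(dataset_path, data_file):
    """Per-file log: <dataset path>/.log/<data file stem>.log (scheme assumed)."""
    return Path(dataset_path) / ".log" / (Path(data_file).stem + ".log")
```

For example, log_path_for("store/datasets/bkg_vg250_region", "vg250.gpkg") yields store/datasets/bkg_vg250_region/.log/vg250.log, so the data file's name survives in the log file while the data folder itself stays uncluttered.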

MaGering (Collaborator, Author) commented:

That would be awesome! Thank you. :)

MaGering (Collaborator, Author) commented:

Could you suppress snakemake's log output? Or the output from our logger? Ours is richer than snakemake's, though, so I'd prefer to see ours.

I am quite at a loss as to how to silence this. Most of the entries disappeared with commit a0a0676. Unfortunately I can't get rid of the ones you describe. I only found the --quiet option for this, but I can't figure out how to build it into the pipeline. Could you tell me how to do this, @nesnoj?

Successfully merging this pull request may close these issues: Add logging

2 participants