-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define a logging convention #30
Comments
It's used in most scrapers, as can be seen on Github and doesn't prevent anything. It's a set of scraper-tailored defaults which I think is the good approach. Most of our projects have very simple logging needs:
I don't see any realistic, current use case for an external log config file which is a very intrusive change. Please convince me otherwise 🤓 |
I really never noticed this was used in so many places, sorry about that. And I'm now sure it is not used outside scrapers (while it is very important AFAIK for zimfarm, cms, metrics, ...). The scraper-tailored defaults are anyway meaningful and useful. The limitation I find in this approach is that it forces us to have only one log level for all logs. And it disables the verbose logs of some deps (which are indeed too verbose by defaults) Should we need to have more detailed logs only for some deps or have more details about one verbose dep, it is not possible to achieve this without modifying the source code, and it is not even straightforward, I'm not sure how I should do it for instance (copy/paste the shared code in my scraper, do the modifications needed and replace the import?). Maybe this need is only very occasional and I shouldn't mind, but I'm not comfortable yet. I would be probably more comfortable with at least this same shared defaults embedded in a dictionary so that it can be more easily modified. |
It is used in CMS, sorry again |
Well, as its name suggest, it's a scraper lib 😉
I don't understand this. It surely defines a single handler because that's our common use case but you do have access to the logger so you can add additional handlers with different log levels. It sounds like a special case anyway: you're not going to ask your user to pass multiple log levels and following your proposal different log levels would be for third parties anyway.
No, if your project (unlike most) needs debug logs for urllib3, you can still bring them back logging.getLogger("urllib3).setLevel(logging.DEBUG)
This can be phrased as “I want user to manage logging completely” which is fine but should be done knowingly.
Does sounds like it
By experience, I find it less flexible, less dynamic and harder to maintain: a typo and your log operates unexpectedly. You often end-up touching it with code anyway and code has the benefit of linting. That said, we'd benefits from written conventions at least for non-scraper. I believe conventions should be driven by usage and experience so you try your approach in metrics and once it's matured or when we start another non-scraper project and require guidance, we can derive a non-scraper convention from it. scraperlib can also be updated as well, for scraper needs but there I believe we need less flexibility. |
You convinced me that current approach is sufficient AND that we miss documentation around typical usage ^^ I just do not knew how to achieve stuff I wanted to do with current approach. I will experiment (and ask if there is something I do not achieve) and write something around all this in the wiki. |
We have not yet formalized a policy around logging, and I think this is clearly missing. It looks like a kind of convention is proposed in https://github.com/openzim/python-scraperlib/blob/main/src/zimscraperlib/logging.py but I do not find it very versatile and forces quite a lot of stuff (verbose dependencies, logger level is DEBUG, ...). Plus it seems to be rarely used in fact (I rarely encountered this at least).
I've read https://docs.python.org/3/howto/logging.html and there are tons of good ideas and recommendations, and this should probably be the first decision: adhere to recommendations in this document.
There are still open topics in this documentation, for which I propose decisions or suggest discussions below. WDYT? Anything else?
Live logging configuration
In order to be as versatile as possible, the software we produce must not take much decision on what needs to be logged and what is useless.
Software must hence allow to configure logging either:
LOGGING_LEVEL
whose value is a log level which is set in the root handlerLOGGING_FILE_CONFIG
whose value is the path to a configuration in configuration file format.It should also support configuration by an environment variable named
LOGGING_YAML_CONFIG
whose value is the path to a YAML configuration adhering to the configuration dictionary schemaWhile
LOGGING_YAML_CONFIG
is the preferred solution (most versatile and less verbose), this is optional because it forces the use of PyYAML to parse the configuration file (and we might want to not include this in all projects).If multiple
LOGGING_*
environment variables are set, the precedence isLOGGING_YAML_CONFIG
>LOGGING_FILE_CONFIG
>LOGGING_LEVEL
.We do not recommend to continue to support older environment variables for backward compatibility, or at least they must have even less precedence than the
LOGGING_*
ones.Software must provide a sensible default configuration, typically with a default dict_config embedded in the code.
Loggers
Each software and library must use a unique logger with a unique and easily identifiable name, such as the
__name__
of their top-level package or module. They must not log directly to the root logger.This is meant to help the software / library user configure the logging verbosity or handlers as they wish while maintaining simplicity. Using a unique logger per software and library is deemed sufficient and avoids by default logging configurations that rely too heavily on module names. Should a specific need arise (e.g. filter out one kind of very verbose logs), it is always possible to achieve this with custom filters based on pathname or log content for instant. Experience shows we are usually lazy to do this anyway and long logger names are in fact just clogging the logs.
The text was updated successfully, but these errors were encountered: