Gather stats #189
Comments
General

I have two more questions:

- Optional service
- Plugin stats: it might also be interesting to ask the Systems department if there is already something available.

Alternatives

I would like to discuss the alternatives in a meeting when you are back.
Thanks for the input. I wouldn't be too worried about resource usage when collecting resource usage; I'll see how much this takes. About the optionality: agreed, it should be straightforward to do that.

The systems group uses Slurm for statistics, which is fine because DKRZ as an institution decided to use Slurm. I wouldn't want to rely on Slurm, though, as we have made the decision to keep the workload-manager integration as general as possible. For example, the DWD (I will get in touch with them in March) doesn't use Slurm but PBS.

About yet another database type: once we have a stats instance with a mildly functioning REST API up and running, we could think about transitioning the other parts that use DB storage towards that storage interface. I rather like MongoDB, or any other NoSQL approach, because it allows being more flexible with the kind of data we store. And since changes in data structure happen quite often, Mongo offers the flexibility to cater for those changes without headaches.
I think that the stats you propose are basically all (and more) that I had in mind. Maybe add three more:
Regarding how to implement that:
Regarding where to implement it: for the moment we could patch the code, writing some of the plugin-related statistics into the logfile. Very temporarily I added some junk code to the wrapper of a plugin to gather some data (runtime, storage usage, files produced), printing it to the logfiles, e.g. here. I used psutil for the CPU/memory usage, but I am afraid I do not know how to make it work properly there. I was looking into doing something similar in the freva core code, but I can't find where; I have some doubts about where and how to put it. From what I see, the tool is invoked like this:

```python
result: Optional[utils.metadict] = p._run_tool(
    config_dict=complete_conf,
    unique_output=unique_output,
    out_file=out_file,
    rowid=rowid,
)
```

which in turn calls `_run_tool`:

```python
def _run_tool(
    self,
    config_dict: Optional[ConfigDictType] = None,
    unique_output: bool = True,
    out_file: Optional[Path] = None,
    rowid: Optional[int] = None,
) -> Optional[Any]:
    config_dict = self._append_unique_id(config_dict, unique_output)
    if out_file is None:
        is_interactive_job = True
    else:
        is_interactive_job = False
        self._plugin_out = out_file
    for key in config.exclude:
        config_dict.pop(key, "")
    with self._set_environment(rowid, is_interactive_job):
        try:
            result = self.run_tool(config_dict=config_dict)
        except NotImplementedError:
            result = deprecated_method("PluginAbstract", "run_tool")(self.runTool)(config_dict=config_dict)  # type: ignore
    return result
```

but the …
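As a rough illustration of measuring runtime and memory around such a tool call, here is a minimal, hypothetical sketch using only the standard library's `resource` module (in freva one would presumably swap in psutil, as discussed above). `run_with_stats` and the stand-in tool are invented names, not existing freva code:

```python
import resource
import time
from typing import Any, Callable, Dict, Tuple


def run_with_stats(tool: Callable[[], Any]) -> Tuple[Any, Dict[str, float]]:
    """Run *tool* and return its result plus a small stats record.

    Collects wall-clock runtime and the peak resident set size of this
    process (ru_maxrss, reported in kilobytes on Linux). A real
    implementation could add psutil-based CPU/IO sampling here.
    """
    start = time.perf_counter()
    result = tool()
    runtime = time.perf_counter() - start
    usage = resource.getrusage(resource.RUSAGE_SELF)
    stats = {
        "runtime_seconds": runtime,
        "peak_rss_kb": float(usage.ru_maxrss),
        "user_cpu_seconds": usage.ru_utime,
    }
    return result, stats


# Stand-in for a plugin's run_tool call:
result, stats = run_with_stats(lambda: sum(range(1000)))
```

Such a wrapper could sit around the `run_tool` call inside `_run_tool` without changing the plugin API; the stats dict would then be shipped to whatever statistics backend we settle on.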
We've had the idea of gathering Solr search stats for a long time, and once in a while resource utilisation stats came up as well. I just wanted to start a discussion on how such statistics could be gathered and evaluated.
Since the new databrowser-api already implements saving search queries into a MongoDB, I would suggest we use a similar approach for all other statistics. Yet there are a couple of questions:
Should the statistics-gathering tool be a service of its own, separated from the databrowser-api, or shall we just leave it as part of the databrowser-api? The latter would be simpler but less clear. I'd prefer a clear separation.
How should we save statistics? In the current approach the client would have to make a MongoDB connection to store the data. That means that a) any client software needs MongoDB software as a dependency, and b) passwords, usernames etc. would need to be "communicated" to the client. We could bypass b) by making the MongoDB world-writable but not readable (I don't know whether this is possible).
My answer to those questions would be setting up a dedicated statistics service with a simple dedicated REST API. Clients would only have to make requests to store data (without authentication). This would also allow for something like jsonSchema validation. Similarly, the statistics could only be retrieved after an admin username and password are provided. This would make sure that only privileged people have read access to the statistics.
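To illustrate the validation idea, here is a minimal stand-in for jsonSchema validation on incoming stats records (the field names are purely hypothetical; a real service would likely use the jsonschema package or pydantic):

```python
from typing import Any, Dict, List

# Hypothetical schema for a plugin-stats record; the field names are
# illustrative only and not taken from any existing freva code.
STATS_SCHEMA: Dict[str, type] = {
    "plugin": str,
    "user": str,
    "runtime_seconds": float,
    "num_files": int,
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors (an empty list means valid).

    Only checks required keys and their types; a production service
    would enforce a full JSON Schema instead.
    """
    errors: List[str] = []
    for key, expected in STATS_SCHEMA.items():
        if key not in record:
            errors.append(f"missing field: {key}")
        elif not isinstance(record[key], expected):
            errors.append(f"wrong type for {key}: {type(record[key]).__name__}")
    return errors
```

The point is that the service, not the client, owns the schema: clients just POST JSON, and malformed records are rejected at the API boundary.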
Another question is what data we want to store. Could I kindly ask you to gather ideas for things to collect, aside from databrowser query statistics? My suggestion would be:
Anything else that is missing?
The search queries are taken care of by the databrowser-api, hence straightforward, since already implemented.
The plugin stats are a little more complicated. I think we would have to implement a daemon that uses psutil to gather those stats and adds them to the statistics service. We would need some sort of thread (ideally async) with a start/stop pattern that frequently gathers data. Async because we want the thread not only to gather the data but also to add it to the DB as it goes; that way, if jobs get killed, we will at least have some data in the DB. Ideally this daemon would run in a subprocess, but then communication back and forth with the parent process gets tricky, so a thread might be the only option left? Unless we make the whole plugin manager async 😝.
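The start/stop thread pattern could be sketched roughly like this; `StatsDaemon`, the sampler and the sink are invented names for illustration — in freva the sampler would presumably use psutil and the sink would POST each record to the statistics service:

```python
import threading
import time
from typing import Any, Callable, Dict, List


class StatsDaemon:
    """Periodically call *sample* and hand each record to *sink*.

    Flushing every sample immediately (rather than batching) means the
    backend keeps partial data even if the monitored job is killed.
    Both callables are placeholders for psutil sampling and a REST POST.
    """

    def __init__(
        self,
        sample: Callable[[], Dict[str, Any]],
        sink: Callable[[Dict[str, Any]], None],
        interval: float = 1.0,
    ) -> None:
        self._sample = sample
        self._sink = sink
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        while not self._stop.is_set():
            self._sink(self._sample())
            self._stop.wait(self._interval)  # interruptible sleep

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


# Demo with a trivial sampler and an in-memory sink:
records: List[Dict[str, Any]] = []
daemon = StatsDaemon(lambda: {"ts": time.time()}, records.append, interval=0.05)
daemon.start()
time.sleep(0.2)
daemon.stop()
```

Using `threading.Event.wait` instead of `time.sleep` makes `stop()` return promptly rather than waiting out a full interval, which matters if the sampling interval is long.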
@eelucio @ckadow @eplesiat Input on the statistics would be welcome. Also whether there would be a need for any GPU stats, and if so, what type.
@Karinon any thoughts on the design?