-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Telemetry] Add basic telemetry to Jupyter server and begin emitting server events #364
Conversation
Bump telemetry extension commit as well
Experiments here informed the schema naming recommendations in jupyter/telemetry#11
python 3.5 is still supported
This lets us add detailed documentation & description to our schemas, which is very hard to do in JSON. We also add a lot of documentation to the one JSON schema we have
Primary advantage over JSON is that we can do multi-line strings for more detailed documentation. We also expect humans to read & write these, so YAML is a much better format there. All JSON is also valid YAML, so that helps. Depends on jupyter/telemetry#13
We made a v0.0.1 release!
bd8aa7d
to
7302396
Compare
Hi all, I was just curious if this is planned for a specific release / date? I noticed a few issues have been replaced by this one. We are investigating adding telemetry event logging for certain actions, and it looks like this work covers use cases we would want to cover. We're still planning how to approach our work, but we use the zero-to-jupyterhub-k8s for a deployment on AWS. We planned to do something to the effect of configure the event handler (like shown here: https://jupyter-telemetry.readthedocs.io/en/latest/pages/configure.html )
to use a CloudWatch or S3 EventLogger (as we're in AWS). e.g. https://pypi.org/project/cloudwatch/ (cloudwatch log handler). Then we would import https://github.com/jupyterlab/jupyterlab-telemetry , and start injecting our events via our own extension (by subscribing to "signals", or customizing code as necessary). Is that approach aligned with the work you're doing here? I assume we'd just start getting the additional events you're adding, once this code launches. |
Hi there's currently no specific release date for this but this is something that we've been actively working on and trying to get done as soon as we can. Could you elaborate more about your use cases? What events are you looking to record? These could potentially help us make better informed design decisions. Thanks! |
Hello! I work with @robertpyke and can speak to the details of this project. We are in the early stages and still nailing down specific requirements. But I can give a bit of context. We have a large user base on JupyterHub and are trying to get a better idea of usage patterns to inform our development. For instance, we'd like to know when people are using R kernel vs Python. When we have a better list I'll let you guys know. |
Our plan is to have telemetry events for Jupyterhub and Jupyter server REST APIs. Events for Jupyter server would be tracked under this PR while those for Jupyterhub would be under jupyterhub/jupyterhub#3218.
Starting a kernel would be one of those events being recorded, along with information on the user and kernel. That should work for your example use case?
Great! Currently we are working on the telemetry backend itself. Once that is settled we will come back to adding telemetry events under these two aforementioned PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, but then I wrote some good chunk of it :D Minor comments, but I'd also love to get this merged.
jupyter_server/serverapp.py
Outdated
self.init_server_extensions() | ||
self.init_configurables() | ||
self.init_components() | ||
self.init_eventlog() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being called twice.
docs/source/eventlog.rst
Outdated
@@ -0,0 +1,61 @@ | |||
Eventlogging and Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a security note about not relying on this for security or auditing purposes, unless the user can categorically not mess with the python environment this is running in? Otherwise you can just change the code to emit whatever you want. Likewise, user code can just hit the eventlogging endpoint to add any spurious events it wants.
Almost everyone I talk to wants to use this for security, and this caveat is very important.
|
||
# Profile, may need to move to a background thread if this is problematic | ||
try: | ||
self.eventlog.record_event(schema_name, version, event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires the schema to be registered serverside before clients can post to it. How do we do that? Perhaps a traitlet, with corresponding entrypoint based loading as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let me create a separate PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinged you on gitter about this but Imma just copy paste here.
This means something like a jupyter telemetry [register|remove|list] <schema>
command?
register would put the schema file in a event-schemas directory. The directory structure would be something like event-schemas/<name>/version/<schema>.[json|yaml]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great! I'm thinking that most things that want to submit events will be classic notebook extensions or jupyterlab extensions. These will be python packages, so perhaps we can begin by just allowing entrypoints? You call it, and it returns a bunch of schemas to register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes more sense. I was thinking of another type of entrypoint. I'll make a separate PR on this one.
6. delete | ||
Delete a file or empty directory at given path | ||
path: | ||
category: personally-identifiable-information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to use URIs for category filtering, and there are a few well-known ones defined in https://jupyter-telemetry.readthedocs.io/en/latest/pages/schemas.html#property-categories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires changes in jupyter/telemetry master branch that haven't been made a released yet. Should we make a new release for telemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, making a new release there seems appropriate. Would you be able to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope I don't have the permission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this one and use the latest commit on jupyter/telemetry master for now, will change it once we have a new release.
Also should the unrestricted
category be URI or just a generic one? Seems like we're currently using a universal unrestricted
category
https://github.com/jupyter/telemetry/blob/d2c19ffa03fde7d1903759de4a265ef711b333ef/jupyter_telemetry/eventlog.py#L165-L176
While we're on the topic of URI may I have your thoughts on jupyter/telemetry#56? Thanks!
Thanks, @kiendang. I've added this to the Jupyter Server Team Meeting agenda for this week. In the meantime, I'm going move this PR to "draft" state until we merge and release jupyter/telemetry#60. |
Move to a separate branch
instead of raw paths since raw paths do not work if e.g. install package from a zip file https://importlib-resources.readthedocs.io/en/latest/using.html#example
related jupyter/telemetry#61 |
Is this still being worked on? |
@Zsailer sorry for not working on this for quite a while. I have been busy for the last year. I do have some free time now and would like to resume working on this and getting this merged. The original todo list was
Does that plan still look valid? anything new/changed regarding telemetry? |
original pr #233
This is for emitting server events. For client events using the eventlog endpoint see #501.