Problems
This project was initially aimed at non-intrusive detection of data usage behaviour through eBPF technology, and I'm glad we have implemented an initial framework for it. However, to make the probe results available to other applications (e.g. the Data Usage Controller of the DataUcon project), we need to expose our recorded results in a machine-readable format.
On the other hand, we still need to finish standardising the storage side of things, and for large numbers of events a traditional SQL database is not a good choice.
We also do not yet have a good production example that demonstrates our capabilities.
Status quo and Future
Relationship with storage back-end
OpenTelemetry has been adopted by related projects as an open-source standard for observability. Although our project is far from observability tooling in terms of observables, goals, and functions, it is similar to OpenTelemetry-related projects in its technical implementation, so we should be able to benefit from the development of OpenTelemetry and its related backends.
As the project has evolved, we have completed the integration with OpenTelemetry: #82. Next, we will make OpenTelemetry our primary supported backend, and SQL databases MAY NOT be actively maintained.
We are currently using Jaeger as the first backend to integrate with.
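To make "machine-readable" concrete, here is a minimal sketch of emitting one probe record as an OpenTelemetry span over OTLP, assuming the standard opentelemetry-sdk Python packages and a Jaeger collector with OTLP enabled on localhost:4317. The report_event helper and its attribute names are illustrative, not duetector's actual API.

```python
# Minimal sketch: emit one probe record as an OpenTelemetry span via OTLP.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "duetector"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("duetector.sketch")


def report_event(probe_name: str, pid: int, fname: str) -> None:
    """Hypothetical helper: wrap one probe result in a span with machine-readable attributes."""
    with tracer.start_as_current_span(probe_name) as span:
        span.set_attribute("pid", pid)
        span.set_attribute("fname", fname)


report_event("openat", pid=1234, fname="/data/mnist/train-images")
```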
Cloud-native support
We will natively support monitoring of containers in the cloud, starting with Docker and Kubernetes.
How to expose data
We will first build a querier for the Jaeger backend to restore the tracer data from it, and then implement an analysis engine that turns the tracer data into a picture of how a process is using data. We refer to this process as the measurement of data usage. A querier sketch follows the task list below.
Designing and implementing an analysis engine
Docs: 4W1H of Data Usage and Our Measurement Capabilities
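As promised above, here is a minimal sketch of what the querier could do: pull spans back out of Jaeger and group them per process. It assumes Jaeger's HTTP query API on its default port 16686 (an internal, unversioned API) and reuses the illustrative "pid"/"fname" attribute names from the exporter sketch; none of these names are fixed by duetector.

```python
# Minimal querier sketch: fetch spans from Jaeger and group touched files per pid.
from collections import defaultdict

import requests


def fetch_spans(service: str = "duetector", limit: int = 100) -> list[dict]:
    """Fetch recent traces for one service and flatten them into a span list."""
    resp = requests.get(
        "http://localhost:16686/api/traces",
        params={"service": service, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return [span for trace in resp.json()["data"] for span in trace["spans"]]


def usage_by_process(spans: list[dict]) -> dict[int, set[str]]:
    """Toy 'measurement': which files did each pid touch, according to the probes?"""
    usage: dict[int, set[str]] = defaultdict(set)
    for span in spans:
        tags = {t["key"]: t["value"] for t in span.get("tags", [])}
        if "pid" in tags and "fname" in tags:
            usage[int(tags["pid"])].add(str(tags["fname"]))
    return usage


if __name__ == "__main__":
    for pid, files in usage_by_process(fetch_spans()).items():
        print(pid, sorted(files))
```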
Production example
We previously accepted a machine learning case for MNIST that included an analysis of data usage behaviours and the associated probing points: #84. I think we can start with this case to demonstrate our data usage measurement capabilities.
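For readers unfamiliar with the probing side, the sketch below shows the kind of raw signal such a case builds on: a bcc-based eBPF probe that reports every openat() with the calling pid, command and filename. It assumes bcc, root privileges and a reasonably recent kernel, and it is an illustration rather than duetector's actual probe code.

```python
# Illustrative eBPF probe (bcc): report pid, comm and filename for every openat().
from bcc import BPF

PROGRAM = r"""
#include <linux/sched.h>

struct event_t {
    u32  pid;
    char comm[TASK_COMM_LEN];
    char fname[256];
};
BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct event_t event = {};
    event.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    // Needs kernel >= 5.5; on older kernels use bpf_probe_read_str instead.
    bpf_probe_read_user_str(&event.fname, sizeof(event.fname), args->filename);
    events.perf_submit(args, &event, sizeof(event));
    return 0;
}
"""

b = BPF(text=PROGRAM)


def handle(cpu, data, size):
    event = b["events"].event(data)
    print(event.pid, event.comm.decode(), event.fname.decode(errors="replace"))


b["events"].open_perf_buffer(handle)
print("Tracing openat()... Ctrl-C to stop")
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```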
Other maintenance
Rather than splitting the project into a querier and a detector (at least not in the near future), we will build two different container images based on the same Python package (duetector). We already have distinct CLI entry points, so I'm sure this won't be difficult.
In addition, we need to polish the README and the design document a little, assuming OpenTelemetry as the backend.
Splitting container images
Docs: Switch to OTel backend
Roadmap
This EPIC will be released as version 1.0.0; before that, the features described above will be integrated gradually as 0.x.y releases.
Regarding data usage measurability, I am working on some related blog posts (in Chinese).
Due to personal reasons I (aka @wunder957) will be leaving the project for a while, and there is no one actively maintaining the project at the moment. If you are interested in getting involved, feel free to contact me or any member of hitsz-ids.