Add deletion metrics #139
Hi @dannytbecker, deletions are tricky to do in NNA for a couple of reasons. Mainly it's because NNA always acts on and analyzes the real-time metadata available. When things are deleted, they disappear from the metadata file system inode tree, which makes it impossible for us to get metrics on deletions. Retaining deletion information would mean hanging on to deleted inodes for some time and capturing information about them. This may be possible, but I can't think of a great way to do it yet. Also, at the level we have access to, there may be no difference between deletes and renames, because both cause an inode to be removed from the tree. One thing that is possible is to maintain a trending graph of every user's file counts and sizes and use the difference between intervals as the amount/size deleted. That approach is subject to inaccuracy, however, but it is about the best we can do today. I'll continue to think about how to do this better.
Thanks for the response, @Zero45. I appreciate you taking the time to think about this issue and explain the difficulties with the problem.
An interesting idea, though a bit of a stretch: http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1 We could make use of the HdfsAdmin INotify stream feature introduced in Apache Hadoop 2.6.0. To get there, however, we would need to drop support for Apache Hadoop 2.4.x and 2.5.x, which is feasible. Once we are on 2.6.0+, we can use the stream to get a list of metadata updates, listen for "UnlinkEvent" messages (which describe deletes), and maintain a list of them. Collecting deletion data would be a start for now; combining it with the INode tree data, however, may be complicated or even impossible. Will think further.
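A minimal sketch of the bookkeeping side of that idea, assuming events arrive one at a time from some metadata feed. `DeleteLog`, `EventKind` and `MetadataEvent` are hypothetical stand-ins for Hadoop's `Event` / `Event.UnlinkEvent` types, not the real API; on a real cluster the events would come from `HdfsAdmin.getInotifyEventStream()`, whose exact return types differ between Hadoop versions. It also illustrates why an event feed helps with the rename concern: renames and deletes arrive as distinct event kinds instead of both looking like an inode vanishing from the tree.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: keeps a list of delete events from a metadata event feed.
 * EventKind and MetadataEvent are hypothetical stand-ins for Hadoop's
 * Event / Event.UnlinkEvent classes, not the real HDFS API.
 */
public class DeleteLog {
    enum EventKind { CREATE, RENAME, UNLINK }

    /** Hypothetical minimal event: what happened, to which path, and when. */
    record MetadataEvent(EventKind kind, String path, long timestamp) {}

    private final List<MetadataEvent> deletes = new ArrayList<>();

    /** Records unlink (delete) events; renames and creates are ignored. */
    void onEvent(MetadataEvent event) {
        if (event.kind() == EventKind.UNLINK) {
            deletes.add(event);
        }
    }

    /** Returns an immutable copy of the deletes seen so far. */
    List<MetadataEvent> deletes() {
        return List.copyOf(deletes);
    }

    public static void main(String[] args) {
        DeleteLog log = new DeleteLog();
        log.onEvent(new MetadataEvent(EventKind.CREATE, "/user/a/f1", 1L));
        log.onEvent(new MetadataEvent(EventKind.RENAME, "/user/a/f1", 2L));
        log.onEvent(new MetadataEvent(EventKind.UNLINK, "/user/a/f2", 3L));
        System.out.println(log.deletes().size()); // prints 1
    }
}
```

Note that a delete event carries a path and not the deleted file's size or owner, which is part of why joining it back to the INode tree data is the hard part.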
So far I have had great success just relying on trending data rather than trying to implement deletion monitoring in NNA. I can certainly understand a use case for answering "who deleted file X?", though; that's probably a question better answered by the audit log.
I've still been contemplating this one a bit. One possibility we haven't discussed is to calculate the difference between any two SuggestionEngine reports. For example, if a report completes at time X and another completes at time Y, then while both reports are available we can compute quick differences between them and store any negative differences as the amount(s) deleted. This is likely the best we can do. Would this still be desirable?
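A minimal sketch of that report diff, assuming each report can be reduced to per-user file counts and diskspace totals. `DeletionEstimator`, `Usage` and `estimateDeleted` are hypothetical names for illustration, not part of NNA's SuggestionEngine.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: estimates per-user deletions as the negative change between
 * two completed reports. Names are hypothetical, not NNA's actual API.
 */
public class DeletionEstimator {

    /** Per-user totals taken from one completed report. */
    record Usage(long fileCount, long diskspaceBytes) {}

    /**
     * For each user in the earlier report, returns how much disappeared by the
     * later report. Growth is clamped to zero, and a user missing from the
     * later report is treated as having removed everything.
     */
    static Map<String, Usage> estimateDeleted(Map<String, Usage> earlier, Map<String, Usage> later) {
        Map<String, Usage> deleted = new HashMap<>();
        for (Map.Entry<String, Usage> entry : earlier.entrySet()) {
            Usage before = entry.getValue();
            Usage after = later.getOrDefault(entry.getKey(), new Usage(0, 0));
            long files = Math.max(0, before.fileCount() - after.fileCount());
            long bytes = Math.max(0, before.diskspaceBytes() - after.diskspaceBytes());
            if (files > 0 || bytes > 0) {
                deleted.put(entry.getKey(), new Usage(files, bytes));
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, Usage> reportX = Map.of("alice", new Usage(100, 5_000), "bob", new Usage(10, 800));
        Map<String, Usage> reportY = Map.of("alice", new Usage(90, 5_500), "bob", new Usage(12, 900));
        // alice lost 10 files while her bytes grew; bob only grew, so he is omitted.
        System.out.println(estimateDeleted(reportX, reportY)); // {alice=Usage[fileCount=10, diskspaceBytes=0]}
    }
}
```

Because this works on net totals, it undercounts whenever new writes in the same interval offset deletes, and it can count files that merely changed owner as deleted, which matches the inaccuracy mentioned earlier in the thread.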
It would be very useful to have deletion metrics available in the API. Some useful metrics: