Add deletion metrics #139

Open
dannytbecker opened this issue Oct 3, 2018 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@dannytbecker

It would be very useful to see metrics around deletions available in the API. Some useful metrics:

  • Count of files deleted grouped by owner
  • Count of files deleted grouped by file size
  • Total Size of Files deleted grouped by user
  • Total number of files deleted in the past 24 hours
  • Total number of files deleted by user in the past 24 hours
  • etc.
@pjeli
Collaborator

pjeli commented Oct 8, 2018

Hi @dannytbecker -- so deletions are tricky to do in NNA for a couple of reasons.

Mainly it's because NNA always acts on / analyzes the real-time metadata available. When things are deleted, they disappear from the file system's inode tree, which makes it impossible for us to get metrics on deletions.

Trying to retain deletion information would mean hanging on to deleted inodes for some time and capturing information on them. This may be possible but I can't think of a great way to do it yet.

Also -- at the level we have access to there may not be a difference between deletes and renames because both cause the deletion of an inode from the tree.

One thing that is possible is to maintain a trending graph of all users' file counts and sizes and use the difference between intervals as the amount / size deleted. That is subject to inaccuracy, but it is about the best we can do today.
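A minimal sketch of that interval-difference idea, in Python purely for illustration (NNA itself is not written in Python). The `Usage` type, the `estimate_deletions` function, and the sample numbers are all hypothetical and not part of NNA. It also shows where the inaccuracy comes from: files created during the same interval offset the deletions, so the result is only a lower bound.

```python
from typing import Dict, NamedTuple


class Usage(NamedTuple):
    """Hypothetical per-user snapshot: number of files and total bytes."""
    files: int
    size_bytes: int


def estimate_deletions(prev: Dict[str, Usage], curr: Dict[str, Usage]) -> Dict[str, Usage]:
    """Return per-user decreases between two snapshots.

    Only net decreases are reported. A user who deleted 10 files and
    created 10 new ones in the same interval shows up as no change.
    """
    deleted: Dict[str, Usage] = {}
    for user, before in prev.items():
        # A user missing from the newer snapshot is treated as having nothing left.
        after = curr.get(user, Usage(0, 0))
        files_gone = max(before.files - after.files, 0)
        bytes_gone = max(before.size_bytes - after.size_bytes, 0)
        if files_gone or bytes_gone:
            deleted[user] = Usage(files_gone, bytes_gone)
    return deleted


# Made-up sample data for two consecutive intervals.
prev = {"alice": Usage(100, 5_000), "bob": Usage(40, 900)}
curr = {"alice": Usage(90, 4_200), "bob": Usage(45, 1_000)}
result = estimate_deletions(prev, curr)
print(result)
```

Here only `alice` appears in the result, because `bob` grew over the interval.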

I'll continue to think on how to do this better.

@pjeli pjeli added the enhancement New feature or request label Oct 8, 2018
@dannytbecker
Author

Thanks for the response, @Zero45 . I appreciate you taking the time to think about this issue and explain the difficulties with the problem.

@pjeli
Collaborator

pjeli commented Oct 23, 2018

An interesting idea; bit of a stretch though: http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1

We could make use of the HdfsAdmin inotify event stream introduced in Apache Hadoop 2.6.0. To get to this point, however, we would need to drop support for the Apache Hadoop 2.4.x and 2.5.x versions, which is feasible.

Once we are on 2.6.0+ versions, we can use the stream to get a list of metadata updates, listen for "UnlinkEvent" messages, which describe deletes, and maintain a list of them.

Beginning to collect deletion data would be a start for now; combining that with the INode tree data may be complicated, or even impossible.

Will think further.

@pjeli
Collaborator

pjeli commented Feb 28, 2019

So far I have had great success just relying on trending data vs trying to implement deletion monitoring in NNA. I can certainly understand a use-case for answering "who deleted file X?" though... but that's probably something better for the audit log to answer.

@pjeli
Collaborator

pjeli commented Jun 3, 2019

I've been contemplating this one a bit still. One undiscussed possibility is to just calculate the difference between any two SuggestionEngine reports.

For example, if at time X a report completes, and then at time Y another report completes, we can use the time during which we have both reports to calculate quick differences and store them as the amount(s) deleted (if there are negatives).

This is likely the best we can do. Would this still be desirable?

@pjeli
Collaborator

pjeli commented Apr 6, 2022

I was thinking about this a bit more over my long break. There is a rather nasty option we can take here. We could replace the FSNamesystem with a wrapper (or something) that overrides the method removeLeasesAndINodes and have that maintain some form of sliding window of big deletes. However, it does not exactly solve the problem of tracking down "who caused the delete".
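A rough sketch of what such a sliding window could look like, in Python purely for illustration. The `BigDeleteWindow` class, its parameters, and the sample paths are hypothetical. The hook into removeLeasesAndINodes is not modeled, the caller is assumed to pass non-decreasing timestamps, and, as noted above, nothing here records who issued the delete.

```python
from collections import deque
from typing import Deque, List, Tuple

# One recorded delete: (timestamp in seconds, path, number of inodes removed).
DeleteEvent = Tuple[float, str, int]


class BigDeleteWindow:
    """Keeps only large deletes that happened within the last window_seconds."""

    def __init__(self, window_seconds: float, min_inodes: int) -> None:
        self.window_seconds = window_seconds
        self.min_inodes = min_inodes
        self._events: Deque[DeleteEvent] = deque()

    def record(self, timestamp: float, path: str, inodes_removed: int) -> None:
        """Record a delete; small deletes are dropped, old ones are evicted."""
        self._evict(timestamp)
        if inodes_removed >= self.min_inodes:
            self._events.append((timestamp, path, inodes_removed))

    def snapshot(self, now: float) -> List[DeleteEvent]:
        """Return the big deletes still inside the window as of `now`."""
        self._evict(now)
        return list(self._events)

    def _evict(self, now: float) -> None:
        # Events are appended in time order, so the oldest is always on the left.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()


window = BigDeleteWindow(window_seconds=3600, min_inodes=1000)
window.record(0, "/tmp/a", 5000)       # big delete, kept
window.record(10, "/tmp/b", 10)        # below the threshold, ignored
window.record(4000, "/data/c", 2000)   # kept; "/tmp/a" is now older than the window
print(window.snapshot(now=4000))
```

A deque keeps eviction cheap because expired entries are always at the left end.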

I think an even better solution would be to solve this within the NameNode itself, in Apache Hadoop. There is a code block within the private unprotectedDelete call that already has the count information, meaning we can know exactly how many files/dirs and how many bytes were deleted:
[image: screenshot of the referenced code block]

This may be a task better situated for when NNA eventually makes it to Apache. 😅
