[Feature Request] - Duplicate functionality to support a primary folder location, rather than be based on age #418

Open
nodecentral opened this issue Oct 6, 2024 · 5 comments

Comments

@nodecentral

Is your feature request related to a problem? Please describe.
Duplicates are a constant problem for me when people upload files. I’ve tried to implement a process where there is always one primary folder that holds the copy we actually use (chosen by storage location, not by age).

Describe the solution you'd like
The ability, when using the duplicate filter, to specify a folder (including the hierarchy below it) as the primary master storage location.

Describe alternatives you've considered
I looked at standalone duplicate tools like fdupes and jdupes, but they do not seem to have a way to designate a location as the master (and therefore protect its files from being deleted as duplicates).

Additional context
Nothing more to add, I don’t think. It would be a great feature if it is possible.

@tfeldmann
Owner

If I understand this correctly, you may want to look into the detect_original_by: "last_seen" option of the duplicate filter. last_seen treats the file that was found later as the original. Example:

rules:
  - locations:
      - ~/Downloads
      - ~/Desktop
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: last_seen
    actions:
      - echo: "Found dup: {duplicate}"

Now if you have a duplicate in your Downloads and your Desktop, the Desktop file is assumed to be the original.

@nodecentral
Author

nodecentral commented Nov 27, 2024

Hi @tfeldmann - thanks for responding,

Unfortunately, that approach only works in specific scenarios, and it makes the key assumption that whichever file came first is the one you want to keep.

What I was hoping for was something location-based, not time-based. That way people can move a file into a designated location to have it preserved, and every other instance of that file gets actioned (e.g. identified, moved, deleted, etc.).

@tfeldmann
Owner

"Which came first" is not time-based; it is based on the order of the locations you provide. Organize walks through the locations in the given order, so files in the first location are always first_seen, so to speak.
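
To illustrate why the order of the locations decides which copy counts as the original, here is a minimal standalone sketch. It is not organize's actual implementation: the function names `file_digest` and `find_duplicates` and the choice of SHA-256 are assumptions made for this example only.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(locations: Iterable[Path]) -> Iterator[Tuple[Path, Path]]:
    """Yield (original, duplicate) pairs.

    Locations are walked in the order given, so a file in an earlier
    location is always seen first and therefore treated as the original.
    File timestamps play no role here.
    """
    seen = {}  # maps content digest -> first path seen with that content
    for location in locations:
        for path in sorted(location.rglob("*")):
            if not path.is_file():
                continue
            digest = file_digest(path)
            if digest in seen:
                yield seen[digest], path
            else:
                seen[digest] = path
```

Calling `find_duplicates([Path.home() / "Desktop", Path.home() / "Downloads"])` would report the Desktop copy as the original of any identical pair; swapping the two locations swaps the roles.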

@tfeldmann
Owner

Am I missing something here? Can you post an imaginary config to show how this should be specified?

@YoSiJo

YoSiJo commented Dec 1, 2024

I think the following is meant:

---
rules:
  - locations:
      - ~/Desktop
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: first_seen
    actions:
      - echo: "Found dup: {duplicate}"

Unfortunately, this doesn't quite solve the problem.
If, for example, duplicates already exist within ~/Desktop itself, they would be handled too, although the aim of the rule is probably to handle only the files in ~/Downloads.
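
The behaviour being asked for can be sketched outside of organize as follows. This is only an illustration under stated assumptions: `duplicates_outside_primary` and `content_hash` are hypothetical names, files are compared by SHA-256 of their contents, and nothing here reflects how organize's duplicate filter works internally.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def content_hash(path: Path) -> str:
    """Hash a file's contents (reads the whole file; fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def duplicates_outside_primary(primary: Path, others: Iterable[Path]) -> List[Path]:
    """Return files in `others` whose content also exists under `primary`.

    Files inside the primary folder are never reported, even if the
    primary folder contains several identical copies of the same file.
    """
    primary = primary.resolve()
    # Index every piece of content stored anywhere below the primary folder.
    protected = {content_hash(p) for p in primary.rglob("*") if p.is_file()}

    duplicates = []
    for location in others:
        for path in sorted(location.rglob("*")):
            if not path.is_file():
                continue
            # Skip anything that actually lives inside the primary folder.
            if primary in path.resolve().parents:
                continue
            if content_hash(path) in protected:
                duplicates.append(path)
    return duplicates
```

With `primary` set to ~/Desktop and `others` set to [~/Downloads], only the Downloads copies would be returned, so an action such as delete could never touch the primary folder.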

I had hoped to solve this simply with the regex filter, but it only receives the file name and not the path.

---
rules:
  - name: Remove all files that we have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete

So it would be nice if filters like name or regex could be given not only the file name example.foo, but also the path ./example.foo.
Alternatively, separate filters like name_path and regex_path would of course also be an option, but I think the first variant would be nicer.

---
rules:
  - name: Remove all files that we have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filter_string: path
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete

For example, a filter_string option set to name or path could determine whether filters such as name or regex are passed the string example.foo or ./example.foo.
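
The difference this would make for the regex above can be shown with a small standalone sketch. The `filter_string` parameter mirrors the option proposed in this comment, which does not exist in organize; the helper names `filter_subject` and `regex_matches` are likewise made up for illustration.

```python
import posixpath
import re


def filter_subject(path: str, filter_string: str = "name") -> str:
    """Pick the string a filter would be matched against.

    "name" gives only the file name; "path" gives the path as written.
    """
    if filter_string == "name":
        return posixpath.basename(path)
    if filter_string == "path":
        return path
    raise ValueError(f"unknown filter_string: {filter_string!r}")


def regex_matches(pattern: str, path: str, filter_string: str = "name") -> bool:
    """Return True if the pattern matches the selected subject string."""
    return re.search(pattern, filter_subject(path, filter_string)) is not None


pattern = r"^\./.*$"
print(regex_matches(pattern, "./example.foo", "name"))           # False
print(regex_matches(pattern, "./example.foo", "path"))           # True
print(regex_matches(pattern, "~/.archive/example.foo", "path"))  # False
```

Matched against the name alone, the pattern can never succeed because example.foo does not start with ./ and so the rule would delete nothing. Matched against the path, it selects only the files under ./ and leaves the archive untouched.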
