All notable changes to this project will be documented in this file.
The format follows keepachangelog.com. Please stick to it.
- preserve coredump and exit status for fatal signals
- --equal no longer writes rmlint.sh and rmlint.json by default
- issue #458: --is-reflink no longer accepts the '-o' option or writes rmlint.sh/rmlint.json
- issue #434: downgraded 'Added big file "<filename ...>' from warning to info
- PR #443: scons: make fiemap feature depend on --with-fiemap, not --with-gettext
- PR #470: scons: read environment variables before conftests
- PR #474: scons: when printing parameters such as DEBUG, interpret '0' as 'no'
- PR #485, issue #484: fix sorting of regex options (-S x and -S r) when multiple paths match
- issue #496: fix 'free(): invalid pointer' crash with -c csv:unique
- issue #438: fix 'Can't open directory or file "...": Invalid argument' on some platforms
- issue #621: fix GUI freeze with glib 2.75.3 and above
- issue #608: fix setuptools InvalidVersion error when installing GUI with packaging 22.0 and above
- issue #613: actually remove the GUI's Polkit requirement ('Namespace Polkit not available')
- issue #522: fix --size overflow detection on some platforms
- issue #549: fix 'assertion failed: (node->inode != RM_NO_INODE)' on 32-bit platforms
- issue #555: fix deadlock when -T nonstripped encountered an executable FIFO
- issues #527, #528, #529, #530, #611: fix --is-reflink false-positives and false-negatives
- fix --merge-directories false-negatives with --algorithm sha1
- issue #577: fix GUI error 'Unknown option --match-with-extension'
- issue #552: prevent generated script's -p option from overwriting itself with --merge-directories
- issue #475: fix 'assertion failed: (self->num_pending == 0)' with --merge-directories
- issue #562: fix 'double free or corruption (out)' with --merge-directories --write-unfinished
- issue #545: fix skip_hardlink false-negatives with --merge-directories
- do not write checksums to xattrs when --clamp-low/--clamp-top are in use
- issue #431: suppress 'failed to getxattr for ...: Attribute not found' on macOS
- fix progress text not displaying final results before continuing to next stage
- fix regular files and symlinks potentially matching with --see-symlinks
- fix assertion failure with --merge-directories and zero paths from stdin
- issue #519: suppress 'Permission denied' errors from xfs_info
- fix --hash for large files on 32-bit platforms
- issue #628: fix 'SyntaxError: source code cannot contain null bytes' with --gui and Python 3.12
- implement -i option for rmlint.sh to ask before removal of each file (thanks JLuszawski)
- Fix version display in Shredder UI (thanks JLuszawski)
- Broken JSON output in case of paths with double quotes (thanks EvilMcJerkface)
- --replay works now with -D: .json files created without -D can now be packed into duplicate directories during replay. Vice versa, duplicate directories can be unpacked if no -D switch was given. This feature is still considered experimental and may have a few rough edges. The output warns you when the feature is in use and offers instructions on how to disable it.
- Related to the --replay feature: The .json output now contains part_of_directory entries if generated with -D. Those are basically the duplicates that the duplicate directories are composed of.
- The sh output now visually denotes groups of duplicates with a newline.
Nothing.
Nothing.
- The mtime comparison was broken when using --xattr.
- Timestamps for xattr were locale-sensitive. This has been fixed.
- Intermediate checksums were written to extended attributes.
- Several Python related deprecation warnings.
- ui: Changing the size factor did not let you apply the settings.
- --size can now correctly handle very large sizes.
- "Evil" (think procfs) filesystem detection triggered wrongly sometimes.
- An up-to-date COPR package for Fedora (thanks eclipseo).
- Add --xattr as --xattr-write, --xattr-read & --write-unfinished to enable easy checksum caching for the next run.
- Unique option for json and csv formatters (-c json:unique)
- New -k option for rmlint.sh to keep directory timestamps like before deletion.
- Warn when using -j without -D.
- The exit status is now EXIT_FAILURE when rmlint was interrupted.
- Slightly changed progressbar ETA algorithm to deliver more stable results.
- docs: added a few scripting examples to underline the possibility to use 3rd-party tools.
- The --dedupe option now checks if the files are already reflinks to reduce disk thrashing.
- Made sure that -T dd is the same as specifying -D.
Nothing was deprecated.
Nothing was removed.
- Dry-run was not honoured when a user-defined command was used.
- Fix crash when compiled with -msse4.2
- Relative symbolic links were compared wrong when using -f (See issue #333)
- gui: The size options were not correctly passed to rmlint (only Megabytes)
- xattr: timestamps were often compared wrongly.
- Fiemap (and thus reflink detection) code was improved.
- --replay did not replay duplicate directories correctly.
- --replay did not honour --hidden and --size correctly.
- Various build issues.
- gui: fix "render from selected" feature.
- gui: generated script now correctly removes itself.
- gui: fix display of paths with ampersands in them.
Mostly a bugfix release with smaller functional changes.
- Support for reflinks on XFS (48c2e5b1)
- Option "--no-backup" to overwrite output files as before (see "Changed")
- New build system option called "gettext" that eases the maintenance of translations. (thanks Michael Witten)
- gui: Add config option to choose how to handle the found duplicates
- -pp is now silently permitted, but is the same as -p
- Old output files now get backed up to a new timestamped name. This was done so that the results of long runs are not overwritten. You can use --no-backup to disable this behaviour.
- Several internal cleanups and potential bug fixes (thanks to Michael Witten)
- Change the default optimization level for a build to -O2.
Nothing was deprecated.
Nothing was removed.
- Several small fixes in the documentation.
- Compiler warnings that may happen with clang and on 32 bit.
- "-g" now does not overwrite other formatters anymore.
- Make the progressbar less flickering.
- Startup error when the permissions to a location were bad.
- Endless loop that happens on btrfs due to the fiemap optimization.
- Fix a bug when doing "rmlint --replay x.json" without an explicit path.
- Fix -f that did not really follow symbolic links.
- gui: locations are now stored persistently and survive restarts.
- scons should work now with both python2 and python3.
- extensive memory allocation with slow CPUs.
- Do not use --remove-destination of cp, but use "rm + ln" to support non-GNU systems.
- New checksum types metro and highway
- New option --keep-hardlinked
- --dedupe option can deduplicate twins on any reflink-capable filesystem
- --dedupe-readonly option can dedupe files on read-only btrfs snapshots
- Checksum types for -P... options (see #261)
- Option --btrfs-clone (use --dedupe)
- Paranoia option -pp (use -p)
- Checksum types bastard, spooky, city & farmhash
- Multihash output option
- Fix scons 3 compatibility issue (#258)
- Fix compile error on systems with no FIEMAP (#252)
- Fix handling of bad uids/gids in python output formatter (#239)
- Fix escaping of dirnames in rmlint.sh test for new emptydirs (#241)
- Fix another quoting issue (#272)
- Fix build-error on macOS due to wrong usage of #ifdef.
- Fix a bug in treemerge that sometimes counted hardlinks more than once.
- Fix copyright year notice all over the source code.
- Fix --equal to allow hidden files, files counting as other lint and empty files.
- Fix --equal to allow path doubles so passing symlinks to the same file still yields a 0.
  Due to those fixes the paranoia mode of rmlint.sh should run smoother (again) now.
- Use a more sophisticated strategy to find the path to the own binary and fall back to $(which rmlint).
- Work around the bash bug that did not allow huge functions by bringing back the old approach without a function.
- It is now possible to treat empty files as duplicates when specifying "--size 0".
- Use the cmp builtin to compare regular files in the shell script again, since it's faster.
- Abort early in --equal if a mismatch has been encountered.
- Test the shell script more thoroughly to avoid release troubles.
- Add a new -c switch to rmlint.sh that cleans up empty directories while running the script.
- Do not chmod +x files that exist already (previous versions tried to +x e.g. /dev/null)
- Support anonymous pipes in the sh formatter (Supported now: rmlint -o sh:>(cat))
- Documentation should look a bit cleaner now.
- Fixed a crash when a file was clamped to a size of 0.
- Fixed a rounding error bug that can happen while sorting mtimes.
- Fixed the slightly broken --mtime-window option (was too permissive in rare cases)
- Force permissions (0755 for binary, 0644 for manpage) when installing files in the filesystem.
- Fix wrong ordering when using -D without -k or -K.
- Fix a space wasting bug that happened with -D and -c sh:link.
- Fix a bug in treemerge that sometimes led to directories not being detected (should also be slightly faster now)
- Fix inconsistent handling of duplicate directories when using hardlinks, symlinks and reflinks.
- Abort early if an invalid path has been given via commandline or stdin.
- Do not silence errors while commandline parsing when using -g.
- Added new checksums: blake2{s,sp,b,bp} and sha3{-256,-384,-512}. sha3 is an alias to sha3-256.
- Support for GNU Hurd (well, at least it compiles :)
- New "-o stats" formatter that prints some statistics about the run.
- Progressbar now additionally shows an ETA time (how long rmlint will probably run)
- New option --honour-dir-layout (-j) that makes --merge-directories (-D) also check if the directory layout is the same.
- A new --equal option that works like the cmp utility, but supports directories.
- Add a progress percentage to rmlint.sh for big runs.
- Changed the default hash function to blake2b.
- The default formatter (pretty) now also outputs the time used for the run.
- Updated existing translations.
- Update translation template.
- Various fixes to make rmlint ready to be packaged in Debian.
- Output of rmlint.sh uses a colored prefix now.
- Make --replay truly merge different sets of duplicates.
- Call exit(1) when getting a fatal signal (somehow was missing)
- scons test now executes only the sane part of the testsuite.
- Be more friendly when no manpage was found (and show --help)
- Handle readonly btrfs subvolumes well. See also: #195
- Various build errors fixed for old/rare systems.
- Various fixes in the gui, mostly related to old GTK versions.
- New option --mtime-window: Only consider files as duplicates that share a mtime in a certain time-window. See also: #197
- New sortcriteria O (maximize outside hardlinks) and H (maximize total hardlinks). See also: #196
- Proper installation instructions for macOS.
- Re-Design --replay to accept // like the normal commandline does.
- New default sortcriteria is pOMa to maximize the chance of deleting the most bytes from the storage.
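
The idea behind --mtime-window can be illustrated with a minimal Python sketch. It is not rmlint's implementation: the function name `split_by_mtime_window` is hypothetical, and the sketch assumes that a group is split wherever two neighbouring mtimes (after sorting) are further apart than the window.

```python
def split_by_mtime_window(files, window):
    """Split (path, mtime) pairs into groups of neighbouring mtimes.

    Assumption: a new group starts whenever the gap to the previous
    mtime (in seconds) is larger than `window`.
    """
    groups = []
    for path, mtime in sorted(files, key=lambda item: item[1]):
        if groups and mtime - groups[-1][-1][1] <= window:
            # Close enough to the previous file: same candidate group.
            groups[-1].append((path, mtime))
        else:
            # Gap too large (or first file): open a new group.
            groups.append([(path, mtime)])
    return groups
```

Only files that end up in the same group would then be compared as potential duplicates.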
- Makefile related compile troubles on BSD for the GUI.
- Check for faccessat and AT_FDCWD explicitly and try to work around them when missing.
- Clear up installation instructions
- Fix symbolic link emitting in sh script (sometimes files were omitted from rmlint.sh)
- Fix compile stop on BSD systems in utilities.c (thanks f99aq8ove)
- Fix some compiler warnings and typos.
- Add basic Spanish translation.
- Add basic compile support on Cygwin.
- Fix bad size stats using --replay with hardlinks.
- Fix unicode emission of json formatter.
- Fix broken gui/SConscript (was looking for python4...)
- Add unique formatter to print unique files. (See #161)
- Removed --with-metadata-cache and --cache since both introduced lots of code without giving many benefits. In total about 1000 lines of code were removed in this process.
- For limiting memory usage there is just --limit-mem now.
2.4.1 Myopic Micrathene Bugfixes -- 2015-11-01
- btrfs: Do not try to clone files on different subvolumes.
- gui: Tie down version for gobject-introspection.
- freebsd: Fix very bad performance due to mounttable and some compile issues.
- -S now understands two new letters to match via regular expressions:
  - r: Select the path as original that matches the pattern given in <PATTERN> after this letter.
  - x: Same as r but match only basename.
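
The difference between the two letters can be sketched in Python. This is only an illustration: `pick_original` is a hypothetical helper, and using an unanchored search (rather than a full match) is an assumption, not a statement about how rmlint evaluates the pattern.

```python
import os
import re


def pick_original(paths, letter, pattern):
    """Return the first path whose subject matches `pattern`, else None.

    'r' tests the full path, 'x' tests only the basename.
    """
    if letter not in ("r", "x"):
        raise ValueError("letter must be 'r' or 'x'")
    regex = re.compile(pattern)
    for path in paths:
        subject = path if letter == "r" else os.path.basename(path)
        if regex.search(subject):  # assumption: unanchored search
            return path
    return None
```

With the paths /keep/x.txt and /tmp/keep.txt, the pattern ^keep selects /tmp/keep.txt under x, because only the basename is tested.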
2.4.0 Myopic Micrathene -- 2015-10-25
- rmlint should compile on Mac OSX now.
- Bugfix: Broken chown calls in sh script (thanks Shukrat Mukimov)
- Bugfix: memory corruption when specifying -T dd alone.
- Bugfix: Make -D and -k / -K play together nicely (thanks phiresky).
- Smaller compile time troubles fixed.
- Progressbar uses timeout-based redraws which leads to much smoother drawing and less cpu footprint.
- The pretty formatter (default) now produces valid escaped commands. It is still intended for visual output only, which is why a note about this was added.
- A fully working graphical user interface which is installed as a python module by default (can be disabled via compile option, i.e. scons --without-gui). It can be started via rmlint --gui.
- Support for automatic deduplication on btrfs using BTRFS_IOC_FILE_EXTENT_SAME. The shell script will now contain calls to rmlint --btrfs $source $dest for duplicates on btrfs filesystems if the user specified -c sh:clone.
. - Benchmark suite that will track the performance of rmlint from release to release. This helps developers detect any speed regressions or improvements and is a tool to help develop and validate optimization strategies.
- Shell/Python-script now does more sanity checks before removing and can be told to re-compare files byte-by-byte before removing them (-p option when running the .sh file).
- Add a new --hash option so rmlint can be used as a very fast file hashing utility, e.g. rmlint --hash works like sha1sum, or rmlint --hash -d md5 works like md5sum. Also does sha256, sha512, murmur{128}, spooky{32,64,128}, city{128}.
- --sort-by learned new keys: l (path length) and d (path depth).
- New --unmatched-basename option only finds twins with differing basenames.
- Smaller performance and memory optimisations in shredder.
- -g now checks if there is already a sh and json formatter before it adds one.
- -PP now defaults to xxhash as hashing algorithm.
- -o / --output learned to guess the formatter you want to use from the file ending. For example -o /tmp/test.json will work like -o json:/tmp/test.json.
- JSON output contains rmlint version and revision now.
- --replay learned to merge several json files.
- Internal refactoring (credits go to Daniel) of the scheduler and hashing library. The duplicate finding process has been split into separate modules.
2.3.0 Ominous Oscar -- 2015-06-15
- Compiles on Mac OSX now. See also: #139
- Fix a crash that happened with -e.
- Protect other lint than duplicates by -k or -K.
- chown in sh script fixed (was chmod by accident).
- --replay: Re-output a previously written json file. Allow filtering by using all other standard options (like size or directory filtering).
- --sort-by: Similar to -S, but sorts groups of files. So showing the group with the biggest size sucker is as easy as -y s.
- The long option for -S is now --rank-by (previously --sortcriteria).
- -o can guess the formatter from the filename if given.
- Remove some optimisations that gave no visible effect.
- Simplified FIEMAP optimisation to reduce initial delay and reduce memory overhead
- Improved hashing strategy for large disks (do repeated smaller sweeps across the disk instead of incrementally hashing every file on the disk)
- Incorrect handling of -W, --no-with-color option
- Handling of $PKG_CONFIG in SConstruct
- Failure to build manpage
- Various BSD compatibility issues
- Nonstandard header sequence in modules using fts
- Removed some unnecessary warnings
2.2.0 Dreary Dropbear -- 2015-05-09
- Issue with excessive memory usage and processing delays with very large file counts (>5M files)
- Problems and crashes on 32bit with large files and normal files.
- Bug in memory manager for "paranoid" file comparison method which could lead to OOM error in some cases and infinite looping in others.
- Fixed bug which prevented option --max-paranoid-mem working.
- Note: much kudos to our user "vvs-" who provided many useful testcases and was prepared to re-run a 10-hour duplicate search after each effort to fix the underlying issues.
- Handling of json formatter on invalid utf8, which fixed --cache in return.
- Bug during file traversal when encountering symlinks to empty folders
- More aggressive test suite, leading to higher coverage rates (90% of lines, almost 100% functions at least). Let's not speak of branch coverage for now. 😄
- A primitive benchmark suite.
- A GUI sketch that can be shipped along rmlint.
- Most internal filesystems like proc are ignored now.
- Improved progressbar
- Memory footprint reduced to enable larger filesets to be processed. See discussion at #109. Improvements include a Pat(h)ricia-Trie used as data structure to efficiently map file paths with much less memory consumption. Also the file preprocessing strategy (eg to find path doubles) has been improved to avoid having several large hashtables active at the same time.
- Improved threading strategy which increases speed of duplicate matching. As before, the threading strategy uses just one thread per physical disk to enable fast reading without disk thrash. The improved algorithm now increases the number of cpu threads used to hash the data as it is read in. Also an improved mutex strategy reduces the wait time before the hash results can be processed. Note the new threading strategy is particularly effective on the "paranoid" (byte-by-byte) file comparison method (option -pp), which is now almost as fast as the default (SHA1 hash) method.
- The optimisation in 2.1.0 which detects existing reflinks has been reverted for now due to conflicts between shredder and treemerge.
2.1.0 Malnourished Molly -- beta-release 2015-04-13
- performance regression: When having many pairs of duplicates, the core got slower very fast due to linear lookups. Fixed.
- performance regression: No SSDs were detected due to two bugs.
- commandline aborts also on non-fatal option misuses.
- Some statistic counts were updated wrong sometimes.
- Fixes in treemerge to respect directories tagged as originals.
- Ignore "evil" fs types like bindfs, nullfs completely.
- Fix race in file tree traversal.
- Various smaller bugfixes.
- --with-metadata-cache makes rmlint less memory hungry by storing its paths in a sqlite3 database and selecting them when needed.
- --without-fiemap disables the fiemap optimization when focus is on memory footprint.
- --perms can check if a file should be readable/writable or executable.
- Json output is enabled by default and is written to rmlint.json.
- --partial-hidden does only see hidden files in duplicate directories.
- --cache/--write-unfinished can be used to speed up re-runs drastically.
- Checksums can be stored in the xattr of files with --xattr-read/write/clear.
- New progressbar output inspired by journalctl --verify.
- Better support for reflink-capable filesystems (e.g. btrfs):
  - detect existing reflinks using fiemap data (significant speedup)
  - support replacing files by a reflink if the filesystem supports it.
- Optional dependency for sqlite3 for --with-metadata-cache.
- --hardlinked is enabled by default.
- Support -n (dry-run) for rmlint.sh; require user input on ask.
- Default digest is now sha1 instead of spooky.
- updated .pot template with help strings.
- updated German translation accordingly.
- -T supports arguments like df,dd properly now.
- New --help text that shows a short reference only.
- sahib made his 1000th commit on rmlint with this text and wonders where all the time has gone and why he isn't rich yet.
2.0.0 Personable Pidgeon -- 2014-01-23
Initial release of the rewrite.