Background
This article compares fdupes and fslint for finding duplicate files. The test runs each application on the same machine against the same set of files to determine which one finds the best set of duplicates and how long each takes to do so.
I have put together a simple script that performs the following for the test:
- show how large the directory being checked is
- run the deduplication test (just a check, no action)
- count the number of files vs number of dups found
- calculate how long the run takes and report it as “Total Runtime” in hh:mm:ss format
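The steps above can be sketched as a small shell harness. This is a minimal illustration and not the script used for the results in this article: the function names `format_runtime` and `run_test` are hypothetical, and it assumes the duplicate finder prints one duplicate path per line.

```shell
#!/bin/sh
# Sketch of a test harness for the steps listed above.
# format_runtime and run_test are illustrative names (assumptions),
# not taken from the original test script.

# Convert a number of seconds to h:mm:ss, the "Total Runtime" format.
format_runtime() {
    secs=$1
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# run_test DIR COMMAND [ARGS...]
# Runs COMMAND ARGS DIR (a check only, nothing is deleted) and reports
# the directory size, the runtime, and the two counts.
run_test() {
    dir=$1
    shift
    echo "Size: $(du -sh "$dir" | cut -f1) $dir"
    start=$(date +%s)
    # Count non-blank output lines; assumes one duplicate path per line.
    dups=$("$@" "$dir" | grep -c .)
    end=$(date +%s)
    echo "Total Runtime: $(format_runtime $((end - start)))"
    echo "Count of Duplicates: $dups"
    echo "Count of Files/Directories: $(find "$dir" | wc -l)"
}
```

For example, `run_test /media/share/archive fdupes -r -f -q` would recurse into the archive, omit the first file of each duplicate set, and hide the progress indicator, so every remaining output line is one redundant copy. Whether the counts reported in this article were produced with these exact options is not stated.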
Details
Both tools are adequate for checking for duplicate files. fslint’s speed and bundled GUI will probably put it at the top of the list for most users; as for me, I will use both for a while…
- fslint’s command line interface is lacking in documentation
- fslint has a few additional comparison options (partial md5 and sha1)
- fslint is quicker
- fdupes has byte-by-byte comparison
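To make the comparison methods named in this list concrete, here is a rough sketch of each one using standard utilities. It illustrates the ideas only and is not the code of either tool; the function names and the 4096-byte default for the partial hash are assumptions chosen for the example.

```shell
#!/bin/sh
# Illustrative comparison helpers (hypothetical names, not from fdupes or fslint).
# Each returns success (exit status 0) when the two files look identical
# under that method.

# Full-file MD5: reads every byte of both files.
same_by_hash() {
    h1=$(md5sum < "$1")
    h2=$(md5sum < "$2")
    [ "$h1" = "$h2" ]
}

# Partial MD5: hashes only the first N bytes (default 4096, an assumption).
# Cheap, but a match only means the files start the same way.
same_by_partial_hash() {
    n=${3:-4096}
    h1=$(head -c "$n" "$1" | md5sum)
    h2=$(head -c "$n" "$2" | md5sum)
    [ "$h1" = "$h2" ]
}

# Byte-by-byte: cmp -s stops at the first differing byte and prints nothing.
same_by_bytes() {
    cmp -s "$1" "$2"
}
```

A partial hash is useful as a quick filter to rule out files that clearly differ, while a byte-by-byte comparison removes any doubt about hash collisions at the cost of reading both files in full.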
fdupes
website | http://code.google.com/p/fdupes/ |
---|---|
features | http://linux.die.net/man/1/fdupes |
install | `sudo apt-get install fdupes` (takes roughly 49.2 kB) |
Test Script
Test Results
The results of the above script:
- Size: 43G /media/share/archive
- Total Runtime: 0:21:15
- Count of Duplicates: 3521
- Count of Files/Directories: 16507
Analysis of Results
Full disclosure: I’ve used fdupes for a few years and like its CLI. The run took just over 21 minutes and found 3,521 duplicate files. Many of these are license.txt-type files that are bundled with software and are identical to one another. I could delete all but one copy and symlink the rest to it, but this is more effort than it’s worth (and risks breaking something).
fslint
website | http://www.pixelbeat.org/fslint/ |
---|---|
features | http://fslint.googlecode.com/svn/trunk/doc/FAQ |
install | `sudo apt-get install fslint` (takes roughly 868 kB) |
Test Script
Test Results
- Size: 43G /media/share/archive
- Total Runtime: 0:21:15
- Count of Duplicates: 3521
- Count of Files/Directories: 16507
Analysis of Results
fslint is a newcomer to me. To be honest, I was leery simply because it is GUI-focused rather than CLI-focused. I know the GUI is a simple wrapper for the CLI, but the CLI is not well documented. After using fslint, I quickly found that it is faster and that the GUI can be beneficial in some circumstances (say, sorting through 100 dups quickly and easily).