
Find Duplicate Files - A Comparison of fdupes and fslint


Background

This article compares fdupes and fslint for finding duplicate files. The test consists of running each application on the same machine against the same set of files to determine which one finds the better set of duplicates and how long each takes to do so.

I have put together a simple script that performs the following for the test:

  • report the size of the directory being checked
  • run the duplicate check (report only, no files are touched)
  • count the number of files versus the number of duplicates found
  • calculate how long the run takes and report it as “Total Runtime” in hh:mm:ss format (illustrated just below)
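
The hh:mm:ss conversion is plain integer arithmetic on the elapsed seconds; as a standalone example (the value 1275 is just an illustration):

diff_sec=1275                       # e.g. 21 minutes and 15 seconds
printf '%d:%02d:%02d\n' $((diff_sec/3600)) $((diff_sec%3600/60)) $((diff_sec%60))
# prints 0:21:15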

Details

Both tools are adequate at finding duplicate files. fslint’s speed and bundled GUI will probably put it at the top of the list for most users; as for me, I will use both for a while…

  • fslint’s command-line interface is poorly documented
  • fslint offers a few additional comparison options (partial MD5 and SHA1)
  • fslint is faster
  • fdupes offers a byte-by-byte comparison

fdupes

website: http://code.google.com/p/fdupes/
features: http://linux.die.net/man/1/fdupes
Summary:
  • command line interface (cli)
  • compare by file size
  • compare by hardlinks
  • compare by MD5 signatures
  • compare byte-by-byte
install: `sudo apt-get install fdupes` # installed size is roughly 49.2 kB
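
For reference, a few common invocations using flags from the man page linked above (the path is the test directory used below):

fdupes -r /media/share/archive          # recurse and list groups of duplicate files
fdupes -r -S /media/share/archive       # also show the size of each duplicate file
fdupes -r -m /media/share/archive       # summary only: duplicate count and space wasted
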
Test Script
dupdir=/media/share/archive
du -hs ${dupdir}                                     # show how large the directory being checked is
startdt=`date +%s`
fdupes --recurse ${dupdir} > fdups.out               # check only; no files are modified
enddt=`date +%s`
((diff_sec=enddt-startdt))
runtime=`awk -v s="${diff_sec}" 'BEGIN{printf "%d:%02d:%02d", s/3600, s%3600/60, s%60}'`
echo "Total Runtime: ${runtime}"
cntOfDupes=`grep ${dupdir} fdups.out | wc -l`        # every non-blank line in fdups.out is a duplicated file's path
((cntOfFD=`find ${dupdir} | wc -l`-1))               # subtract 1 as find also counts the top-level directory itself
echo "Count of Duplicates: ${cntOfDupes}"
echo "Count of Files/Directories: ${cntOfFD}"
Test Results

The results of the above script:

  • Size: 43G /media/share/archive
  • Total Runtime: 0:21:15
  • Count of Duplicates: 3521
  • Count of Files/Directories: 16507
Analysis of Results

I’ve used fdupes for a few years and like the CLI (full disclosure). It took just over 21 minutes and found 3,521 duplicate files. Many of them are license.txt-style files bundled with different pieces of software that happen to be identical. I could delete all but one copy and symlink the rest to it, but that is more effort than it’s worth (at the risk of breaking something); a rough sketch of what it would involve follows anyway.
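
A minimal sketch of that cleanup, assuming the fdups.out produced by the script above (treat it as an illustration rather than something to run as-is, since it replaces files in place):

# fdupes prints one path per line, with a blank line between duplicate groups
keep=""
while IFS= read -r f; do
  if [ -z "$f" ]; then                  # blank line: the next group starts
    keep=""
  elif [ -z "$keep" ]; then             # first file in the group is the one to keep
    keep="$f"
  else
    ln -sf "$keep" "$f"                 # replace the duplicate with a symlink to the kept copy
  fi
done < fdups.out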

fslint

website: http://www.pixelbeat.org/fslint/
features: http://fslint.googlecode.com/svn/trunk/doc/FAQ
Summary:
  • command line interface (cli)
  • graphical user interface (gui)
  • compare by file size
  • compare by hardlinks
  • compare by MD5 (first 4k of a file; illustrated below)
  • compare by MD5 (entire file)
  • compare by SHA1 (entire file)
install: `sudo apt-get install fslint` # installed size is roughly 868 kB
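
As a rough illustration of what the partial-MD5 check amounts to (not fslint’s actual code; file_a and file_b are placeholder names), hashing just the first 4k quickly rules out most non-duplicates before a full comparison:

head -c 4096 file_a | md5sum            # hash only the first 4 KiB of each file
head -c 4096 file_b | md5sum            # differing prefix hashes mean the files cannot be duplicates
md5sum file_a file_b                    # matching prefixes still need a full hash (or a byte-by-byte diff) to confirm
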
Test Script
PATH=$PATH:/usr/share/fslint/fslint                  # findup (the cli form of fslint) is not on the PATH by default
dupdir=/media/share/archive
du -hs ${dupdir}                                     # show how large the directory being checked is
startdt=`date +%s`
findup ${dupdir} > fslint.out                        # check only; no files are modified
enddt=`date +%s`
((diff_sec=enddt-startdt))
runtime=`awk -v s="${diff_sec}" 'BEGIN{printf "%d:%02d:%02d", s/3600, s%3600/60, s%60}'`
echo "Total Runtime: ${runtime}"
cntOfDupes=`wc -l < fslint.out`                      # read from stdin so wc prints only the count, not the filename
((cntOfFD=`find ${dupdir} | wc -l`-1))               # subtract 1 as find also counts the top-level directory itself
echo "Count of Duplicates: ${cntOfDupes}"
echo "Count of Files/Directories: ${cntOfFD}"
Test Results

  • Size: 43G /media/share/archive
  • Total Runtime: 0:21:15
  • Count of Duplicates: 3521
  • Count of Files/Directories: 16507
Analysis of Results

fslint is a newcomer to me. To be honest, I was leery of it simply because it is GUI-focused rather than CLI-focused. The GUI is a simple wrapper around the CLI, but the CLI is not well documented. After using fslint I quickly found that it is faster, and that the GUI can be beneficial in some circumstances (say, sorting through 100 duplicates quickly and easily).
