Having run into a scenario where I had directories with many duplicate files, I just hacked up a simple command-line solution based on cryptographic hashes. It's the same idea used in source control systems like Git and Mercurial: identify a file by the SHA-1 hash of its contents, so two files with the same digest are duplicates.
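To make the idea concrete, here is a minimal Python sketch of that kind of content hash (the utility itself is a .NET executable, so this is illustrative rather than its actual code; the function name and chunk size are my own choices):

import hashlib

def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file's contents, read in chunks."""
    sha = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()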
Sample usage:
DupDel.exe [target-directory]
The utility recursively analyzes all sub-directories under the target directory and builds an index of every file based on its content. Once the index is complete, duplicates are processed interactively: the user is presented with a choice of which copy to keep.
Keep which of the following duplicates:
1. \Some foo.txt
2. \bar\some other foo.doc
>
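The overall flow might look roughly like the following Python sketch, assuming an index from content hash to file paths and a simple numbered prompt like the one above (function names, the deletion behavior, and the prompt format are assumptions, not the tool's actual code):

import hashlib
import os
import sys

def content_hash(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file's contents."""
    sha = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()

def find_duplicates(root):
    """Map each content hash to the list of files under root sharing it."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index.setdefault(content_hash(path), []).append(path)
    return {h: paths for h, paths in index.items() if len(paths) > 1}

def resolve_interactively(root, duplicates):
    """For each duplicate group, ask which file to keep and delete the rest."""
    for paths in duplicates.values():
        print("Keep which of the following duplicates:")
        for i, path in enumerate(paths, start=1):
            print(f"{i}. \\{os.path.relpath(path, root)}")
        choice = int(input("> "))
        for i, path in enumerate(paths, start=1):
            if i != choice:
                os.remove(path)

if __name__ == "__main__":
    target = sys.argv[1]
    resolve_interactively(target, find_duplicates(target))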
The types of files under the target directory are not important, so you can pass in directories of documents, music files, pictures, etc. My computer churned through 30 GB of data in about 5 minutes, so it's reasonably fast.