Skip to main content

Delete Duplicate Files From the Command-line with .NET

Having run into a scenario where I had directories with many duplicate files, I just hacked up a simple command-line solution based on crypto signatures. It's the same idea used in source control systems like Git and Mercurial, basically the SHA-1 hash of a file's contents.

Sample usage:

DupDel.exe [target-directory]

The utility will recursively analyze any sub-directories under the target directory and build an index of all files based on their content. Once complete, duplicates are processed in an interactive manner where the user is presented with a choice of which duplicate to keep

Keep which of the following duplicates:
1. \Some foo.txt
2. \bar\some other foo.doc
>

The types of files under the target directory are not important, so you can pass in directories to documents, music files, pictures, etc. My computer churned through 30 GB of data in about 5 minutes, so it's reasonably fast.

Comments

Clinty Clint said…
The link to DelDup is 404ing! :)
Sandro Magi said…
Thanks! Turns out it was actually DupDel.zip.

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre

Building a Query DSL in C#

I recently built a REST API prototype where one of the endpoints accepted a string representing a filter to apply to a set of results. For instance, for entities with named properties "Foo" and "Bar", a string like "(Foo = 'some string') or (Bar > 99)" would filter out the results where either Bar is less than or equal to 99, or Foo is not "some string". This would translate pretty straightforwardly into a SQL query, but as a masochist I was set on using Google Datastore as the backend, which unfortunately has a limited filtering API : It does not support disjunctions, ie. "OR" clauses. It does not support filtering using inequalities on more than one property. It does not support a not-equal operation. So in this post, I will describe the design which achieves the following goals: A backend-agnostic querying API supporting arbitrary clauses, conjunctions ("AND"), and disjunctions ("OR"). Implemen

Simple, Extensible IoC in C#

I just committed the core of a simple dependency injection container to a standalone assembly, Sasa.IoC . The interface is pretty straightforward: public static class Dependency { // static, type-indexed operations public static T Resolve<T>(); public static void Register<T>(Func<T> create) public static void Register<TInterface, TRegistrant>() where TRegistrant : TInterface, new() // dynamic, runtime type operations public static object Resolve(Type registrant); public static void Register(Type publicInterface, Type registrant, params Type[] dependencies) } If you were ever curious about IoC, the Dependency class is only about 100 lines of code. You can even skip the dynamic operations and it's only ~50 lines of code. The dynamic operations then just use reflection to invoke the typed operations. Dependency uses static generic fields, so resolution is pretty much just a field access + invoking a