I've just added a simple diff algorithm under Sasa.Linq. The signatures of the two overloads are as follows:
    /// <summary>
    /// Compute the set of differences between two sequences.
    /// </summary>
    /// <typeparam name="T">The type of sequence items.</typeparam>
    /// <param name="original">The original sequence.</param>
    /// <param name="updated">The updated sequence to compare to.</param>
    /// <returns>
    /// The smallest sequence of changes to transform
    /// <paramref name="original"/> into <paramref name="updated"/>.
    /// </returns>
    public static IEnumerable<Change<T>> Difference<T>(
        this IEnumerable<T> original,
        IEnumerable<T> updated);

    /// <summary>
    /// Compute the set of differences between two sequences.
    /// </summary>
    /// <typeparam name="T">The type of sequence items.</typeparam>
    /// <param name="original">The original sequence.</param>
    /// <param name="updated">The updated sequence to compare to.</param>
    /// <param name="eq">The equality comparer to use.</param>
    /// <returns>
    /// The smallest sequence of changes to transform
    /// <paramref name="original"/> into <paramref name="updated"/>.
    /// </returns>
    public static IEnumerable<Change<T>> Difference<T>(
        this IEnumerable<T> original,
        IEnumerable<T> updated,
        IEqualityComparer<T> eq);
The extension methods depend only on the following enum and struct:
    /// <summary>
    /// Describes the type of change that was made.
    /// </summary>
    public enum ChangeType
    {
        /// <summary>
        /// An item was added at the given position.
        /// </summary>
        Add,
        /// <summary>
        /// An item was removed at the given position.
        /// </summary>
        Remove,
    }

    /// <summary>
    /// Describes a change to a collection.
    /// </summary>
    /// <typeparam name="T">The collection item type.</typeparam>
    public struct Change<T>
    {
        /// <summary>
        /// The change made at the given position.
        /// </summary>
        public ChangeType ChangeType { get; internal set; }
        /// <summary>
        /// The set of values added or removed from the given position.
        /// </summary>
        public IEnumerable<T> Values { get; internal set; }
        /// <summary>
        /// The position in the sequence where the change took place.
        /// </summary>
        public int Position { get; internal set; }
    }
This is a simple and general interface with which you can perform all sorts of computations on the differences between two sequences. The code as provided will work out of the box for any type T that implements equality. Some simple examples:
    Console.WriteLine( "miller".Difference("myers").Format("\r\n") );
    // prints out:
    // +1:y
    // -1:i,l,l
    // +6:s

    var original = new int[] { 2, 5, 99 };
    var updated = new int[] { 2, 4, 4, 8 };
    Console.WriteLine( original.Difference(updated).Format("\r\n") );
    // prints out:
    // +1:4,4,8
    // -1:5,99

"Format" is a simple extension method, also under Sasa.Linq, which generates a formatted string when given an IEnumerable<T>.
At the moment, I have simply implemented the naive algorithm, which takes O(N*M) space and time for sequences of length N and M. I plan to eventually implement some linear-space optimizations, as described in Myers's paper "An O(ND) Difference Algorithm and Its Variations".
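To make the N*M cost concrete, here is a minimal sketch of a naive diff built on a longest-common-subsequence table. This is not the Sasa implementation: the NaiveDiff class, its Diff method, and the tuple used in place of Change<T> are all hypothetical names for illustration. It also assumes that Position is the index in the original sequence where a run of changes starts, which is only inferred from the example output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveDiff
{
    // Hypothetical stand-in for Change<T>: Kind is '+' (add) or '-' (remove),
    // Position is the index in the original where the run of changes starts
    // (an assumption inferred from the examples, not taken from Sasa's source).
    public static List<(char Kind, int Position, List<T> Values)> Diff<T>(
        IEnumerable<T> original,
        IEnumerable<T> updated,
        IEqualityComparer<T> eq = null)
    {
        eq = eq ?? EqualityComparer<T>.Default;
        var a = original.ToList();
        var b = updated.ToList();

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        // The table is the N*M space cost; filling it is the N*M time cost.
        var lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; --i)
            for (int j = b.Count - 1; j >= 0; --j)
                lcs[i, j] = eq.Equals(a[i], b[j])
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var changes = new List<(char Kind, int Position, List<T> Values)>();
        int x = 0, y = 0;
        while (x < a.Count || y < b.Count)
        {
            // Skip items common to both sequences.
            if (x < a.Count && y < b.Count && eq.Equals(a[x], b[y]))
            {
                ++x; ++y;
                continue;
            }

            // Collect one gap: everything up to the next common item.
            int start = x;
            var added = new List<T>();
            var removed = new List<T>();
            while ((x < a.Count || y < b.Count)
                && !(x < a.Count && y < b.Count && eq.Equals(a[x], b[y])))
            {
                // Step in whichever direction keeps the longest common subsequence.
                if (y == b.Count || (x < a.Count && lcs[x + 1, y] >= lcs[x, y + 1]))
                    removed.Add(a[x++]);
                else
                    added.Add(b[y++]);
            }
            if (added.Count > 0) changes.Add(('+', start, added));
            if (removed.Count > 0) changes.Add(('-', start, removed));
        }
        return changes;
    }
}
```

Calling NaiveDiff.Diff("miller", "myers") under these assumptions yields three changes: add y at 1, remove i,l,l at 1, and add s at 6. The whole table has to be built before the first change can be reported, which is exactly the cost the linear-space variants avoid.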
There are many applications for a general difference algorithm like this. Consider a reactive property of type IEnumerable