Sasa.Strings - General String Extensions

This is the seventh post in my ongoing series covering the abstractions in Sasa.

The Sasa.Strings static class provides a number of useful extension methods on System.String, including slicing, simple tokenizing, conversions and formatting. It is available in the core Sasa.dll.

Sasa.Strings.Format

Sasa.Strings.Format is a set of extension methods that wrap System.String.Format, optionally taking a format provider as the first argument:

Console.WriteLine("i = {0}".Format(1234567));

string invariant = "{0:#,##0.00}".Format(CultureInfo.InvariantCulture,
                                         10000);
Console.WriteLine(invariant);

string de = "{0:#,##0.00}".Format(CultureInfo.GetCultureInfo("de-de"),
                                  10000);
Console.WriteLine(de);
// output:
// i = 1234567
// 10,000.00
// 10.000,00

Sasa.Strings.FromBase64

Sasa.Strings.FromBase64 is a convenient extension method for converting a Base64 encoded string into an ordinary Unicode string, with an optional encoding parameter used to interpret the data:

string encoded = "foo bar".ToBase64(); // utf-8
Console.WriteLine(encoded);
string decoded = encoded.FromBase64(); // utf-8
Console.WriteLine(decoded);
Console.WriteLine(decoded.ToBase64("utf-16"));
// output:
// Zm9vIGJhcg==
// foo bar
// ZgBvAG8AIABiAGEAcgA=
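
For readers curious how a pair like this can be built on the base class libraries, here is a minimal sketch. It is not Sasa's source: the names EncodeBase64 and DecodeBase64 are hypothetical, and I'm assuming the encoding is passed as a System.Text.Encoding that defaults to UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not Sasa's implementation.
public static class Base64Sketch
{
    // Turn the string into bytes with the chosen encoding (UTF-8 by default),
    // then Base64-encode those bytes.
    public static string EncodeBase64(this string text, Encoding encoding = null)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Convert.ToBase64String((encoding ?? Encoding.UTF8).GetBytes(text));
    }

    // Decode the Base64 text back to bytes, then interpret the bytes
    // with the chosen encoding (UTF-8 by default).
    public static string DecodeBase64(this string base64, Encoding encoding = null)
    {
        if (base64 == null) throw new ArgumentNullException(nameof(base64));
        return (encoding ?? Encoding.UTF8).GetString(Convert.FromBase64String(base64));
    }
}
```

The same encoding has to be used in both directions, otherwise the decoded bytes are interpreted incorrectly.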

Sasa.Strings.HardWrapAt

Sasa.Strings.HardWrapAt is a set of extension methods that break a string at exactly the column given as a parameter, without regard to words or other considerations, and return an enumerable sequence of lines:

string text = "Lorem ipsum dolor sit amet, consectetur";
foreach (var line in text.HardWrapAt(8))
{
    Console.WriteLine(line);
}
// output:
// Lorem ip
// sum dolo
// r sit am
// et, cons
// ectetur
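
The behaviour above amounts to cutting the string into fixed-size chunks. A rough sketch of that idea, assuming a hypothetical HardWrap method rather than Sasa's actual code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Sasa's implementation.
public static class HardWrapSketch
{
    public static IEnumerable<string> HardWrap(this string text, int column)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (column <= 0) throw new ArgumentOutOfRangeException(nameof(column));
        return Chunks(text, column);
    }

    static IEnumerable<string> Chunks(string text, int column)
    {
        // Step through the string in strides of 'column' characters;
        // the final chunk may be shorter.
        for (int i = 0; i < text.Length; i += column)
            yield return text.Substring(i, Math.Min(column, text.Length - i));
    }
}
```

Applied to the sample text with a column of 8, this sketch yields the same five lines as the output above.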

Sasa.Strings.IfNullOrEmpty

Sasa.Strings.IfNullOrEmpty is a set of overloaded extension methods that return a fallback string, or a fallback value of another type, if the input string is null or empty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.IfNullOrEmpty("null or empty!"));
}
// output:
// null or empty!
// null or empty!
// foo!

The overload with the generic type parameter calls ToString() on its argument only when the fallback is actually needed. It was added to avoid the overhead of calling ToString on the second argument before knowing whether the first string is null or empty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.IfNullOrEmpty(3.4));
}
// output:
// 3.4
// 3.4
// foo!
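
To make the deferred ToString point concrete, here is a small sketch of what such a generic overload might look like. OrFallback is a hypothetical name and this is my own guess at the shape, not Sasa's source:

```csharp
using System;

// Hypothetical sketch, not Sasa's implementation.
public static class FallbackSketch
{
    public static string OrFallback<T>(this string value, T fallback)
    {
        // The input string wins whenever it has content,
        // so the fallback is never converted on this path.
        if (!string.IsNullOrEmpty(value)) return value;

        // ToString() is only paid for here, on the fallback path.
        return fallback == null ? null : fallback.ToString();
    }
}
```

The caller passes the raw value, e.g. x.OrFallback(42), rather than x.OrFallback(42.ToString()), which would always perform the conversion.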

Sasa.Strings.IsNullOrEmpty

Sasa.Strings.IsNullOrEmpty is a convenient extension method that simply calls string.IsNullOrEmpty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    if (!x.IsNullOrEmpty())
        Console.WriteLine(x);
}
// output:
// foo!

Sasa.Strings.Lines

Sasa.Strings.Lines is an extension method that splits a string into a sequence of lines:

var lines = "foo\r\n \t bar\r\n\r\nend".Lines();
foreach (var x in lines)
{
    Console.WriteLine(x);
}
// output:
// foo
//    bar
// end

Sasa.Strings.Slice

Sasa.Strings.Slice is an extension method that complements string.Substring. Substring takes a start index and the number of characters to extract, whereas Slice takes a start index and an end index:

string large = "This is a sample sentence.";
string firstWord = large.Slice(0, 3);
Console.WriteLine(firstWord);
// output:
// This

Sasa.Strings.SliceEquals

Sasa.Strings.SliceEquals takes an input string, an index, and a substring and checks whether the substring is in the input string at the given index:

string[] list = new string[] { "Flub", "Foo", "Baz_Foob", "Bar Foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.SliceEquals(4, "Foo"));
}
// output:
// False
// False
// True
// True
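
A check like this can be done without allocating a substring. The following is a sketch under my own assumptions (ordinal, case-sensitive comparison, and a hypothetical method name MatchesAt), not Sasa's actual code:

```csharp
using System;

// Hypothetical sketch, not Sasa's implementation.
public static class SliceEqualsSketch
{
    public static bool MatchesAt(this string text, int index, string candidate)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (candidate == null) throw new ArgumentNullException(nameof(candidate));

        // The candidate cannot match if it would run past the end of the text.
        if (index < 0 || index > text.Length - candidate.Length) return false;

        // Compare in place, character by character, without creating a substring.
        return string.CompareOrdinal(text, index, candidate, 0, candidate.Length) == 0;
    }
}
```

Because the comparison happens in place, the check costs at most candidate.Length character comparisons and no allocation.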

Sasa.Strings.Split

Sasa.Strings.Split is a set of extension method wrappers for System.String.Split that accept StringSplitOptions:

string csv = "foo,comma,bar,,baz";
foreach (var x in csv.Split(StringSplitOptions.RemoveEmptyEntries, ','))
{
    Console.WriteLine(x);
}
// output:
// foo
// comma
// bar
// baz

The overloads take a variable length list of either characters or strings to use for splitting. This reverses the parameter order used in the base class libraries, so the separators can be passed as a trailing params list for convenience.
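
As a rough illustration of the reordering, wrappers of this kind can be one-liners over string.Split. The name SplitOn is hypothetical and this is only a sketch, not Sasa's source:

```csharp
using System;

// Hypothetical sketch, not Sasa's implementation.
public static class SplitSketch
{
    // Options first, so the separators can be a trailing params list of characters.
    public static string[] SplitOn(this string text, StringSplitOptions options, params char[] separators)
        => text.Split(separators, options);

    // Same idea for string separators.
    public static string[] SplitOn(this string text, StringSplitOptions options, params string[] separators)
        => text.Split(separators, options);
}
```

With the params array last, a call reads csv.SplitOn(StringSplitOptions.RemoveEmptyEntries, ',', ';') instead of having to build the separator array by hand.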

Sasa.Strings.ToBase64

Sasa.Strings.ToBase64 is a convenient extension method for converting a Unicode string into a Base64 encoded string, using an optional text encoding parameter. If no encoding is provided, UTF-8 is the default:

string encoded = "foo bar".ToBase64(); // utf-8
Console.WriteLine(encoded);
string decoded = encoded.FromBase64(); // utf-8
Console.WriteLine(decoded);
Console.WriteLine(decoded.ToBase64("utf-16"));
// output:
// Zm9vIGJhcg==
// foo bar
// ZgBvAG8AIABiAGEAcgA=

Sasa.Strings.Tokenize

Sasa.Strings.Tokenize is an extension method that searches an input string for a list of string tokens, and returns a sequence of token positions found in the string:

var operators = "4 + 5 - 6^2 == x".Tokenize("+", "-", "^", "==");
foreach (var x in operators)
{
    Console.WriteLine("Found {0} at index {1}", x.Tok, x.Index);
}
// output:
// Found + at index 2
// Found - at index 6
// Found ^ at index 9
// Found == at index 12
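
One simple way to get this behaviour is a left-to-right scan that tests every token at each position. The sketch below is my own illustration, not Sasa's algorithm: FindTokens and FoundToken are hypothetical names, and preferring the longest matching token (so "==" beats "=") is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type mirroring the Tok/Index pair used above.
public sealed class FoundToken
{
    public FoundToken(string tok, int index) { Tok = tok; Index = index; }
    public string Tok { get; }
    public int Index { get; }
}

// Hypothetical sketch, not Sasa's implementation.
public static class TokenizeSketch
{
    public static IEnumerable<FoundToken> FindTokens(this string text, params string[] tokens)
    {
        int i = 0;
        while (i < text.Length)
        {
            // Find the longest token that matches at position i, if any.
            string best = null;
            foreach (var tok in tokens)
            {
                if (tok.Length == 0 || tok.Length > text.Length - i) continue;
                if (string.CompareOrdinal(text, i, tok, 0, tok.Length) == 0
                    && (best == null || tok.Length > best.Length))
                    best = tok;
            }

            if (best == null) { i++; continue; }   // no token here, move on one character

            yield return new FoundToken(best, i);
            i += best.Length;                       // skip past the matched token
        }
    }
}
```

This scan does up to one comparison per token per character, which is fine for short token lists.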

Sasa.Strings.Words

Sasa.Strings.Words is an extension method that splits a string along whitespace boundaries:

var words = "foo bar\r\n\tbaz\rfoobar\nbarbaz".Words();
foreach (var x in words)
{
    Console.WriteLine(x);
}
// output:
// foo
// bar
// baz
// foobar
// barbaz
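
The base class libraries can already do this, just less conveniently: passing a null separator array to string.Split makes it split on whitespace. A sketch of a wrapper along those lines, with the hypothetical name SplitWords (not Sasa's source):

```csharp
using System;

// Hypothetical sketch, not Sasa's implementation.
public static class WordsSketch
{
    public static string[] SplitWords(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // A null separator array tells string.Split to use whitespace characters;
        // RemoveEmptyEntries collapses runs such as "\r\n\t" into a single break.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The cast on null is needed so the compiler can pick the char[] overload of string.Split.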

Sasa.Strings.WordWrapAt

Sasa.Strings.WordWrapAt is an extension method that inserts newlines at the nearest whitespace boundary before the provided column parameter. In other words, contrary to Strings.HardWrapAt, which inserts the newline at exactly the specified index, this method searches back from the column boundary for the nearest whitespace and breaks there:

string text = "Lorem ipsum dolor sit amet, consectetur";
foreach (var line in text.WordWrapAt(8))
{
    Console.WriteLine(line);
}
// output:
// Lorem
//  ipsum
//  dolor
//  sit
//  amet,
//  consectetur
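
The search-back step can be sketched as follows. This is inferred from the description and the sample output above, not taken from Sasa's source: WrapAtWords is a hypothetical name, and two details are my assumptions, namely that the whitespace character stays at the start of the next line and that a word longer than the column is emitted whole.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Sasa's implementation.
public static class WordWrapSketch
{
    public static IEnumerable<string> WrapAtWords(this string text, int column)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (column <= 0) throw new ArgumentOutOfRangeException(nameof(column));
        return WrapIterator(text, column);
    }

    static IEnumerable<string> WrapIterator(string text, int column)
    {
        int start = 0;
        while (start < text.Length)
        {
            int end = text.Length;                 // default: the remainder fits on one line
            if (text.Length - start > column)
            {
                int limit = start + column;

                // Search back from just before the column boundary for whitespace,
                // ignoring the character the line itself starts with.
                end = limit - 1;
                while (end > start && !char.IsWhiteSpace(text[end])) end--;

                if (end == start)
                {
                    // No break point inside the window: run forward to the
                    // next whitespace (or the end) so the word stays intact.
                    end = limit;
                    while (end < text.Length && !char.IsWhiteSpace(text[end])) end++;
                }
            }
            yield return text.Substring(start, end - start);
            start = end;                           // the whitespace begins the next line
        }
    }
}
```

On the sample text with a column of 8, this sketch produces the same six lines as shown above, leading spaces included.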
