Skip to main content

Easy File System Path Manipulation in C#

I came across this type-safe module for handling file paths in the Haskell subreddit this week, and thought it looked kind of neat. Handling paths as strings, even with System.IO.Path always bugged me.

So I created a close C# equivalent of the Haskell type and added it to the Sasa library, to be available in the upcoming v0.9.3 release:
public struct FsPath : IEnumerable<string>, IEnumerable
{
public FsPath(IEnumerable<string> parts);
public FsPath(string path);

public static FsPath operator /(FsPath path1, FsPath path2);
public static FsPath operator /(FsPath path, IEnumerable<string> parts);
public static FsPath operator /(FsPath path, string part);
public static FsPath operator /(FsPath path, string[] parts);
public static FsPath operator /(IEnumerable<string> parts, FsPath path);
public static FsPath operator /(string part, FsPath path);
public static FsPath operator /(string[] parts, FsPath path);
public static implicit operator FsPath(string path);

public FsPath Combine(FsPath file);
public IEnumerator<string> GetEnumerator();
public static FsPath Root(string root);
public override string ToString();
}

The above is just the interface, minus the comments which you can see at the above svn link. This struct tracks path components for you and C#'s operator overloading lets you specify paths declaratively without worrying about combining paths with proper separator characters, etc.:
FsPath root = "/foo/bar";
var baz = root / "blam" / "baz";
var etc = FsPath.Root("/etc/");
var passwd = etc / "passwd";

The library will generate path strings using the platform's directory separator.

One significant departure from the Haskell library is the lack of phantom types used to distinguish the various combinations of relative/absolute and file/directory paths. C# can express these same constraints, but I intentionally left them out for two reasons:
  1. The type safety provided by the file/directory phantom type is rather limited, since it's only an alleged file/directory path; you have to consult the file system to determine whether the alleged type is actually true. The only minor advantage to specifying this in a path type, is as a form of documentation to users of your library that you expect a file path in a particular method, and not a directory path. I could be convinced to add it for this reason.

  2. The relative/absolute phantom type seems a bit useless to me, though the reason may be less obvious to others. IMO, the set of file and directory info classes should not have GetParent() or support ".." directory navigation operations, nor should they support static privilege escalation operations like File.GetAbsolutePath() which ambiently converts a string into an authority on a file/directory, since this inhibits subsystem isolation; without GetParent() or privilege escalation functions, any subsystem you hand a directory object is automatically chroot jailed to operate only in that directory. This has long been known to the capability-security community, and is how they structure all of their IO libraries (see Joe-E and the E programming language). Thus, in a capability-secure file system library, all paths are relative paths interpreted relative to a given directory object. As a result, FsPath also strips all "." and resolve all ".." to within the provided path to supply this isolation.

It goes without saying that .NET does not currently handle files and directories this way, but I plan to handle this eventually as well. FsPath will play a significant role in ensuring that paths cannot escape the chroot jail.

As an interim step along that path, the FsPath interface is a good first step.

[Edit: the reddit thread for this post has some good discussion.]

Comments

Popular posts from this blog

Easy Reverse Mode Automatic Differentiation in C#

Continuing from my last post on implementing forward-mode automatic differentiation (AD) using C# operator overloading , this is just a quick follow-up showing how easy reverse mode is to achieve, and why it's important. Why Reverse Mode Automatic Differentiation? As explained in the last post, the vector representation of forward-mode AD can compute the derivatives of all parameter simultaneously, but it does so with considerable space cost: each operation creates a vector computing the derivative of each parameter. So N parameters with M operations would allocation O(N*M) space. It turns out, this is unnecessary! Reverse mode AD allocates only O(N+M) space to compute the derivatives of N parameters across M operations. In general, forward mode AD is best suited to differentiating functions of type: R → R N That is, functions of 1 parameter that compute multiple outputs. Reverse mode AD is suited to the dual scenario: R N → R That is, functions of many parameters t...

Blue-eyed Islander Puzzle - an analysis

Many people find themselves stumped by the so-called Blue-Eyed Islanders puzzle . There is also much controversy over its supposed solution. I'm going to analyze the problem and the solution, and in the process, explain why the solution works. To begin, let's modify the problem slightly and say that there's only 1 blue-eyed islander. When the foreigner makes his pronouncement, the blue-eyed islander looks around and sees no other blue eyes, and being logical, correctly deduces that his own eyes must be blue in order for the foreigner's statement to make sense. The lone blue-eyed islander thus commits suicide the following day at noon. Now comes the tricky part, and the source of much confusion. Let's say there are 2 blue-eyed islanders, Mort and Bob. When the foreigner makes his pronouncement, Mort and Bob look around and see only each other. Mort and Bob thus both temporarily assume that the other will commit suicide the following day at noon. Imagine their chagrin...

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre...