High-Level Overview

Uniform Resource Identifiers (URIs) are the familiar strings that we use to navigate to websites, to create links that send emails or make phone calls, or even to connect to a database. However, unlike other strings that seem like they should follow a certain format but don't (like names, addresses, and even phone numbers), URI strings are rigorously specified and must adhere to RFC 3986 to be considered valid.

For this reason, it is advisable to convert any URI you have to deal with into structured data at the "edge" of your code, and to work with it as structured data within your code. This holds even though URIs are most commonly transmitted between processes or machines as strings, for example when you type an address into a browser's navigation bar or read a database connection string from an environment variable. Convert a URI back to a string only when it again reaches the edge of your code: when it is being transmitted to a receiver that expects a string, displayed in a UI that expects a string, or stored on a medium that cannot efficiently serialize and restore structured data (e.g. a database with a "string" column for URIs).
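As a minimal sketch of this edge-in, edge-out pattern, the snippet below parses a connection string into a plain map, changes one component, and only then turns it back into a string. It uses java.net.URI through JVM interop purely because that needs no extra dependency. The function names parse-uri and unparse-uri, the map keys, and the example connection string are all assumptions made for illustration, and the literal string stands in for a value that would normally be read from an environment variable.

```clojure
(import 'java.net.URI)

(defn parse-uri
  "Edge in: turns an incoming URI string into a map of its components.
  (Hypothetical helper; the key names are illustrative.)"
  [s]
  (let [u (URI. s)]
    {:scheme    (.getScheme u)
     :user-info (.getUserInfo u)
     :host      (.getHost u)
     :port      (.getPort u)      ; -1 when the URI has no port
     :path      (.getPath u)
     :query     (.getQuery u)
     :fragment  (.getFragment u)}))

(defn unparse-uri
  "Edge out: turns the component map back into a string."
  [{:keys [scheme user-info host port path query fragment]}]
  (str (URI. scheme user-info host (int port) path query fragment)))

;; Stand-in for a value read at the edge, e.g. from an environment variable.
(def conn (parse-uri "postgresql://app@db.internal:5432/orders?sslmode=require"))

;; Inside the application, the URI is ordinary data.
(def replica (assoc conn :host "replica.internal"))

(unparse-uri replica)
;; => "postgresql://app@replica.internal:5432/orders?sslmode=require"
```

Because the host is changed with assoc rather than with a regular expression, the rest of the URI cannot be corrupted by the edit. Note that this sketch reads the decoded path and query, so it is not suitable for URIs whose components contain percent-encoded delimiters.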

This is not a pattern that you need to apply if your application handles URIs only incidentally (i.e. as part of a payload being transmitted between two points, but not being manipulated or introspected), or if it never modifies, constructs, or destructures URIs. However, if you do any sort of manipulation with URIs, doing so with traditional string manipulation tools (such as regular expressions) is bound to lead to trouble.

Candidate Solutions

There are a number of libraries available in Clojure to work with URIs as structured data. Since Clojure is a hosted language, there is always the option of using a library from the underlying platform to manage URIs, such as JavaScript's URL or Java's URI. There are, however, a number of Clojure-specific options:

  • exploding-fish: A Clojure-only library that can handle accessing parts of URIs, updating URIs, normalization, and resolution
  • com.cemerick/url: A now-archived library that works across Clojure and ClojureScript and allows access to parts of URIs
  • Urly: A tiny, Clojure-only library whose protocol for accessing parts of URIs works with strings, java.net.URIs, java.net.URLs, and its own representation of URIs
  • uri: A Clojure-only wrapper around java.net.URI that enables construction of URIs and conversion to maps of their parts
  • lambdaisland/uri: A pure Clojure and ClojureScript library that represents URIs as records and can join URIs as well as construct/deconstruct and manipulate them

The Gaiwan Recommendation

Perhaps unsurprisingly, we recommend lambdaisland/uri. This is our recommendation not only because it is the library that we developed and maintain, but also because it does not rely on an underlying platform dependency for its functionality. This may seem like a trivial concern, since the URI libraries for the platforms Clojure runs on seem so stable, but the same could have been said about Java's date and time classes in 2013, before Java 8 superseded them with the new java.time API. Having complete control over how a URI's components are represented and manipulated within the library also ensures consistency across platforms.
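To give a feel for what working with URIs as records looks like, here is a short sketch. It assumes lambdaisland/uri is already on the classpath, and it uses only the uri and join functions plus ordinary keyword lookup and assoc. The example.com addresses are made up for illustration.

```clojure
;; Assumes the lambdaisland/uri dependency is available.
(require '[lambdaisland.uri :as uri])

;; Parse once; the result is a record with one field per component.
(def base (uri/uri "https://example.com/docs/guide/?page=2#intro"))

(:host base) ;; => "example.com"
(:path base) ;; => "/docs/guide/"

;; Changing components is ordinary map manipulation.
(def api (assoc base :host "api.example.com" :fragment nil))

(str api)
;; => "https://api.example.com/docs/guide/?page=2"

;; join resolves a relative reference against a base URI.
(str (uri/join "https://example.com/docs/guide/" "../images/logo.png"))
;; => "https://example.com/docs/images/logo.png"
```

The same forms are intended to work in both Clojure and ClojureScript, since the library does not delegate to a platform URI class.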

Caveats and Potential Pitfalls

Perhaps the most important caveat to be aware of when working with URIs is that, since they are a data structure that is most often represented and transmitted as strings, they have their own encoding of string data, known as percent-encoding. It is important to be aware of when a component of a URI needs to be percent-encoded and when it does not. For example, if a path element contains a "/" and you do not percent-encode it (as "%2F"), you may inadvertently create an additional path element.
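The "/" example can be made concrete with a small sketch. The helpers encode-segment and path-segments are hypothetical names written for this illustration, not part of any library mentioned above. encode-segment percent-encodes every byte outside RFC 3986's unreserved set, which is stricter than the RFC requires but safe for a single path element. The sketch is JVM-only, since it uses java.net.URI to read back the raw path.

```clojure
(require '[clojure.string :as str])
(import 'java.net.URI)

(defn encode-segment
  "Percent-encodes every UTF-8 byte of s that is not an RFC 3986
  unreserved character (letters, digits, '-', '.', '_', '~')."
  [s]
  (apply str
         (for [b (.getBytes ^String s "UTF-8")
               :let [n (bit-and b 0xff)   ; bytes are signed on the JVM
                     c (char n)]]
           (if (re-matches #"[A-Za-z0-9\-._~]" (str c))
             c
             (format "%%%02X" n)))))

(defn path-segments
  "Splits the raw (still encoded) path of a URI string into its elements."
  [uri-string]
  (->> (str/split (.getRawPath (URI. uri-string)) #"/")
       (remove empty?)
       vec))

;; A single path element that happens to contain a slash.
(def file-name "2024/q1.pdf")

;; Naive concatenation silently turns one element into two.
(path-segments (str "https://example.com/files/" file-name))
;; => ["files" "2024" "q1.pdf"]

;; Encoding the element first keeps it intact.
(path-segments (str "https://example.com/files/" (encode-segment file-name)))
;; => ["files" "2024%2Fq1.pdf"]
```

The encoding has to be applied to each element separately, before the elements are joined with "/". Encoding the whole path after the fact would also encode the separators that were meant to stay.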