On Sortieing to Clojure-land

Raju Gandhi
  • July 2013
  • Clojure
  • Java

Clojure 1.2 (and partly 1.3) introduced protocols and records. These new constructs give us the ability to define not only new “types” in Clojure but also contracts, much like Java classes and interfaces. In this article we will take a look at what these new constructs mean and how to use them. We will also attempt to find out the reasoning behind these new constructs and how they differ from earlier constructs such as proxy and gen-class.

Venturing out

Prior to 1.2 Clojure already had a lot of support for Java interop. We had syntactic sugar to invoke Java constructors and gen-class to define new types. Unfortunately, all of these implicitly tied Clojure to its host runtime (in this case, the JVM), not to mention that we were forced to deal with Java’s semantics (for example, code reuse via inheritance). In 1.2 Clojure introduced protocols and records (along with types), which not only allow us to define types for our Clojure applications (if we so desire) but also do so in a way that is close to the “Clojure way”. In order to get the most out of them, we need to understand not only how they work, but also the design decisions behind them and, in some cases, the performance characteristics of their implementations. So, are you ready? Let us get typing! (pun intended)

reify

Every once in a while we need to implement a Java interface within our Clojure code. This is where reify comes into play. reify creates an actual Java instance, albeit of an anonymous class, so you can think of it as a way to create anonymous inner classes as you would in Java. The syntax for reify is as follows: it takes the fully-qualified name of the interface we need to implement, followed by zero or more method implementations, much like those we would have used for a multi-arity function definition in Clojure. There is one thing to look out for: each method definition needs to declare one additional parameter beyond those listed in the interface’s definition. This parameter needs to be the first in the parameter list and is often called this, because it is exactly that: upon invocation of a “reified” method, Clojure supplies the actual instance as the first argument, followed by the usual arguments that the method expects. This is useful when you need to refer to the owning instance (perhaps to call another method defined on that interface). Listing GAN-1 shows the syntax.
[I am using the #_ reader macro here which causes Clojure to skip over the following form so this is valid Clojure with the exception that it is not really useful.]

Listing GAN-1: Using reify

(reify java.io.FileFilter
  (accept [this path]
    (#_(do something here))))

Let us consider the example of filtering out hidden files and directories within a particular directory. We start with a java.io.File object to represent the directory in question, and then use the listFiles method to get back an array of java.io.File objects. Except in this case we will use the overloaded listFiles method that accepts a java.io.FileFilter. Since java.io.FileFilter is an interface we can use reify to give us a concrete implementation inline. Look at Listing GAN-2 to see how we can do this.

Listing GAN-2: Filtering out files using a java.io.FileFilter

(let [hidden? (memfn isHidden)
      visible? (complement hidden?)
      file? (memfn isFile)
      file-filter (reify java.io.FileFilter
                    (accept [_ path]
                      (and (file? path) (visible? path))))]
  (.listFiles (java.io.File. ".") file-filter))

Let us first understand what memfn does. memfn stands for “member function”. In Listing GAN-2 we would have ideally liked to do something like (complement isHidden). But this would not work! The reason is that isHidden is not a Clojure function (which is what complement expects). Rather, isHidden is a method on java.io.File. memfn allows us to elevate a second-class Java construct to a first-class Clojure citizen. Essentially, memfn wraps the provided method name in a Clojure function of one parameter, which invokes that method on the supplied argument (in this case a java.io.File).
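To make the wrapping concrete, here is a minimal sketch that compares memfn with two hand-written equivalents. The var names and the file name notes.txt are made up for illustration, and the file does not need to exist.

```clojure
(import 'java.io.File)

;; (memfn isHidden) behaves roughly like either hand-written wrapper below.
(def hidden-by-memfn?  (memfn isHidden))
(def hidden-by-fn?     (fn [^File f] (.isHidden f)))
(def hidden-by-lambda? #(.isHidden ^File %))

;; For a method that takes arguments, memfn needs placeholder names for them.
(def starts-with? (memfn startsWith prefix))

(let [f (File. "notes.txt")] ; hypothetical file name
  ;; All three wrappers agree on the same file.
  (println (= (hidden-by-memfn? f) (hidden-by-fn? f) (hidden-by-lambda? f)))
  (println (starts-with? "Clojure" "Clo")))
```

In practice the anonymous function form is often preferred, since it allows a type hint and so avoids reflection.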

With that out of the way, note that the first argument of accept is being referred to as _. This is an idiomatic convention to highlight that we are not going to be using a particular argument. In Listing GAN-2 that unused argument is the automagically supplied this. However, even though we are not going to use it, we’re certainly not allowed to drop it from the parameter list! Also, the anonymous instance created by reify is lexically scoped. That is, it can see any local bindings within scope (hidden? for example).
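The two points above, the implicit this and the lexical scoping, can be seen together in a small sketch. The function name extension-filter and the file names are hypothetical. The sketch also overrides toString from java.lang.Object, which reify permits.

```clojure
(defn extension-filter
  "Returns a java.io.FileFilter that accepts files whose name ends with ext."
  [ext]
  (reify java.io.FileFilter
    (accept [this path]
      ;; ext is a local from the enclosing function; reify closes over it
      (.endsWith (.getName path) ext))
    Object
    (toString [this]
      ;; this is the reified instance, so we can call accept on it
      (str "FileFilter for " ext " (accepts core.clj? "
           (.accept this (java.io.File. "core.clj")) ")"))))

(let [clj-only (extension-filter ".clj")]
  (println (.accept clj-only (java.io.File. "core.clj")))
  (println (.accept clj-only (java.io.File. "Main.java")))
  (println (str clj-only)))
```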

reify can also be used to create one-off instances of Clojure protocols and to override methods on java.lang.Object. The one restriction of reify is that it cannot be used to subclass a Java abstract or concrete class. In order to do that we will need to use proxy or gen-class. Both proxy and gen-class allow extending a base class in Java, with the difference being that proxy creates an anonymous instance (much like reify) while gen-class creates a named type. Off the cuff it sounds like proxy (and subsequently gen-class) does exactly the same thing as reify. So then why did Clojure introduce reify? The issue here is semantics. reify represents pure Clojure semantics, while proxy and gen-class expose Java semantics. Ideally when in Clojure you want to remain true to the abstractions that are Clojure’s. The following is Rich Hickey’s stance on reify:

Prefer reify to proxy unless some interop API forces you to use proxy. You shouldn’t be creating things in Clojure that would require you to use proxy.

— Rich Hickey
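For the case where an interop API really does require a subclass, the following is a minimal sketch of proxy extending an abstract class, something reify cannot do. The choice of java.util.AbstractList and the names data and as-list are only for illustration.

```clojure
;; java.util.AbstractList is an abstract class, so reify cannot be used here.
(def data [10 20 30])

(def as-list
  (proxy [java.util.AbstractList] []   ; [superclass] [constructor args]
    (get [i] (nth data i))             ; no explicit `this` parameter in proxy
    (size [] (count data))))

(println (.size as-list))
(println (into [] as-list))
```

Unlike reify, proxy method bodies do not declare this as a parameter; it is bound implicitly inside the body.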

Records (and types)

Records (and types) are Clojure’s answer to Java types, in that they are Java classes. Before we go any further, let us quickly take a look at how to declare (and use) records and types in Listing GAN-3.

Listing GAN-3: Definition and usage

(defrecord PersonRecord [name])
        
(deftype PersonType [name])
        
(let [record (PersonRecord. "Rich")]
  (println (str "Using map member syntax: " (:name record)))
  (println (str "Using Java member syntax: " (.name record))))
        
(let [type (PersonType. "Stuart")]
  (println (str "Only Java member syntax allowed: " (.name type))))
        
;; Output
;; Using map member syntax: Rich
;; Using Java member syntax: Rich
        
;; Only Java member syntax allowed: Stuart

Both PersonRecord and PersonType have similar definitions with the exception of the macro name invoked. The list of supplied parameters becomes a set of immutable fields that can be accessed using the dot notation as we would for any Java type. Construction is also similar to that for Java types. We should note that for records we can also access the fields as we would the keys of a map. The field lookups (for both records and types) are extremely performant, akin to Java member lookup. Outside of that, records and types look pretty similar. So what gives?

Well this is pretty much where the similarity ends, for under the covers, records and types are very different beasts that are designed for very different purposes. Let’s look at records first. Records automatically implement the clojure.lang.IPersistentMap interface, which allows them to act as regular Clojure maps. This also means that the entire associative API is available to us when working with records (such as assoc, dissoc etc). Let us take a look at Listing GAN-4 to see some of this in action.

Listing GAN-4: Records as maps

(defrecord PersonRecord [name])
        
(let [record (PersonRecord. "Rich")
      associated-record (assoc record :last-name "Hickey")
      dissociated-record (dissoc record :name)]
  (println (str "Last name is " (:last-name associated-record)))
  (println (str "associated-record is : " associated-record))
  (println (str "dissociated-record is : " dissociated-record)))
        
;; Output
;; Last name is Hickey
;; associated-record is : user.PersonRecord@e8606646
;; dissociated-record is : {}

We create a record using PersonRecord and then associate with it the key last-name. We can retrieve the last-name like we would name, but with a subtle difference. Earlier we saw that a member lookup on a record is extremely performant since it is a member lookup on a Java class. This does not extend to associated keys. One way to think of this is that the “baked” keys (those that are declared as part of the defrecord definition) and those that we tack on later behave differently. Furthermore, notice that associated-record is still a Clojure record. On the flip side, when we dissociate name from record to create dissociated-record, the result is no longer a record; it is just a regular Clojure map. If, however, we associate a new key into a record and later dissociate that same key, the record remains a record. Essentially, any time we remove one of the “baked-in” keys of a record, Clojure converts the result into a regular map.
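A quick way to check these rules at the REPL is to ask whether each result is still an instance of the record class. This is a small sketch; the binding names are made up for illustration.

```clojure
(defrecord PersonRecord [name])

(def rich       (PersonRecord. "Rich"))
(def with-extra (assoc rich :last-name "Hickey"))   ; add a non-field key
(def extra-gone (dissoc with-extra :last-name))     ; remove that same key
(def renamed    (assoc rich :name "Stuart"))        ; change a baked-in field
(def field-gone (dissoc rich :name))                ; remove a baked-in field

(doseq [[label value] [["added a key"      with-extra]
                       ["removed that key" extra-gone]
                       ["changed a field"  renamed]
                       ["removed a field"  field-gone]]]
  (println label "-> still a record?" (instance? PersonRecord value)))
```

Only the last case prints false; the first three results are still PersonRecord instances.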

When we define a record, Clojure gives us two "constructor" (or factory) functions to instantiate new records. Let us see how these work in Listing GAN-5.

Listing GAN-5: Record factories

(defrecord PersonRecord [name])
        
(let [using-> (->PersonRecord "Rich")
      using-map-> (map->PersonRecord {:name "Stuart"})]
  (println (str "using-> " (:name using->)))
  (println (str "using-map-> " (.name using-map->))))
        

->TypeName (or in our case ->PersonRecord) accepts arguments in the same order as defined in the defrecord declaration. If we wanted to be explicit in our argument naming, or prefer to supply the arguments in a different order, then map->TypeName (in our case map->PersonRecord) accepts a map, with the keys being the same as the parameter names in the defrecord declaration.
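The difference between the two factories is easier to see with more than one field. The sketch below uses a hypothetical two-field Person record and a made-up email address. It also shows what the map factory does with missing and extra keys.

```clojure
(defrecord Person [first-name last-name])

;; Positional factory: arguments follow the field order.
(def positional (->Person "Rich" "Hickey"))

;; Map factory: key order does not matter.
(def from-map (map->Person {:last-name "Hickey" :first-name "Rich"}))

;; Missing fields default to nil; extra keys are kept as ordinary map entries.
(def partial-person (map->Person {:first-name "Stuart"}))
(def with-extra     (map->Person {:first-name "Rich"
                                  :last-name  "Hickey"
                                  :email      "rich@example.com"}))

(println (= positional from-map))
(println (:last-name partial-person))
(println (:email with-extra))
```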

Types on the other hand are exactly like Java types. We cannot treat them as maps, but they do provide one advantage over records: they allow for mutable fields! But before we get too far into types, let us explore the reasoning behind having both records and types. In Java we have one type abstraction: Class. In the real world we use the same construct to define both infrastructural concerns such as StringUtils as well as domain constructs such as User. The problem is that we have the exact same abstraction for both! The API defined on the “class” determines how a client would interface with it.

Clojure takes a different approach. Clojure asks that we model our domain and application-level concerns as records, and use types for everything else. By leveraging records (or maps) for data modeling, the client need not cater to a specific API (as would have been the case with our Java User example). Rather the client has a standard interface to work with: the associative abstraction! With powerful constructs like assoc and merge, along with higher-order functions like filter, we have pretty much everything we need to sift, sort and munge data to our liking. Furthermore, if we were to start modeling our data using plain old Clojure maps, and then decided we needed type information, upgrading to a record means very little change to the code. [5]
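As a rough illustration of how little changes when upgrading from a map to a record, the sketch below runs the same client code against both. The Person record, the full-name function and the sample data are made up for this example.

```clojure
;; Client code written against the associative abstraction only.
(defn full-name [person]
  (str (:first-name person) " " (:last-name person)))

(defrecord Person [first-name last-name])

(def as-map    {:first-name "Rich" :last-name "Hickey"})
(def as-record (map->Person as-map))

;; The same function works unchanged on both representations.
(println (full-name as-map))
(println (full-name as-record))

;; merge and filter treat the record like any other map.
(def enriched (merge as-record {:language "Clojure"}))
(println (instance? Person enriched))
(println (count (filter #(= "Hickey" (:last-name %)) [as-map as-record])))
```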

Perhaps now we might be convinced that we may not employ types as much as we do records, so we are going to defer going into too many details of types for a later article.

Protocols

Protocols are to Clojure what interfaces are to Java. Protocols are mere contracts and offer no implementation. When we define a protocol in Clojure (using the defprotocol macro), Clojure creates a matching Java interface whose method signatures match those defined in the protocol (with the exception that the first parameter in each protocol method signature designates this, much like reify). The fact that a protocol translates to a Java interface makes it easy to reach into your Clojure abstractions from the Java side of the equation. All this (again) raises the question: why introduce protocols? Well, for one, protocols reflect true Clojure semantics. Protocols are first-class Clojure citizens and are not affiliated with Java (or any of the possible Clojure host platforms). Let us explore protocols a little bit more and perhaps we will be able to spot a few other differences.

Protocols are declared using the defprotocol macro, as shown in Listing GAN-6.

Listing GAN-6: Declaring protocols

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))

We declare two protocols, Show and Identify, both of which happen to have one method. As we discussed earlier, the first argument is special: it is reserved for the actual instance upon which the method is dispatched. Clojure uses single type-based dispatch for protocols. This means that it will use the type of the first argument to figure out which implementation to invoke.
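We can peek at the Java interface that defprotocol generates. The sketch below relies on the :on-interface key of the protocol map, which is an implementation detail of Clojure on the JVM rather than a documented API, so treat it as exploratory REPL code only.

```clojure
(defprotocol Show
  (pretty-print [this]))

;; Assumption: on the JVM the protocol map stores the generated
;; interface class under :on-interface (an implementation detail).
(def show-interface (:on-interface Show))

(def interface-methods
  (map #(.getName ^java.lang.reflect.Method %)
       (.getDeclaredMethods ^Class show-interface)))

(println (.isInterface ^Class show-interface))
(println interface-methods)       ; Java-side names are munged (- becomes _)
(println (fn? pretty-print))      ; the protocol method is a plain function
```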

How do we use protocols? Well, we have two macros: extend-protocol and extend-type. Both of these macros work with types (such as those defined by defrecord or deftype). Let us start with extend-type in Listing GAN-7.

Listing GAN-7: Extending types

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))
        
(defrecord PersonRecord
  [name])
        
(extend-type PersonRecord
  Show
  (pretty-print [this] (str "My name is " (:name this)))
  Identify
  (id [this] (.hashCode (:name this))))
        
        
(let [rich (PersonRecord. "Rich")]
  (println (pretty-print rich))
  (println (id rich)))
        
;; Output
;; My name is Rich
;; 2546940

extend-type is useful when we want to extend a particular type (in this case PersonRecord) to multiple protocols. For our fellow Rubyists out there, we should note that this is not the same as monkey-patching! For one, notice how we invoke the protocol methods — we invoke them as we would any other ordinary function declared in the namespace. There is no code injection or byte-code mangling going on here. Otherwise we would have invoked the protocol method as we would a Java instance method. Rather, protocol methods become just regular functions in the current namespace (which is a good reason not to have two protocols with the same method name and signature in the same namespace). Upon invocation, Clojure simply looks at the type supplied as the first argument to the protocol method and dispatches to that implementation.
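Because protocol methods are ordinary functions, they can be passed to higher-order functions like any other. The following sketch assumes the Show protocol and PersonRecord from the listing above and uses satisfies? to ask whether a value has an implementation.

```clojure
(defprotocol Show
  (pretty-print [this]))

(defrecord PersonRecord [name])

(extend-type PersonRecord
  Show
  (pretty-print [this] (str "My name is " (:name this))))

(def people [(PersonRecord. "Rich") (PersonRecord. "Stuart")])

;; pretty-print is a regular function, so it can be handed to mapv.
(def introductions (mapv pretty-print people))
(println introductions)

;; satisfies? reports whether a value's type has an implementation.
(println (satisfies? Show (first people)))
(println (satisfies? Show "just a string"))
```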

Just for grins, try Listing GAN-8 in your REPL.

Listing GAN-8: Inspecting an extended type

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))
        
(defrecord PersonRecord
  [name])
        
(extend-type PersonRecord
  Show
  (pretty-print [this] (str "My name is " (:name this)))
  Identify
  (id [this] (.hashCode (:name this))))
        
;; filter out any method names that match
;; the protocol method names
(filter
  #(re-find #"^pretty-print$|^id$" %)
  (map #(.getName %) (.getMethods PersonRecord)))
        
;; Output
;; ()

Here we are getting all the methods on the PersonRecord type, and mapping getName over each one. We then filter out all the method names that do not match pretty-print or id using re-find. re-find stands for "regex find"; it returns the match if the string matches the supplied regex pattern and nil otherwise, which filter subsequently discards. In essence, we list only those methods whose names match the regular expression.

The filtered list is empty! This tells us that there are no methods on the PersonRecord that have the same names as the ones defined by either of the protocols that we extended to it. See? Told ya!

Another way to extend a protocol is to use extend-protocol. extend-protocol has almost the same syntax as extend-type, except the type name and protocol names are reversed. While extend-type is used when you want to extend a single type to multiple protocols, extend-protocol is used when you need to extend a single protocol to several types. Everything else remains the same. Let us take a look at Listing GAN-9.

Listing GAN-9: Extending protocols

(defprotocol Show
  (pretty-print [this]))
        
(defrecord PersonRecord
  [name])
        
(deftype PersonType
  [name])
        
(extend-protocol Show
  PersonRecord
  (pretty-print [this] (str "I am a record: " (:name this)))
  PersonType
  (pretty-print [this] (str "I am a type: " (.name this))))
        
        
(let [rich (PersonRecord. "Rich")
    stuart (PersonType. "Stuart")]
  (println (pretty-print rich))
  (println (pretty-print stuart)))
        
;; Output
;; I am a record: Rich
;; I am a type: Stuart

Here we extend the Show protocol to the PersonRecord and the PersonType. We provide implementations to satisfy the protocol in each case, and we are off to the races. Another point to note is that we don’t have to implement all of the methods defined in the protocol for a particular type. If an unimplemented function is invoked then Clojure will throw an exception.
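To see the partial-implementation behavior, here is a small sketch with a hypothetical two-method protocol named Describe, of which only one method is implemented. With extend-type, the missing method surfaces as an IllegalArgumentException when called.

```clojure
(defprotocol Describe        ; hypothetical protocol with two methods
  (summary [this])
  (details [this]))

(defrecord PersonRecord [name])

;; Only summary is implemented; details is deliberately left out.
(extend-type PersonRecord
  Describe
  (summary [this] (str "Person: " (:name this))))

(def rich (PersonRecord. "Rich"))

(println (summary rich))

(println (try
           (details rich)
           (catch IllegalArgumentException e
             (str "Caught: " (.getMessage e)))))
```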

Let us take a step back and consider the implications of what we have seen so far. In our examples so far, we declare a protocol, and then extend it to an existing type. This is very different from Java interfaces, which have to be implemented when defining a type, and thus are statically baked into the type’s definition. This implies that we could take any existing type in Clojure and extend a protocol to it to match a particular construct. Let us see if we can extend a protocol to a java.lang.String in Listing GAN-10.

Listing GAN-10: Extending protocols to java.lang.String

(defprotocol Identify
  (id [this]))
        
(extend-type String
  Identify
  (id [this] (.hashCode this)))
        
(id "Rich")
        
;; Output
;; 2546940

Now, our Strings are identifiable! We must bear in mind that we are not monkey-patching String. Furthermore, since id is just a regular namespaced function (referred to as user/id), there is no chance of it colliding with another developer’s id function! Protocols allow us to wire up abstractions in ways that the original authors of a type could not have conceived. So the next time we hear our colleagues complain about the “expression problem” [6] we can share a knowing smile (and perhaps if we like them enough we might even introduce them to Clojure protocols).

Another way to extend a protocol is to reify it. This, as we have seen earlier, gives us the ability to define the implementation inline, as well as an anonymous instance. Let us quickly see this in action in Listing GAN-11.

Listing GAN-11: reifying protocols

(defprotocol Show
  (pretty-print [this]))
        
(let [show (reify Show
             (pretty-print [_] "I am anonymous"))]
  (pretty-print show))
        
;; Output
;; "I am anonymous"

Both extend-protocol and extend-type rely on a function named extend to do the heavy-lifting. We can use extend directly as shown in Listing GAN-12.

Listing GAN-12: Using extend

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))
        
(defrecord PersonRecord
  [name])
        
(extend PersonRecord
  Show
  {:pretty-print #(str "My name is " (:name %))}
  Identify
  {:id #(.hashCode (:name %))})
        
(let [rich (PersonRecord. "Rich")]
  (println (pretty-print rich))
  (println (id rich)))
        
;; Output
;; My name is Rich
;; 2546940

extend works similarly to extend-type, in that the first argument is a type, followed by the protocols we want to extend. In this case, however, each implementation is a map whose keys are the method names (as keywords) and whose values are the functions that implement them.

We might wonder: what is the point? After all, extend-type does this with perhaps a nicer format. The takeaway here is that the protocol implementations are mere maps! This implies that we can construct these maps using the powerful associative API available to us! We could construct a map of function definitions elsewhere, and simply merge or assoc it in to implement a protocol for a particular type. Check it out in Listing GAN-13.

Listing GAN-13: Using extend with the associative API

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))
        
(defrecord PersonRecord
  [name])
        
(def mixin
  {:pretty-print #(str "I am mixed in. Name is: " (:name %))})
        
(extend PersonRecord
  Show
  (merge {:pretty-print #(str "My name is " (:name %))} mixin)
  Identify
  {:id #(.hashCode (:name %))})
        
(let [rich (PersonRecord. "Rich")]
  (println (pretty-print rich))
  (println (id rich)))
        
;; Output
;; I am mixed in. Name is: Rich
;; 2546940

Note that we start with one implementation of pretty-print but on a merge with mixin we actually replace the implementation with a new one. This implies that we can introduce, or even change, how a protocol gets extended for a type dynamically! We can start with a default implementation if we have one, and swap it out for another as our logic dictates.
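The swap can also happen after the fact, by calling extend again with a different map. This is a minimal sketch; the names plain-impl and loud-impl are hypothetical, and the second map is derived from the first using ordinary map and function operations.

```clojure
(require '[clojure.string :as string])

(defprotocol Show
  (pretty-print [this]))

(defrecord PersonRecord [name])

(def plain-impl
  {:pretty-print (fn [p] (str "My name is " (:name p)))})

;; Derive a second implementation by wrapping the first one.
(def loud-impl
  (update-in plain-impl [:pretty-print]
             (fn [f] (comp string/upper-case f))))

(extend PersonRecord Show plain-impl)
(def before (pretty-print (PersonRecord. "Rich")))

;; Extending again replaces the earlier implementation for this type.
(extend PersonRecord Show loud-impl)
(def after (pretty-print (PersonRecord. "Rich")))

(println before)
(println after)
```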

There is one more, and final, way to implement a protocol, and that is to inline it. Let us glance at Listing GAN-14 and see how this works.

Listing GAN-14: Inline protocols

(defprotocol Show
  (pretty-print [this]))
        
(defprotocol Identify
  (id [this]))
        
(defrecord PersonRecord
    [name]
  Show
  (pretty-print [this] (str "My name is " (:name this)))
  Identify
  (id [this] (.hashCode (:name this))))
        
(let [rich (PersonRecord. "Rich")]
  (println (pretty-print rich))
  (println (id rich)))
        
;; Output
;; My name is Rich
;; 2546940

In Listing GAN-14 we use the usual defrecord but then we follow it up with the protocols we want to extend (and their corresponding implementations). We know that when we define a new type in Clojure, either by using defrecord or deftype, we get an actual Java class backing it. We also know that defprotocol creates a Java interface. When we use the inline method to extend a protocol, Clojure combines the two — that is, it creates a new Java type that actually implements the necessary interfaces, because it has everything it needs to do so. If we were to compile Listing GAN-14 and run it through a Java decompiler we would see the class definition in Listing GAN-15.

Listing GAN-15: Class backing an inline protocol extension

public final class PersonRecord
  implements Identify, Show,
             IRecord, IHashEq,
             IObj, ILookup,
             IKeywordLookup,
             IPersistentMap,
             Map, Serializable
{ 
  // ...
}

Note that the PersonRecord class implements both Show and Identify. This means that dispatch happens super fast since the protocol implementation is baked into the type definition. Of course this comes with a penalty. For one, we cannot extend (inline) two protocols that happen to declare methods with the same name and signature. Two, we lose the flexibility we saw earlier in Listing GAN-13 that allowed us to defer implementations to runtime. With that said, this is probably the best approach to use when we know exactly what our type’s contract is to be up-front.
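The difference between the inline and the extend-type approach can be observed from the REPL without a decompiler. The sketch below defines two hypothetical records, InlinePerson and ExtendedPerson, and again leans on the :on-interface key of the protocol map, which is an implementation detail on the JVM.

```clojure
(defprotocol Identify
  (id [this]))

;; Inline: the generated class itself implements the protocol's interface.
(defrecord InlinePerson [name]
  Identify
  (id [this] (.hashCode ^String (:name this))))

;; Extended after the fact: the class is left untouched.
(defrecord ExtendedPerson [name])

(extend-type ExtendedPerson
  Identify
  (id [this] (.hashCode ^String (:name this))))

(defn implements-interface?
  "True if x's class implements the Java interface behind Identify.
   Assumes the :on-interface key, an implementation detail."
  [x]
  (instance? (:on-interface Identify) x))

(def inline   (InlinePerson. "Rich"))
(def extended (ExtendedPerson. "Rich"))

(println (= (id inline) (id extended)))     ; same behavior either way
(println (implements-interface? inline))
(println (implements-interface? extended))
(println (satisfies? Identify extended))
```

Both records satisfy the protocol and behave the same; only the inline one carries the interface in its class definition.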

I will end with yet another nugget of wisdom from Rich Hickey:

Prefer using protocols to specify your abstractions, vs interfaces.

— Rich Hickey

Conclusion

Clojure offers a whole slew of semantics for defining types and contracts. Furthermore, with the introduction of protocols and records, we now have true first class citizens in Clojure-land, abstracting us further and further away from Java-land. Clojure allows us to incrementally model our applications, starting with maps, and graduating to records and protocols as we deem necessary.
