пятница, 21 ноября 2014 г.

Zipping Sequences in Clojure

It is no secret that most of our programs involve processing sequences and this is especially true about programs written in any flavor of LISP, including Clojure. Because sequences play such an important role we frequently want to combine them into something bigger. There are dozens of possible ways to get a collection from several collections, but I want to stop on zipping sequences together.

Zipping means taking two (or more) collections and transforming them into a collection of tuples (pairs in case of two collections), so that we get a tuple of first elements of every input sequence followed by a tuple of second elements and so on. In most programming languages that I am familiar with there is a function that performs the zip operation. Here is what it looks like in C# (Zip is an extension method that lives in the System.Linq namespace):

var a = new List<int> { 1, 2, 3 };
var b = new List<int> { 10, 20, 30 };
a.Zip(b, (x,y) => new Tuple<int, int>(x, y))

//=> [{Item1:1, Item2:10}, {Item1:2, Item2:20}, {Item1:3, Item2:30}]

For the LINQ's Zip we have to provide a couple of collections and a function that will produce an element of the result from the elements of the inputs - I have chosen to create a Tuple of two ints, but it is possible to do virtually anything here. Obviously, one can go further and zip 3 or more sequences through chained calls to Zip:

var c = new List<int> { 100, 200, 300 };
a.Zip(b, (x,y) => new Tuple<int, int>(x, y))
    .Zip(c, (xy,z) => new Tuple<int, int, int>(xy.Item1, xy.Item2, z));

//=> [{Item1:1, Item2:10, Item3:100}, {Item1:2, Item2:20, Item3:200}, {Item1:3, Item2:30, Item3:300}]

As you can see, in C# this can get quite verbose, but that's the price of control over the form of the resulting items. Python takes a different approach and simply assumes that when we are zipping something we want a list of tuples back:

zip([1, 2, 3], [10, 20, 30], [100, 200, 300])

#=> [(1, 10, 100), (2, 20, 200), (3, 30, 300)]

That's what Python is praised for - it's simple and succinct.

Now let's go and see what Clojure has to say on the matter. The first impression from peeking into the cheatsheet is that there isn't any analog for zip - the closest thing that one can find is the zipmap function. zipmap actually does something quite similar, but as its name suggests it produces a hashmap instead of a vector or list:

(zipmap [1 2 3] [10 20 30])

;=> {3 30, 2 20, 1 10}

Most importantly, this means that when using zipmap we have to assume that one of the sequences holds keys, which is not always possible. Additionally, hashmaps don't preserve the order of items essential in many cases. And to take it even further, zipmap doesn't play well with several collections - it takes only two arguments and it is not easy to combine the result with another sequence in a meaningful way. Does that mean that Clojure, which is so passionate about data structures and sequences in particular, can't zip two vectors? Absolutely not - it's just the case that the simplest way to do it is a bit obscure.

Clojure has a vector function that wraps whichever parameters you pass it into a vector. Basically, (vector 1 2 3) yields [1 2 3]. Another Clojure function that is relevant here is map. This one calls a given function with each element of the sequence passed to it and returns a collection of results. The cool part is that map can take any number of input collections extracting one argument for the function being mapped from each of the collections. Essentially, this means that map over several sequences zips them into one according to the provided function. Thus, if we want an analog of Python's zip it is as simple as this:

(map vector [1 2 3] [10 20 30] [100 200 300])

;=> ([1 10 100] [2 20 200] [3 30 300])

So the next time you need to zip several vectors or lists into a sequence of tuples of respective elements you know how to do it. One may argue that the Clojure's "version of zip" is less elegant than the Python's one and that's true to some extent. However, it is almost a rule that when something is very easy to do using the existing building blocks Clojure doesn't implement it for us. In this particular case I do like this approach because it allows me to think of zipping as a more general operation than just combining a number of collections into a single collection of tuples. In some sense it feels like Clojure's way to zip collections combines the elegance and simplicity of the Python's zip with the high level of control of the C#'s one. For example, if my goal is to add a couple vectors together, I could do something like this:

(defn addvec [a b]
    (map (partial reduce +)
        (map vector a b)))

(addvec [1 2 3] [10 20 30])

;=> (11 22 33)

But taking into account the aspects of map function that we have discussed, it is very easy to simplify this dramatically. Do we really need this intermediate representation of the result - the collection of tuples? No, we want only the sums of elements, so let's avoid verbosity and simply map + over both collections at once:

(defn addvec [a b]
    (map + a b))

Furthermore, map allows for arbitrary number of input collections, so why focus on the edge case of adding together two vectors? The only thing that we have to add into our mix to allow for arbitrary number of input vectors is the apply function:

(defn addvec [& vs]
    (apply map + vs))

(addvec [1 2 3] [10 20 30] [100 200 300])

;=> (111 222 333)

This ability to produce concise and generic solutions is maybe the most impressive feature of Clojure. Of course, you can use all these functions to achieve your own goals and even though they are likely much more ambitious than summing vectors I bet you will be amazed by the simplicity of the solutions.

Here are some links to the documentation:

1. vector
2. map
3. apply
4. zipmap
5. mapv (if you want the result as a vector instead of a lazy sequence)

And a couple of insightful Stack Overflow discussions, which helped me find the way to zip collections in Clojure:

1. One
2. Two
3. Three

Комментариев нет:

Отправить комментарий