четверг, 22 мая 2014 г.

You Can Learn While You Are Programming

When entering my sixth year at the university a couple years ago I faced a very severe problem. That was the time when new MOOCs started appearing at the speed of light: Udacity was dense with Sebastian Thrun, Peter Norvig and other awesome folks, while Coursera simply shipped dozens of new courses from major universities each month. The abundance of high quality learning materials as well as the freshness of the idea that you can freely learn whatever you want from the world’s best tutors caught me unarmed and I spent a lot of time just watching what I could learn in addition to actually putting effort into some of the courses. However, gradually a concern started building up in me: I felt like I put all my energy into learning instead of creating something new (this is obviously a fallacy, because I just wasted a great deal of my time on other useless stuff.) At that time I had a part-time job, which made me do stuff, but somehow I still thought that my time, which I could devote to producing new things, is wasted on merely listening to lecturers and doing quizzes and homeworks. Quite quickly, this idea led me to giving up all the MOOCs that I had engaged into and switching my efforts to the attempts to create something worthy for the awesome Windows 8 platform. I ended up hacking a brand new task manager app, which would have definitely become cool, should it ever enter the Windows Store. I really put a lot of time into this thing and pushed it quite far – at least that's what it looked like for me – but all that was not enough to make an actual product of it. Somehow, my attempts to become a producer instead of a student only taught me a great deal of things, but did not end with delivering goods.

These days things look quite different. First of all, I have a full-time job which eats a solid share of my time and energy, but also gives the feeling of involvement into creation of something large and cool in exchange. Besides, over the last months I have built a couple Windows Phone apps to fulfill my old desire to be a creator and deliver content to the public. I also engaged into other activities, which feed on my free time, so little of it is left to learning. However, as one might guess from my long post about MOOCs, this all led me to worrying about the fact that I don’t allocate enough resources to expanding my education, firming its basement and exploring new grounds. From time to time I literally feel the lack of inflow of fundamental knowledge to solidify my education. Not that it prevents me from working or makes my life unbearable, but once or twice a week I do stumble upon the thought that I should spend some time doing online courses, which could fill the gaps in my education and make me think on the problems different from those bothering me in my developer’s life. Looks familiar, huh?

This way I keep oscillating between the two goals – learning more and producing more – and constantly thinking that I don’t work hard enough towards one of them. Honestly speaking, even though this sometimes makes me nervous and angry with myself I am grateful to my subconscious for bothering me. When I lean too much on the learning side, my mind kindly reminds me that there is a lot that a programmer aged 24 can produce to his or her and others benefit in the world, where software brings new opportunities and solves real problems every day. On the other side, once I engage too eagerly into creating (mostly useless) stuff, I soon get reminded that to stay afloat in our rapidly changing industry one has to keep learning constantly and do this not only to be on the front edge of technologies and Javascript frameworks, but also to learn and remember how to learn. For me this latter activity involves exposing myself to some fundamental science and ideas, which can serve as a good basement for applied programming stuff that I need to deliver great software. The problem here is that our time and energy are limited even when we are still young and thus it may seem difficult to do well in both areas. Seriously, if you are a lucky prisoner of 5x9 job and try to, for example, create some lovely mobile apps during your free time, it is hard to stick something like a dense Quantum Physics course into the schedule even if you can attend it in your bathroom or elsewhere. I didn’t manage to do this with Quantum Physics, but I have positive experience with a couple other courses as well as a few tutorials and must admit that it is much easier to combine studying and producing than it might seem at the first glance.

First of all, to be able to both learn and create I absolutely had to set proper goals for a long period of time – at least for the year 2014. In terms of MOOCs I have quite a modest target of passing 5 courses by the end of December. I am not sure if that’s enough to satisfy my appetite for new knowledge, but at the same time I am confident that trying to do more won’t work out well in combination with other challenges that I have chosen to face. For now I have done a course and about a half of another one, which is a bit less than what a uniform schedule would require. Still, I believe I will be able to deliver on my goal, should I increase the pace just a little bit. On the other side, my targets for learning applied stuff like languages, libraries and frameworks are not that measurable and clear, but this is compensated by the way I approach them.

The thing that really helped me learn a lot and keep going with my applications at the same time is a routine that I have developed. On a typical workday before showing up in the office I would usually spend about an hour with my laptop in a coffee-shop and that’s the time that I can devote to doing tutorials on AngularJS or playing with Clojure macros. The trick here is that I know that I can spend my morning either learning some stuff this way or working on a future post for my blog. Since I have a pretty defined schedule for the blog, a lot of mornings are left for studying new tools and that’s just cool. This approach helps a lot because I always have some time to put into “applied education” and in addition to this I also make myself finally wake up and get ready for job before actually appearing at the office.

With MOOCs the story is a bit different: I tend to allocate time for watching lectures and doing related exercises on weekends - particularly because they usually require longer and more focused timespans than just doing a tutorial or hacking some code. The fact that weekends come only once a week encourages me to put some time into a course on Sunday, because I know that if I don’t do this, the next chance will be no sooner than in 5 days. On the other side, because with online courses I am not as rigorous as with morning tutorials and have much more options to start my Sunday, I perform a bit worse here.

So, over the last years I have finally learnt that despite having limited time we still can and should put effort into both creating and learning. Fortunately for the educational side of things, we typically have a lot of time during any particular day that we can’t devote to creative activities, and which can be spent exploring new fields. When one has these periods depends on the person and their daily schedule: for me that’s usually mornings when I am not yet ready to think really hard and produce something. For someone else the best time to study may be, for example, in the evening because one is too tired from job and the whole day of debugging SQL scripts. Identifying these ‘non-productive’ intervals in your schedule and making yourself routinely use them for learning can really help expand your education and develop yourself professionally. On the other side, I feel that the contrast between learning and producing that bothers me so much is to some extent made up. In the end, when striving to deliver something new one will inevitably have to study a lot – I definitely did learn much from my Windows Phone affairs.

So, how do you satisfy your natural need for learning new stuff and balance it with delivering awesome products? What are the routines that help you do both these things and leave some time for entertainment and personal life? Please tell me in the comments – knowing how other people organize their time and work towards their goals can really help others a lot!

пятница, 9 мая 2014 г.

How to Make an SQL from Clojure or My Macros Kata

Despite the fact that I started learning Clojure about 2 years ago, until recently I didn’t explore, maybe, the most fascinating part of the language – macros. I did try to compose simple macros once or twice, but generally they failed to grasp my attention and I never understood how they work and, more importantly, when should I use them. To fix this, last month I decided to implement something interesting with the help of macros and here I will show what I did and sum up what I learnt.

The example that I’ve taken might be not the most suitable for the goal, but it worked out quite well. The idea was to allow for writing SQL-like queries against the lists of Clojure dictionaries or records – basically anything that allows key-based access to elements. The value of such a thing is doubtful, but SQL is a well-known Domain Specific Language and the fact that macros are considered a great tool for creating new languages in Clojure made me chose it as an exercise. Besides, I remember that there was an exercise like this – similarly with SQL-like queries – in the Joy of Clojure book, which made me think that this task suits my learning goals well. However, I intentionally avoided peeking into the book to not spoil my exercise and, kind of, do everything myself without borrowing others’ ideas and solutions.

The results at which I arrived are quite fascinating: I am now able to write almost real SQL against lists of records. The key difference between my queries and SQL is the use of keywords as the names of fields to select and filter on. In addition to that, well, my selects come in parenthesis, because I am still writing LISP. Otherwise, everything looks very similar to the queries one would execute against a database and that feels pretty awesome. Now, let me share my excitement and show you the code.

In the first article we will begin with simply selecting some fields from the records and add where clause for filtering. In the second one I will show how we can enable joining tables together and ordering the results. So, what we want to achieve at the first step is to be able to write a query like this:

(def table [
  {:a 1 :b 100 :c "100" :d 4}
  {:a 2 :b 200 :c "200" :d 3}
  {:a 3 :b 300 :c "300" :d 2}
  {:a 4 :b 400 :c "400" :d 1}])

(select :a :b from table)

This is not a very difficult task, because all that we have to do is limit the set of key-value pairs in each dictionary in the resulting list to only the keys, which are specified in the query. On the other hand, we have this from symbol to treat, but that’s quite easy as well.

(defmacro select [& what]
  (let [fields (set 
          (take-while #(not= 'from %) what))
      source (fnext 
          (drop-while #(not= 'from %) what))]
      `(map (fn [record#]
            (into {} (filter #(~fields (first %)) record#)))
          ~source)))

It turns out that similarly to the functions I can’t easily write a macro, which solves precisely one problem, so several things are going on in select. First, we have to get hold of a set of keys denoting fields to select. All of them come before the from word and for now we can assume that everything that goes before it is a field name. This said, we just take elements of the argument list until we meet from, push them into a set and bind it to the fields symbol. In a very similar fashion we extract the table, from which we want to select records – I expect that it is found right next to from, so I skip everything until from as well as from itself and grab the first element of the remaining sequence. Having fields and source, we can actually define what our macro will return to the caller. We simply map a function that filters each record according to the presence of the key in the fields set and composes a hashmap from the resulting key-value pairs  onto the “table”. Although this doesn’t feel like the most efficient and nice solution, I’m okay with it for now.

Because we are writing a macro we want to return something that will be evaluated upon execution of the program – that’s why we syntax-quote our map call with the `. Syntax-quoting prevents the form from being evaluated, but unlike simple quote (‘) it allows us to avoid unquoting map, fn and all the other stuff that is used inside it. One of the problems with quoting surfaces when we want to use a binding inside the body of a macro – in our case with fn. The idea is that binding requires a symbol and we need to introduce it somehow. There are several alternatives, but the simplest one is to use the gensym thing through suffixing a symbol with # character. When encountered with it inside a syntax-quoted form, Clojure will generate a non-qualified symbol, which we can then use. Finally, in select we need to unquote fields and source so that proper stuff will be inserted in the respective places. If you forget to unquote these symbols, syntax reader will try to resolve them to fully-qualified ones. This may result either in a failure in case you don’t have such a symbol outside the macro, or in a subtle failure – if you have one. The latter situation might lead to great confusion: the code that used to work will at some moment get broken because its surroundings have changed. Fortunately, in most cases such an error will surface quite soon if not instantly. Having done all these things right, we can check whether they work and, fortunately, they do:

(select :a :b from table)
;=> ({:a 1, :b 100} {:a 2, :b 200} {:a 3, :b 300} {:a 4, :b 400})

Let us try to understand what did macro allow us to do here that a function wouldn’t permit. Most notable thing is our use of the from symbol. Despite the fact that it is not bound to anything we can still utilize it in the macro – as a separator. A function wouldn’t allow that because it would try to evaluate all the arguments passed into it and fail with from. Macros, on the other side, receive their arguments in unevaluated form, which allows us to play with them in different and strange ways. Note that we could achieve the same behavior with a function if we used, for example, a keyword :from instead of the pure symbol, so there is actually little value in using macros like this. However, knowing that Clojure allows us to write down some utterance and can live with it is pleasant and inspiring.

Now, when we can select stuff, let us introduce filtering. This task is less trivial not only due to more complex constructs needed for conditions, but also because they will come in infix notation, which is natural for SQL, but hardly a common thing in LISPy Clojure. Even though this may sound difficult, it boils down mainly to tossing the list of arguments and just requires covering a couple of cases and careful use of recursion.

To make filtering possible we will define a macro that creates a Clojure-evaluatable condition from an SQL-like one. First, we will try to cover the most simple case – something like select :a :b from table where :a < :d. Our new macro – let’s call it condition – will get the :a < :d part of the above query from the select and return something like (< (:a table) (:d table)) – the piece, which Clojure will easily consume. The first version is going to be ve-e-ery simple:

(defmacro condition [t k1 op k2]
  `(~op (~k1 ~t) (~k2 ~t)))

The only thing that we do here is changing infix notation into prefix one and accessing the values from a record with the keys specified in the condition. We can use macroexpand to check that condition easily treats simple cases:

(macroexpand '(condition table :a < :d))
;=> (< (:a table) (:d table))

(macroexpand '(condition table :b = :c))
;=> (= (:b table) (:c table))

The fact that the macro fits easily into one line makes me ask myself whether there is any reason to writing it. However, instead of trying to find an answer I will simply fix the cause and… make it longer. One thing that condition misses is the ability to consume literals. For example, it is pretty natural to write a filter like this: where :a > 2 and we definitely should support this. We will do it the simple way – if we face a keyword in a condition, we will use it to get the corresponding value from the record and inject it in the resulting code. Otherwise, we won’t do any transformations to the value and will include it into the result as is:

(defmacro condition [t k1 op k2]
  (let [v1 (if (keyword? k1) `(~k1 ~t) k1)
      v2 (if (keyword? k2) `(~k2 ~t) k2)]
  `(~op ~v1 ~v2)))

Now condition treats both keys, which denote fields of the records, and mere numbers or strings equally well allowing us to filter both based on relationships between fields and on their comparison with some external values. What makes me feel particularly great about the new version of the macro is that there are no checks for keywords in the code that it outputs – it determines whether it looks at a keyword in the let form, which is not a part of the resulting code. This makes the output of the macro shorter and a bit more efficient. To see it we can do macroexpand as usual:

(macroexpand '(condition table :d not= :a))
;=> (not= (:d table) (:a table))
; no checks for keyword, table accessed in both cases

(macroexpand '(condition table :a < 3))
;=> (< (:a table) 3)
; no checks for keyword, and table accessed only once
; while for 3 we have just 3

Our condition macro can’t handle neither complex nor long conditions so far. We will make it almost universal a little bit later, but now let’s take a step back and assemble condition and select together. When faced with this task first I did introduce another level of indirection in the form of additional helper function, which would wrap the condition. However, it turns out that one can do this with much less clutter. After some experiments, I ended up using a lambda expression in the body of select:

(defmacro select [& what]
  (let [fields (set 
          (take-while #(not= 'from %) what))
      source (fnext 
          (drop-while #(not= 'from %) what))
      conditions (next (drop-while #(not= 'where %) what))]
      `(map (fn [record#]
            (into {} (filter #(~fields (first %)) record#)))
          (filter #(condition % ~@conditions) ~source)))) 

In the snippet above we have added another binding to take hold of the conditions passed into select. We assume that they go after the where word and thus extract them from the arguments list similarly to the fields keys and the table. Then, in the body of the macro we introduce filtering, which is performed with a lambda making use of our condition macro and the list of conditions in infix notation. The most surprising part is that this thing actually works:

(select :a :b from table where :a < :d)
;=> ({:a 1, :b 100} {:a 2, :b 200})
(select :d :c from table where :d >= 3)
;=> ({:c "100", :d 4} {:c "200", :d 3})
(select :a :b :c :d from table where :a = :b)
;=> ()

That’s a wonderful achievement, we can write SQL queries in Clojure! Well, almost. For now, we miss a couple things including joins, ordering and more complex conditions. The latter seem to be the most natural thing to address next, so let us enable queries like (select :a :b from table where :a <= 2 or :c = “400”). At the current stage trying to execute something like this will lead us to a “Wrong number of arguments passed to the condition macro” error, which is a not so often case when the error message is actually helpful. We do not in fact allow conditions composed of more than three elements and that’s what we will fix right now.

The most trivial way to allow for the longer conditions like :a <= 2 or :c = “400” is to introduce another version of condition macro, which would take 8 arguments: [t k1 op1 k2 link k3 op2 k4]. Here k3 and k4 are either field keys or values (:c and “400” in the above example) similar to k1 and k2, op2 is a comparison operator (=) like the op1 and link is a Boolean function (or), which glues two parts together. Now, 8 arguments is quite too much even if we try to bring SQL into Clojure. If you don’t think so, it is not difficult to come up with a three-part condition which would require 12 arguments with even more stupid names. Fortunately, we don’t have to write all these things out because the similarity between k3 op2 k4 and k1 op1 k2 suggests that we can generalize the implementation to allow arbitrary long conditions with quite a short and concise piece of code:

(defmacro condition 
  ([t k1 op k2]
    (let [v1 (if (keyword? k1) `(~k1 ~t) k1)
        v2 (if (keyword? k2) `(~k2 ~t) k2)]
    `(~op ~v1 ~v2)))
  ([t k1 op1 k2 link & other]
    `(~link
      (condition ~t ~k1 ~op1 ~k2)
      (condition ~t ~@other))))

First of all, the version with 4 arguments remains intact meaning that we don’t break the existing code, which is great. Another good thing to note, is that we reuse it inside the version with the longer argument list – there is no need to write the checks for keywords and move arguments once more, given that we have written this code before. Moreover, this new version of condition might reuse itself in case the other part of arguments list includes more than three elements. This basically means that in addition to the original goal we just allowed writing conditions like :a = 1 or :a = 2 or :a = 3 or :a = 5 or :a = 6 or :a = 7 at no cost. Recursion is a great thing by itself, but it is even greater when combined with argument lists of arbitrary length. Especially so when it works:

(select :a :b from table where :a <= 2 or :c = “400”)
;=> ({:a 1, :b 100} {:a 2, :b 200} {:a 4, :b 400})

(select :a :b from table where :a = 1 or :a = 2 or :a = 4 or :a = 5 or :a = 6 or :a = 7)
;=> ({:a 1, :b 100} {:a 2, :b 200} {:a 4, :b 400})

This all looks incredible, but there is another crucial thing that we miss – conditions can be complex, which means that we might want to include prioritization with parenthesis into them. This could be a difficult and messy thing to implement should we write in something that is not LISP. However, with Clojure macros everything wrapped in parenthesis is a list and can be treated as a list. The incredible power behind this idea becomes clear once we see how little code we have to add to the existing condition macro to  dramatically increase allowed complexity of conditions:

(defmacro condition
  ([t complex]
    `(condition ~t ~@complex))
  ([t k1 op k2]
    (let [v1 (if (keyword? k1) `(~k1 ~t) k1)
        v2 (if (keyword? k2) `(~k2 ~t) k2)]
    `(~op ~v1 ~v2)))
  ([t k1 op1 k2 link & other]
    `(~link
      (condition ~t ~k1 ~op1 ~k2)
      (condition ~t ~@other))))

As you can see, the only new thing in the macro is another override accepting two arguments: the usual record t and a complex condition. Having taken the complex part it basically unwraps it and calls itself with the extracted elements. As a positive side effect, this little addition drops the optional parenthesis in case someone decides to surround the entire restrictions list with them.

(select :a :b from table where :a = 2 or (:d < 3 and :b > 300))
;=> ({:a 2, :b 200} {:a 4, :b 400})

(select :a :b from table where (:a <= 2 or :c = “400”))
;=> ({:a 1, :b 100} {:a 2, :b 200} {:a 4, :b 400})

Once again, recursion and pattern matching do a lot of hard work for us. However, even though this feels like magic, it isn’t and there is an input that we should definitely handle, but don’t treat well so far. That’s the first of the above examples but with reversed order of restrictions: (select :a :b from table where (:d < 3 and :b > 300) or :a = 2) yields an error. The problem is that we allow parenthesis only in the tail position of the conditions list. On the other side, should we do TDD, tests would tell us that we have already broken the select macro – it doesn’t work without filtering anymore, because no matter if there is the where word and something after it in the arguments list, we always try to construct a filter. Since condition macro doesn’t work with a single argument, record, it rightfully fails. Here is the final version of the macro, which addresses both issues:

(defmacro condition
  ([t]
    `true)
  ([t complex]
    `(condition ~t ~@complex))
  ([t k1 op k2]
    (if (seq? k1)
      `(~op
        (condition ~t ~k1)
        (condition ~t ~k2))
      (let [v1 (if (keyword? k1) `(~k1 ~t) k1)
          v2 (if (keyword? k2) `(~k2 ~t) k2)]
      `(~op ~v1 ~v2))))
  ([t k1 op1 k2 link & other]
    (if (seq? k1)
      `(~op1 
        (condition ~t ~k1)
        (condition ~t ~k2 ~link ~@other))
      `(~link
        (condition ~t ~k1 ~op1 ~k2)
        (condition ~t ~@other)))))

To handle queries without filtering we simply introduce another overload of condition for empty restriction list that returns quoted truth (this even sounds cool!). When we invoke it from select, true ends up in the body of the filter function making it allow every record that it sees. As for making parenthesis work in any position, to allow this we have to do a little bit more work. The idea is that when introducing prioritization we made our older assumptions about what is a comparable value (k1, k2, etc.), what is a comparison operator (op, op1 and op2) and what is a Boolean link between several restrictions fallible. However, to fix this we can just check whether we look at something enclosed in parenthesis and handle the thing accordingly if we do. That’s what the new seq? parts are for. Even though this makes the macro grow in size significantly, I believe that’s a fair price for almost universally working solution:

(select :a :b from table where (:d < 3 and :b > 300) or :a = 2)
;=> ({:a 2, :b 200} {:a 4, :b 400})

(select :c from table)
;=> ({:c "100"} {:c "200"} {:c "300"} {:c "400"})

Now, that we achieved an important milestone of selecting stuff from lists of records or dictionaries and filtering it freely, let me sum up what I learnt from the exercise:
  1. Macros really make one think in a different way than when composing mere functions. Even though it is hard to communicate this feeling, writing code that produces code is whole another sensation than writing code that produces data.
  2. On the other side, Clojure does its best to make code as similar to data as possible. Storing programs in lists is cool, because here and there this saves you from extra difficulties.
  3. Arguments passed to a Clojure function are evaluated first; those passed to a macro are not. This makes a huge difference.  
  4. You can’t apply a macro, but you can splice-unquote arguments for it. 
  5. In a macro you can have code that it will return – usually quoted – and code that determines what will be returned and does not show up in the macro’s result. 
  6. Doing katas is worth the effort even if you don’t feel like it helps you understand anything. If something feels too much for you just force yourself into it and logic will gradually surface. To some extent learning is like violence. 
  7. If you don’t refactor, you might be missing a chance to understand both your code and your tools.
  8. Explaining what you have done in a sum-up blog entry might feel stupid, but if you are not doing this you are maybe missing another chance to understand both your code and your tools.

For now I didn’t cover a couple of things, which one expects from SQL. Most important are joins and ordering – these I will implement in the next article. Besides, there are a lot of minor features like, for example, selecting all the fields from a table with a star (*). This particular one is extremely easy to introduce, so I won’t show it here – you can take a look at the code on GitHub if you want.

Note that the above GitHub repository differs in some ways from the code shown in this post. If you want the latter, just grab this single file – zero dependencies is great.

I have also discussed my affair with macros in the recent episode of our Code Speak Loop podcast with my co-host Michael Pankov.