A Gentle Introduction to Functional Programming with Scala
Startups must be nimble and change direction on a dime when circumstances demand it. About five months ago we abandoned most of our back-end code written in Java and Groovy, and re-implemented it in Scala. Why? Let me tell you.
Java was a breakthrough 15 years ago, with its clean object-oriented structure, cross-platform portability, garbage collection that releases programmers from the shackles of malloc() and free(), and reasonably good performance for a non-native language. However, it often requires excessively verbose code due to its rigid syntax and boilerplate conventions (like getters and setters). It’s all too easy to lose the concepts you’re really trying to express amidst all that noise, and I’m all too familiar with that sinking feeling of impending tedium when I realize that I have to write a bunch more boilerplate code to solve a seemingly simple problem (or even worse have to change a bunch of boilerplate code when I realize I’ve made a mistake).
Groovy improved on Java in several ways while retaining compatibility with industry-standard JVMs. It has a more dynamic and expressive syntax, and it introduces some functional programming concepts like functions as first-class objects. In case you’re not familiar with the latter, it allow you to pass a function around like any other Object so that it can be executed at some other time in some other context. Sure you can do this in Java with something like the Strategy pattern, but at the cost of more boilerplate code. Also, Groovy is the core language for the Grails web framework, which greatly simplified our early efforts to prototype our SAAS application.
However, we outgrew this technology choice as we fleshed out more of our platform. Based on some informal performance tests we did, Groovy code executes 4-40x slower than Java. We’re certainly not the first to make this observation (notice that this link shows Scala performance on par with Java). Even worse, the dynamic typing that contributes to a pleasant Groovy programming experience, also translates into a nightmare for maintaining and testing a large application. There are a lot of bugs that can slip past its lax compiler and not reveal themselves until their code is executed, and this is unacceptable for a large, complex application.
Enter Scala, another 100% JVM-compatible language. Iceman has been advocating for Scala ever since… well his interview with us. And frankly it’s the best language I’ve learned yet. It has an expressive syntax like Groovy, but it’s backed up by a strong, static typing system that won’t let you contradict yourself later on. For example, when you assign var foo = 5, the smart Scala compiler infers that foo is of type Int (like a cross between Java’s int and Integer, without the autoboxing overhead). If you later try to reassign foo = “some string”, the compiler will throw a type mismatch error.
But there’s so much more to Scala than this toy example, far more than I can cover in a short article. It’s core advantage is its seamless blending of object-oriented and functional programming concepts. I assume that everyone is familiar with object-oriented programming, which has been the bread-and-butter of professional programming for decades. However, functional programming has always been somewhat esoteric, especially for those of us with non-traditional CS backgrounds. My eyes began to open when I heard an impassioned pitch for functional programming concepts at the 2008 International Conference on Autonomic Computing, but I didn’t really do much with it for four years except make a mental note that “I should learn more about that sometime”.
In a nutshell, functional programming is about corralling your code that does stuff to a small number of well-defined places. Examples include hitting the database, logging, and communicating with the AdWords and Bing APIs. The majority of your code should be pure functions, doing nothing more than taking input parameters and spitting out results. They don’t touch the input parameters (immutable values and collections enforce this), they don’t keep state, and they don’t even so much as write “Boo!” in the log (although they can return an intention for some other code to write “Boo!” in the log).
There are several advantages to this. One is that a function will always return the same result for the same input parameters. That makes it super easy to test and debug. Another is that it’s easy to parallelize; no shared state means no need for complex locking schemes and the risk of deadlocking and resource contention. You can save those mental pretzel-makers for the small amount of code that actually does stuff, and spend the rest of your time focusing on the algorithms for how to calculate what needs to be done. These algorithms are probably your real value-add anyway.
A great capsule summary I read about object-oriented versus functional programming is that object-oriented is focused on the nouns, whereas functional is focused on the verbs. Obviously both are important in a large, complex project, and which one you want to highlight depends on the problem you’re trying to solve. When describing the plethora of objects in the AdWords API and how we handle each of them in our campaign management code, I took an object-oriented approach. Nouns, nouns, nouns. There were a lot of nouns. By contrast, when I ported the code for our automated keyword grouping algorithm to Scala, I favored the functional paradigm. It was mostly about the verbs for identifying similar keywords.
Given that functional programming is all about the verbs, I’d like to wrap up with a few of my favorite Scala verbs. These are defined on all of the Scala Collections classes, from Traversable (like Iterable, but restricted to “internal iteration”) all the way down the inheritance hierarchy. Without further ado:
filter: Takes a function that processes each element of the collection and returns true or false. The result is the subset of elements for which the input function is true. Simple and sweet! There’s also afilterNotas a convenient complement.partition: What if you want to keep both classes of elements (true and false)?partitionreturns a pair of collections: the first is everyone for which the input function returns true, and the second contains the remaining falsies.groupBy: Taking another step up the abstraction hierarchy, this accepts a function that processes each element of the collection and returns a value of some arbitrary type (you get to pick the type, so the compiler can yell at you later if you contradict yourself). What’s done with this return value? It becomes a key for a Map thatgroupByreturns, and its corresponding value is the subset of elements for which the input function returned that key. Think ofgroupByas a n-waypartition.flatten: Let’s say you want to go the opposite direction: from the Map back to the original collection. For starters, you could callfooMap.map(_._2). Obvious, huh? (just kidding) Themapfunction shouldn’t be confused with the Map collection. It’s more like the map in map-reduce, but not exactly the same. In this example, it takes a list of key-value pairs from the Map, and returns just the values (position 2 of the key-value 2-tuple). But we still have a problem. The result is a collection of collections, such as aList[List[Element]]. To get it back to the way it was, we callflattenon it, which frees the Elements from the second layer of Lists, and returns a simpleList[Element]. Theflattenand closely relatedflatMapfunctions are also the foundation for the powerful and advanced magic of monads. Sehr kuhl!foldLeftandfoldRight: These are the swiss army knives of Scala Collections functions. You can basically do whatever you want with these functions. They’re a bit like the reduce part of map-reduce (and Scala Collections has closely relatedreduceLeftandreduceRightfunctions). The fold functions take two parameters: an initial starting value (zero, an empty list, a Siamese cat, whatever you like), and an input function. The input function in turn takes an element from the collection and an object of the same type as your starting value, and it returns another object of that same type. The first time it’s called, it receives the first element from the collection and the initial starting value itself. Where it goes from there is up to the input function. Most problems can be solved more simply with one of the previous functions, but when you’ve got a tough nut to crack,foldLeftandfoldRightare a tough pair of nutcrackers. Use them wisely.
Hopefully this article lived up to its title and provided you with a gentle introduction to Scala and its application of functional programming concepts. Please let me know what you think and if there are any aspects of Scala or functional programming that I could drill into more deeply with my next article.
