Programming Languages and the Path to Enlightenment

Thursday April 23, 2015

Post a comment

This post is adapted from a workshop I put together to introduce beginning and intermediate Computer Science students to programming language theory.

Why Languages Matter

There are an awful lot of programming languages. Even if we chop off the long tail and consider only "popular" languages, there still seem to be quite a few competing for programmers' attention. According to the TIOBE Index, more than twenty different languages get at least 1% each of all language-related search queries. Although Java and C are by far the most searched-for languages, together they still account for less than a third of programmers' mental market share.

When thinking about languages, the question inevitably becomes: "which one should I use?" There are lots of easy answers to this question. If you're writing web apps, you're going to need some flavor of JavaScript. If you need to have something small done by yesterday, just choose something you've used before. If you're raising venture capital, use Node.js because it's Hip and Webscale.

However, sometimes there isn't an easy answer. When starting a new project, unencumbered by legacy code or platform limitations, how can you be smart about your language choice? There's probably no Best Language, or else everyone would be using it. Even very specific categories offer enough popular options to incite perpetual flamewars (check out Holy War: PythonVsRuby if you're looking for a dynamic, high-level, garbage-collected, object-oriented, interpreted language). There are lots of reasons to choose a language: some good, like library support; some bad, like syntax; and some ugly, like stereotypes and joke punchlines. The most interesting (and useful) "good" reason I've run into, though, comes from Guy Steele's description of "growable" languages, which I found on the Recurse Center (then Hacker School) blog

Language Growth

Steele's paper is great and I highly recommend reading the whole thing, but the tl;dr is that language designers can't include every feature everyone needs, but giving programmers the tools to implement what they need as first-class language features is close enough. Any Turing complete language will let users implement whatever they want (more or less) somehow, but that bold bit is important. The rest of this post is my attempt to explain the meaning and importance of first-class language growth.

In some most languages, there is a distinction between features built into the language (what I'm calling first-class functionality) and features added by users of the language (libraries). Take Java as an example. Java is a Mostly Object-Oriented language, but it has primitive "raw" types that aren't objects, like int and boolean, used to improve performance and confuse newbies. From the official Oracle documentation, "a primitive type is predefined by the language and is named by a reserved keyword." There are eight of them. If you want nine, or zero, tough luck.

Some Numbers Are More Equal Than Others

One of the easiest-to-understand privileges given to Java primitives but not objects is use of the arithmetic operators +1,-,*,/, and %. Many programs do a lot of arithmetic, so it's very handy to be able to say int x = y + z instead of add(y, z) or x.add(y, z). Custom numeric types, on the other hand, must use the "named method" form. This works just fine, but it's more verbose and feels a little icky. There are lots of reasons to use custom numeric types: for example, using custom types for unit conversions is a great way to prevent bugs. Trying to divide Hours by Miles to get MPH will trigger a compiler error, preventing issues down the line. Here's a Java program from the wokshop that takes this approach:

jump to end

jump to start

This program accomplishes the task, but mph.mul(h) feels clunky. We want our Hour and Mile types to feel like numbers, but without operators that's just not possible in Java.

No New Numbers

In languages like C++ that support custom operators, we can get further. Here's what the same program looks like in C++:

jump to end

jump to start

We can do mph * h now. Woohoo! That said, there's more to truly growable languages than operator overloading! Our types still clearly aren't proper numbers. Many existing functions that operate on numbers expect their inputs to look a certain way. For example, the standard math library2 offers three flavors of absolute value: abs(double x), abs(float x), abs(long double x). All three of these are built-in primitive types, just like in Java. To use this function on a Miles object, we'd need to do something like Miles(abs(miles.value)). Again, this works, it just doesn't feel right. We know that Miles and Hours are numbers, so why doesn't the compiler know it too?

If It Quacks Like A Number

Another factor of language growability is the level of abstraction built into and expected by the language: for example, whether standard libraries typically operate on interfaces or typeclasses3, or on concrete types. An interface is a description of behavior. For example, the Num typeclass in Haskell guarantees the existence of +, *, etc, but not what they do. A concrete type is a description of identity, like float or Bignum. As discussed above, nearly all math libraries in C++ expect numbers to look a certain way, usually based on the primitive int and float types. Custom numeric types aren't compatible with these libraries without a bunch "unwrap value, apply function, wrap up value again" boilerplate. In contrast, many numeric libraries in Haskell operate on the Num typeclass; if we make our new type an instance of this typeclass then we can take advantage of these existing libraries for free. Here's that same program one last time, in Haskell. The code is heavily commented, since Haskell is less familiar and intuitive than Java or C++ for many programmers, myself included.

jump to end

jump to start

Notice that this time around, our new numeric types can be used with some builtin math libraries, like the sum function that operates on a list of Nums. Neat!

Wrapping Up

One of Steele's central points in Growing a Language is that as a user of a programming language, you will inevitably need a feature that the language creators didn't or couldn't implement. At this point you have two choices: do something else, or implement it yourself. Often enough, doing something else is the right choice. Sometimes there's a good reason that a feature doesn't exist--maybe there's a better approach, or maybe you're just trying to do something stupid and the language wants to save you from yourself. The rest of the time, though, having the option to implement missing language features or even missing language syntax4 is a huge boon to developer productivity and happiness.

[1] The + operator can also be used for String concatenation, a great example of language implementors breaking their own rules for convenience while preventing users from doing the same.

[2] A templatized version for integral types was added in C++11. We're learning!

[3] There are important differences between interfaces (à la Java) and typeclasses (à la Haskell), but they're thoroughly out of scope here. Check out this paper by Lämmel and Ostermann if you're interested.

[4] The best example of languages with growable syntax is the Lisp family, which I'm not really qualified to write about. Paul Graham's What Made Lisp Different is a good place to start.

Post a comment


Wednesday April 22, 2015! I wrote a thing! I'm going to post it! Woohoo!


Sunday June 23, 2013 not that day.

I have a blog!

Saturday June 22, 2013

Yep, this is a blog alright.

Maybe someday I'll put something on it.