Thursday, September 4, 2008

Working towards a Vertical Language

As part of our concept work here at Jodoro, we are working on a computing language, which we internally call Teme.

Anecdotally, there appear to be a lot of languages. The memory is probably a bit sketchy by now, but I recall from my University lecture days, there were circa 100 viable High Level Languages in the late 1970s. Today this would is in 1,000s, depending on how you'd classify a viable language. Anecdotally, in my own work I'm increasingly aware (and somewhat overwhelmed) of the number of language and technology options that are available [1].

Personally, I think the increase in computing languages in the recent era boils down to a few factors:

  1. (Ever) increasing availability of processing power makes dynamic languages more accessible, and is increasing the (broader) interest and activity in this domain.

  2. Environments such as Eclipse allow make it very easy to create basic tooling for a new language. Adequate tooling is a common impediment for a language gaining adoption. I could also point at other key examples, such as Apache Ant or the LLVM project.

  3. I'm sure some pundits will disagree, but I believe dominance of languages such as Java, C# in corporate environments has allowed niche areas to develop.

  4. The Open Source Movement has greatly enriched the amount of libraries and useful code that is available. This can give languages the "shoulder of giants" kickstart they need. There are innumerable examples, but LLVM is another great example for this.

  5. We do more computing nowadays - and the computing we do is more novel.

... I think that some observers can underestimate how significant the last point is. As the computing world evolves, languages are changing in their focus and capability. A computing language is, first and foremost, a means of getting a human to tell a computer what to do. Most significantly, what we are asking computers to do is broadening day by day.

However, all of this begs the question, what is a language exactly? [2] Any significant software solution will eventually result in what I'd describe as some form or subset of a language - A large SAP or Siebel installation may nominally be running on a Java platform, but you'll need to understand the semantics of SAP or Siebel to actually develop anything. It's possible to argue that many developers will be developing Domain Specific Languages in their solutions without really realising it [3].

Equally, you may look at Microsoft .NET as a set of languages - However, at the same time, these languages all share the same underlying class library. Arguably, the "language" of the class library is a more significant than the specifics of C#, VB, or another .NET language.

This (perceived) mayhem of different languages is compounded by a degree of flux with a number of emerging technologies. For example, a lot of institutions are investing in Business Rule Engines [4]. There are a variety of these engines available, each with a different area of specialisation and their own inherent language and tooling. There also also other technologies that are emerging rapidly in corporate environments - Business Process Execution is another classic example.

With that in mind, you could consider the Google search box a language. I often use "1 USD in AUD" in Google, or "Time in London" (there are dozens more, such as calculations or stock prices). It's a way from a formal grammar, but it's a new language construct that we're all using every day. The nice thing about Google is they obviously have mined and researched these kinds of queries and catered for these common examples. It's a language that is evolving in response to the actions of users.

So why are we developing another one? To accommodate the novel things we want our system to do (novel in our minds in the very least). As Paul Graham points out as part of this Arc Language FAQ - It would be surprising if we didn't still need to design more languages. [5], [6]. If you need to work in domain that is any way novel, there is a good chance you need to start thinking about the language constructs required.

There are a number of specific reasons why someone might start developing a language in earnest. For example, there is a lot of buzz around the Semantic Web at the moment. A lot of the effort in this area has focused on the SPARQL Query Language. The development of this language, and other standards, are absolutely fundamental. For us, there were a few drivers in taking starting to look at our own language:

  • As an intellectual exercise.

  • To explore a model of computing that is highly parallel and distributed.

  • To address a particular problem domain that is unapproachable or clumsy in other languages.

In particular, we are interested in increasing the parallelism and distribution of our systems. This is key in constructing a system that can both (1) process large data sets in "user time" as well as (2) be capable of scaling to meet any increase in user demand. My previous article on Map-Reduce discusses one language construct for parallelism - we're keen to take this further and into other processing domains.

Developing in a language is a useful exercise. Developing a language lets you put a lens your work and view it in a different way. If you take your problem and look as a language problem, you see the semantics in a new light. Even if you're working in "another language" such as Java or Ruby - it's still useful to think about the constructs your are building as a language and work towards that. This is a key philosophy to languages such as Lisp [7] , but it's a still a fundamentally useful exercise regardless of your target language.

How to go about this is the next question question - It's very easy to set up the key axioms for a new language or approach, but the difficulty arises as you inevitably encounter trade-offs along the way. In our effort we're continually walking a line between having something ultimately flexible, against having something that remains interoperable.

In later articles I'll write about how we decided to go about this (for better or worse) and some of the motivations and experiences that drove our efforts. I'll also discuss what we mean by a "Vertical Language" and how that is shaping our end goals.

[1] Eric Lebherz, HyperNews Computer Language List,, 2005
[2] Trying to define what a language is semantic is perhaps the ultimate tautology, but hopefully you get my point. Wikipedia has an article on Computer Languages that is less philosophical.
[3] This is perhaps expressed more succinctly by Greenspun's Tenth Rule
[4] I've personally come across three Business Rule Engines in the same organisation - ILOG JRules, Fair Isaac Blaze and Experian. All very different rules engines, with their own language definitions.
[5] Paul Graham, Arc Language FAQ,, Year Unspecified
[6] Paul Graham also has a great essay called the The Hundred-Year Language that is well worth reading.
[7] Lisp pundits call Lisp a "Meta-Language" for this very reason.


MaysonicWrites said...

You might be interested in OMeta - a new language for language development - using the concept of executable grammars. Developed by Alan Kay's group at VPRI.

jon said...

Great! Thanks - I'm just working through his paper now.

Anonymous said...

This is my first time i visit here. I discovered so numerous interesting stuff in your weblog especially its discussion. From the tons of comments on your articles, I guess I am not the only one having all of the enjoyment here! maintain up the great work.
windows 7 key