Saturday, December 06, 2008

The Thematic Structure of English and Groovy

After working as a programmer for many years, I tossed it in to teach English in China. I spent a few years reading the many books on language and linguistics in the bookshops up here, before returning to programming as a hobby. I then started to see many similarities between natural and computer languages, which I'm intermittently blogging about. Here's today's installment...

Of the books on language I've read, M.A.K. Halliday's ideas make a lot of sense. He suggests we should analyse language in terms of what it's used for, rather than its inherent structure. From this basis, he's isolated three basic functions of natural language, and their corresponding structural subsystems: the ideational, the interpersonal, and the textual.

The ideational function is a representation of experience of the outside world, and of our consciousness within us. It has two main components: the experiential and the logical. The experiential component embodies single processes, with their participants and circumstances, in a transitivity structure. For example, “At quarter past four, the train from Newcastle will arrive at the central station.” has a transitive structure with process to arrive, participants train from Newcastle and central station, and circumstance quarter past four. The primary participant is called the actor, here, the train from Newcastle. Computer languages have a structure paralleling the transitivity structure of natural languages, e.g. train.arrive(station, injectedCircumstance) for object-oriented languages. The logical component of ideational function concerns links between the experiential components, attained with English words such as and, which, and while. These have obvious parallels in programming languages.

The interpersonal function involves the producer and receiver/s of language, and the relationship between them. This function accounts for the hundreds of different attitudinal intonations we overlay onto our speech, interjections, expressions of politeness we prepend to English sentences, e.g. “Are you able to...”, many adverbs of opinion, the mood (whether indicative, interrogative, imperative, or exclamative), and the modal structure of English grammar. The mood structure causes verbal phrases to have certainty, ability, allowability, polarity, and/or tense prepended in English, and can be repeated in the question tag, e.g. isn't he?, can't we?, should they?. The interpersonal function gives the grammatical subject-and-predicate structure to English. In programming languages, the interpersonal function determines how people interact with the program, and how other programs interact with it. The interpersonal functions are what would normally be extracted into aspects in aspect-oriented programming. They generally disrupt the “purer” transitivity structure of the languages.

The textual function brings context to the language through different subsystems. The informational subsystem divides the speech or text into tone units using pauses, then gives stress/es to that part of the unit that is new information. The cohesive subsystem enables links between sentences, using conjunctions and pronouns, substitution, ellipsis, etc. The thematic subsystem makes it easy for receivers to follow the flow of thought. Comparing this structure of the English and Groovy languages is the topic of today's blog post...

Thematic structure of English
Theme in English is overlaid onto the clause, a product of the transitive and modal structures. The theme is the first position in the clause. English grammar allows any lexical item from the clause to be placed in first position. (In fact, English allows the lexical items to be arranged in any order to enable them to be broken up in any combination into tone units.) Some examples, using lexical items to give, Alan, me, and the book, with theme bolded:
  Alan gave me that book in London.
  Alan gave that book to me in London. (putting indirect object into prepositional phrase)
  To me Alan gave that book in London. (fronting indirect object)
  I am who Alan gave that book to in London. (fronting indirect object, with extra emphasis)
  To me that book was given in London. (using passive voice to front indirect object)
  That book was given in London. (using passive voice to omit indirect object)
  That book Alan gave me in London. (fronting direct object as topic)
  That book is the one Alan gave me in London. (fronting direct object in more formal register)
  In London, Alan gave me that book. (fronting adverbial, into separate tone unit)
  London is where Alan gave me that book. (fronting adverbial in the same tone unit)
  There is a book given by Alan to me in London. (null topic)

Although not common, English also allows the verb to be put in first position as theme:
  Give the book Alan did to me in London.
  Give me the book did Alan in London.
  Give did Alan of the book to me in London.
  Given was the book by Alan to me in London.

First position is merely the way English indicates what the theme is, not the definition of it. Japanese indicates the theme in the grammatical structure (with the inflection はwa), while Chinese (I think) uses a combination of first position and grammatical structure (prepending with 是shi).

Thematic structure of Groovy
One way of indicating theme could be to bold it, assuming the text editor had rich text capabilities. This would similar to Japanese. For example, for thematic variable a
  def b = a * 2; def c = a / 2;.
Another way is to use first position, how English indicates it. This would be an Anglo-centric thematic structure to programming languages, which generally already have an Anglo-centric naming system. Perhaps the best way is a combination of both front position and bolding.

Let's look at how Groovy could enable front-position thematic structure. We'll start with something simple, the lowest precedence operator: a = b. If we want to front the b, we can't. We would need some syntax like =:, the reverse of Algol's :=
  b =: a

We'd need to provide the same facility for the other precedence operators at the same level += -= *= /= %= <<= >>= >>>= &= ^= |=. Therefore, we'd have operators =+: =-: =*: =/: =%: =<<: =>>: =>>>: =&: =^: =|:.

At the next higher precedence level are the conditional and Elvis operators. Many programming languages, such as Perl and Ruby, enable unless as statement suffix, allowing the action to be fronted as the theme. Groovy users frequently request this feature of Groovy on the user mailing list. An unless keyword would be useful, but we could also make the ? : and ?: operators multi-theme-enabling by reversing them, i.e. : ? and :?, with opposite (leftwards) associativity. The right-associative ones would have higher precedence over these new ones, so, for example:
  a ? b : c ? d : e would associate like a ? b : (c ? d : e)
  a : b ? c : d ? e would associate like (a : b ? c) : d ? e
  a : b : c ? d ? e would associate like a : (b : c ? d) ? e
  and a ? b ? c : d : e would associate like a ? (b ? c : d) : e

On a similar note: Groovy dropped the do while statement because of parser ambiguities. It should be renamed do until to overcome the ambiguities.

Next up the precedence hierarchy, we need shortcut boolean operators ||: and &&:, which evaluate, associate, and shortcut rightwards. Most of the next few operators up the hierarchy | ^ & == != <=> < <= > >= + * don't need reverse versions, but these do: =~ ==~ << >> >>> - / % **. It's good Groovy supplies the ..< operator so we can emphasize an endpoint in a range without actually processing it. We'll also provide the >.. and >..< operators.

Just as in English we have the choice of saying the king's men or the men of the king, depending on what we want to make thematic, we should have that choice in Groovy too.
We can easily encode reverse-associating versions of *. ?. .& .@ *.@ as .* .? &. @. @.*. To encode the standard path operator ., we could use .:.

A positive by-product of having these reverse-associative versions of the Groovy operators is they'll work nicely with names in right-directional alphabets, such as Arabic and Hebrew, when we eventually enable that.

When defining methods in Groovy, we should have the choice to put return values and modifiers after the method name and parameters, like in Pascal. This would cater speakers of Romance languages, e.g. French, who generally put the adjectives after the nouns.

Groovy, like most programming languages, doesn't enable programmers to supply their own thematic structure to code, only the transitive structure. When used well, thematic structure in code enables someone later on to more easily read and understand the program. Perl was a brave attempt at providing “more than one way to do things”, but most programming languages haven't learnt from it. I'm working on a preprocessor for the Groovy Language, experimenting with some of these ideas. If it looks practical, I'll release it one day, as GroovyScript. It will make Perl code look like utter verbosity.