For the record, I have very limited linguistics training -- only a handful of undergrad courses as electives -- so please point out and criticise any inaccuracies (and be generally skeptical, although I'd imagine most of my readers are by now =P). Any errors in attempts at explaining the linguistics behind the following paper are mine, and mine alone. Slightly edited to suit a broader audience. Enjoy!
Have you ever wondered why English seems so simple compared to some other languages, particularly those notorious for complex grammar like Russian or German? Have you wondered whether there was any reason why the local [Pacific Northwest] languages are so complex and filled with intricate grammar?
May I interest you in a very recent awesome paper from PLoS ONE:
Language Structure Is Partly Determined by Social Structure (Lupyan & Dale 2010; open access)
They examined 2236(!) languages and looked for correlation between their morphological complexity and the 'linguistic niche' -- whether the language is spoken over a vast area mostly by strangers, or used within a small tightly-knit community. The majority of the world's languages are 'esoteric' (smaller population, fewer neighbouring languages, smaller area; eg. Tatar, Piraha, Ju|'hoan, Nuu-chah-nulth), contrary to what is most obvious to us, ie the 'exoteric' languages like English or Swahili. One would expect that the use of an exoteric language as a lingua franca may result in some changes in its structure, as its 'purpose' or 'function', if you will, is quite different. Anyway, they found that:
1. Exoteric languages tend to be isolating; that is, syntactic stuff (tense, person, etc) is marked by independent morphemes rather than affixes or other inflections. For example, in Russian (which is still very exoteric, but less so than English or Mandarin) house would be /dom/, but to say "of [the] house" (that is, house[genitive]) you say /doma/, using a suffix instead of a preposition to indicate the case. In this case, English would be more of an isolating language, whereas Russian is more of a fusional one. By the way, we Russians do use a mixture of both suffixes and prepositions -- would be interesting to see if the use of prepositions intensified over time as Russian became more dominant in its region.
2. Exoteric languages tend to have fewer case markings (see Russian example above); furthermore, Exoteric languages seem to use the Nominative/Accusative system (English, Russian, German, etc) rather than the Ergative/Absolutive system (eg. Basque), which still completely eludes me. Probably because it's rare and unusual. As far as I can 'understand', ergative languages basically use the object as the subject of the verb. Ie, 'dog walked' would actually mean 'walked' acted on the dog, ie the 'dog was walked'. Now 'dog walked boy' would mean 'the dog was walked by the boy', so the primary argument of the verb is the object, not the subject. Or something like that.
When I was randomly reading up on some German back in the day, it was interesting to find its case system to be completely in ruins -- it was obviously about to become the grammatical analogue of a pseudogene! Many of the suffixes either repeat in different instances, or don't even exist anymore. It's a mess to learn, and it seems like modern German relies on its case system less and less. Furthermore, Old English had cases. Yes, this language once had a hard-core, well-structured and absolutely essential case system, roughly a thousand years ago! I wouldn't be surprised if German cases go the way of the English ones in a few hundred years... for the record, Proto-Indo-European had something like 8 or 9 of them.
3. Exoteric languages have fewer grammatical categories marked in the verb -- some of you may remember from learning French (or Spanish, or German) the billions of different conjugation schemes you had to memorise for the verbs -- damn things had to agree in number, gender, various intricate tenses, aspects etc. English seems to be much simpler morphologically. And it is. This is how we inflect 'walk' in English: (remember I'm talking strictly about morphology -- the syntax is still quite complex and intricate)
I, we, you, they - walk; he, she, it - walks
past - walked
progressive - walking
Now to compare with Russian: infinitive -- /gulyat'/ (to go for a walk) (picking regular verbs)
[1st person singular] /gulyayu/ [2nd p sg] /gulyayesh/ [3rd p sg] /gulyayet/
[1st p plural] /gulyayem/ [2nd p pl] /gulyayete/ [3rd p pl] /gulyayut/
past: [masculine singular] /gulyal/ [fem sg] /gulyala/ [neuter sg] /gulyalo/ [plural] /gulyali/
adverbial participle: /gulyaya/
imperative: [2sg] /gulyay/ [2pl] /gulyayte/
And probably a few more I missed. Now, Russian is a piece of cake compared to perhaps MOST of the world's languages!
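To put a rough number on that comparison, here is a minimal sketch that simply counts the distinct forms in the two paradigms listed above. The dictionary names and the helper `distinct_forms` are made up for illustration, the Russian list is as incomplete as the one above, and the English "infinitive" cell is my addition to keep the two paradigms parallel.

```python
# Forms copied from the paradigms above (romanised); neither list is exhaustive.
english_walk = {
    "infinitive": "walk",  # added here for parallelism with the Russian list
    "present, non-3sg": "walk",
    "present, 3sg": "walks",
    "past": "walked",
    "progressive": "walking",
}

russian_gulyat = {
    "infinitive": "gulyat'",
    "1sg": "gulyayu",
    "2sg": "gulyayesh",
    "3sg": "gulyayet",
    "1pl": "gulyayem",
    "2pl": "gulyayete",
    "3pl": "gulyayut",
    "past masc sg": "gulyal",
    "past fem sg": "gulyala",
    "past neut sg": "gulyalo",
    "past pl": "gulyali",
    "adverbial participle": "gulyaya",
    "imperative 2sg": "gulyay",
    "imperative 2pl": "gulyayte",
}


def distinct_forms(paradigm):
    """Count the different word forms, ignoring which cell each one fills."""
    return len(set(paradigm.values()))


for name, paradigm in [("EN walk", english_walk), ("RU gulyat'", russian_gulyat)]:
    print(name, "cells:", len(paradigm), "distinct forms:", distinct_forms(paradigm))
```

Counting this way is crude (it says nothing about syntax, irregular verbs, or periphrastic forms), but it makes the point: English reuses one form across several cells, while every Russian cell listed here gets its own form.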
4. Exoteric languages tend not to mark noun-verb agreement. As we've seen above, English is strikingly simple in that department -- there's only number agreement! Now, if French or Russian are intimidating, try thinking about inflecting the verb based on the subject AND the object AND how the verb is done by the subject onto the object... apparently, many languages do that. We had to look at Nuu-chah-nulth (aka "Nootka") in an introductory grammar&syntax course and oh my do they inflect for EVERYTHING. You have one word sentences that go on for a few rows of syllables...fascinating!
5. Exoteric languages seldom mark evidentiality by affixation -- that is, how a certain thing is known about is instead marked by verb choice and random modifiers. You may have noticed that academic writing requires a lot of cumbersome qualifiers and disclaimers embedded in every sentence, apparently. It tends to be that this may well be an inherent feature in research writing, or so I've heard. Some languages not only allow you to identify the nature of evidence for a statement, but actively require it much like English always requires tense (which, btw, not every language does; after all, 'today', 'in the past' and 'tomorrow' work perfectly fine instead!). It seems that this feature tends to mostly happen in the 'obscure' esoteric languages. Wikipedia has some nice examples from Pomo in the intro.
6. Exoteric languages are more likely to: a) encode negation lexically (eg. by EN 'no', FR 'ne...pas', RU 'nye', etc) rather than inflectionally (eg. JP -nai)
b) have obligatory plural markers (as in EN 'one cat - two cats', RU 'odin kot - dva kota'). In contrast, Japanese and Mandarin don't bother with obligatory plural markings, although in JP you could add -tachi ('many') if you really want (disclaimer: not a JP speaker...). This is why English with the stereotypical Chinese accent lacks plural marking: "Very cheap -- two dollar!" It is very curious that this is one of the very few increases in morphological complexity in exoteric languages.
c) less likely to have a distinct associative plural ("he and his friends", to use the example in the paper)
d) are more likely to have a dedicated question particle. English lacks this, but Japanese, for example, adds 'ka' at the end of an interrogative sentence. In a way, the particle is substantially simpler than all the weird abstruse syntactic movements you've got going on in English. Although I'd argue that simple tone raising is even simpler and likely ancestral too. Apparently tone raising is universally natural when you end a sentence with the expectation that someone else picks it up (which is what a question essentially is).
7. Exoteric languages a) are less likely to encode future morphologically. That is, English will not bother with a special inflection for future tense, will it? Even in French, where you have two forms of future tense, the lexical one is used more commonly in spoken form -- Je vais tomber as opposed to Je tomberai. There is a slight semantic difference, which is perhaps why both forms coexist, but less precise everyday use seems to be ok with the first form. Japanese famously "lacks" the future tense; however, as you can probably see by now, that is utter BS as English lacks it too! The difference is that French uses inflection, Russian uses a prefix with a strange aspect shift thing (that makes me pity any students of the language...), English uses an auxiliary (will, going to) whereas Japanese uses simple lexical modifiers (eg. tomorrow, etc). Ashita watashi wa benkyou suru [tomorrow-I-[topic]-study-to do] is quite sufficient to express future tense!
Furthermore, exoteric languages are less likely to inflect for remote vs. proximal past tense (eg. happened recently vs. happened a looong time ago).
Again, I really want to stress that a lack of morphological marking for a grammatical category does NOT mean the language lacks a way to express it! English speakers are perfectly fine at distinguishing proximal past and remote past when they need to! ('Once upon a time' can be viewed as your classical 'remote past modifier')
b) more likely to mark imperatives with inflection, but less likely to distinguish singular vs. plural imperatives. English doesn't inflect imperatives, whereas Russian and French both do, although both seem to also mark singular vs. plural. Japanese, on the other hand, marks the imperative without inflecting for number (-(t)te-), so it wins the Exoteric Award for that category.
c) less likely to inflect possession. Eg. JP otoko no ken [man-of-sword] (that is, 'man's sword', Japanese uses postpositions rather than prepositions) but RU cheloveka mech [person.[gen]-sword] (more naturally, mech cheloveka, but to be parallel with the Japanese example). Japanese would be more exoteric in this case. English is weird so I won't go there. We really don't need to go into the DP Hypothesis today...
8. Exoteric languages don't seem to have definite vs. indefinite articles. (English fails this miserably - 'a' vs. 'the') If they do, they use separate words for them. Russian sort of lacks articles, although some argue it doesn't (syntax people are weird =P), so I can't tell what's going on there.
9. Less likely to use demonstratives for distinctions. Ie. obligatory 'this here apple' vs. 'that there apple'. English doesn't do it, although you're more than welcome to if you'd like. Again, most of these points refer to obligatory grammar, rather than optional modifications you can add on. Inflections tend to be obligatory, and quite built in. By built in, I mean thoroughly ingrained in your brain, usually outside your own awareness, if you're a native speaker -- I didn't realise Russian had cases (and SIX of them) until I was in my teens!
10. Exoteric languages tend to use lexical pronouns rather than expressing them morphologically (eg. as suffixes). Although in many languages where you do have subject-verb agreement, the pronoun becomes optional. Thus, this makes sense -- once you completely lose subject-verb agreement marking (as exoteric languages tend to), you are now required to have an obligatory subject pronoun. As in English. A wonderful example of an evolutionary ratchet!
SUMMARY: Languages spoken by more people in a wider area subject to more interlinguistic contact tend to be simpler morphologically than languages used in smaller isolated close-knit communities.
Now for some more of my own notes and ramblings:
First of all, let's point out that exoteric vs. esoteric is on a continuum; you may have noticed that English, which is blatantly exoteric, falls on the esoteric side for some characteristics, just like some Papuan languages would possess features of exoteric languages. It's not a binary distinction.
Regarding the trend of degrading cases in Germanic (all Indo-European?) languages (see point #2): If cases degrade, how are they formed in the first place? The likelihood of a language simplifying should be higher than vice versa, yet we still find plenty of utterly complicated languages today. Why do they still exist? You can argue:
1. Many of those languages preserve ancestral features of some proto-proto-proto languages of the distant past, which were ridiculously complicated and later became honed down to something manageable in more-used languages. In other words, the complex languages evolved slower.
2. Under some circumstances, language evolution is actually driven towards complexity by whatever mechanisms (analogous to drift or constructive neutral evolution?). Incidentally, these circumstances seem to match those of esoteric languages -- small population size, etc.
I'll leave it as an optional exercise to the reader to devise experiments for testing those hypotheses (ie. I need to sleep soon...)
Onward to the bigger question: Why do the more widespread and promiscuous exoteric languages seem to have simpler morphology than the small and isolated esoteric ones? Isn't this a bit counterintuitive -- shouldn't the so-called 'primitive' peoples have simpler "ME TARZAN!" morphology while the refined Victorian Englishmen sip their exotic teas to a conversation thoroughly inflected for evidentiality, twenty-something cases, five levels each of past and future compounded by five noun 'gender' categories and speckled with the fine Ergative-Absolutive alignment, preferably with a touch of Nominative-Accusative just for kicks? Why is it that small, isolated linguistic communities, despite having arguably simpler social structure (in terms of numbers of components anyway), tend to have such amazingly intricate grammars and morphologies?
We can analyse this in three ways:
1. Simpler languages are more likely to be learned and thus spread easily. A complex language would find it harder to survive in a population of related simpler variants.
2. Languages with a larger and more diverse speaker population evolve faster, and thus become more efficient as speakers fail to grasp more complex grammatical elements, thereby 'mutating' the language, if you will. Bilingualism makes the matter worse as adult learners are notoriously bad at picking up new languages.
3. Simpler features are better at spreading themselves laterally*, thereby displacing more complex morphological characters through interlinguistic contact. Esoteric languages tend to be more isolated, and thus would remain safe from the viral simplicity**.
*(via areal linguistics -- characters like sounds or certain syntactic structures or inflections can be shared spatially between unrelated languages; a proposed classical example of this would be tones in East Asia -- the Mon-Khmer, Tai-Kadai and Sino-Tibetan languages are actually quite distantly related, despite appearing very similar to the western ear by tending to be tonal and isolating)
** There's actually a relevant field called 'linguistic epidemiology' which basically takes a similar approach. Would be interesting to see if anything's been said there regarding complexity.
Again, designing experiments and models is left as an exercise to the reader. My hunch is that all three may be involved to some extent, with the epidemiology argument perhaps the most prominent, helped out by mutations and selection. Another fun topic to examine, perhaps?
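For anyone who wants a place to start, here is a deliberately crude toy sketch of hypothesis 2 only (imperfect adult learners 'mutating' markers away). Everything in it is an assumption of mine rather than anything from the paper: the function name `expected_markers`, its parameters, the made-up numbers, and the idea that each marker is lost independently at a rate proportional to the share of adult learners.

```python
def expected_markers(initial_markers, adult_learner_share, miss_probability, generations):
    """Expected number of obligatory markers left after some generations.

    Toy assumption: in every generation each marker independently survives
    with probability 1 - adult_learner_share * miss_probability.
    """
    survival_per_generation = 1.0 - adult_learner_share * miss_probability
    return initial_markers * survival_per_generation ** generations


# Hypothetical numbers, chosen only to contrast the two linguistic niches.
esoteric = expected_markers(20, adult_learner_share=0.01, miss_probability=0.5, generations=20)
exoteric = expected_markers(20, adult_learner_share=0.40, miss_probability=0.5, generations=20)

print("esoteric niche:", round(esoteric, 2), "markers expected to remain")
print("exoteric niche:", round(exoteric, 2), "markers expected to remain")
```

The sketch only models loss, so it cannot say anything about where the complexity comes from in the first place, and it ignores hypotheses 1 and 3 entirely. A more honest model would need a gain term and some contact between languages.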
This is actually quite reminiscent of Effective Population Size stuff in biol -- I have seen arguments that smaller populations can actually be less streamlined ('less adapted' than we'd expect them to be for their environment) than larger populations, where competition is much harsher and there is a strong pressure towards the efficient mean. I am now far outside of my fields though, and could be wrong in my interpretation; but it has been used as an argument (eg. by Michael Lynch, 2007 PNAS, if I understood correctly) as to why bacteria tend to be far more efficient and streamlined, with far less complex crap going on, than large multicellular eukaryotes, where some utterly ridiculous design ideas seem to be tolerated. This too would be fun to explore and/or model...
I'd like to conclude with the following proposition: 'Caveman language', Tarzan-style, was a widespread, multicultural language with hundreds of millions, if not billions, of speakers!
(And yes, I write these up for fun. Us research bloggers are a weird bunch. =P)
Reference:
Lupyan, G., & Dale, R. (2010). Language Structure Is Partly Determined by Social Structure. PLoS ONE, 5(1). DOI: 10.1371/journal.pone.0008559
AAARGH!!
It's the day before I have a presentation, which I have yet to start working on, and you post something long and interesting like this! There will be much to say here after I recover from the recovery, which will involve a fair amount of alcohol, so probably on Saturday (or, if said recovery is particularly exuberant, Sunday). Until then, I must bite my tongue. Except when I give the presentation. That would not help.
Grr... oh, all right... I will say this much for now, before someone else beats me to it. You have the details on the Ergative/Absolutive case system a little confused. It relates to how languages deal with intransitive verbs. With transitive verbs, the two systems behave alike: in both Nominative/Accusative and Ergative/Absolutive, the former case indicates the subject, and the latter the direct object. To use English examples, each would say "I see her." The difference is that in N/A languages, the sole noun directly associated with an intransitive verb is also nominative, while in E/A languages, it is the absolutive. To use an English-pidgin approximation, an N/A language would say "he sleeps" while an E/A language would say "him sleeps".
There are all sorts of psychological and philosophical consequences that one can propose from these distinctions, some of which form the introduction of the Wikipedia article to which you link, but the above is the fundamental distinction.
Right, back to work now....
Nice review of the work. I'm a linguist and I'm hoping to read this over the weekend and offer thoughts. Until then, I'll briefly repeat the comments I made at Gene Expression:
1. less morphology does not = less complex (you brought up the point about complexity too; what is this??).
2. You are right to bring up the "cline of grammaticalization" (whereby forms tend to go from morphological to lexical). Some historical linguists suggest that this is actually a cyclical pattern, not an asymmetric one (I'll look for references this weekend, Traugott is likely involved).
3. WALS does not give historical data (I don't think), but rather a synchronic snapshot of contemporary data; how can we draw conclusions about language change from this?
Keep up the great blogging!
Opisthokont, thanks for clearing up the Ergative stuff!
Chris:
1. By less complex, I meant morphologically complex. Got sick of saying that word over and over again. In that sense, it is perfectly valid to use the term 'complexity', that is, number of components involved; comparing overall linguistic complexity is as sketchy and pointless as comparing organismal complexity -- and both are some of my worst pet peeves! =D However, you can compare complexity of specific characters or fairly well-defined systems, such as language morphology or a particular genetic pathway.
Is 'morphocomplexity' a word? ^_^
2. That's interesting -- I'll look it up! Of course, the increasing morphocomplexity in esoteric languages must come from somewhere -- perhaps because of the greater 'selective tolerance' in smaller effective population sizes, resulting in a greater excess capacity of various systems, resulting in a massive surge in complexity (that is, number of components involved) via constructive neutral evolution and so on. Much like the descent of multicellularity =P
However, it may well be that simplification and complication occur in morphologies of esoteric and exoteric languages, respectively. Please do link me the paper!
3. We can draw conclusions about language change the same way we draw conclusions about biological evolution -- the vast majority of relevant life fossilises very poorly or not at all (I don't consider large metazoa to be particularly important in the big scheme of evolutionary biology). So our only hope is to try and understand how these biological systems work, and from that try to infer how change may have happened. We're also lucky to have somewhat more stable sequences than what linguists have to work with. Thus, we can resolve our trees a bit further. Linguists too are seldom lucky to have relic inscriptions, the analogue of fossils, for most of the world's languages. Writing systems are rare -- well-preserved relics of ancient writing systems are even rarer. So they have to rely on what we have today to infer historical events. Of course, it's just a desperate attempt, but what else do we have?
(I've recently been asked the SAME exact question about biological evolution!)
Thanks for your comments!
Fascinating stuff. I teach ESOL (English for speakers of other languages) and I often wonder at the complexity of language in general and certain languages in particular. I tried for 4 years to learn Polish and was constantly horrified by its "unnecessary" complexities - a different word for a singular noun, another for up to 4, another for more than 5 etc. Also, as far as I know (I'm sure I'll be quickly corrected if I'm wrong) English is the only language which doesn't (generally) ascribe gender to inanimate objects. Why would a table be masculine in one language and feminine in another?? And then you need to have a masculine or feminine adjective to go with it!
Truly, truly fascinating stuff!
It's so easy for me to delve deeper and deeper into wikipedia entries like http://en.wikipedia.org/wiki/Anglo-Norman_language#Characteristics -- it's all there: 'horizontal word transfer', paralogs, orthologs, ancestral states, mutation rates...
Thanks for a great post!