Indo-European languages are thousands of years older than the majority opinion had it and were spread by people from the Fertile Crescent along with their genes, researchers say.
Almost half the world’s population speaks a language of the Indo-European family, which includes English, Hindi, Spanish and hundreds of living languages from Icelandic to Bengali. It also includes important extinct languages such as Sanskrit, Latin and ancient Greek. Although they may sound very different, the languages share numerous telltale similarities, such as between “mother” in English, madar مادر in Persian and mitéra μητέρα in Greek.
As a result of these related “cognate” words, linguists have realised for centuries that the languages must share a common origin. However, it has remained unclear exactly where they originated and how and when they spread across Eurasia. Now, a pioneering study claims to have answers. It suggests that two popular existing models are incorrect but contain clues to a more complex and fascinating story that unfolded from around 8,000 years ago.
Dr Paul Heggarty, a linguist at the Pontifical Catholic University of Peru and lead author of the study in Science, said: “To draw a parallel, before radiocarbon dating came along, archaeologists had to estimate their chronology from stratigraphy. Then with carbon dating they suddenly realised, at Troy, for example, that many things were much older than they had thought. Likewise, this is a moment when we have to recalibrate our whole picture for Indo-European languages.”
One of the previous models, the dominant “steppe” hypothesis, argues for the expansion of Indo-European languages out of the Pontic-Caspian steppe no earlier than 6,500 years ago, and mostly with horse-based pastoralism around 5,000 years ago. The second, the “Anatolian” or “farming” hypothesis, suggests the languages spread out of the Fertile Crescent with agriculture, starting as early as 9,500 to 8,500 years ago. In this model, the European branches travelled west via a southern route through the Balkans, rather than across the Caucasus and steppe.
According to the new research, previous analyses have had methodological problems that led to errors in dating different languages and branches. These included the use of datasets that were too limited in scope and inconsistently coded, and the erroneous assumption that modern languages derive directly from ancient written ones rather than from parallel spoken varieties, such as the Vulgar Latin spoken by ancient Romans, which differed from literary Classical Latin.
The new study uses computer modelling to predict ages and relationships of different languages and branches based on the evolution of cognate words. The team created a dataset intended to eliminate inconsistencies in previous attempts and provide a more complete language sample for better dating calibration. The database of Indo-European cognate relationships, called IE-CoR and now online at iecor.clld.org, covers 161 languages, coded by over 80 specialists.
Heggarty, who was at the Max Planck Institute for Evolutionary Anthropology in Leipzig at the time of the research, said: “The only way we could do this study properly and know what was a cognate and what was a loanword and so on was building on two centuries of extremely good, very detailed scholarship by Indo-European specialists. That’s why the database should be so reliable. We have thousands of references to the standard works in modern Indo-European linguistics. I wouldn’t like people to have the impression that we are the first to research these things. We’re not — we’re standing on the shoulders of giants.”
Their analysis suggests that Indo-European languages started to diverge sometime in the centuries either side of the median estimate of 8,120 years ago. It also indicates that Indo-European had already diverged rapidly into multiple major branches by about 7,000 years ago. This puts the divergence thousands of years earlier than many scholars’ estimates and is inconsistent with both “steppe” and “farming” models as standalone explanations.
Considered alongside studies of ancient DNA, the dating evidence is consistent with a homeland for Proto-Indo-European south of the Caucasus — around today’s northern Iran and Iraq, eastern Turkey, Armenia and Azerbaijan — and an influential branch later travelling north onto the steppe with a movement of people.
In this analysis, Indo-European languages spoken on the steppe were then ancestral to many European languages. In other words, both previous models were partly right and the truth lies with a hybrid of the two.
The hybrid hypothesis suggests that the Indo-Iranic branch, which includes languages such as Hindi and Pashto, emerged and moved eastward soon after the Indo-European languages started to diverge. The team found that the Indo-Iranic branch has no close relationship with Balto-Slavic, weakening the case for it having spread east via the steppe and favouring a more direct southern route. Heggarty said: “My personal take is that the most straightforward route is that they went south of the Caspian Sea, through what is now Iran.”
In their interpretation, the Balkan branch, ancestral to ancient and modern Greek, Albanian and other languages now extinct, also diverged and shifted westward early, again without detouring via the steppe.
Then, around 7,000 to 6,500 years ago, a branch moved northward through the Caucasus and reached the steppe. It was from this “secondary homeland” that, from around 5,000 years ago, Indo-European languages spread into Europe with expansions of populations such as those associated with the Yamnaya pastoralists and Corded Ware culture. Languages in the Italic, Germanic and Celtic branches would all trace their origins to the steppe, according to the researchers.
Of these, they estimate that Germanic and Celtic diverged from each other around 4,890 years ago, and Italic from them somewhat earlier, around 5,560 years ago. Balto-Slavic languages split earlier still, around 6,460 years ago, and the researchers suggest that this branch may have arrived in Europe with a later wave than the Italic, Germanic and Celtic grouping, although this requires further study.
In the hybrid hypothesis, the Proto-Indo-European language would have been spoken by early farmers in the north of the Fertile Crescent who spread their language along with their genes and agricultural techniques. Through contacts to the south, they possibly exchanged loanwords with speakers of early Semitic languages, including the word for the number seven (in Hebrew, sheva שבע). Some of their descendants on the steppe were pastoralists who came to rely less on crops and more on livestock grazing the grasslands of the steppe.
Notably, ancient DNA shows no significant influx of ancestry from the steppe to the southeast at the time of the early divergence of Indo-European languages. However, it does show an influx of “Iranian-like” ancestry from south of the Caucasus into the steppe around 7,000 to 6,200 years ago, which created the “steppe” mix of ancestries that would later spread across Europe around 5,000 to 4,500 years ago.
It is the Iranian-like ancestry that the scientists believe is the genetic marker, or “tracer dye”, of the Proto-Indo-European speakers, and it can be found today, intermixed with other, differing elements, in many speakers of Indo-European languages from Wales to Waziristan.
Significantly, the early divergence in the new model suggests that Indo-European languages related to ancient Greek and Sanskrit, respectively, are likely to have been spoken by early Mycenaean Greeks and by the people of the Indus Valley civilisation, who had only limited ancestry from the steppe.
Some linguists have suggested Proto-Indo-European included words for items such as the wheel and axle and that this supports a later divergence — spread by steppe pastoralists who were among the first to use these technologies. However, the study team argue that these words are likely to have referred to more generic, related concepts, such as the circle and turning.
The team’s methodology will not only advance our knowledge of Indo-European languages but can also be applied to other language families. Speaking of his plans for following up the study, Heggarty said: “There are two things I want to do. One is a similar analysis for a completely different language family, Quechua, in South America, which I’ve also spent years working on. Then there are various proposals I would have for ways to improve our methodology even more. A lot of people will be using our database.”
The image at the top of the article shows the Zagros Mountains in northern Iran, which are in the region of the suggested Proto-Indo-European homeland. Photo: Shutterstock