Monday, February 18, 2019

Indo-European Language Family 

Indo-European is a family of languages that first spread throughout Europe and many parts of South Asia, and later to every corner of the globe as a result of colonization. The term Indo-European is essentially geographical: it names the family's range from its easternmost extension in the Indian subcontinent to its westernmost reach in Europe. The family includes most of the languages of Europe, as well as many languages of Southwest, Central and South Asia. With over 2.6 billion speakers (about 45% of the world’s population), the Indo-European family has more speakers than any other language family, as well as the widest dispersion around the world.

The cradle of the Indo-Europeans may never be known, but an ongoing scholarly debate about the original homeland of Proto-Indo-European (PIE) may some day shed light on the ancestor of all Indo-European languages as well as on the people who spoke it. There are two schools of thought:

  • Some scholars (e.g., Marija Gimbutas) propose that PIE originated in the steppes north of the Black and Caspian Seas (the Kurgan hypothesis). Kurgan is the Russian word, of Turkic origin, for a type of burial mound over a burial chamber. The Kurgan hypothesis combines archaeology with linguistics to trace the diffusion of kurgans from the steppes into southeastern Europe, providing support for the existence of a Kurgan culture that reflected an early presence of Indo-European people in the steppes and southeastern Europe from the 5th to the 3rd millennium BC.
  • Other scholars (e.g., Gamkrelidze and Ivanov) suggest that PIE originated around 7,000 BC in Anatolia, a stretch of land that lies between the Black and Mediterranean Seas. It lies across the Aegean Sea to the east of Greece and is thus usually known by its Greek name Anatolia (Asia Minor). Today, Anatolia is the Asian part of modern Turkey.

It would not have been possible to establish the existence of the Indo-European language family if scholars had not compared the systematically recurring resemblances among European languages and Sanskrit, the oldest language of the Indian subcontinent that left many written documents. The common origin of European languages and Sanskrit was first proposed by Sir William Jones (1746-1794). Systematic comparisons between these languages by Franz Bopp supported this theory and laid the foundation for postulating that all Indo-European languages descended from a common ancestor, Proto-Indo-European (PIE), thought to have been spoken before 3,000 B.C. PIE then split into different branches which, in turn, split into different languages in the subsequent millennia.

Since PIE left no written records, historical linguists construct family trees, an idea pioneered by August Schleicher, on the basis of the comparative method. The comparative method takes features shared among languages and applies systematic procedures to establish their common ancestry. It is not the only method available, but it is the one that has been most widely used. The examples below show how this method works with some Indo-European languages.

PIE *dekm > Proto-Germanic *texun > Old English teon > Modern English ten
Proto-Italic *dekem > Latin decem > Modern Italian dieci
Old Church Slavonic desenti > Modern Bulgarian deset
Sanskrit dáça > Hindi/Urdu das
Greek deka
  • proto means ‘first’ in Greek
  • * means the form was reconstructed, not attested.
  • > means ‘became’
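
As a rough illustration of the first step of the comparative method, the sketch below groups the attested words for ‘ten’ listed above by their first letter. Using the first letter as a stand-in for the first sound is a simplifying assumption of this sketch, and the names TEN and group_by_initial are made up for the example.

```python
from collections import defaultdict

# Attested (not reconstructed) forms for 'ten', copied from the examples above.
TEN = {
    "Modern English": "ten",
    "Latin": "decem",
    "Modern Bulgarian": "deset",
    "Hindi/Urdu": "das",
    "Greek": "deka",
}


def group_by_initial(forms):
    """Group languages by the first letter of their word.

    The first letter is only a rough proxy for the first sound.
    """
    groups = defaultdict(list)
    for language, word in forms.items():
        groups[word[0]].append(language)
    return dict(groups)


groups = group_by_initial(TEN)
for sound, languages in sorted(groups.items()):
    print(sound, "->", ", ".join(languages))
```

English stands alone with t where the other languages have d. A single word proves nothing, but when the same pairing recurs across many words, linguists treat it as a regular sound correspondence.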


Indo-European languages are classified into 11 major groups, 2 of which are extinct, comprising 449 languages (Ethnologue).

Baltic

This conservative group has preserved many archaic features thought to have been present in PIE. Some scholars think that Baltic languages share a common ancestral language with the Slavic languages. This hypothetical language is called Balto-Slavic.

Language | Number of speakers | Where spoken primarily
Latvian | 1.5 million | Latvia
Lithuanian | 3.1 million | Lithuania

Celtic

Celtic languages were largely unknown until the modern period. They were once spread over Europe in the pre-Christian era. The oldest records of these languages date back to the 4th century AD.

Language | Number of speakers | Where spoken primarily
Breton | 533,000 | France
Irish | 355,000 | Ireland
Scottish Gaelic | 62,175 | Scotland
Welsh | 575,000 | Wales


West Germanic

Language | Number of speakers | Where spoken primarily
Afrikaans | 6 million | South Africa
Dutch | 17 million | Netherlands
English | 309 million | UK, US, Australia, Canada
German | 95 million | Germany
Yiddish | 50,000 | Germany, Israel

North Germanic

Language | Number of speakers | Where spoken primarily
Danish | 5.3 million | Denmark
Icelandic | 240,000 | Iceland
Norwegian | 4.6 million | Norway
Swedish | 8.8 million | Sweden

Romance (Italic)

Language | Number of speakers | Where spoken primarily
Catalan | 6.7 million | Spain
French | 65 million | France
Italian | 61.5 million | Italy
Portuguese | 178 million | Portugal, Brazil
Romanian | 23.5 million | Romania
Spanish | 322 million | Spain, Latin America


West Slavic

Language | Number of speakers | Where spoken primarily
Czech | 11.5 million | Czech Republic
Polish | 43 million | Poland
Slovak | 5 million | Slovakia
Sorbian | 70,000 to 110,000 | Germany


East Slavic

Language | Number of speakers | Where spoken primarily
Belarusian | 9 million | Belarus
Russian | 150 million L1 speakers | Russia
Ukrainian | 37.1 million | Ukraine


South Slavic

Language | Number of speakers | Where spoken primarily
Bosnian | 4 million | Bosnia & Hercegovina
Croatian | 6.2 million | Croatia
Macedonian | 1.6 million | Macedonia
Serbian | 11.1 million | Serbia
Slovenian | 2 million | Slovenia


Indo-Aryan (Indic)

Language | Number of speakers | Where spoken primarily
Bengali | 100 million 1st language; 211 million 1st & 2nd language speakers | Bangladesh
Bhojpuri | 26.6 million | India
Hindi | 180.8 million | India
Gujarati | 46.1 million | India
Kashmiri | 4.6 million | India
Marathi | 68 million | India
Nepali | 17.2 million | Nepal
Maithili | 24.8 million | India
Oriya | 31.7 million | India
Punjabi | 60.8 million | India
Romani | 1.5 million | Romania & elsewhere
Sanskrit | 194,000 2nd language speakers | India & elsewhere
Sindhi | 21.3 million | Pakistan
Sinhalese | 13.2 million | Sri Lanka
Urdu | 60.5 million | Pakistan


Iranian

Language | Number of speakers | Where spoken primarily
Balochi | 1.8 million | Pakistan
Dari | 7.6 million | Afghanistan
Farsi (Persian) | 24.3 million | Iran
Kurdish | 11 million | Iraq & elsewhere
Pashto | 19 million | Afghanistan & elsewhere
Tajik | 4.3 million | Tajikistan


Language | Number of speakers | Where spoken primarily
Albanian | 5 million | Albania
Armenian | 6.7 million | Armenia
Greek (the only surviving language of the Hellenic group) | 12.3 million | Greece


Tocharian (extinct)
Attested by texts dating to 500-1000 AD that were found in the early 20th century in Chinese Turkestan.

Anatolian (extinct)
Unknown until the 20th century, when it was discovered during excavations in Turkey. Texts written in cuneiform date to the 13th-7th centuries BC.

In addition to these main groups, there are fragmentary records of other Indo-European languages. These records, mostly in the form of inscriptions, do not provide sufficient material for the reconstruction of PIE.



Sound system
There have been numerous attempts to reconstruct the vowels and consonants of PIE, all of which encountered serious problems due to the uneven nature of the written records and to the huge differences in the age of the records. As a result, the reconstruction of PIE phonology continues to be a matter of scholarly debate and speculation. Among the most notable reconstructions are those by August Schleicher, Karl Brugmann, Winfred Lehmann, Oswald Szemerényi, and Jacob Grimm.

First Germanic Sound Shift (Grimm’s Law)
You probably know of Jacob Grimm as the author of fairy tales, but he was also one of the great linguists of the 19th century. He found evidence for the unity of all the modern Germanic languages in the phenomenon known as the First Germanic Sound Shift (also known as Grimm’s law), which set the Germanic branch apart from the other branches of the Indo-European family. This shift occurred before the 7th century, when records started to be kept. According to Grimm’s law, the consonants /p, t, k/, which are preserved in the classical Indo-European languages (Latin, Greek, and Sanskrit), became /f, θ, h/ in the Germanic languages, where /θ/ is the th sound of English three. For example, compare Latin pater with English father, Latin tres with English three, and Latin cornu with English horn.
You can easily see the resemblances among four common words across four Indo-European languages.

English | Greek | Latin | Sanskrit
father | pater | pater | pita
brother | phrater | frater | bhratar
foot | poda | pedem | pada
three | tris | tres | trí
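
Grimm’s Law can be sketched as a simple substitution on the first consonant of a word. The code below is only an illustration, not a historical derivation: English words do not descend from Latin, so the Latin forms merely stand in for the older, unshifted consonants. The names GRIMM and grimm_shift_initial are made up for the example, and treating Latin c as /k/ is an assumption of the sketch.

```python
# Voiceless stops and their Germanic outcomes under Grimm's Law.
# "θ" stands for the th sound of English "three".
# Latin spells /k/ as "c", so "c" is treated like "k" (an assumption of this sketch).
GRIMM = {"p": "f", "t": "θ", "k": "h", "c": "h"}


def grimm_shift_initial(word):
    """Apply the shift to the first letter only; the rest of the word is untouched."""
    if not word:
        return word
    return GRIMM.get(word[0], word[0]) + word[1:]


for latin, english in [("pater", "father"), ("tres", "three"), ("cornu", "horn")]:
    shifted = grimm_shift_initial(latin)
    print(latin, "->", shifted, "(compare English " + english + ")")
```

The shifted forms (fater, θres, hornu) are not real words, but their first sounds line up with English father, three, and horn, which is the regular pattern Grimm described.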

Centum-Satem division
The Centum-Satem division explains the evolution of PIE labiovelar, velar, and palatovelar consonants.

  • Labiovelar consonants include [kw, gw, xw, ngw] which are pronounced like [k, g, x, ng] but with rounded lips.
  • Velars are consonants articulated with the back part of the tongue (the dorsum) against the soft palate (the back part of the roof of the mouth, known also as the velum). They include [k, g, x, ng].
  • Palatovelar consonants are articulated with the back part of the tongue against the hard palate. They include [k’, g’, x’, ng’]. For example, [k’] is pronounced as the k in keen.

The terms centum and satem come from the words for ‘one hundred’ in a representative language of each group: Latin centum and Avestan satem. Please note that not all languages fall neatly into these categories.


It is believed that PIE had a pitch accent system. All words had only one accented syllable which received a high pitch. Stress could fall on any syllable of a word.

Unevenness of existing records and huge gaps in the chronology among Indo-European languages make the reconstruction of PIE grammar a difficult task. The discoveries of Hittite, Tocharian, and Mycenaean Greek in the 20th century changed the database on which the reconstruction of PIE rests, which in turn modified existing views of PIE.

Many of the older well-documented languages, such as Sanskrit, Greek, and Latin, have rich morphologies with clearly marked gender and number, as well as elaborately marked case systems for nouns, pronouns, and adjectives. Verbs in these languages also have elaborately marked systems of tense, aspect, mood, and voice, in addition to person, number, and gender. Reconstructed PIE is based on the assumption that it contained all the features found in attested languages. If a given language lacks a particular feature, it is assumed that the feature was lost or that it had merged with other features.

Modern Indo-European languages reflect the rich morphology of PIE to various degrees. For instance, Sanskrit, Greek, Latin, Baltic, Slavic, Celtic, and Armenian have extremely rich morphologies. On the other hand, Germanic, Romance, Albanian, and Tocharian do not possess quite as many finely differentiated morphological features.

Nouns, pronouns and adjectives

  • Case
    Sanskrit had the most cases (8), followed by Old Church Slavonic, Lithuanian, and Old Armenian (7), Latin (6), and Greek, Old Irish, Albanian, and Germanic (5).
  • Gender
    The three genders (masculine, feminine, neuter) have survived in a number of Indo-European languages.
  • Number
    The three numbers (singular, dual, plural) survived in Sanskrit, Greek, and Old Irish. Vestiges of the dual number can be found in many other Indo-European languages.
  • Adjective-Noun agreement
    Adjective-noun agreement has survived in many Indo-European languages.

Reconstructed PIE verbs had different sets of endings for tense/aspect, voice, and mood, in addition to person and number:

  • Tense and aspect
    It is thought that the PIE verb system was aspect-based, although traditionally, aspect has been confused with tense. Although tense was not formally marked in PIE, most Indo-European languages define their verbal systems in terms of tense, rather than aspect.
  • Voice
    PIE had two voices: active (e.g., The child broke the glass) and medio-passive which combined reflexive and passive voices (e.g., The child washed himself and The child was washed by his mother). In addition to the active voice, various Indo-European languages use the middle or the passive voices.
  • Mood
    It is hypothesized that PIE had four moods: indicative, optative, subjunctive, and imperative. Most of these moods exist in all Indo-European languages.
  • Person and number
    PIE verbs were marked for person (1st, 2nd, 3rd) and number (singular, dual, plural).

Word order
Less is known about the syntax of PIE than about its morphology. What is known about PIE word order, therefore, is a subject of conjecture and debate. It is thought likely that word order in PIE sentences was Subject-Object-Verb. This word order is found in Latin, Hittite, Vedic Sanskrit, Tocharian, and to some extent in Greek.

The comparative method enables linguists to reconstruct a basic PIE vocabulary referring to many common elements of the culture of its speakers. This basic vocabulary is not uniformly attested across all Indo-European languages, which suggests that some words may have developed later or were borrowed from other languages. Among the words that are reliably reconstructed are words for day, night, the seasons, celestial bodies (sun, moon, stars), precipitation (rain, snow), animals (sheep, horse, pig, bear, dog, wolf, eagle), kinship terms (father, mother, brother, sister, son, daughter), and tools (axe, yoke, arrow).


Written records for the various Indo-European languages begin at very different dates. The table below shows when the first written records appeared, what writing system was used, and which writing systems are used by the languages today.

Branch | Earliest written records | Earliest writing system | Current writing system(s)
Armenian | 500 AD | Armenian alphabet | Armenian alphabet
Albanian | 15th century AD | Greek alphabet | Modified Latin alphabet
Greek | 1,400 BC | Linear B | Greek alphabet
Celtic | 4th century AD | Ogham alphabet | Modified Latin alphabet
Baltic | 16th century AD | Modified Latin alphabet | Modified Latin alphabet
Romance | 6th century BC | Latin alphabet, adapted from Etruscan | Modified Latin alphabet
Germanic | 3rd century AD | Runic Futhark | Modified Latin alphabet
Slavic | 9th century AD | Old Church Slavonic alphabet | Cyrillic and Latin alphabets
Indo-Aryan | 3rd century BC | Brāhmī script | Bengali, Devanāgarī, Gujarati, Oriya, Gurmukhi, Sinhala, Kaithi, modified Perso-Arabic
Iranian | 9th century AD | Perso-Arabic script | Modified Perso-Arabic, Arabic, modified Cyrillic, modified Latin
Tocharian | 500-1,000 AD | Brāhmī script | none (extinct)


Language Difficulty
How difficult is it to learn Indo-European languages?
Indo-European languages range from Category I to Category II in terms of difficulty for speakers of English.

44 Responses to Indo-European Language Family

  1. Anonymous

    Just thought you should know that there’s a mistake in the Celtic section: ‘Scots’ should be ‘Scottish Gaelic’. The link also goes to the Scots Ethnologue page, not the Scottish Gaelic one.

    • Irene Thompson

      Thank you for your comment. We are working on updating all the language pages, and will definitely act on your input.

  2. Krip Borah

    Amongst the Indo-European language and Sanskrit the language Assamese is not shown. This is a distinct different east Indian language having different pronunciation practically non existent in other languages other than in certain form in German with the word ending like in loch, doch, noch etc, where the guttural/s/ is spoken very softly giving a new type of phonetic expression

    • Irene Thompson

      Thank you for your comment. We will eventually include more languages such as Assamese.

  3. PermReader

    Everybody knows that the Greek alphabet appeared about the 8th century BC. Compare the author’s 14th century BC.

    • Irene Thompson

      Thank you for the correction.

    • Irene Thompson

      What is your point? Please clarify. And make sure you keep history of the language distinct from history of writing.

      • Ruvell

        What I find so interesting is you could never find this anywhere else.

    • Christo Tamarin

      Probably, the Linear B script was meant for Greek.

    • Ntoum Ntoum

      The earliest writing system for Greek used in 1450 BC should be Linear B, not Greek alphabet

      • Irene Thompson

        Thank you for pointing this out. We made the change.

  4. Ute (expatsincebirth)

    Thank you for this very interesting post. May I point out that German is not only spoken in Germany but also in Austria, Switzerland, North-East Italy, parts of Alsace-Lorraine, Luxembourg, Belgium and Liechtenstein? And Italian is spoken also in the Vatican City, Southern Switzerland, the Repubblica di San Marino, Slovenian Istria and Istria County…

    • Irene Thompson

      We already mentioned all these countries on our German page under Status.

  5. Christo Tamarin

    1. The number of speakers of Russian is much more than 9 million.

    2. In the list of South Slavic languages, Bulgarian is omitted.

    3. Serbian, Croatian and Bosnian are listed in the list of South Slavic languages. Actually, there was one language until recently. Now, languages are split by political reasons only. And, there is a “Montenegrian” as well.

    4. The earliest writing system attested for Iranian languages is Cuneiform for Old Persian.

    • Irene Thompson

      Yes, indeed. This page needs lots of additional work. Do you want to edit it?

      We will give you credit. Ditto for Bulgarian and Slavic Branch.

    • Terry

      I am sorry but I must correct you. Serbian, Croatian and Bosnian existed long before the Second World War, when they were artificially united for political reasons. And they have developed differently in the spoken and written language for centuries. So your statement isn’t right.

  6. Pingback: Czy esperanto ma swoją kulturę? | No Easy Answers

  7. soheil

    Iranian languages were written in pahlavi script as early as second century bc, and in the modified elami far before that

  8. Vladimir

    9 million speakers of Russian in Russia is far from the truth!!! Please correct this article.

    • Irene Thompson

      Thank you. Our Russian language page says “Russian is primarily spoken in the Russian Federation and, to a lesser extent, in the other countries that were once part of the Soviet Union, as well as in Eastern Europe. According to the 2002 census, there were 142.6 million speakers of Russian in the Russian Federation alone. In addition, Russian is spoken in Canada, China, Finland, Germany, Greece, India, Israel, and the U.S. Russian is listed among the ten most-spoken languages in the world (Wikipedia).” We will correct the entry in the Indo-European Language Family.

    • Irene Thompson

      We have made the correction. Thank you for pointing out the mistake.

      • antonio pardo

        Sorry, but you missed something about Galician. It was spoken before Portuguese: it was already spoken before 1200, and Alfonso X (The Wise), a king of Leon, Castille and Galicia, almost made it the official language of his Kingdom: “The Cantigas de Santa María are typical of cantigas in using the Galician-Portuguese language, the “lingua franca” so to speak of Iberian medieval poetry, and the ancestor of the present Portuguese & Galician languages. The Cantigas de Santa María encompass a huge number of themes, make reference to a variety of countries & places, and basically serve to tie everything in medieval life to miracles of the Virgin Mary. Thus they represent Alfonso’s ideals of vernacular language expression and human devotion.”
        At present more than 5 million people speak Galician, in Galicia and among Galician emigrants in Venezuela, Argentina, and some other South American countries.

        • Irene Thompson

          According to Ethnologue, Gallego is spoken by 2,340,000 in Spain (European Commission 2012), with a population total for all countries of 2,355,000. Wikipedia gives about the same data. Where did you get the 5 million number?

  9. ashoke dutta

    Thanks for giving us a brief outline of PIE. It is informative and helpful for research. ashoke dutta

  10. Man Yee

    I wonder which branch Turkish comes from. It is geographically surrounded by Greece, Albania, Bulgaria, Iran, Syria and Armenia!

  11. RussellW

    To clarify some earlier comments. The Greeks used an entirely different script during the Bronze Age, it wasn’t related to any modern alphabet and disappeared entirely during the so called Dark Age. The Greeks were illiterate for centuries until the adoption of the Phoenician alphabet.

    The map showing the distribution of Germanic languages in Australia requires updating, it suggests, incorrectly, that English is not spoken in the north of the country.

    • Irene Thompson

      Thank you for the comment. Do you have a suggestion for a more accurate map of language distribution in Australia?

      • RussellW

        Yes, color the entire country “Germanic'”.

        Certainly English is a second language for some Indigenous communities (as presumably in some areas of the US) however it’s misleading to depict Northern Australia as not a Germanic speaking area.

        • Irene Thompson

          Yes, English is the national language of Australia, and the dominant language of the country. Most aboriginal tribes/clans speak English as a first or second language. Most of the aboriginal languages are severely endangered or on the brink of extinction. Let’s hurry up the process and paint the entire continent Germanic.

  12. Madelyn Verona

    Thank you Irene for such a brief, informative article. It is obviously meant as an overview and provides the necessary linguistic history of this extensive language family. It was obviously not meant to be a thesis for a doctorate!
    To those who want to correct population figures, I hope you yourself took into account all the babies born in the last 24 hours or your numbers would be inaccurate also. Since 2002 statistics have been quoted as a source in this article and the critic’s comment was written in 2015 (13 years later) I rest my case.
    To those of you who wish to bring up historical evidence of Greek writing, I suggest you refer to a thesaurus to check whether alphabet and language are synonyms.
    To those of you who want to point out some other country where a particular language is spoken, I ask you to turn on the news and see emigrants settling in new countries every day due to war, natural disasters, political unrest, etc. all far more prevalent today than even 20 years ago.
    So, to those immature critics who only want to get their name in print or delight in sharing their ignorance, I would say this: if you are such an expert, why are you even reading this page as you obviously know it all anyway and why don’t you go and write your own page so people like you can write ridiculous comments on your website.
    Unfortunately I have not had the privilege of meeting or even having heard of the author of this article, but I appreciate the work she has put in to educate the masses and I apologize for the immaturity and ignorance of a few.

    • Irene Thompson

      Thank you for your comments. This website was meant as a rudimentary introduction, not a comprehensive resource on all aspects of the languages included. Anyone who uses online resources should know that the information contained in them evolves continuously and becomes obsolete (or partly so) as soon as it appears on the screen.

    • Irene Thompson

      Thank you for the moral support. This is a tough job.

  13. Tatjana Davila

    South Slavic languages exist but South Slavs do not. All these people are of the same origin. They predominantly carry haplogroup I2a, in comparison with the Slavs’ R1a. They are probably descendants of the old Illyrians and Dacians, and to a lesser extent the Thracians, because nowadays Bulgarians carry the Greeks’ and Albanians’ haplogroup E1b1b and the Near and Middle East haplogroup J. For now, the Illyrians’ language(s) have disappeared; the Dacians’ language has survived in Romanian in at least 156 old Dacian words. Albanians share genes with the Greeks and have 156 old Dacian words in their language, but do not share haplogroup I2a with other Balkan peoples. So how can Albanians be descendants of the Illyrians? This thesis is wrong and should be revised. Is the Old Dacian language an Indo-European language?

    • Irene Thompson

      Thank you for the excursus. There is some consensus that Old Dacian was an Indo-European language that may have originated in Southern Russia. See more at
      The term Southern Slav most commonly refers to current geographic distribution without reference to the genetic makeup of the people who inhabit the southern part of the Balkans.

  14. Jack Haywood

    Actually, Proto Germanic had five cases: nominative, genitive, accusative, dative and instrumental, although the latter two were combined in pronouns.

    • Irene Thompson

      Thank you for the correction. We appreciate your help.

  15. comment-645

    so much wonderful information on here, : D

  16. William Davis


    I am a 57 year old senior in college and currently enrolled in a Language and culture class. Your paper was a tremendous help in clarifying this subject. (Thank you!) Also, I loved your responses, reactions, corrections and updates. You represented the educational process at a high level. Education is constant, evolving and never ending.

  17. Sayed

    The 4 languages – Dari, Persian, Farsi and Tajik – are all the same. The only difference it has is the accent. just like american English and british english.
    I can understand what a Persian speaker says or what a Tajik speaker says. Because I speak Dari, which is the same as Persian and also Tajik.
