Tower of Babelfish

Choosing your Vocabulary

Vocabulary Size

Now that you have an incredibly efficient way to memorize words, you know how to teach yourself each word, and you know what each word sounds like, how do you choose which words to learn, and how many do you need?  You could memorize a dictionary, but it would take you years.  English has around 250,000 words (depending upon how you count them).  Other languages will be similarly gigantic.

Here, I’ll quote Dr. Alexander Arguelles, an American professor and polyglot currently working in the Department of Applied Linguistics at the Regional Language Centre of the South East Asian Ministers of Education Organization:

“The maddening thing about these numbers and statistics is that they are impossible to pin down precisely and thus they vary from source to source. The rounded numbers that I use to explain this to my students I usually write in a bull’s eye target on the whiteboard […]:

-250 words constitute the essential core of a language, those without which you cannot construct any sentence.

-750 words constitute those that are used every single day by every person who speaks the language.

-2500 words constitute those that should enable you to express everything you could possibly want to say, albeit often by awkward circumlocutions.

-5000 words constitute the active vocabulary of native speakers without higher education.

-10,000 words constitute the active vocabulary of native speakers with higher education.

-20,000 words constitute what you need to recognize passively in order to read, understand, and enjoy a work of literature such as a novel by a notable author.”

Let’s set aside the 20,000 word possibility for a bit and talk about your goals.  What is fluency, anyways?

Fluency and Goal Setting

The goal I like to suggest to students is C1 fluency according to the CEFR (Common European Framework of Reference for Languages).  I like the CEFR because it’s testable, it works across every language, and it means a lot more than just saying “I’m fluent”.

The CEFR is a framework for deciding how fluent a given student is in a language.  Students progress from A1 (true beginner) to A2 (elementary, can talk about themselves and ask some personal questions), B1 (can talk about most common topics, discuss past events and future plans), B2 (can handle some complicated texts and talk about most topics without much difficulty), C1 (can handle a wide range of situations without significant difficulties, read and write complex texts), and finally C2 (can understand nearly anything heard or read without difficulty, read and write very complicated texts, and understand subtle shades of meaning in all of these circumstances).  C1 is a reasonable goal for 1-2 years of study (or substantially less, if you know what you’re doing and/or have a lot of time for studying).  It’s also very functional.  I’m living in Austria with C1 German, and I went to a German-speaking university here for my master’s degree.  If you decide you want to be more fluent, great.  I get a kick out of learning lots of languages (and it benefits my singing), so I aim for C1 fluency.  If you fall in love with one language in particular (for me, it would be Italian; I can’t wait to finally bump my Italian up to C1 or maybe even C2), then by all means, keep studying all the way up to C2 and beyond (CEFR doesn’t go past C2, but there’s a difference between someone who tests into C2 and a highly educated native speaker, and overcoming that gap takes a great deal of study).

For the purposes of this website, I’m going to call C1 “fluent” and make that our goal.  Make adjustments as needed if you have different goals!

Word Coverage and Frequency Lists

Going back to Dr. Arguelles’ numbers, he claims that you need 250 words to do much of anything.  Let’s look at this another way.  Let’s say you had just 10 words in the English language: the, to be, of, and, a, to, in, he, have, and it.  How much of any text would you recognize?  According to Dr. Paul Nation, 23.7%.  Those 10 words make up 0.004% of the 250,000 words of the English language, and yet I use them so often that they make up nearly 25% of every sentence I write.

Broadening out to the 100 most frequent words, including words like year, to see, to give, then, most, great, to think, and there, will bring you all the way to 49%.  Shocking, no?  Just 100 words and you’ll recognize nearly half of the content of every sentence?  Granted, the concept of “to be” is quite complicated and has many forms - when I call to be “a word”, I’m really including am, are, is, was, were, etc.  But still, the idea that 100 words - now just 0.04% of the 250,000 words of the English language - will make up nearly half of every sentence is a big deal.  How far will this take you?

Broadening out to 1000 words gives you 70% text coverage.  This is absolutely a worthwhile goal, and a number of words in this group are not necessarily words a first-year language student would usually learn - while your colleagues in a community college French class are learning names of fruits, you’d be looking at words like education, to organize, property, committee, association, and frequent.  The first fruit I could find on the English frequency list was word #2001.  This is not to say that you shouldn’t learn fruit names, but you’re going to see the word “property” much more often than “grapefruit”.  If you had to pick one of the two to learn first, I’d recommend “property”.
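If you want to check coverage figures like these against a text of your own, the calculation is simple to sketch in Python.  This is only a rough illustration, not the method behind the published numbers: the function names are made up for this example, the tokenizer is deliberately crude, and it counts every surface form separately, whereas the figures above group forms like am, are, and is under a single word.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a crude, English-only assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(text, known_words):
    """Return the fraction of word tokens in `text` that appear in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def top_words(text, n):
    """Return the n most frequent word forms found in `text` itself."""
    return {word for word, _ in Counter(tokenize(text)).most_common(n)}


sample = "the cat and the dog and the bird"
# Knowing only "the" and "and" covers 5 of the 8 tokens in this sentence.
print(coverage(sample, top_words(sample, 2)))
```

For a real estimate, you would pass in a proper frequency list for your language as `known_words` rather than counting words from the text itself.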

Where do the benefits stop?  This is a hard question to answer.  2000 words will give you 80% coverage.  Here’s an excerpt from an academic article with the 2000 most frequent words filled in:

If current planting rates are _____ with planting _____ satisfied in each _____ and the forests milled at the earliest opportunity, the _____ wood supplies could further increase to about 36 million _____ meters _____ in the _____ 2001-2015. The _____ _____ wood supply should greatly _____ _____ _____ , even if much is used for _____ production. (Sources: Tom Cobb, Paul Nation)

At 80% coverage, it’s clear that the topic has something to do with forests, trees and wood supplies, but it’s not particularly clear what’s going on.  If you can get to 95% coverage, you will get enough information from context to skip using dictionaries; however, you’re faced with diminishing returns.   Reaching 95% coverage for all texts involves much more vocabulary (nearly 12,500 words!) compared with 80% coverage.
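You can get a feel for what a given vocabulary size looks like by blanking out the unknown words in a text, in the spirit of the excerpt above.  Here is a minimal Python sketch, assuming you already have your known words as a set of lowercase strings; the function name and the blank marker are made up for this example, and numbers and punctuation are simply left alone.

```python
import re


def blank_unknown(text, known_words, blank="_____"):
    """Replace every word not found in `known_words` with a blank marker."""

    def replace(match):
        word = match.group(0)
        # Compare in lowercase so "The" matches "the" in the known set.
        return word if word.lower() in known_words else blank

    # Only runs of letters (and apostrophes) are treated as words.
    return re.sub(r"[A-Za-z']+", replace, text)


known = {"the", "forests", "are", "million", "meters"}
print(blank_unknown("The forests are milled annually.", known))
print(blank_unknown("36 million cubic meters", known))
```

Run this over a page of something you actually want to read, using the first 2000 entries of a frequency list as the known set, and you will see quickly whether that is enough for your purposes.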

Fortunately, the vocabulary of a language is highly dependent upon its context.  For a student of English planning on reading academic texts like this one, learning an additional 570 words (the Academic Word List) will boost his comprehension of academic texts from 80% all the way to 90%:

If current planting rates are maintained with planting targets satisfied in each region and the forests milled at the earliest opportunity, the available wood supplies could further increase to about 36 million __ meters __ in the period 2001-2015. The __ available wood supply should greatly exceed domestic requirements, even if much is used for energy production.

This text is much more readable and you might be able to guess the meanings of the missing words (cubic, annually, additional) without a dictionary.  While this will not help much for understanding Shakespeare, gossip magazines or crime novels, this student has massively boosted his comprehension of academic texts with a relatively small amount of work (relative, that is, to the four thousand additional words he would have to learn to reach 90% coverage across the board).

Vocabulary Strategy

The strategy, then, is to figure out your personal vocabulary needs, so you don’t waste time with vocabulary you never need.   Take the first 1000-2000 most frequent words in your language as a foundation, and then start customizing. Skim through a vocabulary book and check off any words you expect to need, based on your own career, hobbies and interests. In this way, if you’re a musician, you can skip directly to the music section and pick out most of the vocabulary you might need, and then, if you want to learn 30 words for pasta dishes, skip to the food section and pick out what you need from there.  Choosing your base vocabulary is one of the most fun parts of learning a new language; it’s like brain shopping: you get a big list of words and check off each word that you eventually want to stick into your brain.
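If you keep your lists in digital form, the same strategy can be sketched in a few lines of Python.  Everything here is an assumption for illustration: the function name, the idea that your frequency list is ordered from most to least frequent, and the idea that your vocabulary book is stored as a dictionary of topic sections.

```python
def build_study_list(frequency_list, topic_lists, chosen_topics, base_size=1000):
    """Combine the top of a frequency list with hand-picked topic vocabulary.

    frequency_list: words ordered from most to least frequent (assumed).
    topic_lists: mapping of topic name to a list of words (assumed layout).
    chosen_topics: the topics you personally care about.
    """
    study = []
    seen = set()
    # Foundation first: the most frequent words, in order.
    for word in frequency_list[:base_size]:
        if word not in seen:
            seen.add(word)
            study.append(word)
    # Then the customization: only the sections you picked, skipping duplicates.
    for topic in chosen_topics:
        for word in topic_lists.get(topic, []):
            if word not in seen:
                seen.add(word)
                study.append(word)
    return study


frequency_list = ["the", "be", "of", "music"]
topic_lists = {"music": ["music", "violin"], "food": ["pasta"]}
print(build_study_list(frequency_list, topic_lists, ["music"], base_size=3))
```

The order matters: the foundation comes first, so a word that shows up in both the frequency list and a topic section is only studied once.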

As a time-saving step, I’ve made a base vocabulary list of 400 words, based on the English language, that will transfer pretty readily (say, at least 80%) to any other language.  Start there, then move on to your frequency list.  I’ve provided links to frequency lists and dictionaries in the language resources section of this website.
