Collins Report on Gender Language in English

In Discover Truth by Biblica

As part of their work on the latest update to the NIV, the Committee on Bible Translation commissioned one of the most comprehensive studies of gender language in English, drawing on the 4.4-billion-word Collins Bank of English, the world’s largest database of the English language. Below is a summary of the findings, which helped inform the translators as they worked on the NIV.


by Dr. Douglas Moo, Chair of the Committee on Bible Translation, September 2010

Before the 2011 update of the New International Version of the Bible (NIV), Bible translation efforts were hampered by the lack of accurate, statistically significant data on the state of spoken and written English at a given time in its history. Beyond appealing to traditional style guides, all that translators and stylists could do was rely on their own experiences and others’ anecdotal evidence, resulting in arguments such as, “I never see anybody writing such-and-such,” or “I always hear such-and-such,” or “Sometimes I read one thing but other times something else.”

As part of the review of gender language promised at the announcement of the latest update to the NIV on September 1, 2009, the Committee on Bible Translation sought to remove some of this subjectivity by enlisting the help of experts. The committee initiated a relationship with Collins Dictionaries to use the Collins Bank of English, one of the world’s foremost English language research tools, to conduct a major new study of changes in gender language. The Bank of English is a database of more than 4.4 billion words drawn from text publications and spoken-word recordings from all over the world.

Working with some of the world’s leading experts in computational linguistics and using cutting-edge techniques developed specifically for this project, the committee gained an authoritative, and hitherto unavailable, perspective on the contemporary use of gender language, including terms for the human race and subgroups of the human race, pronoun selections following various words and phrases, the use of “man” as a singular generic, and the use of “father(s)” and “forefather(s)” as compared to “ancestor(s).” The project tracked usage and acceptability for each locution over a twenty-year period and also analyzed similarities and differences across different registers and varieties of English: for example, UK English, US English, written English, spoken English, and even the English used in a wide variety of evangelical books, sermons and internet sites.

Research of this type is just one tool in the hands of translators, and, of course, it has no bearing on the challenge of preserving transparency to the form and structure of the original text. But since its first publication in 1978, the NIV has always aimed not only to offer transparency to the original documents, but also to express the unchanging truths of the Bible in forms of language that modern English speakers find natural and easy to comprehend. And this is where a tool like the Bank of English comes into its own.

The summary that follows provides insight into the wealth of information that emerged from this program of research and the methods that were employed. We hope it will be of interest to scholars and lay people alike as they familiarize themselves with the updated text of the NIV.


The analysis of generic pronouns was facilitated by the development of a ground-breaking anaphora resolution grammar with built-in semantic tagging designed to track the relationship between pronouns/determiners and antecedents in citations drawn from all corpora. The anaphora resolution grammar yielded a higher proportion of positive (relevant) citations than has previously been possible using manual techniques and allowed researchers to fully exploit the immense scale and breadth of Collins’ corpus holdings.

Summary of findings

The study examined gender language in English, concentrating on three specific areas of usage over a 20-year period from 1990 to 2009.

1. Generic pronouns and determiners

This part of the study considered the types of pronouns and determiners that are used to refer to indefinite pronouns (such as someone, everybody and one) and non-gender-specific nouns (such as a person, each child and any teacher):

A. masculine (he, his, himself, etc.);
B. feminine (she, her, herself, etc.);
C. plural/gender-neutral (they, them, one, themselves, etc.);
D. alternative forms (s/he, him or her, his/her, etc.).
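The four categories above can be pictured as a simple lookup from a pronoun or determiner form to its category. The sketch below is illustrative only: the form lists and the function name classify are assumptions made for this example, and the summary does not describe the tag set the study actually used. Deciding whether a given pronoun is generic at all requires knowing its antecedent, which a lookup like this cannot do.

```python
from typing import Optional

# Illustrative form lists only (an assumption for this example);
# the summary does not publish the study's actual tag set.
CATEGORIES = {
    "masculine": {"he", "him", "his", "himself"},
    "feminine": {"she", "her", "hers", "herself"},
    "plural/neutral": {"they", "them", "their", "theirs",
                       "themselves", "one", "oneself"},
    "alternative": {"s/he", "he or she", "him or her", "his or her",
                    "his/her", "himself or herself"},
}


def classify(form: str) -> Optional[str]:
    """Return the category of a pronoun/determiner form, or None if unlisted."""
    # Normalise case and collapse internal whitespace before the lookup.
    key = " ".join(form.lower().split())
    for category, forms in CATEGORIES.items():
        if key in forms:
            return category
    return None


print(classify("himself or herself"))  # alternative
print(classify("them"))                # plural/neutral
```

A form that appears in none of the lists, such as "it", returns None rather than being forced into a category.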

In all the varieties of English analyzed, plural/neutral pronouns and determiners account for the majority of usages. Between 1990 and 2009, instances of masculine generic pronouns and determiners, expressed as a percentage of total generic pronoun usage in general written English, fell from 22% to 8%.

e.g. ‘…when a person accepts unconditional responsibility, he denies himself the privilege of “complaining” and “finding faults.”’

Instances of ‘alternative’ generic pronouns and determiners fell from 12% to 8%.

e.g. ‘Any citizen who wants to educate himself or herself has plenty of sources from which to do so.’

Instances of plural/neutral generic pronouns and determiners rose from 65% to 84%.

e.g. ‘If you can identify an individual who metabolises nicotine faster you can treat them more effectively.’

Figures for the other corpora analyzed in the study are broadly comparable with figures from the general written English corpus both in overall magnitude and in the general trend over time.
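As a rough aid to reading the general written English figures reported above, the sketch below computes the change in each category's share between 1990 and 2009, both in percentage points and relative to the 1990 share. The percentages are those given in this summary; the names SHARES and changes are assumptions made for this example. The summary reports no figure for feminine forms, so that category is left out.

```python
# Shares of total generic pronoun usage in general written English,
# as (1990 %, 2009 %), taken from the summary above.
SHARES = {
    "masculine": (22, 8),
    "alternative": (12, 8),
    "plural/neutral": (65, 84),
}


def changes(shares):
    """Return {category: (percentage-point change, relative change in %)}."""
    result = {}
    for category, (start, end) in shares.items():
        point_change = end - start
        relative_change = round(100 * (end - start) / start, 1)
        result[category] = (point_change, relative_change)
    return result


for category, (points, relative) in changes(SHARES).items():
    print(f"{category}: {points:+d} points ({relative:+.1f}% relative to 1990)")
```

On these figures the masculine share falls by 14 percentage points and the plural/neutral share rises by 19. The three reported 1990 shares sum to 99%, so the remainder presumably reflects feminine forms and rounding.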

2. Mankind, man and synonyms

This part of the study considered the use of the terms man, mankind, humankind, humanity, humans, human beings, the human race and people when used to refer either to all humans or to smaller subsets of humanity. In all the corpora analyzed except Evangelical English, when all instances are considered, people is by far the most frequent synonym, followed by humans. People and humans, however, are much looser synonyms when the focus narrows to references to the human species as a whole. In these instances, man, mankind, humankind, humanity, the human race and human beings are more precise.

Of these more precise alternatives, man, humanity and mankind are the most frequent synonyms in the general written English, general spoken English, US written English and US spoken English corpora. Man accounts for between 22.8% and 30.3% of relevant citations, humanity accounts for between 21.8% and 32.7% of relevant citations, and mankind accounts for between 15.9% and 17.8% of relevant citations. Humankind, human beings and the human race are comparatively infrequent.

In Evangelical English, man is the synonym that occurs most frequently, accounting for more than half of all genuinely collective occurrences. Mankind accounts for 14.2% of genuinely collective occurrences and humanity accounts for 11.3% of genuinely collective occurrences. Humankind, human beings and the human race are, as in the other corpora, relatively infrequent.

In all the corpora except Evangelical English, man and mankind have become steadily less frequent (with some fluctuations) over the 20-year course of the study, tapering off to very similar levels in current usage (approximately 3 citations per million words for man, and approximately 2 citations per million words for mankind).
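Frequencies expressed as citations per million words let corpora of different sizes be compared on the same scale. The sketch below shows that normalization. The function name per_million and the counts passed to it are hypothetical, chosen only to illustrate the arithmetic; the summary gives neither raw citation counts nor the sizes of the individual corpora.

```python
def per_million(citations: int, corpus_words: int) -> float:
    """Normalise a raw citation count to citations per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return citations * 1_000_000 / corpus_words


# Hypothetical counts, not taken from the study: 13,200 citations
# in a corpus of 4.4 billion words.
print(per_million(13_200, 4_400_000_000))  # 3.0
```

Because the result is a rate rather than a raw count, a small corpus such as a collection of sermons can be set alongside a much larger general corpus.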

In the Evangelical corpus, the frequency with which all of the synonyms tracked in this part of the study occur is markedly higher than it is in the other corpora, most likely due to the nature of the subject matter addressed in Evangelical books and sermons.

When man, mankind and their synonyms occur with follow-on pronouns (e.g. ‘Clinical ecology shows us how to restore the balance between man and his environment,’ ‘When the Almighty himself condescends to address mankind in their own language…’), man is almost invariably followed by the pronoun he, humanity is typically followed by the pronoun it, and mankind, on the rare occasions where it is used with a follow-on pronoun, is generally followed by the pronouns it or they.

3. Forefather, ancestor and father

This part of the study considered the use of the terms forefather(s), ancestor(s) and father(s) in the sense ‘a person/people from whom one is descended’ or ‘the founder(s) of a movement/nation etc.’ Frequencies have fluctuated, but it is evident that ancestor is significantly more frequent than forefather in each corpus and each period. Forefather is more frequent in Evangelical English than in the other corpora, but it is still much less frequent than ancestor.