From Hull AWE

A datum is a singular item of data. Datum originally meant 'something given'. The principal definition now, as given in OED (2012), is "A thing given or granted; something known or assumed as fact, and made the basis of reasoning or calculation". The meaning of data is "Facts, esp. numerical facts, collected together for reference or information".

Datum is a word in the singular form in Latin. The plural is data. (See -um in Latin.) Datum is a word rarely used in English - though data themselves are [itself is] essential for academic writing! The word 'datum' is rather like 'a singular statistic'.

In mapmaking and related subjects, datum is used to mean 'a base line'.

There is much discussion, among those who care about such matters, of how the English language should treat the word. Purists and people who can speak Latin prefer to say "the data are ...". Some support for this view comes from OED, which says (s.v. datum) that the plural is data. Other people, including many academics and perhaps most ordinary speakers, say "the data is ...". Burchfield's Fowler points out that there is a tendency for this to vary by subject: "In computing and allied disciplines [data] is treated as a singular noun." It appears that data in these areas is becoming a non-count noun, and is therefore correctly treated as a singular - and this is a sentence you may quote to show that there is some grammatical correctness behind your choice of "This data is ...".

Here, as so often, the writer must make a choice. First, think of your reader. If your lecturers are known to be purists - especially Latin-speaking purists - then say "the data are ...". If you know that the reader who is going to mark your essay has no time for this old-fashioned business, then write "the data is ..." If you don't know, then make up your mind. AWE would suggest that to treat data as a plural will never be wrong - though it may mark you as a purist.

The expression data set may be of use. It is current particularly among social scientists, who use it to mean roughly '[separate] collections of data'.