March 2002 newsletter

The OED's special characters

The subject of this piece is not in fact the staff of the Dictionary, but rather the characters and symbols which are non-standard in the English language, and known to typographers as 'special sorts' - or, less esoterically, the thousands of alphabetic, mathematical, and assorted obscure symbols that pepper the text of the OED.

Well, I had to find some way of grabbing your attention on a topic not generally known for its entertainment value!

The special characters in the OED data are recorded in a coded form known as 'character entity references', each symbol having its own unique code. Some of the more commonly occurring characters in the Dictionary, such as é, æ, â, etc., can be displayed in HTML (hypertext markup language, the code used to mark up web pages) - which is how you come to be reading them now if you are looking at the online version of OED News. But what about characters that don't belong to standard HTML, such as characters from non-Roman alphabets, mathematical symbols, and the like?

Before OED Online was launched, it was decided that the special characters should be displayed as inline GIF images. 'GIF', by the way, is shorthand for Graphics Interchange Format. Downloadable fonts are available on many web sites for the purpose of displaying special characters, but we wanted to make the OED Online site as user-friendly as possible. GIF images, despite not being cutting-edge technology, have the advantage of taking up minimal amounts of memory, and are quick to download. Also, the ALT text feature (the HTML coding responsible for displaying a text description of a graphic on a web page) means that users with older versions of browsers which are unable to display graphics can identify the character at least by its entity name.
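As a rough illustration of the approach described above, the following Python sketch emits an ordinary entity reference when HTML 4 defines the name, and otherwise falls back to an inline GIF whose ALT text is the entity name. It is not the code used for OED Online: the function name `render_entity`, the image directory `/images/chars`, and the entity name `scruple` are all assumptions made for the example.

```python
from html import escape
from html.entities import name2codepoint  # the HTML 4 named entities

# Hypothetical location of the character images; not taken from the article.
GIF_DIR = "/images/chars"


def render_entity(name: str) -> str:
    """Return an HTML fragment for one character entity name.

    Names that HTML 4 defines (such as 'eacute') are emitted as ordinary
    entity references. Any other name falls back to an inline GIF whose
    ALT text is the entity name, so a browser that cannot show the image
    still identifies the character.
    """
    if name in name2codepoint:
        return f"&{name};"
    safe = escape(name, quote=True)
    return f'<img src="{GIF_DIR}/{safe}.gif" alt="{safe}">'


print(render_entity("eacute"))   # a standard HTML entity
print(render_entity("scruple"))  # assumed name for a non-HTML character
```

The ALT text is what a text-only browser would show in place of the image.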

Once we'd made the decision on how to display the special characters, we set about analysing the Dictionary data to identify exactly which characters had to be produced. Then the task of commissioning approximately 2,500 hand-drawn GIF images lay ahead.

Going online posed a challenge for displaying special characters, because the way the data is described for the web conflicts with the way it is described in the OED text itself. More specifically, a special character is listed in the OED data table only once, but it can in theory exist in many forms in OED Online, depending on where in a particular entry the character appears: headword, etymological form, and so on. In fact, there are at least 12 potential areas in which a character may appear within an OED entry.

To illustrate this, take a look at the revised OED entry for meliphane (noun).

[Image: the OED entry for meliphane]

Note that Greek text occurs twice in the entry - as a form in the etymology, and in the smaller-type 1868 quotation within the etymology. The size, the font, and even the colour of a character depend on the context in which it appears. So, in this entry, we can say that the character 'Greek lower-case epsilon with acute accent' occurs twice, as a 12-point character and as an 11-point character (which is blue in the online version).
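One way to picture this is as a lookup keyed on both the character and its context. The sketch below is illustrative only. The two point sizes and the blue quotation text come from the meliphane example, but the context names, the colour of etymology text, the entity name `eacgr`, and the file-naming scheme are assumptions rather than details of the real system.

```python
from typing import NamedTuple


class Style(NamedTuple):
    point_size: int
    colour: str


# Two of the 12 or more contexts, with values based on the meliphane example.
# The context names and the "black" colour are assumptions.
CONTEXT_STYLES = {
    "etymology": Style(point_size=12, colour="black"),
    "quotation": Style(point_size=11, colour="blue"),
}


def gif_name(entity: str, context: str) -> str:
    """Build a hypothetical image file name for one entity in one context."""
    style = CONTEXT_STYLES[context]
    return f"{entity}-{style.point_size}pt-{style.colour}.gif"


# The same character needs a different image in each context.
print(gif_name("eacgr", "etymology"))
print(gif_name("eacgr", "quotation"))
```

A single entry in the data table therefore fans out into one image per context in which the character actually occurs.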

What we wanted to achieve for OED Online was a digital approximation of the printed text of the Second Edition, and the goal was to commission a GIF image for every special character, no matter how infrequently it appeared in the data. The symbol for a scruple (℈), the apothecaries' unit of weight, for instance, might reasonably be expected to occur in a definition or quotation, but never in a headword or etymology. In fact, it occurs only once - in definition text. Pinpointing exactly where each special character occurred in the Dictionary required a lengthy process of analysis. This involved cross-checking data from a program which had parsed the characters into 12 different 'sets' against the entities in the OED database. Eventually, we arrived at a figure of approximately 2,500 special characters to be produced as GIFs out of a potential 12,000.
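The cross-check can be sketched as a small set computation: collect the distinct (character, context) pairs that actually occur, and set aside any character that is missing from the entity table. This is a guess at the shape of the analysis, not the program that was used, and the function name and data below are invented for illustration. The potential figure of 12,000 would be consistent with roughly 1,000 distinct characters across 12 contexts, although the article does not state the character count.

```python
def images_needed(occurrences, known_entities):
    """Split parsed occurrences into images to commission and unknowns.

    occurrences: iterable of (entity, context) pairs reported by the parser.
    known_entities: set of entity names listed in the entity table.
    Returns two sets of (entity, context) pairs: those whose entity is in
    the table, and those whose entity is not and needs investigating.
    """
    needed, unknown = set(), set()
    for entity, context in occurrences:
        target = needed if entity in known_entities else unknown
        target.add((entity, context))
    return needed, unknown


# Toy data, invented for illustration.
occurrences = [
    ("eacgr", "etymology"),
    ("eacgr", "quotation"),
    ("eacgr", "etymology"),    # repeats collapse to a single image
    ("scruple", "definition"),
    ("mystery", "headword"),   # not in the entity table
]
known = {"eacgr", "scruple"}

needed, unknown = images_needed(occurrences, known)
print(len(needed), sorted(unknown))
```

Only pairs that occur in the data are counted, which is why the number of images commissioned can be far smaller than the number of characters multiplied by the number of contexts.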

Agfa Monotype, the specialist typographers, were commissioned to produce the GIF images for the OED. They worked with great patience to come up with images we were happy with; and HighWire Press, OED Online's developer and host, implemented the GIFs on the web site, applying the finishing touches just a few weeks before the site went live.

But the work didn't stop there. Each quarterly update brings another batch of new entities to the Dictionary, and the process of identifying new special characters for displaying on the web site is ongoing. As of March 2002, OED Online contains around 2,950 special character GIFs.

And counting...