General Editors: Thomas C. Crochunis and Michael Eberle-Sinatra  


Flanders, Julia. 'One Encoding Example from the Brown's Women Writers Project Encoding of Drama.' British Women Playwrights around 1800. 1 December 1998. 5 pars. <http://www.etang.umontreal.ca/bwp1800/essays/flanders_encoding.html>


Copyright © Contributor, 1998-2008. This essay is protected under the copyright laws of the United States and the Universal Copyright Convention. Publication (print or electronic) or commercial use of any of the copyrighted materials without direct authorization from the copyright holder is strictly prohibited.

1.

The following samples are taken from Margaret Cavendish's The Unnatural Tragedie, 1662.

First of all, a disclaimer: these excerpts from Cavendish's Unnatural Tragedie are from a research draft and may contain errors of encoding or typography. They are presented here only for the sake of illustration and discussion, not as models of perfection.

2. Second, a comment on the encoding system generally. The Women Writers Project, from whose textbase these excerpts are taken, encodes its texts using the Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange. We have made a number of adaptations to the specifications given in TEI, using the mechanism for adaptation that TEI itself provides. Our encoding system is thus TEI-conformant, yet slightly different from what is documented in the Guidelines themselves. These adaptations were necessitated by the fact that early modern women's writing departs in many respects from the patterns and structures anticipated in TEI. We suspect that most text encoding projects that wish to encode at the level of detail illustrated here will want to alter TEI somewhat, but will find it an invaluable aid in conceptualizing the issues and a framework within which to work.
3.

This excerpt omits the TEI header and the front matter for the text; it consists of a single page (page 325) together with the minimal framework necessary to make it a self-sufficient piece of encoding. Some specific points to note about the encoding:

The TEI header
  • The TEI header is a required part of any TEI-conformant document, and it provides various kinds of information about the document itself, its provenance and creation, the methods of encoding used, its editorial and transcriptional principles, and the source upon which it is based. This information is all crucial in substantiating the electronic text as a piece of scholarly work and a source to be trusted. As you can see from the example, the WWP's header can become quite extensive; however, for individuals wishing to prepare texts more informally a much simpler header is possible.
  • The most important parts of the header from the viewpoint of a scholarly edition are the <titleStmt> (which records the basic identity of the file: its title, author, and provenance), the <sourceDesc> (which records the identity of the source on which the transcription is based), and the <editorialDecl> (which records the editorial principles governing the preparation and treatment of the text). The more detailed and accurate these are, the more useful the text will be to a scholarly audience. The WWP stores a detailed editorial statement in a separate file, which is referenced with the entity "&editorial;".
  • The <revisionDesc> is an essential part of the header from the viewpoint of project management, since it tracks the work done on a file. This can be especially helpful if several people are collaborating on the file over a long period of time, since details of decisions and changes can be hard to recall if not recorded.
  • The header can also be used to record renditional defaults (see below for more information), which indicate how different elements are formatted in the original. If you are recording this information in detail, using defaults can save considerable time, since they make it unnecessary to encode renditional information separately on each element in the text.
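
As a rough illustration of the header parts discussed above, the following Python sketch parses a deliberately small header and pulls out the title, source description, editorial statement, and revision history. The element names follow the TEI Guidelines, but every value is an invented placeholder rather than WWP data, the fragment has not been validated against any DTD, and the internal entity declaration merely stands in for the separate editorial file that "&editorial;" references. The function name summarize_header is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal header. Element names follow the TEI Guidelines;
# all values are invented placeholders, not taken from the WWP textbase.
# The internal entity is an assumption standing in for an external file.
HEADER = """<!DOCTYPE teiHeader [
  <!ENTITY editorial "[editorial statement stored in a separate file]">
]>
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample play: an electronic transcription</title>
      <author>Sample Author</author>
    </titleStmt>
    <publicationStmt><p>Unpublished research draft.</p></publicationStmt>
    <sourceDesc><p>Transcribed from a printed source.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <editorialDecl><p>&editorial;</p></editorialDecl>
  </encodingDesc>
  <revisionDesc>
    <change><date>1998-12-01</date><item>Proofread the transcription.</item></change>
  </revisionDesc>
</teiHeader>"""


def summarize_header(xml_text):
    """Collect the header parts most relevant to a scholarly reader."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("fileDesc/titleStmt/title"),
        "source": root.findtext("fileDesc/sourceDesc/p"),
        "editorial": root.findtext("encodingDesc/editorialDecl/p"),
        # One entry per recorded change, in document order.
        "changes": [change.findtext("item") for change in root.iter("change")],
    }


summary = summarize_header(HEADER)
print(summary["title"])
print(summary["changes"])
```

A header for an informally prepared text could be shorter still; the point of the sketch is only that each part named above has a predictable place where software (or a reader) can find it.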
4. The castlist
  • The WWP encodes the castlist as a separate <div> element, and uses it as a place to store unique identifiers for each character (for instance, "RUTMPE"). These identifiers are pointed to by corresponding id references on each speech, indicating who is speaking even in cases where characters appear under changed names. This helps the user locate all the speeches by a given character regardless of the spelling of the character's name.
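
The pointing mechanism can be sketched in a few lines of Python. The fragment below is invented for illustration: the identifiers "DEMO1" and "DEMO2", the character names, and the placement of the id attribute on the role element are all assumptions, not the WWP's actual practice. What it shows is how a who value on each speech lets a program gather every speech by one character even when the printed speaker label varies.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented fragment: ids, names, and speeches are placeholders.
# Placing the id on <role> is an assumption made for this sketch.
SAMPLE = """<text>
  <div type="castlist">
    <castList>
      <castItem><role id="DEMO1">Lady Example</role></castItem>
      <castItem><role id="DEMO2">Sir Sample</role></castItem>
    </castList>
  </div>
  <div type="scene">
    <sp who="DEMO1"><speaker>Lady Ex.</speaker><p>First speech.</p></sp>
    <sp who="DEMO2"><speaker>Sir Sample.</speaker><p>A reply.</p></sp>
    <sp who="DEMO1"><speaker>La. Example.</speaker><p>Second speech.</p></sp>
  </div>
</text>"""


def speeches_by_character(xml_text):
    """Group speech text under the castlist name that each who value points to."""
    root = ET.fromstring(xml_text)
    # Map each castlist identifier to the name recorded in the castlist.
    names = {role.get("id"): role.text for role in root.iter("role")}
    grouped = defaultdict(list)
    for sp in root.iter("sp"):
        grouped[names[sp.get("who")]].append(sp.findtext("p"))
    return dict(grouped)


for name, speeches in speeches_by_character(SAMPLE).items():
    print(name, speeches)
```

Note that the lookup never consults the speaker element, so the variant labels "Lady Ex." and "La. Example." resolve to the same character.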
5. The text sample
  • The text has been transcribed exactly from the source, including page breaks, line breaks, forme work, and any typographical errors or period spellings. Errors in the original are encoded using a <sic> element, which allows us to record both the original error and a corrected spelling.
  • Individual speeches are encoded using the <sp> element. The who attribute on this element is used to record the identity of the speaker, using a unique identifier (e.g. "RUTFRE"). These identifiers in turn point to entries in the castlist.
  • Long s is recorded using the entity reference "&s;". Soft hyphens are indicated using the entity reference "&shy;".
  • Note that different kinds of stage directions have been categorized using the type attribute on the <stage> element, with values like "business" or "entrance".
  • To capture the renditional detail of the original text, we use the rend attribute. In order to pack into this single attribute all the complex information about alignment, font, capitalization, and so forth, we use a system of renditional ladders which consist of a sequence of keywords, each followed by an argument in parentheses. Renditional information can be stored as a set of default values in the header for the document, thus saving the labor of indicating over and over that all paragraphs begin on a new line, or that the entire text is in roman type. Where individual elements override these defaults, their special renditional features are indicated in their own rend attribute. Where renditional information can be stored on an existing structural element, we do so; in cases where a renditional shift is more or less decorative, we indicate it using the <hi> element.
  • The physical structure of the book (page breaks, catchwords, signatures) is encoded using elements which are specially designed to avoid overlapping with the textual structure of acts, scenes, and speeches (since such overlap is illegal in SGML). The <pb> element is used to mark page breaks, and to record the real pagination of the text. The <fw> element is used to encode various types of forme work, including the actual printed page number and the printed signatures and catchwords. The <milestone> element is used to record the true collational information for each page.
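
The interaction between renditional defaults and an element's own rend value can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keywords used ("break", "slant", "align") and the function names are chosen for the example and should not be read as the WWP's documented vocabulary. The only feature taken from the description above is the shape of a ladder, a sequence of keywords each followed by an argument in parentheses.

```python
import re

# One ladder step: a keyword followed by an argument in parentheses.
# Allowing hyphens in keywords is an assumption made for this sketch.
LADDER_STEP = re.compile(r"([\w-]+)\(([^()]*)\)")


def parse_ladder(rend):
    """Turn a ladder such as 'slant(italic) align(center)' into a dict."""
    return {keyword: argument for keyword, argument in LADDER_STEP.findall(rend)}


def effective_rendition(default_rend, own_rend=""):
    """Start from the header default, then let the element's own rend override it."""
    merged = parse_ladder(default_rend)
    merged.update(parse_ladder(own_rend))
    return merged


# Hypothetical default for paragraphs: begin on a new line, upright type.
paragraph_default = "break(yes) slant(upright)"

# A paragraph with no rend attribute simply inherits the default.
print(effective_rendition(paragraph_default))

# A paragraph that is italic and centered overrides only what differs.
print(effective_rendition(paragraph_default, "slant(italic) align(center)"))
```

Only the features that differ from the default need to be written on the element, which is where the saving in labor comes from.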

 

 

Julia Flanders
Brown University

Julia Flanders is the Textbase Editor and Project Manager of the Women Writers Project at Brown University, a research project creating a TEI-encoded textbase of pre-Victorian women's writing. She is also a member of the Executive Council of the Association for Computers and the Humanities, and has spoken and published on text encoding and issues in electronic editing. She holds undergraduate degrees from Harvard and Cambridge Universities, and is currently working toward a PhD at Brown University on editorial theory, electronic textuality, and the relationship between text and data.