
| Flanders, Julia. 'One Encoding Example from the Brown's Women Writers Project Encoding of Drama.' British Women Playwrights around 1800. 1 December 1998. 5 pars. <http://www.etang.umontreal.ca/bwp1800/essays/flanders_encoding.html>


|
Copyright © Contributor, 1998-2008. This essay
is protected under the copyright laws of the United States and
the Universal Copyright Convention. Publication (print or electronic)
or commercial use of any of the copyrighted materials without direct
authorization from the copyright holder is strictly prohibited.
 
|
| 1. |
The following samples are taken from Margaret Cavendish's The Unnatural Tragedie, 1662.
First, a disclaimer: these excerpts are from a research draft and may contain errors of encoding or typography.
They are presented here only for the sake of illustration and
discussion, not as models of perfection. |
| 2. |
Second, a comment on the encoding system generally. The Women Writers Project,
from whose textbase these excerpts are taken, encodes its texts
using the Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange. We have made a number of adaptations to the TEI specifications, using
the provision for adaptation that TEI itself supplies. Our encoding
system is thus both TEI-conformant and slightly different from
what is documented in the Guidelines themselves. These adaptations
were necessitated by the fact that early modern women's writing
departs in many respects from the patterns and structures anticipated
in TEI. Our feeling is that most text encoding projects that
wish to encode at the level of detail illustrated here will
probably find themselves wanting to alter TEI somewhat, but
will find it an invaluable aid in conceptualizing the issues
and providing a framework within which to work. |
| 3. |
This excerpt omits the TEI header and the frontmatter for the text; it consists
of a single page (page 325) together with the minimal necessary
framework to make it a self-sufficient piece of encoding.
Here are some specific points to note about the encoding:
The TEI header
- The TEI header is a required part of any TEI-conformant document, and it provides
various kinds of information about the document itself, its
provenance and creation, the methods of encoding used, its
editorial and transcriptional principles, and the source
upon which it is based. This information is all crucial in
substantiating the electronic text as a piece of scholarly
work and a source to be trusted. As you can see from the
example, the WWP's header can become quite extensive; however,
for individuals wishing to prepare texts more informally
a much simpler header is possible.
- The most important parts of the header from the viewpoint of a scholarly edition
are the <titleStmt> (which records the basic identity of the file: its title, author, and provenance),
the <sourceDesc> (which records the identity of the source on which the transcription is based),
and the <editorialDecl> (which records the editorial principles governing the preparation and treatment
of the text). The more detailed and accurate these are, the
more useful the text will be to a scholarly audience. The
WWP stores a detailed editorial statement in a separate file,
which is referenced with the entity "&editorial;".
- The <revisionDesc> is an essential part of the header from the viewpoint of project management,
since it tracks the work done on a file. This can be especially
helpful if several people are collaborating on the file over
a long period of time, since details of decisions and changes
can be hard to recall if not recorded.
- The header can also be used to record renditional defaults (see below for more
information) which indicate how different elements are formatted
in the original. If you are recording this information in
detail, using defaults can save considerable time, since they
remove the need to encode renditional information separately on
each element in the text.
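
As a rough illustration of the header parts named above, the following sketch builds a deliberately tiny header and checks that the four elements discussed here are present. The element names <titleStmt>, <sourceDesc>, <editorialDecl> and <revisionDesc> come from the discussion above; the surrounding elements follow the TEI Guidelines as generally documented, and every piece of content is placeholder text, not the WWP's actual header. The function name missing_header_parts is hypothetical, and the fragment is written as well-formed XML (an assumption) so that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified header. All content is placeholder text
# invented for illustration; it is not the WWP's real header.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Unnatural Tragedie</title>
      <author>Margaret Cavendish</author>
    </titleStmt>
    <publicationStmt><p>Placeholder publication statement.</p></publicationStmt>
    <sourceDesc><p>Placeholder description of the 1662 source.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <editorialDecl><p>Placeholder editorial principles.</p></editorialDecl>
  </encodingDesc>
  <revisionDesc>
    <change><item>Placeholder record of a change to the file.</item></change>
  </revisionDesc>
</teiHeader>
"""

# The four header parts singled out in the discussion above.
KEY_PARTS = ("titleStmt", "sourceDesc", "editorialDecl", "revisionDesc")


def missing_header_parts(header_xml):
    """Return the names of the key header parts absent from header_xml."""
    root = ET.fromstring(header_xml.strip())
    present = {element.tag for element in root.iter()}
    return [name for name in KEY_PARTS if name not in present]


if __name__ == "__main__":
    print(missing_header_parts(HEADER))
```

A check of this kind says nothing about the quality of the header's content; it only confirms that the parts a scholarly reader will look for have not been left out.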
|
| 4. |
The castlist
- The WWP encodes the castlist as a separate <div> element, and uses it as a place to store unique identifiers for each character
(for instance, "RUTMPE"). These identifiers are pointed to by corresponding id references on each speech,
indicating who is speaking even in cases where characters
appear under changed names. This helps the user locate all
the speeches by a given character regardless of the spelling
of the character's name.
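
The pointing mechanism can be sketched as follows. This is a minimal illustration, not the WWP's actual markup: the identifiers "RUTMPE" and "RUTFRE" are taken from this essay, but the character names, the speeches, the placement of the id attribute on <role>, and the function name speeches_by_character are all assumptions made for the example. The fragment is written as well-formed XML so that it can be parsed with Python's standard library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: names and speeches are placeholders. Note that the
# printed speaker labels vary while the who values stay constant.
SAMPLE = """
<text>
  <div type="castlist">
    <castList>
      <castItem><role id="RUTMPE">First Character</role></castItem>
      <castItem><role id="RUTFRE">Second Character</role></castItem>
    </castList>
  </div>
  <div type="scene">
    <sp who="RUTFRE"><speaker>Second Char.</speaker><p>Placeholder speech one.</p></sp>
    <sp who="RUTMPE"><speaker>First Char.</speaker><p>Placeholder speech two.</p></sp>
    <sp who="RUTFRE"><speaker>2d. Character</speaker><p>Placeholder speech three.</p></sp>
  </div>
</text>
"""


def speeches_by_character(xml_text):
    """Group speech text by the castlist identifier each <sp> points to."""
    root = ET.fromstring(xml_text.strip())
    # Collect the identifiers declared in the castlist.
    known_ids = {role.get("id") for role in root.iter("role")}
    grouped = defaultdict(list)
    for sp in root.iter("sp"):
        who = sp.get("who")
        if who not in known_ids:
            raise ValueError(f"speech points to unknown character id: {who!r}")
        # Join the text of the speech's paragraphs, ignoring the speaker label.
        speech = " ".join("".join(p.itertext()).strip() for p in sp.findall("p"))
        grouped[who].append(speech)
    return dict(grouped)


if __name__ == "__main__":
    for character_id, speeches in speeches_by_character(SAMPLE).items():
        print(character_id, speeches)
```

Because the lookup uses the identifier and never the printed speaker label, the first and third speeches are grouped together even though their labels are spelled differently.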
|
| 5. |
The text sample
- The text has been transcribed exactly from the source, including page breaks,
line breaks, forme work, and any typographical errors or
period spellings. Errors in the original are encoded using
a <sic> element, which allows us to record both the original error and a corrected spelling.
- Individual speeches are encoded using the <sp> element. The who attribute on this element is used to record the identity of the speaker, using
a unique identifier (e.g. "RUTFRE"). These identifiers in turn point to entries in the castlist.
- Long s is recorded using the entity reference "&s;". Soft hyphens are indicated using the entity reference "&shy;".
- Note that different kinds of stage directions have been categorized using the type attribute on the <stage> element, with values like "business" or "entrance".
- To capture the renditional detail of the original text, we use the rend attribute. In order to pack into this single attribute all the complex information
about alignment, font, capitalization, and so forth, we use
a system of renditional ladders which consist of a sequence
of keywords, each followed by an argument in parentheses.
Renditional information can be stored as a set of default
values in the header for the document, thus saving the labor
of indicating over and over that all paragraphs begin on
a new line, or that the entire text is in roman type. Where
individual elements override these defaults, their special
renditional features are indicated in their own rend attribute. Where renditional information can be stored on an existing structural
element, we do so; in cases where a renditional shift is
more or less decorative, we indicate it using the <hi> element.
- The physical structure of the book (page breaks, catchwords, signatures) is encoded
using elements which are specially designed to avoid overlapping
with the textual structure of acts, scenes, and speeches
(since such overlap is illegal in SGML). The <pb> element is used to mark page breaks, and to record the real pagination of the
text. The <fw> element is used to encode various types of forme work, including the actual
printed page number and the printed signatures and catchwords.
The <milestone> element is used to record the true collational information for each page.
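
The renditional ladders and header defaults described above lend themselves to simple processing. The sketch below assumes only what is stated here, namely that a ladder is a sequence of keywords each followed by an argument in parentheses, and that an element's own rend value overrides the defaults. The keywords used ("slant", "align", "break") and their arguments are hypothetical stand-ins, not a list of the WWP's actual vocabulary, and the function names are invented for the example.

```python
import re

# One rung of a ladder: a keyword followed by an argument in parentheses.
RUNG = re.compile(r"\s*([A-Za-z][\w-]*)\(([^()]*)\)")


def parse_ladder(rend):
    """Turn a ladder such as 'slant(italic)align(center)' into a dict."""
    result = {}
    position = 0
    while position < len(rend):
        match = RUNG.match(rend, position)
        if match is None:
            # Trailing whitespace is harmless; anything else is malformed.
            if rend[position:].strip() == "":
                break
            raise ValueError(f"malformed ladder at: {rend[position:]!r}")
        result[match.group(1)] = match.group(2).strip()
        position = match.end()
    return result


def effective_rendition(defaults, rend=None):
    """Combine header defaults with an element's own rend value.

    Keywords present on the element override the defaults; keywords the
    element does not mention are inherited unchanged.
    """
    merged = parse_ladder(defaults)
    if rend:
        merged.update(parse_ladder(rend))
    return merged


if __name__ == "__main__":
    # Hypothetical default for paragraphs, overridden by one italic element.
    print(effective_rendition("slant(upright)break(yes)", "slant(italic)"))
```

Keeping the defaults in one place means that an element whose rend attribute is absent still resolves to a complete description of its appearance.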
|
| |
Julia Flanders
Brown University
Julia Flanders is the Textbase Editor and Project Manager of the Women Writers Project at Brown University, a research project creating a TEI-encoded textbase of pre-Victorian
women's writing. She is also a member of the Executive Council
of the Association for Computers and the Humanities, and has spoken
and published on text encoding and issues in electronic editing.
She holds undergraduate degrees from Harvard and Cambridge Universities,
and is currently working on a PhD at Brown University, on editorial
theory, electronic textuality, and the relationship between text
and data. |