Creagh Cole, 'Form and Content: Historical and Literary Texts on the WWW at SETIS': Paper for Virtual Histories, Real Time Challenges Seminar


© Electronic Journal of Australian and New Zealand History

Form and Content: Historical and Literary Texts on the World Wide Web at SETIS

By Creagh Cole, SETIS Project, Fisher Library, University of Sydney.


It is still commonly asserted that the internet lacks significant content and therefore is of limited value to the scholar and serious student. This seems to me to be rather self-fulfilling when it comes from scholars, researchers, librarians and archivists. Who is to supply the content? Who will support and nurture the internet sites devoted to creating and publishing quality primary research material in digital form to the World Wide Web? It does seem to me that in many ways we are still bewitched by the form of this new means of publication and communication and that as a result there is a lack of quality content and a lack of informed debate and discussion in this country about the issues surrounding content creation.

Even the enthusiasts (or perhaps, especially the “enthusiasts”) are unconcerned about content, other than as a means to demonstrate the “power of the medium”. There is a great deal of interest in tightly focussed or single-issue “multimedia packages”, usually aimed at high school students (which perhaps means that the authors of such packages expect not to be judged too severely by generally accepted scholarly and bibliographic standards). There are also sites that will gladly link to other people’s work, generally at the item level. Of course, there is a need for these sites, but they are not the real “wealth creators”, to borrow a phrase from the economists. (I have, however, heard this kind of work referred to as “value-adding”, which seems to be a peculiar use of the language!) Some sites may go further and simply appropriate the work of others directly and without acknowledgement. This would be only an irritant to the true creators, except that important header, bibliographic and contextual information about the text and its digitisation also goes missing. But who is doing the real work of content creation, facing the really difficult decisions and labours of text encoding and quality image creation so that others might have digital material to work with further down the track?

I think we should consider content creation in terms of first and second order levels of work. Too often these are conflated, and scholarly and library values are sacrificed in the process. It seems to me that everyone wants to do the “second order” work, i.e., work with existing content to create exciting new learning tools with movement and colour and hyperlinks, but very few are in a position to conduct the “first order” work of content creation. At the first level we should be concerned with the material itself, and with how best to digitise it so that the result will not only be of the greatest utility to researchers now, but will also survive technological change and remain useful to researchers in the future. At the second level, we are concerned with the immediate application of the material.

At the first level, questions of accuracy, proof-reading, bibliographic checking, appropriate encoding and search functionality, image resolution and so forth force some rather difficult decisions upon the content creators, and there may be no immediate pay-back for these decisions. Is proper proof-reading of texts part of the process of creation? Of course it is, or should be, and there is no more important work in the process of digitisation, but it is expensive and slow. Do we scan our colour and grayscale images at 300 d.p.i. or 400 d.p.i.? If the latter, we almost double our file sizes, to more than 20 MB per file, and how many of us can efficiently handle and manipulate files of that size? (Of course, most will simply create low-resolution images which look good on the screen now – another sign that the authors have accepted the current state of technology as the standard. These low-resolution images should be generated from archival-quality digital images rather than be seen as the end-point of the creation process – to do otherwise is to conflate the levels again.) Too often we approach the problems of digitisation as if they were merely problems of getting material directly to the World Wide Web. That is, too often we conflate first order and second order decisions in digital creation.
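The arithmetic behind “almost double” can be made explicit. The sketch below is illustrative only: it assumes uncompressed images (24 bits per pixel for colour, 8 for grayscale) and a hypothetical 6 by 9 inch page, so real files will vary with page size, bit depth and compression. Because the pixel count grows with the square of the resolution, moving from 300 to 400 d.p.i. multiplies file size by (400/300)² = 16/9, roughly 1.78, whatever the page size. The hypothetical helper `derivative_size` illustrates the first-order/second-order distinction: the screen image is computed from the archival master's pixel dimensions rather than being scanned as the end-point.

```python
from fractions import Fraction

# Assumption: uncompressed raster images. Real TIFF/JPEG files will differ.
COLOUR_BPP = 24     # bits per pixel, RGB colour
GRAYSCALE_BPP = 8   # bits per pixel, grayscale


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan of a page measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int, bpp: int) -> int:
    """Raw image size in bytes: pixel count times bits per pixel, over 8."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h * bpp // 8


def derivative_size(master_px: tuple[int, int], max_edge: int) -> tuple[int, int]:
    """Scale a master's pixel dimensions so that its longer edge fits max_edge.

    Hypothetical helper: the low-resolution screen image is derived from the
    archival master, never the other way round. Masters that already fit
    are left at their original size.
    """
    w, h = master_px
    scale = min(1.0, max_edge / max(w, h))
    return round(w * scale), round(h * scale)


if __name__ == "__main__":
    page_w, page_h = 6, 9  # hypothetical page size in inches
    for dpi in (300, 400):
        colour = uncompressed_size_bytes(page_w, page_h, dpi, COLOUR_BPP)
        gray = uncompressed_size_bytes(page_w, page_h, dpi, GRAYSCALE_BPP)
        print(f"{dpi} d.p.i.: colour {colour / 1_000_000:.2f} MB, "
              f"grayscale {gray / 1_000_000:.2f} MB")
    # The size ratio depends only on resolution: (400/300)^2
    print(Fraction(400, 300) ** 2)
    master = pixel_dimensions(page_w, page_h, 400)
    print(derivative_size(master, max_edge=800))
```

Run as a script, the colour case for this hypothetical page comes to 14.58 MB at 300 d.p.i. and 25.92 MB at 400 d.p.i., and the ratio prints as 16/9. The 800-pixel derivative can be regenerated at any time from the 400 d.p.i. master. A master scanned at screen resolution cannot be scaled back up.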

This is all the more difficult because the “immediate outcome” is so important to attracting funding bodies and department decision-makers, and these “outcomes” are all second-order. But we are bewitched by the mere form of the World Wide Web when we ignore first order standards and constraints and jump straight to the second. Alternatively, in doing so, we have accepted that digitisation projects will have severely limited life spans, or that the products of these projects will never be serious scholarly resources. It is disturbing to me that even scholars, librarians and archivists are tempted by the new technology into believing that the old standards no longer apply, and that even the nation’s institutional custodians of primary source material find greater value in one-off “exhibitions” and show-cases of resources, rather than in the pursuit of long-term goals and standards in digital creation.

The Scholarly Electronic Text and Image Service (SETIS) at the University of Sydney Library has been creating texts for the World Wide Web since 1996. These projects are, of course, limited by available resources and so are limited in scope, but the guiding aim in each has been to ensure that the texts are accurate, contain full bibliographic information and other metadata, and are encoded to ensure rich search functionality as well as a long life for the texts created. The texts are marked up in a form of Standard Generalized Markup Language (SGML) resulting from the Text Encoding Initiative (TEI), which was devised specifically for humanities source texts, rather than in the form of SGML called HyperText Markup Language (HTML), which was devised purely to display documents on the World Wide Web (and so, in a sense, ties the text directly to current technology). The SGML-encoded texts in the SETIS collections can then be searched on a relatively rich tag set, and the results filtered to HTML for display.
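The division of labour between a searchable encoded master and a disposable HTML view can be sketched in a few lines. This is a hypothetical illustration, not the software SETIS uses: it uses XML rather than SGML syntax (Python's standard library has no SGML parser) and a handful of real TEI element names (`p`, `persName`, `placeName`, `date`), while the helpers `search_by_tag` and `to_html` are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical TEI-style fragment in XML syntax (the SETIS texts are SGML).
SAMPLE = """<text><body><div type="chapter" n="1">
<head>Chapter I</head>
<p>The colony of <placeName>New South Wales</placeName> was founded in <date when="1788">1788</date>.</p>
<p>Governor <persName>Arthur Phillip</persName> arrived.</p>
</div></body></text>"""


def search_by_tag(root: ET.Element, tag: str, term: str) -> list[ET.Element]:
    """Return the <p> elements containing a <tag> element whose text includes term.

    A plain word search cannot tell a personal name from a place name;
    searching within a named element can.
    """
    hits = []
    for para in root.iter("p"):
        if any(term.lower() in "".join(el.itertext()).lower() for el in para.iter(tag)):
            hits.append(para)
    return hits


def to_html(para: ET.Element) -> str:
    """Filter one TEI <p> to HTML for display, keeping the encoding as CSS classes.

    Only direct children become <span> elements; any deeper markup is
    flattened to its text content in this sketch.
    """
    parts = [escape(para.text or "")]
    for child in para:
        inner = escape("".join(child.itertext()))
        parts.append(f'<span class="{child.tag}">{inner}</span>')
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for para in search_by_tag(root, "persName", "phillip"):
        print(to_html(para))
```

Run as a script, it prints only the paragraph that names Arthur Phillip as a person. Because the HTML is generated at display time, a change in display technology means rewriting the filter, not re-encoding the texts.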

The projects include the Australian Literature Database, which currently contains one hundred Australian literary, political and historical texts from the eighteenth century to the early twentieth century. Many of these texts have been donated by other institutions in a variety of formats. All of them are now in a common SGML format, fully searchable and deliverable, and are part of a process of continual revision and correction. Texts otherwise available only in major research collections are now available worldwide. On more than one occasion I have sought contributions from content providers around the country, since this collection is well placed to act as a repository for digital files which might otherwise not survive their immediate purpose, but I have had very little response. If you have Australian literary and historical texts in digital form, please consider depositing them with the SETIS collection. All contributions will be fully acknowledged in the descriptive information of the text header, and all will be encoded according to the TEI Guidelines for Electronic Text Encoding and Interchange.
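The acknowledgement described above belongs in the TEI header, where it travels with the text. The sketch below assumes element names from the TEI Guidelines (`teiHeader`, `fileDesc`, `titleStmt`, `respStmt`, `publicationStmt`, `sourceDesc`) but uses XML rather than SGML syntax. The function `build_tei_header` and every value passed to it are hypothetical placeholders, not a record from the SETIS collection.

```python
import xml.etree.ElementTree as ET


def build_tei_header(title: str, author: str, contributor: str, source: str) -> ET.Element:
    """Build a minimal TEI header that credits the institution supplying the text.

    In TEI, fileDesc must contain titleStmt, publicationStmt and sourceDesc.
    """
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    ET.SubElement(title_stmt, "author").text = author
    # Acknowledgement of the contributing institution.
    resp_stmt = ET.SubElement(title_stmt, "respStmt")
    ET.SubElement(resp_stmt, "resp").text = "Electronic text contributed by"
    ET.SubElement(resp_stmt, "name").text = contributor

    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "publisher").text = "SETIS, University of Sydney Library"

    # Bibliographic description of the printed source of the electronic text.
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "bibl").text = source
    return header


if __name__ == "__main__":
    header = build_tei_header(
        title="An Example Colonial Novel: an electronic edition",  # hypothetical
        author="A. N. Author",                                     # hypothetical
        contributor="Example Institution",                         # hypothetical
        source="London: Example Publisher, 1890.",                 # hypothetical
    )
    ET.indent(header)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(header, encoding="unicode"))
```

Keeping the credit in `respStmt`, rather than in the body of the text, means that any later reuse of the file carries the acknowledgement and the source description with it.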

Other projects include the papers of Professor John Anderson, works in systematic botanical classification by Joseph Maiden and Ferdinand von Mueller, and electronic journals and scholarly editing projects by academic staff at the university. There is a great variety in these projects, and different values and interests come into play for each of them. What unites them is a concern that we get it right in the first place, that we learn as much from the process as possible so as to improve our skills in content creation, and that we ensure this material exists in the future for further applications, as well as for the advances in information technology yet to come. Quality content will always be in short supply, as the people wanting to use it (either as immediate consumers or as providers) will always outnumber those creating it in the first place.

For more information about the SETIS projects, or to contribute texts to the collection, contact me or visit the web site at