Paul Scifleet, 'Virtual Utopia': a Finding Aid to the Records of the New Australia Co-operative Settlement Association, Paper for Virtual Histories, Real Time Challenges Seminar


© Electronic Journal of Australian and New Zealand History

Virtually Utopia.

An Electronic Finding Aid to the Records of the New Australia Co-operative Settlement Association

1. The New Australia Movement

    Under the leadership of William Lane, who founded the movement in 1892, a party of 241 left Australia on 17 July 1893 and established the settlement of New Australia on the Tebicuary River, Paraguay. The utopian community in Paraguay eventually broke down; however, descendants of the settlers remain there to this day. The movement included many of Australia’s early labour leaders, among them politicians, journalists, unionists and writers. As a significant part of Australia’s early labour movement, the story of New Australia occupies a singular place in our history. The records of the movement hold great historical and research value, and they continue to be used frequently.

    The Online Guide to the Records of the New Australia Co-operative Settlement Association was begun at the Scholarly Electronic Text and Image Service (SETIS), University of Sydney, in November 1997. It is a prototype finding aid encoded using SGML and the Encoded Archival Description standard.

    Electronic text and digital images for the project are located at SETIS, the Scholarly Electronic Text and Image Service at the University of Sydney Library.

    SETIS was officially opened in September 1996. Its aim is to facilitate and support textual study and research throughout the University of Sydney by providing information technologies and the associated services and expertise. SETIS gives access to in-house and remote text and image databases, such as the American Poetry Full Text Database and Goethe’s Werke auf CD-ROM. It also undertakes text and image creation projects for Australian collections.

    The distinctive feature of these projects is their use of the guidelines for electronic text encoding and interchange produced by the SGML-based Text Encoding Initiative (TEI). There are also projects to create digital archival images and SGML-encoded archival finding aids that conform to the guidelines for Encoded Archival Description.

Projects using the Text Encoding Initiative guidelines include:

  • Australian Literature Database

  • Labour History – 1st edition of Vere Gordon Childe’s How Labour Governs

  • Lectures of Professor John Anderson

Projects using the Encoded Archival Description guidelines include:

  • New Australia Project
  • Rex Armitage Collection
  • Alan E. Bax Collection

    Projects combining Encoded Archival Description and guidelines from the Text Encoding Initiative include:

  • New Australia Project – William Lane’s The Workingman’s Paradise

3. The New Australia Project

    The aim of the New Australia Project is to provide an electronic finding aid to the Records of the New Australia Co-operative Settlement Association using SGML and standards for Encoded Archival Description.

    The Guide will incorporate links to digital images of items from the collection which are being created to ensure ongoing preservation and extended access to these heritage materials. The project will also include a digital version of Robert Samuel Ross’ copy of William Lane’s The Workingman’s Paradise; this edition includes the marginalia of Robert Ross and Dame Mary Gilmore.

    The project methodology aims to incorporate the principles underlying the development of standards for digital libraries. It provides a flexible environment for trialling related standards and technologies, such as TEI, E-bind, digital images and formats for online presentation. These include standards supported by the Library of Congress, the Society of American Archivists and the World Wide Web Consortium (W3C).

    The online finding aid provides a logical (or ordered) sequence to a disparate and dispersed collection of materials that share their origins in the New Australia Movement. Records documenting the Association are held in several institutions and originate from a number of diverse sources. Some of the holding institutions that are participating in this project include the State Library of New South Wales, University of Sydney Library, City of Sydney Library and the National Library of Australia. The records in custody have been acquired over a number of years and include both donations and purchases. Donors of the materials include Dame Mary Gilmore, W.G. Spence, Robert Samuel Ross, Walter Alan Head and the Spiers family.

    The collection includes printed materials, manuscripts, correspondence, photographs, sketches, maps, membership lists and cards, and associated realia. (For example, there is currency from Paraguay and a yerba mate (tea) strainer donated to the Mitchell Library by Dame Mary Gilmore.)

    While in some institutions items from the New Australia Co-operative Settlement Association form discrete units which have been catalogued individually, in others they are represented only as very small parts of much larger collections such as the Papers of Dame Mary Gilmore or Papers of W.G. Spence.

4. Virtually Utopia

    The result of this work will be a digital archive of the New Australia Movement with images linked in context to their descriptions. It will include references to the location of original materials, a collection summary and an administrative history of the Association.

    Part of the attraction of bringing together descriptions of materials from the New Australia Movement is that information about the records can be provided in one finding aid, in an ordered sequence, without interfering with the control of items within each custodial institution. The result will be improved and wider access to resources currently available only in major research collections.

    A further benefit will be the digital preservation of deteriorating print collections. Wherever possible, items are being digitised so that one complete preservation set is maintained. Encoded Archival Description allows for the description of the full range of archival holdings, including electronic documents. Links to digital images are embedded in the finding aid along with descriptions of the digital objects.
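    The sketch below shows what embedding a link and its description might look like. It builds a component that carries a digital archival object element (`dao`) pointing to an image, with a short description (`daodesc`) of that image. The element names follow later EAD usage and are an assumption here. The image path, the `add_digital_object` helper and the sample title are hypothetical. XML is used because Python's standard library can parse and build it.

```python
import xml.etree.ElementTree as ET

def add_digital_object(component, href, description):
    """Hypothetical helper: attach a link to a digital image, together with
    a description of that image, to an archival component."""
    dao = ET.SubElement(component, "dao", {"href": href})
    daodesc = ET.SubElement(dao, "daodesc")
    ET.SubElement(daodesc, "p").text = description
    return dao

# An item-level component describing the original object (illustrative values).
item = ET.Element("c02", {"level": "item"})
did = ET.SubElement(item, "did")
ET.SubElement(did, "unittitle").text = "Membership card"

# The link to the digital surrogate sits alongside the original's description.
add_digital_object(item, "images/membership-card-001.jpg",
                   "Digital image (preservation copy) of the membership card")

print(ET.tostring(item, encoding="unicode"))
```

    The description of the original (`did`) and the description of its digital surrogate (`dao`) stay separate. This lets the finding aid record both the object and its reproduction.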

    The information can be searched, manipulated and presented across time in a variety of formats, including online, CD-ROM and print.

The finding aid will be made freely available over the World Wide Web.

5. Signposts

    By April 1997, when research for this project began, the outcomes of a number of early initiatives to provide networked access to the collections of cultural institutions were pointing the way towards sustainable standards such as Encoded Archival Description.

    The signposts outlined below are based on notes taken at conferences and workshops over the last two years, beginning with the annual Australian Museums Online (AMOL) Conference held in April 1997. The list is not exhaustive, but it covers the main reasons for investigating standards for online delivery:

Need to provide a cultural experience that works

    The information explosion of the World Wide Web offered a lot of promise. Yet the environment was dominated by pamphleteers and publishers who were quick to capitalise on the medium’s advertising potential. The earliest websites developed by cultural institutions were passive. They followed the practice of the dominant community and provided simple brochures about information services.

    While a few institutions were providing galleries of images, these included no significant links to curatorial records. Opportunities for interactivity were limited to providing the reader with an email address for requesting further information.

    The Library of Congress pointed the way by showing how even this simple level of interactivity could be enlivened. It added the very successful ‘Ask a curator?’ component to the images it displayed under the American Memory Project.

    Signpost: We can’t just expect people to connect to our sites; we must provide an online experience that reflects and extends the experience of the institution itself.

Need to be seen as an island of trust on the WWW (Information in context is an asset)

    Amateur sites were also abundant, but again they were found wanting. These sites often provided access to images of heritage materials out of context. There were no statements of authority or responsibility, no provenance record, no indication of the location of the original materials and no other supporting documentation (e.g. copyright statements).

    In short, a researcher could not find on the Internet the rich layers of usable knowledge that are usually provided to establish the authenticity and evidential value of an object.

    It is the contextual information about a collection or object that distinguishes the items held in our cultural institutions and gives them value. This information includes the descriptions, research notes, guides and records of authenticity and provenance that are used to tell the stories of our collections.

    Signpost: We need to maintain the unique identity of our institutions in the virtual world by providing access to the ‘metadata’ that supports heritage materials.

Need to provide access to (re)usable information resources

    Lynn Elliot Sherwood reported on the experience of Canada’s Cultural Heritage Information Network at the Australian Museums Online (AMOL) Conference in 1997. She noted that a successful Internet presence depended on putting core information to a number of uses. There must be scope for special exhibits to be linked to accessible heritage information and to a range of value-added services, such as printed publications or content designed to meet curriculum standards.

Signpost: Information must be scalable and reusable

Need to build a sustainable infrastructure

    Rapid technological change has brought with it the high cost of administering new communications systems. Generational change, along with difficult data migrations and reformatting, has been and continues to be expensive. It often produces information systems that neither reflect nor improve on the long-established tools and resources found within the institution. To establish a strong digital identity, we need a sustainable technological infrastructure that recognises that the information we have always provided is the core strategic asset. The information technology infrastructure must be designed to incorporate tools that deliver the most appropriate information. This development must include solutions that ensure both the content and the context of the information we provide can be moved from one generation to the next.

    Signpost: A sustainable infrastructure to move information from one generation to the next

Need to provide a standards framework

    In looking for ways to present information, organisations have faced an abundance of proprietary systems, each offering a different direction. The organisations themselves differ in the levels of documentation they hold. In fact, in most institutions the level of documentation can vary from collection to collection. Complex solutions have often been employed, with records encoded in document formats ranging from proprietary word processing formats to formatting languages such as RTF and HTML. In many cases this has been further complicated by complex database scripting. More recently, Adobe Systems’ Portable Document Format has been added to this long list of diverse solutions.

    There has been a clear need for standardisation between operating systems, so that information can be exchanged from system to system. There has also been a need to standardise the structure of encoded data, so that consistency and accuracy can be maintained in an institution’s record-keeping systems.

Signpost: Need to standardise the encoding of information

6. What is SGML?

    In the 1960s Charles Goldfarb led research at IBM on integrated law office information systems. He and his team created a method, known as “Generalised Markup Language”, that let text editing, formatting and information systems share documents. The methodology behind a markup language was nothing new. Writers, editors, publishers and printers had always shared information about the editing and formatting of a document by placing codes in its margins. Goldfarb’s team applied this methodology to the electronic realm. Electronic markup adds information to text to indicate the logical components of a document in a way that automated systems can interpret. Markup tags are wrapped around the information you want to encode, as part of the document itself.
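    The sketch below illustrates the idea of tags wrapped around text. The markup records the logical role of each piece of text rather than its appearance, so a program can pick the pieces out by role. XML syntax is used because Python's standard library can parse it; XML is a simplified profile of SGML. The element names follow EAD conventions, and the sample values are illustrative, not taken from the actual finding aid.

```python
import xml.etree.ElementTree as ET

# Tags mark what each piece of text *is* (a title, a date), not how it looks.
record = """
<did>
  <unittitle>Records of the New Australia Co-operative Settlement Association</unittitle>
  <unitdate>1893</unitdate>
</did>
"""

root = ET.fromstring(record)

# An automated system can now locate each logical component by its tag.
title = root.findtext("unittitle")
date = root.findtext("unitdate")
print(f"{title} ({date})")
```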

    Over the course of two decades, Goldfarb’s research into markup languages gave rise to SGML, the markup language that lies behind Encoded Archival Description. Standard Generalised Markup Language (SGML) is an international standard (ISO 8879). The International Organisation for Standardisation adopted it in 1986 as a clearly defined and agreed convention for marking up electronic text. Since then it has been increasingly adopted as the international standard for data and document interchange in open systems environments. Industries using it include automotive, defence, aerospace, pharmaceutical, electronics, telecommunications and printing.

    SGML is a meta-language: a syntax that allows the definition and structure of a document, and the relationships between its elements, to be expressed as part of the content.

    Underlying the methodology of SGML is the assumption that documents comprise three types of information: data, structure and format.

    The data in a document may include text, graphics, images and even multimedia objects such as video and sound. The data may also include information that does not appear on the printed page. For example, a tagged element may carry hidden data, such as the authority source for a name, in the machine-readable attributes of the tag.
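    Here is a minimal sketch of such hidden data. The text of a personal name element is what a reader sees. An attribute on the same tag records where the name form came from, and it is available to software but not normally displayed. The attribute name `source` follows EAD usage for naming an authority. The value `"local"` and the two helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element text = what the reader sees; attribute = machine-readable extra data.
# The attribute value "local" is a hypothetical placeholder.
name = ET.fromstring('<persname source="local">Gilmore, Mary</persname>')

def display(element):
    """Presentation view: attributes are not shown to the reader."""
    return element.text

def index_entry(element):
    """Machine view: pair the name with the authority it was taken from."""
    return (element.text, element.get("source", "unknown"))

print(display(name))
print(index_entry(name))
```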

    The structure of a document refers to the relationships among the data elements. At the most basic level, encoded archival finding aids consist of two segments. One provides information about the finding aid itself (e.g. its title and publication date). The other provides information about the body of archival materials (e.g. description at collection, series and item level). The structure of a finding aid is often used to provide the context of, or relationships between, items within a collection and to indicate their location within a repository or a box.
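    The two segments and the nested levels of description can be sketched as follows. A header describes the finding aid itself. An archival description holds nested components, and a short recursive walk recovers each unit's level and title from the nesting alone. The element names (`eadheader`, `archdesc`, `dsc`, `c01`, `c02`) follow later EAD usage and are an assumption here. The titles are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Segment 1 (eadheader): about the finding aid.
# Segment 2 (archdesc): about the materials, with nested components.
doc = ET.fromstring("""
<ead>
  <eadheader>
    <filedesc><titlestmt><titleproper>Guide to the records</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="item"><did><unittitle>Letter</unittitle></did></c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
""")

def outline(element, depth=0, lines=None):
    """Walk the component hierarchy, recording each unit's level and title."""
    if lines is None:
        lines = []
    title = element.findtext("did/unittitle")
    if title is not None:
        lines.append("  " * depth + f"{element.get('level')}: {title}")
        depth += 1
    for child in element:
        # dsc is a wrapper; c01, c02, ... are nested components.
        if child.tag in ("dsc", "c01", "c02", "c03"):
            outline(child, depth, lines)
    return lines

print(doc.findtext("eadheader/filedesc/titlestmt/titleproper"))
for line in outline(doc.find("archdesc")):
    print(line)
```

    The indentation of each printed line is derived from the nesting alone. This is the sense in which structure carries context.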

    The format of a document is its appearance. Both SGML and Encoded Archival Description allow formatting features to be included. For example, conventions can be specified so that the title of printed matter always appears in italics, or even at a particular point size. Generally speaking, however, formatting is not the domain of SGML. SGML recognises that data, structure and format are separable. It preserves the data and structure but does not specify the format of the document, because formatting should be optimised to user requirements at the time of delivery. It assumes that, provided the data and its structure are preserved, most formatting can be dealt with across time and across systems.
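    The separation of format from data and structure can be sketched like this. A single encoded description is formatted at delivery time in two different ways: italics for the web and plain text for print. The element names follow EAD conventions. The location value `Box 1` and the two rendering functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One encoded description: data and structure only, no formatting.
entry = ET.fromstring(
    "<did><unittitle>The Workingman's Paradise</unittitle>"
    "<physloc>Box 1</physloc></did>"
)

def to_html(did):
    """Web delivery: the title of printed matter is rendered in italics."""
    return (f"<p><i>{escape(did.findtext('unittitle'))}</i>, "
            f"{escape(did.findtext('physloc'))}</p>")

def to_text(did):
    """Plain-text or print delivery: no typographic markup at all."""
    return f"{did.findtext('unittitle')}, {did.findtext('physloc')}"

print(to_html(entry))
print(to_text(entry))
```

    The encoded entry never changes. Only the delivery functions do, so a new output format means writing a new function, not re-encoding the records.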

7. Document Type Definition

    Identifying, or defining, the structure of information is part of what archivists, librarians and curators have always done.

    The way we organise information to provide access to a collection is essential for understanding both the content and the context of historical materials. This is what finding aids do. They provide a detailed description of a collection, along with an indication of its organisation and structure, so that researchers can determine whether the collection is likely to satisfy their research needs.

    In a similar way, every SGML document conforms to a defined structure, which is specified in its document type definition (DTD). The DTD sets out the rules for the structure and markup of the document. The Encoded Archival Description standard is an SGML DTD for encoding finding aids, and each encoded finding aid obeys its rules. For example, at its broadest level the EAD DTD specifies that each document must have a header and a finding aid structure. Information about the publication is always provided in the header. Information about the collection itself is provided hierarchically in the finding aid component of the document.
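    The sketch below is a deliberately simplified stand-in for DTD validation. It checks only the top-level rule just described: a header must come first, followed by the description of the collection. Real validation is done by a validating SGML or XML parser against the full EAD DTD, and this code does not attempt that. The element names `eadheader` and `archdesc` follow later EAD usage and are an assumption here, since the beta DTD's content model differs in detail. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified rule: <ead> must contain <eadheader> followed by <archdesc>.
REQUIRED_ORDER = ["eadheader", "archdesc"]

def check_top_level(xml_text):
    """Return a list of problems; an empty list means the rule is satisfied."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "ead":
        problems.append(f"root element is <{root.tag}>, expected <ead>")
    children = [child.tag for child in root]
    for tag in REQUIRED_ORDER:
        if tag not in children:
            problems.append(f"missing required <{tag}>")
    if all(tag in children for tag in REQUIRED_ORDER):
        if children.index("eadheader") > children.index("archdesc"):
            problems.append("<eadheader> must come before <archdesc>")
    return problems

print(check_top_level("<ead><eadheader/><archdesc/></ead>"))  # prints []
print(check_top_level("<ead><archdesc/></ead>"))
```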

8. What is EAD?

  • The Encoded Archival Description standard grew out of the Berkeley Finding Aid Project, which began at the University of California in 1993. The mission of the Finding Aid Project was to develop a standard that archival institutions could use to publish and share information in electronic format, with the same level of detail that printed finding aids have usually provided.
  • Under the project leadership of Daniel Pitti a number of options including HTML, ASCII data, and extended MARC tagging were investigated before the team at Berkeley settled on the development of an SGML DTD as the standard most suitable for the functional requirements of archival finding aids.
  • Draft tag sets were first developed and used at Berkeley during 1995
  • During September 1995 the Society of American Archivists established the EAD working group as part of its Committee on Archival Information Exchange. The Committee is made up of 16 members, including representatives from the United Kingdom and Canada, and was charged with reviewing and developing a model for archival finding aids.
  • In October 1995 the Library of Congress accepted the role of EAD standards agency for the working group. The SAA remains responsible for the ongoing oversight of the standard.
  • In June 1996 the beta version of the EAD standard was released to the international archival community (via the Internet) for testing and evaluation. Evaluation included responses from beta testers in Spain, France, Sweden, United Kingdom, Canada, Mexico, Italy,
  • October 1997, evaluation of the beta version closed
  • November 1997, 30 changes to the tag library were recommended. Most are only minor variations to attributes. 5 changes were included to ensure conformance with the forthcoming XML standard. 2 new tags were added and 1 was removed. The attributes of 2 tags were changed to ensure compliance with ICA description standards, the Canadian Rules for Archival Description and the Categories for the Description of Works of Art.
  • 1998, preparation of EAD v1.0 commences with the version due for release in September of this year.

    To ensure the standard can be applied broadly, words such as collection, archives, series, item and fonds have been replaced by generic terms, such as unit or component, that are not specific to any system or institution.

9. What are the benefits of EAD?

    By encoding finding aids using the EAD DTD it is possible to standardise the structure of finding aids and to meet the aims adopted by the International Council on Archives Ad hoc Commission on Descriptive Standards in January 1993. It is possible to:

  • Ensure the creation of consistent, appropriate and self-explanatory descriptions.
  • Facilitate the retrieval and exchange of information about archival material.
  • Share authority data.
  • Integrate collection descriptions from different institutions.

Because the encoded finding aid is in digital format (using SGML) it is also possible to:

  • Make those finding aids available online over the Internet.
  • Produce and maintain the finding aids independently of any platform. This protects the institution’s investment: documents encoded using SGML are not stored in a proprietary format, so they are not locked in to a single vendor’s technology or methodology. This interoperability means that institutions working with SGML can choose the tools that suit their needs for creating, managing and retrieving documents.
  • Search or compile data, using one or more of the descriptive elements within a finding aid, or across finding aids.
  • Validate the data structure according to the rules specified by the DTD. The consistency of documents is assured because their structure adheres to a definition which is created by an author or authority using the SGML framework.
  • Provide a visual representation of the relationships between the levels of information represented in a finding aid.
  • Build links between a finding aid and other related digital materials (including digital surrogates), wherever they may reside on a computer network.
  • Revise and update finding aids with the assistance of easy-to-use automated procedures.
  • Provide a path for the migration of data across time as hardware and software become obsolete. The document files themselves are plain text, so they are small and easily moved from platform to platform. SGML tags data with its role and with any other useful identifiers. This allows both the information and its context to be readily located and reused by automated systems.
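    Several of the points above depend on shared element names: searching across finding aids, integrating descriptions from different institutions and sharing authority data. The sketch below runs one query for a personal name over two finding aids held by different institutions. The institutions, their content and the `find_person` helper are hypothetical. The element names (`controlaccess`, `persname`) follow EAD usage.

```python
import xml.etree.ElementTree as ET

# Two hypothetical finding aids from different institutions. Because both
# use the same element names, one query can search them together.
finding_aids = {
    "Institution A": "<ead><archdesc><did><unittitle>Papers</unittitle></did>"
                     "<controlaccess><persname>Lane, William</persname></controlaccess>"
                     "</archdesc></ead>",
    "Institution B": "<ead><archdesc><did><unittitle>Letters</unittitle></did>"
                     "<controlaccess><persname>Gilmore, Mary</persname>"
                     "<persname>Lane, William</persname></controlaccess>"
                     "</archdesc></ead>",
}

def find_person(name):
    """Return the institutions whose finding aid lists `name` as a <persname>."""
    hits = []
    for institution, text in finding_aids.items():
        root = ET.fromstring(text)
        if any(p.text == name for p in root.iter("persname")):
            hits.append(institution)
    return hits

print(find_person("Lane, William"))
print(find_person("Gilmore, Mary"))
```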

Principles and Criteria for an SGML Document Type Definition (DTD) for Finding Aids

The standard accommodates registers and inventories of any length describing the full range of archival holdings, including textual and electronic documents, realia, visual materials and sound recordings.

It permits both the creation of new finding aids and the conversion of existing ones from print formats. Unlike documents encoded with the TEI guidelines, the finding aid itself is not an object of study but rather a tool leading to the objects.

  • The tag library is designed to ensure the encoding of finding aids with a minimum level of required elements but allows for progressively more detailed and specific levels of description.
  • It preserves and enhances the functionality of existing finding aids.
  • The markup of finding aids supports the functions of description, control, navigation, indexing, and online and print presentation.
  • Terms for description and control can be applied not only to the original source material but also to digital representations and surrogates, such as reproductions of photographic prints.
  • It is intended to facilitate interchange and portability, increase consistency between institutions (and between finding aids), and permit the sharing of identical data between institutions.
  • It is designed to endure changing hardware and software platforms through a process based on open systems standards.
  • Together, the DTD and its tag library help make routine finding aid production possible, particularly once templates are developed.
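    Two of the principles above can be sketched together: a minimum level of required elements, and optional, progressively more detailed description built from a template. The choice of which elements count as required is an assumption for illustration, loosely following later EAD usage (`eadid` and `titleproper` in the header, `unittitle` in the description). The function name and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def minimal_finding_aid(eadid, title, unittitle, **optional):
    """Hypothetical template: required elements are always present;
    optional elements are added only when values are supplied."""
    ead = ET.Element("ead")
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = eadid
    titlestmt = ET.SubElement(ET.SubElement(header, "filedesc"), "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = title
    archdesc = ET.SubElement(ead, "archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = unittitle
    # Progressively more detailed description, only where values exist.
    for tag in ("unitdate", "physdesc", "physloc"):
        if tag in optional:
            ET.SubElement(did, tag).text = optional[tag]
    return ead

basic = minimal_finding_aid("example-001", "Guide to the records", "Collection")
fuller = minimal_finding_aid("example-002", "Guide to the records", "Collection",
                             unitdate="1893", physloc="Box 1")
print(ET.tostring(basic, encoding="unicode"))
print([child.tag for child in fuller.find("archdesc/did")])
```

    Both outputs share the same required skeleton. The fuller record simply adds elements in a fixed order, so more detailed description does not change how the basic record is processed.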


Figure: Encoded Archival Description, outline of sample elements

EAD Header
    EAD Identifier
    File Description
        Title Statement
            Proper Title
        Publication Statement
            External Pointer
        Note Statement
    Profile Description
        Language Usage
Finding Aid
    Archival Description
        Descriptive Identification
            Unit Title
            Unit Date
            Physical Description
                Genre and Form
        Administrative Information
            Custodial History
            Acquisition Information
            Access Conditions & Restrictions
            Preferred Citation
        Scope & Content
        Biography or History
        Description of Subordinate Components
            Component (First Level)
                Location of the Unit
                Component (Second Level)
                    Digital Archive Object
                    DAO Group
Sample tags: Corporate Name, Geographic Name, Personal Name

References

Australian Museums Online (AMOL) Conference, Australian War Memorial, Canberra, April 1997.

Ghosh, Shikhar. ‘Making Business Sense of the Internet’, Harvard Business Review, March/April 1998, pp. 126-135.

Gilliland-Swetland, Anne J. & La Porte, Thomas A. (eds.). Beta Encoded Archival Description Application Guidelines, Library of Congress and Society of American Archivists, 20 December 1996.

Seventh International World Wide Web Conference, Brisbane, April 1998.

SGML University Power Tool and Resource Guide, CD-ROM, SGML University Press, 1997.

The New Australia Co-operative Settlement Association Project. A Guide to the Records of the New Australia Co-operative Settlement Association held in the Mitchell Library, State Library of New South Wales and the Rare Books Library, University of Sydney Library. Finding aid encoded at the Scholarly Electronic Text and Image Service, University of Sydney Library, 1997.

About the Author

Paul Scifleet is an Education Officer with the Education and Client Liaison Division of the State Library of New South Wales where he works on the selection and provision of resources for students studying the New South Wales Higher School Certificate as part of the State Library’s Infocus service. Paul is Project Coordinator for Infocus_Online and is currently involved in the development of technology and the preparation of digital resources for the delivery of the Infocus service online via the World Wide Web.

Paul has a Bachelor of Arts degree from Macquarie University and postgraduate qualifications in Librarianship and Archives Administration from the University of New South Wales, where he is currently undertaking a Master’s degree by research in SGML and evolving standards for digital libraries.

Paul’s research into Encoded Archival Description standards has resulted in the co-operation of several institutions including the State Library of New South Wales, University of Sydney Library, National Library of Australia and Sydney City Library for the development of the New Australia Project.

The New Australia Project was begun at the Scholarly Electronic Text and Image Service, University of Sydney, on 10 November 1997. It is the first Australian electronic finding aid using SGML and Encoded Archival Description standards to be made available.