Hockx-Yu, Helen; Johnson, Stephen; Crawford, Lewis; Coram, Roger
A prerequisite for digital preservation is to be able to capture
and retain the content which is considered worth preserving.
This has been a significant challenge for web archiving,
especially for websites with embedded streaming media
content, which cannot be copied via a simple HTTP request to
a URL. This paper describes the approach taken by the British
Library in capturing and replaying streaming media in a web
archive. A working system is now in place which will lead to
the development of more generic tools and workflows,
contributing to addressing a common challenge for the web
archiving community. The British Library recently archived a
large scale public arts project website,
Gomm, Moritz; Hemmje, Matthias; Brocks, Holger; Schrimpf, Sabine; Werkmann, Björn
In this paper we present a case study and selected results
from research on digital preservation amongst digital
libraries in Europe. We propose a framework for gap
analysis in digital preservation encompassing the
diffusion of preservation practices and the life-cycle of
data. We also present a Gap Analysis Tool that we
developed to support visual analysis of gaps in the
implementation of digital preservation amongst
communities. We discuss selected results from the
application of the tool in the community of libraries in
Goethals, Andrea; Gogel, Wendy
Because of the historical value of email in the late 20th
and 21st centuries, Harvard University Libraries began
planning for an email archiving project in early 2007. A
working group comprised of University archivists,
curators, records managers, librarians and technologists
studied the problem and recommended the undertaking
of a pilot email archiving project at the University
Library. This two-year pilot would implement a system
for ingest, processing, preservation, and eventual
end-user delivery of email, in anticipation of it becoming an
ongoing central service at the University after the pilot.
This paper describes some of the unexpected challenges
encountered during the pilot project and how they were
addressed by design decisions. Key challenges...
Gavrilis, Dimitris; Angelis, Stavros; Papatheodorou, Christos
Repository platforms offer significant tools that help institutions preserve the wealth of their information resources. This paper presents the data model as well as the architectural features of Mopseus, a digital library service, built on top of Fedora-commons middleware, designed to help institutions develop and preserve their own repositories. The main advantage of Mopseus is that it minimizes the customization and programming effort that Fedora-commons involves. Moreover, it provides an added-value service which semantically annotates the internal structure of a Digital Object. The paper focuses on the preservation functionalities of Mopseus and presents a mechanism for automated generation...
Fauduet, Louise; Peyrard, Sébastien
The Bibliothèque nationale de France has developed its
trusted digital repository, SPAR (Scalable Preservation
and Archiving Repository), as a data-first system. This
implies having fully described collections, through use
of metadata standards in the information packages, such
as METS, PREMIS, MIX or textMD, in a way that will
make sense given the diversity of our documents.
The need for full documentation also applies to the
system itself. On the one hand, SPAR is self-describing
in order to ensure its durability. On the other hand, all
the information that is ingested into the system
contributes to determining its settings and its behavior.
The Data Management module is at the heart of these
The British Library’s web archive comprises several
terabytes of harvested websites. Like other content streams,
this data should be ingested into the library’s central
preservation repository. The repository requires a
standardized Submission- and Archival Information
Harvested websites are stored in Archival Information
Packages (AIP). Each AIP is described by a METS file.
Operational metadata for resource discovery as well as
archival metadata are normalized and embedded in the
METS descriptor using common metadata profiles such as
PREMIS and MODS.
The British Library’s METS profile for web archiving
considers dissemination and preservation use cases
ensuring the authenticity of data. The underlying complex
content model disaggregates websites into web pages,
associated objects and their actual digital manifestations.
Di Iorio, Angela; Lunghi, Maurizio
This article describes the development of Archives
Ready To Archival Information Packages (AIP)
Transmission, a PREMIS Based Project (ARTAT).
Following the project approach, the starting phase
consisted of prototyping a layer conveying preservation
metadata, which can be encoded from the existing
archival systems, and exchanged with other repositories.
This layer, called the Preservation Metadata Layer (PML),
uses PREMIS semantics as the common language to
overcome differences between archival systems and to transmit,
out of its original context, relevant preservation
information about the content objects comprising an AIP.
Since a repository, following the OAIS reference model,
usually provides resources with metadata container
objects, the experiment performed an analysis on
commonly used container formats, in order to enable the
As mechanisms emerge to certify the trustworthiness of
digital preservation repositories, no systematic efforts
have been devoted to assessing the quality and
usefulness of the preserved content itself. With generous
support from the Andrew W. Mellon Foundation, the
University of Michigan’s School of Information, in close
collaboration with the University of Michigan Library
and HathiTrust, is developing new methods to measure
the visual and textual qualities of books from university
libraries digitized by Google, Internet Archive, and
others and then deposited for preservation. This paper
describes a new approach to measuring quality in large-scale
digitization; namely, the absence of error relative
to the expected uses of the deposited content. The paper
specifies the design...
Campbell, Laura E.; Dulabahn, Beth
On April 14, 2010, the Library of Congress and Twitter made the joint announcement that the Library would receive a gift of the archive of all public tweets shared through the service since its inception in 2006. The media and community response was tremendous, raising many questions about how the Library would be stewarding and providing access to the collection. There are many issues to consider, from the technical mechanisms of transfer to the Library and the ongoing updates to the archive, to curatorial policies, to planning for a new type of research access to a Library collection. The Twitter...
Beruti, Vincenzo; Giaretta, David; Conway, Esther; Forcada, M.Eugenia; Albani, Mirko
Digital preservation is difficult. The technical difficulties
are the cause of much research. Other types of difficulty are
those to do with organisational commitment, funding and
With increasing interest in global change monitoring,
the use and exploitation of long time series of Earth
Observation (EO) data have also been increasing systematically,
creating a need to preserve the EO data without time
On the other hand:
· Data archiving and preservation strategies are still
mostly limited to the satellite lifetime and few years
· The data volumes are increasing dramatically.
· Archiving and data access technology are evolving
· EO data archiving strategies, if existing at all, are
different for each EO...
Beinert, Tobias; Brantl, Markus; Kugler, Anna
BABS is an acronym for Library Archiving and Access System (Bibliothekarisches Archivierungs- und Bereitstellungssystem), which constitutes the infrastructure for digital long-term preservation at the Bavarian State Library (BSB). During the two-year project BABS2, funded by the German Research Association (DFG), BSB works together with the Leibniz-Supercomputing Centre (LRZ) on advancing its organizational and technical processes with regard to trustworthiness according to the nestor criteria catalogue. Important achievements include framing an institutional policy for digital preservation covering the local, regional, and national tasks of a large-scale research and archive library, and conducting and evaluating a survey concerning the archiving requirements of all BSB...
Beers, Shane; York, Jeremy; Mardesich, Andrew
HathiTrust is a collaboration of universities working
together to establish a repository that archives and shares
their digitized collections. Initially, the Submission
Information Packages (SIPs) deposited into HathiTrust
were extremely uniform, being constituted primarily of
books digitized by Google. HathiTrust’s ingest
validation processes were correspondingly highly
regular, designed to ensure that these SIPs met agreed-upon
qualities and specifications. As HathiTrust has
expanded to include materials digitized from other
sources, SIPs have become more varied in their content
and specifications, introducing the need to make
adjustments to ingest and validation routines. One of the
primary sources of new SIPs is the Internet Archive,
which has digitized a large number of public domain
materials owned by HathiTrust...
Beagrie, Neil; Eakin-Richards, Lorraine; Vision, Todd
Data attrition compromises the ability of scientists to validate and reuse the data that underlie scientific articles. For this reason, many have called to archive data supporting published articles. However, few successful models for the sustainability of disciplinary data archives exist and many of these rely heavily on ephemeral funding sources.
The Dryad project is a consortium of bioscience journals that seeks to establish a data repository to which authors can submit, upon publication, integral data that does not otherwise have a dedicated public archive. This archive is intended to be sustained, in part, through the existing economy of scholarly publishing....
Antunes, Gonçalo; Barateiro, José; Borbinha, José
Apart from being a technological issue, digital
preservation raises several organizational challenges.
These challenges are starting to be addressed in the
industrial design and e-Science domains, where
emerging requirements cannot be addressed directly by
OAIS. Thus, new approaches to design and assess
digital preservation environments are required. We
propose a Reference Architecture as a tool that can
capture the essence of those emerging preservation
environments and provide ways of developing and
deploying preservation-enabled systems in
organizations. This paper presents the main concepts
from which a Reference Architecture for digital
preservation can be built, along with an analysis of the
environment surrounding a digital preservation system.
We present a concrete Reference Architecture,
consisting of a process to...
Aitken, Brian; Ross, Seamus; Lindley, Andrew; Barr, Matthew
The Planets Testbed, a key outcome of the EC co-funded
Planets project, is a web based application that provides
a controlled environment where users can perform experiments
on a variety of preservation tools using sample data
and a standardised yet configurable experiment methodology.
Development of the Testbed required the close
participation of many geographically and strategically disparate
organisations throughout the four-year duration of
the project, and this paper aims to reflect on a number
of key lessons that were learned whilst developing software
for digital preservation experimentation. In addition
to giving an overview of the Testbed and its evolution,
this paper describes the iterative development process that
was adopted, presents a set of...
Properties of digital objects play a central role in digital
preservation. All key preservation services are linked via
a common understanding of the properties which describe
the digital objects in a repository's care. Unfortunately,
different services sometimes deal with properties at
different levels of description. While, for example,
a preservation characterization service may extract
the fontSize of a string, the preservation planning
service may require the preservation of the text’s formatting.
Additionally, a value for the same property may be
obtained in various ways, sometimes resulting in different
observed values. Furthermore, properties are not always
equally applicable across different file formats.
This report investigates where in these three situations
relationships between properties need to...