Bailey, Jefferson; Donovan, Lori
Many institutions are now building rich, significant archives of web content. Though the number of web archiving programs has grown, access models for these collections have remained focused on URL-based discovery and traditional live-web-style browsing. Given the resources required to build and maintain web archives, finding new forms of access for these collections will help increase use and thus allow institutions to better advocate for the value of collecting and preserving web content.
Distant reading, text mining, digital humanities, and other data-driven forms of analysis have become increasingly popular methods of using digitized and digital collections. Web archives, being born-digital, of...
Caron, Bertrand; Nef, Andreas; Hoebelheinrich, Nancy; Habing, Thomas
The Metadata Encoding and Transmission Standard (METS) 1.x schema has an established community of users including academic and national libraries, archives, and museums as well as support from a number of commercial and open source tool and service vendors. While the established community of METS users has adapted systems and tools to METS expressed in XML, many in the library and archive communities are moving toward the use of newer technologies such as those of the Semantic Web and linked data for the digital content that they have been collecting. As a result, the METS Editorial Board (MEB) has been...
Duretec, Kresimir; Kulmukhametov, Artur; Rauber, Andreas; Becker, Christoph
This proposal is for a full-day workshop to be held at the IPRES 2015 conference. The focus of the workshop will be on areas in the digital preservation field which could benefit from benchmarking. Benchmarking is a method of comparing entities against a well-defined standard (i.e., a benchmark). The workshop is focused on discussing software benchmarking practices in digital preservation and how these can contribute to improving digital preservation tools.
McLellan, Evelyn; Bredenberg, Karin; Guenther, Rebecca
This workshop provides an overview of the PREMIS Data Dictionary for Preservation Metadata, a standard addressing the information you need to know to preserve digital content in a repository. It includes a brief introduction to PREMIS and the launch of version 3.0, which changes the data model. In addition there are reports from the preservation community on implementation of the standard in various systems or contexts, in particular the integration of preservation systems that support PREMIS with other digital management systems.
Mumma, Courtney; Shallcross, Michael; Meister, Sam; Di Bella, Christine; Westbrook, Bradley; Lee, Christopher; Eckard, Max
This workshop offers a space to talk about open-source software for digital preservation, and the particular challenges of developing systems and integrating them into local environments and workflows. Topics will include current efforts and grant-funded initiatives to integrate different open source archival software tools; the development of workflows involving multiple open source tools for digital preservation, forensics, discovery and access; and the identification of gaps which may need to be filled by these or other tools.
Owens, Trevor; Wilson, Carl
Developing, deploying and maintaining open source software is increasingly a core part of the operations of cultural heritage organizations. From preservation infrastructure, to tools for acquiring digital and digitized content, to platforms that provide access, enhance content, and enable various modes for users to engage with and make use of content, much of the core work of libraries, archives and museums is entangled with software. As a result, cultural heritage organizations of all sizes are increasingly involved in roles as open source software creators, contributors, maintainers, and adopters. Participants in this workshop shared their respective perspectives on institutional roles...
Tibbo, Helen; McGovern, Nancy; Sierman, Barbara; Mumma, Courtney; Dillo, Ingrid
This tutorial will focus on an array of options and programs for audit and potential certification of trustworthy digital repositories. These will include self-audit, the European three-level model of certification, the Data Seal of Approval, peer-audit, ISO 16363 audit, and forthcoming certification of trustworthy repositories.
Cox, David; Woods, Andrews
Fedora is a flexible, extensible repository platform for the management and dissemination of digital content. Fedora 4, the newly released, revitalized version of Fedora, introduces a host of new features and functionality that both new and existing Fedora users are interested in learning about and experiencing first-hand.
This tutorial will provide an introduction to and overview of Fedora 4, with a focus on the latest features. Fedora 4 implements the W3C Linked Data Platform recommendation, so a section of the tutorial will be dedicated to a discussion about LDP and the implications for Fedora 4 and linked data. Fedora 4 is...
Tibbo, Helen; Christian, Thu-Mai; Goatley, Rachel
In this poster, we illustrate the work of the IMLS-funded Curating Research Assets and Data Using Lifecycle Education (CRADLE) project in developing a data curation massive open online course (MOOC) targeted to two distinct audiences: researchers who are becoming increasingly burdened with data management policies, and information professionals tasked to support these researchers. The poster describes data curation concepts selected for their applicability to both audiences as well as how content and delivery of educational materials are varied to enable students to achieve learning objectives.
In this poster, we present the current status, lessons learned, and best practices experienced thus far in the preparation for audit and certification of the Government Publishing Office’s FDsys as a Trustworthy Digital Repository. The poster will serve as an introduction to a future, publicly accessible toolkit and set of resources and case studies for use within repositories seeking an audit-based approach to evaluation.
Truman, Gail; Henderson, Jaime
This poster session describes the selection criteria and process used for evaluating three repository software offerings and cloud platforms, with pros and cons. It describes implementation of workflows, representations of PREMIS metadata for objects in the repository, documenting fixity checks performed on datastreams, mapping of “rights” elements in DC datastreams to PREMIS “rightsExtension” elements, and more.
This poster presentation describes the results of a research project conducted by the National Diet Library (NDL), which investigated the accessibility of digital documentation stored on physical media across different versions of operating systems. This project was conducted from 2012 to 2013 as a part of a larger research project to investigate the practicality of long-term preservation and use of digital library materials stored by the NDL on physical media.
Our cultural heritage on the Internet is increasingly in danger of being lost due to the challenges faced when collecting it. An increasing number of national webpages are moving to generic Top Level Domains like .com or .org. The movement is so fast that we are at risk of losing this content, since we cannot identify the change in time before the content has disappeared again. Therefore this question becomes increasingly crucial for organizations covering digital national heritage, including web archives for a specific country.
This poster presents the results from a research project that evaluated two different...
Westcott, Stephanie; Cruz, Kelle; Olson, Eric
This poster will profile and demonstrate a new collaboration focused on community curation, preservation of digital communications, and the archiving of science information on the open web. The PressForward Project, a research initiative concerned with the discoverability of digital gray literature, including blog posts, white papers, data visualizations, and podcasts, and Arceli, a collaborative effort within communities of astronomers whose mission is to preserve informal astronomy communications, are in the process of developing a method to make it possible to curate, archive, index, and cite digital alt-publications.
The PressForward Project, funded by the Alfred P. Sloan Foundation and based at the...
Görzig, Heike; Engel, Felix; Brocks, Holger; Hemmje, Matthias
This paper outlines an approach for developing tools and services that support automated generation, management, evolution and execution of data management plans (DMPs) by generating rules derived from the DMPs which can be applied to the data to be archived. The approach is based on existing models and tools that were developed in successive research projects SHAMAN, APARSEN, and SCIDIP-ES. The models include the Curation Lifecycle Model from the DCC, the OAIS Information Model and the Extended Information Model to support processes, domains, and organizations. An approach for deriving rules from policies is outlined to support using iRODS. OAIS and Context Information related to...
Eldakar, Youssef; Nagi, Magdy
Archiving web content is bound to produce datasets with duplication, either across time or across location. The Bibliotheca Alexandrina (BA) has a web archive legacy spanning a period of 10 years and is continuing to expand the collection. Initial assessment of this very large store of data was conducted. Given a high enough rate of duplication, deduplication would lead to sizable savings in storage requirements. The BA worked through the International Internet Preservation Consortium (IIPC) to compile best practices for recording duplicates in ISO 28500, the WARC File Format. To deduplicate legacy web archives “after the fact,” the BA is...
Duretec, Kresimir; Kulmukhametov, Artur; Rauber, Andreas; Becker, Christoph
Creation and improvement of tools for digital preservation is a difficult task without an established way to assess any progress in their quality. This happens due to low presence of solid evidence and a lack of accessible approaches to create such evidence. Software benchmarking, as an empirical method, is used in various fields to provide objective evidence about the quality of software tools. However, the digital preservation field is still missing a proper adoption of that method. This paper establishes a theory of benchmarking of tools in digital preservation as a solid method for gathering and sharing the evidence needed to achieve...
Klindt, Marco; Amrhein, Kilian
In this paper, we describe an OAIS aligned data model and architectural design that enables us to archive digital information with a single core preservation workflow. The data model allows for normalization of metadata from widely varied domains to ingest and manage the submitted information utilizing only one generalized toolchain and be able to create access platforms that are tailored to designated data consumer communities. The design of the preservation system is not dependent on its components to continue to exist over its lifetime, as we anticipate changes both of technology and environment. The initial implementation depends mainly on the open-source tools...
Heinen, Joey; Goethals, Andrea
In this paper, we describe the development of a file format migrations framework at Harvard Library, using one migration case study, Kodak PhotoCD images, to demonstrate implementation of the framework.
Maemura, Emily; Moles, Nathan; Becker, Christoph
As the field of digital preservation continues to mature, there is an increasing need to systematically assess an organization's abilities to achieve its digital preservation goals. A wide variety of assessment tools exist for this purpose. These range from light-weight checklists to resource-intensive certification processes. Conducted as part of the BenchmarkDP project, this paper presents a survey of these tools that elucidates available options for practitioners and opportunities for further research.