Strathmann, Stefan; Engelhardt, Claudia
In this poster, we introduce the results of a survey of the training needs in digital preservation conducted by the DigCurV project.
Sacchi, Simone; Wickett, Karen M.; Renear, Allen H.
The problem of identifying and re-identifying data puts the notion of “same data” at the very heart of preservation, integration and interoperability, and many other fundamental data curation activities. However, it is also a profoundly challenging notion because the concept of data itself clearly lacks a precise and univocal definition. When science is conducted in small communicating groups with homogeneous data, these ambiguities seldom create problems and solutions can be negotiated in casual real-time conversations. However, when the data is heterogeneous in encoding, content and management practices, these problems can produce costly inefficiencies and lost opportunities. We...
Kucera, Karel; Stluka, Martin
This short paper describes problems arising in optical character recognition of and information retrieval from historical texts in languages with rich morphology, rather discontinuous lexical development and a long history of spelling reforms. In a work-in-progress manner, the problems and proposed linguistic solutions are shown on the example of the current project focused on improving the access to digitized Czech prints from the 19th century and the first half of the 20th century.
Stepanyan, Karen; Ross, Seamus; Trier, Matthias; Joy, Mike; Cristea, Alexandra I.; Gkotsis, George; Kalb, Hendrik; Kim, Yunhyong
The quest for identifying ‘significant properties’ is a common challenge for the digital preservation community. While the methodological frameworks for selecting these properties provide a good foundation, a continued discussion is necessary for further clarifying and improving the available methods. This paper advances earlier work by building on the existing InSPECT framework and improving its capabilities of working with complex/compound objects like blogs. The modifications enable a more thorough analysis of object structures, accentuate the differences and similarities between the framework’s two streams of analysis (i.e. Object and Stakeholder analysis) and, subsequently, improve the final reformulation of the properties. To...
Chituc, Claudia-Melania; Ristau, Petra
Companies face challenges in designing and implementing a preservation system to store the increasing amounts of digital data they produce and collect. The financial sector, in particular the investment business, is characterized by constantly increasing volumes of high frequency market and transaction data which need to be kept for long periods of time (e.g., due to regulatory compliance). Designing and developing a system ensuring long term preservation of digital data for this sector is a complex and difficult process. The work presented in this article has two main objectives: (1) to exhibit preservation challenges for the financial sector/investment business,...
Albani, Mirko; Leone, Rosemarie; Tona, Calogera
Long Term Data Preservation (LTDP) aims at ensuring the intelligibility of digital information at any given time in the near or distant future. LTDP has to address changes that inevitably occur in hardware or software, in the organisational or legal environment, as well as in the designated community, i.e. the people that will use the preserved information. A preservation data manages communication from the past while communicating with the future. Information generated in the past is sent into the future by the current preservation data. The European Space Agency (ESA) has a crucial and unique role in this mission, because it...
Ammitzboll Jurik, Bolette; Sindahl Nielsen, Jesper
We describe algorithms for automated quality assurance on the content of audio files in the context of preservation actions and access. The algorithms use cross correlation to compare the sound waves. They are used to do overlap analysis in an access scenario, where preserved radio broadcasts are used in research and annotated. They have been applied in a migration scenario, where radio broadcasts are to be migrated for long term preservation. This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
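As an illustration of the cross-correlation comparison mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes both recordings are already decoded into one-dimensional sample arrays at the same sample rate; the function name `best_offset` and the normalisation used for the score are hypothetical choices made for this example.

```python
import numpy as np

def best_offset(reference, candidate):
    """Estimate where `candidate` best aligns inside `reference`.

    Returns (lag, score): `lag` is the sample offset into `reference`
    at which `candidate` matches best, and `score` is the correlation
    peak divided by the product of the two signal norms.
    Hypothetical helper for illustration only.
    """
    ref = np.asarray(reference, dtype=float)
    cand = np.asarray(candidate, dtype=float)
    # Remove the DC offset so a constant bias does not dominate the peak.
    ref = ref - ref.mean()
    cand = cand - cand.mean()
    # Full cross correlation covers every possible overlap of the signals.
    corr = np.correlate(ref, cand, mode="full")
    # Index i of the full output corresponds to lag i - (len(cand) - 1).
    lag = int(np.argmax(corr)) - (len(cand) - 1)
    norm = np.linalg.norm(ref) * np.linalg.norm(cand)
    score = float(corr.max() / norm) if norm > 0 else 0.0
    return lag, score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.standard_normal(1000)   # stand-in for a decoded broadcast
    excerpt = original[200:500]            # stand-in for an overlapping file
    print(best_offset(original, excerpt))
```

A direct correlation like this is quadratic in the signal length, so for full-length broadcasts an FFT-based correlation over short windows would likely be needed; that choice is outside what the abstract states.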
Huber-Mork, Reinhold; Schindler, Alexander; Schlarb, Sven
Digital preservation workflows for image collections involving automatic and semi-automatic image acquisition and processing are prone to reduced quality. We present a method for quality assurance of scanned content based on computer vision. A visual dictionary derived from local image descriptors enables efficient perceptual image fingerprinting in order to compare scanned book pages and detect duplicated pages. A spatial verification step involving descriptor matching provides further robustness of the approach. Results for a digitized book collection of approximately 35,000 pages are presented. Duplicated pages are identified with high reliability and well in accordance with results obtained independently by human visual...
Skinner, Katherine; Schultz, Matt; Halbert, Martin; Phillips, Mark
In this paper, we describe research led by Educopia Institute regarding the preservation needs for digitized and born-digital newspapers. The Chronicles in Preservation project builds upon previous efforts (e.g. the U.S. National Digital Newspaper Program) to look more broadly at the needs of digital newspapers in all of their diverse and challenging forms. This paper conveys the findings of the first research phase, including substantive survey results regarding digital newspaper curation practices.
Guttenbrunner, Mark; Rauber, Andreas
Evaluating digital preservation actions performed on digital objects is essential, both during the planning as well as quality assurance and re-use phases to determine their authenticity. While migration results are usually validated by comparing object properties from before and after the migration, the task is more complex: as any digital object becomes an information object only in a rendering environment, the evaluation has to happen at a rendering level for validating its faithfulness. This is basically identical to the situation of evaluating the performance in an emulation setting.
In this paper we show how previous conceptual work is applied to an...
Derrot, Sophie; Peyrard, Sebastien; Oury, Clement; Fauduet, Louise
In the beginning, SPAR, the National Library of France's repository, was designed as the OAIS softwarified. It was intended to be a "full OAIS", covering all preservation needs in one tidy system. Then as its potential revealed itself across the library, high hopes arose for a do-it-all digital curation tool. Yet in day-to-day preservation activities of the BnF, it turns out that SPAR's growth takes a practical approach to the essentials of preservation and the specific needs of communities. Renewed dialogue with producers and users has led to the addition of functions the digital preservation team would not...
Halbert, Martin; Skinner, Katherine; Schultz, Matt
This paper conveys findings from four years of research conducted by the MetaArchive Cooperative, the Networked Digital Library of Theses and Dissertations (NDLTD), and the University of North Texas to investigate and document how academic institutions may best ensure that the electronic theses and dissertations they acquire from students today will be available to future researchers.
Owens, Trevor; Potter, Abigail
Viewshare is a free, Library-of-Congress-sponsored platform that empowers historians, librarians, archivists and curators to create and customize dynamic interfaces to collections of digital content. This demonstration of Viewshare will start with an example spreadsheet or data harvested via OAI-PMH to generate distinct interactive visual interfaces (including maps, timelines, and sophisticated faceted navigation), which can be copy-pasted in any webpage. The data augmentation services associated with Viewshare will also be demonstrated.
Nordland, Lori Podolsky; Hank, Carolyn
Digital curation may be thought of as a set of strategies, technological approaches, and activities for establishing and developing trusted repositories, and ensuring long-term access to digital assets. It spans many disciplines and communities, as well as individuals seeking to maintain, preserve and add value to the ever-expanding body of digital content. This diversity has given way to ambiguity in defining digital curation, particularly in consideration of potentially synonymous terms, such as digital stewardship, preservation, and archiving. This poster will provide a forum for participants to challenge and engage in the dialogue that defines and describes digital curation.
Rechert, Klaus; von Suchodoletz, Dirk; Valizada, Isgandar; Fauduet, Louise
The goal of the bwFLA project is the implementation and development of services and technologies to address Baden-Württemberg state and higher education institutes’ libraries’ and archives’ challenges in long-term digital object access. The project aims at enabling diverse user groups to prepare non-standard artifacts like digital art, scientific applications or GIS data for preservation. The project’s main goal is to build on ongoing digital preservation research in international and national projects to integrate workflows for emulation-based access strategies.
Dappert, Angela; Peyrard, Sebastien; Delve, Janet; Chou, Carol C.H.
“Digital preservation metadata” is the information that is needed in order to preserve digital objects successfully in the long term so that they can be deployed in some form in the future. A digital object is not usable without a computing environment in which it can be rendered or executed. Because of this, information that describes the sufficient components of the digital object’s computing environment has to be part of its preservation metadata. Although there are semantic units for recording environment information in PREMIS 2, these have rarely, if ever, been used. Prompted by increasing interest in the description of computing...
In this paper I discuss the repositioning of the traditional conservation concepts of historicity, authenticity and versioning in relation to born-digital artworks, based on findings from my research on the preservation of computer-based artifacts. Challenges for digital art preservation and previous work in this area are described, followed by an analysis of digital art as a process of component interaction, as performance and in terms of instantiations. The concept of dynamic authenticity is proposed, and it is argued that our approach to digital artwork preservation should be variable and digital object responsive, with a level of variability tolerance to match digital...
This paper presents an investigation of the most suitable package formats for long term digital preservation. The choice of a package format for preservation is crucial for future access, so a thorough analysis of that choice is important.
The investigation presented here covers setting up requirements for package formats used for long term preserved digital material, and using these requirements as the basis for analysing a range of package formats.
The result of the concrete investigation is that the WARC format is the package format best suited for the listed requirements. Fulfilling the listed requirements will ensure mitigating a number of risks of...
Tzitzikas, Yannis; Analyti, Anastasia; Kampouraki, Mary
The main digital preservation strategies are based on metadata and in many cases Semantic Web languages, like RDF/S, are used for expressing them. However RDF/S schemas or ontologies are not static, but evolve. This evolution usually happens independently of the “metadata” (ontological instance descriptions) which are stored in the various Metadata Repositories (MRs) or Knowledge Bases (KBs). Nevertheless, it is a common practice for a MR/KB to periodically update its ontologies to their latest versions by “migrating” the available instance descriptions to the latest ontology versions. Such migrations incur gaps regarding the specificity of the migrated metadata, i.e. inability to distinguish those descriptions...
Takhteyev, Yuri; DuPont, Quinn
This project explores the world of retrocomputing, a constellation of largely—though not exclusively—non-professional practices involving old computing technology. Retrocomputing includes many activities that can be seen as constituting “preservation,” and in particular digital preservation. At the same time, however, it is often transformative, producing assemblages that “remix” fragments from the past with newer elements or joining historic components that were never previously combined. While such “remix” may seem to undermine preservation, it allows for fragments of computing history to be reintegrated into a living, ongoing practice, contributing to preservation in a broader sense. The seemingly unorganized nature of retrocomputing assemblages...