Abstract

In the field of multimedia, very little attention is given to the activities involved in the preservation of audio documents, while more and more archives storing audio and video documents face the problem of obsolescent and degrading media and could largely benefit from the instruments and methodologies of multimedia research. This paper presents the methodology and the results of the Italian project REVIVAL, aimed at developing a hardware/software platform to support the active preservation of the audio collection of the Fondazione Arena di Verona, one of the finest in Europe for the operatic genre, with special attention to protocols and tools for quality control. On the scientific side, the most significant objectives achieved by the project are (i) the setup of a working environment inside the archive, (ii) the transfer of knowledge to the archival personnel, (iii) the realization of chemical analyses on magnetic tapes in collaboration with experts in the fields of materials science and chemistry, and (iv) the development of original open-source software tools. On the cultural side, the recovery, safeguarding, and accessibility of unique copies of unpublished live recordings by artists of the calibre of Domingo and Pavarotti are of great musicological and economic value.

1. Introduction

Considerable efforts have been devoted to the preservation of sound archives over the past decades (see [1, 2] for an overview), and a variety of methodologies and best practices are currently made available by the international community (see [3–6]). The importance of audio recordings (speech and music) as documentary sources for disciplines such as linguistics, musicology, ethnomusicology, and anthropology is fully recognized today. But, unlike other cultural materials such as books and paintings, the Life Expectancy (LE) of audiovisual documents is very short; that is, it can be measured in decades rather than in centuries. In addition, the problem of preserving audio and video is increasingly compounded by the obsolescence of the technology used to access them [7]. In their experience with real-world archives, the authors found that an overall underestimation of the importance of quality control during the process of remediation (the process of transferring the acoustic information from one medium to another) affects many digitization projects as well as their output.

The main reason for this underestimation is that the traditional actors of cultural institutions are unprepared for complex problems with a strong technological component. The authors believe that computer science and other scientific disciplines (including engineering and materials science) can make a significant contribution to devising innovative solutions to the problem of preserving audio documents, by collaborating with experts in archival science and with archivists. “The development of successful preservation strategies will require the cooperation of computer scientists, data storage experts, data distribution experts, fieldworkers, librarians, and folklorists” [8].

Before the first audio document is replayed, it should be clear (i) why the documents are being remediated (objectives), (ii) if there are the conditions to remediate the entire collection or a selection must be made (priority criteria, resources), and (iii) if and how the audio documents can be made available to the public (copyright policies, (remote) access tools). For an overview of preservation planning, see [3]; for digital preservation planning, see [9].

This paper presents the methodology applied and the results obtained in the Italian research project REVIVAL (REstoration of the VIcentini archive in Verona and its accessibility as an Audio e-Library), active from January 2009 to December 2011. The project aim was to develop a hardware/software platform to support the preservation of the audiovisual collection of the Fondazione Arena di Verona, with special attention to protocols and tools for quality control. REVIVAL was not intended to be just another digitization project: besides the creation of over a thousand preservation copies of audio documents (a definition of preservation copy is given in Section 3), among the project's key objectives were (i) the setup of a laboratory for preservation inside the archive (see Section 4.3), (ii) the transfer of knowledge to the archival personnel, (iii) the realization of chemical analyses on magnetic tapes, never performed before within a preservation project, in collaboration with experts in the fields of materials science and chemistry (see Section 3.2), and (iv) the development of original open-source software tools to control and to automate specific steps of the preservation process (see Section 5.1).

The paper is organized as follows. Section 2 introduces some related works on quality control applied to the process of preservation in audiovisual archives. Section 3 presents the methodology proposed by the authors for the active preservation of audio documents, explaining the key concepts of preservation copy, first-level access copy, and second-level access copy. The section includes a paragraph dedicated to innovative chemical analyses performed on magnetic tapes in the treatment of a particular syndrome known as Soft Binder Syndrome-Sticky Shed Syndrome (SBS-SSS). Section 4 then describes the research project REVIVAL, to which the methodology presented earlier has been applied: it specifies the archive's profile (Section 4.1), defines the scenario for which the methodology and the tools described in the article have been designed and tested (Section 4.2), and describes the laboratory set up during the project inside the archive of the Fondazione Arena di Verona (Section 4.3). Section 5 addresses the problem of quality control, intended not as a static concept but rather as “the result of a process”, and puts it in relation with the methodological apparatus presented in Section 3. Section 5.1 presents some of the original software tools that were developed during the project to enhance automation and facilitate the implementation of a protocol for quality control.

2. Related Works

Ten principles for the development of digital libraries have been outlined by McCray and Gallagher [10]: expect change, know your content, involve the right people, design usable systems, ensure open access, be (a)ware of data rights, automate whenever possible, adopt and adhere to standards, ensure quality, and be concerned about persistence. Many of these aspects significantly involve computer science and are considered open issues in the international community of sound archives. For example, the rapid development of storage technology poses problems for data persistence; the wide diffusion of mobile devices increases the opportunities for access to content but requires the definition of new strategies for searching and browsing audio information.

This paper is mainly concerned with two of the mentioned issues: automation and quality control during the process of remediation of audio documents. In recent years, very few works have addressed these problems. Most of the contributions on the subject of automation come from the research projects Presto and PrestoSpace [11], the aim of which was to provide technical solutions and integrated systems for the digital preservation of and access to audiovisual collections.

The partners of the projects, which included some of the most important broadcasting companies in Europe (e.g., the British Broadcasting Corporation (BBC) and the Italian public broadcasting company (RAI)), introduced the concept of preservation factory, showing that an approach similar to a semiautomated assembly line, with each operator running multiple “preservation chains,” can save up to 50% of the preservation costs. Some commercial products have been developed based on the preservation factory concept, which even became a registered trademark. Although this approach has economic advantages when applied to very large archives, it is not suitable for small-to-medium archives, because the latter can rarely rely on the funding necessary to access that kind of massive infrastructure. In addition, the authors believe that the factory approach does not apply well to the preservation of art (electronic) music, because in this type of repertoire the audio recordings are very often unique copies and therefore require special attention in handling and in signal extraction, attention closer to that reserved for historical paintings and sculptures (as a matter of fact, nobody dares to propose a “factory” approach for the preservation of Donatello's and Brunelleschi's works). In contrast to the factory approach, the authors have therefore defined a protocol for preservation that is structured around single documents being processed one at a time. The authors also show how specific software tools developed ad hoc can enable partial parallel workflows managed by a single operator.

Another system for the management of backup and archival routines has been developed by Strodl et al. [12]: it consists of an automated service for backing up and migrating data collections. It differs from the authors' work in that it proposes a logical model for digital preservation, leaving aside the requirements of the labor-intensive preservation chain that starts with the physical audio documents.

Quality control needs to be a primary concern in long-term preservation, together with the reliability, the accuracy, and the authenticity of the documents. Today, the authenticity of digital materials is difficult, if not impossible, to prove [13]. A single digital document without certain provenance can compromise the reliability of the entire archive, nullifying the digitization campaign with incalculable losses of time, money, and even cultural materials (should the originals become inaccessible). Documenting the process that generated the preservation copy is particularly important in the audio field, because the medium from which the signal was extracted might become irrecoverable (in case of advanced degradation), making future comparisons to determine a document's authenticity impossible.

Despite the large attention that digitization and audio archives have received in the last decades, the authors believe that not enough attention has been paid to quality control procedures, as they first pointed out in [14]. A fervid debate on the ethics of preservation, restoration, and rerecording was started in 1980 with the “Proposal for the establishment of international re-recording standards” by William Storm [5]: as the debate developed, it became clear that the fundamentals of the practice of preservation are:
(i) “accurate, verifiable, and objective” procedures;
(ii) measurements based on ideally objective knowledge;
(iii) modern playback equipment, fully compliant with the format-specific parameters of the recordings;
(iv) a careful documentation of all measures employed and of each manipulation applied (ensuring reversibility) [15].

All of these actions are directed to fight a common enemy: the falsification of history, which is the problem of “authenticity” by another name.

The paper [16] introduces the key concept that authenticity cannot be evaluated by means of a boolean flag: it is rather the result of a process, and it is never limited to the resource itself but extends to the whole information/document/record system [17]. The authors believe that their systemic approach to preservation, described in Section 3, makes the concept of authenticity as the result of a process even stronger.

Once a firm position with respect to theoretical issues is taken, it is time to translate it into practice. This is where many of the principles just stated tend to get lost: very often the concrete work of digitization indulges in cursory compromises, mainly justified by the fact that an accurate job is time consuming or that it requires expertise that traditional archives typically do not have. It is indeed time consuming: in agreement with [18, p. 5], the authors believe that there is no way around this. Very little of the digitization process can be automated (carrier analysis; evaluation of the state of preservation and physical restoration; definition of the format parameters; carrier handling during signal extraction), whereas nearly all of the data production, processing, and archiving can be fully automated by means of suitable software (see Section 5.1). To ensure a rigorous preservation process, the authors have defined algorithms for quality control addressed to the archival personnel in charge of the digitization.

3. Methodology

Methodology in preservation is about theoretical positions on the one hand and working procedures on the other. In this section, the authors present the principles behind the methodology applied during the REVIVAL project and then introduce the ideas that inspired the working procedures, more extensively described in Section 5.

Audio carriers are doomed to degrade, until the loss of the information is total and irreversible (Figure 5 shows two corrupted audio documents). Whatever processing is to be performed on a recording, there should always be a version of it that can be traced as the reference copy for future comparisons and for issues related to the problem of authenticity (see Section 5).

The preservation copy of an audio document is meant to perform the function of a reliable reference replacing the original when/if this is gone. In the definition of the International Association of Sound and Audiovisual Archives (IASA), the preservation copy, or archive copy, is “the artifact designated to be stored and maintained as the preservation master. Such a designation means that the item is used only under exceptional circumstances” [19]. Its aim is to preserve the documentary unity, and its bibliographic equivalent is the facsimile or the diplomatic copy.

A preservation copy is obviously a different artifact than the document of origin, so, to be called a “preservation master” according to the IASA definition [19], it must minimize the loss of information represented by the document of origin (audio and nonaudio data), and it must include an exhaustive documentation of the document's provenance, of the data transfer, and of the transfer system. Figure 2 shows the logical structure of a preservation copy: it includes (a) a descriptive sheet listing all of the files in the preservation copy, the provenance of the document, the details about each audio file, and the venue of the transfer along with the person responsible for the creation of the copy; (b) the audio signal; (c) first-level metadata, that is, the checksum (MD5) of the audio files, and second-level metadata, that is, the technical specifications of the file formats included in the preservation copy (BWF, PDF, etc.); (d) photographic documentation of the carrier, its case, and the accompanying material, together with a technical sheet describing the transfer system.
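To make the logical structure concrete, the following is a minimal sketch of how a preservation copy could be modeled in Java (the language later used for the project's utilities, see Section 5.1). All class and field names are illustrative assumptions, not the project's actual data model:

```java
// Hypothetical model of the logical structure of a preservation copy
// (Figure 2). All names are illustrative assumptions, not the project's code.
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

public class PreservationCopy {
    Path descriptiveSheet;      // (a) file list, provenance, transfer venue, responsible person
    List<Path> audioFiles;      // (b) the audio signal (e.g., BWF at 96 kHz/24 bit)
    Map<Path, String> md5Sums;  // (c) first-level metadata: MD5 checksum of each audio file
    List<Path> formatSpecs;     // (c) second-level metadata: file format documentation (BWF, PDF, ...)
    List<Path> photos;          // (d) contextual information: carrier, case, accompanying material
    Path transferSystemSheet;   // (d) technical sheet describing the transfer system
}
```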

In a preservation copy, a distinction is made between metadata and contextual information. From the viewpoint of computer science both are metadata, but when speaking of a preservation copy it is useful to make a distinction: by (audio) metadata we refer to (audio) content-dependent information, which concerns the audio signal and can often be automatically extracted; by contextual information we refer to metadata that represents additional content-independent information, such as the photographic documentation of the carrier box and the accompanying material [1].

Providing this documentation meets the requirement expressed in [15] that all compensations and processing, if applied, be “based on the capacity for precise counteraction” (which means reversibility of each operation and, consequently, the capacity to trace the original characteristics/values that were modified).

The actions that go from the evaluation of the document's condition to the moment when the document is ready to be stored again are part of the remediation process. Figure 3 shows the general scheme of the remediation process. It consists of three main steps (before playback, playback, and after playback), each of which is articulated in procedures and subprocedures. The output of each procedure and subprocedure is either data, a report, or a different state of the system.

1. Preparation of the carrier

1.1. Physical documentation

1.1.1. Pictures

1.1.2. Scanned images

1.1.3. Data validation

1.2. Visual inspection

1.3. Chemical analysis

1.4. Optimization of the carrier

2. Signal transfer

2.1. Analysis of the recording format/parameters

2.2. System setup

2.2.1. Playback equipment (e.g., reel-to-reel tape recorder)

2.2.2. Remediation equipment (converter, acquisition software, monitoring, etc.)

2.3. Monitoring

2.4. Data validation

2.5. Archival of the source carrier

3. Data processing and archival

3.1. Metadata extraction

3.2. Completion of the preservation copy

The procedures are described in detail in Section 5.

The scheme refers to the treatment of one document at a time. Some tasks have been automated (see Section 5.1), but most need careful supervision by the operator: in particular, monitoring the signal transfer leaves little freedom for simultaneous activities and imposes that the relation between the operator and the remediated document is always 1:1. The operator does not necessarily know a priori the content of the recording nor its quality. His task, however, is not to interpret, to recognize, to classify, or to catalogue the content. His listening is aimed solely at documenting signal drops, wow and flutter, and other events that are considered significant. These events are never related to the content but rather to its characteristics at the audio signal level. The better the operator knows the (expected) content of the recording, the better he/she will be able to tell the noises that are likely to belong to the recording from the noises that can be attributed to the playback equipment and, in the latter case, to which part of it. For example, it is not straightforward to determine the nature of a local disturbance: if it is certain or highly probable that a disturbance was generated by the playback equipment, the signal transfer should be started over. This control cannot be performed unless the operator monitors the signal transfer throughout (including silent parts). Without monitoring, the disturbances in the transferred audio file would not be documented, with invalidating consequences for the reliability of the preservation copy (Section 5): when the audio file is listened to again in the future and the carrier of origin is gone or otherwise unavailable, the source of the disturbance will be impossible to trace back. Monitoring is 100% effective only when the amplified audio does not come directly from the playback device but is redirected after the conversion and acquisition stages.

Physical documentation includes pictures and scanned images of the medium, its cover, and the accompanying material. These elements very often provide useful information about the recording (content title or author, date and place of creation, equalization curve, noise reduction system, etc.). The photographic documentation prevents text and writings from being misinterpreted by the operator, as could happen if he copied them by hand.

As for the quality of the rerecording, REVIVAL adhered to the main standard proposed by the International Association of Sound and Audiovisual Archives in [20]:
(1) electromagnetic/optical recordings: 96 kHz/24 bit;
(2) mechanical recordings: 192 kHz/24 bit (based on the principle “the worse the signal, the higher the resolution”).

For Digital Audio Tapes (DATs), the sampling frequency and the bit depth of the source document were kept in the preservation copy. Similarly, for Compact Discs (CDs), the standard values of 44.1 kHz/16 bit were kept in the preservation copy.
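These rules can be summarized programmatically. The following Java sketch maps carrier types to target digitization parameters under the choices just described; the enum values, the method name, and the DAT default are illustrative assumptions, not part of the project's tools:

```java
// Illustrative mapping from carrier type to target digitization format,
// following the IASA-based rules adopted by REVIVAL. Hypothetical sketch:
// names and the DAT default are assumptions, not the project's actual code.
public class TargetFormat {
    final int sampleRateHz;
    final int bitDepth;

    TargetFormat(int sampleRateHz, int bitDepth) {
        this.sampleRateHz = sampleRateHz;
        this.bitDepth = bitDepth;
    }

    enum Carrier { OPEN_REEL_TAPE, COMPACT_CASSETTE, WAX_CYLINDER, DAT, CD }

    static TargetFormat forCarrier(Carrier c) {
        switch (c) {
            case OPEN_REEL_TAPE:
            case COMPACT_CASSETTE:
                return new TargetFormat(96_000, 24);  // electromagnetic recordings
            case WAX_CYLINDER:
                return new TargetFormat(192_000, 24); // mechanical recordings
            case DAT:
                return new TargetFormat(48_000, 16);  // keep source values (48 kHz/16 bit is only an example)
            case CD:
                return new TargetFormat(44_100, 16);  // keep the CD standard values
            default:
                throw new IllegalArgumentException("Unknown carrier: " + c);
        }
    }
}
```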

3.1. Access Copies

The motivations that lead archives and cultural institutions to start a digitization project are usually the following: (i) granting their public enhanced access, (ii) facilitating new forms of access and use, and/or (iii) preservation. The REVIVAL project was mainly oriented to preservation, but the methodology proposed by the authors suggests that a preservation copy of each document be made before any other activity is planned: the creation of an archive for preservation is an intermediate step that cannot be skipped regardless of the final goal, because it is the only way to ensure close continuity between the source document and its authoritative copies.

When preservation is the goal, the job ends when the long-term archive of preservation copies is created. If access is the goal, other documents, with special characteristics, will have to be derived from the preservation copies, which are not meant for access (see definition in Section 3).

First-level access copies are not subject to limitations on file formats and other characteristics as strict as those imposed on preservation copies. Their main function is to enable access to the content of the preservation copies, which are imperatively “used only under exceptional circumstances” [19] and which are usually stored in places other than the archive and/or on carriers with slow access (the REVIVAL project chose a Tandberg Data Linear Tape Open (LTO) tape drive for long-term storage). In first-level access copies, the duration, the number, and the order of the tracks should remain unaltered with respect to the preservation copy. At this level, restoration is mainly allowed (e.g., the application of dehiss and denoise filters) to enhance the Signal-to-Noise Ratio (SNR) and the intelligibility of speech.

The archive of the Fondazione Arena di Verona keeps being enriched with new recordings from the current concert seasons, and the archivists keep being asked to assemble and to hand out copies of the documents for various purposes (television and radio broadcasting, documentaries, ballet rehearsals, private requests, etc.). So, even though REVIVAL was mainly focused on preservation, the issue of immediate access had to be addressed.

By definition, first-level access copies are unaltered with respect to preservation copies in the number and in the duration of audio files (which include silence). The purpose of a second-level access copy, by contrast, is to provide easy access to the content, meaning that between the first-level access copy and the second-level access copy there is a shift in the approach to the document: first-level access copies are strictly about preservation, and second-level access copies are about the content, which requires interpretation.

The format of the second-level access copies is flexible: this is easily understood once the importance of the preservation copies is clear. If there is a preservation master that meets all the requirements discussed in Section 3, countless versions of the recording can be made without losing the objective knowledge about the source document, ensuring reliability, accuracy, and philological authenticity. The dichotomy between carrier and content (i.e., artifact and information) distinguishes audio recordings from other cultural materials such as sculptures and paintings: in those cases, preservation and restoration are addressed to the object representing the cultural good, the meaning of which cannot be separated from its physical expression [21]. Conversely, this separation can be performed on audio recordings, allowing for multiple restorations (interpretations) without directly altering the document of origin.

A second-level access copy is a digital audio resource obtained by downsampling, cutting, or otherwise processing audio files from the archive of preservation copies, according to the needs of the archive's target users. Examples are as follows: (i) an opera in MP3 format, from which the silent portions of the tape have been eliminated and which has been split into tracks matching the scenes according to the score (at this stage, knowledge about the content, musically speaking, is as important as the audio-technical expertise); (ii) a Compact Disc with a collection of tracks requested by the ballet for rehearsals, in a specific order and of specific length.

3.2. Chemical Analysis of Magnetic Tapes

In the REVIVAL project, the authors found a significant number of documents clearly affected by Soft Binder Syndrome-Sticky Shed Syndrome (SBS-SSS) [22], a type of degradation that affects magnetic tapes and often causes them to be unreadable. Thermal treatment is the most common remedy for this syndrome; however, the scientific literature about the treatment is surprisingly scarce. The authors started a collaboration with the Department of Industrial Engineering—Chemical Sector, University of Padova, in order to gain deeper knowledge of the tapes' chemical/physical properties and of their behavior under varying environmental conditions, such as temperature during thermal treatment. For further details on the syndrome and on thermal treatment, see [23, 24].

The results of the preliminary analyses conducted by the authors are not sufficient to devise a specific protocol for the thermal treatment of magnetic tapes, but some important conclusions can be drawn. In the first place, the preliminary characterization of the tapes, obtained with Fourier Transform Infrared Spectroscopy (FTIR) in Attenuated Total Reflectance (ATR) mode [25], revealed a great variety of materials, as summarized in Table 1. This variety suggests that treatments should be customized for each tape according to its materials (surface and substrate), as each material behaves differently with varying temperature, duration of exposure to heat, and aging in general.

Secondly, FTIR combined with Thermogravimetric Analysis (TGA) [26] showed that the presence of water in the tapes is negligible. Water, which is presumed responsible for the stickiness that characterizes SBS-SSS, is certainly involved in the degrading process of hydrolysis, but it is not absorbed by the tapes. This contradicts the claim that thermal treatment is aimed at “drying” tapes, literally “extracting” the water that the tapes have absorbed during years of storage in damp environments.

A third important result of the analyses conducted by the authors is that current practices of thermal treatment do not alter the tape's physical properties. The TGA showed that significant losses of weight/heat only occur at temperatures far higher than those used in thermal treatment, which is generally carried out between 50 and 54°C. This proves that tapes are not damaged, even when the desired results are not produced, which is important because “lack of understanding of the sticky shed problem does not justify inaction on the part of audio archivists, since sticky shed grows gradually worse over the years” [24, p. 3]. Figure 4 shows the Scanning Electron Microscope (SEM) analysis of two tape samples: the ferromagnetic particles can be seen, including a typical discontinuity in the tape surface (on the right) representing a clear sign of degradation of the polymer.

In the light of these preliminary results, further analyses have been planned in order to investigate what modifications actually occur during the treatment and what impact, if any, they have on the audio recorded on the tape. The analyses comprise surface morphology and chemical analysis, by means of the Environmental Scanning Electron Microscope (ESEM); the crystallographic nature of the magnetic iron oxides, by means of X-Ray Diffraction (XRD); evaluation of the binder degradation, by means of acetone extraction [27, 28]; evaluation of water-extractable acidic products, by means of an acidity test; Gas Chromatography-Mass Spectrometry (GC-MS); broadband dielectric relaxation spectroscopy; and mechanical analysis. The final aim of these analyses is to gather the understanding necessary to compile a reference protocol for planning remedial actions specifically addressed to each type of tape, with the objective of effectively solving its problems in the least invasive way and of avoiding any negative effects on it.

4. The Project

REVIVAL (REstoration of the VIcentini archive in Verona and its accessibility as an Audio e-Library) is an Italian joint project involving the Fondazione Arena di Verona and the Department of Computer Science of the University of Verona, with the scientific support of Eye-Tech [29]. The project ran from January 2009 to December 2011. Its purpose was to develop a hardware/software platform to support the active preservation of the audio collection of the Fondazione Arena, with collateral results and benefits that distinguish REVIVAL from other digitization projects: (i) the setup of a laboratory for preservation inside the archive, with the best conditions to achieve autonomy and self-sustainability by the end of the project; (ii) the transfer of knowledge from the research team to the archival personnel; (iii) the realization of chemical analyses on magnetic tapes, in collaboration with the Department of Industrial Engineering—Chemical Sector, University of Padova; (iv) the development of original open-source software tools to control and to automate some procedures of the preservation protocol.

4.1. The Archive

The value of the archive has been estimated at 2,300,000 Euros. It comprises tens of thousands of audio documents stored on different carriers (from wax cylinders to digital carriers), nearly a hundred pieces of equipment for playback and recording (from wire recorders to magnetic tape recorders and phonographs), and bibliographic publications (including monographs and all issues of more than sixty music journals from the 1940s to 1999). The archive consists of two sections: a historical section, with live recordings of the operas staged at the Arena during the summer seasons (Figure 1(a)), and a “Mario Vicentini” section, named after its donor. Along with a history of recording techniques, the archive traces the evolution of a composite genre such as opera, with one of the largest collections of live and studio recordings in Italy. The first opera festival was organized in 1913 by the tenor Giovanni Zenatello and the theatre impresario Ottone Rovato to celebrate the centenary of the birth of Giuseppe Verdi (1813–1901). One hundred years later, in 2013, the Arena celebrates its first Centennial festival, with Plácido Domingo as honorary artistic director. All the audio documents stored in the archive are unique copies, including performances by artists of the calibre of Luciano Pavarotti and Katia Ricciarelli, most of which have never been published. The archive is constantly growing with new recordings from the current opera seasons, now stored on HDD devices.

4.2. The Scenario

Although the methodology for preservation is intended to be general, some of its aspects are addressed to a specific type of archive. While all archives face common problems such as obsolescence and degradation of the carriers, some problems depend on the archive's size, history, and policies. The archive of the Arena di Verona is representative of a type of archive most often found in Europe (with few exceptions, such as the British Library or the Institut National Audiovisuel in Paris), as opposed to the type of archive found in North America and in Australia. The main difference between the two resides in the fact that archives tend to be big and centralized in younger countries such as the USA, Canada, and Australia, while in Europe there are many small-to-medium archives. The reasons are obviously historical, but the consequences are manifold: in the European scenario, the funding for archival services, including preservation, is fragmented, preventing economies of scale. The creation of a network of shared resources (documents, equipment, personnel, and infrastructures) is obstructed by the lack of coordination at the political and technical levels; these archives can afford neither autonomous technology transfer nor in-house software development. When necessary, the choices come down to commercial closed software systems, hardly ever designed on requirements that match the archives' needs. The strong point of the present work consists in the fact that an original scientific methodology has been defined and applied in archives with the characteristics and problems typical of the European scenario. In this sense, the achievements of the REVIVAL project discussed in the paper (the working environment, the software tools, the chemical analyses, the number of digitized documents, etc.) are proportioned to this scenario and to the resources available to institutions such as the archive of the Arena di Verona, as well as to many others such as European foundations, theatres, private collections, and radio broadcasters.

In [30], the authors introduced the preliminary phase of the project: the main task consisted in the development of an operational protocol aimed at the preservation of the audio documents stored in the archive. The main international guidelines were considered [5, 6] and trade-offs were made to meet the characteristics of the Arena archive, in terms of number and type of documents, genre of the recordings, and objectives of the digitization (see Section 3).

4.3. Working Environment

In [31], it is said that “nothing has ever been preserved—at best, it is being preserved,” presenting preservation as an ongoing process rather than a process limited in time. In agreement with [31], the authors believe that preservation is not a service and that it should not be outsourced, for multiple reasons: (i) preservation involves digitization as well as maintenance routines on the documents and on the digitized archive (an “ongoing” duty), which are best performed in situ; (ii) turning to an external body means accepting its delivery times and costs, which may not meet the archive's needs and may lead to inconvenience or conflicts; (iii) preservation is a core activity for archives; therefore, it should be kept inside the archives and not delegated to third parties, who hardly ever embrace the principles of the archival philosophy in their business mission.

One of the main objectives, and one of the most original traits, of the REVIVAL project was to enable the archive to carry out the preservation routines autonomously, by setting up a complete working environment inside the archive and by training the archival personnel. The general trend is that documents are shipped out to private bodies that charge the archives for digitizing their documents. Due to lack of funds, the archives are often forced to make their choices based on the costs of the service, and, due to limited expertise in the technical-scientific field of preservation, the archives often turn to private recording studios or to whoever possesses a functioning reel-to-reel tape recorder but does not necessarily have any experience in the field of preservation in scientific terms, nor any awareness of the international standards, guidelines, and best practices. The in-house laboratory enables the Fondazione Arena di Verona to take care of its own collections, maintaining total control over the handling of the documents and also recovering the costs in the medium term. Of course, not every archive can be expected to have its own laboratory: in the authors' view, a strategic solution might be for some reference archives (e.g., with the size and the economic framework of the Fondazione Arena di Verona) to gather the equipment and the competences to carry out digitization services for smaller archives, following a scientifically validated protocol. This would have the additional benefits of (i) sustaining the laboratory from a financial point of view and (ii) creating a network of archives sharing the same methodology for preservation.

The setup of the laboratory was completed during the first six months of the project. The laboratory features two working stations equipped with Apple desktop machines. One is dedicated to Digital-to-Digital (D/D) transfers, as required by DATs and CDs. The other is dedicated to Analog-to-Digital (A/D) transfers and is connected to an A/D-D/A converter (PRISM ADA-8XR): here, mainly open-reel tapes and Compact Cassettes are treated. The working stations are in different rooms so that the operators can constantly monitor the rerecordings on the amplification system without interfering with each other's sessions.

The equipment for reading open-reel tapes consists of a large variety of models, spanning from semiprofessional machines (e.g., REVOX B-77, with two- and four-track heads) to professional ones (three Studer A-812, with two-track heads).

The choice of the playback/recording equipment followed two criteria: (1) full compatibility with the format-specific parameters of the documents and (2) the technological state of the art. Optimal retrieval of the signal can only be achieved by modern, well-maintained replay equipment, ideally of the latest generation, in order to keep replay distortions to the absolute minimum [20, p. 6]. In [15, p. 1015], Schüller agrees that “the older the format and original playback equipment, the more advisable it is to adjust modern equipment to historical formats or even to design new equipment.”

The choice of the A/D-D/A converter (PRISM ADA-8XR) followed two criteria: (1) support of sampling frequencies and bit depths compatible with the requirements of the preservation project (96 kHz/24 bit) and (2) the highest Effective Number Of Bits (ENOB) available on the market.

The laboratory also features a photographic working station for the production of the contextual information (Figure 1(b)). It was designed for short and frequent photographic sessions, maximizing the quality of the pictures with the minimum effort needed to (i) adjust the positioning of the camera and its parameters for each session and (ii) transfer the new files to the desktop working station without dismantling the photographic setting or moving too many things around. The functionality of the photographic working station was first based on the requirements published by the Istituto Centrale per il Catalogo e la Documentazione (ICCD) and by the Italian Ministry of Culture in [33] and then enriched by the experience of the REVIVAL project with the specific problems of audio documents. The experience resulted in a set of documented good practices, which were added to the description of the procedures involved in the remediation process (see Section 5).

For the physical recovery of magnetic tapes, a precision incubator is used (a Memmert INP 400, which can be seen in Figure 1(a)). For more information on thermal treatment for magnetic tapes, see Section 3.2.

5. Quality Control

Throughout the remediation process described in Section 3, things can go wrong in many ways, and each inaccuracy reverberates down the process, creating flaws that are sometimes traceable but mostly nontraceable. Undocumented flaws/distortions mean “falsification” of the process output, so nothing must be left to chance and each action must comply with the protocol. This is why each step of the remediation process was divided into simple tasks, each described by a separate flowchart with every block extensively commented. Exceptions are managed, and the descriptions are kept as precise as possible in order to reduce the uncertainty that comes from the large variety of carriers and the numberless combinations of symptoms they present.

Figure 6 shows an example of a flowchart: it represents step 1.1 of the process (Preparation of the carrier → Physical documentation). In the notation adopted by the project, blocks marked with double lines are subfunctions described separately.

The structure of the workflow in Figure 6 is straightforward, but the example is representative, because the aim of the document is to provide precise descriptions of each task. To achieve this goal, visual material and notes are associated with the blocks, with several references to separate sections where more material is presented and commented. This workflow describes the physical documentation of the audio medium, and it comes with a section where guidelines for the visual documentation of cultural heritage are presented [33], along with suggestions for setting up a photographic workspace and a number of warnings and tips that can make a difference in the quality of the output data.

The following are concise textual descriptions for each procedure.

(1) Preparation of the Carrier. The aim is to document the source carrier in its material form and to prepare it for the analyses preliminary to the playback session. It consists of four steps: (1.1) the physical documentation of the carrier, which in turn is divided into photographic documentation, scanned images of flat objects, and data validation (a human check that pictures are clear and centered, that texts are readable, etc.); (1.2) a visual inspection aimed at detecting physical corruptions, from the least severe (splice replacement, bad winding, dirt, dust, etc.) to the most severe (brittleness, tears, curling, etc.), which will be treated in procedure 1.4, “Optimization of the carrier”; (1.3) chemical analyses, which can detect corruptions that elude the human eye: for tapes in very good condition this step may be skipped, but the choice should be documented and responsibility taken; for tapes already considered for thermal treatment after the visual inspection, chemical analyses are mandatory (see Section 3.2); (1.4) optimization of the carrier: in order to extract the best signal possible, the physical condition of the carrier must be the best possible, meaning that it should be treated to maximize its performance during playback. This step involves cleaning and physical restoration (mechanical and chemical). At the end of this procedure, the carrier is ready for further analyses and for signal extraction.

(2) Signal Transfer. The entire remediation process is often thought to coincide with this step only, which is wrong: each single procedure, from “Preparation of the carrier” to “Data processing and archival,” contributes to the achievement of a reliable, accurate, and philologically authentic document, and each must be performed with absolute care. Nevertheless, signal transfer is undoubtedly a crucial moment in the remediation process from many points of view, the main one being that the carrier is no longer an object of restoration: it is still handled with care, but it must do what it was built for, it must withstand an entire playback session, and it must perform at the top of its physical condition. This is true for all carriers; however, there might be differences: the best signal should always be extracted in one single playback session, even for those carriers that would tolerate multiple playback sessions, but there are cases where the carrier is damaged or “consumed” at the microlevel during playback, to the extent that it could tear apart after being read. This is why the “Preparation of the carrier” is crucial: “it is important to do it right the first time (and hopefully the only time). This implies an optimal signal extraction from the original carriers, and this should be carried out before the physical and/or chemical degradation of the carrier or the obsolescence of hardware becomes critical” [18, p. 5].

Before the carrier is actually played back for signal extraction, two more actions are necessary: (2.1) the recording format/parameters must be determined, which is not always doable without actually playing short portions of the carrier (when the information on the case/label is absent, vague, or untrustworthy). For open-reel tapes, these parameters include tape width, playback speed, number of sides and number of channels per side, equalization curve, and noise reduction system; (2.2) once these parameters are defined, the best available equipment must be chosen and adjusted to be compatible with their values; after that, the technical chain responsible for the conversion, the data acquisition, and the monitoring must be checked; (2.3) monitoring is the step that actually coincides with the extraction of the signal: while the carrier is being played, an operator must attentively listen to the audio being amplified after the conversion and the acquisition (see Section 3) and report any disturbance heard, with precise timing. In case of suspicious disturbances of unidentified origin, the operator should evaluate the situation and decide whether to simply report the event or to replay the carrier to make sure the disturbance was not introduced by the remediation digital system; (2.4) before archiving the audio file thus obtained, the operator should check its integrity (manually and/or with software tools) and also take some time to play again specific parts of the file that might have coincided with delicate moments of the session (end of the tape, splices, joints, etc.); (2.5) when the audio file is technically validated, the carrier is no longer needed in the remediation process and can be prepared for long-term storage. Digitized carriers should not be destroyed ([20, p. 4] and [31, p. 56]).

(3) Data Processing and Archival. This is the final step of the remediation process, in which the preservation copy of the audio document is completed and archived. (3.1) At this point, the preservation copy contains the audio file(s), the images of the cover and of the accompanying material, the documentation about the file formats, and the technical scheme of the remediation system. The only thing missing is the metadata about the audio file(s) (see Section 3): in the case of the REVIVAL project, it is the checksum (MD5) of each file, formatted into a single text document with extension md5 (a minimal sketch of this step is given below). (3.2) When all of the required elements are placed into the preservation copy and the associated database records are filled, the descriptive sheet can be generated with an automatic procedure and copied into the root folder of the preservation copy, which can now be archived. The remediation system is then ready for the next audio document, starting again from the “Preparation of the carrier”.
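As an illustration of step 3.1, the following Java sketch computes the MD5 checksum of each audio file in one preservation copy and groups the results into a single .md5 text file. The “audio” subfolder, the *.wav pattern, and the output file name are assumptions made for the example, not the project's actual conventions:

```java
// Minimal sketch of step 3.1: compute the MD5 checksum of every audio file
// in one preservation copy and write them to a single .md5 text file.
// The "audio" subfolder, the *.wav pattern, and the output file name are
// illustrative assumptions.
import java.io.BufferedWriter;
import java.io.InputStream;
import java.nio.file.*;
import java.security.MessageDigest;

public class Md5Sheet {
    static String md5Of(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path copyRoot = Paths.get(args[0]);        // root folder of one preservation copy
        try (BufferedWriter out = Files.newBufferedWriter(copyRoot.resolve("checksums.md5"));
             DirectoryStream<Path> audio =
                     Files.newDirectoryStream(copyRoot.resolve("audio"), "*.wav")) {
            for (Path f : audio) {
                out.write(md5Of(f) + "  " + f.getFileName()); // standard md5sum line format
                out.newLine();
            }
        }
    }
}
```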

If all of the workflows reach the end without exceptions, the remediation process is complete. The expected output includes (i) a preservation copy of the document of origin; (ii) a set of records in the database; (iii) the document of origin, with optimized physical conditions, ready to be stored again [15]. The authors present some of the characteristics of an adequate storage environment, which depend on the material composing the media, in [34].

5.1. Software Tools

Managing large amounts of digital data without adequate tools is an arduous task, let alone performing extensive controls on the data. Whether the data in the preservation copies is digitized (analog-to-digital conversion) or born digital along the remediation process, all of it is digital in the end. Not planning a management system, even a simple one, before the digitization starts is a mistake to be paid for later on. As plain as this may seem, it is a common mistake in the authors' experience with most archives. Apparently, the amount of resources spent on data validation and on archive maintenance is very often underestimated by the stakeholders, whose interest is limited to the moment of remediation.

In the REVIVAL project, digitization started before the software tools introduced in this section were developed, because real data was needed to shape the algorithms. After a few months of steady work, some hundreds of documents had been digitized (mainly open-reel tapes and some Compact Cassettes). The preservation copies were stored on an external hard drive, and the database records were created separately by hand. Having worked with dedication and zeal, the archivist was sincerely surprised when he saw the number of errors found by a prototype of the algorithm. Unfortunately, the errors were to be expected, since it is known that low-level and repetitive tasks, intrinsic to any archival routine, cause attention lapses and lead the operator to introduce flaws into the system.

The advantage of using software tools developed ad hoc in archival routines, namely, in the audio preservation field, is twofold: first, they offer rigorous control over the procedures and the data; second, they allow some degree of automation for some procedures [35]. These two aspects are intertwined, since automation implies control, and what is not automated can still be controlled. Automation is also a good solution for reducing costs, saving man-hours (i) during the creation of the preservation copies and (ii) during the periodic controls over the entire archive. Figure 7 shows the logical scheme of the user-software system at the Fondazione Arena. The scheme shows that the software utilities mediate the user's actions on the local file system and on the database, thus preventing him/her from manipulating the data without supervision.

After considering which aspects of the remediation process needed attention in this sense and could support automation, four different software tools were developed (a minimal sketch of the first one is given after this list):
(1) a utility to perform controls on the entire collection of preservation copies, searching for empty directories, missing directories, anomalous directories, mismatches in file names and file formats, and missing checksums: the criterion used to determine the errors is given by the structure of the long-term archive;
(2) a utility to rename audio and visual data pertaining to a preservation copy (based on a drag-and-drop interface): contrary to most existing freeware for batch-renaming files, this utility was programmed to reflect the specific structure of the long-term archive, thus calculating file names and finding paths automatically;
(3) a utility to perform a control over the checksums of the entire collection of preservation copies and calculate the missing ones (grouping them in a single text file, depending on the structure of the preservation copy): Figure 8 shows this utility at work;
(4) a utility for the long-term maintenance of the archive: it recalculates the checksum of the audio files in each preservation copy and compares it with the checksums retrieved from the database; mismatches are notified.
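As an indication of what utility (1) involves, here is a minimal, hypothetical sketch of such a consistency check in Java; the expected subfolder names are invented for the example, since the actual criterion is the structure of the project's long-term archive:

```java
// Hypothetical sketch of utility (1): audit every preservation copy under the
// archive root against the expected folder structure. The subfolder names are
// invented for illustration; the real criterion is the long-term archive layout.
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Stream;

public class ArchiveAudit {
    // Six expected subfolders per preservation copy (assumed names).
    static final List<String> SUBFOLDERS =
            List.of("audio", "photos", "scans", "formats", "sheet", "checksums");

    public static void main(String[] args) throws IOException {
        Path archiveRoot = Paths.get(args[0]);
        try (DirectoryStream<Path> copies = Files.newDirectoryStream(archiveRoot)) {
            for (Path copy : copies) {
                if (!Files.isDirectory(copy)) {
                    System.out.println("ANOMALOUS ENTRY: " + copy);
                    continue;
                }
                for (String name : SUBFOLDERS) {
                    Path sub = copy.resolve(name);
                    if (!Files.isDirectory(sub)) {
                        System.out.println("MISSING DIRECTORY: " + sub);
                    } else {
                        try (Stream<Path> entries = Files.list(sub)) {
                            if (entries.findAny().isEmpty()) {
                                System.out.println("EMPTY DIRECTORY: " + sub);
                            }
                        }
                    }
                }
                // A full implementation would also check file name patterns,
                // file formats, and the presence of the MD5 checksum file.
            }
        }
    }
}
```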

The utilities were developed in Java because of (i) its cross-platform compatibility and high-level abstraction from the physical machine and (ii) the fast code development and large availability of libraries, which were necessary to design, implement, and test the tools within the short life cycles of the project. The utilities were provided with a GUI (an example is shown in Figure 8) and can be launched by clicking on an icon, like most desktop applications, suiting the limited computer skills of the archive personnel.

Some of the tasks implemented by the utilities are not audio specific and, taken singly, some could be performed with generic pieces of free software (such as batch renaming). The advantage of developing ad hoc software tools lies in the increased level of automation and in the deeper controls enabled by the archive model hard-coded in, or passed to, the utilities. At the same time, path variables and serialized objects have been kept as general as possible, making it easy for other archives to benefit from the same utilities in preservation programmes based on similar approaches.

Efficiency has been a key parameter in the development of the software tools described in this section. But, besides the meaning that efficiency holds in the field of computer science, the authors considered the performance of the software in comparison with the same job performed by a human. In fact, it is not very important to test the efficiency of the software on massive data sets, because the audio collections of the target archives are on the order of hundreds of thousands of files. The real efficiency gain actually lies in the adoption of software tools at all, as opposed to a completely manual routine.

A preservation copy is a folder with a specific subfolder structure. In total, there are six subfolders for each preservation copy. Each copy has a technical scheme, four documentation files, a descriptive sheet, an MD5 file, and more than three pictures. The number of audio files can vary depending on the type of medium and on exceptional syndromes that required multiple stops during the signal extraction session (SBS-SSS can be one of these). If we assume that the preservation copy of a Compact Cassette or an open-reel tape contains two audio files and that the preservation copy of a Compact Disc or a Digital Audio Tape (DAT) contains 14 audio files, we can estimate the size of the data set on which the software tools described in this section operate. By the end of the REVIVAL project, over 1,200 preservation copies had been completed, roughly divided into 600 Compact Cassettes and open-reel tapes and 600 CDs and DATs. The final estimate is then 7,200 folders and 40,800 single files. The utility marked number (1) in the list presented a few paragraphs earlier, for example, requires only a few seconds to examine the entire data set and output a detailed report. The possibility of checking the consistency of the entire data set at any time, even several times a day, is very convenient. The utilities that involve the calculation of checksums take longer, because they work on audio files whose size normally exceeds 1 GB.

The aim of the REVIVAL utilities was to start supporting preservation projects with original software tools that are clearly missing and needed. In the authors' experience, small- and medium-sized archives take it for granted that data management and data entry are manual jobs (resigning themselves to the drawbacks deriving from this), and, most surprisingly, they do not seem to be aware of what technology can actually offer them. This is a classic example of the importance of cross-domain collaborations, based on daily knowledge exchange, with a true will to learn more about the peculiarities of the others' research domains. This approach, recently theorized in [36], may in the authors' opinion be the key to novel research results in an authentic multidisciplinary spirit.

6. Conclusions

This paper presented the methodology and the results of the Italian research project REVIVAL (2009–2011), concerned with the active preservation of audio documents and with cultural activities related to the chemical analysis of magnetic tapes. The strong points of the project are as follows:
(i) the transfer of knowledge and skills from academia to archival institutions;
(ii) the creation of a strong link between academic research in computer science (area of multimedia) and applied research in archival science;
(iii) the definition of a methodology that meets the needs of audio archives typical of the European model in size, collections, and resources (see Section 4.1), considering the problem of quality control in the process of preservation;
(iv) the planning of chemical analyses on magnetic tapes in collaboration with experts in the fields of materials science and chemistry (REVIVAL is the first preservation project to perform this type of analyses, and the results (Section 3.2) show that preservation practices would largely benefit from further study in this direction);
(v) the development of original open-source software tools especially designed to control and to automate the procedures of the preservation protocol.

The project also obtained remarkable results on the cultural side, discovering and recovering documents of immense value for the Arena and for the entire musical community: unique copies of unpublished performances by Plácido Domingo, Luciano Pavarotti, Katia Ricciarelli, and others. It also increased the visibility of the Fondazione Arena, attracting public attention and funding (cultural industries are a leading sector even in the present critical years [37]).

On the scientific side, important results have been the setup of a laboratory inside the archive and the training of the archive personnel, who are now able to support the active preservation of the audio collection of the Arena and of other, smaller archives in the region.

Acknowledgment

This work has been partially supported by the Fondazione Arena di Verona.