Surfing the sea of data
Handling data storage and preventing degradation
Data degradation affects everyone, be it through the ageing of storage media or the loss of data through imperfect copying. Preserving data for future generations has been an ongoing challenge for humankind. Read on to learn more about data storage, data degradation and what to do about them.
Evolution of storage capacity
How did this mass of data come to pass? Technology has always been the driving influence behind how much data is stored at any given time. A look at the evolution of data carriers and the growth of the storage capacity over the centuries illustrates this.
Infinite growth of storage capacities? – Moore’s law
In 1965 Gordon Moore, who later co-founded Intel, observed that the number of transistors on an integrated circuit was doubling roughly every year; in 1975 he revised the rate to about every two years. Storage technology has followed a comparable path: the capacity of data carriers has increased dramatically while the size of storage drives has shrunk. In the last few years, however, this growth has stagnated. Nevertheless, the possibilities of a digital future seem endless.
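To get a feeling for what steady doubling means, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `growth_factor` and the default doubling period of two years are assumptions chosen for this example, and real hardware does not follow the curve exactly.

```python
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Return how many times larger a quantity is after `years`
    if it doubles once every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Print the factor for a few time spans, assuming a two-year doubling period.
for span in (2, 10, 20):
    print(f"after {span:>2} years: x{growth_factor(span):,.0f}")
```

With a two-year doubling period, ten years correspond to five doublings (a factor of 32) and twenty years to ten doublings (a factor of 1,024).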
Looking ahead: data degradation in the future
Nowadays, scientists are researching how to prevent data degradation and make data last longer. Holographic storage, for example, would allow data to be encoded on many layers of tiny holograms. 3 Another, even more extreme scenario is the encoding of a single bit of information on a quantum mechanical system such as an electron, which could then be read by a quantum computer. Research is also being conducted on the longevity of data. Scientists from the University of Southampton discovered a way to store data in five dimensions on nanostructured glass that could survive for billions of years. 4 Similarly futuristic research is being carried out at ETH Zurich, where researchers have found a way to store information in the form of DNA, which could preserve it almost indefinitely. 5
As history shows, data storage evolves quickly and in many directions. However, the question always remains the same: “How can I store my data as conveniently as possible for as long as necessary?” Data, just like the carriers they are stored on, are ultimately human-made and therefore ephemeral. 6
Lots of copies keep stuff safe
“…let us save what remains […] by such a multiplication of copies, as shall place them beyond the reach of accident.” 7 (Thomas Jefferson, lamenting the loss of documents in a letter dated 18 February 1791)
Modern data carriers have been a tremendous help in saving information in many different ways. Nevertheless, humanity still risks losing important data, as more and more of it exists solely in digital form and can only be read with suitable technology. Ever since the dawn of writing, people have fought data degradation by copying information onto newer media over and over again. But to keep preserved data usable, more is needed: software that can read and handle the data must remain available, as must basic information about what sort of data we are dealing with. In theory, digital data should be invulnerable, since it can be copied without loss. This lulls us into a false sense of security: people keep thinking that if it’s digital, it’s safe. But modern data carriers are not immune to decay and degradation, and are sometimes even more fragile than paper because they depend on specific technologies.
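Keeping several copies only helps if a degraded copy is noticed. One common way to check this is to compare checksums of the copies. The following is a minimal sketch, not a complete preservation workflow: the function names `sha256_of` and `find_deviating_copies` are made up for this example, and treating the most frequent checksum as the correct one is an assumption that only makes sense with three or more copies.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_deviating_copies(copies: list[Path]) -> list[Path]:
    """Return the copies whose checksum differs from the most frequent checksum.

    Assumption for this sketch: the majority of copies is still intact.
    """
    checksums = {path: sha256_of(path) for path in copies}
    majority, _ = Counter(checksums.values()).most_common(1)[0]
    return [path for path, value in checksums.items() if value != majority]
```

Calling `find_deviating_copies` with the paths of three copies of the same file returns an empty list if all copies are identical, and otherwise the copies that disagree with the majority. With only two copies, a mismatch shows that something is wrong but not which copy is the damaged one.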
A story of data loss and recovery at ETH Zurich
Everyone has experienced data degradation or data loss in some form or other, be it by not making a backup of holiday pictures or by no longer being able to play a VHS tape. One public example, as discussed in a recent article 8 , stems from the Terrestrial Systems Ecology Group 9 at ETH Zurich led by Professor Andreas Fischlin. 10 Their interdisciplinary research depended on diverse data sources, and the group faced particular challenges in managing its research data. One key topic was the ongoing series of field measurements along the entire length of the Alps as part of a larch bud moth project, which started in 1949. 11 Since its launch, the project has continuously applied the most modern techniques of the time. Over the decades, the data collected has been stored on many carriers, including punch cards, paper tape and magnetic tapes. In the late 1970s, a customised database was even developed. However, the high demand for manpower, together with the high cost of transferring the database system to a modern host, led to its discontinuation.
“Beware, the latest state-of-the-art technology is no warranty for success!” (Prof. Fischlin, 2016)
Despite the best intentions and plans, data degradation in the form of software erosion also made it impossible to properly retrieve the collected data. Thus, the majority of the data could only be salvaged in its raw form. According to Professor Fischlin, one of the key causes of data loss was the ageing of the storage media:
“It’s hard to predict material aging. We need more research from material sciences towards durability of different storage media, for the purpose of curation, because it requires different properties than day to day use.” (Prof. Fischlin, 2016)
What else is needed to ensure the usability of existing data depends on the exact properties of the data. Dependencies, such as the software needed to render the data or the additional information required to understand its meaning, must be documented so that the data can be used later.
Nevertheless, the story of the Terrestrial Systems Ecology Group at ETH Zurich has a largely happy ending: thanks to a great deal of time and effort invested by many people, most of the data could be salvaged. Some parts remain unreadable because the hardware needed to read the data carriers is no longer available and custom-made solutions have to be engineered. It’s a work in progress.
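One simple way to keep such dependencies known is to store a small description file next to each data file. The sketch below is only an illustration under stated assumptions: the function name `write_sidecar`, the `.meta.json` suffix and the field names are hypothetical choices for this example, not an established metadata standard or a format used by the group described here.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, description: str,
                  file_format: str, software: str) -> Path:
    """Write a JSON description next to `data_file` and return its path.

    The field names below are hypothetical; a real project would follow
    the metadata schema of its discipline or archive.
    """
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    record = {
        "file": data_file.name,       # which file the description belongs to
        "description": description,   # what the data means
        "format": file_format,        # how the data is encoded
        "software": software,         # what is needed to read or render it
    }
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                       encoding="utf-8")
    return sidecar
```

Because the description is plain text in an open format, it remains readable even if the software that produced the data is no longer available.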
ETH Library’s services for data preservation
Ensuring the long-term preservation and usability of relevant data at ETH Zurich and supporting its staff and researchers in handling and preserving their data are among the core tasks of the Digital Curation Office at ETH Library. With the ETH Data Archive, ETH Library provides an infrastructure for the medium- and long-term storage of digital data. Within this context, the Digital Curation Office serves as the point of contact for technical and conceptual issues concerning long-term electronic archiving and data management. Furthermore, it supports researchers in managing and publishing their data and in meeting the requirements stated in the Guidelines for Research Integrity at ETH Zurich. 12 It also advises on the choice of file formats. The Digital Curation Office can thus be seen as part of the ever-growing network of institutions needed to fight data degradation.
Six easy tips to keep your data safe
Don’t want to run the risk of losing your data? Follow these six tips and you’ve made a good start!
1. Organise and standardise
3. Automate backups
4. Know the lifespan
5. Use simple tools
6. Use open file formats
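As an illustration of the tip on automating backups, here is a minimal Python sketch that copies a folder into a new, timestamped backup folder and then verifies every copied file. The function name `backup_directory` and the folder naming scheme are assumptions made for this example; a script like this would typically be run regularly by the operating system's task scheduler, and it does not replace a proper backup service.

```python
import filecmp
import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`,
    verify the copy file by file, and return the path of the new folder."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = backup_root / f"{source.name}_{stamp}"
    # copytree creates the target folder and fails if it already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            # shallow=False compares the file contents, not just size and date.
            if not filecmp.cmp(original, copy, shallow=False):
                raise OSError(f"backup verification failed for {original}")
    return target
```

The timestamp in the folder name keeps the backups in chronological order and follows the first tip as well: a standardised naming scheme makes it easy to see which copy is the most recent. The backup folder should live outside the source folder, ideally on a different storage device.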
1. Lerner F (2009) The story of libraries: from the invention of writing to the computer age. 2nd ed. New York: Continuum.
2. https://www.clir.org/pubs/reports/pub54/4life_expectancy.html
3. https://mozy.com/infographics/the-past-present-and-future-of-data-storage/
4. Zhang J, Čerkauskaitė A, Drevinskas R, et al. (2016) Eternal 5D data storage by ultrafast laser writing in glass. 9736: 1–16.
5. Grass RN, Heckel R, Puddu M, Paunescu D, Stark WJ. Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition 54(8): 2552–2555. DOI: 10.1002/anie.201411378
6. Smith Rumsey A (2016) When We Are No More: How Digital Memory Is Shaping Our Future. Bloomsbury Press.
7. National Archives (2016) From Thomas Jefferson to Ebenezer Hazard, 18 February 1791. Founders Online. Available from: http://founders.archives.gov/documents/Jefferson/01-19-02-0059 (accessed 12 July 2016).
8. Sesartic A, Fischlin A, Töwe M (2016) Towards Narrowing the Curation Gap—Theoretical Considerations and Lessons Learned from Decades of Practice. ISPRS Int. J. Geo-Inf. 5(6): 91.
9. http://www.sysecol.ethz.ch/
10. http://www.sysecol.ethz.ch/people/afischli
11. Baltensweiler W, Fischlin A (1988) The larch bud moth in the Alps. In: Berryman AA (ed.) Dynamics of Forest Insect Populations: Patterns, Causes, Implications. New York: Plenum Publishing Corporation, Volume 1, pp. 331–351.
12. https://www.ethz.ch/content/dam/ethz/main/research/pdf/forschungsethik/Broschure.pdf