Surfing the sea of data

Handling data storage and preventing degradation

27 February 2017

Data degradation affects everyone, be it through the ageing of storage media or the loss of data through imperfect copying. Preserving data for future generations has been an ongoing challenge for mankind. Read on to learn more about the problem of data storage and data degradation and what to do about it.

Evolution of storage capacity

How did this mass of data come to pass? Technology has always been the driving influence behind how much data is stored at any given time. A look at the evolution of data carriers and the growth of the storage capacity over the centuries illustrates this.

Clay tablets – the oldest data carriers to survive

As far as we know today, the oldest writings to survive are found on Sumerian clay tablets, inscribed five thousand years ago by temple bureaucrats recording economic transactions. 1 The amount of information that these tablets could accommodate was relatively small, but the longevity of one of the earliest data carriers is far higher than what is available to us these days.

Papyrus and paper – easy to handle, easy to decay

With the advent of writing on papyrus in Egypt and paper in China, a fibre-based material emerged. If stored properly, the material proved to be extremely long-lasting. If kept in more humid climates, however, it fell prey to mould and could decompose within a few decades. Other dangers included fire, bookworms and acids that destroyed the material from within.

Industrialisation – the era of punch cards

With the arrival of the industrial revolution, more and more data had to be organised, pushing the traditional means of capturing information to their limits. Perforated paper tape was first used to control looms in 1725. Later on, punch cards became a means of storing information and were used well into the 1970s in spite of their limited storage capacity.

From microfilm to magnetic tapes and floppy discs

By the 1920s, microfilm was used in commercial settings. Another big leap occurred with the invention of magnetic tape. By the 1950s, IBM computers started operating with this tape and it soon became the de facto industry standard for data storage. Even today, this is still the case, despite the relatively short life span of ten to twenty years. 2 Many memory devices based on magnetic storage were subsequently developed, such as the audio cassette or floppy discs.

Hard disk drives and USB flash drives

Alongside easily portable storage media, 1956 saw the introduction of the first hard disk drive. It consisted of fifty disks measuring two feet in diameter, which together stored about 5 MB of data. By comparison, today’s 2.5” hard drives, which fit into a pocket, easily reach terabytes of capacity. USB flash drives with a capacity of 8 MB were originally introduced in 2001, while today’s related solid-state drive (SSD) technology reaches capacities of terabytes.

Optical devices for high data storage

In 1980 the compact disc (CD) was born, an optical means of storing digital data. The next steps in this technological evolution were the DVD in 1995 and the Blu-ray Disc in 2003, which can hold 25 GB and more. In spite of their capacities, optical devices are susceptible to scratching, sunlight and temperature differences, leading to lifespans that vary from 2 to 25 years.

Data storage in the cloud

Besides physical data storage, cloud storage has become standard nowadays. It allows data to be stored on multiple servers hosted by third parties. Ultimately, the data is still stored on physical devices. But as they remain remote and only accessible via the internet, the expression “in the cloud” was coined. Cloud storage made its breakthrough when Amazon launched its cloud services in 2006, followed by Dropbox in 2007.

Infinite growth of storage capacities? – Moore’s law

In 1965 Gordon Moore, who went on to co-found Intel, observed that the number of transistors on an integrated circuit was doubling roughly every year; in 1975 he revised the rate to approximately every two years. The capacity of data carriers has followed a similarly steep curve: it has increased dramatically while the size of storage drives has shrunk. Over the last few years, however, growth has stagnated. Nevertheless, the possibilities in a digital future seem to be endless.
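
As a rough illustration of what steady doubling implies, the sketch below computes the growth factor after a given number of years. It assumes an idealised, perfectly regular doubling every two years; the function name `growth_factor` and its parameters are made up for this example, and real hardware has not followed such a clean curve.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return the multiplication factor after `years` of steady doubling.

    Assumption: growth is perfectly exponential, which is a simplification.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Print the idealised growth factor for a few time spans.
    for span in (2, 10, 20, 40):
        print(f"after {span:>2} years: x{growth_factor(span):,.0f}")
```

Under these assumptions, twenty years of doubling gives a factor of 1,024 and forty years a factor of roughly one million.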

Figure: The evolution of data carriers' capacity according to their areal density (areal density in MB/in², 1951–2016)

Looking ahead: data degradation in the future

Nowadays, scientists are researching how to prevent data degradation and make data last longer. Holographic storage, for example, would allow data to be encoded on many layers of tiny holograms. 3 Another, even more extreme scenario is the encoding of a single bit of information on a quantum mechanical system, such as an electron, which can be read by a quantum computer. Research is also being conducted on the longevity of data. Scientists from the University of Southampton discovered a way to store data in five dimensions on nanostructured glass that could survive for billions of years. 4 Similarly futuristic research is being carried out at ETH Zurich, where researchers have found a way to store information in the form of DNA, thereby preserving it for nearly an eternity. 5

As history shows, the evolution of data storage is varied and fast-changing. However, the question always remains the same: “How can I store my data as conveniently as possible for as long as necessary?” Data, just like all the data carriers, are ultimately human-made and therefore ephemeral. 6

Lots of copies keep stuff safe

…let us save what remains […] by such a multiplication of copies, as shall place them beyond the reach of accident 7

— Thomas Jefferson, lamenting the loss of documents in a letter dated 18 February 1791

Modern data carriers have been a tremendous help in saving information in many different ways. Nevertheless, humanity still risks losing important data, as it increasingly exists solely in digital form and can only be read using suitable technology. Ever since the dawn of writing, mankind has fought data degradation by copying information onto newer media over and over again. But to make the preserved data usable, we need more: appropriate software for reading and handling the data has to be available, as does basic information about what sort of data we are dealing with. In theory, digital data should be invulnerable, which lulls us into a false sense of security: people keep thinking that if it’s digital, it’s safe. But modern data carriers are not immune to decay and degradation and, because they depend on specific technologies, are sometimes even more fragile than paper.
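
Copying only protects data if every copy is identical to the original. One common way to check this is to compare checksums. The following is a minimal sketch, assuming Python 3 and its standard library only; the function names `sha256_of` and `copy_and_verify` and the folder names in the usage note are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder and check every copy.

    Returns a mapping from each copy's path to True if its checksum
    matches that of the original, and False otherwise.
    """
    expected = sha256_of(source)
    results: dict[Path, bool] = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        results[target] = sha256_of(target) == expected
    return results
```

A call such as `copy_and_verify(Path("measurements.csv"), [Path("backup_local"), Path("backup_offsite")])` would create two copies and report whether each matches the original. Re-running `sha256_of` on stored copies from time to time can reveal silent corruption, although it cannot repair it.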

A story of data loss and recovery at ETH Zurich

Everyone has experienced data degradation or data loss in some form or other, be it through failing to back up holiday pictures or being unable to play a VHS tape. One public example, as discussed in a recent article, 8 stems from the Terrestrial Systems Ecology Group 9 at ETH Zurich, led by Professor Andreas Fischlin. 10 Their interdisciplinary research depended on diverse data sources and the group faced particular challenges in managing its research data. One of the key topics was the ongoing field measurements along the entire length of the Alps as part of a larch bud moth project, which started in 1949. 11 Since its launch, the project has continuously applied the most modern techniques of the time. Over the last few decades, the data collected has been stored on many carriers, including punch cards, paper tape and magnetic tapes. In the late 1970s, a customised database was even developed. However, the high demand for manpower, together with the high cost of transferring the database system to a modern host, led to its discontinuation.

Beware, the latest state-of-the-art technology is no warranty for success!

— Prof. Fischlin, 2016

Photograph of larch tree forest near Sils (Engadine, Switzerland, 1981), affected by the larch bud moth. Photo: Prof. Andreas Fischlin, ETH Zurich; glitch art: Will Crook.

Despite the best intentions and plans, data degradation in the form of software erosion had also made it impossible to properly retrieve the collected data. Thus, the majority of the data could only be salvaged in its raw form. One of the key causes of data loss, according to Professor Fischlin, was the ageing of the storage media:

It’s hard to predict material aging. We need more research from material sciences towards durability of different storage media, for the purpose of curation, because it requires different properties than day to day use.

— Prof. Fischlin, 2016

What else is needed to ensure the usability of existing data depends on the exact properties of that data. Dependencies, such as the software needed to render the data or the additional information required to understand its meaning, must be known in order to make use of the data later.

Nevertheless, the story of the Terrestrial Systems Ecology Group at ETH Zurich has a largely happy ending: due to the investment of a lot of time and effort by many people, most of the data could be salvaged. Some parts still remain unreadable as the hardware used to read the data carriers is no longer available and custom-made solutions need to be engineered. It’s a work in progress.

ETH Library’s services for data preservation

Ensuring the long-term preservation and usability of relevant data at ETH Zurich and supporting its staff and researchers in handling and preserving their data are among the core tasks of the Digital Curation Office at ETH Library. With the ETH Data Archive, ETH Library provides an infrastructure for the medium and long-term storage of digital data. Within this context, the Digital Curation Office serves as the point of contact for technical and conceptual issues concerning long-term electronic archiving and data management. Furthermore, it supports researchers in managing and publishing their data and in meeting the requirements stated in the Guidelines for Research Integrity at ETH Zurich. 12 It also gives advice on the correct choice of file formats. Thus, the Digital Curation Office can be seen as part of the ever-growing network of institutions that are needed to fight data degradation.

Six easy tips to keep your data safe

Don’t want to run the risk of losing your data? Follow these six tips and you’ve made a good start!

1. Organise and standardise

Establish a file and folder structure that works for you and use it consistently.

2. Identify

Determine which files need to be preserved.

3. Automate backups

Create automated backups and keep them both locally and off-site.

4. Know the lifespan

Know the lifespan of your data carriers and re-copy your data to new ones in time.

5. Use simple tools

When collaborating, agree on simple workflows and backup tools. Don’t forget to document the context of your data.

6. Use open file formats

Use open file formats and don’t compress data to ensure its compatibility with different operating systems.
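
Tip 4 can be turned into a simple reminder. The sketch below is only an illustration: it uses the lower ends of the lifespan ranges quoted earlier in this article (ten to twenty years for magnetic tape, 2 to 25 years for optical discs) as a deliberately cautious threshold, and the names `CONSERVATIVE_LIFESPAN_YEARS` and `recopy_due` are invented for this example. Figures for other media would have to come from their own documentation.

```python
from datetime import date

# Lower-bound lifespans in years, based on the ranges quoted in this article.
# Assumption: using the lower bound is a cautious choice, not a standard.
CONSERVATIVE_LIFESPAN_YEARS = {
    "magnetic tape": 10,  # article quotes ten to twenty years
    "optical disc": 2,    # article quotes 2 to 25 years
}


def recopy_due(medium: str, written_on: date, today: date) -> bool:
    """Return True if the carrier has reached its conservative lifespan."""
    lifespan = CONSERVATIVE_LIFESPAN_YEARS[medium]
    age_years = (today - written_on).days / 365.25
    return age_years >= lifespan


if __name__ == "__main__":
    # Hypothetical inventory of carriers and the dates they were written.
    inventory = [
        ("magnetic tape", date(2005, 1, 1)),
        ("optical disc", date(2016, 1, 1)),
    ]
    for medium, written_on in inventory:
        status = "re-copy now" if recopy_due(medium, written_on, date.today()) else "ok"
        print(f"{medium} written {written_on}: {status}")
```

Such a check is no substitute for actually testing that a carrier is still readable; it merely flags when a re-copy should be scheduled.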

References

  1. Lerner F (2009) The story of libraries: from the invention of writing to the computer age. 2nd ed. New York: Continuum.
  2. https://www.clir.org/pubs/reports/pub54/4life_expectancy.html
  3. https://mozy.com/infographics/the-past-present-and-future-of-data-storage/
  4. Zhang J, Čerkauskaitė A, Drevinskas R, et al. (2016) Eternal 5D data storage by ultrafast laser writing in glass. 9736: 1–16.
  5. Grass RN, Heckel R, Puddu M, Paunescu D, Stark WJ (2015) Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition 54(8): 2552–2555. DOI: 10.1002/anie.201411378
  6. Smith Rumsey A (2016) When We Are No More: How Digital Memory Is Shaping Our Future. Bloomsbury Press.
  7. National Archives (2016) From Thomas Jefferson to Ebenezer Hazard, 18 February 1791. Founders Online. Available from: http://founders.archives.gov/documents/Jefferson/01-19-02-0059 (accessed 12 July 2016).
  8. Sesartic A, Fischlin A, Töwe M (2016) Towards Narrowing the Curation Gap: Theoretical Considerations and Lessons Learned from Decades of Practice. ISPRS Int. J. Geo-Inf. 5(6): 91.
  9. http://www.sysecol.ethz.ch/
  10. http://www.sysecol.ethz.ch/people/afischli
  11. Baltensweiler W, Fischlin A (1988) The larch bud moth in the Alps. In: Berryman AA (ed.) Dynamics of Forest Insect Populations: Patterns, Causes, Implications. New York: Plenum Publishing Corporation, Volume 1, pp. 331–351.
  12. https://www.ethz.ch/content/dam/ethz/main/research/pdf/forschungsethik/Broschure.pdf