Digitize, share, connect
Data from natural history collections go mobile
In 1790 visitors would wait 14 days for their personal data and letter of recommendation to be checked before they were allowed to enter the British Museum in London. Even to this day, access to the riches squirrelled away in museums and academic collections is often reserved for an exclusive circle of specialists who travel from far and wide in search of objects and the knowledge they harbour. However, the tide has turned in recent years. Thanks to digitization, data from collections are now freely available online for the whole world to use. Where do we currently stand? Who uses these data? What does digitization actually mean and is it worth all the effort?
The magic of big numbers
“Have you already got this butterfly?” This question crops up time and again on tours of entomological collections and, although understandable, it misses the point. After all, having one specimen per species is not the goal. The display cases in ETH Zurich’s Entomological Collection, for instance, contain dozens or even hundreds of butterflies of the same species. The days when collections of curiosities tried to outdo each other with exotic rarities are long gone. The true value for academia lies in collecting the same species continuously over many years and in many places. The larger the collection, the more opportunity for comparison there is and the more information and results it yields. Aristotle described this as “emergence” 1, which can be paraphrased somewhat more crudely as “the whole is greater than the sum of its parts”. With this in mind, wonderful museums and collections have sprung up in many countries over recent centuries. They have amassed millions of insects, plants, birds and other flora and fauna and now rank among the most valuable cultural assets in the world.
Boasting 68 million objects from the four corners of the globe, the National Museum of Natural History in Paris is one of the biggest collections in the world. By comparison, the Naturmuseum in Frauenfeld has only 100,000 objects, but it focuses on Eastern Switzerland and can probably provide more accurate information on the bugs of the Canton of Thurgau than its counterpart in the French capital. However big the collection, it is forever destined to remain incomplete, as a full representation of nature will never be possible. To date, nature researchers have described 2 million species of animal, plant and fungus worldwide. All the collections in the world put together contain an estimated total of around 3 to 4 billion archived specimens of these species 2 (concerned animal rights activists should remember that 4.5 billion insects die on the front of cars every single day in the Netherlands alone 3). That said, the majority of these objects are scattered widely in small regional institutions. This is precisely where digitization really comes into its own: the idea is to create one single global collection, a virtual one that unites the data from all the collections in the world.
The Global Biodiversity Information Facility (GBIF) network was founded in Copenhagen in 1999 with the goal of exchanging biodiversity data freely, and currently has 93 members in 54 countries. The GBIF collects the data from the member institutions, which in turn set up national databases and record the local biodiversity. In Switzerland, these data centres include the Centre Suisse de Cartographie de la Faune (CSCF), Info Flora and the Swiss Ornithological Institute. Digitization projects, including those currently underway in ETH Zurich’s collections, also send their data to these centres.
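The aggregation described above, in which institutions feed national data centres that in turn feed one global index, can be pictured as a simple merge of record lists. The following Python sketch is only an illustration under stated assumptions: the `SpecimenRecord` fields, the function `build_virtual_collection`, the institution names and the sample records are all hypothetical and do not reflect GBIF's actual data model or interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    """One digitized specimen (hypothetical, simplified fields)."""
    catalog_id: str   # identifier within the holding institution
    institution: str  # name of the holding institution
    species: str      # scientific name
    country: str      # ISO 3166-1 alpha-2 code of the collecting locality
    year: int         # year the specimen was collected


def build_virtual_collection(*collections):
    """Merge record lists from several institutions into one index keyed by species.

    A record that arrives more than once (same institution and catalogue
    identifier) is only stored the first time it is seen.
    """
    index = defaultdict(list)
    seen = set()
    for records in collections:
        for rec in records:
            key = (rec.institution, rec.catalog_id)
            if key in seen:
                continue
            seen.add(key)
            index[rec.species].append(rec)
    return dict(index)


# Invented sample data, for illustration only.
collection_a = [
    SpecimenRecord("A-001", "Collection A", "Araschnia levana", "CH", 1962),
    SpecimenRecord("A-002", "Collection A", "Coenonympha tullia", "CH", 1931),
]
collection_b = [
    SpecimenRecord("B-001", "Collection B", "Araschnia levana", "FR", 1987),
    # The same specimen delivered a second time via another data centre.
    SpecimenRecord("A-001", "Collection A", "Araschnia levana", "CH", 1962),
]

virtual = build_virtual_collection(collection_a, collection_b)
for species, records in sorted(virtual.items()):
    print(f"{species}: {len(records)} record(s)")
```

Keying the index by species mirrors the point made earlier: the value lies in comparing many specimens of the same species across places and years, regardless of which institution holds them.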
Goodbye knowledge monopoly – free data for everyone!
It is expensive and time-consuming to digitize a collection. Checking the identification of the species and subsequently entering the information in the database usually costs between 50 cents and several dollars per object 5. Even if the first few steps can be automated, the work remains laborious and takes several years or even decades, depending on the size of the collection. The task is complicated further because insects, pressed flowers and other fragile objects require special handling, the labels are often barely decipherable and there is a shortage of qualified specialists with expertise in the diverse organism groups. Therefore, steps that require a high level of know-how and experience need to be separated from the simple and repetitive ones, automated processes and computers need to be used wherever possible, and the techniques generally need to be optimised in the same vein as industrial mass production. By the spring of 2018, over 130 million objects from natural history collections had been digitized worldwide 6 – an impressive figure, granted, but merely a start.
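To see why 130 million objects are "merely a start", the figures quoted in this article can be put side by side. The short Python sketch below does only that arithmetic. It uses the estimate of 3 to 4 billion specimens and the lower cost bound of 50 cents per object, and it assumes, purely for illustration, that every remaining object could be digitized at that lowest rate. The function names are made up for this example.

```python
# Figures quoted in the article.
TOTAL_SPECIMENS_LOW = 3_000_000_000   # lower estimate of archived specimens worldwide
TOTAL_SPECIMENS_HIGH = 4_000_000_000  # upper estimate of archived specimens worldwide
DIGITIZED = 130_000_000               # objects digitized worldwide by spring 2018
MIN_COST_PER_OBJECT = 0.50            # lower bound of the quoted cost range, in dollars


def digitized_share(digitized, total):
    """Fraction of all specimens that has already been digitized."""
    return digitized / total


def min_remaining_cost(digitized, total, cost_per_object):
    """Cost of digitizing everything that is left at the given rate per object.

    Assumption for illustration: one flat rate applies to every object.
    """
    return (total - digitized) * cost_per_object


for total in (TOTAL_SPECIMENS_LOW, TOTAL_SPECIMENS_HIGH):
    share = digitized_share(DIGITIZED, total)
    cost = min_remaining_cost(DIGITIZED, total, MIN_COST_PER_OBJECT)
    print(f"total {total:,}: {share:.2%} digitized, "
          f"at least {cost:,.0f} dollars to finish")
```

Under these assumptions, only about 3 to 4 percent of the world's specimens had been digitized, and even at the cheapest quoted rate the remainder would cost well over a billion dollars.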
Western Europe and North America are relatively poor in biodiversity. For historical reasons, however, these regions have the largest natural history collections in the world. Relative to its population and area, Switzerland also boasts a wealth of natural history collections: in total, over 40 million natural history specimens 7 are housed here, including 345,000 types, i.e. the specimens on which first descriptions of species are based 8. This is exceptional, given that Switzerland does not have a colonial past and, as a landlocked country, largely remained disconnected from the global trade networks.
Compared to the West, the rest of the world harbours most of the biodiversity yet to be researched, but seldom has well-developed research institutions. Two approaches aim to address this imbalance. First, international agreements such as the Nagoya Protocol 9 create an international legal framework for fair access to the biodiversity and genetic resources of all the nations involved. Second, open data initiatives, which include digitization and the free publication of natural history data, offer the research community all over the world access to knowledge which used to be exclusive and geographically bound. Digitization therefore plays a key role in ensuring equal opportunities in researching and utilising the world’s biodiversity and genetic resources.
Nature research in the Anthropocene
Environmental protection and species conservation, spatial planning, research, tourism, agriculture – many fields depend on reliable, complete data on the condition and future of our biodiversity. The environment is not static; it has been changing since the year dot. Cold periods alternated with warm spells; regions became flooded or rose out of the sea; new land bridges united groups of animals and plants that had previously been separated. For a number of centuries, the influence of human civilisation has also been tangible. Researchers refer to a new geological epoch, the “Anthropocene”, which dawned in the mid-20th century and in which humankind emerged as the principal influential factor. Climate change, agriculture, urban development – intentionally or not, we humans are shaping our environment ever more intensively. How are plants and animals coping with the changed environment? Are their populations spreading, is their distribution shrinking or are they even dying out? Collections hold the answers.
Two species of butterfly, the map (Araschnia levana) and the common ringlet (Coenonympha tullia), are prime examples of the impact that human intervention is having on nature. Until the 1950s the map butterfly was only found locally in Central Europe, but it then spread prolifically and now populates the Swiss Plateau, the Jura region and certain Alpine valleys in Switzerland. What is the reason for this expansion? The map butterfly relies on shady forest edges to develop, and its caterpillars feed on stinging nettles and other weeds in nutrient-rich habitats. The frequent use of fertiliser in our agriculture and the encroachment on forest edges foster these very habitats and enable the map butterfly to spread over a wide area. This makes the map one of only a handful of species to benefit from current agricultural and forestry practices.
The example of the common ringlet is far more typical of the way our fauna is developing: once abundant all over Switzerland, the species has become a very rare sight today and has all but disappeared from the Swiss Plateau. The butterflies live in marshland, need a lot of humidity and depend on tufts of grass to overwinter. However, the frequent mowing of the vegetation and the drainage of large wetland areas have hit the species hard in recent decades.
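How might digitized collection records reveal an expansion like that of the map butterfly or a decline like that of the common ringlet? One simple approach is to count, decade by decade, the number of distinct localities from which specimens of a species were collected. The Python sketch below shows the idea. The function name, the record layout and the sample data are invented for this example and do not come from any real collection.

```python
from collections import defaultdict


def localities_per_decade(records):
    """Count the distinct collecting localities per decade.

    `records` is an iterable of (year, locality) pairs for one species.
    Returns a dict mapping the first year of each decade to the number
    of different localities recorded in that decade.
    """
    by_decade = defaultdict(set)
    for year, locality in records:
        decade = year // 10 * 10
        by_decade[decade].add(locality)
    return {decade: len(sites) for decade, sites in sorted(by_decade.items())}


# Invented records for a hypothetical expanding species.
sample_records = [
    (1952, "site A"),
    (1958, "site A"),
    (1974, "site A"),
    (1976, "site B"),
    (1995, "site A"),
    (1996, "site B"),
    (1998, "site C"),
]

trend = localities_per_decade(sample_records)
for decade, count in trend.items():
    print(f"{decade}s: {count} localities")
```

A rising count suggests a spreading species and a falling one a shrinking distribution, although such counts also reflect how intensively people collected in each decade, so they need careful interpretation.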
Digitization: ETH Library as a partner for research
Over the centuries, libraries have built up vast knowledge of how to manage and mediate large quantities of objects and data. Consequently, it seems only natural for them to apply this know-how to the digitization of collections as well. ETH Library’s core services thus include establishing, expanding and maintaining digital infrastructure for scientific and cultural-historical collections, as well as for ETH Zurich’s various archives.
ETH Library’s DigiCenter is a service provider in this field and digitizes – often in projects spanning several years – selected holdings and collections, which are ultimately available in digital form for research and teaching, and also for the general public.
However, an individual natural history collection in digital form is often of limited use for research without supplementary and secondary information. This is where academic libraries make ideal partners for research and teaching – thanks to both the existing holdings, which might contain research results and complement the collections, and their information science skills, especially when it comes to indexing and preserving holdings.
Entomology is a prime example of just how varied ETH Library’s digital offerings are: depending on the object of the research, information on biodiversity, soil conditions, political borders, local circumstances or the climate is required in this field to reconstruct historical biodiversity and interpret current data. This information can be found in often old and rare books, in journals published regularly over several decades or even in the form of archival material, such as images, field books or manuscripts.
But how are users supposed to access the information efficiently? In recent years, many of these documents have been digitized at ETH Library. For instance, the DigiCenter has produced a total of around 15 million scans in the last nine years. Digital information is published – usually in collaboration with other libraries – on various platforms run by ETH Library. The contents can be searched directly on these platforms and are also accessible via ETH Library’s Knowledge Portal, where different source systems, such as catalogues, databases and platforms, are pooled and can be searched via a single access point.
The undisputed value of these platforms, which are run and fed in collaboration with other renowned Swiss libraries and archives, lies in the fact that documents which are physically scattered far and wide can be found and downloaded in one single place. This paints a comprehensive picture of which documents, objects and publications might be available for entomological research in Switzerland. Thus, different digitization projects complement each other perfectly. On the one hand, there are collections that are hundreds of years old, such as ETH Zurich’s Insect Collection, which is made available to the global research community online. On the other hand, there are the holdings of the libraries and archives which – where legally permissible – are also provided in digital form.
By now, the volume of information available in digital form is considerable – not only in entomology, but also in many other research areas. Over 6 million digitized pages have been uploaded to E-Periodica alone. But there is still a long way to go: the stacks of the libraries and collections still harbour countless other relevant documents and the offerings are constantly being expanded.
Digitizing information and making it accessible is not enough by itself, however. A major challenge for digitization projects is securing the data so that they remain available in the long term and do not eventually become unreadable and thus useless. ETH Library treats this as part of a comprehensive overall solution: it has set up infrastructure for long-term digital archiving, which secures the library’s own data and, above all, is available to ETH Zurich for its digital information and research data.
Digitization opens up natural history collections and libraries to the whole world. This is merely the beginning of a success story, which, along with other open data initiatives, will increasingly unfurl its full potential. But will digitization soon make physical collections redundant? Certainly not! The actual specimens are packed full of untapped information, such as DNA, proteins, toxins or medicinal substances, that is yet to be recorded and can only be studied on the real object.
1. Aristotle: Metaphysics. Book 8.6.
2. CETAF – Consortium of European Taxonomic Facilities, personal message. www.gs.ethz.ch
3. Waterfield, B.: Two trillion insects killed on Dutch cars every year. The Telegraph, 11.07.2011.
4. Bovey, P. & Sauter, W.: Utilisation d'armoires compactes pour le classement de collections entomologiques. Mitteilungen der Schweizerischen Entomologischen Gesellschaft, 33: 275–278, 1961.
5. Heidorn, P.B.: Biodiversity Informatics. Bulletin of the American Society for Information Science and Technology, 37: 38–44, 2011.
6. GBIF – Global Biodiversity Information Facility: GBIF Occurrence Store; data accessed in January 2018. www.gs.ethz.ch
7. Klaus, G. & Martinez, S.: Die naturwissenschaftlichen Sammlungen der Schweiz. HOTSPOT, 13: 6–7, 2006.
8. Agosti, D., Linder, P., Burckhardt, D., Martinez, S., Löbl, I. & Loizeau, P. A.: Switzerland's role as a hotspot of type specimens. Nature, 421: 889, 2003.
9. Convention on Biological Diversity. www.cbd.int/abs/about
10. Gimmi, U., Lachat, T. & Bürgi, M.: Reconstructing the collapse of wetland networks in the Swiss lowlands 1850–2000. Landscape Ecology, 26(8): 1071, 2011.