
Dnabarcoder: a tool for eDNA identification

In this blog, the Westerdijk people of ARISE talk about a tool that helps you recognize fungal (for now) sequences in eDNA samples. It's integrated with existing databases and now comes with a web interface.


A close up picture of a fungus
Sporidesmiella juncicola Crous & Osieck, a novel species from the Netherlands, described in 2021

Biodiversity drives the fundamental ecosystem processes that maintain and support life on Earth. Currently, about 2 million species have been described. However, it has been estimated that the number of species on Earth ranges between 5.3 million and 1 trillion, indicating that the majority of the world’s biodiversity remains undiscovered (Locey and Lennon 2016). Fungi, the area of expertise of the present authors, constitute the second-largest group of all eukaryotic organisms based on global richness estimates, with 2-3 million predicted species (Niskanen et al. 2023). To date, less than 5% of the estimated fungal species have been described.


While scientists are working hard to discover and describe new species, we are witnessing biodiversity loss at an unprecedented and most alarming rate, with one million species on the brink of extinction. It is increasingly recognized that biodiversity loss is one of the most pressing threats to the environment. Therefore, it is crucial to monitor and maintain biodiversity for a sustainable future.


Metabarcoding

The rapid development of sequencing technologies in recent decades has enabled us to quickly assess the biodiversity and abundance of organisms in environmental samples using the metabarcoding (eDNA) approach. This method targets specific genetic markers to assess biological diversity in environmental samples. The eDNA sequences thus generated are compared against reference sequences for taxonomic identification, to determine which species are present and how diverse the samples are. This approach thus allows assessment of all species in a sample, including species that are very small or that otherwise have lifestyles or life stages that are not typically considered in traditional attempts at biodiversity assessment.


The most common approach for eDNA taxonomic identification is the sequence similarity search suite BLAST (Altschul et al. 1997). Machine learning/deep learning-based classifiers have also been developed to expedite eDNA identification. However, as observed in Vu et al. (2020), when the sequences to be classified are not well represented by reference sequences in the training dataset, these classifiers (including the Ribosomal Database Project (RDP) Bayesian classifier (Wang et al., 2007) and the two deep learning-based classifiers CNN (LeCun et al., 2015) and DBN (Hinton & Salakhutdinov, 2006)) may identify sequences incorrectly and/or detect fewer species than BLAST. This implies that until an AI model is developed that can robustly identify rare species from environmental samples, BLAST will remain a popular tool for sequence identification.


a group of people watching a presentation in a small room
Duong Vu (front) and her ARISE colleagues, having a... wait for it... BLAST

Problematic

In most eDNA studies, a single similarity cut-off has been used for sequence identification: a sequence with at least, say, 97% similarity to a reference sequence over its full length is assigned the species name of that reference sequence. As more fungal DNA barcodes were generated, it gradually became clear that the use of a single, static threshold value for taxonomic identification was problematic. Threshold values that worked well in some parts of the fungal tree of life either overestimated or underestimated species boundaries in other parts of the tree (Vu et al. 2016, 2019; Abarenkov et al. 2016). Different clades require different similarity cut-offs to maximize taxonomic resolution and explanatory power.
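To make the difference concrete, here is a minimal Python sketch of identification with one global cut-off versus clade-specific cut-offs. Everything in it is made up for illustration: the function name `identify`, the clade names, the species names and the cut-off values are hypothetical and are not taken from dnabarcoder, UNITE or any real dataset. It also assumes that a best hit and its full-length similarity have already been found, for example with BLAST.

```python
from typing import Dict, Optional

# Hypothetical values for illustration only; not real, published cut-offs.
GLOBAL_CUTOFF = 0.97
CLADE_CUTOFFS: Dict[str, float] = {"clade_A": 0.995, "clade_B": 0.96}


def identify(
    similarity: float,
    ref_species: str,
    ref_clade: str,
    clade_cutoffs: Optional[Dict[str, float]] = None,
    global_cutoff: float = GLOBAL_CUTOFF,
) -> Optional[str]:
    """Return the reference species name if the best hit passes the cut-off, else None."""
    cutoff = global_cutoff
    if clade_cutoffs is not None:
        # Fall back to the global cut-off for clades without their own value.
        cutoff = clade_cutoffs.get(ref_clade, global_cutoff)
    return ref_species if similarity >= cutoff else None


# Two made-up best hits: (similarity, reference species, reference clade).
hits = [(0.98, "species_1", "clade_A"), (0.965, "species_2", "clade_B")]
for sim, species, clade in hits:
    with_global = identify(sim, species, clade)
    with_clade = identify(sim, species, clade, CLADE_CUTOFFS)
    print(species, "| global:", with_global, "| clade-specific:", with_clade)
```

In this toy case the global 97% cut-off accepts the first hit, although its (hypothetical) clade would need 99.5% to separate species, and rejects the second hit, although its clade would already be resolved at 96%. These are the two kinds of error described above.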


In Vu et al. (2022), we proposed dnabarcoder, an open-source tool that predicts similarity cut-offs for different fungal clades and classifies/identifies eDNA sequences based on those predicted cut-offs. Our results show a significant improvement in the accuracy and precision of eDNA identification.
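For readers who like to see the idea in code, the toy Python sketch below shows one plausible way to pick a cut-off for a clade: cluster the reference sequences at several candidate cut-offs and keep the one whose clusters best match the known species labels. This is not dnabarcoder's code or interface. The single-linkage clustering, the F-measure scoring, the function names and the tiny similarity table are all assumptions chosen for illustration; the actual procedure is described in Vu et al. (2022).

```python
from typing import Dict, List, Set, Tuple


def cluster(n: int, sims: Dict[Tuple[int, int], float], cutoff: float) -> List[Set[int]]:
    """Single-linkage clustering: link i and j whenever their similarity >= cutoff."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s >= cutoff:
            parent[find(i)] = find(j)
    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def f_measure(clusters: List[Set[int]], labels: List[str]) -> float:
    """Size-weighted mean, over species, of the best F1 score against any cluster."""
    taxa: Dict[str, Set[int]] = {}
    for i, label in enumerate(labels):
        taxa.setdefault(label, set()).add(i)
    total = 0.0
    for members in taxa.values():
        best = 0.0
        for c in clusters:
            overlap = len(members & c)
            if overlap:
                precision = overlap / len(c)
                recall = overlap / len(members)
                best = max(best, 2 * precision * recall / (precision + recall))
        total += len(members) / len(labels) * best
    return total


def predict_cutoff(labels, sims, candidates) -> Tuple[float, float]:
    """Return (best score, best cut-off); ties go to the higher cut-off."""
    return max((f_measure(cluster(len(labels), sims, t), labels), t) for t in candidates)


# Made-up reference set: four sequences from two species, with pairwise similarities.
labels = ["species_1", "species_1", "species_2", "species_2"]
sims = {(0, 1): 0.99, (2, 3): 0.98, (0, 2): 0.96, (0, 3): 0.95, (1, 2): 0.95, (1, 3): 0.94}
best_score, best_cutoff = predict_cutoff(labels, sims, [0.95, 0.97, 0.99])
print(best_cutoff, best_score)
```

With this made-up table, a cut-off of 0.95 lumps both species into one cluster and 0.99 splits the second species in two, while 0.97 recovers both species exactly. Running the same search separately for each clade would yield clade-specific cut-offs of the kind discussed above.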


Web interface


To assist users in overcoming the intricacies of using the command-line version of dnabarcoder (Mikryukov et al., 2023), we have joined forces with Hogeschool Leiden student Ruby van der Holst to develop a web interface for dnabarcoder. The similarity cut-offs for large databases such as UNITE and the Westerdijk Institute's own CBS database for fungal identification have been pre-computed and are ready for action.



Not just fungi

Although dnabarcoder was initially designed for fungal communities, we recognize its potential relevance to the metabarcoding and eDNA communities at large. The Westerdijk Institute, in collaboration with the UNITE community, will release and maintain the online dnabarcoder platform so that it remains accessible to mycologists and to researchers working on other groups of organisms, including the entire ARISE community. This initiative will aid in monitoring and tracking microbial biodiversity changes in the Netherlands and beyond, facilitating solutions to address challenges related to climate change and environmental sustainability. Even eDNA datasets from mundane samples such as private gardens or public parks tend to have much to say, and we view dnabarcoder as a step towards a richer and more informed understanding of our living world.


 

Duong Vu (d.vu@wi.knaw.nl), R. Henrik Nilsson, Gerard J.M. Verkley

Westerdijk Fungal Biodiversity Institute (DV, GJMV), University of Gothenburg (RHN)


 

References

Abarenkov, K., Adams, R. I., Laszlo, I. et al. (2016). Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23-24, 2016 workshop (Gothenburg, Sweden). MycoKeys 16, 1–15.

Altschul, S.F. et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402.

Hinton, G.E., Salakhutdinov, R.R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science 313, 504–507.

LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature 521, 436–444.

Locey, K.J., Lennon, J.T. (2016). Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences 113, 5970–5975.

Mikryukov, V. et al. (2023). Connecting the multiple dimensions of global soil fungal diversity. Science Advances 9, eadj8016. DOI:10.1126/sciadv.adj8016

Niskanen, T., Lücking, R., Dahlberg, A. et al. (2023). Pushing the Frontiers of Biodiversity Research: Unveiling the Global Diversity, Distribution, and Conservation of Fungi. Annual Review of Environment and Resources 48, 149–176.

Vu, D., Groenewald, M., Szöke, S. et al. (2016). DNA barcoding analysis of more than 9000 yeast isolates contributes to quantitative thresholds for yeast species and genera delimitation. Studies in Mycology 85, 91–105.

Vu, D., Groenewald, M., de Vries, M. et al. (2019). Large-scale analysis of filamentous fungal DNA barcodes reveals thresholds for species and higher taxon delimitation. Studies in Mycology 92, 135–154.

Wang, Q., Garrity, G. M., Tiedje, J. M., et al. (2007). Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73, 5261–5267.
