Digital species identification

Our development team is making great progress, says Patrick Vine

You take a picture of a mushroom, or an animal track in the dunes. You record an owl hooting in the night, or crickets chirping. And within seconds, you know what species it is. One of the goals of Arise is to enable inference, training, data annotation, and all scenarios related to digital species identification.

The Digital Species Identification team is building the software infrastructure for uploading generic species identification algorithms, running those algorithms on data from sensors in the wild, and reporting on the results.

Initially, we will support algorithms packaged in a standalone Docker container that can receive media items (images, sounds, etc.) to analyze. Results from the AI algorithm are then stored in the Biocloud, the Arise data store. In the future, we hope to share pre-trained weights and to mix and match other parts of algorithms.
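To make the container idea concrete, here is a minimal sketch in Python of what an algorithm's entry point inside such a container might look like. Everything in it is an assumption made for illustration: the `Prediction` fields, the `identify` and `run` function names, and the JSON layout are hypothetical and are not the actual Arise or Biocloud interface.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Prediction:
    """One result per media item (hypothetical result shape)."""
    media_item: str    # file name of the analyzed item
    species: str       # predicted label
    confidence: float  # score between 0 and 1


def identify(media_path: Path) -> Prediction:
    """Stand-in for a real model: always answers 'unknown' with zero confidence."""
    return Prediction(media_item=media_path.name, species="unknown", confidence=0.0)


def run(media_paths):
    """Analyze each media item and return all results as one JSON document."""
    results = [asdict(identify(Path(p))) for p in media_paths]
    return json.dumps({"predictions": results}, indent=2)
```

Calling `run(["recordings/owl.wav", "moth.jpg"])` returns a JSON string with one entry per media item, which is the kind of output a job runner could pass on for storage.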

The system

To date, the system consists of a repository for managing AI algorithms, a job runner to run AI jobs, and the Arise user interface to enable data browsing, querying and creation of datasets to be analyzed.
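The relationship between the algorithm repository and the job runner can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not our implementation: `AlgorithmRepository`, `Job`, and `run_job` are hypothetical names, and an "algorithm" is reduced to a plain function from a media item name to a label.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: an algorithm is any callable mapping a media item name to a label.
Algorithm = Callable[[str], str]


@dataclass
class AlgorithmRepository:
    """Keeps track of the algorithms that have been uploaded, by name."""
    _algorithms: Dict[str, Algorithm] = field(default_factory=dict)

    def register(self, name: str, algorithm: Algorithm) -> None:
        self._algorithms[name] = algorithm

    def get(self, name: str) -> Algorithm:
        return self._algorithms[name]


@dataclass
class Job:
    """A request to run one algorithm over one dataset of media items."""
    algorithm_name: str
    dataset: List[str]


def run_job(repo: AlgorithmRepository, job: Job) -> Dict[str, str]:
    """Look up the algorithm and apply it to every item in the dataset."""
    algorithm = repo.get(job.algorithm_name)
    return {item: algorithm(item) for item in job.dataset}
```

Because the runner only knows the algorithm by name, the same `run_job` call works for an image model and an audio model alike.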

Our first areas of focus are captured images (from a Diopsis Camera) and audio (bird song). We keep two algorithms in mind at all times so that the solution we are building keeps working beyond a single algorithm and media type.

Our biggest challenge right now is finding the best way to dynamically run containers with GPUs. One experiment we plan to try is running our algorithms on Snellius, the Dutch national supercomputer, for both inference and training.
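As an illustration of what "running a container with a GPU" means at the simplest level, the Python sketch below builds a `docker run` command that uses Docker's `--gpus` flag. It assumes a host with Docker and NVIDIA GPU support installed. The image name, the `/data` mount point, and the `build_gpu_run_command` helper are hypothetical, and a cluster such as Snellius may well require a different mechanism.

```python
import shlex


def build_gpu_run_command(image, media_dir, gpus="all"):
    """Build (but do not execute) a docker command that exposes GPUs to a container."""
    return [
        "docker", "run",
        "--rm",                         # remove the container when it exits
        "--gpus", gpus,                 # request GPUs, e.g. "all" or "1"
        "-v", f"{media_dir}:/data:ro",  # mount the media items read-only
        image,
    ]


# Hypothetical image name, for illustration only.
command = build_gpu_run_command("example/species-algorithm:latest", "/tmp/media")
printable = shlex.join(command)  # a single string that can be pasted into a shell
```

Building the command as a list keeps it safe to hand to `subprocess.run` later without shell quoting problems.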

Snellius Supercomputer
Snellius has 76,832 cores, 144 GPUs, and 245 TB of memory. At peak performance, it reaches 6.1 Pflop/s. Available since 2021, it really puts the "super" into supercomputer. Image by SURF.

Incremental value

The development team is getting to know each other better. We are getting to the point where we can deploy software repeatably and safely, and we are setting up the feedback loops we need to know whether things are still working. The core technologies we use include Docker, AWS, Python, JavaScript, and a little bit of React, all in a test-driven, agile manner. We’re focusing on building incremental value safely with good feedback loops.

We’re still bedding down our first prototype, so there is a lot to learn about how best to do things for this specific domain, and there are many interesting enabling features to experiment with going forward. The team's ambitions are large, and I look forward to seeing where we can take this.

Patrick Vine is a software developer and member of the Digital Species Identification team.