Species range model details #234

kahst · 2024-01-15T14:08:24Z

kahst
Jan 15, 2024
Maintainer

Introduction

The BirdNET Species Range Model V2.4 - V2 uses eBird checklist frequency data to estimate the range of bird species and the probability of their occurrence given latitude, longitude and week of the year. eBird relies on citizen scientists to collect bird species observations around the world. Due to biases in these data, some regions such as North and South America, Europe, India and Australia are well represented, while large parts of Africa and Asia are underrepresented (see map below). In cases where eBird does not have enough observations (i.e. checklists), the data "only" contain binary filter data of likely species that could occur in a given location. Therefore, the training data for our biodiversity model is a mixture of actual observations and filter data curated by experts. We included all locations for which at least 10 checklists are available for each week of the year, and randomly added other locations with a 3% probability. This is what the final training data looks like on a map:

Figure 1: This map shows availability of eBird checklist frequency data (yellow = good representation, dark purple = only filter data)
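The location selection rule described above (at least 10 checklists for every week, plus a 3% random sample of everything else) could be sketched roughly as follows. This is only an illustration, not the actual data pipeline: the function name, the input layout (a dict of weekly checklist counts per location) and the toy locations are all hypothetical.

```python
import random

MIN_CHECKLISTS_PER_WEEK = 10  # threshold stated in the text
EXTRA_LOCATION_PROB = 0.03    # 3% random inclusion stated in the text
WEEKS_PER_YEAR = 48           # the training data has 48 weekly values


def select_training_locations(checklists_per_week, rng):
    """Pick locations for training.

    checklists_per_week: hypothetical dict mapping a location id to a
    list of 48 weekly checklist counts.
    rng: any object with a random() method returning a float in [0, 1).
    """
    selected = []
    for location, counts in checklists_per_week.items():
        well_covered = (
            len(counts) == WEEKS_PER_YEAR
            and min(counts) >= MIN_CHECKLISTS_PER_WEEK
        )
        # Well-covered locations are always kept; all others are kept
        # with a small probability.
        if well_covered or rng.random() < EXTRA_LOCATION_PROB:
            selected.append(location)
    return selected


# Toy input, invented for illustration only.
counts = {
    "well_covered": [25] * 48,
    "one_sparse_week": [25] * 47 + [3],
    "filter_only": [0] * 48,
}
selected = select_training_locations(counts, random.Random(0))
```

A location with a single sparse week fails the "each week" condition and is only included if it happens to land in the 3% random sample.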

Here is an example of what the training data for a given location (Chemnitz) looks like:

'gretit1': [72, 90, 98, 93, 96, 88, 95, 94, 99, 99, 93, 92, 90, 96, 85, 97, 89, 78, 67, 68, 48, 39, 35, 40, 49, 49, 49, 51, 48, 55, 55, 73, 60, 64, 62, 63, 72, 72, 72, 67, 66, 80, 63, 74, 67, 76, 88, 70], 
'carcro1': [62, 81, 83, 82, 85, 75, 90, 75, 83, 80, 76, 80, 84, 90, 72, 73, 83, 67, 70, 75, 54, 48, 42, 55, 51, 53, 55, 49, 55, 53, 55, 62, 57, 55, 66, 69, 63, 65, 69, 63, 59, 74, 61, 63, 76, 79, 69, 60], 
'eurbla': [55, 80, 84, 92, 71, 70, 72, 84, 85, 86, 82, 95, 88, 92, 86, 91, 90, 75, 87, 81, 84, 72, 69, 62, 67, 70, 57, 66, 55, 56, 49, 32, 36, 37, 41, 49, 55, 62, 57, 58, 41, 37, 58, 67, 69, 64, 69, 49], 
'blutit': [67, 83, 92, 93, 96, 83, 87, 93, 96, 90, 82, 80, 84, 88, 58, 79, 74, 52, 46, 36, 34, 29, 25, 26, 39, 43, 36, 43, 47, 42, 49, 48, 49, 51, 45, 52, 61, 64, 55, 55, 65, 72, 62, 71, 66, 67, 69, 64], 
'grswoo': [61, 84, 80, 80, 90, 83, 85, 77, 76, 82, 72, 77, 77, 78, 64, 76, 81, 69, 73, 75, 66, 44, 46, 41, 47, 41, 38, 44, 42, 42, 52, 68, 37, 35, 38, 43, 44, 41, 43, 41, 49, 61, 41, 49, 48, 47, 67, 47], 
'cowpig1': [9, 10, 3, 3, 16, 16, 30, 54, 65, 61, 69, 76, 83, 81, 80, 86, 80, 71, 68, 78, 68, 69, 79, 68, 76, 69, 69, 79, 70, 70, 68, 73, 64, 63, 58, 54, 53, 49, 53, 56, 44, 21, 33, 38, 45, 43, 5, 11],
'eurnut2': [43, 76, 88, 82, 79, 78, 91, 84, 92, 86, 76, 77, 75, 85, 69, 75, 60, 34, 47, 58, 34, 24, 33, 33, 31, 23, 28, 25, 23, 21, 23, 52, 26, 26, 31, 28, 25, 29, 32, 23, 47, 46, 24, 31, 30, 36, 61, 53], 
'comcha': [26, 33, 30, 33, 34, 34, 39, 48, 70, 75, 80, 83, 80, 90, 76, 85, 80, 74, 77, 74, 59, 52, 51, 40, 34, 44, 33, 31, 22, 15, 17, 21, 17, 18, 26, 34, 44, 48, 53, 49, 31, 27, 33, 39, 44, 39, 30, 28]

...

The data consists of a species code and 48 values - one for each week - indicating the checklist frequency between 0 and 100. In the case above, great tits appear to be the most common species and have a checklist frequency of 72 in week 1, meaning that 72% of all submitted eBird checklists for this location in week 1 contain at least one great tit.
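To make the format concrete, here is a small sketch that reads two of the Chemnitz entries shown above. The helper name `summarize` and the choice of summary values are just for illustration.

```python
# Two entries copied from the Chemnitz example above.
chemnitz = {
    "gretit1": [72, 90, 98, 93, 96, 88, 95, 94, 99, 99, 93, 92, 90, 96, 85, 97,
                89, 78, 67, 68, 48, 39, 35, 40, 49, 49, 49, 51, 48, 55, 55, 73,
                60, 64, 62, 63, 72, 72, 72, 67, 66, 80, 63, 74, 67, 76, 88, 70],
    "cowpig1": [9, 10, 3, 3, 16, 16, 30, 54, 65, 61, 69, 76, 83, 81, 80, 86,
                80, 71, 68, 78, 68, 69, 79, 68, 76, 69, 69, 79, 70, 70, 68, 73,
                64, 63, 58, 54, 53, 49, 53, 56, 44, 21, 33, 38, 45, 43, 5, 11],
}


def summarize(frequencies):
    """Summarize 48 weekly checklist frequencies (values from 0 to 100)."""
    if len(frequencies) != 48:
        raise ValueError("expected one value per week (48 weeks)")
    peak = max(frequencies)
    return {
        # Share of week-1 checklists that contain the species.
        "week1_fraction": frequencies[0] / 100,
        "peak_frequency": peak,
        # Weeks are 1-based; index() returns the first week at the peak.
        "peak_week": frequencies.index(peak) + 1,
        "mean_frequency": sum(frequencies) / len(frequencies),
    }


summaries = {code: summarize(freqs) for code, freqs in chemnitz.items()}
```

For `gretit1`, `week1_fraction` is 0.72, which matches the reading above: 72% of week-1 checklists for this location contain at least one great tit.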

Training

During training, we randomly select a location, use circular embeddings to encode latitude, longitude, and week, scale the checklist frequency to a uniform format, and run this data through a simple classifier with three fully connected layers plus an output layer with 6,522 output units, one for each species.
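As a rough sketch of the input encoding and target scaling, assuming a plain sine/cosine encoding: the exact encoding used in BirdNET is not spelled out here, so the periods, the function names and the approximate Chemnitz coordinates below are assumptions for illustration.

```python
import math

WEEKS_PER_YEAR = 48  # the training data has 48 weekly values per species


def circular(value, period):
    """Encode a periodic value as a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))


def encode_input(lat, lon, week):
    """Build a 6-value feature vector from latitude, longitude and week (1..48)."""
    # Assumption: a 360-degree period for latitude keeps -90 and +90 distinct.
    lat_sin, lat_cos = circular(lat, 360.0)
    lon_sin, lon_cos = circular(lon, 360.0)
    # Week 1 maps to angle 0, so week 48 ends up next to week 1.
    week_sin, week_cos = circular(week - 1, WEEKS_PER_YEAR)
    return [lat_sin, lat_cos, lon_sin, lon_cos, week_sin, week_cos]


def scale_targets(frequencies):
    """Scale checklist frequencies from 0..100 to 0..1."""
    return [f / 100 for f in frequencies]


# Approximate coordinates for Chemnitz, week 1.
features = encode_input(50.83, 12.92, 1)
# Week-1 frequencies for gretit1 and carcro1 from the example above.
targets = scale_targets([72, 62])
```

The point of a circular encoding is that the last week of the year and the first week end up close together in feature space, instead of at opposite ends of a 1 to 48 scale. The same applies to longitudes on either side of the 180th meridian.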

If we now query the trained model for the same location as above, we get these values for great tits:

'gretit': [99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 98, 98, 98, 98, 98, 97, 97, 97, 97, 97, 97, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99]

Ok, so apparently great tits are very common in Chemnitz all year round, which is true. Let's take a look at a migratory species, the Common House-Martin:

'cohmar1': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 5, 12, 16, 21, 26, 36, 43, 54, 62, 65, 66, 67, 70, 72, 73, 75, 78, 80, 80, 76, 72, 55, 34, 20, 13, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

We can see that the estimated probability of occurrence increases in mid-March, peaks in June/July and decreases towards the end of the year, which is a good reflection of the migratory behavior of this species.
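The same seasonal pattern can be read off the list programmatically. The sketch below uses the model output above; the function name `season_summary` and the threshold of 1 are arbitrary choices for illustration.

```python
# Model output for the Common House-Martin in Chemnitz, copied from above.
cohmar1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 5, 12, 16, 21, 26,
           36, 43, 54, 62, 65, 66, 67, 70, 72, 73, 75, 78, 80, 80, 76, 72,
           55, 34, 20, 13, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def season_summary(probabilities, threshold):
    """Return the first/last week at or above threshold and the peak weeks.

    Weeks are 1-based (1..48).
    """
    weeks_present = [
        week for week, p in enumerate(probabilities, start=1) if p >= threshold
    ]
    peak = max(probabilities)
    return {
        "first_week": weeks_present[0] if weeks_present else None,
        "last_week": weeks_present[-1] if weeks_present else None,
        "peak_value": peak,
        "peak_weeks": [
            week for week, p in enumerate(probabilities, start=1) if p == peak
        ],
    }


summary = season_summary(cohmar1, threshold=1)
```

With a threshold of 1, the species is present from week 11 through week 39, and the estimate is zero for the rest of the year.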

Range maps

We can plot the data for all locations worldwide on a map to get a better understanding of the capabilities and limitations of the trained model. This is what the distribution map looks like for the Common House-Martin in week 24:

Figure 2: Occurrence probability for the Common House-Martin in week 24.

Well, that does look a bit strange. But remember: we only have binary filter data for large parts of Africa and Asia, hence the bright yellow (= very likely) parts in Africa and Asia, even though this species has its breeding range in the Western Palearctic.

Let's look at the occurrence estimate for week 1:

Figure 3: Occurrence probability for the Common House-Martin in week 1.

We can clearly see that things have changed in Europe (where we have good data coverage), so the model is able to accurately estimate migration when a location has training data coverage. That's encouraging.

Here are a few more examples for other species from North America (where eBird coverage is better):

Figure 4: Occurrence probability for the Magnolia Warbler in week 1 (blue) and week 24 (red).

Figure 5: Occurrence probability for the Yellow-rumped Warbler in week 1 (blue) and week 24 (red).

Conclusion

Overall, we can assume that the model works well in North and South America, Europe, India and Australia. In other regions, there is a lack of eBird observations and the resulting species lists do not reflect the actual probabilities of occurrence. Nevertheless, we can use these lists to filter for species that may or may not occur in these locations.

Feel free to ask questions in the comments below.

robinsandfort · 2024-01-19T13:25:31Z

robinsandfort
Jan 19, 2024

Hi Stefan,
thanks for this great explanation and beautiful maps!
Do the species predictions for each call (recording) change when using a custom
species list compared to the BirdNET Species Range Model V2.4 - V2?
So would the same call of a Common House-Martin recorded in Chemnitz during week 24
result in different prediction values using a custom species list (including Common House-Martin)
compared to the eBird list? Thanks for clarification!
Greetings from the forest,
Robin

1 reply

kahst Jan 19, 2024
Maintainer Author

No, it wouldn't change the prediction score since we're using species lists as binary filters. If the species is on the list, you'll see a score, otherwise not. The species range model does apply a cut-off threshold and excludes species below this threshold, so during migration that might affect the house-martin compared to your custom list. However, if it makes the list, you'll get a detection.
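A minimal sketch of that behavior, not the actual BirdNET code: the function names, the threshold of 0.1 and the classifier scores are made up for illustration. The occurrence values are rescaled from the Chemnitz model outputs earlier in this thread.

```python
def species_list_from_range_model(occurrence, threshold):
    """Turn range model output (species code -> probability 0..1) into a list."""
    return {code for code, p in occurrence.items() if p >= threshold}


def filter_detections(detections, allowed_species):
    """Keep detections for listed species; the scores are left untouched."""
    return [(code, score) for code, score in detections if code in allowed_species]


# Hypothetical classifier output for one recording.
detections = [("cohmar1", 0.81), ("gretit1", 0.64)]

# Range model output for Chemnitz, rescaled from the lists above.
week_12 = {"cohmar1": 0.05, "gretit1": 0.99}
week_24 = {"cohmar1": 0.70, "gretit1": 0.98}

THRESHOLD = 0.1  # hypothetical cut-off
custom_list = {"cohmar1", "gretit1"}

in_season = filter_detections(
    detections, species_list_from_range_model(week_24, THRESHOLD)
)
off_season = filter_detections(
    detections, species_list_from_range_model(week_12, THRESHOLD)
)
with_custom_list = filter_detections(detections, custom_list)
```

In week 24 both lists keep the house-martin detection with the same score of 0.81. In week 12 the range model list drops it because 0.05 is below the cut-off, while the custom list still keeps it.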
