Species range model details #234
kahst started this conversation in Show and tell
Introduction
The BirdNET Species Range Model V2.4 - V2 uses eBird checklist frequency data to estimate the range of bird species and the probability of their occurrence given latitude, longitude and week of the year. eBird relies on citizen scientists to collect bird species observations around the world. Due to biases in these data, some regions such as North and South America, Europe, India and Australia are well represented, while large parts of Africa and Asia are underrepresented (see map below). Where eBird does not have enough observations (i.e. checklists), the data "only" contain binary filter data listing the species that are likely to occur at a given location. The training data for our biodiversity model is therefore a mixture of actual observations and filter data curated by experts. We included all locations for which at least 10 checklists are available for each week of the year, and randomly added other locations with a 3% probability. This is what the final training data looks like on a map:
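To make the location selection rule concrete, here is a minimal sketch of how it could look. The thresholds (10 checklists per week, 3% random inclusion) come from the description above; the function name, the input layout (a dict of weekly checklist counts per location) and the location ids are hypothetical and only serve as illustration, not as the actual BirdNET code.

```python
import random

MIN_CHECKLISTS_PER_WEEK = 10  # from the description above
RANDOM_INCLUDE_PROB = 0.03    # from the description above


def select_locations(checklist_counts, rng):
    """Pick training locations.

    checklist_counts: dict mapping a location id to a list of weekly
    checklist counts (hypothetical input layout).
    rng: any object with a random() method returning a float in [0, 1).
    """
    selected = []
    for location, weekly_counts in checklist_counts.items():
        # Well covered = every single week has enough checklists.
        well_covered = bool(weekly_counts) and min(weekly_counts) >= MIN_CHECKLISTS_PER_WEEK
        # Sparse locations still get in with a small probability.
        if well_covered or rng.random() < RANDOM_INCLUDE_PROB:
            selected.append(location)
    return selected


# Hypothetical example: one well-covered and one sparse location.
counts = {
    "well_covered": [25] * 48,
    "sparse": [0] * 47 + [3],
}
print(select_locations(counts, random.Random(0)))
```

The well-covered location is always kept, while the sparse one only appears in a small share of runs, which is how filter-only regions end up in the training data at all.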
Figure 1: This map shows availability of eBird checklist frequency data (yellow = good representation, dark purple = only filter data)
Here is an example of what the training data for a given location (Chemnitz) looks like:
The data consists of a species code and 48 values - one for each week - indicating the checklist frequency between 0 and 100. In the case above, great tits appear to be the most common species and have a checklist frequency of 72 in week 1, meaning that 72% of all submitted eBird checklists for this location in week 1 contain at least one great tit.
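As a quick illustration of what a checklist frequency value means, the sketch below computes it from a handful of made-up checklists. The checklist contents and species codes are placeholders, and returning 0 for a week without checklists is a simplification of mine (as described above, such cases are covered by filter data instead).

```python
def checklist_frequency(checklists, species_code):
    """Percentage (0-100) of checklists that report the species at least once."""
    if not checklists:
        return 0  # simplification: no checklists, no frequency
    hits = sum(1 for checklist in checklists if species_code in checklist)
    return round(100 * hits / len(checklists))


# Made-up week-1 checklists for one location; each checklist is a set of
# species codes (placeholder codes, for illustration only).
week_1 = [{"gretit1", "eurbla"}] * 18 + [{"eurbla"}] * 7

print(checklist_frequency(week_1, "gretit1"))
```

Here 18 of the 25 checklists contain the species, which corresponds to the 72% case described above.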
Training
During training, we randomly select a location, use circular embeddings to encode latitude, longitude, and week, scale the checklist frequency to a uniform format, and run this data through a simple classifier with three fully connected layers plus an output layer with 6,522 output units, one for each species.
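For readers who have not seen circular embeddings before, here is a rough sketch of the input encoding and target scaling. The sine/cosine form, the choice of period for latitude and longitude, and all function names are assumptions on my side; the post only states that circular embeddings are used and that frequencies are scaled to a uniform format. The network itself is left out because the hidden layer sizes are not given.

```python
import math

WEEKS_PER_YEAR = 48  # the data has one value per week, 48 in total


def circular(value, period):
    """Map a value onto the unit circle so both ends of the period meet."""
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))


def encode_input(lat, lon, week):
    """Encode latitude, longitude (degrees) and week (1..48) as 6 features."""
    # Assumption: 360 degrees as the period for both coordinates, so
    # -90 and +90 latitude still map to different points.
    lat_sin, lat_cos = circular(lat, 360.0)
    lon_sin, lon_cos = circular(lon, 360.0)
    # Week 48 and week 1 end up next to each other on the circle.
    week_sin, week_cos = circular(week - 1, WEEKS_PER_YEAR)
    return [lat_sin, lat_cos, lon_sin, lon_cos, week_sin, week_cos]


def scale_frequency(frequency):
    """Scale a checklist frequency (0-100) to a 0-1 training target (assumed)."""
    return frequency / 100.0


# Roughly Chemnitz, week 1, with the frequency of 72 from the example above.
print(encode_input(50.83, 12.92, 1), scale_frequency(72))
```

The point of the circular form is that late December and early January are treated as neighbors, which a plain week number from 1 to 48 would not do.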
If we now query the trained model for the same location as above, we get these values for great tits:
OK, so apparently great tits are very common in Chemnitz all year round, which is true. Let's take a look at a migratory species, the Common House-Martin:
We can see that the estimated probability of occurrence increases in mid-March, peaks in June/July and decreases towards the end of the year, which is a good reflection of the migratory behavior of this species.
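To relate week numbers to the calendar, the following helper may be handy. It assumes that the 48 weeks split evenly into 4 weeks per month; that split is my reading of the 48-value format and is not stated explicitly above, so treat the function as an approximation.

```python
WEEKS_PER_MONTH = 4  # assumption: 48 weeks = 12 months x 4 weeks

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def week_to_month(week):
    """Return the month name for a week number in 1..48 (assumed mapping)."""
    if not 1 <= week <= len(MONTHS) * WEEKS_PER_MONTH:
        raise ValueError("week must be between 1 and 48")
    return MONTHS[(week - 1) // WEEKS_PER_MONTH]


for week in (1, 10, 24):
    print(week, week_to_month(week))
```

Under this assumption, week 1 falls in January and week 24 in June, which matches the winter and breeding-season snapshots used in the maps.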
Range maps
We can plot the data for all locations worldwide on a map to get a better understanding of the capabilities and limitations of the trained model. This is what the distribution map looks like for the Common House-Martin in week 24:
Figure 2: Occurrence probability for the Common House-Martin in week 24.
Well, that does look a bit strange. But remember: we only have binary filter data for large parts of Africa and Asia, hence the bright yellow (= very likely) parts in Africa and Asia, even though this species has its breeding range in the Western Palearctic.
Let's look at the occurrence estimate for week 1:
Figure 3: Occurrence probability for the Common House-Martin in week 1.
We can clearly see that things have changed in Europe (where we have good data coverage), so the model is able to accurately estimate migration when a location has training data coverage. That's encouraging.
Here are a few more examples for other species from North America (where eBird coverage is better):
Figure 4: Occurrence probability for the Magnolia Warbler in week 1 (blue) and week 24 (red).
Figure 5: Occurrence probability for the Yellow-rumped Warbler in week 1 (blue) and week 24 (red).
Conclusion
Overall, we can assume that the model works well in North and South America, Europe, India and Australia. In other regions, there is a lack of eBird observations and the resulting species lists do not reflect the actual probabilities of occurrence. Nevertheless, we can use these lists to filter for species that may or may not occur in these locations.
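Using the model output as a species filter could look roughly like the sketch below. The function names, the threshold value and the example numbers are all made up for illustration; this is not the BirdNET API, just the general idea of keeping only detections of species the range model considers plausible.

```python
def likely_species(occurrence_by_species, threshold):
    """Species whose estimated occurrence is at or above the threshold."""
    return {
        species
        for species, probability in occurrence_by_species.items()
        if probability >= threshold
    }


def filter_detections(detections, allowed_species):
    """Keep only (species, confidence) detections of allowed species."""
    return [d for d in detections if d[0] in allowed_species]


# Made-up range model output for one location and week.
occurrence = {
    "Great Tit": 0.81,
    "Common House-Martin": 0.02,
    "Magnolia Warbler": 0.0,
}
# Made-up audio detections as (species, confidence) pairs.
detections = [("Great Tit", 0.93), ("Magnolia Warbler", 0.41)]

allowed = likely_species(occurrence, threshold=0.05)  # arbitrary threshold
print(filter_detections(detections, allowed))
```

In regions with only filter data, the occurrence values are effectively yes/no, so the exact threshold matters much less there than in well-covered regions.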
Feel free to ask questions in the comments below.