Our first new Computer Vision Model (v2.10) for 2024 including 1,599 new taxa

We released a new computer vision model today. It has 83,622 taxa, up from 82,023. This new model (v2.10) was trained on data exported on November 26th.

Here's a graph of the model release schedule since early 2022 (segments extend from data export date to model release date) and how the number of species included in each model has increased over time.

Here is a sample of new species added to v2.10:

Posted on January 4, 2024 08:19 PM by loarie

Comments

For what reasons may a taxon with around 100 observations be excluded?

Posted by eleodesthermopolis 4 months ago

No RG observations?
See the header 'we changed a few things about how we generate training data'
https://www.inaturalist.org/blog/63931-the-latest-computer-vision-model-updates

Posted by optilete 4 months ago

In my case there are RG observations, the lowest being around 30. And I do believe that at least two had over 100 at the start of December.

Posted by eleodesthermopolis 4 months ago

Perhaps they were shot by the same observer or with the same camera/phone? There needs to be diversity.

Posted by marina_gorbunova 4 months ago

The inclusion threshold appears to be more closely linked to the number of images than the number of observations, perhaps specifically the number of images associated with verifiable (or potentially verifiable?) observations. Answers from iNat staff have been a little imprecise on this topic, but I get the impression that a taxon needs 200+ photos to become a candidate for CV model training.

Posted by rupertclayton 4 months ago

@rupertclayton the minimum threshold, excluding other factors such as the one Marina mentioned, is 100 photos

Posted by thebeachcomber 4 months ago

I've also read that it's 100 photos, and that did previously appear to be the case, but for the current iteration and previous several, spot checking suggests to me that a larger number of photos seems to be required.

Here is the newly added plant with fewest observations (60):

https://www.inaturalist.org/taxa/284587-Aristolochia-nelsonii

And this seems to currently have 209 photos. Here's another one with 65 observations:

https://www.inaturalist.org/taxa/587108-Heliophila-lactea/browse_photos

That one has 164 photos. Conversely, several species that I've worked to add identifications for have not been added to the CV model even when they get to 120 photos. And those certainly had a very varied set of observers. I guess I'm just saying that 100 photos is not the reliable criterion that I interpreted it to be!

Posted by rupertclayton 4 months ago

One thing to note is that the current number of photos is not equal to the number of photos when the latest model began training. In this case the data was exported on November 26th last year; how many photos did those two taxa have at that point in time? And indeed, how many photos did they have when the previous model was trained? It would be interesting to see the numbers from then.
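As a rough illustration of that point, here is a minimal sketch of counting a taxon's photos as of an export date rather than today. The record layout (taxon ID plus upload date), the function names, and the taxon ID are all hypothetical, and the 100-photo default is only the figure quoted in this thread, not a confirmed rule.

```python
from datetime import date

# Hypothetical photo records: (taxon_id, upload_date).
# This layout is an assumption for illustration, not iNaturalist's export schema.

def photos_as_of(photos, taxon_id, export_date):
    """Count photos for one taxon uploaded on or before the export date."""
    return sum(
        1 for tid, uploaded in photos
        if tid == taxon_id and uploaded <= export_date
    )

def is_candidate(photos, taxon_id, export_date, min_photos=100):
    """Apply a simple photo-count threshold at the export date.

    min_photos=100 is the number cited by commenters; the real
    criteria may involve other factors.
    """
    return photos_as_of(photos, taxon_id, export_date) >= min_photos
```

A taxon can sit above the threshold today and still have been below it on the export date, which is why counts checked after the fact can mislead.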

Posted by thebeachcomber 4 months ago

A nice haul from southern Africa. 86 species of plants, although the tail end suggests 13 of these (6 if filtered for verifiable only) are garden plants or escapes.
These range from 277 observations (Solanum lichtensteinii) to 50 observations (Fenestraria rhopalophylla) - with 2 species over 200 observations, 2 over 150 observations, 6 over 100 observations, 14 over 80 observations, 19 over 70 observations, 22 over 60 observations, 4 over 50 observations. A real tail-ender of 20 observations for southern Africa was augmented by 91 from mainly West Africa.
Interpreting this is complicated by the cut-off being during the Great Southern Bioblitz (November 24 - 27; https://www.inaturalist.org/projects/great-southern-bioblitz-2023-southern-africa-umbrella) where lots of observations were made, many of which were not identified until the following week.

My two questions are:
1. When are subspecies and varieties going to be included? Because the CV does not include them, these species are seldom identified further, despite these ranks being crucial for taxonomy, conservation planning, red listing and environmental impact assessments.
We have 135 plant species (over 250 taxa) in southern Africa with over 200 observations identified at the subspecies and variety level, which only get ID'd at the species level (https://www.inaturalist.org/observations?hrank=subspecies&lrank=variety&place_id=113055&subview=map&verifiable=any&view=species&iconic_taxa=Plantae). OK, some of these have only one taxon in southern Africa, but some have half a dozen, although in some cases ~90% of observations are for one taxon. In total there are 1,845 plant species with RG subspecific IDs. Having the CV help suggest these would be a big boon to getting identifications done.
2. I understand the issues with hybrids being intermediate and confusing the CV. However, is it possible to nominate specific ones that do not? So Safari Sunset has 1,442 observations - it is utterly distinctive from its parents Leucadendron laureolum and salignum - but the CV does not ID most of them even as Leucadendron, coming up with really spurious IDs in some cases. Almost all observations of this are cultivated, but it is the most planted cultivar by far, and Americans (mostly Californians - over 600 observations [way above the 100 cutoff!!]) insist on posting it on iNaturalist and getting the incorrect IDs.

Posted by tonyrebelo 4 months ago

@thebeachcomber: You're correct that the number of photos as of the data export date is important. But now that the model training takes only about 6 weeks, this doesn't vary much. For the two taxa I cited, all the observations were added before October 31, 2023. If we assume that adding photos to existing observations is an insignificant factor, then the number of photos eligible for training model 2.10 was probably the same as today (209 and 164). The cutoff date for CV model 2.9 was October 15, 2023. As best I can tell, those two taxa had 208 and 159 photos respectively at that date. So, I'm still puzzled as to the inclusion criteria...

@tonyrebelo: I wholeheartedly support your proposal to make infrataxa and hybrids eligible to be included in CV.

I know that there were problems with how CV suggested some hybrid bird taxa, but I think these special cases don't justify a blanket exclusion of all hybrids. Many cultivated plants (and therefore, quite a lot of invasive species) are hybrid taxa, e.g. Crocosmia × crocosmiiflora. Lots of people upload photos of these hybrid plants in cultivation or growing wild. CV doesn't have the option to suggest the hybrid, so most observers select another species. Fixing this requires attention from at least one knowledgeable identifier, and in the many cases where the observer doesn't respond, it requires three opposing votes. So, there's a huge amount of work involved to ensure that iNat (and GBIF exports) have accurate data and it could mostly be prevented if the CV model was allowed to include hybrid taxa. How about a "Hybrid exclusion list"? This could be updated by curators or staff as appropriate and would define branches of the taxonomy below which hybrids would not be eligible for CV. So, adding "Aves" (taxon_id=3) would prevent CV from considering any hybrid bird taxa.
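To make the exclusion-list idea concrete, here is a minimal sketch, assuming each taxon record carries its rank and the IDs of its ancestors. The `Taxon` class, the function name, and the example IDs other than Aves (taxon_id=3) are hypothetical and not part of any iNaturalist API.

```python
from dataclasses import dataclass, field

@dataclass
class Taxon:
    """Hypothetical taxon record: ID, rank, and ancestor IDs (root to parent)."""
    taxon_id: int
    rank: str
    ancestor_ids: list = field(default_factory=list)

# Branches below which hybrids would not be eligible for CV.
# 3 is Aves, as in the example above; the list itself is an assumption.
HYBRID_EXCLUSION_LIST = {3}

def is_hybrid_blocked(taxon, exclusion_list=HYBRID_EXCLUSION_LIST):
    """Return True if the taxon is a hybrid under an excluded branch."""
    if taxon.rank != "hybrid":
        return False  # the list only restricts hybrid taxa
    lineage = set(taxon.ancestor_ids) | {taxon.taxon_id}
    return not lineage.isdisjoint(exclusion_list)
```

With this shape, a hybrid bird would be blocked because Aves appears in its lineage, while a hybrid plant or any non-hybrid bird species would pass through to the usual inclusion checks.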

As to infrataxa, I think iNat is missing a real scientific opportunity. I think we're all aware that today's subspecies is tomorrow's species and vice versa. Taxonomists put a lot of work into aligning names with the various species concepts, but new data and perspectives mean that revisions are constant. No problem there, except that iNat makes infrataxa much less prominent for observers and identifiers, especially through the CV engine. Certainly, varieties and subspecies are often distinguished by small details, and there are lots of observations where the relevant details are not visible. But there are plenty of infrataxa that can be reliably distinguished in iNat observations and quite a few identifiers willing to do that. An identifier who helps distinguish two similar species with a few hundred observations can expect that CV will pick up the difference and suggest better IDs to future observers. The same exercise for subspecies is a perennial fight against new CV suggestions.

Posted by rupertclayton 4 months ago

@loarie, Thanks a lot once again for this data log.

Posted by apseregin 4 months ago

I also wonder about the requirements for the CV to 'unlearn' a taxon.
After cleaning up, one species now has 86 observations, with 30 of them RG.
Curious whether this will drop out in the next round

Posted by carnifex 4 months ago

@rupertclayton I like the idea of an exclusion list. It could also apply to species (or organisms at any taxonomic level) that can't be identified from photos. It would greatly reduce the problems that Computer Vision can occasionally cause. I don't think I've seen a Feature Request for this on the forum, but if someone were to propose it, I'd vote for it.

Posted by deboas 4 months ago

Always like checking the newly included species. Thanks again for the info!

Posted by ajott 4 months ago

I really enjoy browsing the newly added species and looking at what I might have contributed to, thanks for including it.

Posted by brnhn 4 months ago

The latest model seems to no longer recognise/suggest Melitaea athalia as being "seen nearby" in Kent, UK.
https://www.inaturalist.org/taxa/132875-Melitaea-athalia

Posted by bsteer 3 months ago

@bsteer: iNat failing to suggest Melitaea athalia as "expected nearby" is probably an artefact of the iNaturalist geomodel, rather than something specifically caused by the computer vision model. But you're right that the thresholded geomodel is not currently predicting Melitaea athalia in eastern Kent despite there being 91 iNaturalist observations in the Canterbury area.

@loarie: Is this discrepancy something you can feed into the process of tweaking future versions of the geomodel?

Posted by rupertclayton 3 months ago

Thanks – I remember seeing that and did wonder if that might alternatively be the cause. Hopefully there is some means by which we can feed back to correct the system.

Posted by bsteer 3 months ago

February update coming soon?

Posted by dianastuder 3 months ago

yes - v2.11 will be out this coming week or the following week

Posted by loarie 3 months ago

awesome

Posted by apseregin 3 months ago
