A new Computer Vision Model (v2.3) including 1,624 new taxa

We released a new computer vision model today. It covers 74,135 taxa, up from 72,511. This new model (v2.3) was trained on data exported last month, on April 2nd, and adds 1,624 new taxa.

Taxa differences from the previous model

The charts below summarize these new taxa using the same groupings we described in past release posts.

By category, most of these new taxa were insects and plants

Here are species-level examples of new taxa added for each category:

Click the links to view these taxa on the Explore page, where the samples are rendered as species lists. Remember, to check whether a particular species is included in the currently live computer vision model, look at the “About” section of its taxon page.

We couldn't do it without you

Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.

In addition to adding observations and identifications, here are other ways you can help:

  • Share your Machine Learning knowledge: iNaturalist’s computer vision features wouldn’t be possible without learning from many colleagues in the machine learning community. If you have machine learning expertise, these are two great ways to help:
      • Participate in the annual iNaturalist challenges: Our collaborators Grant Van Horn and Oisin Mac Aodha continue to run machine learning challenges with iNaturalist data as part of the annual Computer Vision and Pattern Recognition (CVPR) conference. By participating you can help us all learn new techniques for improving these models.
      • Start building your own model with the iNaturalist data now: If you can’t wait for the next CVPR conference, thanks to the Amazon Open Data Program you can start downloading iNaturalist data to train your own models now. Please share with us what you’ve learned by contributing to iNaturalist on GitHub.
  • Donate to iNaturalist: For the rest of us, you can help by donating! Your donations help offset the substantial staff and infrastructure costs associated with training, evaluating, and deploying model updates. Thank you for your support!
Posted on May 12, 2023 10:16 PM by loarie

Comments

Sweeet!

Posted by kevinfaccenda 12 months ago

This is one of my favorite days for Inat!

Posted by yayemaster 12 months ago

Nice update. Why is https://www.inaturalist.org/taxa/908145 included in the new model, since it only has 12 research grade observations with 29 photos? Has the minimum bar been lowered, or has this taxon been cleaned up recently?

Posted by rudolphous 12 months ago

Btw, you might want to change the destination of the link to the description of the groupings of taxa in the stats - right now it links in several rounds to pages that just contain the same reference to an earlier post and no further explanation. By repeated clicking I assume the correct destination should be https://www.inaturalist.org/blog/69958-a-new-computer-vision-model-including-4-717-new-taxa

Posted by jlisby 12 months ago

You should not be heroising the CVI platform. You do not know how much time this one person spends trying to correct incorrect/ridiculous CV identifications. It mostly makes me want to leave the platform.

Posted by oneanttofew 12 months ago

@oneanttofew While the CV certainly does make conspicuous mistakes, it is still generally quite accurate at getting things to genus / family level where they can then be improved by humans. I'm personally very proud of the CV and how much it has improved over the past two years.

Posted by kevinfaccenda 12 months ago

CV is a tool. We need better onboarding for new people on how to evaluate CV options. It has most definitely improved in the years that I have been on iNat. And in my patch almost half (okay, one third) is "green stuff, just dump it in plants".

PS left a few comments - added to CV May 2023 - on the Cape Peninsula species. Since the info holds only while we can use the links embedded in this blog post.

Posted by dianastuder 12 months ago

@rudolphous I think the rules for inclusion in the model may have changed, but the Help page (https://www.inaturalist.org/pages/help#cv-taxa) still lists having at least 100 observations, so that is confusing.

Posted by cthawley 12 months ago

In between, I think it was 100 pictures - about 60 obs depending.

But that seems to have changed since? @loarie ?

Posted by dianastuder 12 months ago

This could be wrong but I think more iconic species (which presumably have more photos on the internet) get included in the model even if they are well below the threshold.

E.g. The pygmy hippopotamus was included in this model despite having only 4 RG observations.

Posted by mabuva2021 12 months ago

@mabuva2021 @cthawley @rudolphous The model is also trained on captive observations. E.g. there are 140 observations of pygmy hippos from zoos: https://www.inaturalist.org/observations?place_id=any&taxon_id=74192&verifiable=any

I don't think the model trains on any data that isn't on iNat.

Posted by kevinfaccenda 12 months ago

Please add the button to drop this post from the dropbox page. I have read it.

Posted by tonyrebelo 12 months ago

"You should not be heroising the CVI platform."

I for one do think the CV AI is a hero! For all its mistakes it is brilliant on what it is trained for. To me it is already an indispensable tool. I foresee the day that when I post an ID, it will ask me - are you sure? It looks more like ...
(and I wish it already did it, for when I post - instead of a plant - Passerina the bird or Elegia the moth or Erica the spider).

Posted by tonyrebelo 12 months ago

So, where is it released, and does the model have a free/libre/open license? I'm missing a link to the download.

Posted by davidak 12 months ago

It is what we are now using on iNat.

Posted by dianastuder 12 months ago

Absolutely, the CV is an awe-inspiring achievement. I tested once that it got 80% of Eristalis hoverfly observations right at species level. It won't be great for every taxon (and obviously not for those species that are not included), but it is great for the majority of observations. When the suggestion is poor, I doubt the user's best effort would have been better!

Posted by matthewvosper 12 months ago

@rudolphous it looks like there has been some identification churn, as there has been a considerable effort recently to update misidentified observations in genus Eucereon. The next model will likely remove E. chalcodon in favor of some of its siblings. See https://www.inaturalist.org/journal/regisrafael/77128-eucereon-chalcodon-are-being-misidentified-as-eucereon-compositum for some info.

Posted by alexshepard 12 months ago

@matthewvosper That clarifies a lot. Thanks for pointing that out.

Posted by rudolphous 12 months ago

@alexshepard could you confirm whether "This has changed over time, but as of the model released in March 2020, taxa included in the computer vision training set must have at least 100 observations, at least 50 of which must have a community ID." is still the case, or whether the requirements are different now?

Posted by thebeachcomber 11 months ago

@thebeachcomber that's not the case anymore - we require 100 photos, but that can be spread across a number of observations. We prefer not to use too many photos from the same observation, but we will use a few from each, so the number of observations can be lower than 100.

We're always looking for ways to increase the robustness of the training dataset, so these criteria might change in the future.

For example, lately I've been concerned about taxa like https://www.inaturalist.org/taxa/387943-Costelytra-brunnea - it has enough photos to be included in the vision model, but it only has 3 observers and 3 identifiers, and 98 of the 100 observations of this taxon were made by a single observer, almost all in 2 months in late 2021. I fear there may not be enough diversity of photography equipment, observer behavior, and identifier opinions to say that we really know what this thing looks like well enough to teach a computer vision algorithm. If this persists and turns out to hurt the model (still TBD), we'll probably have to add a floor to the number of photographers and perhaps identifiers as well.

Posted by alexshepard 11 months ago

thanks for that; so theoretically a species could enter the model if it had just 5 observations, with each of those having 20 photos?

Posted by thebeachcomber 11 months ago

@thebeachcomber nope, as I said, we'll use a few from each, but we prefer to not use too many photos from the same observation.

Posted by alexshepard 11 months ago
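
Editorial illustration (not iNaturalist's actual code): the photo-count rule discussed in the comments above can be sketched roughly as follows. The 100-photo minimum comes from the comment; the per-observation cap of 5 is purely an assumption standing in for "a few from each", and the observer floor is the hypothetical future rule mentioned, not a current requirement. The names `Observation`, `usable_photos`, and `is_eligible` are made up for this sketch.

```python
from dataclasses import dataclass

MIN_PHOTOS = 100   # stated in the comment: "we require 100 photos"
PER_OBS_CAP = 5    # ASSUMPTION: the comment only says "a few from each"


@dataclass(frozen=True)
class Observation:
    observer: str      # login of the person who made the observation
    photo_count: int   # number of photos attached to it


def usable_photos(observations, per_obs_cap=PER_OBS_CAP):
    """Count photos, taking at most per_obs_cap from any one observation."""
    return sum(min(o.photo_count, per_obs_cap) for o in observations)


def is_eligible(observations, min_photos=MIN_PHOTOS,
                per_obs_cap=PER_OBS_CAP, min_observers=1):
    """Hypothetical inclusion check: enough capped photos and enough observers."""
    obs = list(observations)
    enough_photos = usable_photos(obs, per_obs_cap) >= min_photos
    enough_observers = len({o.observer for o in obs}) >= min_observers
    return enough_photos and enough_observers


# The "5 observations with 20 photos each" case raised in the thread.
five_big = [Observation("a", 20) for _ in range(5)]
print(usable_photos(five_big), is_eligible(five_big))
```

Under these assumed numbers, five observations with 20 photos each contribute only 25 usable photos, so the taxon would not qualify, whereas 60 observations with two photos each would. Raising `min_observers` models the possible floor on photographers.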

thanks again [I should've clarified, I meant without the human choice aspect, 5 obs should be possible based purely on the numerical requirement]

Posted by thebeachcomber 11 months ago

'add a floor to the number of identifiers'

That would also be good for new species, where there may literally only be one or two identifiers.

Posted by dianastuder 11 months ago

'add a floor to the number of identifiers' - except that as the AI gets trained on rarer and rarer species, the number of people capable of accurately identifying them gets less and less. Especially with rare plants or invertebrates. If the only identifier is the world expert, then surely that is good enough: otherwise we will require "supporting" IDs behind taxonomical specialists to get species onto the training dataset.

Posted by tonyrebelo 11 months ago

I am torn both ways. It is not good for CV to offer a rare new species, instead of the genus where there are many species. For some reason the Species novum bounces to the top of the list.

Posted by dianastuder 11 months ago

"For some reason the Species novum bounces to the top of the list." - any examples please: this should not be the case, unless the "Species novum" is the best match.

Posted by tonyrebelo 11 months ago

Can't find it, but I remember an Indigofera? With a (new species) number, instead of a name. Which was subsequently offered by CV where it was not relevant.

@alexshepard is there a threshold for Seen Nearby? I tidy up distribution maps as it only takes one obs, with one ID to offer as Seen Nearby. That error multiplies fast. Threshold should at least be CID and Research Grade for That One?

Posted by dianastuder 11 months ago
