New computer vision model

We’ve released a new computer vision model for iNaturalist. This is our first model update since April 2022. The iNaturalist website, mobile apps, and API are all now using this new model. Here’s what’s new and different with this change:

  • It includes 60,000 taxa (up from 55,000)
  • It was trained using a different approach than our previous models, which made it much faster to train

To see if a particular species is included in this model, you can look at the “About” section of its taxon page.

It’s bigger

Our previous model included 55,000 taxa and 27 million training photos. The new model was trained on over 60,000 taxa and almost 30 million training photos.

It was trained using a transfer learning strategy

In previous training runs, our strategy was to train the entire model on the dataset. This means that all of the model weights were candidates for being updated, so the model could learn the most efficient and useful visual features for suggesting the taxa in that dataset. When training this model, we froze most of the model weights (thereby freezing the visual feature extraction) and trained only the very last layer of the model, the layer that makes the taxon suggestions. This is a common form of a machine learning strategy known as transfer learning.
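The setup described above, where the feature extractor is frozen and only the final layer is trained, can be illustrated with a toy sketch in plain Python. Everything here is a hypothetical stand-in, not iNaturalist's training code. `FrozenBackbone` plays the role of a pretrained network (one fixed linear layer plus ReLU instead of a deep model). `SoftmaxHead` is the trainable last layer that scores each taxon. `train_last_layer` runs plain stochastic gradient descent on cross-entropy loss for the head alone.

```python
import math


class FrozenBackbone:
    """Toy stand-in for a pretrained feature extractor (hypothetical).

    A real vision model would use a deep convolutional network here; this one is a
    single fixed linear layer followed by ReLU. Its weights are never updated.
    """

    def __init__(self, weights):
        self.weights = weights  # one row of input weights per output feature

    def features(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


class SoftmaxHead:
    """Trainable last layer: one weight vector and one bias per taxon."""

    def __init__(self, feat_dim, num_classes):
        self.w = [[0.0] * feat_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def probs(self, f):
        logits = [sum(wi * fi for wi, fi in zip(row, f)) + bi
                  for row, bi in zip(self.w, self.b)]
        top = max(logits)  # subtract the largest logit for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, f):
        p = self.probs(f)
        return max(range(len(p)), key=p.__getitem__)

    def sgd_step(self, f, label, lr):
        p = self.probs(f)
        for k, row in enumerate(self.w):
            # Gradient of the cross-entropy loss with respect to logit k.
            grad = p[k] - (1.0 if k == label else 0.0)
            self.b[k] -= lr * grad
            for j, fj in enumerate(f):
                row[j] -= lr * grad * fj


def train_last_layer(backbone, head, data, epochs=200, lr=0.1):
    """Train only the head; the backbone is used for inference but never modified."""
    # The backbone is frozen, so each example's features can be computed once
    # and reused in every epoch.
    cached = [(backbone.features(x), y) for x, y in data]
    for _ in range(epochs):
        for f, y in cached:
            head.sgd_step(f, y, lr)


if __name__ == "__main__":
    backbone = FrozenBackbone([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    # Toy 2-D "photos" for three hypothetical taxa (labels 0, 1, 2).
    data = [((1.0, 0.0), 0), ((2.0, 0.0), 0),
            ((0.0, 1.0), 1), ((0.0, 2.0), 1),
            ((-1.0, -1.0), 2), ((-2.0, -1.0), 2)]
    head = SoftmaxHead(feat_dim=4, num_classes=3)
    train_last_layer(backbone, head, data)
    print([head.predict(backbone.features(x)) for x, _ in data])
```

Because the frozen backbone's outputs never change, this sketch computes each example's features once and reuses them every epoch, which is one common reason last-layer training is cheaper than updating every weight. The post doesn't say whether iNaturalist's own pipeline caches features this way.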

One way to think about this is to imagine that someone was asked to learn all about different kinds of cars. Later, that person was asked to tell different kinds of pickup trucks apart, but only using distinguishing characteristics they learned from their study of cars (for example, color, size, visual shape, branding, and engine size), without learning anything new about pickup trucks (for example, bed capacity or towing limits). Chances are, that person could distinguish between most kinds of pickup trucks without needing to learn anything new specifically about them. They may not perform as well as someone who learned about trucks from the beginning, but they have strong foundational knowledge to draw upon for the task.

Our new model was trained using a transfer learning strategy. We used the internal weights and visual features from our previous model, which was trained on 55,000 taxa. The advantage of this approach is that we didn’t need to learn all of those internal model weights and visual features again, so training was considerably faster. It’s been only four months since our last model was released, the shortest time between model releases so far.

As with the pickup truck analogy, a model trained with the transfer learning approach may be slightly less accurate overall than if we had retrained the entire model. However, in our testing this new model appears to achieve nearly the same accuracy as the previous model while covering more taxa. Going forward, we plan to fully train a model about once a year to maximize accuracy with new photos and taxa, and to use the faster transfer learning approach between full training runs so we can release models more often than we have in the past.

Future work

First, we are still working on new approaches to improve suggestions by combining visual similarity and geographic nearness. We still can’t share anything concrete, but we are getting closer.

Second, we’re still working to compress these newer models for on-device use. The in-camera suggestions in Seek continue to use the older model from March 2020.

We couldn't do it without you

Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.

In addition to adding observations and identifications, here are other ways you can help:

  • Share your machine learning knowledge: iNaturalist’s computer vision features wouldn’t be possible without learning from many colleagues in the machine learning community. If you have machine learning expertise, these are two great ways to help:
      • Participate in the annual iNaturalist challenges: Our collaborators Grant Van Horn and Oisin Mac Aodha continue to run machine learning challenges with iNaturalist data as part of the annual Computer Vision and Pattern Recognition conference. By participating you can help us all learn new techniques for improving these models.
      • Start building your own model with the iNaturalist data now: If you can’t wait for the next CVPR conference, thanks to the Amazon Open Data Program you can start downloading iNaturalist data to train your own models now. Please share with us what you’ve learned by contributing to iNaturalist on GitHub.
  • Donate to iNaturalist: For the rest of us, you can help by donating! Your donations help offset the substantial staff and infrastructure costs associated with training, evaluating, and deploying model updates. Thank you for your support!

Posted on August 19, 2022 12:43 AM by alexshepard

Comments

Excellent! Thank you for making these continuous improvements to the process, and thank you for sharing this concise update. Well done!

Posted by tsn over 1 year ago

Wow! Thanks for sharing & thank you staff!

Posted by gatorhawk over 1 year ago

Awesome! Thanks for the explanation of how this works and even more thanks for making it work in the first place :)

Posted by earnoodles over 1 year ago

Very nice work! I'm excited to see more updates. I love this use of this technology and love to see how it's evolving.

Posted by roachiecanada over 1 year ago

Impressive, very nice. Glad to see continued updates to the infrastructure.

Posted by kemper over 1 year ago

In which countries or continents will this new model perform better than the old model? Can you say whether the 5,000 new species are mainly from Asia, or whether the improvement is mainly for insects, worms, or centipedes?

Posted by optilete over 1 year ago

Congratulations!

Posted by prokhozhyj over 1 year ago

I want to highlight this discussion in the iNat-Forum: https://forum.inaturalist.org/t/possible-increase-in-cv-errors-around-organism-range-location/34411

For my part, I noticed (without being aware that a new model had been released) less accurate suggestions lately; in particular, the 'seen nearby' species have disappeared. Maybe the new model puts regions with fewer observations at a disadvantage and especially favors North American species?

Posted by carnifex over 1 year ago

Good!

A question: in the future, would it be possible to use the shared computing power of users (those who voluntarily make themselves available) to train the entire model on the dataset, as is done, I believe, in astronomy to manage large amounts of data?

As reference: https://boinc.berkeley.edu/

Posted by valentino_traversa over 1 year ago

Nice and important improvement! Congratulations!

Posted by valeriosbordoni over 1 year ago

Please, can we have a blog post (or just a link to explore) for the 5K new species?

How many for Africa? How many plants?
(With those gifted graphics you have brought us before?)

Posted by dianastuder over 1 year ago

Excellent!

Posted by wildlife13 over 1 year ago

That's great news! Would also like to know a bit more about those new additions, if this is possible.

Posted by ajott over 1 year ago

This is fast!

Posted by sedgequeen over 1 year ago

This is great! iNaturalist is truly the perfect example of how technology and nature can come together to create something amazing! I would love to see a list of what species have been given the honor of being added! I hope they were a lot of hexapods..! But I don't wish to pressure anyone :)

Posted by timo27 over 1 year ago

I know at least one new CV-species 😄

Posted by carnifex over 1 year ago

Thank you for the update!

Posted by silaseckhardt over 1 year ago

Have the "Included" labels on species "About" pages been updated?

Posted by dan_johnson over 1 year ago

Thanks so much! Been waiting for these small updates that help a ton!

Posted by yayemaster over 1 year ago

https://www.inaturalist.org/taxa/260419-Panopeus-herbstii meets the 100 observation mark but isn’t included.

Posted by yayemaster over 1 year ago

HELP!! I have been searching for "how-tos" on this site, and while the drop-down menu says "video tutorials" and other invitations, when you open it up it says "this page does not exist". I am so grateful for this tool, but after trying to use it for a few years, I am still not proficient and would sincerely appreciate being able to learn from somebody who is. THANKS again for this amazing tool and all your work.

Posted by kimnoreen over 1 year ago

Could we please get the new species list? It might help people see any misidentifications that they may have had.

Posted by yayemaster over 1 year ago

Yes the new species list would be fascinating. :)

Posted by wildlife13 over 1 year ago

https://www.inaturalist.org/taxa/260419-Panopeus-herbstii meets the 100 observation mark but isn’t included.

In April, when the export for this model was created and training was started, there were only 96 verifiable observations and only 28 research-grade observations of this taxon, so it's possible that it fell below one of the taxon cutoffs.

Posted by alexshepard over 1 year ago

For my part, I noticed (without being aware that a new model had been released) less accurate suggestions lately; in particular, the 'seen nearby' species have disappeared. Maybe the new model puts regions with fewer observations at a disadvantage and especially favors North American species?

@carnifex - the new model was released just a few hours before this blog post was published, so none of the observations mentioned in that forum post would have been affected by the new model.

Posted by alexshepard over 1 year ago

A question: in the future, would it be possible to use the shared computing power of users (those who voluntarily make themselves available) to train the entire model on the dataset, as is done, I believe, in astronomy to manage large amounts of data?

@valentino_traversa - unfortunately, computer vision training is not (to my knowledge) modular and granular the way that many scientific computing jobs are.

Posted by alexshepard over 1 year ago

Have the "Included" labels on species "About" pages been updated?

@dan_johnson - yep they are automatically updated when the model goes live.

Posted by alexshepard over 1 year ago

Whoop whoop!

Posted by muir over 1 year ago

Awesome stuff y'all! Thanks for sharing!

Posted by tristonli over 1 year ago

Great stuff, folks!

Posted by radrat over 1 year ago

Great stuff, and seems like a smart strategy!

Posted by deboas over 1 year ago

Thanks once again for everyone's hard work!

Posted by susanhewitt over 1 year ago

Magnificent! Wonderful! Amazing! :) Always love to hear about these updates.

Posted by sambiology over 1 year ago

Nice! Is there a way to see all the new species included in this model?

Posted by torgos216 over 1 year ago

I'm happy to see the new ants added, hopefully even more will come with the next round as IDs continue going through.

Posted by arman_ over 1 year ago

Geographical inclusion is necessary. Many times the computer suggestions are faulty because the species is not found in that region. I think that should be the way to go.
Anyway, great progress so far. Congratulations.

Posted by satishnikam over 1 year ago

Thanks, and the plain-language explanation of transfer learning is appreciated!

Posted by janetwright over 1 year ago

@alexshepard @valentino_traversa
I would be happy to contribute some processing power to distributed deep learning as well.

It seems like someone has worked on that topic. Here is a paper from 2021 I found: https://arxiv.org/pdf/2103.08894.pdf

Posted by hedaja over 1 year ago

Are males and females of dimorphic taxa learned separately?

Posted by trichopria over 1 year ago

@trichopria, no they are not, and neither are egg, larva, pupa, and adult of insects that have complete metamorphosis.

Posted by susanhewitt over 1 year ago

Nor are flowers, seeds, leaves, trunks, tubers, etc. - but it is an interesting idea for the future!

Posted by deboas over 1 year ago

That's good to hear. The computer learning is one of the factors that keeps me motivated to put in so many hours photographing and editing photos to provide the best photos I possibly can for the learning models. I am always curious if anyone else puts in as many hours as I do every day to get the best possible taxon photos.

Posted by royaltyler over 1 year ago

Although I suspect most of us who post thousands of photos do sometimes crop our pictures to make identification easier, sounds like you, @royaltyler, do a much better and more consistent job of making the photos as good as possible. That's great!

Posted by sedgequeen over 1 year ago

Super good news! Transfer learning FTW!

Posted by dgilperez over 1 year ago
