A larger experiment to learn about the accuracy of iNaturalist observations

We launched our 3rd Observation Accuracy Experiment (v0.3) today. Thank you to everyone helping us conduct these experiments as we continue to learn how to improve accuracy on iNaturalist.

Changes in this Experiment

The only change is that this experiment uses a sample of 10,000 observations, whereas the previous two experiments (v0.1 and v0.2) each used samples of 1,000 observations. We hope this larger sample size will give us more insight into more nuanced subsets of iNaturalist observations than the first two experiments did.

The page for this experiment is already live and the stats will update once a day until the validator deadline at the end of April 1st. You won’t be able to drill into the bars to see the sample observations or the validators until the deadline has passed.

Thank you!

Thank you to everyone contacted as a candidate validator for participating in this experiment. This is a more ambitious experiment than the first two, and we very much appreciate your participation with this larger sample size - especially those of you asked to review up to 100 observations! As always, please share any feedback or thoughts you may have on the topic of data quality on iNaturalist.

Results (added 04/02/2024)

The results for our third observation accuracy experiment and our first one with a sample size of 10,000 observations are in. We’re so grateful to everyone who participated!

Results

The results of this experiment were very similar to the first two experiments. The average Research Grade accuracy (fraction correct) was 95%. You can explore the results, including clicking through the bar charts to observations, here.

Other issues

Logistics
With over 2,750 participating validators, coordinating this was no small task. We’re mainly hearing two issues about the experiment logistics:

1. Concerns about mismatching validators and observations

We can’t perfectly predict which observations people are comfortable IDing, but we can probably improve upon the current methods by further constraining the matching of observations to candidate validators by location.

We want to emphasize that adding a coarse, ancestor ID to an observation you are not able to comfortably ID does not interfere with the experiment design. But we realize that it is not a great experience for validators to receive a batch of observations they aren’t familiar with, and that it may result in less participation in these experiments. Next time, we’ll try some new techniques to further constrain things by location.

2. Issues with communicating via messages

In v0.1 we contacted people via a no-reply email but realized that many people don’t receive or open emails from iNaturalist. So for v0.2 and v0.3 we used the messaging infrastructure to send messages from an Admin user account. Two issues with this approach are:

  • Many people are responding to these messages since it’s not clear that they are no-reply messages. We don’t have the capacity to handle the hundreds of replies, so our apologies if those messages go unanswered.
  • Many people read these messages in the Android app, where the URL doesn’t work, which is causing confusion.

In the next Experiment, we’ll explore other ways to address these issues with contacting and communicating with candidate validators.

Design
We’re mainly hearing two concerns about the design of the experiment:

3. Distrust of the validators

There’s some continued distrust of validators’ ability to avoid incorrectly validating an observation. For example, there is concern that a validator would knowingly add an agreeing species-level ID to an observation that they aren’t able to independently identify, rather than follow the instructions and add a coarser family- or order-level ID.

Some of this distrust probably stems from discomfort with our methods of selecting semi-anonymous validators based on their reputation earned on the platform (number of improving identifications, etc.) as opposed to externally credentialed validators selected based on their reputation earned elsewhere. We understand these concerns and we’d love to learn more about ways to vet, coordinate, and incentivize validators at scale that improve upon our methods here and increase the level of trust.

4. Interest in measurements other than average accuracy

Our main question is “how accurate are iNaturalist observations?”. To answer this, our method is to draw a random sample from iNaturalist and estimate the average accuracy.

Several people have expressed more interest in knowing the accuracy of different subsets of the iNaturalist dataset such as their taxa or place of interest. We’re also interested in this and may adjust the sample structure in future experiments to more efficiently focus on different subsets. But we're excited that with the larger random sample size of this experiment (10,000 vs 1,000) we can start getting reasonably certain estimates for some less common subsets.

For example, here's a graph of the subsets by continent, taxon group, and rarity (<100 observations is rare), sorted by the uncertainty (95% confidence intervals) in accuracy (fraction correct) from Research Grade results from v0.3. We're more certain for the groups you'd expect (common North American and European plants, insects, and birds). Those groups also have high accuracy, in the 90% range. But there are groups like common European Fungi at 0.81 (0.68-0.91) where we are nearly certain that the RG fraction correct is <90%. For other groups, we'll need much larger sample sizes or targeted samples to reduce the uncertainty.
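
For readers who want to reproduce this kind of interval, here is a minimal sketch of how an estimate like 0.81 (0.68-0.91) could be computed. The post doesn't state which interval method was used, so this sketch uses a standard Wilson score interval, and the counts below are hypothetical, chosen only to land near the European Fungi example.

    import math

    def wilson_interval(correct, total, z=1.96):
        # 95% Wilson score interval for a fraction correct (accuracy).
        p = correct / total
        denom = 1 + z**2 / total
        center = (p + z**2 / (2 * total)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
        return p, max(0.0, center - half), min(1.0, center + half)

    # Hypothetical subset: 38 of 47 sampled Research Grade observations validated as correct.
    p, lo, hi = wilson_interval(38, 47)
    print(f"accuracy {p:.2f}, 95% CI ({lo:.2f}-{hi:.2f})")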

Thanks again for participating in this experiment. It's a huge opportunity to be able to conduct experiments like this at this scale that wouldn't be possible without such a skilled, engaged, and generous community of identifiers.

Posted on March 26, 2024 01:56 AM by loarie

Comments

I was glad to receive the message! The taxa observations were well within my field of knowledge and I’ve already added an identification to all of them.

Posted by lj_lamera about 1 month ago

Happy I could help! Most of the ones I got weren't what I normally identify though. I think my pool was based more off of IDs I've left on my own observations rather than taxa I actually know in depth.

Posted by tcriley about 1 month ago

@tcriley I noticed you on this observation but it looks like you haven’t identified any other observations of Unionidae (from other people) in the past.

Posted by lj_lamera about 1 month ago

@lj_lamera, this is the query the experiment is using to find candidate validators https://www.inaturalist.org/identifications?user_id=tcriley&taxon_id=51903&category=improving

Posted by loarie about 1 month ago

Thanks for the invite! I'm happy to be part of this experiment!

Posted by trscavo about 1 month ago

Feliz por participar!

Posted by alanhentz about 1 month ago

Thanks for letting me help! 6 of my observations weren't what I usually do so I think I'm in the same boat as @tcriley

Posted by leytonjfreid about 1 month ago

Turns out your IDs on your own observations do count as "Improving" if the community agreed.

Posted by dhasdf about 1 month ago

I got 5 human observations in my set, perhaps those should be left out of future experiments?

Posted by danieldas about 1 month ago

Happy to help out again and got a nice mix of easy and more challenging stuff, mostly close to my usual area with a few outliers. Some I won't be able to confidently put a species ID on, and proud to be a maverick on at least one. LOL Curious to see where that one is going.

Posted by annkatrinrose about 1 month ago

Okay, done mine, though one I had originally made an ID so didn't know if I needed to make another one and one record was my own where I had made the original ID so skipped that one too.

Also, one of my tasks was a bracket fungus with only the accepted auto ID of an inexperienced user of a species. With the information available this couldn't really be verified past Order and I'm not sure if I should have taken it back as they probably just selected the first suggestion.

Posted by reiner about 1 month ago

Thanks for the invite, happy to help. One of mine completely out of my experience so left blank.

Posted by kw841432 about 1 month ago

Thanks for the invite !
I guess it's just luck, but among the observations I had was a new salamander species record for Mexico, if my ID is correct. A quick search of the literature/museum databases yielded no record of Bolitoglossa dofleini in this country, yet I'm pretty sure this observation is of this species: https://www.inaturalist.org/observations/130028481

I'm waiting for other competent identifiers to confirm, but I just thought it was pretty amazing.
If this is the case, I'm sorry to lower the 'accuracy score' (as I'm disagreeing with previous IDs) but it's still a nice consequence of this experiment :)

Posted by benjaminmb about 1 month ago

One broken image ... processing
https://www.inaturalist.org/observations/61678661

Cnidaria? I can jellyfish but.

Very curious to see the % response after the first 24 hours!

Posted by dianastuder about 1 month ago

This time the set is all very well within my expertise.. well done 👍

Posted by ajott about 1 month ago

Always happy to help with these experiments!

Posted by lynnharper about 1 month ago

my first time helping out. 7/17 were observations where I could at least give some vague help, including 4 at genus or better. so, it seems that the selection process is working OK. I do wish it would favor giving me observations for taxa that I normally ID at a fine level. looking forward to results.

Posted by astra_the_dragon about 1 month ago

Happy to be invited for the first time to participate in this experiment!

@reiner if you're happy with your previous IDs, you can leave them as they are, but if you have gained more knowledge since then and can improve them, I understand that you're encouraged to do so

Posted by deboas about 1 month ago

Encouraging to see my notifications roll in, as some of the IDs move - after years in limbo.

Posted by dianastuder about 1 month ago

My set was good for me, happy to see some rusts, mildews etc in the expanded set!

Posted by andydonegan about 1 month ago

Thanks for the invite!

Posted by anastasiiamerkulova about 1 month ago

I don't know the best way to respond to the original 'invite' to participate. I emailed back, but this is what I wrote:
to the iNat experiment designer team-
I would very much have liked to contribute to the experiment - 3rd edition. BUT, you have given me observations in geographic areas that are beyond the area in which I am comfortable IDing species. I do not know the variation in appearance of species that I know in my home range, and I do not know the local look alikes elsewhere - so I don't normally ID beyond my home range. I think asking for IDs in areas not normally investigated (if that is the right word) changes the likely outcome AND makes the experiment unlike the standard operating procedure for me in IDing (and presumably me as a representative of IDers in general), so makes the results of the experiment not equivalent to standard results.
So - I can scan the list of locations and ID only ones in my normal ID geographic area (more or less New England, the more or less is very occasionally for some species in the northeast US and Canada).
Or - I can just not participate
Or - you can ask me to try to ID outside of my home range knowing that I think that negates part of the purpose of the experiment .
I really will wait to hear from someone involved before I move forwards with any experiment IDs.

Posted by patswain about 1 month ago

This was interesting and fun, thank you! I like the idea of reaching out to prompt known identifiers/observers to check out specific observations. I was comfortable with most of the 17 observations I got, aside from two borderline cases where I was certain of the genus but uncertain of the species, because the observations were in places with different congenerics than are present in my region of familiarity (in one case, it was the Atlantic ocean, while I observe and identify on the Pacific; three species of the genus were present in that area, while only one is commonly present in my area).

In both cases I agreed with the genus and explained why I wasn't completely sure of the species, and I noted that I was contributing because of the experiment. The experiment does say to identify even if only at higher-level taxa, so I think that was the correct thing to do, but I worry about people feeling that adding genus-level ID to something others have already put a species to is a disagreement or derailment.

Posted by guerrichache about 1 month ago

Is something off in the image https://static.inaturalist.org/wiki_page_attachments/3785-original.png or am I just misinterpreting the correspondence between numbers and colors?

Posted by sqfp about 1 month ago

I think you're right sqfp, the image has 4 red/incorrect and 7 grey/uncertain dots in the cluster, but the equations calculate 7 red dots and 4 grey dots.

Posted by guerrichache about 1 month ago

Happy I could help! And I’ve already added an identification to all of them.

Posted by richardfernandes about 1 month ago

@sqfp and @guerrichache - thanks, a fix for the figure is in the pipe
@patswain - thanks for participating and happy to answer any questions. Please just add the finest identification you feel comfortable adding and are confident in, even if it's coarse (e.g. order, family, or genus). Coarse IDs won't negate the experiment, only incorrect IDs (e.g. it's in family A but you say family B).

Posted by loarie about 1 month ago

@guerrichache for that reason I leave a copypasta comment

Not disagreeing with you
Observation Accuracy Experiment 3

Posted by dianastuder about 1 month ago

Give me more native species 😭. We have almost 100 native North American species on iNat in my genus, but all but 2 of my assigned observations are invasive or garden cultivars, and the other 2 are the same species that can only really be confused with things not in the genus. I know observations like that statistically make up about half of observations of the genus, and maybe it's just a small-number statistical fluctuation that none of my assigned observations are from the fun half of the observations.

I wonder, at what level do you calculate validator eligibility for a taxon? For example, if I am eligible to contribute for genus-level observations in North America, am I eligible for all observations in that genus in North America, or do I lose eligibility if an observation is already species-level and I haven't refined to that species enough times? I think the distinction matters for statistical robustness of any conclusions; if a particular observation happens to be mis-ID'd, then the logical pool of validators would be the genus-level pool. So if IDers do lose eligibility if they haven't done a particular species, then it would create a statistical bias where observations mis-ID'd to that species have a smaller and different pool of people eligible to correct them than they would if the observation had originally been ID'd only to genus. I imagine said potential statistical bias could be a particular problem if a subset of IDers for a particular genus is over-confidently ID'ing things to species that other IDers believe should be left at genus, and then they are the only ones eligible to be assigned to correct said over-confident IDs. Is this accounted for in the sampling design?

Posted by wildskyflower about 1 month ago

Would it be possible to exclude taxa that users have only observed (rather than ID'd on other people's observations), and/or CV IDs? I only got about 3-4 of these so it wasn't a big issue, but for example I was given an observation of Diplotaxis to ID; I have 0 IDs of that genus that are not my own (3) observations, and on my own observations those were CV suggestions that I'm unable to really verify. Excluding CV IDs might be a bit of a problem for people who actually select them when they are correct to avoid typing in a name rather than as a guess, but I don't know how common that is. For the former, I would think that if someone is specialized in a taxon that they observe a lot, they'll also have a lot of IDs on other people's observations.

Posted by arman_ about 1 month ago

Lots of us use CV as a muscle memory shortcut. Not so different to using a paper field guide. And a good way to avoid the tiny typo homonyms which lurk. Grevillia or Grevillea - one's a plant the other is Kingdom Disagreement.
You have no way to evaluate if someone 'just accepted the CV suggestion' - unless they leave a comment which says that.

Posted by dianastuder about 1 month ago

Yeah, that's why I think the former is probably better, but maybe there's a potential issue in that too.

Posted by arman_ about 1 month ago

I just completed my set of IDs. This was much more challenging than the previous experiment, particularly in making me face my entirely amateur limitations in offering identifications. For instance, I'll confidently ID some plants if the flowers are visible. Without those, I wouldn't even try. (I'm grateful for that copypasta that I left in comments, to explain why I was even entering an ID in these cases.)

Posted by larry216 about 1 month ago

comment 1 of 2

I commented on the previous posts about experiments 1 and 2, and want to update and clarify my overall view on these continuing experiments. I think the experiments are a good idea and that it will be ideal and necessary to continue to conduct additional ones, and to conduct ones where the many involved variables are modified in different ways. The only things I partly have a different view on are the following. It would be helpful to ask validators for feedback on experimental design before designing/continuing to conduct these experiments. Participants have given good feedback on the experiment journal post threads, although it can be difficult to bring up issues there. It would be relevant to know if there are plans to publish the results of these experiments or future ones in any external sources, since in that event we'd want to be cautious that the results are presented accurately and that the experimental limitations, atypical circumstances, etc. are noted.

So, it would be best for staff to keep a record of all the limitations etc. people have mentioned in all the threads. For example, I just thought of another that I'm unsure has been mentioned: these are obs. that are atypically receiving multiple identifier verifications even after already being RG - I estimate at least 3-5 identifier IDs. Also, because validators/all users are primed to know these specific obs. are in an experiment, that incentivizes them to expend more time or effort, and/or to collaborate more or consider the ID input of users perceived as having the highest ID expertise. Furthermore, large groups of identifiers have already previously coordinated and worked individually to nearly "complete" reviewing all pre-2024 obs. for several North American bee (e.g. Bombus, Apis) and aculeate wasp (e.g. all of Vespidae and some Crabronidae and Sphecidae) groups (see here), which overall increased the number of prior validations (including for obs. that are still Needs ID) and increased the overall uncommon practice of checking already-RG obs.

And, at least some of these obs. (e.g. ones that are still Needs ID) are included in the experiments. Those obs. helpfully boost the accuracy results, although it could be mentioned as a contextual note or limitation that the circumstances around how they've been identified prior to these experiments have been atypical, largely identifier-driven, but also in some cases further incentivized by, e.g., US state-wide pollinator surveys, research, or publications.

Posted by bdagley about 1 month ago

We had the topic recently.. I use the CV all the time, especially when IDing on the phone because I hate typing on the phone.. saves me a lot of time if the correct CV suggestion pops up 😉

Posted by ajott about 1 month ago

comment 2 of 2 from my previous

So, overall, the obs. pool being used in the experiments and their circumstances are somewhat atypical (also for additional reasons), and it's also clear that on average, any obs. receiving atypically larger numbers of validator/user IDs will have higher accuracy, especially for already-RG obs., which often otherwise only have 2 precise IDs. This could be tested and proven by comparing the results to the experiment accuracy results of RG obs. that only have 2 precise IDs (that made the obs. RG) randomly selected without taking into account whether validators reviewed them, or randomly selected after excluding all obs. used in past and present experiments. As a side note, the average accuracy of all existing RG obs. (before or w/o doing experiments) is also a very relevant stat to calculate in its own right.

As I and certain others have commented or implied, the accuracy results reported for experiments 1 and 2 would seem to be at least somewhat overestimated (it's hard to estimate the extent) for reasons we gave (another reason is that easy and overabundant species like honeybees may "inflate" the accuracy; also note that earlier iNat accuracy experiments found lower accuracy, although methods differed somewhat), although they are still a meaningful experiment and stat if caveated and described in an accurate way. Also, the previous accuracy results sounded to me like they were being presented as the accuracy of the average RG obs., despite that stat being expected to be lower than the former, as I explained. I might instead describe the current experiments as something like the average accuracy of [atypically] extra-validated RG obs. Lastly, just to be clear, I agree with most everyone that the RG obs. accuracy, especially of extra-validated obs., is somewhat high, and that that's ideal and we're seeking to further improve it, and that it's ideal when possible for obs. to receive extra validations, i.e., for identifiers to check already-RG obs. All I'm "cautioning" is to take more consideration of the feedback validators have given when deciding how to describe or publish the results of the experiments, and ideally when considering what future experimental designs will be.

E.g., I've suggested that multiple different kinds of experiments would not only be ideal but also necessary in order to more fully understand and represent what these current experiments are finding, e.g. to assess whether or to what extent the reported accuracy findings have been overestimated if reported verbatim without caveats. And I'm continuing to participate in all of the experiments as a validator, even though the organizers haven't indicated whether they're taking the feedback from myself and some of the other commenters into account. One final interesting thought is that overall, these experiments themselves must be increasing both the average all-obs. RG accuracy and the extra-validated obs. RG accuracy on the website.

Posted by bdagley about 1 month ago

I think a simple way to increase the diversity of the samples in a future experiment might be to do weighted random draws instead of uniform random draws. Specifically, I would suggest giving each observation a weight of 1/sqrt(number of observations attached to the same direct parent taxon). That still gives you more mallards than any other specific species, but the margin of preference for mallards would be greatly reduced; maybe you get 5 mallards out of 10,000 observations instead of 25. Advantages of this approach:

1.) You get a greater diversity, which will converge the top-line accuracy faster if the uncertainty in the top-line accuracy is mainly due to the accuracy profile of less observose species being characteristically different from the accuracy profile of more observose species
2.) An improved ability to test the hypothesis that less observose species have characteristically different accuracy profiles than more observose species
3.) The ability to report more detailed breakdowns of accuracy as a function of observation number and finer taxa.
4.) More balanced workload for validators (less for birds, more for niche specialists).
5.) Potentially more fun for validators because of increased diversity of observations

You can always get the top-line accuracy number by using a weighted average that inverts the original weights, i.e. multiplying the numbers of correct and incorrect observations for each species by sqrt(number of observations attached to direct parent taxon). You can also easily combine the results of the current test and the future test using (total correct between both tests)/(total observations between both tests) to get an accuracy estimate that will be strictly improved for all species, not just the less observose ones.

It would also be relatively straightforward to adapt the weighted-draws procedure to more optimally and iteratively refine the accuracy estimates at different taxon levels over the course of a series of several experiments. Regardless, the first adaptation step would be basically the same as taking the square root of the observation number anyway, and the square root version of the procedure is easier to implement and explain initially.

Posted by wildskyflower about 1 month ago
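
As a point of reference, here is a minimal sketch of the weighted-draw idea described in the comment above. The pool, observation counts, and taxon names are hypothetical, the draw is done with replacement for simplicity, and this is not how the experiment actually samples; the point is only that the 1/sqrt(parent count) draw weights can be inverted afterwards to recover an unbiased top-line accuracy.

    import math
    import random

    # Hypothetical pool: (observation id, direct parent taxon, observations under that parent).
    pool = [
        ("obs1", "Anas", 500_000),
        ("obs2", "Anas", 500_000),
        ("obs3", "Eriogonum", 40_000),
        ("obs4", "Bolitoglossa", 2_000),
    ]

    # Draw with weight 1 / sqrt(observations under the same direct parent taxon).
    weights = [1 / math.sqrt(n) for _, _, n in pool]
    sample = random.choices(pool, weights=weights, k=2)

    # Recover the top-line accuracy by re-weighting each validated observation
    # by sqrt(parent count), i.e. inverting the draw weights.
    def topline_accuracy(validated):
        # validated: list of (correct: bool, parent_count: int) pairs.
        num = sum(math.sqrt(n) for correct, n in validated if correct)
        den = sum(math.sqrt(n) for _, n in validated)
        return num / den if den else float("nan")

    print(topline_accuracy([(True, 500_000), (False, 2_000)]))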

An additional relevant and related version of the experiments would be to specifically exclude obs. of overabundant species.

Posted by bdagley about 1 month ago

Would it be possible in future experiments to stratify IDs for identifiers by regions they predominantly identify in, rather than by taxa?

I just wanted to comment based on an observation I was asked to identify outside of my normal region of California. I left an ID at Tribe level (Eriogoneae), which was as far as I could go given the above limitation, and I had assumed a different genus entirely (Chorizanthe vs Nemacaulis) before looking at the comments and previous IDs, which were for a genus whose existence I was completely unaware of!

I left a comment that CA (and I imagine many other regions) can be very specific regionally, so I hope the experimenters take this into further account in the next iteration of the experiment, since identifier knowledge, including mine, is generally not encyclopedic (at least for most identifiers!) as to all members of a genus or family in a particular state or region.

I'm sure others have found this to be the case as well.

To repeat the request: would it be possible in future experiments to stratify IDs for identifiers by regions they predominantly identify in, rather than by taxa? Alternatively, ID assignments could be stratified by taxa we've actually identified rather than broad groups like having identified, for example, Quercus before, but not taking into account that there's at least 8 Sections of Quercus, 500+ species, and 180 hybrids, at least according to Wikipedia (https://en.wikipedia.org/wiki/List_of_Quercus_species), of which I'm personally comfortable identifying probably 3 Sections, and maybe 20-30 species (and hybrids) restricted to Central CA and some parts of Southern CA (not really including San Diego area yet).

Which boils down to the fact that being asked to ID an oak in Virginia or Ohio isn't going to yield significant results, since there are parts of my home state I'm not yet comfortable identifying.

All that being said, most requested identifications were improved I think, except for Quercus, which was more challenging and not as well tailored to my identifying area of expertise.

Posted by yerbasanta about 1 month ago

@bdagley I don't understand your concern about the observations in the experiment getting additional IDs. The point is to verify whether the ID they had before the experiment (when they did not have excess IDs) is correct. It's a good point that once an observation has been used in one experiment, it should ideally be excluded from future experiments - I don't know if this is being done. It could be done by defining narrow date ranges for each experiment, perhaps.

Posted by deboas about 1 month ago

For one of the observations that was assigned to me, the iNat taxonomy doesn't appear to agree with the accepted standard taxonomy for American butterflies, the Butterflies of America website. The ID appears to be correct using iNat's taxonomy. So I'm at a loss for how to ID it. Should I go ahead and agree with the ID (which is correct according to an apparently outdated taxonomy)? Or should I disagree by using a taxon that doesn't exist in iNat (seems like a bad option because the ID is not technically incorrect, it's the taxonomy that appears to be incorrect (same critter, different name))? Or should I just not add an ID for this observation as part of the experiment?

Posted by euproserpinus about 1 month ago

That's an interesting problem. I would probably agree on the observation level but flag the taxon for its outdated taxonomy.

Posted by annkatrinrose about 1 month ago

@annkatrinrose -- Thanks for the suggestion. Will do!

Posted by euproserpinus about 1 month ago

Received and finished with it

Posted by ck2az about 1 month ago

Many identifiers have commented on the experiments requesting to be shown observations in the region they primarily ID in. In my commentary/notes on the experiments, I've recommended conducting multiple different variations of the experiments, varying each factor, not only obs. count, so the geographic requests people have made could also fit into that. Although a global focus is also relevant, and didn't bother me. For example, if an identifier is familiar with a family/taxonomic group mostly in one region, they may still be able to contribute toward photos elsewhere, even if not always to species level, including correcting some genus or broader misidentifications. And other identifiers may help bring it to species level. In general, I recommend all identifiers aim to ID globally to some extent; doing so has many benefits, and from my experience I can say it's a learnable task for all dedicated identifiers. It also helps many observers globally, including in locations that have fewer experienced identifiers.

Posted by bdagley about 1 month ago

@bdagley Eliminating overabundant species from a test like this is not optimal, because it introduces a non-random bias so quantities like the top-line accuracy number cannot be recovered in an unbiased way. It also makes hypothesis testing harder (for example whether accuracy profile is a function of number of observations), and cross-checks of the accuracy rates observed across different versions of the test are not possible. It is better to modify the relative weights of species by a smooth monotonic invertible function of observation count (like the square root I suggested) instead, because that transformation can be statistically inverted when reporting results.

Also this experiment is a little different from an ID blitz in that for the test it is most ideal if people mainly answer in areas they are relatively expert in. It really becomes a challenge to figure that out though especially for genus-level observations where someone who mainly IDs in California might have literally no idea about an observation in Alabama.

Posted by wildskyflower about 1 month ago

It also depends on the species, of course. We have some monotypic genera of plants locally that I would be confident identifying anywhere and have in fact looked at these world-wide to check for out-of-range IDs. There are others that I would feel confident ID'ing to species only within my county and the surrounding mountain areas at similar elevation, but would stick to genus due to additional species I'm unfamiliar with in the coastal areas of the same state.

Posted by annkatrinrose 30 days ago

@bdagley -- It seems to me that what is being tested is the overall accuracy of identifications on iNaturalist, the accuracy of ID's before the experiment, you might say. During the experiment, we validators do evaluate the selected observations carefully, maybe check references or consult experts if necessary. At the end of the experiment, these selected observations should be more accurate than they were to start. That is fine! The ID's being tested are the ones that existed before the experiment. (ID's before the experiment are typical of iNaturalist. After the experiment, the ID's are not. That can be dealt with by excluding tested observations from future tests.)

It's true that there is a lot of variation in ID accuracy in iNaturalist. Accuracy is probably close to 100% for RG observations of North American birds and quite low for Needs ID grasses. That is important. Every researcher using iNat data should consider it. However, this experiment is a very first step in evaluating iNaturalist accuracy. Like most experiments, it is limited. It's telling us interesting and useful information but there are VERY many other questions we would like answers to. We can't get everything at once. Unfortunately. I'm impatient.

Posted by sedgequeen 30 days ago

Would these observations even need to be excluded from future experiments? They are actual iNat observations. If a few of them are randomly selected again, and they very slightly raise the average accuracy of the future experiment's sample, that just shows that the past experiments very slightly raised the average accuracy of iNat observations.

Posted by dhasdf 30 days ago

@dhasdf iNat has well over 150,000,000 observations eligible for this experiment. A few thousand observations getting half a dozen fresh eyes is negligible, if its signal could even be distinguished from the constant flux of IDs and new observations that are made every day.

Posted by astra_the_dragon 30 days ago

@dhasdf @astra_the_dragon fair point, as long as selection is truly random and there is no reason to expect a bias towards repeatedly selecting certain observations

Posted by deboas 30 days ago

Maybe the algorithm for matching IDers and obs should be improved for the next round. I got 34 observations, and I'm glad to work through them. However, some do not perfectly match my field of expertise. Though I educate myself to make the best of it, this might alter the result, because I might be unsure not because an ID is unsure, but because I don't have much experience with the species and/or region in question.
Here are some characteristics of my examples:
I have two Pinopsida cultivated in a garden, both in a region where they would not appear in the wild, so in the end those could have been from almost any temperate region worldwide. Outside the experiment, I would have ignored those.
Two examples did not include geolocation data. I think those should have been excluded because of this.
A major challenge is the region in question. I do by far most of my IDing in Germany. For the experiment I got observations from Belarus to Spain. It is an extra challenge to exclude species similar to what I am reviewing that I might not be aware of because they are unknown in my region. While most of my identifications take seconds, some of those require additional hours of research to educate myself.
I've noticed comments in some saying "I am not an expert for this, but I do it for the experiment". With that, the experiment tests more the knowledge of the selected IDers and less so the quality of the original ID.
Just to emphasize: I love doing my 34 and I am done with 32 of them. I'll work hard on the remaining two ;-)

Posted by misumeta 29 days ago

I've had a perfect match this time - got one genus-level ID to validate that was not only right within the geographic area I do most of my identifications for, but also a species that I'm the top identifier for. Bullseye! Was happy I could easily narrow that down to species. :-) I notice though that it is still sitting at Needs ID. None of the validators following me are names I recognize and this appears to be a species occurring outside of their respective ranges. I've had other observations in my set where it was me who didn't feel comfortable going further than genus because it was outside of my usual range, even though others were quick to confirm species. I imagine it is very difficult for the algorithm to guess perfect matches but hopefully that is addressed at least somewhat by having multiple validators per observation. Also, I think the point of the experiment is to measure accuracy rather than improvements to precision, so likely the one obs where I disagreed is more useful data than the one where I was able to refine the initial ID.

Posted by annkatrinrose 29 days ago

@deboas Believe it or not, due to the Birthday Problem/Paradox, even though there are only 10,000 drawn observations out of almost 200,000,000, if you draw another 10,000 randomly there is roughly a 50% chance you will get at least one observation from the first set again in the second set. Sure, that would only contribute a ~0.01% source of error, but it would make sense to just avoid it.

Posted by wildskyflower 29 days ago
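
For anyone who wants to check the overlap estimate above, here is a minimal sketch of the calculation. The eligible pool size is an assumption (the comments mention figures between roughly 150 and 200 million), and the simple independence approximation used here is very close to the exact without-replacement answer.

    def p_overlap(pool_size, first_draw, second_draw):
        # Probability that a second random sample shares at least one
        # observation with an earlier sample from the same pool
        # (independence approximation).
        p_miss_all = (1 - first_draw / pool_size) ** second_draw
        return 1 - p_miss_all

    # With a ~150M eligible pool the overlap chance is roughly 50%;
    # with ~200M it drops to roughly 40%.
    print(p_overlap(150_000_000, 10_000, 10_000))
    print(p_overlap(200_000_000, 10_000, 10_000))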

This obs lurked at family for 3 years! I can daisy, and I did for the experiment. But I also @mentioned trusted identifiers to move the ID forward.

Now RG thanks to taxon specialists and with a link to the relevant field marks in a comment. 100% accurate and precise.
https://www.inaturalist.org/observations/76441836

Posted by dianastuder 29 days ago

I love this! I often leave my observations at order, because the suggestions are not reliable for PNG. I'm glad that you are working towards improving this feature. Also, a HUGE thank-you to everyone who takes the time to identify my observations, PNG is full of fun and interesting life!

Posted by barefootwanderer 29 days ago

Happy to have been invited for this test, looking forward to the results!

Posted by kaylynpearce 29 days ago

@wildskyflower interesting, thanks!

Posted by deboas 29 days ago

Thanks for the invite. Happy to have been part of this and would love to contribute in future too.

Posted by amir1987 29 days ago

"a sample of 10,000 observations" - I hope only Research Grade observations are sampled?

Posted by optilete 28 days ago

A few casual and needs ID observations are sampled too, @optilete

Posted by lj_lamera 28 days ago

Two examples did not include geolocation data

@annkatrinrose This is one of the reasons for the experiment. If you don’t think an observation can be identified as finely as it has been due to the lack of data, you can add a disagreement :)

Same thing applies to poor quality photos and cultivated obs

Posted by lj_lamera 28 days ago

@optilete Why? That would exclude a large body of observations, e.g. of spiders, which are often only IDed to genus, as species ID is difficult to impossible.. still valid to ask whether those observations have been IDed as well as possible

Posted by ajott 28 days ago

done with my 34 :-)

Posted by misumeta 28 days ago

From what I see it's a completely meaningless experiment: you're asking people to ID stuff similar to what they usually ID in order to figure out how accurate the IDs of those same people are. Seriously? To check the accuracy of IDs you should show them to actual experts in the groups only, not to the same people whose ID accuracy you want to check.

For me it proposed only about half of the observations related to my expertise; the others were anything, including plants, fungi and birds.

Posted by igor117 28 days ago

I don't really see that happening a lot. More than a third of the observations in my set just had an initial ID from the observer, and mostly based on CV suggestions. Even out of those at RG already, several just had the initial ID from the observer and one confirming one both based on CV suggestions. I think it's more about evaluating CV/initial IDs than checking community ID accuracy.

Posted by annkatrinrose 28 days ago

@igor117 the risk of circularity is reduced by the number of people invited to ID each observation. Frequently, just one* or two identifiers (* one with observer agreement) are needed to take an observation to Research Grade and remove it from the Needs ID pool. There is always some risk of overconfident IDs and error. But if you have five people revising the observation, it's unlikely they will all be wrong in the same way, so that risk of error is greatly reduced. Your assumption appears to be that experts (however defined) will be more likely to correctly identify observations, but many of the identifiers on iNaturalist are experts, and the wisdom of the crowd approach is widely recognised as producing robust results.

It could be interesting to explicitly test both methods (expert and crowd) in parallel. How would you define experts? You might be interested in the results of an experiment that recruited experts to assess identification accuracy of Melaleuca in Australia.

Posted by deboas 28 days ago

@annkatrinrose I completely agree, but in that case I wonder about an experiment specifically designed to root out exactly those types of observations. What percentage of observations with no community ID (essentially one ID) are accurate? What percentage of RG observations with only two IDs are accurate? It would be great if one could search for these observation sets with the Identify tool (in fact, I think this would be necessary to conduct an experiment).

Posted by trscavo 28 days ago

@deboas it's reduced, but it's still nowhere near 99%. I've seen numerous cases with over 5 wrong IDs agreeing with each other. It means little when most of the IDs are from overconfident amateurs; it's not like actually checking if IDs are accurate, it's just making accuracy slightly higher. The definition of an expert is easy: it's someone who has multiple peer-reviewed publications on the taxonomy (or at least something related) of the group he or she is IDing, most of all authors of the taxonomic revisions and identification guides.

Posted by igor117 28 days ago

Good luck getting those people in large enough numbers on iNat and keeping the site running. In that case I would be able to help out with one spider genus, which is more or less an observational fringe case here with just a little more than 1,000 observations in total. Also, I am not a fan of excluding all those amazing self-taught experts without an official degree. Taxonomists are probably a safe bet, but from my experience actually also not always.. in the spider world they are often over-focussed on genital traits, completely ignoring prominent fieldmarks that enthusiastic amateurs easily recognize.. good luck finding a spider taxonomist that would be of any help in an experiment like this by using only the traits they actually have certified expertise in...and then they often only work on fringe cases as well..

Posted by ajott 28 days ago

Reminder: if you are invited to ID an observation sitting at Birds (Class Aves), that doesn't mean you're expected to be able to add a species-level ID. It just means you're expected to be able to confirm it's accurately ID'd at Birds (Class Aves) as opposed to, for example, Amphibians (Class Amphibia).

Posted by loarie 28 days ago

@loarie I know, but my opinion is not relevant, and it's more like observations with species-level IDs, so my class-level ID would be even less relevant (and would look like a disagreement with species ID). More importantly, most people who received invitations will not place class-level IDs, but will be guessing and picking AI-suggestions.

Posted by igor117 28 days ago

most people who received invitations will not place class-level IDs, but will be guessing and picking AI-suggestions.

That is ungracious to the scientists who were invited. I have certainly seen taxon specialists in my batch.

Posted by dianastuder 28 days ago

"most people who received invitations will not place class-level IDs, but will be guessing and picking AI-suggestions."

That is also ungracious to all of us amateur identifiers who were invited. According to the Validator candidate criteria, "If an identifier had made at least 3 improving identifications on a taxon in a continent, we considered them qualified to validate that taxon within that continent." It is my understanding that those of us who make identifications for others are a small percentage of the overall iNaturalist community. Those of us who met these criteria must be a smaller percentage, yet. I'll admit that I can rarely read through a dichotomous key without my eyes glazing over, but I do the best I can with the resources I can understand. And, yes, that means often using the CV to start my identification process.

In this case, it often was obvious that several others had already offered IDs as part of this experiment (when there was a group of recent IDs on observations that were years old). I made it a point to not be swayed by these votes and to only make my ID--often at a higher level--according to my own ability.

Posted by larry216 28 days ago

25% of IDs are made by 130 users. One hundred and thirty. That is not sustainable.

https://www.researchgate.net/figure/Summaries-of-identification-effort-on-iNaturalist-Panels-a-and-b-summarize_fig1_372310054

Posted by dianastuder 28 days ago

Would it be possible in future experiments to stratify IDs for identifiers by regions they predominantly identify in, rather than by taxa?

@yerbasanta If regions were the only thing categorizing identifiers, that could be a problem because many identifiers (like me) identify certain taxa across the globe. There are many types of identifiers, so I think a mix of both location and taxa (which is already in place to some extent) would be better.

Posted by lj_lamera 28 days ago

@larry216 For what it's worth, you don't have to make identifications for others to get "improving" identifications. One of the observations I was assigned in this experiment was one I only qualified for because of my initial IDs on three of my own observations, which someone else had later agreed with, making my initial IDs "Improving" IDs. (see also the first few comments on this post)

Posted by dhasdf 27 days ago

"ungracious to the scientists who were invited" and "also ungracious to all of us amateur identifiers"

"Number of candidate validators: 4866" and "If an identifier had made at least 3 improving identifications on a taxon in a continent, we considered them qualified to validate that taxon within that continent."

Surely "3 improving identifications" are proving that person is qualified to validate ID at top level, I'm glad that you feel privileged to be included into this group.

Posted by igor117 27 days ago

"Definition of an expert is easy, it's someone who have multiple peer-reviewed publications on the taxonomy (or at least something related) of a group he or she IDing, most of all authors of the taxonomic revisions and identification guides."

Ah, if only it were that easy. Consider a quiet woman in southwest Oregon, a place with unusual plants. She grew up there, paid attention to plants all her life, was not educated beyond high school, did not publish. She knew more about the odd plants there than anyone before her, or since. The academic experts who dealt with plants in her area learned to respect her highly.

I remember a 14-year-old kid in Iowa who could identify birds better than 90% of the rest of us.
Some of us here have published on taxonomy. Some of us have degrees in the subject. Some of us don't. And you know what? The only real criterion for evaluating our expertise at identifying organisms of any kind on iNaturalist is our ability to identify those organisms accurately on iNaturalist.

Posted by sedgequeen 27 days ago

@sedgequeen experts are just people who work professionally on the topic; for sure some amateurs have better ID rates than some experts, especially in IDs from photos where precise taxonomic characters are often not visible, so a person more familiar with a species can ID it better by slight differences in shape, colour or something else at the level of "intuition". Or someone can simply have better knowledge of a taxon without published works. The question is how we should know that the IDs of such a person are correct if we want to validate something and don't have our own knowledge of the group. With an expert we can just check the list of publications and already be sure that the person has skills in identifying the group. It's not like experts are always better identifiers than amateurs; it's just that if you need to validate identifications, you need a person with a background proving that they are capable of doing so.

Posted by igor117 27 days ago

I do many identifications in equatorial South East Asia, but hardly ever to species level (typically family rank, but with some kinds of organisms, like lichens, my IDs do not go into any detail at all). Since a selection criterion was the "continent", and equatorial Asia is Asia, I was presented with a lichen from Siberia, north of the polar circle - it is still Asia; and it's true I was the first identifier of many lichens which had no identification at all before...
Anyway, most observations I was presented with were at rather coarse ranks, and sometimes I could improve the ID a little.

Posted by bernhard_hiller 26 days ago

Btw, while the results page differentiates quite well when it comes to animals, plants are only shown as "Plantae". Could you differentiate a little more, like e.g. mosses-ferns-gymnosperms-monocots-dicots-other plants?

Posted by bernhard_hiller 26 days ago

@bernhard_hiller for observations at coarse ranks, all the experiment is looking for is confirmation (or not) of the coarse rank. The results will be shown at different levels of precision (family, species, subspecies), so only go as far as you are comfortable with.

Posted by deboas 26 days ago

With the "Percent of sample validated: 94%" it wasn't entirely clear whether that last 6% is included in the Uncertain category? Also, an extra unnamed continent has crept into some of the results

Posted by deboas 25 days ago

Antarctica?

Posted by annkatrinrose 25 days ago

@wildskyflower Replying to your comment, it's not an either/or. I suggested it would be difficult for any one version of the experiment to confidently and accurately predict accuracy, so that multiple alternate versions of the experiment should be designed and conducted, and final conclusions/interpretations should consider all of them. That set of experiments could include multiple ideas commenters proposed, but also include version(s) where the overabundant easy to ID species are excluded. I've also suggested that the overall results reported so far in previous versions of the experiment seem to be at least somewhat overestimated or in need of caveating, at least if reported on without noting study limitations or biases.

Finally, I've suggested that in the event the admin plan to publish these results in external sources that they take note of, and note in their reporting on the findings any and all of the applicable study limitations, biases, or caveats that commenters seem to have been the first to have pointed out in the threads for the experiment posts. I'm not suggesting that the experiments aren't telling us anything, only that more information would be needed to be able to more accurately predict accuracy, and that there are multiple meaningful measures of accuracy that would be relevant to discuss and compare, not only one.

Posted by bdagley 25 days ago

Wow - 9,400 observations validated by 2,750 validators in 1 week - thank you so much to everyone who made this experiment possible! We added a Results section to this post.

Posted by loarie 24 days ago

That last chart is fascinating, particularly the spread of the confidence intervals. Really great to see the iterative improvements and discussion of feedback.

Posted by muir 24 days ago

Since I mostly coarse-ID unidentified observations regardless of geography, I was pleased and scared to be selected. I did my set as accurately as I could. I was comforted to see identifiers and forum regulars whose expertise I respect being as clueless as I was, IDing only to "animals" or confirming my opinion based on layman resources.
Reading the comments, I am always amazed by the people who feel compelled to only identify to species and are flustered if they are presented with something beyond their niche. Looking beyond your comfort zone is like searching for gold, to find a nugget you need to sift through tons of dirt (and who knows what else is hiding in this dirt ;) ).

Posted by mbfoyard 24 days ago

@loarie Is it possible to make a project of all of the observations, and/or an export of the observations with their CID before and immediately after the experiment? It seems the links are all capped at 500 observations. That way we can have an archive to look at in case the CIDs change again, and it lets people skeptical of validators review the taxa themselves to see if there are any they think are seriously in dispute.

Posted by wildskyflower 24 days ago

@wildskyflower - yes, that was a casualty of scaling the experiment from 1k to 10k observations in that we had to truncate the bar-graph links at 500 obs to work with Explore. But the data is safe in the database and you're right that making that more accessible (probably as spreadsheet exports) is something else we need to add to the list of improvements - thanks for mentioning this.

Posted by loarie 24 days ago

not double blind and tasking 5 or more experts w a det. is not what normally occurs (outside of popular taxa like buprestids or tiger beetles)…

Posted by entomike 24 days ago

For at least one of these experiments I was sent a large number of plants and fungi (which I never identify), mostly blurry photos, and few examples of any of the taxa that I know and routinely ID on a large scale.

Was this a different experiment than the one summarized here? The images I was asked to check seemed to be mostly low quality and neglected ones of groups I know little about.

Posted by johnascher 24 days ago

Since I was asked to ID bad photos of unfamiliar non-animals, some of which seemed to have data quality problems (one was an immature human), I don't think my ID behavior was at all similar to my usual process of identifying selected animal taxa.

Posted by johnascher 24 days ago

Usually I ID my favorite taxa to species-group or at least genus-group (e.g., subgenus) rank, but for the experiment in question I was selecting "Fungi" and "flowering plants" for many of the submissions.

Posted by johnascher 24 days ago

If the main question is “how accurate are iNaturalist observations?” and the answer is ~95% on average for RG observations, can you also use the same data to ask about the accuracy for species on average? I think the question becomes, "how accurate are species identified on iNaturalist?" So, for example, if mallards are estimated to be identified accurately in 99% of iNat observations, wood frogs in 90% of iNat observations, and field horsetails in 70% of iNat observations, the average accuracy would be something like (99 + 90 + 70) / 3 ≈ 86%. And so on. Is that sort of analysis possible?

Posted by muir 24 days ago

@johnascher These experiments aren't about "ID behavior" at all. They're checking how accurately the observations were identified before the experiment, by throwing some new identifiers at the sample of observations and then comparing the observation taxon before the experiment to the new identifications.

Maybe the observations you were sent were already at a high level like Fungi, Plants, or Flowering Plants?

Posted by dhasdf 24 days ago

@muir - that's a really interesting idea and probably a very relevant statistic for people to know what they're getting into with iNat data. We could probably estimate that now by taking the number of species in each group - e.g. if there are X species in the N. America, Common, Insect group and the average accuracy for that group was Y, we could weight the average accordingly.

@johnascher I think you got the batch of observations with kingdom-level IDs, since most people would be qualified to ID those. I agree that's not a great use of your particular expertise. We'll look into why we sent you those and try to send a more appropriate batch next time.

Posted by loarie 24 days ago
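
A minimal sketch of the species-weighted average described in the reply above, using made-up group accuracies and species counts (the X and Y values here are hypothetical placeholders, not experiment results):

    # Hypothetical (group, species count X, group accuracy Y) triples.
    groups = [
        ("N. America / Common / Insects", 9_000, 0.95),
        ("N. America / Common / Birds", 700, 0.99),
        ("Europe / Common / Fungi", 1_500, 0.81),
    ]

    # Per-species average accuracy: weight each group's accuracy by how many
    # species it contains, rather than by how many observations it has.
    per_species_avg = sum(x * y for _, x, y in groups) / sum(x for _, x, _ in groups)
    print(round(per_species_avg, 3))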

It certainly would help out validators to restrict by region I think -- I have a certain passing familiarity with most of the macroscopic life forms in the area I've lived in all my life, and I know all the flowering plants (graminoids apart) to some degree even in genera I haven't studied in detail, so that I'd feel comfortable trying to figure out any of them from reference materials. But I'm at a loss in areas where I don't even know the range of plants that exists, and feel sure I might totally overlook some important factor needed for identification. That said, I don't want observations in areas where few validators live to be neglected, so maybe a mix of taxon-based and geography-based trials.

Posted by lmtaylor 24 days ago

@igor117 Experts (according to your formal definition) quite often make mistakes. Possibly the most incredible case is an observation IDed by the observer as a species they described themselves. It was soon confirmed and stayed at RG for some time, but it actually appears to be a different species, even in a different subgenus. Unfortunately, experts require an individual approach too. :) And indeed, there is no direct correlation between formal "level of expertise" and the ability to successfully ID things on platforms like this one.

Posted by kharkovbut 24 days ago

@kharkovbut of course experts also make mistakes; I've already explained in a comment above that the point here is that if you need to validate an identification, you need a person (or better, a few) with a background proving their skills. And it's not "3 improving IDs". We can't know if a person is actually good at IDs or only claiming to be so if there is no background. There are thousands of RG observations with wrong or speculative IDs, including some mistakes repeated by the "customs" of some active identifiers who are not experts. I'm more than sure that there is a "direct correlation between formal 'level of expertise' and ability to successfully ID things at platforms like this one", despite the fact that some amateurs probably have better ID rates than some experts. What they're calling validation here is just slightly increased accuracy, so the results of this experiment will be misleading.

Posted by igor117 24 days ago

What does one do when they come upon a completely wacky suggestion from iNaturalist AI?

Posted by nan-cee 24 days ago

"What does one do when they come upon a completely wacky suggestion from iNaturalist AI?"

Don't use it as an ID, and use a disagreeing ID if someone else did use it as an ID.

"you need a person (or better a few) with a background proving their skills. And it's not '3 improving IDs'"

The concern isn't necessarily credentials alone, since some external credentialed experts who've never identified from photos misidentify a percentage of the photos that identifiers email to them. Identifying physical specimens and identifying from photos are only partly overlapping skills, and the second must be learned "again" even by credentialed experts. The actual problem for validators is that "3 improving IDs" is far too low a standard, regardless of whether they have credentials.

Someone merely guessing IDs can easily meet that standard. This isn't meant to exclude anyone from being a validator, since with effort many users can reach an even higher standard. But using too low a standard will at least somewhat mislead the results, and increase concerns such as that some validators merely agree with the previous IDs.

Posted by bdagley 24 days ago

Showing validators observations (more so) from their region of study or expertise, as many have asked, could also potentially bias some of the results. It could be useful as one iteration of the experiment, but not as the design of every future version.

The results seem to suggest that across all obs., the accuracy is closer to 80% than 95%. What would be relevant to know, and probably fairly easy to calculate, is the current CV accuracy when the CV is used on obs. that have no other IDs. I predict under 70%. It should also be kept in mind that some RG obs. are made RG by an observer using a CV ID and someone else merely agreeing with it without certainty.

And if those obs. are selected in an experiment, some validators will merely agree with the ID the CV originally suggested, without certainty. So there must be some cases in the experiment where a CV ID is incorrect but becomes RG. It would be relevant to attempt to assess how many of those obs. could affect the 95% RG accuracy result. Yet another useful variation of the experiment would be to select only obs. that didn't start with CV IDs.

Posted by bdagley 24 days ago

@bdagley It isn't entirely clear which IDs are CV-guided. In the past, when the CV pulled up the taxon that I wanted to identify the observation as, I clicked on it instead of typing to save keystrokes! Since I learned that that gets a special notation, I now type anyway -- but lots of other people must be doing the same thing.

Posted by lmtaylor 24 days ago

I am so glad that people point out the alternative use of CV - that is, using CV to save typing, help with spelling, and sometimes as suggestions to be checked out. Many of the CV suggestions are so unlikely that I go through spells of not using it at all, but it does help with the spelling and with taxonomic idiosyncrasies (to be checked, of course). Thank you @lmtaylor and others who mentioned this.

Posted by patswain 24 days ago

Add a disclaimer - no-reply - so iNatters know in future.

I would like to see your whole chosen batch of obs. Then let us (the chosen identifiers) pick out the taxa and/or locations for ourselves, and see where the IDs end up. We can still add a confirming ID as you requested. (Or you could count the Mark as Reviewed without us adding an ID - which is either silent agreement or "I can't even".)

Posted by dianastuder 24 days ago

@igor117 Right, but "a background proving their skills" is something that probably cannot be measured simply by the number of peer-reviewed publications and/or the number of improving IDs here. :) I prefer to measure that elusive "background" by my own impression of one's skills. Thus we should not overestimate the results of such experiments -- regardless of how the team of validators is selected by formal criteria.

Posted by kharkovbut 24 days ago

In this experiment most IDers chosen as validators for the same set of observations as me seemed to take it seriously and be more cautious than they usually would be when IDing random observations.

My experience on the site in general is that on occasions when an amateur and a professional who both have a lot of experience IDing a taxon on iNat disagree, it is basically a coin flip who will turn out to be right. Most often it is the second to post their ID, because sometimes people go fast and miss things (or mistype) on an initial ID, but a disagreement has more intentionality. However, in most cases, someone without a lot of experience IDing on iNat/from photos is relatively more likely to be wrong regardless of their qualifications. IDing from photos, field IDing, and IDing from pressed specimens are basically three different skill sets, and it is possible to be good at any one without automatically being great at the other two.

Posted by wildskyflower 24 days ago

I believe the comments I would have made were very thoroughly summed up by others on this thread already, so I won't belabor them, but #1) I appreciate the iNaturalist staff attempting to validate the accuracy of observations and #2) I would support being invited to future experiments if the validator issues can be worked out. A number of my observations were involved in the experiment, which is fine, but some of them were quite tricky taxa for which very few people, if any, are true "experts", so the validators tended to make a bit of a mess of things with overly general or incorrect IDs. When can I duplicate and delete the originals on these to clean things up?

Posted by jaykeller 23 days ago

If the incorrect IDs have moved the CID to wrong, you can either opt out of CID for those obs, or is there someone you can @mention for another ID?
(Overly general without hard disagreement is 'untidy' but has no effect.)
And I do understand the untidiness - I would like to return to the 'broad IDs added for these experiments' and delete mine!

Posted by dianastuder 23 days ago

I almost missed the experiment because I didn't check the mail link until April 1. But I got my IDs in before the deadline, anyway.

I was quite comfortable with many of the taxa I received, but on the other hand, there were also a number of species from far away places that I simply don't know. What really stumped me, though, were the larval forms of insects for which I only know the adult forms. Gosh, I think the larvae were insects, but I couldn't even place them to order.

I also found at least 2 plants that came to me as Research Grade that were clearly cultivated specimens, and no one else had marked them as such yet (I promptly did). Would there be a way to measure such missed cultivated/captive specimens as part of the accuracy testing?

Posted by erikamitchell 23 days ago

I second @erikamitchell's interest in measuring cultivated/captive organisms that aren't yet marked, as well as other DQA dimensions. I wonder about that for the region I pay attention to. I find observations that have been RG, sometimes for years, with incorrect dates (easier to spot if you live where there are distinct seasons with clues like snow on the ground), locations, and/or wild/captive status. Presumably it's a small % overall, but it's part of the picture of iNat accuracy, and I feel like most identifiers on iNat pay less attention to DQA accuracy compared to their effort in contributing a taxonomic ID.

Posted by muir 22 days ago

"more cautious than they usually would be when IDing random observations."

This is a concern, as are the problems caused by being forced to try to ID images we would never otherwise deal with in any capacity. I would never try to identify low-quality photos of random plants or unknown "Life" unless instructed to do so for this experiment.

Posted by johnascher 20 days ago

I was not involved with this apparent experiment, but I have two observations to add to this discussion. I check any submissions under Oecanthinae every day, and have offered IDs on thousands of tree crickets, so I feel I have a bit more expertise than the majority of users for this taxon. 1) Why doesn't iNaturalist include a location algorithm? Occasionally I encounter photos where users have chosen an AI suggestion that is far out of range, e.g. a European species guessed for a submission from the US, or an Americas species guessed for a photo submitted in Africa. 2) Is there any way to include a pop-up suggestion that strongly encourages a user to stick to genus if they are not certain of a species? It seems like there could be another category besides Research Grade (which seems like it should only be moved to by true experts). Thank you.

Posted by nan-cee 20 days ago

@nan-cee What are "true experts" I wonder? Seems to be a bit presumptuous.

Posted by jaykeller 19 days ago

Yes, the word "expert" is being overused in this thread, even being abused by some. My mental "credibility alarm" goes off whenever I hear that word.

Posted by trscavo 19 days ago

I happen to know who the experts in Oecanthinae are, which is the taxon I am most involved with. When I am involved with other insects, I always check the background of the user. I can generally tell who has quite a bit of expertise. The "...... is a naturalist!" bio is a dead giveaway of a non-expert ;)

Posted by nan-cee 18 days ago

@nan-cee I know multiple very experienced people, including researchers, who have empty bios or have left their bio empty for an extended period of time. Not everybody identifies/explains themselves in their bio. Plainly, it's not a way to tell if someone is an "expert." Skill is subjective and based on shown experience, which can be in the form of consistently identifying things correctly.

Posted by arman_ 18 days ago

Okay, I understand others disagree. I will unsubscribe.

Posted by nan-cee 18 days ago

I appreciate you sharing your expertise @nan-cee!

Posted by loarie 18 days ago

@arman_ right, "shown experience, which can be in the form of consistently identifying things correctly" - some of the most active identifiers here got there by agreeing with IDs on RG observations many thousands of times, and they are the top identifiers in certain groups on iNat without actual knowledge of these groups. Even in less radical cases, in some regions some groups are represented by only one very easily recognizable common species, so it can be correctly identified without knowledge of the group; as a result, a person who knows just a single common species of the group and has no idea about the others could be among the top identifiers of that group on iNat. For example, I'm constantly being tagged to confirm IDs of oaks from Southern Europe, where something like 70 species are present, but I live in a region with only 1 native species of oak and I have no idea how to distinguish it from the others - yet by your logic I should be considered an oak expert, as I'm consistently identifying it correctly. The number of correct or improving IDs means nothing in regard to expertise and the ability to validate identifications.

Posted by igor117 18 days ago

@igor117 I am well aware, and know some people like this who are quite problematic. I never said to simply count the number of IDs somebody has of a common species and compare it to other people. As I said, it's completely subjective. Do they have explanations for their IDs/are they able to explain them when questioned? Do they consistently make the same misIDs, i.e. are they spamming agree on common species, resulting in repeated mistakes that are easy to catch? It's about consistently demonstrating knowledge, not consistently pressing A on the Identify tab.

Posted by arman_ 18 days ago

Shown expertise - with helpful comments, or a brief copypasta.
For example, yesterday - harvestmen versus what was actually a spider.

Posted by dianastuder 18 days ago
