Misidentification of Erodium observations in CA - Updated

This is an update to the metrics last calculated for this genus on December 12, 2022. Notice that the cumulative misidentification rate has dropped from 10.6% to 9.3%. For observations added between 2022-12-12 and 2024-01-17, the mis-ID rate was 5.5%, which is a significant improvement over the past year.

The following chart shows the number of observations for each species within the genus Erodium in California and the number of times the species has been misidentified in iNaturalist.

The total Research Grade IDs are shown in green and the number of misidentifications are in red. Misidentification data was obtained from the 'Taxa Info' page in iNaturalist under the 'Similar Species' tab. The data are further filtered to include only observations in California.

Improvements were calculated by subtracting the 2022-12-12 counts for 'Observations' and 'Incorrect IDs' from the corresponding 2024-01-17 counts.

Erodium-MisID-2024-01-17

Posted on January 17, 2024 02:00 PM by truthseqr

Comments

Some great work! Thanks for all that you do.

Posted by catchang 5 months ago

very cool - what do you think dropped the % Mis ID'd from 10 to 7%?

Posted by loarie 5 months ago

@loarie, I've been monitoring this genus since I first noticed all the misidentifications and contacted @silversea_starsong for help. He mentored me and was instrumental in starting this project (as well as the ones for Geranium and Robertium) to fix all the misidentified observations. I relied on him heavily until I got up to speed. He's the expert for this project along with @jrebman who has identified over 7,000 Erodium observations and corrected many of them (including some of my mistakes).

The improvement is probably much better than 3%. I think we'd have to only analyze the data from 2022-12-12 to present. I'm not sure the 'Taxa Info' page could report the misidentifications for such a subgroup of observations.

Posted by truthseqr 5 months ago

@loarie, I thought about this today and figured out how to calculate metrics for just the data from 2022-12-12 to the present. As I suspected, the improvement is much better than previously presented: the ID error rate dropped from 10.6% to 5.5%. I updated the chart above accordingly.

I think the improvement is due to a thorough review of all Erodium data for CA and correction of mistakes by the top reviewers. Also, I think education plays a role as well. When I encounter a misidentification, I leave a list of references for the identifier to read to improve their skills.

And a huge factor is... once the majority of misidentifications are fixed, the iNat CV algorithm gives more accurate suggestions.

Posted by truthseqr 5 months ago

Yikes!!! I made a mistake in the calculation for "Difference in %Mis-Id". Previously the chart showed that number to be 5.5%. Today I noticed the calculation error and I changed it to 1.4% (10.62 minus 9.27). Sorry about that. Still, it's an improvement over last year.

Posted by truthseqr 4 months ago

Please disregard that last message.
I should've written myself a note so I could understand what I was calculating...

Here's an explanation:

Under the "Improvement" columns there are counts taken at timepoints 2022-12-12 and 2024-01-17.
** The "Observations" column is the total number of observations in iNat at the given timepoint.
** The "Incorrect ID" column is the sum of misidentifications calculated from the Taxon --> Similar Species tab at the given timepoint.
** The %Mis-ID'd column is Incorrect IDs divided by Observations, expressed as a percentage.

The "Difference/Observations" cell contains the total count taken at timepoint 2024 minus the total count taken at timepoint 2022. In other words, the number of observations added since 2022-12-12.

The "Difference/Incorrect ID" cell contains the Incorrect IDs calculated at timepoint 2024 minus the Incorrect IDs calculated at timepoint 2022. In other words, the number of misidentifications added between 2022 and 2024 (i.e., new mis-IDs since the last metrics were posted).
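To make the column definitions concrete, here is a minimal Python sketch of how the "Difference" cells could be computed from two snapshots. The names `Snapshot`, `pct_mis_id`, and `difference` are hypothetical and only illustrate the arithmetic described above; the actual chart was not necessarily built this way.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Counts taken at one timepoint (hypothetical structure)."""
    observations: int   # total observations in iNat at the timepoint
    incorrect_ids: int  # summed misidentifications from the Similar Species tab

    def pct_mis_id(self) -> float:
        # %Mis-ID'd = Incorrect IDs / Observations, as a percentage
        return 100.0 * self.incorrect_ids / self.observations


def difference(earlier: Snapshot, later: Snapshot) -> Snapshot:
    """Counts added between the two timepoints (later minus earlier)."""
    return Snapshot(
        observations=later.observations - earlier.observations,
        incorrect_ids=later.incorrect_ids - earlier.incorrect_ids,
    )
```

Calling `difference(snapshot_2022, snapshot_2024).pct_mis_id()` would then give the mis-ID rate for only the observations added between the two timepoints.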

So, there are two ways to look at the "Difference / %Mis-IDs":

We can take the drop in the cumulative rate: 10.6% - 9.3%, or 10.62 minus 9.27 unrounded, = 1.35 percentage points, rounded to 1.4%
Alternatively, we can look at the percentage of misidentifications for the period of time between 2022 and 2024, which is 401 divided by 7,239 = 5.5%
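
As a quick check of the two views, here is a small Python sketch using only the figures quoted in this thread (10.62, 9.27, 401, and 7,239). The function names are hypothetical and chosen just for illustration.

```python
# Figures quoted in the comments above
cumulative_pct_2022 = 10.62  # cumulative %Mis-ID'd at 2022-12-12
cumulative_pct_2024 = 9.27   # cumulative %Mis-ID'd at 2024-01-17
new_incorrect_ids = 401      # "Difference/Incorrect ID"
new_observations = 7239      # "Difference/Observations"


def cumulative_drop(before_pct: float, after_pct: float) -> float:
    """View 1: drop in the cumulative rate, in percentage points."""
    return before_pct - after_pct


def period_rate(incorrect_ids: int, observations: int) -> float:
    """View 2: mis-ID rate for observations added during the period."""
    return 100.0 * incorrect_ids / observations


drop = cumulative_drop(cumulative_pct_2022, cumulative_pct_2024)
rate = period_rate(new_incorrect_ids, new_observations)

print(f"Drop in cumulative rate: {drop:.2f} percentage points")  # 1.35
print(f"Mis-ID rate for the period: {rate:.1f}%")                # 5.5%
```

The two numbers answer different questions: the first compares the all-time rate at two timepoints, while the second measures only the observations added between them.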

The old brain ain't what it used to be...

Posted by truthseqr 4 months ago
