Algorithm for finding the Pre-Maverick observations

  • Reduce the set of candidates with this filter:
    quality_grade=needs_id&identifications=some_agree
  • and with this filter (I consider that pre-mavericks below the rank of tribe are numerous and need no special attention, because the "experts" reviewing observations can find them, and because a disagreement at the species rank may in general not be easy to resolve):
    lrank=tribe
  • Get 200 candidate observations with this URL + the filters: https://api.inaturalist.org/v1/observations?
  • Get detailed information with this URL + 30 observation IDs: https://api.inaturalist.org/v1/observations/
  • For each observation, extract all the identifications from this detailed information.
  • Ignore identifications that are not "current" (disabled identifications).
  • Skip the observation if one of the identifications is marked "maverick".
  • For every remaining identification, list the taxon ID and the IDs of all its ancestor taxa.
  • For every remaining identification A, check if it is pre-maverick:
    -1- Count how many identifications are identical to A. Twice this number gives how many identifications in disagreement we need to find for confirming that identification A is a pre-maverick.
    -2- Check all other identifications B and count how many identical identifications B are in disagreement with A.
    -3- There is a disagreement between taxa A and B if: A is not B, B is not A, A is not an ancestor of B, B is not an ancestor of A.
    -4- If the number of identical identifications B disagreeing with identifications A reaches the threshold (twice the number of identifications A), then we have a pre-maverick,
    -5- else try again with another B.
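
For the first bullets of the list (filtering, fetching and discarding identifications), a minimal Python sketch could look like the following. It only builds the URLs and filters an already downloaded record; it sends no request. The `per_page` and `page` parameters, the comma-separated ID batch, and the JSON field names `identifications`, `current` and `category` are assumptions about the API that the post does not spell out, and the function names are hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.inaturalist.org/v1/observations"


def candidates_url(page=1):
    """Build the URL for one page of candidate observations."""
    params = {
        "quality_grade": "needs_id",       # filter from the post
        "identifications": "some_agree",   # filter from the post
        "lrank": "tribe",                  # filter from the post
        "per_page": 200,                   # assumption: page size parameter
        "page": page,                      # assumption: paging parameter
    }
    return API_ROOT + "?" + urlencode(params)


def detail_urls(observation_ids, batch_size=30):
    """Build one detail URL per batch of 30 observation IDs.

    Assumption: the IDs of a batch are joined with commas.
    """
    ids = list(observation_ids)
    return [
        API_ROOT + "/" + ",".join(str(i) for i in ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]


def usable_identifications(observation):
    """Return the current identifications of one detailed record.

    Returns an empty list when the observation must be skipped because
    one of its current identifications is already marked "maverick".
    """
    current = [
        ident for ident in observation.get("identifications", [])
        if ident.get("current")            # assumption: boolean field
    ]
    if any(ident.get("category") == "maverick" for ident in current):
        return []
    return current
```

For example, `detail_urls(range(1, 62))` gives three URLs, the first two with 30 IDs each and the last with the single ID 61.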

This might still be approximate. Should the set of identifications B (point 4 above) really be all identical (exactly the same taxon)? I did not test in detail all the configurations that can turn an observation into a maverick when one more ID is added.
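
Points 1 to 5 of the list can be sketched in Python as below, under the assumption questioned in the paragraph above: only identical identifications B (exactly the same taxon) are counted together. The input format, a list of `(taxon_id, ancestor_ids)` pairs for the current identifications of one observation, and the function names are hypothetical. This is not the actual bot code.

```python
from collections import Counter


def disagree(a, b, ancestors):
    """Point 3: A and B differ and neither is an ancestor of the other."""
    return a != b and a not in ancestors[b] and b not in ancestors[a]


def pre_maverick_taxa(identifications):
    """Return the set of taxon IDs whose identifications are pre-maverick.

    `identifications` is a list of (taxon_id, ancestor_ids) pairs, one
    pair per current identification of a single observation.
    """
    # Point 1: how many identifications name each taxon.
    counts = Counter(taxon for taxon, _ in identifications)
    # Drop the taxon itself from its lineage, in case the data includes it.
    ancestors = {taxon: set(anc) - {taxon} for taxon, anc in identifications}

    result = set()
    for a, n_a in counts.items():
        threshold = 2 * n_a                # point 1: twice the number of A
        for b, n_b in counts.items():      # points 2 and 5: try each B
            if disagree(a, b, ancestors) and n_b >= threshold:
                result.add(a)              # point 4: threshold reached
                break
    return result


# Hypothetical taxa: family 1 contains genus 10 and genus 20,
# and genus 10 contains species 11.
two_to_one = [(20, [1]), (11, [1, 10]), (11, [1, 10])]
print(pre_maverick_taxa(two_to_one))       # {20}
```

In this two-to-one example the single genus-level identification (taxon 20) is flagged, because two identical identifications of species 11 disagree with it. A family-level identification next to the same two species identifications would not be flagged, since the family is an ancestor of the species.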

Posted on February 12, 2023 09:05 AM by jeanphilippeb

Comments

I ignore from Family down, since I know my circle of identifiers filters for Family.

Posted by dianastuder about 1 year ago

Thanks for setting this up. Makes it much easier to find easy to ID but often confused plants among the hard to ID ones.

Posted by lappelbaum about 1 year ago

This is a good idea. However, there really, really needs to be an automated way to remove observations from Pre-Maverick.
Take this observer (chosen from the top 5 list)
https://www.inaturalist.org/observations?project_id=156949&place_id=any&verifiable=any&captive=any&user_id=carinalochner&hrank=species&view=species
All 7 of these observations are now RG with 4-5 IDs, and they have been so for about a week. So, for that time, anyone coming along to help place IDs loads all those pages, only to see that no help is needed, and then we've wasted a bunch of bandwidth and put a huge strain on the iNat servers.

So, please automate removal of IDs from Pre-Maverick once they attain RG (or even fall out of your initial selection criteria, at the tribe or whatever level).
If you do this you'll get more of what you want: People looking at two-to-one or four-to-two disagreements.

Posted by lincolndurey about 1 year ago

Thank you Lincoln for your feedback. In the project description there is this invitation to review and identify only the observations that are not RG:

Identify observations in this project that are not yet Research Grade.

If people use the link above (with any other location or taxon filter, as they wish) to filter out the RG observations (about 2400 so far!), then removal of RG observations from the project is not needed. This filter spares the bandwidth and can match the need you have expressed. Moreover, not removing the observations spares further bandwidth, because removing 1 observation takes 1 request to the API, and there is no better way to do it. (In any case, sparing the time of the reviewers/identifiers is more important.)

For instance, in this post published today, @lynnharper (thank you too!) uses the project with a quality_grade=needs_id filter.

There may be another reason not to remove observations from the project: some people are interested in the statistics that can be computed on observations already processed. (This preference has been expressed with regard to the phylogenetic projects.) Statistics about who made identifications? Statistics about the species that were difficult to ID and that could be fixed using the project?

I agree that if people don't filter out RG observations and waste time and/or give up for this reason, then I have to plan to remove RG observations. I just would like not to disappoint people that would prefer not to remove observations.

I am a bit surprised, I would expect that identifiers know the 3 checkboxes in the Quality Grade panel of the Filters popup available from the Identify page.

Posted by jeanphilippeb about 1 year ago

Thanks - these are SO rewarding to work thru.

Posted by dianastuder about 1 year ago

Sorry, I had missed your ID link, that is much better, the usual interface we all use. I was wrongly going at it from the Project page.

Posted by lincolndurey about 1 year ago

Screening of Plantae observations identified at rank Epifamily or above completed.
Now adding observations that are not Plantae, identified at rank Epifamily or above.

Posted by jeanphilippeb about 1 year ago

I am down to the bottom of Cape Peninsula Pre-Mavericks, and can hope to keep up in future.

Will slowly tackle the Western Cape next.

Posted by dianastuder about 1 year ago

i've cleared out about half of the Georgia US Plantae Pre-Mavs, and will branch to the surrounding states soon, just too much fun to be had! @pucak @mjpapay @joedziewa @sedgequeen have you seen this:
https://www.inaturalist.org/observations/identify?quality_grade=casual,needs_id&project_id=156949
or know anyone else to tell about it ?

Posted by lincolndurey about 1 year ago

I have now.... I'll take a look.

Posted by pucak about 1 year ago

Thank you!

Posted by star3 about 1 year ago

Thanks!

Posted by jeanphilippeb about 1 year ago

Just wanted to say I'm really enjoying going through these for my areas. So many easy ones that can be tipped one way or the other. Found quite a few that were easy IDs that had been lingering in Needs ID for the past 5 years or more. Thanks for pulling them all together into a project!

Posted by annkatrinrose about 1 year ago

Yesterday - I finally got down to the bottom of the Western Cape plants. Going back about 10 years, and including the glitches when South Africa moved from iSpot.

Posted by dianastuder about 1 year ago

First screening completed, worldwide, down to the family rank.
Starting a second screening, down to the subfamily rank.

Posted by jeanphilippeb about 1 year ago

Screening down to the subfamily rank completed.
It took a bit more than a month.
Restarting an identical screening.

Posted by jeanphilippeb 12 months ago

Nicely trickling thru, just a few a day, so I can keep up.
Thanks

Posted by dianastuder 12 months ago

Screening down to the subfamily rank completed.
It took about 10 days.
I will wait before starting a 3rd screening, to spare resources and to collect more observations next time.

Posted by jeanphilippeb 11 months ago

Still working thru South Africa beyond the Western Cape.
And Africa starting with animals.

Posted by dianastuder 11 months ago

Flipping the Identify link to show everything - just over half a million obs!!

Posted by dianastuder 11 months ago

What are "Maverick" and "Pre-maverick" observations?

Posted by arbiess 11 months ago

Helpful algorithm, thank you for creating it. If possible, as IDers move through these please mark captive/cultivated. I’m noticing a higher than usual ratio of cultivated plants for my region captured in this project.

Posted by catchang 9 months ago

I've been going through a lot of old plant obs with two people disagreeing leaving it at Dicot, Flowering Plant, Vascular Plant, etc. I'm agreeing with one of those people so now it is two to one. When might those obs be added to this project so that we can get three to one?

Posted by lappelbaum 9 months ago

3 months since the last update.
I still have a lot of African ones to clear - down to the last thousand - busy with the dicot slice now.

@jeanphilippeb might you do it before the Great Southern Bioblitz? 24-27 November

Posted by dianastuder 9 months ago

Good idea! I just started a new run. Thank you for the reminder.

Posted by jeanphilippeb 9 months ago

Run completed.
505.000 → 517.000 observations in project.

+12.000 observations, in 3 months.
Growth of pre-mavericks: +0.8%/month (rate likely overestimated, because 517.000 is underestimated, because some pre-mavericks in the past have been solved before the project was created).
Growth of observations: +2.9%/month (assuming a doubling every 2 years).
→ Pre-mavericks grow slower than all observations.
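
As a rough check of the two monthly rates above, here is a minimal Python sketch. It assumes compound (geometric) growth for both figures, which the comment does not state explicitly; the function names are hypothetical.

```python
def monthly_rate(start, end, months):
    """Compound monthly growth rate between two counts."""
    return (end / start) ** (1 / months) - 1


def monthly_rate_from_doubling(doubling_months):
    """Compound monthly growth rate implied by a doubling time."""
    return 2 ** (1 / doubling_months) - 1


# Project size before and after the run, 3 months apart.
pre_maverick_rate = monthly_rate(505_000, 517_000, 3)
# All observations, assuming a doubling every 2 years (24 months).
all_observations_rate = monthly_rate_from_doubling(24)

print(f"pre-mavericks:    {pre_maverick_rate:.1%} per month")      # 0.8%
print(f"all observations: {all_observations_rate:.1%} per month")  # 2.9%
```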

Pre-mavericks are identification disagreements.
→ There are relatively fewer and fewer identification disagreements.

Do fewer disagreements mean improved identifications (or else a convergence of the mistakes of iNatters)?
→ A [good] consequence of an always improving CV?

Posted by jeanphilippeb 8 months ago

'Pre-mavericks grow slower than all observations'

They need 3 identifiers to have added IDs. That is already a much smaller slice.

Posted by dianastuder 8 months ago

Yes, but this is not related to the growth.
A small slice may grow at the same rate as the whole set (and remain a small slice).

Posted by jeanphilippeb 8 months ago

So not only light at the end of the tunnel, but reaching the end of the tunnel? I have another 430 for Africa.

Posted by dianastuder 8 months ago

Very good! Thank you very much for your efforts for identifications.

Posted by jeanphilippeb 8 months ago

Third run was 12K obs.

And the first and second runs were how many each?

Posted by dianastuder 8 months ago

I don't know.

Posted by jeanphilippeb 8 months ago

@tonyrebelo you may find JP's numbers (in an earlier comment) interesting

505.000 → 517.000 observations in project.
+12.000 observations, in 3 months.

We have cleared the backlog and I am down into the last hundred for Africa. Clearing the decks for GSB.

Posted by dianastuder 8 months ago

Just copy the url here, so that I can relate to it:
s. Afr: https://www.inaturalist.org/observations/identify?quality_grade=needs_id&project_id=156949&place_id=113055

So I get that at present we have 8,337 pre-maverick observations that need ID (or 64% of 13,107) for southern Africa. (4% are casual.)
Of course, that does not solve the issue of which are still pre-mavericks at a higher level, and which have been fixed (at least not without looking at them one by one).
So it would be nice to have a filter that tells one which pre-mavericks are no longer pre-mavericks.
And I would love a filter that allows one to determine if the non-mavericks are at species level or at a higher level (i.e. not yet capable of potentially being pushed to RG).

Thanks. If you are down to the last 100 for Africa, then you must be really relieved. What will your next Gigantic Task be?

Posted by tonyrebelo 8 months ago

I have a queue.
Then GSB.

((Broad planty IDs for Cape Peninsula, one day)) - at very least to retrieve the blindingly obvious ones!

Posted by dianastuder 8 months ago

Ah: I see I need a project for you too.

Posted by tonyrebelo 8 months ago

I will consider moving observations that are no longer pre-mavericks to another "Pre-Maverick Archive" project.

Posted by jeanphilippeb 8 months ago

Thanks
That would be useful, as there is no easy way of picking out those that are no longer "pre-maverick".

Posted by tonyrebelo 8 months ago

That would also show the many which are pointed in the right direction, if not yet RG.

Posted by dianastuder 8 months ago

New project: Pre-Maverick Archive.
Populating this project might take 3 weeks.

Posted by jeanphilippeb 8 months ago

These taxon swaps (affecting species in the genus Prosopis) generated new pre-mavericks.
See this observation (as exemplified by @kevinfaccenda). It was Research Grade, and after the swap it became a pre-maverick at the rank Subfamily.

Posted by jeanphilippeb 8 months ago

Thanks for the heads up, I've been waiting for the swap to finish on P. glandulosa before I swap the genus Prosopis into the subfamily.

Posted by kevinfaccenda 8 months ago

The initial run for populating the Pre-Maverick Archive project has completed (after 11 days).

I am fixing the bot and running it again, because it ignored observations currently marked as "the community taxon is as good as it can be". It should not ignore them: it has to check all observations in the "Pre-Maverick" project. (I found this issue because there were too many "species" left in the "Pre-Maverick" project after the initial run.)

Posted by jeanphilippeb 7 months ago

Just under 400K Pre-Mavericks remain
118K resolved and moved to the Archive

Posted by dianastuder 7 months ago

Might you run the Pre-Mavericks for us again before the upcoming CNC?

Posted by dianastuder 23 days ago

I am populating the Pre-Maverick project every day for a few hours, together with the "unknowns" projects for about 10 hours a day.

Posted by jeanphilippeb 23 days ago

Thank you!

Posted by dianastuder 23 days ago
