September 10, 2023

Improving the algorithm?

I wonder if the algorithm should be changed in order to add this observation to the "Pre-Maverick" project:

This observation has Community ID (CID) = Dicots.

If an ID Family Asteraceae is added, then the CID becomes Family Asteraceae and the 1st ID becomes Maverick.
If an ID Montanoa leucantha is added, then the CID becomes Genus Montanoa and the 1st ID becomes Maverick.

So, it's truly a Pre-Maverick (one more ID is enough to make it a Maverick), but the CID cannot be fixed at rank species with only one more ID.

Should similar observations be added to the Pre-Maverick project?

Posted on September 10, 2023 06:19 AM by jeanphilippeb | 16 comments

February 12, 2023

Algorithm for finding the Pre-Maverick observations

Algorithm for finding the Pre-Maverick observations:

  • Reduce the set of candidates with this filter:
    quality_grade=needs_id&identifications=some_agree
  • and with this filter (pre-mavericks below the rank of tribe are numerous and, in my view, do not need special attention: the "experts" reviewing observations can find them, and a disagreement at species rank is generally not easy to resolve):
    lrank=tribe
  • Get 200 candidate observations with this URL + the filters: https://api.inaturalist.org/v1/observations?
  • Get detailed information with this URL + 30 observation IDs: https://api.inaturalist.org/v1/observations/
  • For each observation, extract all the identifications from these detailed infos.
  • Ignore identifications that are not "current" (disabled identifications).
  • Skip the observation if one of the identifications is marked "maverick".
  • For every remaining identification, list the taxon ID and the IDs of all its ancestor taxa.
  • For every remaining identification A, check if it is pre-maverick:
    -1- Count how many identifications are identical to A. Twice this number is how many disagreeing identifications we need to find to confirm that identification A is a pre-maverick.
    -2- Go through the other identifications B and count how many identical identifications B disagree with A.
    -3- Taxa A and B disagree if: A is not B, A is not an ancestor of B, and B is not an ancestor of A.
    -4- If the number of identical identifications B disagreeing with A reaches the threshold (twice the number of identifications A), then we have a pre-maverick;
    -5- otherwise, try again with another B.
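
The steps above can be sketched in Python as follows. This is only a minimal illustration, not the script actually used for the project. It assumes that each identification in the API response carries the fields `current`, `category`, `taxon.id` and `taxon.ancestor_ids`, that the observation IDs are passed comma-separated to the details URL, and that `per_page` and `page` are the paging parameters. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

API = "https://api.inaturalist.org/v1/observations"


def candidates_url(page=1):
    """URL for one page of 200 candidates, using the two filters listed above."""
    params = {
        "quality_grade": "needs_id",
        "identifications": "some_agree",
        "lrank": "tribe",
        "per_page": 200,  # assumed paging parameters
        "page": page,
    }
    return API + "?" + urlencode(params)


def details_url(observation_ids):
    """URL for the detailed records of a batch of observations (30 at a time)."""
    return API + "/" + ",".join(str(i) for i in observation_ids)


def disagree(a, b, ancestors):
    """Step 3: A and B differ and neither is an ancestor of the other."""
    return a != b and a not in ancestors[b] and b not in ancestors[a]


def pre_maverick_taxa(observation):
    """Return the taxon IDs whose identifications are pre-maverick."""
    # Ignore identifications that are not current.
    idents = [i for i in observation["identifications"] if i.get("current")]
    # Skip the observation if it already has a maverick identification.
    if any(i.get("category") == "maverick" for i in idents):
        return []
    # Number of identical identifications per taxon.
    counts = Counter(i["taxon"]["id"] for i in idents)
    # Ancestors of each identified taxon, excluding the taxon itself.
    ancestors = {
        i["taxon"]["id"]: set(i["taxon"].get("ancestor_ids", [])) - {i["taxon"]["id"]}
        for i in idents
    }
    result = []
    for a, n_a in counts.items():
        threshold = 2 * n_a  # step 1
        # Steps 2 to 5: look for one taxon B that disagrees with A often enough.
        if any(disagree(a, b, ancestors) and n_b >= threshold
               for b, n_b in counts.items()):
            result.append(a)
    return result


# Toy observation: one ID of taxon 10, two IDs of the sibling taxon 20.
obs = {"identifications": [
    {"current": True, "category": "improving", "taxon": {"id": 10, "ancestor_ids": [1, 10]}},
    {"current": True, "category": "leading", "taxon": {"id": 20, "ancestor_ids": [1, 20]}},
    {"current": True, "category": "supporting", "taxon": {"id": 20, "ancestor_ids": [1, 20]}},
]}
print(pre_maverick_taxa(obs))  # [10]
```

In the toy observation, taxon 10 has one identification, so the threshold is two, and the two identifications of taxon 20 reach it. One more identification of taxon 20 would therefore make the identification of taxon 10 a maverick.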

This may still be approximate. Should the identifications B (step 4 above) really all be identical (exactly the same taxon)? I have not tested in detail every configuration that can turn an observation into a maverick when one more ID is added.

Posted on February 12, 2023 09:05 AM by jeanphilippeb | 50 comments
