Predicting species observations on iNat

One of my 2017 goals is to learn a new way to contribute to iNat. I don't know if this venture will achieve that aim, but I would like to try.

I do iNaturalist entirely for fun, but I'm an ecologist by training and work in wildlife conservation, so my orientation is to think about when and where species occur, and how that information could be applied or converted into useful metrics. I'm weird that way! Here are some wonky questions that I think about:

-- How could you calculate the probability that an iNat observation will be recorded for a particular species during a visit to a particular place? During a particular month?
-- What predictive value might an observation probability have? How could you define predictive value?
-- How might the predictive value differ along a gradient of common to rarely observed species? Along a gradient of frequently to rarely visited places?
-- What is the minimum number of iNat observations necessary to calculate a probability with sufficient predictive value? The minimum number of iNat users?
-- How does species presence relate to species detection relate to species iNat observation?
-- How does the profile of an individual iNat user's activity change the observation probability? What are some of the reasons why an iNat user might detect a species but not make an iNat observation?
-- How could you calculate the probability of an iNat observation in a place where a species has not been observed yet?
-- How could existing range maps improve the calculation of species probabilities? Maps of habitat, ecosystem, and other bio-physical and human footprint features?
-- How could you generate a range map based on iNat observations and observation probabilities?
-- How could you use probability of species observations to incentivize iNat users to go outside? To incentivize visits to particular places? To look for particular species at particular times?
-- So what? Outside of iNat, how might this information actually be used by people interested in nature, science, philanthropy, and conservation?

If history is any guide, Scott @loarie or someone else will gently tell me that this feature is already under development, or some other group or entire discipline of scientists is already pursuing these questions. And at least in the latter case, that's absolutely correct! There is a LOT of literature and experts on species detection probabilities, species distribution models, and using Bayesian probability statistics to map where species occur. (Experts that include our very own Scott and probably several others on here!) So, acknowledging up front that I'm out of my depth and at high risk of looking foolish, I am putting my belief out there for your scrutiny that pursuing these questions in the context of iNaturalist would add substantial value to existing iNat activity and has the potential to make a novel contribution.

So, is anyone else curious about this stuff?


[images: Southern Leopard Frog (Lithobates sphenocephalus) and Rabid Wolf Spider (Rabidosa rabida) observed in McKinney Roughs Nature Park, TX; Locust Borer Beetle (Megacyllene robiniae) in Sky Meadows State Park, VA]

Posted on January 28, 2017 02:22 AM by muir

Comments

Matt I think this is a very interesting idea and I would love to participate in this.

Posted by vijaybarve about 7 years ago

Matt, I believe the distribution of the species is important and I would take into account all available infos on that.

But also the different observers and observer communities are important for such a prediction. Some people contribute a lot from all taxa, others a few records from the groups they know well. For some taxa, such as most vertebrates, the community will find an ID very fast, others may struggle to find someone being able to confirm an ID. iNat also seems to be geographically unevenly distributed, due to various reasons like language, internet access, other well established platforms, etc.

Finally, from our experience in the African Plants photo guide and this is much the same for plant records here, I have the impression that photo observations more or less ignore some groups like grasses and sedges, that are often impossible to identify from photos and even hard to get a decent photo of, but on the other hand photos rather overrepresent impressive trees, showy flowers etc. So it may be interesting also to take systematic groups and traits into account in that analysis. (For Burkina, we once compared records from collections and vegetation surveys, and the patterns showed quite different preferences).

I would be interested to contribute, if I can.

Posted by marcoschmidtffm about 7 years ago

Many thanks @vijaybarve @marcoschmidtffm for the encouraging comments. In future journal posts, I will start tagging you and others who express an interest (just message me if you would prefer that I do not).

"I would love to participate in this." Thanks, Vijay. Let's talk. Anyone who is interested in working on this is free to email me at my gmail address: muirmatthewj

"I believe the distribution of the species is important and I would take into account all available infos on that." Absolutely. But I would like to start with just the information available/generated on iNat. I suspect there are more and better efforts out there creating distribution maps based on multiple sources of data. I am curious to see how far one can go with just iNat data.

"But also the different observers and observer communities are important for such a prediction." Agreed. Understanding how individuals differ in their observations (i.e., what I would refer to when describing or defining a user's "profile") is important to this effort. I am curious, however, if there is a particular number of observers where individual tendencies start to average out and matter less. Related questions copied from above: "What is the minimum number of iNat observations necessary to calculate a probability with sufficient predictive value? The minimum number of iNat users? How does the profile of an individual iNat user's activity change the observation probability?"

"Some people contribute a lot from all taxa, others a few records from the groups they know well. For some taxa, such as most vertebrates, the community will find an ID very fast, others may struggle to find someone being able to confirm an ID. iNat also seems to be geographically unevenly distributed, due to various reasons like language, internet access, other well established platforms, etc. ... it may be interesting also to take systematic groups and traits into account in that analysis." Agreed absolutely again. For species that are challenging to ID from photos, or lack expertise on iNat, or for places with weak iNat coverage, this effort is unlikely to help. It will be better, I think, to start by trying to predict common/conspicuous/charismatic species in regularly-visited places. Admittedly, that may not be of great interest to people who are more orientated to rare and neglected taxa, or who are making observations in an area with few other iNat users. I am interested, however, in how one might change the motivation or incentives for people to stop ignoring things like grasses and sedges, and to fill in data gaps in space and time. There are surely lots of ways to do that which are totally unrelated to this proposal, but I have tried to capture it in a question at the end: "How could you use probability of species observations to incentivize iNat users to go outside? To incentivize visits to particular places? To look for particular species at particular times?" I'm interested to learn more about your experience with the African Plants photo guide.

Posted by muir about 7 years ago

A couple of proposed next steps:

Download the data: http://www.inaturalist.org/observations/export
-- My instinct is to start with a United States dataset, 2013-2016. Restrict analysis to US counties with at least one iNat observation. Divide remaining counties into five groups based on iNat activity to capture a range of regularly-visited to rarely-visited places. Within each group, randomly select 10 counties for analysis. Follow a similar process to randomly select species to capture a range of frequently- to rarely-observed species. Comments/suggestions/critiques welcome on the right dataset for this effort.
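
For anyone who wants to experiment, the county selection described above could be sketched roughly as follows in Python. This is only an illustration under assumptions that are not part of the proposal: "iNat activity" is measured as the number of observations per county, the five groups are equal-sized bins by rank (quintiles), and the function name, parameters, and toy data are all hypothetical. The same approach could be reused with species names instead of county names.

```python
import random
from collections import Counter

def stratified_county_sample(observation_counties, n_groups=5, per_group=10, seed=0):
    """Pick counties across a range of activity levels (hypothetical helper).

    observation_counties: iterable with one county name per observation.
    Returns n_groups lists of county names, least active group first.
    """
    # Counties with zero observations never appear in the input,
    # so the "at least one observation" restriction is automatic.
    counts = Counter(observation_counties)
    # Rank by observation count; ties broken by name so results are reproducible.
    ranked = sorted(counts, key=lambda c: (counts[c], c))
    rng = random.Random(seed)
    groups = []
    for g in range(n_groups):
        # Assumption: equal-sized rank bins (quintiles when n_groups == 5).
        lo = g * len(ranked) // n_groups
        hi = (g + 1) * len(ranked) // n_groups
        members = ranked[lo:hi]
        # Take every county if a bin holds fewer than per_group.
        groups.append(rng.sample(members, min(per_group, len(members))))
    return groups

# Toy data: county "C00" has 1 observation, "C01" has 2, ..., "C99" has 100.
toy = [f"C{i:02d}" for i in range(100) for _ in range(i + 1)]
sample = stratified_county_sample(toy)
```

Equal-sized bins are only one choice. Because iNat activity is likely very uneven between counties, bins based on observation-count thresholds might capture the "rarely visited" end better.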

Start a journal post on terminology to refine/improve the framing and concepts behind some of the questions that I and others are interested in, and to link to existing definitions.
-- Terms or units of analysis that I haven't discussed so far, but I suspect are fairly essential to define, are effort and trips. My instinct is that if someone makes 1 observation, or 50, of the same species in a single day in a single place, it should count as the same amount of effort --> one species / day trip to Place-X.
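
As a starting point for that terminology discussion, here is a minimal Python sketch of the "one species / day trip" idea. It assumes a trip means one user in one place on one calendar day, and that each record is a (user, place, date, species) tuple. Those definitions, the function names, and the toy records are illustrative assumptions, not an agreed standard.

```python
from collections import defaultdict

def collapse_to_trips(observations):
    """Reduce raw records to unique (user, place, date, species) combinations.

    Fifty records of one species by one user on one day in one place
    collapse to a single entry, exactly as one record would.
    """
    return set(observations)

def observation_rate(observations, place, species):
    """Share of recorded day trips to `place` on which `species` was observed."""
    species_by_trip = defaultdict(set)
    for user, obs_place, date, obs_species in collapse_to_trips(observations):
        if obs_place == place:
            # Assumption: a trip is one user on one day in one place.
            species_by_trip[(user, date)].add(obs_species)
    if not species_by_trip:
        return None  # no trips recorded for this place, so no estimate
    hits = sum(species in seen for seen in species_by_trip.values())
    return hits / len(species_by_trip)

# Toy records (invented for illustration only).
records = [
    ("ann", "Place-X", "2016-05-01", "Southern Leopard Frog"),
    ("ann", "Place-X", "2016-05-01", "Southern Leopard Frog"),
    ("ann", "Place-X", "2016-05-01", "Southern Leopard Frog"),
    ("ann", "Place-X", "2016-05-01", "Rabid Wolf Spider"),
    ("bob", "Place-X", "2016-05-01", "Rabid Wolf Spider"),
    ("ann", "Place-X", "2016-06-10", "Rabid Wolf Spider"),
    ("bob", "Place-Y", "2016-06-10", "Southern Leopard Frog"),
]
frog_rate = observation_rate(records, "Place-X", "Southern Leopard Frog")
```

One caveat with this kind of naive rate: a trip only shows up in the data if at least one observation was made, so visits where nothing was recorded are invisible and the rate will tend to run high.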

Posted by muir about 7 years ago
