Dear friends!
Yesterday and today I spent some time exploring the web version of the Pl@ntNet database. Pl@ntNet is the main competitor to iNaturalist among apps for automated plant recognition and, more recently, as an active GBIF publisher.
The reason for this exploration is that Pl@ntNet has updated its datasets in GBIF:
Two datasets contain 63,132 records from mainland Russia (https://www.gbif.org/occurrence/search?country=RU&publishing_org=da86174a-a605-43a4-a5e8-53d484152cd3) and 2,640 records from Crimea (https://www.gbif.org/occurrence/search?publishing_org=da86174a-a605-43a4-a5e8-53d484152cd3&gadm_gid=UKR.20_1&gadm_gid=UKR.4_1).
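For readers who want to re-check these counts later, here is a minimal sketch of how the two searches above could be expressed as GBIF API requests. It assumes that the public occurrence search endpoint accepts the camelCase parameters `country`, `publishingOrg` and `gadmGid` and returns a `count` field; the helper name `count_query_url` is hypothetical, and the code only builds the URLs without contacting GBIF.

```python
from urllib.parse import urlencode

# Assumed public GBIF occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"
# Publishing organization key copied from the search links above.
PLANTNET_ORG_KEY = "da86174a-a605-43a4-a5e8-53d484152cd3"


def count_query_url(**filters):
    """Build a search URL with limit=0, so only the total count is requested.

    Hypothetical helper: a filter value may be a single string or a list of
    strings (a list repeats the parameter, as the Crimea link does).
    """
    params = [("limit", 0)]
    for name, value in filters.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        params.extend((name, v) for v in values)
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


# Mainland Russia: country filter plus the Pl@ntNet publisher key.
mainland_url = count_query_url(country="RU", publishingOrg=PLANTNET_ORG_KEY)

# Crimea: the two GADM units used in the link above.
crimea_url = count_query_url(
    publishingOrg=PLANTNET_ORG_KEY,
    gadmGid=["UKR.20_1", "UKR.4_1"],
)

print(mainland_url)
print(crimea_url)
```

Opening either URL in a browser (or fetching it and reading the `count` field of the JSON response) should give the current number of records, which will drift from the figures quoted here as the datasets are updated.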
Accordingly, I got acquainted with their database through its (single) entry point: https://identify.plantnet.org/. General impression: Pl@ntNet is large and very promising, but it is absolutely impossible to work with (yet?). I feel sorry for the people who upload their data to Pl@ntNet, because it is a completely one-way process: you can put something in, but you cannot get anything out. However, most Pl@ntNet users use it exclusively as an automatic identifier, without even knowing that their data goes somewhere and that something can be done with it.
Now, some particular issues (of which there are sooooo many). I will try to be objective, although the objective reality is definitely on the side of iNaturalist. I advise the curious to register on Pl@ntNet and try to reproduce their iNaturalist activities in the new environment. This is roughly what I did. Observations and short notes are given below.
1) Projects. You can create them, but only by contacting the Pl@ntNet head office. There are few projects: 6 general and 19 geographic. Micro-projects have only just started to be developed; there are eight of them. The list is here: https://identify.plantnet.org/. General and geographic projects have partner organizations.
2) There is no single entry point to the whole database (something like https://www.inaturalist.org/observations). The "Explore" buttons are located in the project panels. At the same time, it is unclear how the projects relate to each other: the "World Flora" project, which should include everything, has 4,258,944 images, but there are 10,429,008 entries in GBIF. My guesses are below.
3) In an "observation" within the Pl@ntNet database (example - https://identify.plantnet.org/the-plant-list/observations/1006029539, the same record in GBIF - https://www.gbif.org/occurrence/2974487841), the locality and even the administrative unit are hidden, but they are accessible through GBIF if the observation is there. It is unclear how coarse this location is. For example, on species maps (for example, https://identify.plantnet.org/the-plant-list/species/Asphodeline%20lutea%20(L.)%20Rchb./data) points are mapped with an accuracy of 1 km, with a disclaimer.
4) In general, an observation on Pl@ntNet contains very little information: a photo, a species name, an anonymous identification function, a rough quality assessment (for / against), an author and a date. There is a tab with a discussion of the record. That's all.
5) The Pl@ntNet web version is available in three languages: English, Chinese and French. There is no database of national (e.g. Russian) vernacular plant names.
6) To identify something, you need to choose a species, scroll to a mosaic of photos and flip through them, moving, if necessary, from a photo to its observation, where you can enter the correct name. And, let me remind you, there are no geographic filters.
7) Cultivated and wild plants are mixed. You cannot separate one from the other manually. Seems anachronistic.
8) Pl@ntNet data in GBIF consist of two datasets. The first, https://www.gbif.org/dataset/7a3679ef-5582-4aaa-81f0-8c2545cafc81, has 794K observations from the database, with an "o-" index in the identifier. The second, https://www.gbif.org/dataset/14d5676a-2c54-4f94-9023-1e8dcd822aa0, has 9.6M observations that are not in the database: these are ephemeral queries for automatic identification from users who did not save the image to the database afterwards, but for which the machine identified the plant with a confidence of 90% or higher; they have a "q-" index in the identifier. The murkiest part of this story is whether Pl@ntNet stores the photographs of such ephemeral queries at all (in the summer, their number was estimated at 150M over its entire existence) or only a short log. In any case, it is impossible to verify the identifications in the second dataset: there are no pictures for them either in GBIF or in Pl@ntNet.
9) Data reliability check. I took 48 consecutive observations of the flora of Krasnodar Krai (Black Sea coast, Russia) from the first Pl@ntNet dataset in GBIF (out of 173 observations listed as "manually verified"). Incorrectly identified: 13 observations (27%). Certainly cultivated: 12 observations (25%). You cannot go into the database and mark the cultivated ones; there is simply no such function. Taxonomy: while trying to make correct identifications, I could not find Erythronium caucasicum or Rubus hirtus, even among synonyms. There is only one accepted Erythronium at all. But I did manage to pull Galeobdolon caucasicum out of their database.
10) The taxonomy includes the authors of taxon names. This is absolutely right. This is what we are missing on iNaturalist!
11) Another unambiguous advantage of Pl@ntNet: when the AI engine runs, its confidence is shown as a percentage. I tried it on a well-photographed Adoxa moschatellina (https://identify.plantnet.org/the-plant-list/observations/1003554108): 99.79% confidence, no other options suggested. iNaturalist, as you know, will show you "We are almost sure that this belongs to the genus Adoxa" and Adoxa moschatellina as the top species, followed by seven more stray species. On Pl@ntNet, in unclear cases, ten species are offered (with confidence percentages), then you can see the next ten, and so on. This point is definitely better worked out on Pl@ntNet, although the suggestion of a genus, family or tribe on iNaturalist is a very strong feature for species unknown to the AI.
12) Only 4 photos per observation are possible on Pl@ntNet - no more. This is rather a minus. It is mandatory to indicate which part of the plant each photo shows (an organ or a general view).
13) I dug up the photography guidelines for Pl@ntNet: formally, the quality requirements are high (but scrolling through the database I realized that they are not always followed). Here is the official infographic with puking emoticons: https://plantnet.org/wp-content/uploads/2017/08/bonne_photoen@0.5x.png. Extremely controversial, especially the postulate: "This is not a plant, this is a hand holding a plant". How much time and effort I have saved by photographing spikelets of grasses and sedges pinched between my fingers!
14) Let's go back to point 2. How is the number of photos in a project counted? Apparently, they add up the photographs of all species from the project's base checklist (for example, all photographs from all over the world of all species listed for the USA). This is how the Comoros project ends up with 426.5K photos of 698 species. But in fact... just 22 photos in GBIF (https://www.gbif.org/occurrence/search?country=KM&dataset_key=7a3679ef-5582-4aaa-81f0-8c2545cafc81).
15) No project journals, no personal journals, no forum, no statistics of any kind, no csv uploads, no shapefile uploads, no calendar, no profile, no notifications or dashboard, no bulk uploads, no bioblitzes.
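The percentages in point 9 come from a small sample, so it may help to see how wide the uncertainty around them is. The sketch below recomputes the two shares from the counts given there (13 and 12 out of 48) and adds an approximate 95% Wilson score interval. The interval is an illustration added here, not part of the original check, and it assumes the 48 observations behave like a random sample, which consecutive records may not.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits / n."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


SAMPLE_SIZE = 48  # observations checked in point 9
counts = {"incorrectly identified": 13, "certainly cultivated": 12}

# Plain shares, as quoted in point 9.
shares = {label: hits / SAMPLE_SIZE for label, hits in counts.items()}
# Uncertainty around each share (illustrative assumption, see above).
intervals = {label: wilson_interval(hits, SAMPLE_SIZE) for label, hits in counts.items()}

for label in counts:
    low, high = intervals[label]
    print(f"{label}: {shares[label]:.0%} (interval roughly {low:.0%} to {high:.0%})")
```

With only 48 records the intervals are wide, so the figures are best read as an order of magnitude rather than a precise error rate for the whole dataset.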
In conclusion, I just want to say: comrades, if any of your friends and acquaintances use Pl@ntNet, they need to be dissuaded from doing so, urgently and by any means. These people can contribute much more to science with the help of iNaturalist. A couple of years ago, Pl@ntNet was much better at AI identification of plants from Russia, since it "grew" out of France. Now iNaturalist has, on average, caught up in the quality of automatic identifications, and for Russia it makes them much more confidently, thanks to the noticeable growth of its database and its use of geographic hints (Pl@ntNet has none).
I may add to this post later. The link to this post can be shared to convince doubters.
Comments
Good comparison.
The Pl@ntNet database follows a somewhat different taxonomy than the one used here; for example, the genus Pulsatilla is absent because it is included in Anemone.
Perhaps an off-topic question, but is there a Russian-language forum or Facebook group for discussing how iNat works? For example, yesterday the photo gallery for species stopped opening. What should I do?
I don't know of a dedicated forum or Facebook group; however, there is a GBIF group: https://www.facebook.com/groups/477172382940229/, and in my opinion iNat problems can be discussed there. Or write on the English-language forum, here on iNat.
Thanks for the information.
Good comparison of the two platforms! You are a very thorough investigator to have found my old blog article reviewing the two platforms. I continue to use iNaturalist with my students for a reason not cited above. iNaturalist can handle any organism, not just plants. As a result my students can use iNaturalist in their other life science courses, such as marine biology and general biology. iNaturalist also connects my students to identifiers in the region under whom they could eventually study as they move on in college. Thanks for the note on my blog!