User:Olafbot

Category:Commons bots#Olafbot Category:Commons wiki.java bots#Olafbot Category:Commons upload bots#Olafbot
  • Operator: This bot is operated by Olaf. The best way to contact me is to write on my page on Polish Wiktionary or to send an e-mail.
  • Tasks: Maintaining lists for Lingua Libre. The bot edits only pages prefixed with Commons:Lingua Libre/ or User:Olafbot/
  • Operation: Bot runs automatically
  • When: Usually once a day, but in the future it may run a few times a day
  • Maximum edit rate: the theoretical limit is one edit every 5 seconds, but preparing the data for each list usually takes longer
  • Language: Java; own code using a modified version of the MER-C library
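
The two constraints above (the allowed page prefixes and the 5-second minimum interval between edits) can be pictured as a small guard. This is only an illustrative sketch: the class and method names are hypothetical and are not taken from the bot's code or from the MER-C library.

```java
import java.util.List;

/** Hypothetical guard; all names here are illustrative, not the bot's real code. */
final class EditPolicySketch {

    /** Page-title prefixes the bot is allowed to edit (from the task description). */
    static final List<String> ALLOWED_PREFIXES =
            List.of("Commons:Lingua Libre/", "User:Olafbot/");

    /** Minimum interval between two edits: 5 seconds. */
    static final long MIN_INTERVAL_MS = 5_000;

    /** Returns true only for titles under one of the allowed prefixes. */
    static boolean mayEdit(String title) {
        for (String prefix : ALLOWED_PREFIXES) {
            if (title.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Milliseconds still to wait before the next edit is allowed. */
    static long waitMillis(long lastEditMs, long nowMs) {
        return Math.max(0, MIN_INTERVAL_MS - (nowMs - lastEditMs));
    }
}
```

In practice the wait is rarely reached, because preparing each list takes longer than the interval.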

Current work

The bot continuously updates various lists of missing audio recordings for Lingua Libre. The lists can be selected using the "Local list" button in the Lingua Libre recorder.

The bot maintained these lists on the Lingua Libre wiki from 2021 and has now been migrated to Commons along with the rest of the project. It is much more active on Polish Wiktionary.

Lists named "Entries-without-audio-sorted-by-number-of-wiktionaries" are created in the following way:

  • For a given language, the bot traverses categories on all Wiktionaries and a few open dictionaries and collects statistics: for each lemma, it counts the dictionaries that describe this word in this language. The bot has been doing this since 2011 to generate various lists for Polish Wiktionary.
  • Titles written in wrong alphabets are removed.
  • Except for German, titles containing uppercase letters are removed, because a bug in Lingua Libre makes recording uppercase entries problematic.
  • Entries that already have an audio recording on Commons are also removed from this set. This covers not only files created with Lingua Libre (LiLi), but also other recordings found in the "pronunciation" category for the given language or in its subcategories.
  • For a few languages, minor corrections are applied in order to extract the set of dictionary entries, if possible without inflected forms.
  • Items from a corresponding exclusion list (see below) are removed.
  • The resulting list is sorted in descending order by the number of dictionaries and limited to 380 entries.
  • The lists are refreshed about once a day.
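
The filtering and ranking steps above can be sketched roughly as follows. This is a simplified illustration, not the bot's actual code: the class name, the method signature and the inputs are hypothetical, the per-dictionary counts are assumed to be collected already, the "wrong alphabet" test is assumed to be a Unicode script check, the language-specific corrections are omitted, and the alphabetical tie-break is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the list-building steps; not the bot's real code. */
final class MissingAudioListSketch {

    /** Maximum number of entries kept in a published list. */
    static final int LIMIT = 380;

    /**
     * @param counts        lemma -> number of dictionaries describing it (assumed precomputed)
     * @param scripts       scripts accepted for the language (assumed "wrong alphabet" test)
     * @param recorded      titles that already have a pronunciation file on Commons
     * @param excluded      titles from the per-language exclusion list
     * @param keepUppercase true for German, where titles with uppercase letters are kept
     */
    static List<String> build(Map<String, Integer> counts,
                              Set<Character.UnicodeScript> scripts,
                              Set<String> recorded,
                              Set<String> excluded,
                              boolean keepUppercase) {
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            String title = entry.getKey();
            // Drop titles with letters outside the accepted scripts.
            boolean wrongScript = title.codePoints()
                    .filter(Character::isLetter)
                    .anyMatch(cp -> !scripts.contains(Character.UnicodeScript.of(cp)));
            if (wrongScript) continue;
            // Drop titles with uppercase letters, unless the language is exempt.
            if (!keepUppercase && title.codePoints().anyMatch(Character::isUpperCase)) continue;
            // Drop titles that already have audio or are on the exclusion list.
            if (recorded.contains(title) || excluded.contains(title)) continue;
            kept.add(entry);
        }
        // Most widely covered lemmas first; ties broken alphabetically (an assumption).
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : kept) {
            if (result.size() == LIMIT) break;
            result.add(entry.getKey());
        }
        return result;
    }
}
```

Passing the scripts as a set rather than a single value leaves room for languages written in more than one script.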

Lists maintained: afr, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue.

Sometimes a list may contain an erroneous entry. There is no point in removing it from the list manually, because the bot will add it again on the next pass. Instead, add the erroneous word to the exclusion list, which is maintained separately for each language. Items on an exclusion list are removed automatically from the corresponding "Entries" list.

The exclusion lists: afr, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue

Previous work

This user is in fact a bot.