User:DOPBot

Signalment
- Operator: User:Fl.schmitt
- Tasks:
- DOPBot will add precise coordinates (SDC coordinates of the point of view (P1259)) to ~ 74,000 Orthophotos kindly provided by the Bavarian Agency for Digitisation, High-Speed Internet and Surveying. The files are currently organized by districts in Orthophotos from Bayerische Vermessungsverwaltung.
- Currently, every file has the district's geographical center assigned as its coordinates of the point of view (P1259) value. The same location is sometimes also hard-coded in the page text using {{Location}}. Using the district's geographic center for every file renders the geocoding almost useless (the real location might be off by about 10 kilometers).
- To determine the exact location, DOPBot will check the file names for UTM coordinates. E.g., for DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif, the UTM values are 32 / 643 / 5488, which reads 32U 643000 5488000 as the UTM coordinate of the lower left corner of the orthophoto. The corresponding decimal geolocation is 49°31′40″N 10°58′34″E / 49.527730°N 10.976109°E.
- For exact geocoding, DOPBot will use the center of each image to calculate the coordinates, which is very easy: just add 500 m (half the 1 km tile edge) to the UTM northing/easting values. For geocoding DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif, DOPBot will use the UTM coordinates 32U 643500 5488500, yielding the exact decimal geolocation of the orthophoto: 49°31′56″N 10°59′00″E / 49.532107°N 10.983196°E.
- Additionally, DOPBot will clear the parameters of any existing {{Location}} / {{Object location}} template. If those templates are missing, DOPBot will add them without any parameter. Thus, the SDC values will be visible in the page text, too.
- DOPBot won't touch an orthophoto at all if there are no UTM coordinates available.
- DOPBot is able to determine the name of the administrative area (country, subdivisions like district and so on) for each geolocation found ("reverse geocoding"). Thus, DOPBot may add location-specific categories to the geocoded files (this still needs manual configuration in the bot's source code). To do so, DOPBot runs spatial queries using the GeoPandas library over NUTS data provided by the EU. This approach is faster than querying the Overpass API of OpenStreetMap.
- Operation: automatic
- When: one-time for a fixed set of files.
- Maximum edit rate: 4-5 file page edits per minute
- Language: Python
- Source code: https://gitlab.wikimedia.org/toolforge-repos/dopbot
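The template handling described in the tasks above (clear existing {{Location}} / {{Object location}} templates, add an empty one if none exists) could look roughly like the following. This is a minimal sketch, not DOPBot's actual code: the function name `normalize_location_template` is hypothetical, the regular expression assumes the template parameters contain no nested templates, and appending the new template at the end of the page is an assumption about placement.

```python
import re

# Matches {{Location|...}} and {{Object location|...}}.
# Assumption: the parameters contain no nested templates (no "{" or "}").
LOCATION_RE = re.compile(
    r"\{\{\s*(Location|Object location)\s*(\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def normalize_location_template(wikitext: str) -> str:
    """Strip parameters from location templates, or add an empty one."""
    if LOCATION_RE.search(wikitext):
        # Keep the template name as written, drop all parameters.
        return LOCATION_RE.sub(lambda m: "{{" + m.group(1) + "}}", wikitext)
    # No location template found: append an empty {{Location}}
    # (placement at the end of the page is an assumption).
    return wikitext.rstrip("\n") + "\n{{Location}}\n"
```

A wikitext parser would be more robust than a regular expression once templates are nested; the regex only keeps the sketch short.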
Details and Limitations
Positional encoding in file names / Geocoding
DOPBot is currently able to detect positional encoding in the following formats:
- UTM (e.g. "Dop20rgbi 32 425 6004 1 sh 2024")
- EPSG:25832 / EPSG:25833 (e.g. "Bayerische Vermessungsverwaltung - CIR20 - 639000 5511000")
- NAIP Entity IDs (e.g. "M 4207002 ne 19 060 20211024").
DOPBot tries to extract those coordinate values from file names using regular expressions. Python's utm and pyproj libraries are then used to calculate the lat/lon values (WGS84) required for geocoding on Commons. The lat/lon values are saved as SDC.
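As an illustration of the extraction step, here is a minimal sketch for the kilometre-based UTM pattern used in the Bavarian DOP40 file names. The pattern `UTM_KM_RE` and both function names are hypothetical and not taken from the bot's source; the 1 km tile size is an assumption that holds for this dataset only. The conversion relies on `utm.to_latlon` from the third-party utm package, imported lazily so that the parsing part runs without it.

```python
import re

# Hypothetical pattern: two-digit zone and three-digit easting (km) fused
# together, then a separator, then a four-digit northing (km),
# e.g. "32643_5488" or "32645 5496".
UTM_KM_RE = re.compile(r"(?<!\d)(\d{2})(\d{3})[ _](\d{4})(?!\d)")


def tile_center_utm(file_name, tile_size_m=1000):
    """Return (zone, easting, northing) of the tile center in metres,
    or None if the file name carries no UTM coordinates."""
    match = UTM_KM_RE.search(file_name)
    if match is None:
        return None
    zone, easting_km, northing_km = (int(group) for group in match.groups())
    half = tile_size_m // 2  # offset from the lower left corner to the center
    return zone, easting_km * 1000 + half, northing_km * 1000 + half


def utm_to_wgs84(zone, easting, northing):
    """Convert a northern-hemisphere UTM coordinate to (lat, lon) in WGS84."""
    import utm  # third-party package, imported lazily

    return utm.to_latlon(easting, northing, zone, northern=True)
```

For example, `tile_center_utm("DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif")` returns `(32, 643500, 5488500)`, which can then be passed to `utm_to_wgs84`. Returning `None` for unmatched names mirrors the rule that files without coordinates are left untouched.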
Reverse geocoding
To avoid overcrowded top-level categories, DOPBot is able to add specific location-based categories. Thus, it's possible to organize the DOPs in, e.g., district-level categories (for example: Orthophotos of Landkreis Forchheim). This requires a "reverse geocoding" ability. The first approach consisted of queries against OpenStreetMap's Overpass API, trying to determine the administrative area (level 6) for each processed file. Since this turned out to be quite slow, DOPBot now uses GeoPandas and shapely with locally available NUTS GeoPackages (DE/EU) provided by the Federal Agency for Cartography and Geodesy and the European Commission (GISCO). This approach avoids remote API calls but requires some additional work, since the NUTS area names don't always fit the naming scheme for categories on Commons.
For areas outside the European Union, DOPBot uses GeoPackages built by extracting boundaries from OpenStreetMap dumps (pbf files) using ogr2ogr.
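A point-in-polygon lookup of this kind can be sketched as follows. This is not DOPBot's implementation: the function names, the GeoPackage path passed in, and the override table are hypothetical, and the column names `LEVL_CODE` and `NUTS_NAME` are assumptions about the layout of the GISCO NUTS files. GeoPandas and shapely are imported lazily inside the functions that need them.

```python
def load_areas(gpkg_path, level=3):
    """Load NUTS areas of one level (3 = district level in Germany) once."""
    import geopandas as gpd  # third-party, imported lazily

    areas = gpd.read_file(gpkg_path).to_crs("EPSG:4326")
    # Assumption: the file has a LEVL_CODE column as in GISCO NUTS data.
    return areas[areas["LEVL_CODE"] == level]


def find_area_name(areas, lat, lon):
    """Return the name of the area containing the point, or None."""
    from shapely.geometry import Point  # third-party, imported lazily

    hits = areas[areas.contains(Point(lon, lat))]  # note the x/y order
    return None if hits.empty else hits.iloc[0]["NUTS_NAME"]


def category_for(area_name, overrides, prefix="Orthophotos of "):
    """Map an area name to a Commons category, using manual overrides
    where the NUTS name does not match the Commons naming scheme."""
    return prefix + overrides.get(area_name, area_name)
```

With a hypothetical override table such as `{"Forchheim": "Landkreis Forchheim"}`, `category_for("Forchheim", ...)` gives "Orthophotos of Landkreis Forchheim". Loading the areas once and reusing the frame for every file is what makes the local approach cheaper than one remote query per file.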
Current limitations
- DOPBot is currently restricted to square images (extending it to all rectangular media should be low-hanging fruit).
- It can't reliably detect the offset required to calculate the image's center, since DOPs are provided in different resolutions. So, doing some test runs for new datasets is unavoidable.
- There's no single/unified data source for world-wide admin boundaries yet.
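If the pixel dimensions and the ground resolution of a file were known, the center offset mentioned above could be derived instead of configured per dataset, and rectangular images would be covered too. The following is only a sketch under that assumption: the function names are hypothetical, and the pixel dimensions and resolution would have to come from file metadata or dataset documentation, which the bot does not currently read.

```python
def center_offset_m(width_px, height_px, resolution_m):
    """Offset (east, north) in metres from the lower left corner to the
    center of an image with the given pixel size and ground resolution."""
    return width_px * resolution_m / 2, height_px * resolution_m / 2


def tile_center(easting_ll, northing_ll, width_px, height_px, resolution_m):
    """UTM coordinates of the image center, given its lower left corner."""
    d_east, d_north = center_offset_m(width_px, height_px, resolution_m)
    return easting_ll + d_east, northing_ll + d_north
```

As an example under these assumptions, a 2500 × 2500 pixel image at 0.4 m per pixel covers 1 km × 1 km, so the offset is about 500 m in each direction, matching the fixed offset used for the Bavarian tiles.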
ToDo
Features
- ✅ Handle bawü filenames (additional whitespace): done;
- ✅ detect district for bawü DOPs
- ✅ add {{Map}} template for BoundingBox at correct position of FilePage: implemented but currently not active;
- add administrative_layer(5) (Regierungsbezirke) in by and bawü? ✅ for BY, todo for other states.
- Bounding Box??? Test file: DOP40 - Stadt Erlangen 32645 5496 (Bayerische Vermessungsverwaltung).tif as example using both {{Information}} and {{Map}} as file description, while using only the latitude/longitude and warp_status parameters of {{Map}}. At least, it seems to be possible to set the coords of the four corners exactly, but sadly, it seems that the Wikimap Warper can't handle TIFF files. Thus, setting warp_status to skip is required, and there's currently no way to show the DOP as a map "layer", or to make use of the "bounding box" otherwise.
- ✅ Reduce OSM queries? Save admin level for known UTM coordinates? DB? Solved: use NUTS shapefiles with GeoPandas for spatial queries.
- ✅ Check corners if center is outside current area? Solved: use Geopandas nearest() query;
- Category management is clumsy and requires adaptation of the bot code - needs improvement!
- ✅ drop OSM API;
- add local boundary data for other countries;
- ✅ keepcountry / keepstate feature!
Content
Finished
- Bavaria:
- Bavaria DOP40 done ✅
- Bavaria DOP20 done ✅
- BY DOP20 CIR: done ✅
- BaWü: done ✅
- Saxony: done ✅
- Lower Saxony:
- Lower Saxony (RGB): done ✅
- Lower Saxony (RGBI): done ✅
- Brandenburg: done ✅
- North Rhine-Westphalia:
- North Rhine-Westphalia (RGB): done ✅
- North Rhine-Westphalia (RGBI): done (waiting for further uploads) ✅
- Berlin:
- Berlin INSPIRE RGB 2025: done ✅
- Berlin RGB 2007: done ✅
- Hamburg: done ✅
- Bremen:
- Bremen RGB 2025: done ✅
- Bremen CIR 2025: done ✅
- Schleswig-Holstein DOP20 RGBI: done ✅