User:DOPBot

Signalment
- Operator: User:Fl.schmitt
- Tasks:
- Initially, DOPBot was written to add precise coordinates (SDC coordinates of the point of view (P1259) and coordinates of depicted place (P9149)) to ~74,000 orthophotos kindly provided by the Bavarian Agency for Digitisation, High-Speed Internet and Surveying. The files were organized by district in Orthophotos from Bayerische Vermessungsverwaltung.
- Every file had the district's geographical center assigned as its coordinates of the point of view (P1259) value. The same location was sometimes also hard-coded in the page text using {{Location}}. Using the district's geographic center for every file rendered the geocoding almost useless (the real location might be off by about 10 kilometers).
- To determine the exact location, DOPBot checks the file names for UTM coordinates. E.g., for DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif, the UTM values are 32 / 643 / 5488, which read as the UTM coordinate 32U 643000 5488000 for the lower left corner of the orthophoto. The corresponding decimal geolocation is 49°31′40″N 10°58′34″E / 49.527730°N 10.976109°E.
- For exact geocoding, DOPBot uses the center of each image to calculate the coordinates, which is straightforward for these 1 km tiles: just add 500 (meters) to the UTM easting and northing values. For geocoding DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif, DOPBot will use the UTM coordinates 32U 643500 5488500, yielding the exact decimal geolocation of the orthophoto: 49°31′56″N 10°59′00″E / 49.532107°N 10.983196°E.
- Additionally, DOPBot will clear the parameters of any existing {{Location}} / {{Object location}} template. If those templates are missing, DOPBot will add them without any parameters. Thus, the SDC values will be visible in the page text, too.
- DOPBot won't touch an orthophoto at all if there are no UTM coordinates available.
- DOPBot is able to determine the name of the administrative area (country, subdivisions like district and so on) for each geolocation found ("Reverse geocoding" – see below).
- After finishing its initial task, DOPBot has continued to georeference DOPs from other German states as well as NAIP imagery from the USA.
- Operation: automatic
- When: one-time for a fixed set of files.
- Maximum edit rate: 4-5 file page edits per minute
- Language: Python
- Source code: https://gitlab.wikimedia.org/toolforge-repos/dopbot
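The filename-to-coordinate steps described under Tasks (read the UTM values from the name, add 500 m to reach the tile center, convert to WGS84 lat/lon) can be sketched in plain Python. This is a minimal sketch, not DOPBot's code: the bot uses the utm and pyproj libraries, whereas this example swaps in Snyder's inverse transverse Mercator series so that it runs with the standard library alone. The pattern `TILE_RE` and the function names are hypothetical, the pattern only covers names shaped like the Nürnberg example, and the conversion assumes the northern hemisphere.

```python
import math
import re

# WGS84 ellipsoid and UTM scale factor
A = 6378137.0                # semi-major axis (m)
F = 1 / 298.257223563        # flattening
E2 = F * (2 - F)             # first eccentricity squared
EP2 = E2 / (1 - E2)          # second eccentricity squared
K0 = 0.9996                  # UTM central-meridian scale factor

# Hypothetical pattern for names like "..._32643_5488_(...)":
# 2-digit zone + 3-digit easting in km, then 4-digit northing in km.
TILE_RE = re.compile(r"_(\d{2})(\d{3})_(\d{4})_")


def lower_left_from_name(name):
    """Return (zone, easting_m, northing_m) of the lower-left corner, or None."""
    m = TILE_RE.search(name)
    if m is None:
        return None  # no UTM values: the file is left untouched
    zone, east_km, north_km = (int(g) for g in m.groups())
    return zone, east_km * 1000, north_km * 1000


def utm_to_latlon(zone, easting, northing):
    """Inverse transverse Mercator (Snyder's series); northern hemisphere only."""
    x = easting - 500000.0  # remove the false easting
    mu = (northing / K0) / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    # footpoint latitude
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))
    s, c, t = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = EP2 * c**2
    t1 = t**2
    n1 = A / math.sqrt(1 - E2 * s**2)             # prime vertical radius
    r1 = A * (1 - E2) / (1 - E2 * s**2) ** 1.5    # meridian radius
    d = x / (n1 * K0)
    lat = phi1 - (n1 * t / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720)
    lon0 = math.radians(6 * zone - 183)           # central meridian of the zone
    lon = lon0 + (d
                  - (1 + 2 * t1 + c1) * d**3 / 6
                  + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2)
                  * d**5 / 120) / c
    return math.degrees(lat), math.degrees(lon)


if __name__ == "__main__":
    name = "DOP40_-_Stadt_Nürnberg_32643_5488_(Bayerische_Vermessungsverwaltung).tif"
    zone, east, north = lower_left_from_name(name)
    print("corner:", utm_to_latlon(zone, east, north))
    # 1 km tiles: the center lies 500 m east and 500 m north of the corner
    print("center:", utm_to_latlon(zone, east + 500, north + 500))
```

For the Nürnberg file, converting the corner (643000, 5488000) and the center (643500, 5488500) this way should agree with the decimal values given under Tasks to within a few meters. In practice, a library call (as DOPBot does with utm/pyproj) is preferable, since it also handles the southern hemisphere and other CRSs.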
Details and Limitations
Positional encoding in file names / Geocoding
DOPBot is currently able to detect positional encoding in the following formats:
- UTM (e.g. "Dop20rgbi 32 425 6004 1 sh 2024")
- EPSG:25832 / EPSG:25833 / EPSG:31468 (e.g. "Bayerische Vermessungsverwaltung - CIR20 - 639000 5511000")
- NAIP Entity IDs (e.g. "M 4207002 ne 19 060 20211024").
DOPBot tries to extract those coordinate values from file names using regular expressions. The Python libraries utm and pyproj are then used to calculate the lat/lon values (WGS84) required for geocoding on Commons. The lat/lon values are saved as SDC.
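A sketch of how the three filename formats listed above might be told apart with regular expressions. The patterns, the function name `parse_position` and the returned field names are hypothetical. The NAIP field meanings (latitude/longitude of the 1° cell, quadrangle number, quarter, UTM zone, resolution in cm, acquisition date) are assumed from the usual NAIP naming convention. Since EPSG:25832, EPSG:25833 and EPSG:31468 values are plain easting/northing pairs, the sketch assumes the CRS is known per dataset rather than inferred from the numbers.

```python
import re

# Hypothetical patterns. (?<!\d) / (?!\d) are used instead of \b because "_"
# is a word character, so \b would not match between "_" and a digit.
NAIP_RE = re.compile(
    r"(?<![A-Za-z])[Mm][ _](\d{2})(\d{3})(\d{2})[ _](ne|nw|se|sw)"
    r"[ _](\d{2})[ _](\d{3})[ _](\d{8})(?!\d)")
METRIC_RE = re.compile(r"(?<!\d)(\d{6,7})[ _](\d{7})(?!\d)")          # metres
UTM_KM_RE = re.compile(r"(?<!\d)(\d{2})[ _](\d{3})[ _](\d{4})(?!\d)")  # zone, km, km


def parse_position(name):
    """Return (kind, fields) for the first matching format, or None."""
    m = NAIP_RE.search(name)
    if m:
        lat, lon, quad, quarter, zone, res, date = m.groups()
        return "naip", {
            "lat_deg": int(lat),          # assumed: latitude of the 1° cell
            "lon_deg_west": int(lon),     # assumed: west longitude of the 1° cell
            "quad": int(quad),            # assumed: 7.5' quadrangle number
            "quarter": quarter,           # quarter quadrangle (ne/nw/se/sw)
            "utm_zone": int(zone),
            "resolution_cm": int(res),    # assumed: ground resolution in cm
            "date": date,                 # acquisition date, YYYYMMDD
        }
    m = METRIC_RE.search(name)
    if m:
        # EPSG:25832, EPSG:25833 or EPSG:31468: the CRS must come from the dataset
        return "metric", {"easting": int(m.group(1)), "northing": int(m.group(2))}
    m = UTM_KM_RE.search(name)
    if m:
        zone, east_km, north_km = (int(g) for g in m.groups())
        return "utm_km", {"zone": zone, "easting": east_km * 1000,
                          "northing": north_km * 1000}
    return None


if __name__ == "__main__":
    for example in ("Dop20rgbi 32 425 6004 1 sh 2024",
                    "Bayerische Vermessungsverwaltung - CIR20 - 639000 5511000",
                    "M 4207002 ne 19 060 20211024"):
        print(parse_position(example))
```

The NAIP pattern is tried first because it is the most specific; the metric and kilometre patterns only differ in digit counts, so the order between them matters less. The parsed values would then go through a conversion step such as pyproj with the dataset's CRS.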
Reverse geocoding
To avoid overcrowded top-level categories, DOPBot can add specific location-based categories. This makes it possible to organize the DOPs e.g. in district-level categories (for example: Orthophotos of Landkreis Forchheim), which requires "reverse geocoding": determining the administrative area a coordinate lies in. The first approach queried OpenStreetMap's Overpass API to determine the administrative area (level 6) for each processed file. Since this turned out to be quite slow, DOPBot now uses GeoPandas, Shapely and pyproj with locally available OpenStreetMap GeoPackages (built by extracting boundaries from OpenStreetMap dumps (pbf files) using ogr2ogr). This approach avoids remote API calls, but requires some additional work, since the OpenStreetMap area names don't always match the naming scheme for categories on Commons.
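At its core, reverse geocoding is a point-in-polygon test: find the boundary polygon that contains the image center. The sketch below uses the even-odd ray-casting rule on simple polygons in plain Python. It is a stand-in for what DOPBot does with GeoPandas/Shapely (typically a spatial join or a `contains` test against the GeoPackage boundaries), not its actual code. `AREAS` holds made-up rectangles instead of real district boundaries, and `CATEGORY_OVERRIDES` is a hypothetical illustration of bridging OSM names and Commons category names.

```python
# Made-up boundaries as (lon, lat) rings; real data would be OSM admin polygons.
AREAS = {
    "Area A": [(10.0, 49.0), (11.0, 49.0), (11.0, 50.0), (10.0, 50.0)],
    "Area B": [(11.0, 49.0), (12.0, 49.0), (12.0, 50.0), (11.0, 50.0)],
}

# Hypothetical fixes where an OSM name doesn't match the Commons category name.
CATEGORY_OVERRIDES = {"Area B": "Orthophotos of Area B (district)"}


def point_in_ring(x, y, ring):
    """Even-odd rule: count edge crossings of a ray cast from (x, y) towards +x."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area_for(lon, lat, areas=AREAS):
    """Name of the first area containing the point, or None."""
    for name, ring in areas.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None


def category_for(lon, lat):
    """Commons category for the point, applying name overrides where needed."""
    name = area_for(lon, lat)
    if name is None:
        return None
    return CATEGORY_OVERRIDES.get(name, f"Orthophotos of {name}")


if __name__ == "__main__":
    print(category_for(10.983196, 49.532107))
    print(category_for(11.5, 49.5))
```

Real boundaries are multipolygons with holes and thousands of vertices, which is why a spatial index (as GeoPandas provides) matters for ~74,000 files; the naive loop here checks every area for every point.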
Current limitations
- DOPBot can't reliably detect the offset required to calculate the image's center, since DOPs are provided in different resolutions. Therefore, test runs are necessary for each new dataset.
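Where a dataset's tile geometry is known, the offset itself is simple arithmetic: half the tile's ground extent, i.e. pixel count times pixel size divided by two. This sketch assumes square tiles and that the pixel width and ground resolution are available (hypothetically from the image metadata or the dataset description); the function names are illustrative. A 1 km DOP40 tile at 0.4 m per pixel, for instance, would be 2500 pixels wide, giving the 500 m offset used for the Bavarian files.

```python
def center_offset_m(width_px, pixel_size_m):
    """Half the ground extent of a square tile, in metres."""
    return width_px * pixel_size_m / 2


def tile_center(easting, northing, width_px, pixel_size_m):
    """Shift a lower-left corner (metres) to the tile center."""
    half = center_offset_m(width_px, pixel_size_m)
    return easting + half, northing + half


if __name__ == "__main__":
    print(center_offset_m(2500, 0.4))   # DOP40, 1 km tile
    print(center_offset_m(5000, 0.2))   # DOP20, 1 km tile
    print(tile_center(643000, 5488000, 2500, 0.4))
```

The catch noted above remains: the file name alone usually gives neither the pixel count nor the resolution, so these inputs still have to be confirmed per dataset.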
- There's no single, unified data source for worldwide administrative boundaries yet.
ToDo
Features
- ✅ Handle BaWü filenames (additional whitespace): done;
- ✅ Detect district for BaWü DOPs;
- ✅ Add {{Map}} template for BoundingBox at correct position of FilePage: implemented but currently not active;
- Add administrative_layer(5) (Regierungsbezirke) in BY and BaWü? ✅ for BY, todo for other states.
- Bounding Box??? Test file: DOP40 - Stadt Erlangen 32645 5496 (Bayerische Vermessungsverwaltung).tif as example using both {{Information}} and {{Map}} as file description, while using only the latitude/longitude and warp_status parameters of {{Map}}. It seems to be possible to set the coordinates of the four corners exactly, but sadly, the Wikimap Warper can't handle TIFF files. Thus, setting warp_status to skip is required, and there's currently no way to show the DOP as a map "layer" or to otherwise make use of the "bounding box".
- ✅ Reduce OSM queries? Save admin level for known UTM coordinates? DB? Use NUTS shapefiles with GeoPandas for spatial queries.
- ✅ Check corners if center is outside current area? Use GeoPandas nearest() query;
- Category management is clumsy and requires adaptations of the bot code: needs improvement!
- ✅ Drop OSM API;
- Add local boundary data for other countries;
- ✅ keepcountry / keepstate feature!
Content
Finished
- Bavaria:
  - Bavaria DOP40: done ✅
  - Bavaria DOP20: done ✅
  - BY DOP20 CIR: done ✅
- BaWü:
- Saxony: done ✅
- Lower Saxony:
  - Lower Saxony (RGB): done ✅
  - Lower Saxony (RGBI): done ✅
- Brandenburg: done ✅
- North Rhine-Westphalia:
  - North Rhine-Westphalia (RGB): done ✅
  - North Rhine-Westphalia (RGBI): done (waiting for further uploads) ✅
- Berlin:
  - Berlin INSPIRE RGB 2025: done ✅
  - Berlin RGB 2007: done ✅
- Hamburg: done ✅
- Bremen:
  - Bremen RGB 2025: done ✅
  - Bremen CIR 2025: done ✅
- Schleswig-Holstein DOP20 RGBI: done ✅
Currently working on...
Next / later
- BY:
- HE:
  - Hessen 2023 / Hessen 2024: waiting for further uploads;
- USA:
  - waiting for further uploads;
- Other states? See also Liste der Orthophotos nach deutschen Bundesländern