User:DOPBot

Category:Commons Toolforge bots#DOPBot Category:Commons bots#DOPBot Category:Commons pywikibot bots#DOPBot Category:Commons maintenance bots#DOPBot Category:Commons bots with public source code Category:Commons user accounts with bot flag
Map view of geocoded Digital Orthophotos (DOP) of Landkreis Aichach-Friedberg, Germany

Signalment

Details and Limitations

Positional encoding in file names / Geocoding

DOPBot is currently able to detect positional encoding in the following formats:

DOPBot tries to extract these coordinate values from file names using regular expressions. The Python libraries utm and pyproj are then used to calculate the latitude/longitude values (WGS84) required for geocoding on Commons. The lat/lon values are saved as structured data on Commons (SDC).
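Below is a minimal, dependency-free sketch of both steps. It assumes, without confirmation from this page, that a name such as `DOP40 - Stadt Erlangen 32645 5496 (...)` encodes UTM zone 32, an easting of 645 km and a northing of 5496 km for the tile's lower-left corner. The pattern, the helper names `parse_tile` and `utm_to_latlon`, and the use of Snyder's inverse transverse Mercator series are illustrative stand-ins for the utm/pyproj calls DOPBot actually makes. With pyproj, the conversion would presumably be a `Transformer.from_crs("EPSG:25832", "EPSG:4326", always_xy=True)`.

```python
import math
import re

# Hypothetical pattern for the assumed convention: a 2-digit UTM zone fused with a
# 3-digit easting (km), then a 4-digit northing (km), e.g. "... 32645 5496 ...".
FILENAME_PATTERN = re.compile(r"\b(?P<zone>\d{2})(?P<east_km>\d{3})\s+(?P<north_km>\d{4})\b")

A = 6378137.0          # semi-major axis (WGS84/GRS80), metres
E2 = 0.00669438        # first eccentricity squared
EP2 = E2 / (1 - E2)    # second eccentricity squared
K0 = 0.9996            # UTM scale factor on the central meridian


def parse_tile(filename):
    """Return (zone, easting_m, northing_m) of the tile's lower-left corner, or None."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        return None
    return int(match["zone"]), int(match["east_km"]) * 1000, int(match["north_km"]) * 1000


def utm_to_latlon(easting, northing, zone, northern=True):
    """Inverse transverse Mercator (Snyder's series); returns (lat, lon) in degrees."""
    x = easting - 500000.0                                   # remove false easting
    y = northing if northern else northing - 10000000.0      # remove false northing (south)
    # Footpoint latitude phi1 from the meridian arc length M = y / k0
    mu = (y / K0) / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))
    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    n1 = A / math.sqrt(1 - E2 * sin1**2)                    # prime-vertical radius
    r1 = A * (1 - E2) / (1 - E2 * sin1**2) ** 1.5           # meridional radius
    t1, c1 = tan1**2, EP2 * cos1**2
    d = x / (n1 * K0)
    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720)
    lon0 = math.radians(zone * 6 - 183)                      # central meridian of the zone
    lon = lon0 + (d
                  - (1 + 2 * t1 + c1) * d**3 / 6
                  + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2) * d**5 / 120) / cos1
    return math.degrees(lat), math.degrees(lon)


zone, east, north = parse_tile("DOP40 - Stadt Erlangen 32645 5496 (Bayerische Vermessungsverwaltung).tif")
# Assumption: 1 km tiles, so the centre lies 500 m east and north of the corner
lat, lon = utm_to_latlon(east + 500, north + 500, zone)
print(f"{lat:.5f}, {lon:.5f}")
```

The 500 m offset in the usage lines is exactly the value that depends on the dataset's resolution and tile size, which is why it is passed in rather than derived from the file name.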

Reverse geocoding

To avoid overcrowded top-level categories, DOPBot is able to add specific location-based categories. This makes it possible to organize the DOPs, e.g., in district-level categories (for example: Orthophotos of Landkreis Forchheim). This requires a "reverse geocoding" ability. The first approach consisted of queries against OpenStreetMap's Overpass API, trying to determine the administrative area (admin level 6) for each processed file. Since this turned out to be quite slow, DOPBot now uses GeoPandas and shapely with locally available NUTS GeoPackages (DE/EU) provided by the Federal Agency for Cartography and Geodesy (BKG) and the European Commission (GISCO). This approach avoids remote API calls, but requires some additional work, since the NUTS area names don't match the naming scheme for categories on Commons in every case.
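A minimal sketch of the lookup, assuming one polygon per area: with GeoPandas this would roughly be `geopandas.read_file()` on the GeoPackage followed by filtering with `GeoSeries.contains(Point(lon, lat))` (shapely's `Point`). To keep the example self-contained, a pure-Python even-odd point-in-polygon test over toy rings replaces that. The box coordinates, `AREAS`, `CATEGORY_NAMES` and `category_for` are hypothetical, not DOPBot's data or code.

```python
# Toy lon/lat boxes standing in for NUTS-3 polygons; NOT real boundaries.
AREAS = {
    "Forchheim": [(10.9, 49.55), (11.35, 49.55), (11.35, 49.85), (10.9, 49.85)],
}

# Hypothetical overrides for NUTS names that differ from Commons category names.
CATEGORY_NAMES = {"Forchheim": "Landkreis Forchheim"}


def point_in_ring(lon, lat, ring):
    """Even-odd rule: cast a ray eastwards from the point and count edge crossings."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can cross the ray
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def category_for(lon, lat):
    """Return the location-based category for a point, or None if no area matches."""
    for nuts_name, ring in AREAS.items():
        if point_in_ring(lon, lat, ring):
            commons_name = CATEGORY_NAMES.get(nuts_name, nuts_name)
            return f"Category:Orthophotos of {commons_name}"
    return None


print(category_for(11.1, 49.7))  # Category:Orthophotos of Landkreis Forchheim
```

Keeping the name overrides in a lookup table rather than in the bot's logic is one way to absorb the mismatch between NUTS names and Commons category names.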

For areas outside the European Union, DOPBot uses GeoPackages built by extracting boundaries from OpenStreetMap dumps (PBF files) using ogr2ogr.
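One plausible way to build such a GeoPackage, sketched in Python to match the bot: GDAL's OSM driver exposes closed ways and relations as the `multipolygons` layer, whose default attributes include `boundary` and `admin_level`. The helper `ogr2ogr_boundaries_cmd`, the file names, the output layer name and the choice of admin level 6 are assumptions, not DOPBot's actual configuration.

```python
import shlex


def ogr2ogr_boundaries_cmd(pbf_path, gpkg_path, admin_level=6):
    """Build (but do not run) an ogr2ogr call copying admin boundaries into a GeoPackage."""
    return [
        "ogr2ogr",
        "-f", "GPKG",                                   # output driver: GeoPackage
        "-nln", f"admin_level_{admin_level}",           # name of the output layer
        "-where", f"boundary = 'administrative' AND admin_level = '{admin_level}'",
        gpkg_path,                                      # destination data source
        pbf_path,                                       # source, read by GDAL's OSM driver
        "multipolygons",                                # OSM driver layer with areas/relations
    ]


cmd = ogr2ogr_boundaries_cmd("country-latest.osm.pbf", "boundaries.gpkg")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

`admin_level` is a string attribute in the OSM driver, hence the quoted value in the `-where` clause.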

Current limitations

  • DOPBot is currently restricted to square images (expanding it to all rectangular media should be low-hanging fruit).
  • It can't reliably detect the offset required to calculate the image's center, since DOPs are provided in different resolutions. Test runs are therefore inevitable for each new dataset.
  • There's no single/unified data source for worldwide admin boundaries yet.
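The center offset is half the tile's ground extent, i.e. pixel size times ground resolution divided by two, computed separately for each axis so that rectangular tiles work too. A minimal sketch, assuming the number after "DOP" in a file name is the ground resolution in centimetres (so a 1 km DOP40 tile would be 2500 × 2500 px); `resolution_cm` and `center_offset_m` are hypothetical helpers, not DOPBot code.

```python
import re


def resolution_cm(filename):
    """Guess the ground resolution from a 'DOP<n>' prefix (assumption: n is in cm)."""
    match = re.match(r"DOP(\d+)\b", filename)
    return int(match.group(1)) if match else None


def center_offset_m(width_px, height_px, res_cm):
    """Metres (dx, dy) from the lower-left corner to the tile centre.

    Works per axis, so non-square tiles are handled as well.
    """
    # Integer centimetres avoid binary rounding of values like 0.4 m
    return width_px * res_cm / 200, height_px * res_cm / 200


res = resolution_cm("DOP40 - Stadt Erlangen 32645 5496 (Bayerische Vermessungsverwaltung).tif")
print(res, center_offset_m(2500, 2500, res))  # 40 (500.0, 500.0)
```

Reading the pixel size from the file itself would remove the guesswork about the tile extent, but the resolution assumption would still need a test run per dataset.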

ToDo

Features

  • Handle BaWü file names (additional whitespace): done;
  • detect district for BaWü DOPs
  • add {{Map}} template for BoundingBox at the correct position of the FilePage: implemented but currently not active;
  • add administrative_layer(5) (Regierungsbezirke) in BY and BaWü? Implemented for BY, to do for other states.
  • Bounding box? Test file: DOP40 - Stadt Erlangen 32645 5496 (Bayerische Vermessungsverwaltung).tif, which uses both {{Information}} and {{Map}} in its file description, with only the latitude/longitude and warp_status parameters of {{Map}}. Setting the coordinates of the four corners exactly seems to be possible, but the Wikimaps Warper apparently can't handle TIFF files. Thus, warp_status has to be set to skip, and there is currently no way to show the DOP as a map "layer" or to make use of the bounding box otherwise.
  • Reduce OSM queries? Save admin level for known UTM coordinates? DB? Use NUTS shapefiles with GeoPandas for spatial queries.
  • Check corners if the center is outside the current area? Use GeoPandas nearest() query;
  • Category management is clumsy and requires adapting the bot code; needs improvement!
  • drop OSM API;
  • add local boundary data for other countries;
  • keepcountry / keepstate feature!

Content

Finished

Currently working on...

... Thüringen RGBI 2022

Next / later
