User talk:Dominic
Finding DPLA erroneous files through duplicates
Hi!
At this link you can see the 5 files with the most duplicates on Commons. Three of them are groups of erroneous DPLA files, with 317, 108 and 97 files respectively.
Do you think you could make the bot reupload these files with correct contents?
Cheers, vip (talk) 02:56, 19 April 2026 (UTC)
- I have looked into this before but we think these are intentionally blank pages, nothing that a reupload would fix. They show up as duplicates because the provider is inserting the same image file by design, rather than re-scanning a blank page. If you look at the pages before and after these blanks, you will often be able to determine that they are legitimately where a blank page would be in the original books (the page numbers continue, and it might be the end of a chapter or something similar). Dominic (talk) 19:09, 19 April 2026 (UTC)
- Ah ok. Is it useful to have these blank pages on Commons as individual files? When browsing them we just see a blank page that feels like an error, and a very long list of duplicates. vip (talk) 19:24, 19 April 2026 (UTC)
- I hear you. It's less than ideal, but also not really harmful (in my opinion). It's not a choice I would have made; it was made by the institutions themselves to upload them that way. It maintains the proper page sequence and whatever informational value a blank page has as a faithful representation of the original work (in the context of all the other page scans). I wish there was a way to dismiss the concern from the duplicated files list, but it's not possible currently. The only other solution I can imagine is to edit each of the images to make them imperceptibly different (e.g., delete a pixel), but that's an awful lot of overhead. Dominic (talk) 23:52, 19 April 2026 (UTC)