Commons talk:Structured data/Modeling/Works without Wikidata item
Trying to reach consensus about this data modeling proposal
Because of a recent discussion in the SDC Telegram group, I am taking the initiative to try to come to a consensus about this!
It would be absolutely great if we can agree on this, because this specific situation occurs so frequently. It would really help (batch) upload tools and would help many batch uploaders around the world.
Pinging a few people who will have opinions - @Multichill, @Jarekt, @Schlurcher who edited the page, and also @Seav who came up with a proposal in the Telegram group.
I have created a proposal in this file, where there is no Wikidata item yet: https://commons.wikimedia.org/w/index.php?title=File:Marlborough-duke-first.jpg&oldid=906350755
@Seav proposed a different approach, see for instance this file, and here's a WCQS query to see all examples of his proposed data modeling.
My own key recommendation would be to do the following: describe the properties of the artwork and the file together in the SDC and distinguish them with an agreed-upon uniform qualifier.
- Qualifier for statements that pertain to the file: applies to part (P518) Wikimedia Commons file (Q51954352)
- Qualifier for statements that pertain to the work or object: applies to part (P518) analog work (Q112134971)
My thoughts:
- It would be great if @Jarekt could update the {{Artwork}} template to use SDC to support the data model we eventually agree upon. This template allows the creation of a Wikidata item, which is awesome, and I can imagine it would be great if it could already take some SDC to help create that Wikidata item. This is why I proposed this approach with qualifiers: the file on Commons is described with the same properties the artwork would have on Wikidata, but with one single uniform qualifier to designate that the data belongs to the artwork, not the file. The template could then "grab" these statements and move them to Wikidata.
- This means I (sympathetically) disagree with this edit by @Schlurcher. I argue against using different properties on Wikimedia Commons for the work and for the file, and in favour of following the Wikidata data modeling conventions for artworks. This ensures that artwork items created on Wikidata from Wikimedia Commons follow the same data modeling guidelines as the rest of Wikidata, rather than doing it differently.
- Which qualifier to use though? I have heard @Multichill say informally that he has doubts about applies to part (P518) and would prefer something like object of statement has role (P3831) (?). I have no very strong opinion on this, but I have observed that many people find the Subject/Object distinction very abstract and confusing (including myself!).
Thanks all! Spinster (talk) 07:55, 4 August 2024 (UTC)
- @Spinster, thanks for starting this topic. My main problem with your proposal of using standard distinguishers to separate statements about the file and about the object/artwork is that it breaks down if there are multiple depicted objects in the file. You no longer have any feasible way to distinguish which statement applies to which analog work (Q112134971). To give a concrete example, see the following file, for which I have added an SDC statement for each of the three currently identified objects (a person, a plaque, and a statue) in the photo: File:Jose Rizal statue and historical marker (Campbelltown, NSW).jpg. As you can see, I can add qualifiers to each depicts statement to describe each object separately without issue. (For now, ignore the fact that the statue is probably notable enough to have its own Wikidata item. Let's assume that we're talking about the situation where some depicted objects don't have Wikidata items yet but we want to describe them via SDC now.)
- Some other photos that I found where others have gone a similar route:
- File:Münster, LVM, Skulptur -Körper und Seele- -- 2016 -- 5817-23.jpg – This photo depicts 2 sculptures and a few buildings. The two sculptures already have Wikidata items, so I think the qualifiers are redundant with the properties on the Wikidata items, but if we pretend that the Wikidata items don't exist yet, then we can have this same setup but with sculpture (Q860861) (or a similar class) as the value of the two depicts (P180) statements.
- File:Opus 29 Daedalus of Icarus van Gerhard Lentink (02).jpg – Here we have a standard photo where a piece of art is the main object and which doesn't yet have a Wikidata item (but is very likely notable enough to have one), but we can describe other stuff seen in the photo, like the building. We could argue that the building could be placed as the object for the location (P276) or location of creation (P1071) statement, but what if the photo shows another piece of art?
- —seav (talk) 09:41, 4 August 2024 (UTC)
- @Spinster: Yes, thank you for trying to nail down some standards. Years ago I created several subpages of the Commons:Structured data/Modeling page with some standards I observed and which I was supporting in Module:Information and {{Information}}. Module:Artwork, which supports {{Artwork}}, {{Book}} and {{Photograph}}, currently relies much less on SDC, mostly due to inconsistent standards, and whatever standard we come up with should make sense in the context of all those infoboxes. To me the biggest issue with SDC is the lack of a standard for cases when a file is supposed to pull all its metadata from Wikidata. When digital representation of (P6243) was proposed, it was supposed to cover 2D artworks, 3D artworks, movies, music, etc.; however, when it was approved the scope changed to 2D artworks only. I tried again to have a way to connect 3D artworks, movies, music, etc. to Wikidata with d:Wikidata:Property proposal/Infobox based on, but it was rejected. User:Multichill changed Module:Artwork to connect some non-2D files with Wikidata using main subject (P921), and it works fine for most cases, but it is a huge source of files connected to incorrect Wikidata items, because main subject (P921) is often used to indicate the main subject of the artwork and not of the photo. I like your "applies to part: analog work" to clarify some of this confusion.
- However, your question was about cases when there is no matching Wikidata item. In my opinion the best solution for files like File:Marlborough-duke-first.jpg is to create one and upload the metadata there. I do not see the point of placing metadata in SDC in case someone creates a Wikidata item, just so it can be moved to the right place. However, if for some reason it is easier to put it in SDC, then I think "applies to part: analog work" seems like a great approach to distinguish, for example, the inception (P571) date of the photo from the inception (P571) date of the analog work (Q112134971). That seems a quite good approach for photographs which are clearly associated with a single artwork. User:Seav's examples all concentrate on SDC properties for images depicting multiple artworks and objects. Such images are often not well served by {{Artwork}} and would probably be better off using {{Information}}. The SDC metadata for those files seems perfectly reasonable.
- I am OK with modifying Module:Artwork to support the data model we agree on, but SDC's data model for artworks, books, etc. should be the same as Wikidata's, as long as we specify that a given property "applies to analog part". I am a little wary of complicating the code, which is already at ~3000 lines of Lua and divided among multiple modules. An easier solution could be to just write a bot to move some of those properties from SDC to Wikidata, although I still think that putting them on Wikidata in the first place is a better solution. SDC is lacking many tools I use a lot when editing: Duplicate Item tool (Q108311191), Duplicate References (Q97500668), moveClaim (Q110793966) and many others. Without those tools I find SDC very cumbersome to work with. --Jarekt (talk) 03:29, 5 August 2024 (UTC)
- Sitting at a pretty big round table at Wikimania discussing this now. The suggestions are:
- Statements applying to the Artwork (which does not yet have a Wikidata item) should be qualified applies to part (P518): analog work (Q112134971)
- For statements which apply to the digital file you can qualify them with applies to part (P518): Wikimedia Commons file (Q51954352) but the default assumption for any unqualified statements is that they apply to the Commons file.
- When statements get migrated to a Wikidata item, all the statements with the applies to part (P518): analog work (Q112134971) qualifier get removed from SDC.
- Left with all the statements with the qualifier applies to part (P518): Wikimedia Commons file (Q51954352), we could remove the qualifier but we aren’t sure about it yet.
- For copyright statements the PD-Art statement should be marked as preferred.
- applies to part (P518): analog work (Q112134971) can be used even in the edge case where the artwork may be digital-born.
- Crops, collages etc. are out of scope for these discussions.
- / André Costa (WMSE) (talk) 09:43, 10 August 2024 (UTC)
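The migration step in the round-table suggestions can be sketched roughly in Python. This is only an illustration, not an existing tool: it assumes statements are plain dicts shaped like a trimmed-down Wikibase statement JSON, and the function names are made up. Statements qualified with applies to part (P518): analog work (Q112134971) are split off to be created on the new Wikidata item (with the qualifier dropped, on the assumption that it is redundant there) and removed from SDC; everything else stays on Commons. Whether to also strip applies to part (P518): Wikimedia Commons file (Q51954352) from what remains was left open, so here it is an off-by-default flag.

```python
import copy

# IDs taken from the discussion on this page.
APPLIES_TO_PART = "P518"
COMMONS_FILE = "Q51954352"
ANALOG_WORK = "Q112134971"


def part_ids(statement):
    """Item IDs used as applies to part (P518) qualifier values on a statement."""
    return [
        snak["datavalue"]["value"]["id"]
        for snak in statement.get("qualifiers", {}).get(APPLIES_TO_PART, [])
        if snak.get("snaktype", "value") == "value"
    ]


def without_part_qualifier(statement):
    """Return a copy of the statement with its whole P518 qualifier removed."""
    copied = copy.deepcopy(statement)
    qualifiers = copied.get("qualifiers", {})
    qualifiers.pop(APPLIES_TO_PART, None)
    if not qualifiers:
        copied.pop("qualifiers", None)
    return copied


def migrate_work_statements(statements, drop_file_qualifier=False):
    """Hypothetical helper: split SDC statements into (to_wikidata, keep_on_commons).

    Work statements go to the new Wikidata item without their P518 qualifier
    (an assumption); all other statements stay on Commons unchanged, unless
    drop_file_qualifier is set, which strips P518 = Wikimedia Commons file.
    """
    to_wikidata, keep_on_commons = [], []
    for statement in statements:
        parts = part_ids(statement)
        if ANALOG_WORK in parts:
            to_wikidata.append(without_part_qualifier(statement))
        elif COMMONS_FILE in parts and drop_file_qualifier:
            keep_on_commons.append(without_part_qualifier(statement))
        else:
            keep_on_commons.append(statement)
    return to_wikidata, keep_on_commons
```

The input statements are never modified, so a bot could compare the two lists against the original before saving anything.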
@Spinster, Seav, Jarekt, and Lokal Profil: picking up where we left off. I am currently emptying out Category:Taken on missing SDC inception by adding inception (P571). I'm applying the logic as discussed above as a pilot. The bot guesses that a file has multiple works because it contains {{Artwork}} or {{Art Photo}} or main subject (P921) or digital representation of (P6243), and the date is after 2000. Example edits: . I'm also adding the qualifier when the file is linked to Wikidata, to stress that the inception (P571) is really just about the photo. What do you think? Multichill (talk) 13:29, 29 September 2025 (UTC)
- Multichill, those edits look good to me, although I think the qualifier applies to part (P518) = photograph of the artwork (Q114187913) would be clearer. I would narrow down the selection logic to files which are linked to 2D or 3D artworks on Wikidata. We have a lot of files with {{Artwork}} or {{Art Photo}} and digital representation of (P6243) that are linked to non-artwork items. I would separate them first and re-evaluate the selection logic. --Jarekt (talk) 01:14, 30 September 2025 (UTC)
- @Multichill: Edits look good and in line with what we discussed back in the day.
- @Jarekt: photograph of the artwork (Q114187913) would limit this structure to only really be useful for PD-Art. With Wikimedia Commons file (Q51954352) the same structure can also be used for e.g. scanned books etc. /Lokal_Profil 21:15, 1 October 2025 (UTC)
- @Jarekt: To help us understand (I'm at the Hackathon with @Spinster): for which situations would you need the selection logic to distinguish between artworks and non-artworks? For the metadata I would assume an image would use {{Art Photo}}, and for the license {{PD-Art}} would only be called in the determination method or standard (P459) faithful reproduction of two-dimensional public domain work of art (Q79719208) case? /Lokal_Profil 13:25, 13 March 2026 (UTC)
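For reference, the pilot's selection heuristic as Multichill describes it (a work-describing template, or a main subject (P921) / digital representation of (P6243) statement, combined with a date after 2000) could look roughly like the Python below. The function name, its inputs and the template regex are assumptions for illustration, not the bot's actual code; the regex is simplified and ignores template redirects. Jarekt's suggestion to restrict this to files linked to 2D or 3D artworks on Wikidata would need an extra Wikidata lookup that is not shown.

```python
import re

# Simplified match for {{Artwork ...}} or {{Art Photo ...}} (also "Art_Photo").
# Case-insensitive as a shortcut; redirects to these templates are not handled.
WORK_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:Artwork|Art[ _]Photo)\s*(?:\||\}\})", re.IGNORECASE
)

# SDC properties that link a file to a Wikidata item about a work.
LINKING_PROPERTIES = {"P921", "P6243"}


def looks_like_photo_of_work(wikitext, sdc_property_ids, taken_on_year):
    """Guess whether a file shows a separately dated work, so that an added
    inception (P571) should be qualified as applying to the file.

    Hypothetical inputs: wikitext is the file page source, sdc_property_ids
    the main-snak property IDs of its SDC statements, and taken_on_year the
    year the photo was taken.
    """
    has_work_template = WORK_TEMPLATE_RE.search(wikitext) is not None
    has_work_link = bool(LINKING_PROPERTIES & set(sdc_property_ids))
    return (has_work_template or has_work_link) and taken_on_year > 2000
```

The "after 2000" cut-off is read here as a strict year comparison; the intent is that such a recent date can only plausibly be the date of the photo, not of the depicted work.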
Use case(s) for not creating a Wikidata item (yet)
@Lokal Profil mentioned on his talk page that it would be good to explain the use case where/why one would upload files of artworks without a Wikidata item. I approach it very pragmatically: for many uploaders, creating Wikidata items is an extra complexity that may pose too high a barrier. Wikidata items can also be created afterwards. Spinster (talk) 09:05, 10 August 2024 (UTC)
Help?
Some kind of guidance would be nice. Any example at all? Jerimee (talk) 00:25, 6 February 2025 (UTC)
Additional feedback / alternatives / pros and cons
In the SDC Telegram group, we also had some discussion about this approach. @Nikki provided some additional arguments re: other approaches. Summarizing their input:
- Concerns that the qualifiers approach will make SDC much harder to use, because people would be forced to check for qualifiers to know how to interpret the values. It would mean you can no longer use wdt: in sparql queries, and it would make Lua modules even harder to write than they already are.
- Suggestion to instead use a preferred-rank statement (using the special somevalue/novalue values if necessary) that applies to the file, so that things like wdt:P31, wdt:P170 and wdt:P571 consistently return values that relate to the file (wdt: returns the best-ranked values with no access to the qualifiers, people who want to check the qualifiers will need to use the longer syntax with p: and ps: either way).
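To make the wdt: point concrete: in the query service, wdt: ("truthy") triples only carry the best-ranked values, meaning the preferred-rank statements if a property has any, otherwise the normal-rank ones, never deprecated ones, and qualifiers are not visible at all. The Python sketch below imitates that rule on simplified statement dicts (an illustration, not the actual RDF export code; the dates are made up) and shows why the rank-based suggestion makes wdt:P571 return only the file's date, whereas a qualifier-only model returns both dates. For simplicity it skips somevalue/novalue snaks, which the query service represents differently.

```python
def truthy_values(statements, prop):
    """Imitate wdt:<prop>: best-ranked main values, with qualifiers ignored."""
    candidates = [
        s for s in statements
        if s["mainsnak"]["property"] == prop and s.get("rank", "normal") != "deprecated"
    ]
    preferred = [s for s in candidates if s.get("rank") == "preferred"]
    best = preferred or candidates  # preferred wins; otherwise all normal-rank ones
    return [
        s["mainsnak"]["datavalue"]["value"]
        for s in best
        if s["mainsnak"].get("snaktype", "value") == "value"  # skip somevalue/novalue
    ]


# Two inception (P571) statements: a painting and a recent photo of it.
# The qualifier shape is simplified; truthy_values never looks at it anyway.
painting = {"mainsnak": {"property": "P571", "datavalue": {"value": "1705"}},
            "qualifiers": {"P518": ["analog work (Q112134971)"]}, "rank": "normal"}
photo = {"mainsnak": {"property": "P571", "datavalue": {"value": "2024"}}, "rank": "normal"}

print(truthy_values([painting, photo], "P571"))  # qualifier-only model: ['1705', '2024']
photo["rank"] = "preferred"
print(truthy_values([painting, photo], "P571"))  # rank-based model: ['2024']
```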
I think we have at least three possible approaches for the file vs work distinction, each with pros and cons. (I explicitly encourage everyone to edit the below section to list pros and cons, don't hesitate to edit my post.)
- Use qualifiers to distinguish especially the artwork-specific statements.
- Not use qualifiers, but use ranks instead.
- Not use qualifiers or ranks, but create and use a set of entirely new properties that are only to be used by creative works in SDC.
Cheers, Spinster (talk) 09:23, 14 March 2026 (UTC)
- I greatly prefer qualifiers; they are more visible and explicit. That said, I don't think we need to use them in every single case, but rather when it cannot be inferred from the parent property or when the statement would not apply to both the file and the work.
- However, I think the use of analog work (Q112134971) is and will be very confusing, as our current use of it does not actually always mean an analog work but rather the original work. Using it for digital-born material, and in cases where the analog work no longer exists (but the digital copy is still part of a collection), makes it even more confusing. Maybe this should be solved by replacing it with a dedicated item.
- This being said I'm not at all opposed to the third option which I think would be the easiest for users to understand, my only worry is that it's a change from what we are currently doing. Abbe98 (talk) 10:35, 14 March 2026 (UTC)
- Today the proposal states that when applies to part (P518) is missing, applies to part (P518) Wikimedia Commons file (Q51954352) should be assumed. I get that for tooling this means you need to check two separate patterns (the explicit Wikimedia Commons file (Q51954352) statement and the fallback). A semi-radical suggestion would be to use applies to part (P518) analog work (Q112134971) as proposed but to never use applies to part (P518) Wikimedia Commons file (Q51954352), and instead be very clear that this is what should always be assumed unless there is additional info. Lokal_Profil 12:11, 14 March 2026 (UTC)
- I'm not so much concerned about tooling and potential technical complexity; for me a much larger concern is making this understandable for users. New users, or users who don't read documentation (no one does), will never assume anything by default. Similarly, my concern with applies to part (P518) analog work (Q112134971) isn't about the model itself but about the use of analog work (Q112134971), which generalizes badly: in testing, it confuses every single user who has not been part of these discussions.
- One step forward, if there is consensus that applies to part (P518) Wikimedia Commons file (Q51954352) is the default, could be to ensure this is shown as assumed in the SDC UI. Abbe98 (talk) 19:02, 14 March 2026 (UTC)
- In my experience, the key confusion comes from not understanding the underlying problem we want to address: that there is different data for the work, and different data for the file. That is the key thing I have spent the most effort explaining in general. I think that, for instance, a new set of properties will not make this underlying issue clearer or easier to understand, but I may be wrong?
- The current qualifier approach does IMO have the advantage of just further refining what people already naturally enter on Commons ("this thing was created by John Doe" / "this thing was created in 1875"), for which they use the generic properties.
- I think that, whatever solution we go for, UX of upload and editing tools, and the templates that display the data will be key in helping / nudging users to get to the actual understanding of that underlying issue. It is why I like the {{Art photo}} template so much - it displays both layers clearly - and I wouldn't mind it being used more widely.
- Does this need more user testing / A/B testing? I'm not averse to that idea, but it'll take work that will probably be too much for a volunteer project. Spinster (talk) 10:44, 15 March 2026 (UTC)
- I think a problem for the GLAM staff I have talked to is that they don't see Wikimedia Commons file (Q51954352) as the only instance of the digital representation. Wikimedia Commons file (Q51954352), for example, is never considered part of their collection, but a digital-born file, or a file which depicts a destroyed object, might be. The analog work (Q112134971) label similarly contributes to this confusion.
- That said, your note on how qualifiers just refine what people already enter is very much on point and well framed. If we make this a guiding principle, would that help us resolve how templates should work? Maybe, for example, templates should try to consider statements without qualifiers so that these are shown to as many people as possible and refinement becomes more likely? Abbe98 (talk) 12:12, 15 March 2026 (UTC)
- @Abbe98 I'm not sure if I 100% understand what you are saying about showing the statements without qualifiers, but hopefully 90% - I also think it would be great if we would have something visible in file pages on Commons that allow people to see possible mismatches, and possibly even "move such statements around".
- With regards to your first argument (the GLAM staff) - again, not sure if I grasp this fully, but would this be resolved or at least addressed to some part by indeed not using the Wikimedia Commons file (Q51954352) qualifier at all + better naming of the analog work (Q112134971) one? Spinster (talk) 18:38, 25 March 2026 (UTC)
- Zooming back in to the "naming of the qualifier" - the analog work (Q112134971) item. If this is the main / largest hurdle, we can (and feel free to add arguments and thoughts):
- Relabel the existing item analog work (Q112134971) so that it simply does satisfy this use case and describe it correctly, and then just continue using that item. On Wikidata itself, it is still minimal in terms of number of statements, and it is almost unused. Furthermore, it contributes to a constraint violation for the digital representation of (P6243) property which is also linked to digital representation (Q42396623).
- Use digital representation (Q42396623) which as stated above is the counterpart of digital representation of (P6243) (but I think we have similar coverage, applicability and understandability issues there)
- Or indeed create a new dedicated item.
- What would be a label that is indeed as clear as possible? I think it will be challenging to find ones that fully cover what we want to do and are also understandable (hence also my remark above that UX of tools will help users understand), but here are some attempts. Taking into account that we will be dealing with a broad variety of creative works, but also probably e.g. specimens from natural history collections, stones, fossils... so not only works. Some will be in collections but some not, so I think we should avoid using "collection item", although that term will be fitting in many cases.
- work or object shown in the file
- work or object shown
- work or object depicted in the file
- Best, Spinster (talk) 11:06, 15 March 2026 (UTC)
- Actually if one embraces your point on how qualifiers are refinement, maybe this is less of an issue and something that could be resolved rather organically. That said I think the terms you list are all good improvements.
- I do think this also highlights the need for an indicator in the UI showing that, if there is no qualifier, a statement should be assumed to apply to Wikimedia Commons file (Q51954352). Maybe a step here would be for me to prototype this as a user script, so that there could be some consensus building before someone (probably me) puts in the work upstream. Abbe98 (talk) 12:18, 15 March 2026 (UTC)
- Pinging @Pharos: to see if they have some thoughts on this topic, as they have been the person who originally created the analog work (Q112134971) item and probably also have a good understanding of what we try to achieve here. @Pharos - as you can see here, the analog work (Q112134971) Wikidata item you created has become a prominent part of unblocking (GLAM-related) SDC data modeling, templating, and tool building! I am curious if you have any strong objections against using the item for this purpose; to me it seems fitting, maybe with renaming or adding some aliases as I mentioned above. Thanks for taking a look :-) Spinster (talk) 18:24, 25 March 2026 (UTC)
- Yes, this all seems quite reasonable to me, and in keeping with original intentions. Of course, I didn't develop a full model or architecture at the time, so thank you for taking it much further. Quite glad if my work can help contribute to the understanding of 17th century paintings not having been "created in 2026"! Pharos (talk) 19:17, 5 April 2026 (UTC)
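Lokal_Profil's suggestion in this thread (only ever write applies to part (P518): analog work (Q112134971), and treat everything else as being about the file) means a tool needs to test for a single qualifier value. A minimal Python sketch, again on simplified statement dicts and with a made-up function name; an explicit applies to part (P518): Wikimedia Commons file (Q51954352) qualifier, where one already exists, resolves to the same answer, so both conventions stay compatible.

```python
ANALOG_WORK = "Q112134971"  # analog work, per the proposal


def describes(statement):
    """Return "work" if the statement has applies to part (P518) = analog work, else "file".

    Under the proposed default, anything without that qualifier value
    (including an explicit P518 = Wikimedia Commons file) is about the file.
    """
    for snak in statement.get("qualifiers", {}).get("P518", []):
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == ANALOG_WORK:
            return "work"
    return "file"


statements = [
    {"mainsnak": {"property": "P571"},
     "qualifiers": {"P518": [{"snaktype": "value", "datavalue": {"value": {"id": ANALOG_WORK}}}]}},
    {"mainsnak": {"property": "P571"}},  # no qualifier: assumed to be about the file
    {"mainsnak": {"property": "P571"},
     "qualifiers": {"P518": [{"snaktype": "value", "datavalue": {"value": {"id": "Q51954352"}}}]}},
]
print([describes(s) for s in statements])  # ['work', 'file', 'file']
```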
Add a section about when to not distinguish between the analog work and the Commons File
We should add something to the recommendation to clarify that we are not expecting a private individual who scanned their 1990s photographs to add metadata about the scan as the primary data, with metadata about the photograph only behind an applies to part (P518) analog work (Q112134971) qualifier.
At the same time, a photograph where the analog object is part of a museum collection should probably have the collection information behind an applies to part (P518) analog work (Q112134971) qualifier. / Lokal_Profil 12:37, 14 March 2026 (UTC)