- Not sure I see the point of having wikibase:lemma, since rdfs:label would do exactly the same thing.
- I would still make schema:inLanguage with language code, even if that is derived data and not primary. It may simplify querying a lot. And we already have config for this anyway. Of course if the language has no ISO code that one would be empty.
- dct:language has a range of dcterms:LinguisticSystem. We don't have this class on our language items, so it might be wrong.
- From the two above, I think we need to use schema:inLanguage and wikibase:language, unless we find a good language predicate with unrestricted range.
- Maybe we change wikibase:grammaticalFeature to wikibase:partOfSpeech? To keep it similar to lexinfo.
- Not super-happy about having both ontolex:representation and rdfs:label, but I see how it could be useful.
- wikibase:grammaticalFeature sounds fine
- skos:definition seems to be closer to description than to label... But depends on usage.
Topic on Extension talk:WikibaseLexeme/RDF mapping
Thank you very much for your feedback. Some comments:
- wikibase:lemma has the advantage of being specific to lemmas, so it allows queries like: get all lemmas with the label "foo"@en, without having to filter on entity types. I would keep wikibase:lemma in the Query Service and filter out rdfs:label.
- Big +1 to it. I'm adding it to the document as derived data.
- I don't think it's a big problem. The triple dct:language rdfs:range dcterms:LinguisticSystem means that RDFS entailment on our data derives that all items used as a language for Lexemes are also dcterms:LinguisticSystem. It does not look wrong to me. We already have such behaviour with, e.g., the use of cc:license, which has cc:Work as its rdfs:domain. If we want to be safe and avoid this term, we should probably also avoid reusing ontolex: terms, which come with a quite expressive OWL ontology: http://www.w3.org/ns/lemon/ontolex
- See 3.
- If we use wikibase:partOfSpeech we move a bit away from the wording used by the abstract data model and the JSON specification. And it seems to me that "grammatical feature" is a bit broader than "part of speech". But I am not very familiar with computational linguistics, so I may be wrong.
- I believe we should just drop rdfs:label from the SPARQL endpoint.
- Thanks.
- It is the property that is suggested by the ontolex: specification to encode glosses. It is presented in the SKOS spec primer as: "skos:definition supplies a complete explanation of the intended meaning of a concept".
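To make the first point above concrete, here is a minimal sketch in plain Python (not an actual Query Service query). It assumes a toy in-memory set of triples; the entity IDs L1 and Q1 and the ex:Lexeme / ex:Item type names are hypothetical and only serve the illustration. It shows that matching on wikibase:lemma finds the lexeme directly, while matching on rdfs:label also returns an item and needs an extra type filter.

```python
# Toy triples; the IDs and the ex: type names are made up for illustration.
# Literals are modelled as (text, language) pairs.
TRIPLES = {
    ("L1", "rdf:type", "ex:Lexeme"),
    ("L1", "wikibase:lemma", ("foo", "en")),
    ("L1", "rdfs:label", ("foo", "en")),
    ("Q1", "rdf:type", "ex:Item"),
    ("Q1", "rdfs:label", ("foo", "en")),
}


def subjects(predicate, value):
    """Return all subjects that have the given predicate/value pair."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == value}


def subjects_of_type(candidates, rdf_type):
    """Keep only the candidates that are declared with the given type."""
    return {s for s in candidates if (s, "rdf:type", rdf_type) in TRIPLES}


target = ("foo", "en")

# Specific predicate: no entity-type filter needed.
by_lemma = subjects("wikibase:lemma", target)

# Generic predicate: items match too, so a second filtering step is required.
by_label = subjects("rdfs:label", target)
by_label_filtered = subjects_of_type(by_label, "ex:Lexeme")
```

In this toy data, the lemma lookup and the filtered label lookup select the same entity; the difference is only the extra type check that the generic predicate requires.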
+1 on 3. and 4. as described by Tpt.
If range is not an issue for dct:language then maybe we should use it. Having an extra prefix is not a huge deal: as soon as we add one, we can add more. I don't think it is an issue then.
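For readers unfamiliar with the range question discussed above, the following is a rough sketch of what the RDFS range rule derives. It is plain Python over a set of tuples, not a real reasoner, and the identifiers ex:L1 and ex:Q_english are hypothetical. The rule applied is: if a property P has rdfs:range C and some triple (s, P, o) exists, then o is inferred to be of type C.

```python
def apply_range_rule(triples):
    """Return the rdf:type triples implied by rdfs:range declarations."""
    # Collect (property, class) pairs from the range declarations.
    ranges = {(s, o) for (s, p, o) in triples if p == "rdfs:range"}
    inferred = {
        (o, "rdf:type", cls)
        for (s, p, o) in triples
        for (prop, cls) in ranges
        if p == prop
    }
    # Report only triples that were not already stated explicitly.
    return inferred - set(triples)


data = {
    # The range declaration quoted in the discussion.
    ("dct:language", "rdfs:range", "dcterms:LinguisticSystem"),
    # Hypothetical lexeme whose language is a hypothetical item.
    ("ex:L1", "dct:language", "ex:Q_english"),
}

inferred = apply_range_rule(data)
```

Under these assumptions the only derived statement is that the language item is a dcterms:LinguisticSystem, which is the consequence Tpt describes as acceptable.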
Just a note: all the concerns I had were already expressed here by Smalyshev. I also prefer just using rdfs:label, but from the discussion there seem to be other people who don't like it. I'm not sure we can fully resolve it, but count me as another vote to at least keep rdfs:label as noted for the Lexemes and Forms (you could drop the Senses case, though, as it's not really a label). On the language labeling, Dublin Core has a very common problem of inconsistent usage, but I suppose if Wikidata is using it consistently it's not a problem here. Count me as another vote for a wikibase:language predicate, though, if you're thinking about it. Otherwise, congratulations on this proposal; it seems pretty simple and well thought-out!