Topic on Extension talk:WikibaseLexeme/Data Model

LA2 (talkcontribs)

The current description seems to suggest that the forms of a word should be enumerated separately for each word. This is not how it has been done in Wiktionary until now. In Wiktionary, each word calls a template that creates the forms, and many words use the same template. The German words Band and Land use the same template because their declension follows the same pattern (called a 'paradigm' among linguists), but Hand and Wand use another template since their declension follows another pattern. If all the forms have to be enumerated for each word, it is likely that some form will be entered wrongly by mistake. This risk is minimized if a limited set of standard patterns or paradigms (templates) is used as an intermediary.
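To make the template idea concrete, here is a minimal sketch in Python. It is not how the Wiktionary templates are implemented; the function names (`neuter_er_umlaut`, `feminine_e_umlaut`, `forms_of`) are made up for illustration, the umlaut rule is deliberately simplified, and Band is assumed to be the neuter noun with the plural Bänder.

```python
from typing import Callable, Dict, Tuple

FormKey = Tuple[str, str]                      # (case, number)
Paradigm = Callable[[str], Dict[FormKey, str]]

CASES = ("nominative", "genitive", "dative", "accusative")


def umlaut(stem: str) -> str:
    """Umlaut the last a/o/u of the stem (simplified: does not handle 'au')."""
    table = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}
    for i in range(len(stem) - 1, -1, -1):
        if stem[i] in table:
            return stem[:i] + table[stem[i]] + stem[i + 1:]
    return stem


def neuter_er_umlaut(lemma: str) -> Dict[FormKey, str]:
    """Hypothetical paradigm shared by Band and Land."""
    plural = umlaut(lemma) + "er"
    forms = {(case, "singular"): lemma for case in CASES}
    forms[("genitive", "singular")] = lemma + "es"
    forms.update({(case, "plural"): plural for case in CASES})
    forms[("dative", "plural")] = plural + "n"
    return forms


def feminine_e_umlaut(lemma: str) -> Dict[FormKey, str]:
    """Hypothetical paradigm shared by Hand and Wand."""
    plural = umlaut(lemma) + "e"
    forms = {(case, "singular"): lemma for case in CASES}
    forms.update({(case, "plural"): plural for case in CASES})
    forms[("dative", "plural")] = plural + "n"
    return forms


# Each word only names its paradigm; no form is typed in by hand.
PARADIGM_OF: Dict[str, Paradigm] = {
    "Band": neuter_er_umlaut,
    "Land": neuter_er_umlaut,
    "Hand": feminine_e_umlaut,
    "Wand": feminine_e_umlaut,
}


def forms_of(lemma: str) -> Dict[FormKey, str]:
    """Generate all eight forms of a lemma from its assigned paradigm."""
    return PARADIGM_OF[lemma](lemma)
```

With this layout, a mistake in a pattern is fixed once in the paradigm function and every word that uses it is corrected at the same time, which is the property the templates provide today.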

Leszek Manicki (WMDE) (talkcontribs)

The Data Model as proposed does not rule out "generating" Forms based on an inflection (or other) paradigm. Actually, this is how I'd expect them to be added in many cases.

If the concern here is that some particular form could be changed to be wrong (i.e. no longer match the form dictated by the paradigm), I strongly believe the Community and developers can come up with ways of ensuring such cases are identified and fixed (a bot, a gadget, probably multiple other ways I cannot think of just now). I don't think it is the concern of the data model, though.

LA2 (talkcontribs)

Saying that it is not a problem and that bots could detect and fix any inconsistencies is similar to saying that inconsistent interwiki/interlanguage links between Wikipedia articles can be detected and fixed by bots. That is indeed how it used to work before Wikidata was created, and Wikidata was created precisely because the bots did not succeed in fixing all inconsistencies. Wikidata exists because this is a problem with the data model.

Denny (talkcontribs)

It would be great to be able to represent and capture paradigms. But I think that this is a bit more complex and should be left for later. I indeed would think that there will be a future development stage, where a way to type a Lexeme with a certain paradigm will be possible, and then the system will execute some (Lua?) code and create the forms automatically.

Whereas I agree that this would be great to have from the beginning, I think it would make the initial system too complex to start with. Quite consciously, the first version of Wiktionary support is very dumb and simple, in order to figure out and fix the possible errors that happen at this stage already. Once this is settled, we will have reached a point where it makes sense to plan, design and implement paradigm support.

So, yes, I think you are right that it is important and should be done as soon as possible, but I am afraid we are not smart enough to figure out how to do this right from the get-go, and it has traditionally been good practice for Wikimedia software projects to do things incrementally.

Note that in the Wiktionaries themselves, there is nothing to tell the Wiktionaries to stop using their existing solutions for paradigms. In fact, I do not expect those to become obsolete until Wikidata implements native support for paradigms.

I know that Daniel Kinzler has been thinking along these lines too for quite a while.

Nvitucci (talkcontribs)

I have been working for some time on the generation of verb forms for Lithuanian, which is a highly inflected language, and I am going to publish the data as LOD soon. You can find an example here (the RDF is not complete yet and some parts are still experimental). Is that what you are after when you talk about "generating from the paradigm"?

Duesentrieb (talkcontribs)

Even if we have automatic generation, we will need the ability to explicitly model forms, for odd cases, and for cases where we want to make statements about these forms.

I hope that "soon" after Lexemes go live, there will be a way to write Lua code that simply takes the entire Lexeme object as input and generates Form objects as output, which are then shown on the Lexeme page. And when you edit such an "automatic" form, it becomes "real".
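A rough sketch of that behaviour, written in Python rather than Lua purely for illustration. None of this is WikibaseLexeme API: `Lexeme`, `Form`, `resolve_forms` and `edit_form` are hypothetical names, and the "add -s" generator is a toy stand-in for real paradigm code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Form:
    features: str          # e.g. "plural"
    representation: str    # the spelled-out form
    explicit: bool         # True once the form is stored on the Lexeme itself


@dataclass
class Lexeme:
    lemma: str
    # Explicitly modelled forms, keyed by grammatical features.
    explicit_forms: Dict[str, str] = field(default_factory=dict)


def regular_english_noun(lexeme: Lexeme) -> Dict[str, str]:
    """Toy generator: takes the whole Lexeme, returns its regular forms."""
    return {"singular": lexeme.lemma, "plural": lexeme.lemma + "s"}


def resolve_forms(
    lexeme: Lexeme,
    generate: Callable[[Lexeme], Dict[str, str]] = regular_english_noun,
) -> List[Form]:
    """Forms to show on the page: explicit forms win over generated ones."""
    generated = generate(lexeme)
    forms = []
    for features in sorted(set(generated) | set(lexeme.explicit_forms)):
        if features in lexeme.explicit_forms:
            forms.append(Form(features, lexeme.explicit_forms[features], True))
        else:
            forms.append(Form(features, generated[features], False))
    return forms


def edit_form(lexeme: Lexeme, features: str, representation: str) -> None:
    """Editing an automatic form stores it, so it becomes a 'real' form."""
    lexeme.explicit_forms[features] = representation
```

For an odd case such as "child", the generator would propose "childs"; calling `edit_form(lexeme, "plural", "children")` stores the correction on the Lexeme, and from then on `resolve_forms` returns the explicit form while the singular is still generated.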

Psychoslave (talkcontribs)

What about having an item for each "paradigm", and a claim stating that the forms of a Lexeme are related by this paradigm? That way you can have explicit forms produced by whatever means, and if you want to (re)generate them by applying a paradigm to the lemma, you also have all the data required.

Now, what such a paradigm item should contain is surely another matter. Should it contain code implementations, for example? And what about the nomenclature?

Denny (talkcontribs)

I like the idea of making an explicit connection between a word and its paradigm, independently of whether it is used to create the Forms or not. Such explicit information will be useful for other reasons too. So, yes, fully agreed - in my opinion, there should be a property connecting Lexemes with Paradigm items.
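As a minimal sketch of what such a connection could look like as data (Python, for illustration only): the property ID `P_PARADIGM`, the item IDs and the lexeme IDs below are placeholders, not real Wikidata identifiers, and the `Lexeme` class is a simplification rather than the actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

PARADIGM_PROPERTY = "P_PARADIGM"   # placeholder ID for the proposed property


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    # Claims as property ID -> list of item IDs (heavily simplified).
    claims: Dict[str, List[str]] = field(default_factory=dict)


def group_by_paradigm(lexemes: Iterable[Lexeme]) -> Dict[str, List[str]]:
    """Map each paradigm item to the lemmas that claim to follow it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lexeme in lexemes:
        for item_id in lexeme.claims.get(PARADIGM_PROPERTY, []):
            groups[item_id].append(lexeme.lemma)
    return dict(groups)


# Placeholder data: the claim is present whether or not the Forms
# of each Lexeme were generated from the paradigm.
LEXEMES = [
    Lexeme("L_BAND", "Band", {PARADIGM_PROPERTY: ["Q_NEUTER_ER_UMLAUT"]}),
    Lexeme("L_LAND", "Land", {PARADIGM_PROPERTY: ["Q_NEUTER_ER_UMLAUT"]}),
    Lexeme("L_HAND", "Hand", {PARADIGM_PROPERTY: ["Q_FEMININE_E_UMLAUT"]}),
    Lexeme("L_WAND", "Wand", {PARADIGM_PROPERTY: ["Q_FEMININE_E_UMLAUT"]}),
]
```

Because the claim lives on the Lexeme and not in the generating code, it can be queried on its own, for example to list every word that declines like Land, or to pick the Lexemes whose stored Forms should be checked against a given paradigm.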

Reply to "Paradigms"