The more I read and think about it, the more I'm inclined to consider that the model imposes too rigid a framework.
What we are interested in documenting in Wiktionaries is chunks of discourse, and what is claimed about those chunks in such and such theories.
A lexeme is an abstract structure which is already far too committed to a closed theory of language; that is, it leaves no room for presenting language analyses which don't fit a lexemic structuring.
The mission of Wiktionaries is documenting all languages. Side note: this doesn't specify written language, spoken language, or in fact even human language, so depending on local consensus, you might see bee language documented.
What is aimed at here, as far as I understand, is to propose a structured database model to support this mission.
So, the model must allow lexemes to be documented, sure. But that could be done as a lexemic relationship. For example, cat and cats, Baum and Bäumen are two pairs in lexemic relationships, which could be recorded as 4 distinct entities.
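To make the idea concrete, here is a minimal sketch of a lexeme recorded as a relationship between entities rather than as a container class. The names `Entity`, `Relation`, the `language` field and the `"lexemic"` kind are all hypothetical, chosen only for illustration; nothing here is part of the proposed model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A documented discourse chunk (hypothetical fields)."""
    label: str
    language: str


@dataclass(frozen=True)
class Relation:
    """A typed link between two entities; the `kind` vocabulary is an assumption."""
    kind: str
    source: Entity
    target: Entity


cat, cats = Entity("cat", "en"), Entity("cats", "en")
baum, baeumen = Entity("Baum", "de"), Entity("Bäumen", "de")

# Two pairs in a lexemic relationship, recorded over four distinct entities.
relations = [
    Relation("lexemic", cat, cats),
    Relation("lexemic", baum, baeumen),
]

# Collect every entity that takes part in at least one relation.
entities = {e for r in relations for e in (r.source, r.target)}
```

With this shape, other kinds of relationship (etymological, morphological, and so on) would be further values of `kind` over the same entities, without any of them being privileged.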
To really support the goal of Wiktionary, the model must also allow documenting w:lexical item, morphs, w:morphemes, etymons and whatever discourse chunk a contributor might want to document and relate to other discourse chunks. A lexeme class can't do that, or you must come up with a definition of lexeme so distant that it won't match any of the already too many existing ones in the linguistic literature.
I'm not aware of any consensual term for the "discourse chunk" in the sense I'm suggesting here (token doesn't fit either). So, in the rest of this message I'll use logomer (see wikt:en:logo- and wikt:en:-mer).
A discourse is any sign flow[note 1].
A glyph is any non-segmentable[note 2] sign that can be stored/recorded.
A logomer is a data structure which pertains to parts of a sequence of glyphs representing a discourse.
A logomer must have one or more representations.
A representation must have one or more forms.
A single form must be elected as the label.
A representation should indicate which representational systems it pertains to.[note 3]
A logomer must be related to one or more meanings.[note 4]
A logomer form must be extractable from a glyph sequence that represents a discourse.[note 5]
The extraction process of a logomer form must keep every unfiltered glyph.
The extraction process must not add any glyph.[note 6]
The extraction process must not alter any glyph.
A logomer form must include one or more glyph sequences (hereafter named "segment").
A segment must provide a glyph string.
A form including more than one segment must provide an ordinal for each segment.
A segment ordinal must indicate the position of a segment relative to the other segments of the form, counted from the beginning of the discourses where it appears.
A segment might be void.
A void segment might serve as boundary marker, indicating possible positions for other segments which are not part of the current logomer.
All logomer forms of a single representation must be congruent under permutation.[note 7]
An indistinguishable logomer form might appear in multiple discourses.[note 8]
Distinct occurrences of the same logomer form with distinct meanings must induce distinct logomers.
Distinct meanings attributed to the same discourse parts should appear in a single logomer.
A logomer form might be taken as a discourse of its own.
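The structural rules above can be sketched as a small set of data classes. This is only an illustration under stated assumptions: the class and field names are hypothetical, the label is assumed to be elected per representation, meanings are reduced to plain strings, a void segment is modelled as an empty glyph string, and each glyph is assumed to be one character of a Python string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Segment:
    glyphs: str        # glyph string; "" models a void segment (assumption)
    ordinal: int = 0   # relative position of the segment within its form


@dataclass
class Form:
    segments: List[Segment]

    def __post_init__(self):
        if not self.segments:
            raise ValueError("a form must include one or more segments")
        ordinals = [s.ordinal for s in self.segments]
        if len(set(ordinals)) != len(ordinals):
            raise ValueError("each segment needs its own ordinal")

    def glyph_strings(self) -> List[str]:
        """Glyph strings of the segments, with ordinals ignored."""
        return sorted(s.glyphs for s in self.segments)


@dataclass
class Representation:
    forms: List[Form]
    label: Form                                       # assumed to be per representation
    systems: List[str] = field(default_factory=list)  # a "should", hence optional

    def __post_init__(self):
        if not self.forms:
            raise ValueError("a representation must have one or more forms")
        if not any(self.label is f for f in self.forms):
            raise ValueError("the label must be one of the forms")
        # Congruence under permutation: same segments, only ordinals may differ.
        reference = self.forms[0].glyph_strings()
        if any(f.glyph_strings() != reference for f in self.forms):
            raise ValueError("forms must be congruent under permutation")


@dataclass
class Logomer:
    representations: List[Representation]
    meanings: List[str]

    def __post_init__(self):
        if not self.representations:
            raise ValueError("a logomer must have one or more representations")
        if not self.meanings:
            raise ValueError("a logomer must be related to one or more meanings")


# German "anruft" (subordinate clause) versus "ruft ... an" (main clause).
subordinate = Form([Segment("an", 0), Segment("ruft", 1)])
main_clause = Form([Segment("ruft", 0), Segment("an", 1)])
written = Representation([subordinate, main_clause], label=subordinate,
                         systems=["German orthography"])
anruft = Logomer([written], meanings=["calls (by phone)"])
```

The two forms in the example hold the same two segments and differ only in their ordinals, which is what the congruence check accepts; a form with a different segment would be rejected.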
- ↑ Further criteria regarding meaning are purposefully set aside
- ↑ That is, with regard to the sign system used. For example, a code point of a character encoding system could be segmented into several bits, but a bit is not a sign of the encoding system itself, even if a discourse using this system can make references to such a sign.
- ↑ For example, through statements. The accuracy of this information might be left to the community. It could be something as vague as "casual oral transcription" and "direct matching of written document", or more precise like "phonemic system of the International Phonetic Alphabet" and "official orthography in the Dutch spelling reform of 1996"
- ↑ Or definition, or whatever indication of its sense
- ↑ Discourses that can't be represented as a glyph sequence are not considered
- ↑ So boundary markers such as the hyphen in morphs, like logo-, aren't part of a logomer
- ↑ That is, all forms have the exact same set of segments; only the ordinals of these segments can change.
- ↑ Hapaxes are logomer forms too, though
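The extraction rules (a form is obtained by filtering glyphs out of a discourse, without adding or altering any) can be read as a simple check. The sketch below is one possible reading, not a definitive one: `locate_form` is a hypothetical helper, the discourse is assumed to be a Python string with one character per glyph, and segments are assumed to be given in ordinal order.

```python
from typing import List, Optional, Tuple


def locate_form(discourse: str, segments: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Return the (start, end) span of each segment in `discourse`, or None.

    Each segment is searched after the end of the previous one, so the form
    is recovered purely by filtering out the glyphs lying outside the spans.
    """
    spans = []
    cursor = 0
    for glyphs in segments:
        start = discourse.find(glyphs, cursor)
        if start == -1:
            return None  # the segment would require adding or altering glyphs
        end = start + len(glyphs)
        spans.append((start, end))
        cursor = end
    return spans


spans = locate_form("er ruft morgen an", ["ruft", "an"])  # [(3, 7), (15, 17)]
missing = locate_form("er ruft an", ["an", "ruft"])       # None: wrong order here
```

A void segment (an empty string) is located at the current position without consuming any glyph, which fits its role as a mere boundary marker.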