User:TJones (WMF)/Notes/HebMorph Analyzer Analysis
May 2017. See TJones_(WMF)/Notes for other projects. See also T162741. For help with the technical jargon used in the Analysis Chain Analysis, check out the Language Analysis section of the Search Glossary.
HebMorph
HebMorph seems to be the only game in town when it comes to Hebrew analysis (see T162739), so it's the one I've been looking at.
HebMorph deals with the complexities of Hebrew (lack of vowels which leads to lots of ambiguity, occasional use of vowel diacritics for pronunciation which leads to lack of matching, lots of prefixes and suffixes which add to complexity and ambiguity, spelling inconsistencies, etc., etc.), but it also has some unexpected issues that are potential areas for concern or confusion.
To provide roughly comparable examples in English, consider bt, which (following simple rules of dropping [aeiou], and y when it acts like a vowel) could possibly be any of bat, bet, bit, but, bot, boat, beat, bait, beet, boot, about, bite, abet, abut, obit, byte, bute, abate, or, my favorite, ubiety. (Now you know why Hebrew roots are mostly three consonants and not two: fewer words overlap!)
The situation with prefixes means that some prepositions, conjunctions, and determiners attach to the word that follows them, so andiron could be andiron or and iron. Similarly, totherow could be the rare surname Totherow, or to Therow (Therow is also a rare surname), or old boring to the row.
None of these examples are great because English doesn't really have this situation, but hopefully you get the gist.
Processing Speed
The first thing that I noticed is that HebMorph is kind of slow. I ran a 5K article sample with the default prod settings as a baseline, and it took 1m20s. With the default HebMorph config, it took 3m52s, so about 3x slower. This was on my laptop in vagrant, so the problem may not be relevant to production, where there is a lot more memory available and beefier processors.
It's also a somewhat unnatural task, passing text to be analyzed from outside Elastic via the API using curl. (The curl overhead used to dwarf all other time; I recently got a general 30x speedup by minimizing curl calls and passing a lot more text per call.) It's a potential concern to keep an eye on.
I ran some re-indexing tests on RelForge, which is certainly beefier than my laptop. Re-indexing ~205K articles with the production settings took 12m27s. Re-indexing with the default Hebrew analyzer took 23m22s. The unpacked analyzer (see below) took 21m17s.
So, roughly 2x to re-index on RelForge is significant, but Hebrew Wikipedia is only at ~205K articles. Again, something to keep an eye on, but not a disaster. Moore's law will probably keep us ahead of likely growth in Hebrew wikis.
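To make the slowdown figures easy to re-check, here is a minimal Python sketch that converts the timings quoted above into ratios. The helper names (`to_seconds`, `slowdown`) are made up for illustration; only the timing strings come from the runs described in this section.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration written like '3m52s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def slowdown(baseline: str, candidate: str) -> float:
    """Return how many times longer the candidate run took than the baseline."""
    return to_seconds(candidate) / to_seconds(baseline)


# Timings quoted in the text above: (label, baseline, candidate).
runs = [
    ("laptop, 5K sample, default HebMorph", "1m20s", "3m52s"),
    ("RelForge, ~205K articles, default hebrew", "12m27s", "23m22s"),
    ("RelForge, ~205K articles, unpacked", "12m27s", "21m17s"),
]

for label, baseline, candidate in runs:
    print(f"{label}: {slowdown(baseline, candidate):.2f}x")
```

The laptop run comes out at roughly 2.9x, and the two RelForge runs at roughly 1.9x and 1.7x, which is where "about 3x" and "roughly 2x" come from.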
"Previous Bugs"
The Analysis Config Builder code claims we didn't use HebMorph in the past because of bugs, but there are no specifics or pointers to more info, so I'm going to ignore that for now.
If anyone knows anything about previous bugs that might still be relevant, let me know!
Multiple Analyzers
The HebMorph plugin comes with several analyzers (hebrew, hebrew_query, hebrew_query_light, and hebrew_exact), plus individual lemmatizers and a couple of char filters that allow you to unpack the analyzer and customize it. Oddly, there's no difference in output between hebrew and hebrew_query (on 5K articles), despite a minor config diff between them in the source code.
Multiple Analyzed Terms
An interesting feature of Hebrew is that without vowels, and with the various affixes, words without niqqud tend to be very ambiguous. Apparently there is also some spelling variation and/or common mistakes the plugin tries to account for. As a result, there are a lot of words that generate multiple analyzed tokens, much more so than with other language analyzers.
For example, Chinese, English, French, Polish, Russian, and Swedish analyzers generated only one analyzed token per word out of 100K+ tokens. Ukrainian had up to four, but only for 13 out of almost 200K tokens, and 97.6% had only 1.
On a 5K sample of Hebrew with 200K+ tokens, words generated up to 14 tokens each! The mean was closer to 2 (1.38 to 2.26, depending on the analyzer, other than hebrew_exact, which always returns one).
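One way to count analyzed tokens per input word is to group the output of Elasticsearch's `_analyze` API by token position. The sketch below assumes that alternative analyses of one word share a `position` value in the response; the sample response is hand-written to mirror the lonely / andre rows in the table further down, not captured from a live server, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def tokens_per_position(response: dict) -> dict:
    """Group analyzed tokens by the position of the input word they came from."""
    by_position = defaultdict(list)
    for tok in response["tokens"]:
        by_position[tok["position"]].append(tok["token"])
    return dict(by_position)


def fanout_histogram(response: dict) -> Counter:
    """Count how many input positions produced 1, 2, 3, ... analyzed tokens."""
    return Counter(len(toks) for toks in tokens_per_position(response).values())


# Hand-written sample in the shape of an _analyze response (ASSUMPTION:
# both analyses of a word are reported at the same position).
sample_response = {
    "tokens": [
        {"token": "lonely$", "position": 0},
        {"token": "lonely", "position": 0},
        {"token": "andre$", "position": 1},
        {"token": "andre", "position": 1},
        {"token": "boy", "position": 2},
    ]
}

print(tokens_per_position(sample_response))
print(fanout_histogram(sample_response))
```

Run over a real sample, a histogram like this is what would show most analyzers at one token per word and HebMorph with a long tail.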
$
An idiosyncratic feature of HebMorph is that exact forms are analyzed with a $ at the end. This applies to both Hebrew and non-Hebrew words. The hebrew_exact analyzer returns only this $-suffixed form.
The hebrew/hebrew_query analyzers return the $-suffixed form, and can give two terms that differ only by the $.
The hebrew_query_light analyzer drops the $ from "Hebrew words" (which includes actual Hebrew words, but also seems to include any token that starts with a Hebrew letter). However, hebrew_query_light doesn't drop the $ from non-Hebrew words.
Unpacking the analyzer gives us the option to drop the final $. For Hebrew words, the $-form is deduped, but for non-Hebrew words, two identical copies are returned. For words with appropriate multiple forms (André → andre / andré), two of each are returned. When unpacking the Hebrew analyzer, I'm using the niqqud filter, hebrew_lemmatizer filter, and the icu_normalizer filter (which also lowercases Latin text).
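For readers who want to picture what "unpacked" means, here is a rough sketch of Elasticsearch analysis settings, built as a Python dict. The filter names niqqud, hebrew_lemmatizer, and icu_normalizer are the ones named above; the tokenizer name `hebrew`, the analyzer name `text`, the filter order, and treating all three as token filters are assumptions for illustration, not the actual config.

```python
import json


def unpacked_hebrew_analysis(have_icu: bool = True) -> dict:
    """Build a sketch of custom analyzer settings without the $-suffix step."""
    # Fall back to plain lowercasing when the ICU plugin is not installed.
    normalizer = "icu_normalizer" if have_icu else "lowercase"
    return {
        "analysis": {
            "analyzer": {
                "text": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "hebrew",  # ASSUMPTION: tokenizer name
                    "filter": ["niqqud", "hebrew_lemmatizer", normalizer],
                }
            }
        }
    }


print(json.dumps(unpacked_hebrew_analysis(), indent=2))
```

The point of the sketch is only that the filter adding the $-final form is left out of the chain; everything else should be checked against the plugin's own documentation.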
Example Results by Analyzer
Below are some examples of different tokens (as defined by the Hebrew tokenizer) and their analyzed forms.
$ are supposed to be at the ends of words, but difficulties with your browser, difficulties with my editor, and the phase of the moon may affect the way it displays below.
There's a very interesting token, ו"Lonely, in the table. ו is a prefix meaning and, which can be attached to the beginning of nouns. In this case, it was attached to the Paul Anka song title, "Lonely Boy". Hebrew abbreviations use special punctuation marks that look sort of like single and double quotes, so (as with every other character that looks like a single or double quote) people sometimes use single and double quote characters instead, which is why the tokenizer allowed the token with a double quote in it. The Hebrew analyzers all analyze it correctly, and the non-Hebrew analyzers re-tokenized it into two parts, splitting on the double quote.
term | prod (text) | prod (plain) | hebrew & hebrew_query | hebrew_query_light | hebrew_exact | unpacked + LC/ICU |
ืืืืื | ืืืืื | ืืืืื | ืืืืื$ / ืืื | ืืื | ืืืืื$ | ืืืืื / ืืื |
ืืืืืืืืฆืื | ืืืืืืืืฆืื | ืืืืืืืืฆืื | ืืืืืืืืฆืื$ / ืืืืืืฆืื | ืืืืืืฆืื | ืืืืืืืืฆืื$ | ืืืืืืืืฆืื / ืืืืืืฆืื |
ืืืืจืืื | ืืืืจืืื | ืืืืจืืื | ืืืืจืืื$ / ืืืืจ / ืืืจืื / ืืืจ / ืืืืจ | ืืืืจ / ืืืจืื / ืืืจ / ืืืืจ | ืืืืจืืื | ืืืืจืืื / ืืืืจ / ืืืจืื / ืืืจ / ืืืืจ |
ืืืชืฉืืฉืืช | ืืืชืฉืืฉืืช | ืืืชืฉืืฉืืช | ืืืชืฉืืฉืืช$ / ืชืฉืืฉืืช | ืชืฉืืฉืืช | ืืืชืฉืืฉืืช$ | ืืืชืฉืืฉืืช / ืชืฉืืฉืืช |
ืืชืจืขืืืช | ืืชืจืขืืืช | ืืชืจืขืืืช | ืืชืจืขืืืช$ / ืชืจืขืืืช | ืชืจืขืืืช | ืืชืจืขืืืช$ | ืืชืจืขืืืช / ืชืจืขืืืช |
ืฉืืืคืืื ื | ืฉืืืคืืื ื | ืฉืืืคืืื ื | ืืืคืืื / ืืืคื / ืืคืื / ืืคืื ื / ืืคืืื / ืืคื ื / ืืืคื ื / ืืคื ื / ื ืคื ื / ืคืื ื / ืคืื ื / ืคื ื / ืฉืืืคืืื ื$ / ืฉืืฃ | ืฉืืฃ / ืืืคืืื / ืืืคื / ืืืคื ื / ืืคืื ื / ืืคืื / ืคืื ื / ืืคืืื / ืืคื ื / ืืคื ื / ื ืคื ื / ืคืื ื / ืคื ื | ืฉืืืคืืื ื$ | ืฉืืืคืืื ื / ืฉืืฃ / ืืืคืืื / ืืืคื / ืืืคื ื / ืืคืื ื / ืืคืื / ืคืื ื / ืืคืืื / ืืคื ื / ืืคื ื / ื ืคื ื / ืคืื ื / ืคื ื |
ืืืืื | ืืืืื | ืืืืื | ืืืืื$ / ืืืืื | ืืืืื | ืืืืื$ | ืืืืื / ืืืืื |
ืึธืึฐืึดืึทื | ืึธืึฐืึดืึทื | ืืืืื / ืึธืึฐืึดืึทื | ืืืืื$ / ืืืืื | ืืืืื | ืืืืื$ | ืืืืื / ืืืืื |
ו"Lonely | ו + Lonely | ו + Lonely | lonely$ / lonely | lonely | lonely$ | lonely / lonely |
lonely | lonely | lonely | lonely$ / lonely | lonely$ / lonely | lonely$ | lonely / lonely |
André | andré | andre / andré | andre$ / andre | andre$ / andre | andre$ | andre / andré / andre / andré |
andre | andre | andre | andre$ / andre | andre$ / andre | andre$ | andre / andre |
straße | strasse | strasse | strasse$ / strasse | strasse$ / strasse | strasse$ | strasse / strasse |
แแแฆแแแ | แแแฆแแแ | แแแฆแแแ | แแแฆแแแ$ / แแแฆแแแ | แแแฆแแแ$ / แแแฆแแแ | แแแฆแแแ$ | แแแฆแแแ / แแแฆแแแ |
ูุง | ูุง | ูุง | ูุง$ / ูุง | ูุง$ / ูุง | ูุง$ | ูุง / ูุง |
โตโตโต | โตโตโต | โตโตโต | โตโตโต$ / โตโตโต | โตโตโต$ / โตโตโต | โตโตโต$ | โตโตโต / โตโตโต |
Comparing Analyzers
Production vs HebMorph hebrew
- production tokens: 2,514,279
- HebMorph tokens: 6,121,267
That's a lot more tokens, but it seems unavoidable with the ambiguity of Hebrew, plus all those $-final tokens.
- 1.2% of input tokens have 1 analyzed token
- 78.8% of input tokens have 2 analyzed tokens
- 14.6% of input tokens have 3 analyzed tokens
- 3.8% of input tokens have 4 analyzed tokens
- 1.1% of input tokens have 5 analyzed tokens
New Collision Stats
- types: 193714 (86.803%) [post-analysis types]
- tokens: 2355995 (93.705%)
So the vast majority of tokens, being Hebrew words that tend to be ambiguous, are indexed with some other word now.
A small number of splits occurred as well. All are due to bi-directional Unicode characters. The production config leaves the bidi characters as part of the token, but removes them for the analyzed form. HebMorph just ignores them from the beginning.
Tokenization is different, with words split on periods and commas (including in numbers), underscores, colons, and other Unicode characters, particularly combining characters like stress marks in Cyrillic, IPA diacritics, and Devanagari and Thai combining characters. There's also some Unicode normalization (e.g., ɾ → r).
- 0.01M → 0 / 01M
- 1.6ºC → 1 / 6ºc
- 1.9891x10 → 1 / 9891x10
- foo_bar → foo / bar
- foo:bar → foo / bar
- foo.bar → foo / bar
- 1,200 → 1 / 200
- N·s → N / s
- Григо́рий → григо / рий
- เคเคฎเคพเคจ → เคเคฎ / เคจ
- ื.ื → ื / ื
- moฬหษพแบฝnษ → mo / rena
- หtอกสipriหan → t / สipriหan
- เนเธเธฒเธฐเธเนเธฒเธ → เนเธเธฒเธฐเธ / เธฒเธ
In general, these are uncommon, not Hebrew, and should still work reasonably well with the plain field.
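To get a feel for why these splits happen, here is a toy tokenizer in Python. It is not the HebMorph tokenizer, just a regex that keeps runs of letters and digits and breaks on everything else (punctuation, underscores, and combining marks), which happens to reproduce several of the splits listed above. The name `toy_tokenize` is made up for this sketch, and it leaves case alone.

```python
import re

# Runs of Unicode letters/digits; underscores, punctuation, and combining
# marks (which Python's \w does not match) all act as token breaks.
WORD_RUN = re.compile(r"[^\W_]+")


def toy_tokenize(text: str) -> list:
    """Split text into runs of letters and digits."""
    return WORD_RUN.findall(text)


examples = [
    "0.01M",
    "foo_bar",
    "foo:bar",
    "foo.bar",
    "1,200",
    "N\u00b7s",               # N, middle dot, s
    "\u0413\u0440\u0438\u0433\u043e\u0301\u0440\u0438\u0439",  # Cyrillic name with a combining stress mark
]

for text in examples:
    print(text, "->", " / ".join(toy_tokenize(text)))
```

The Cyrillic case shows the general pattern: the combining stress mark is not a letter, so the word is cut in two at that point.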
Of course, the Hebrew tokenizer includes a lot of tokens with an additional final $, as discussed above.
HebMorph is also smart about Hebrew affixes on non-Hebrew words, as discussed above.
I'm leaving the bulk of the changesโthe actual Hebrew analysisโfor later, after we compare the different Hebrew options.
HebMorph hebrew vs HebMorph hebrew_query_light
- hebrew tokens: 6,121,267
- hebrew_query_light tokens: 3,794,234
That's a lot fewer tokens, with most of the $-final tokens gone (though they are still there on Latin-character words).
- 69.8% of input tokens have 1 analyzed token
- 24.8% of input tokens have 2 analyzed tokens
- 3.8% of input tokens have 3 analyzed tokens
- 1.1% of input tokens have 4 analyzed tokens
- 0.3% of input tokens have 5 analyzed tokens
Since I don't think I agree with the rationale behind the $-final terms (making exact matches possible, a task for which we have the plain field), this is great, since the number of tokens has gone down by 38.0%.
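The 38.0% figure is just the relative change between the two token counts. A small sketch of the arithmetic, using the counts reported in this section (the function name is hypothetical):

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


production_tokens = 2_514_279
hebrew_tokens = 6_121_267
hebrew_query_light_tokens = 3_794_234

# production -> hebrew: a large increase in indexed tokens
print(f"{percent_change(production_tokens, hebrew_tokens):+.1f}%")

# hebrew -> hebrew_query_light: about -38.0%, matching the reduction quoted above
print(f"{percent_change(hebrew_tokens, hebrew_query_light_tokens):+.1f}%")
```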
HebMorph hebrew_query_light vs Unpacked HebMorph w/ lowercase
I was hoping to get rid of all the $-final tokens, so I unpacked the analyzer, and skipped the filter that adds the $-final form. I still used the niqqud and hebrew_lemmatizer filters, and the lowercase/icu_normalizer filter. (The lowercase filter is replaced with the icu_normalizer filter if it's available. It does a bit more that's generally useful. Without it, Foo and foo would index separately, which seems silly.)
- hebrew_query_light tokens: 3,794,234
- unpacked/lowercase tokens: 5,063,634
Hmm. That's a lot more tokens, which I did not expect.
There were no differences in pre-analysis tokens, so the tokenization seems to be the same.
Analyzed token differences include:
- $-final tokens are gone.
- Greek final ς is converted to σ, as it darn well should be!
- Latin accented characters are preserved
- A few Unicode characters are converted to plain characters (dʲ → dj)
- Some IPA is preserved
- Raised o (º) is used as a degree sign, and gets converted to o, so 45º → 45o.
But the bulk of the differences (129,723 types, 1,250,642 tokens) are Hebrew tokens. These seem to be the exact tokens for Hebrew words, without the final $ added.
There were very few new collisions or splits, mostly splits related to the lack of ascii folding, which is built into HebMorph.
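Several of the non-Hebrew differences listed above can be reproduced with Python's standard library as a rough stand-in for the icu_normalizer filter. This is an approximation under the assumption that the filter behaves like NFKC normalization followed by case folding; it is not the ICU implementation, and `approx_icu_normalize` is a made-up name.

```python
import unicodedata


def approx_icu_normalize(text: str) -> str:
    """Approximate NFKC + case folding using only the standard library."""
    return unicodedata.normalize("NFKC", text).casefold()


samples = [
    "Foo",              # plain lowercasing
    "\u03c2",           # Greek final sigma
    "45\u00ba",         # raised o used as a degree sign
    "d\u02b2",          # d with modifier letter small j
    "Andr\u00e9",       # accented Latin is kept, since no folding is applied
    "Stra\u00dfe",      # sharp s
]

for text in samples:
    print(text, "->", approx_icu_normalize(text))
```

Note that the accent on the Latin example survives, which is consistent with the splits attributed above to the lack of ASCII folding in the unpacked chain.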
Unpacked HebMorph: lowercase vs lowercase/folding
Both have the same number of tokens (5,063,634) and all changes are the expected collisions from ICU folding, with no effect on Hebrew words.
I think folding foreign languages is usually good, so we should keep the folding enabled.
Unpacked HebMorph: lowercase/folding vs lowercase/folding/preserve
Another option is to enable ICU folding, while indexing both the original form of the word and the folded form. In this case it doesn't have much impact on Hebrew, because HebMorph has already done the Hebrew-related folding, but it affects some Latin, Greek, Cyrillic, and CJK characters.
- Lowercase/folding tokens: 5,063,634
- Lowercase/folding/preserve tokens: 5,065,585
There were no new collisions or splits, just additional tokens indexed.
As expected, a few tokens popped back up, mostly with accented characters, a few with variants that are important in the source, but less so to a non-speaker, and some phonetic characters.
The plain field, which usually does exact matching, has ICU folding with "preserve original" enabled for Hebrew, which is helpful because it removes niqqud (the vowel diacritics). I don't think we need to preserve the originals for non-Hebrew words in the text field.
What to do?
After talking with Matanya and Stas about exact matching and stemming, and with David about the final $ and search internals, I don't think we need the final $, and David's convinced me to worry about it messing up Did You Mean suggestions and regular expression matching.
I think it's best to go with the unpacked version, since it gives us the most flexibility, and avoids the final-$. I think we should enable folding, too, but that we don't need to preserve originals in the text field.
One odd side-effect of using HebMorph is that searching Hebrew words with a final $ will prevent stemming in the analyzer. There doesn't seem to be any way to turn it off. I don't think it comes up much, and we shouldn't tout it as a feature, since it would only be for Hebrew, and it could go away in the future. So, it's an acceptable quirk of the analyzer.
Of course, the next step is to test all this with native speakers looking at HebMorph output and using it on real data in labs.
HebMorph Output Examples
Below are some examples of output from HebMorph, for native speaker review. These don't have to all be perfect, but they shouldn't all be horrible, either.
Groupings show words that HebMorph has indexed to a common stem. In English, this would be group and groups being indexed together, so that searching for one will find all the others.
Analyzed Terms show the (often multiple) terms that are indexed for a given search term. These are generally the possible root forms of a term. In English, this would be does being analyzed as both a form of do and a form of doe (does is the plural of doe, which is a deer, a female deer).
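Both views can be derived from the same data: a mapping from each word to its analyzed terms, plus word frequencies. The sketch below uses the English stand-ins from the explanation above with invented frequencies; the function names and the sample data are hypothetical, and the output format, with each word shown next to its frequency in square brackets, is the one used in the groupings lists.

```python
from collections import defaultdict


def build_groupings(analyses: dict, frequencies: dict) -> dict:
    """Invert word -> analyzed terms into analyzed term -> {word: frequency}."""
    groups = defaultdict(dict)
    for word, terms in analyses.items():
        for term in terms:
            groups[term][word] = frequencies.get(word, 0)
    return dict(groups)


def format_grouping(term: str, members: dict) -> str:
    """Render one grouping as 'term: [freq word][freq word]...'."""
    forms = "".join(f"[{freq} {word}]" for word, freq in sorted(members.items()))
    return f"{term}: {forms}"


# English stand-ins; the frequencies are invented for illustration.
analyses = {
    "group": ["group"],
    "groups": ["group"],
    "do": ["do"],
    "doe": ["doe"],
    "does": ["do", "doe"],  # ambiguous: a form of "do" or the plural of "doe"
}
frequencies = {"group": 3, "groups": 2, "do": 5, "doe": 1, "does": 4}

groupings = build_groupings(analyses, frequencies)
for term in sorted(groupings):
    print(format_grouping(term, groupings[term]))
```

An ambiguous word such as does lands in more than one grouping, which is how a handful of short, ambiguous stems can end up with very large groups.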
Random Groupings
Here are 50 randomly selected groupings. The analyzed term they share is bolded. Each term is shown with its frequency, so "[1 foo][2 bar]" means that foo occurred once in the sample, and bar occurred twice. The relative frequency is important, since lower-frequency errors matter less.
- ืคืจืืฆืืื: [2 ืคืจืืฆืืืืืช][1 ืคืจืืฆืืืืื]
- ืืืจืชื: [1 ืึทืึฐึผืจึตืชึดื][10 ืืืจืชื][3 ืืืจืชืืืช][1 ืืืจืชืืื][1 ืืืจืชืืช][1 ืฉืืืจืชืื]
- ืฉืขื: [1 ืืฉืขื][1 ืืฉืขืื][1 ืืฉืขืื][1 ืืฉืขืืืื][11 ืืฉืขื][12 ืืฉืขืื][3 ืืฉืขืื][1 ืืฉืขืืื][1 ืืฉืขืืื][35 ืืฉืขื][2 ืืฉืขืื][2 ืืฉืขืื][1 ืฉึถืืขึธืืึผ][237 ืฉืขื][51 ืฉืขืื][39 ืฉืขืื][10 ืฉืขืื][41 ืฉืขืืื][30 ืฉืขืืืื][12 ืฉืขืืืื][72 ืฉืขืืื][1 ืฉืขืืื][4 ืฉืขืืื ื]
- ืงืื ื: [2 ืืืงื ื][1 ืืงืื ืื][1 ืืงื ื][2 ืืงื ื][2 ืืงื ื][1 ืืงื ืืืช][2 ืืงื ืืช][1 ืงึดื ึตึผื][3 ืงืื ื][2 ืงืื ืื][2 ืงื ื][10 ืงื ืื][13 ืงื ืื][1 ืงื ืื ื][1 ืฉืืงื ื]
- ืื'ื ืื: [6 ืื'ื ืื][3 ืื'ื ืืช][1 ืืื'ื ืื][5 ืืื'ื ืื]
- ืื ืืจืืงืก: [1 ืึถื ึถืึฐืจืืงึฐืก][1 ืื ืืจืืงืก]
- ืืื ืืคืื: [1 ืืืื ืืคืื][2 ืืืื ืืคืืืื][5 ืืืื ืืคืื][3 ืืืื ืืคืืืื][1 ืืืืื ืืคืืืื][2 ืืืืื ืืคืืืื][1 ืืืื ืืคืืืื][1 ืืืื ืืคืื][6 ืืื ืืคืื][6 ืืื ืืคืืืื]
- ืืืคืืฃ: [1 ืืืืคืืฃ][1 ืืืืคืืคื][1 ืืืืืคืืฃ]
- ืงืืกืืื: [1 ืงืึนืกึฐืืึนื][40 ืงืืกืืื]
- ืฉืืคืื: [2 ืืฉืืคืื][1 ืฉืืคืืื]
- ืืกืืื: [14 ืืืกืืื][1 ืืืกืืืืื][1 ืืืกืืืืช][323 ืืกืืื][142 ืืกืืืืืช][288 ืืกืืืืื][256 ืืกืืืืช][1 ืืกืืืืืื]
- ืืืื: [5 ืืืืื][37 ืืืื][3 ืืืืื][1 ืืืืืช][2 ืืืืืื][1 ืืืืื][1 ืืืืืื][40 ืืืืื]
- ืืืืืช: [1 ืืืืืืช][4 ืืืืืืช][1 ืืืืืืืช][1 ืืืืืืช][3 ืืืืืช]
- ืื ืฆืื: [4 ืืื ืฆืื][3 ืื ืฆืื][3 ืืื ืฆืื][1 ืืื ืฆืืช]
- ื ืืืื: [6 ืื ืืืืืช][1 ืืื ืืืืืช][1 ืื ืืืืืช][1 ืื ืืืืืช][1 ื ืืืื][9 ื ืืืืืช]
- ืืืืจืืื ืืช: [2 ืืืืืจืืื ืืืช][2 ืืืืืจืืื ืืช][15 ืืืืจืืื ืืช][8 ืืืืืจืืื ืืช]
- ืืชืจืื: [8 ืืชืจืื][1 ืืชืจืืื][2 ืืชืจืืืื][1 ืืืชืจืื][1 ืืืชืจืื]
- ืชืืงื: [3 ืืืชืืงื][1 ืืืชืืงื ืืช][4 ืืืชืืงื ืช][1 ืืฉืชืืงื ื][1 ืืชืืงื][1 ืืชืืงื][1 ืืชืืงื ื][5 ืืชืืงื][1 ืืชืืงื ืืช][12 ืืชืืงื ืช][2 ืฉืืชืืงื ื][4 ืฉืชืืงื][14 ืชืืงื][6 ืชืืงื ื][7 ืชืืงื ื][1 ืชืชืืงื]
- ืืฉืืืข: [1 ืืฉืืืข][1 ืืฉืืืขื][167 ืืฉืืืขื][16 ืืฉืืข][4 ืืฉืืขืช][1 ืืฉืืืขืื][2 ืืืฉืืืขื][1 ืืืฉืืขืช][1 ืืืฉืืืข][2 ืืืฉืืืข][1 ืืืืฉืืืข][4 ืืฉืืืข][1 ืืฉืืืขื][1 ืืฉืืืขื][1 ืืฉืืืขืช]
- ืืฉืจืืื: [618 ืืฉืจืืื][1 ื"ืืฉืจืืื][1 ื"ืืฉืจืืื]
- ืืฉืื: [2 ืืืฉืื][1 ืืืฉืืืื][1 ืึทืึนึผืฉึฐืืึดืื][62 ืืืฉืื][1 ืืืืฉืืื][1 ืืืืฉืืืื][1 ืืืืฉืื][1 ืืืฉืื][3 ืืืฉืืื][1 ืืืฉืืื][2 ืึธืฉืืึผื][1 ืึนืฉึฐืืึดืื][41 ืืฉืื][1 ืืฉืืื][9 ืืฉืืื][2 ืืฉืืืื][1 ืืืฉืื][1 ืืืฉืื][1 ืืืฉืืื][1 ืืืืฉืืืื][1 ืืืฉืืื][1 ืฉืืืฉืื]
- ืืืืื: [2 ืืืืืื ื][4 ืืืืืื ืื][1 ืืืืืื ืืช][2 ืืืืื ื][2 ืืืืื ืืืช][2 ืืืืื ืื][1 ืืืืื ืืช][1 ืืืืืื ืื]
- ื ืขืื: [5 ืื ืขืื][1 ืื ืขืืื][3 ืืื ืขืื][3 ืื ืขืื][1 ืึฐื ึทืขึฒืึถื][2 ืืื ืขืื][4 ืื ืขืื][1 ืื ืขืืื][4 ืื ืขืื][10 ื ืขืื][4 ื ืขืืืช][23 ื ืขืื][5 ื ืขืืื][1 ื ืขืืืช][1 ืฉื ืขืื]
- ืืืจืื ื: [53 ืืืจื][2 ืืืจืืืช][5 ืืืจืืื][10 ืืืจืื][22 ืืืจืืช][213 ืืืจืื ื][1 ืืจืื ื][1 ืืืืจืื][5 ืืืืจืืช][44 ืืืืจืื ื][4 ืืืจืื][1 ืืืจืืช][3 ืืืจืช][46 ืืืืจื][14 ืืืืจืืืช][4 ืืืืจืืื][11 ืืืืจืื][26 ืืืืจืืช][10 ืืืืจืื][26 ืืืืืจื][1 ืืืืืจืื][2 ืืืืืจืืช][1 ืืืฉืืืจืื][6 ืืืืจื][1 ืืืืจืื][1 ืืืืจืืช][23 ืืืืจืื ื][4 ืืืืืจืื ื][1 ืืืืืจื][2 ืืืฉืืืจืื][1 ืืืืจื][1 ืืืืจืื][12 ืืืืจืื ื][1 ืืืจืืช][5 ืืืืจื][1 ืืืืจืื][6 ืืืืจืื ื][8 ืืฉืืืจืื][1 ืฉืืืจืื ื][9 ืฉืืืืจืื ื][1 ืฉืืืืจื]
- ืืกืืจ: [1 ืืืกืืจืื][71 ืืกืืจ][30 ืืกืืจื][5 ืืกืืจืืช][1 ืืกืืจืื][32 ืืกืืจืื][13 ืืืกืืจ][8 ืืืกืืจื][10 ืืืกืืจืืช][12 ืืืกืืจืื][10 ืืืกืืจ][1 ืืืกืืจื][1 ืืืกืืจืื][1 ืืืืกืืจ][1 ืืฉืืกืืจ][1 ืืืกืืจืืช][28 ืืืกืืจ][1 ืืืกืืจืื][16 ืฉืืกืืจ][3 ืฉืืกืืจืืช][3 ืฉืืกืืจืื]
- ืฆืืคื: [2 ืืฆืืคื][30 ืฆืืคื]
- ืงืฉืืจื: [8 ืืงืฉืืจืช][5 ืืงืฉืืจื][1 ืืงืฉืืจื][1 ืืงืฉืืจืืช][6 ืืงืฉืืจืช][4 ืืงืฉืืจืช][6 ืงืฉืืจื][3 ืงืฉืืจืืช][7 ืงืฉืืจืช][1 ืงืฉืืจืชื]
- ืชืงื ืื: [6 ืืชืงื ืื][3 ืืชืงื ืื ืื][6 ืืชืงื ืื][4 ืืชืงื ืื ืื][1 ืืืชืงื ืื][5 ืืชืงื ืื][2 ืืชืงื ืื ืื][2 ืืืชืงื ืื][1 ืืืชืงื ืื ืื][19 ืชืงื ืื]
- ืขืืจ: [2 ื"ืขืืจ][1 ืึฐึผืขึดืืจ][1 ืึฐึผืขึธืจึตืืึถื][1339 ืืขืืจ][15 ืืขืืจื][28 ืืขืืจื][2 ืืขืืจื][2 ืืขืืจื ื][27 ืืขืจื][1 ืืขืจืื][1 ืืขืจืืืื][104 ืืขืจืื][3 ืืขืจืืื][1 ืืขืจื ืื][2 ืึธืขึดืืจ][16 ืืืขืืจื][2 ืืืขืืจื][5 ืืืขืจืื][12 ืืืฉืขืจืื][2614 ืืขืืจ][5 ืืขืืจื][3 ืืขืืจื][1 ืืขืืจืื][1 ืืขืจืื][48 ืืขืจืื][2 ืืขืจืืืื][201 ืืขืจืื][1 ื"ืขืจื][1 ื"ืฉืขืจื][1 ืึฐืขึธืจึตืืึถื][9 ืืืขืืจ][1 ืืืขืจื][8 ืืืขืจืื][58 ืืืขืืจ][2 ืืืขืจืื][7 ืืืขืจืื][1 ืืืขืืจ][2 ืืืขืืจ][1 ืืืขืืจืื][2 ืืืขืจืื][1 ืืืืขืืจ][1 ืืืขืืจ][1 ืืืขืจืื][1 ืืืขืจืื][2 ืืืฉืขืจืื][12 ืืขืืจ][3 ืืขืจื][12 ืืขืจืื][1 ืืฉืขืจื][1 ืืฉืขืจืืื][1 ืืฉืขืจืืื][3 ืืฉืขืจืื][3 ื"ืขืืจ][1 ื"ืฉืขืืจ][40 ืืขืืจ][2 ืืขืจื][2 ืืฉืืขืืจ][3 ื"ืขืืจ][424 ืืขืืจ][3 ืืขืืจื][7 ืืขืจื][36 ืืขืจืื][2 ื"ืขืืจ][1 ื"ืขืจื][1 ืืืขืืจ][1 ืืืขืืจื][1 ืืืขืจืื][118 ืืืขืืจ][15 ืืืขืจืื][33 ืืขืืจ][1 ืืขืืจื][12 ืืขืจื][1 ืืขืจืื][14 ืืขืจืื][5 ืืขืจืื][4 ืืฉืขืจื][1 ืืฉืขืจืื][1 ืืฉืขืจืื][24 ืืฉืขืจืื][2 ืขึดืืจ][1 ืขึดืืจึถืึธ][1 ืขึธืจึดืื][2 ืขึธืจึตื][1 ืขึธืจึธืื][431 ืขืืจ][2 ืขืืจื][14 ืขืืจื][1 ืขืืจื][3 ืขืืจื][1 ืขืืจื][2 ืขืืจื ื][93 ืขืจื][1 ืขืจืื][2 ืขืจืืื][1 ืขืจืื][2 ืขืจืื][2 ืขืจืืืื][131 ืขืจืื][2 ืขืจืืื][1 ืฉ"ืฉืขืจื][45 ืฉืืขืืจ][1 ืฉืืขืืจื][2 ืฉืืขืจื][2 ืฉืืขืจืื][29 ืฉืืขืืจ][4 ืฉืืขืจืื][1 ืฉืืฉืขืจืื][2 ืฉืขืืจ][2 ืฉืขืืจื][8 ืฉืขืืจื][133 ืฉืขืจื][5 ืฉืขืจืื][1 ืฉืขืจืืื][16 ืฉืขืจืื][7 ืฉืขืจืืื][422 ืฉืขืจืื]
- ืืจืื: [1 ืืืืจืืืื][1 ืืืจืืื][1 ืืืจืืืื][4 ืืืืจืืืื][11 ืืืจืื][3 ืืืจืืื][8 ืืืจืืื][1 ืืืจืืืื][1 ืืืืจืื][4 ืืืืจืื][2 ืืืืจืืื][1 ืืืจืืื][3 ืืืืจืื][4 ืืจืื][3 ืืจืืื][1 ืืจืืืื][2 ืืจืืืื][5 ืืืืจืื][3 ืืืืจืืืื][1 ืืืืจืืื][3 ืืืืจืื][10 ืืืจืื][4 ืืืจืืื][4 ืืืจืืืื][2 ืฉืืืจืื][1 ืฉืืืจืื][1 ืฉืืืจืืืื]
- ืืืกืงืืค: [6 ืืืืกืงืืค][6 ืืืืกืงืืค][1 ืืืืกืงืืคืื][1 ืืืืืกืงืืค][1 ืืืืืกืงืืคืื][1 ืืืืกืงืืค][17 ืืืกืงืืค][4 ืืืกืงืืคื][6 ืืืกืงืืคืื][1 ืืืืกืงืืค][1 ืืืืกืงืืคืื]
- ืฉืืื: [2 ืฉืืื][1 ืฉืืืึน]
- ืชืืืื: [151 ืืชืืืื][7 ืืชืืืืืช][2 ืืชืืืืืชืืื][1 ืืชืืืืืืช][4 ืืชืืืืช][1 ืืชืืืืชื][1 ืืชืืืืชื][28 ืืชืืืื][16 ืืชืืืืืช][10 ืืืชืืืื][1 ืืืชืืืืืช][3 ืืืชืืืื][3 ืืืชืืืืืช][1 ืืืชืืืื][2 ืืืชืืืื][7 ืืชืืืื][5 ืืชืืืืืช][1 ืืชืืืืืชืืื][1 ืืชืืืืืชืื][2 ืืชืืืืช][1 ืืชืืืืชื][38 ืืชืืืื][1 ืืชืืืืืช][2 ืืชืืืืช][9 ืืชืืืื][11 ืืชืืืืืช][1 ืืชืืืืืชืื][5 ืืชืืืืช][1 ืืชืืืืชื][1 ืืชืืืืชื][5 ืืชืืืื][1 ืืชืืืืืช][2 ืืชืืืืช][2 ืืชืืืืชื][2 ืฉืืชืืืื][1 ืฉืืชืืืื][1 ืฉืชืืืื][1 ืฉืชืืืืช][67 ืชืืืื][45 ืชืืืืืช][5 ืชืืืืืชืื][33 ืชืืืืช][3 ืชืืืืชื][7 ืชืืืืชื][2 ืชืืืืชื][1 ืชืืืืชื]
- ืงืืฉืื: [1 ืืงืืฉืื][6 ืงืืฉื][1 ืงืืฉืื][1 ืงืืฉืืช]
- ื\"ื: [1 ืื"ื][2 ื"ื]
- ืฉืืืื: [3 ืืฉืืืืืื][1 ืืฉืืืื][1 ืืฉืืืื][7 ืฉืืืื][7 ืฉืืืืืื][1 ืฉืฉืืืืื]
- ืงืคืฅ: [2 ืืงืืคืฅ][2 ืืงืืคืฆืื][9 ืืงืืคืฆืช][4 ืืงืคืฆื][1 ืืงืคืฆืช][1 ืืืงืคืืฅ][2 ืืงืืคืฅ][1 ืืงืืคืฆืื][1 ืืงืคืฆื][3 ืืงืคืืฅ][11 ืืงืคืืฅ][7 ืงืืคืฅ][5 ืงืืคืฆืื][1 ืงืืคืฆืช][12 ืงืคืฅ][2 ืงืคืฆื][1 ืงืคืฆื][1 ืงืคืฆื][1 ืฉืงืืคืฆืื][2 ืฉืงืคืฅ][1 ืฉืงืคืฆื][1 ืชืงืคืืฅ][1 ืชืงืคืฆื]
- ืืขืืก: [1 ืืืขืืก][1 ืืขืืกืช]
- ืืืื: [1 ืืืืื][4 ืืืื][1 ืืืืื][6 ืืืื][3 ืืืืื][1 ืืืืืืืช][1 ืืืืืืื][3 ืืืืื][1 ืืืืืื][1 ืืืืืืืช][1 ืืืืื][1 ืืืืืื][1 ืืืืืืช][1 ืื ืืื][29 ืืืื][1 ืืืืื][1 ืืืืื][2 ืืืื][1 ืืืืื][4 ืืืืืืช][1 ืืืืืื][1 ืืืืืช][3 ืฉืืืืื][2 ืฉืืืืื][1 ืฉืืื]
- ืืกืคืจ: [22 ืืกืคืจ][1 ืืกืคืจื][2 ืืกืคืจื][1 ืืกืคืจืื][2 ืืกืคืจืื][2 ืืกืคืจื]
- ืืืืื: [1 ืืืืื][2 ืืืืืืช]
- ืจืคืืจืืช: [9 ืจืคืืจืืช][1 ืฉ"ืจืคืืจืืช]
- ืืคืืืช: [1 ืืืคืืืช][1 ืืืคืืืืช][1 ืืืคืืืช][1 ืืืคืืืืช][1 ืืืคืืืช][1 ืืคืืืืช][2 ืืคืืืช]
- ืคืจืืืืืจื: [2 ื"ืคืจืืืืืจื][1 ืคืจืืืืืจื]
- ืคืกืื: [1 ื"ืคืกืืช][1 ืืคืืกืืช][2 ืืคืกืื][2 ืืคืกืืืช][7 ืืคืกืืช][2 ืืคืกืืชื][9 ืืคืกืื][5 ืืคืกืืืช][2 ืืืคืกืื][1 ืืคืกืืืช][1 ืืคืกืืช][2 ืืคืกืืช][6 ืืคืกืื][9 ืืคืกืืช][1 ืืคืกืืชื][1 ืืคืืกืืชื][3 ืืคืกืืืช][5 ืืคืกืืช][6 ืคืกืื][8 ืคืกืืืช][2 ืคืกืืืชืื][32 ืคืกืืช][1 ืคืกืืชื][1 ืฉืืคืกืืชื][1 ืฉืคืกืืชื]
- ืืืกื: [2 ืืกื][1 ืืกืืืช][1 ืืกืื][1 ืืกืื ื]
- ืืืืืจ: [1 ืืืืืืจ][2 ืืืืืืจ][19 ืืืืืจ][4 ืืืืืจ]
- ืืืืืืืช: [2 ืืืืืืืืช][1 ืืืืืืืืช]
- ืืขืืจืืช: [590 ืืขืืจืืช][1 ื"ืืขืืจืืช]
- ืืืื: [1 ืืึนืึดึผื][89 ืืืื]
Largest Groupings
These are the 5 largest groupings, and they are pretty big! The analyzed form each group shares is listed under "stem". The number of distinct words is listed under "types" and the total number of words involved is listed under "tokens". The original words and their frequencies are listed under "forms".
stem | types | tokens | forms |
ืขืื | 232 | 7185 | [1 ืึถืขึฑืึถื][3 ืืขืื][3 ืืขืืื][1 ืืขืืื][4 ืืขืืืื][1 ืืขืืื][174 ืืขืื][412 ืืขืืืช][7 ืืขืืืชื][987 ืืขืื][25 ืืขืืื][12 ืืขืืืื][5 ืืขืืืื][15 ืืขืืื][1 ืืขืืื][19 ืืขืืื][1 ื"ืขืืืช][1 ืึถืขึธืืื][39 ืืืขืืืช][60 ืืืขืืื][6 ืืืขืืื][5 ืืืขืืืื][51 ืืืขืื][5 ืืืขืืืช][6 ืืืขืืื][3 ืื ืขืื][54 ืืขืืื][12 ืืขืืืืช][75 ืืขืืืื][107 ืืขืื][50 ืืขืื][1 ืืขืื][2 ืืขืืื][3 ืืขืืืื][35 ืืขืืื][6 ืืขืืืช][2 ืืขืื][36 ืืขืืชื][66 ืืชืขืื][2 ืืชืขืื][1 ืึฐื ึทืขึฒืึถื][1 ืึฐืขึธืึฐืชึธื][1 ืึทืึทึผืขึฒืืึผ][1 ืืืขืืืื][1 ืืืขืืืช][27 ืืืขืื][25 ืืืขืืืช][85 ืืืขืื][2 ืืืขืืื][1 ืืืขืืืื][1 ืืืขืืืื][3 ืืืขืืื][2 ืืืขืืื][1 ืืืืขืืืช][4 ืืืืขืืื][1 ืืืืขืืื][2 ืืื ืขืื][1 ืืืขืืื][2 ืืืขืืืื][16 ืืืขืื][4 ืืืขืื][2 ืืืขืืื][2 ืืืขืื][3 ืืืขืืชื][2 ืืืชืขืื][1 ืืืขืื][1 ืืืขืืืื][9 ืืืขืืืช][41 ืืืขืื][2 ืืืขืืืช][1 ืืืขืื][4 ืืืขืืื][3 ืืืขืืืื][2 ืืืขืืืื][7 ืืืขืืื][1 ืืืขืืื][11 ืืขืืื][2 ืืขืืื][4 ืืขืืืื][49 ืืขืื][9 ืืขืื][5 ืืขืื][25 ืืขืืื][13 ืืขืืืื][6 ืืขืืืื][34 ืืขืืื][1 ืืขืืื][2 ืืขืืื][2 ืืขืืื ื][1 ืืขืื][30 ืืขืืชื][1 ืืฉืขืื][1 ืืฉืขืื][1 ืืฉืขืืืื][2 ืืชืขืื][1 ืึทืขึฒืึถื][25 ืืขืื][7 ืืขืื][1 ืืืขืื][3 ืืืขืืืช][13 ืืืขืื][2 ืืืขืืื][1 ืืืขืืืื][1 ืืืขืื][1 ืืขืืื][1 ืืขืืืื][1 ืืขืื][2 ืืขืืืช][1 ืืฉืืขืืืช][1 ืืฉืืขืื][1 ืืฉืืขืืืื][2 ืืฉืขืืื][1 ืืฉืขืืืืช][12 ืืฉืขืื][3 ืืฉืขืื][1 ืืฉืขืืื][1 ืืฉืขืืื][1 ืืฉืขืืชื][1 ื"ืืขืื][5 ืืขืืื][1 ืืขืืืืช][5 ืืขืืื][13 ืืขืืืื][5 ืืขืื][134 ืืขืืืช][15 ืืขืื][10 ืืขืืื][2 ืึทืขึฒืึตื][15 ืืืขืื][11 ืืืขืืืช][29 ืืืขืื][2 ืืืขืืื][3 ืืืขืืืื][6 ืืืขืืื][2 ืืืขืืืื][2 ืืืขืืื][24 ืืขืืื][2 ืืขืืืืช][2 ืืขืืื][6 ืืขืืืื][158 ืืขืื][1 ืืขืื][161 ืืขืืืช][9 ืืขืื][25 ืืขืืื][9 ืืขืืืื][1 ืืขืืืื][23 ืืขืืื][21 ืืขืืื][2 ืืฉืขืื][2 ืืฉืขืื][10 ื ืขืื][2 ืขึฒืึตืืึถื][3 ืขึฒืึตืืึถื][1 ืขึฒืืึผ][1 ืขึดืึดึผืืช][1 ืขึถืึถื][1 ืขึธืึฐืชึธื][3 ืขึธืึตืื ืึผ][1 ืขึธืึถืืึธ][2 ืขึธืึถืืึธ][3 ืขึธืึธืื][2 ืขึปืึฐึผืึถื][1 ืขืึนืึตื][228 ืขืืื][21 ืขืืืืช][48 ืขืืื][121 ืขืืืื][1 ืขืืืช][6 
ืขืืื][506 ืขืื][141 ืขืื][3 ืขืืืืืช][73 ืขืืืช][2 ืขืืืชื][12 ืขืืืชื][4 ืขืืืชื][1 ืขืืืชื][196 ืขืื][335 ืขืืื][267 ืขืืืื][72 ืขืืืื][765 ืขืืื][7 ืขืืื][2 ืขืืืื][1 ืขืืืื][12 ืขืืื][26 ืขืืืื][40 ืขืืื][38 ืขืืื ื][15 ืขืืืช][1 ืขืืืชื][1 ืขืืื][5 ืขืื][203 ืขืืชื][1 ืฉ"ืขืืื ื][1 ืฉึถืืขึธืืึผ][12 ืฉืืขืื][6 ืฉืืขืืืช][8 ืฉืืขืื][1 ืฉืืขืืื][1 ืฉืืขืืืื][1 ืฉืืขืืืื][2 ืฉืืขืืื][1 ืฉืืขืืื][2 ืฉืืขืืืื][28 ืฉืืขืื][3 ืฉืืขืื][2 ืฉืืขืื][3 ืฉืืขืื][6 ืฉืืขืืื][3 ืฉืืขืืืื][3 ืฉืืขืืืื][7 ืฉืืขืืื][1 ืฉืืขืืื][1 ืฉื ืขืื][14 ืฉืขืืื][3 ืฉืขืืืืช][6 ืฉืขืืืื][51 ืฉืขืื][39 ืฉืขืื][10 ืฉืขืื][41 ืฉืขืืื][30 ืฉืขืืืื][12 ืฉืขืืืื][72 ืฉืขืืื][1 ืฉืขืืื][4 ืฉืขืืื ื][40 ืฉืขืืชื][5 ืฉืชืขืื][29 ืชืขืื][1 ืชืขืืื ื] |
ืฉื ื | 210 | 29646 | [1 ื"ืืฉื ื][1 ื"ืฉื ืืช][2 ืึดึผืฉึฐืื ึทืช][1 ืึทึผืฉึธึผืื ึธื][55 ืืฉืื ื][1 ืืฉืื ืืช][3 ืืฉืื ื][394 ืืฉื ื][737 ืืฉื ืืช][18 ืืฉื ืืชืื][2 ืืฉื ืืชืืื][44 ืืฉื ืืชืื][2 ืืฉื ืืชืืื][1113 ืืฉื ืื][8277 ืืฉื ืช][6 ืืฉื ืชื][17 ืืฉื ืชื][25 ืืฉื ืชืืื][4 ืืฉื ืชื][3 ืึทืฉึธึผืื ึดืื][4 ืึทืฉึธึผืื ึธื][59 ืืืฉื ื][2 ืืืืฉื ืื][10 ืืืฉืื ื][4 ืืืฉื ื][1 ืืืฉื ืช][1 ืืืฉื ืชื][3 ืืืฉืื ื][1 ืืืฉืื ืืช][249 ืืืฉื ื][1 ืืืฉื ืืช][2 ืืืฉื ืื][28 ืืฉืื ื][5 ืืฉืื ืืืืช][205 ืืฉืื ืืช][9 ืืฉืื ื][313 ืืฉืื ืื][1 ืืฉืืืืืื ืืช][714 ืืฉื ื][1278 ืืฉื ืื][69 ืืฉื ืชื][26 ืืฉื ืชืืื][1 ื"ืฉืื ื][1 ื"ืฉื ื][2 ืืืฉืื ื][12 ืืืฉื ื][28 ืืืฉื ืืช][2 ืืืฉื ืืชืื][4 ืืืฉื ืืชืื][60 ืืืฉื ืื][379 ืืืฉื ืช][1 ืืืฉื ืชื][1 ืืืฉื ืชื][1 ืืืฉื ืชืืื][2 ืืืืฉืื ื][2 ืืืืฉื ื][1 ืืืฉืื ื][2 ืืืฉืื ื][1 ืืืฉืื ืื][1 ืืืฉื ื][1 ืืืฉื ืชืืื][1 ืืืฉืื ื][7 ืืืฉื ื][8 ืืืฉื ื][1 ืืืืฉืื ื][1 ืืืืฉื ื][4 ืืืฉื ื][1 ืืืฉื ืชืืื][3 ืืืฉืื ืืช][1 ืืืฉื ื][10 ืืืฉื ืืช][1 ืืืฉืื ืืช][3 ืืืฉืื ืื][10 ืืืฉื ื][1 ืืืฉื ืืช][1 ืืืฉื ืื][30 ืืืฉื ืช][1 ืืืฉื ืชื][4 ืืืฉื ืชื][1 ืื ึดืฉึฐืื ึถื][3 ืื ืฉื ื][1 ืืฉืืฉื ื][1 ืืฉืืฉื ื][5 ืืฉืื ื][1 ืืฉืื ืึผืช][4 ืืฉืื ืืช][1 ืืฉืื ื][11 ืืฉืื ืื][35 ืืฉื ื][15 ืืฉื ืืช][135 ืืฉื ื][6 ืืฉื ืื][1 ืืฉื ืื ื][1 ืืฉื ืื ื][4 ืืฉื ืืช][2 ืืฉื ืืชื][6 ืืฉื ืช][1 ืืฉื ืชื][14 ืืฉื ืชืืื][1 ืืชืฉื ื][178 ืืฉื ื][118 ืืฉื ื][3 ื"ืฉื ื][2 ื"ืฉื ื][1 ื"ืฉื ืช][2 ืึทึผืฉึธึผืื ึดืื][1 ืืืฉื ื][1 ืืืฉื ืื][9 ืืืฉื ื][1 ืืฉืืฉื ื][1 ืืฉืืฉื ืื][1 ืืฉืืฉื ืช][1 ืืฉืืฉื ืชืืื][1 ืืฉืื ื][1 ืืฉืืฉื ื][73 ืืฉื ื][2 ืืฉื ืืช][2 ืืฉื ืื][1 ืืฉื ืช][44 ืืฉื ืชืืื][1 ืืฉืฉื ื][1 ื"ืืฉื ื][2 ืืฉืื ื][12 ืืฉืื ืืช][23 ืืฉืื ื][82 ืืฉื ื][167 ืืฉื ืืช][1 ืืฉื ืืชืื][24 ืืฉื ืื][393 ืืฉื ืช][34 ืืฉื ืชืืื][1 ืืืฉื ืช][4 ืืืฉื ื][14 ืืืฉื ืื][6 ืืฉืื ื][5 ืืฉืื ืืช][1 ืืฉืื ื][9 ืืฉืื ืื][284 ืืฉื ื][108 ืืฉื ืืช][1 ืืฉื ืืชืื][24 ืืฉื ืื][913 ืืฉื ืช][22 ืืฉื ืชื][7 ืืฉื ืชืืื][1 ืืฉื ืชื ื][1 ื ืฉื ื][1 ืฉ"ืืฉื ืช][1 ืฉ"ืืฉื ื][1 ืฉึฐืื ึทืช][1 ืฉึฐืื 
ึธืชึธืึดื][2 ืฉึฐืื ืึนืชึตืื ืึผ][2 ืฉึธืื ึดืื][4 ืฉึธืื ึธื][1 ืฉืืฉืื ื][3 ืฉืืฉื ื][4 ืฉืืฉื ืืช][7 ืฉืืฉื ืื][32 ืฉืืฉื ืช][1 ืฉืืฉื ืชืืื][3 ืฉืืฉืื ืืช][1 ืฉืืฉื ื][1 ืฉืืืื ืืช][432 ืฉืื ื][880 ืฉืื ืืช][15 ืฉืื ื][1403 ืฉืื ืื][1 ืฉืืืืืื ื][2 ืฉืืืืืื ืืช][1 ืฉืื ืช][12 ืฉืืฉื ื][6 ืฉืืฉื ื][1 ืฉืืฉื ื][1 ืฉืืฉืื ื][3 ืฉืืฉืื ื][2 ืฉืืฉื ื][1 ืฉืืฉื ืืช][1 ืฉืืฉื ืช][2208 ืฉื ื][1 ืฉื ื][1186 ืฉื ืืช][1 ืฉื ืืชื][13 ืฉื ืืชืื][41 ืฉื ืืชืื][1 ืฉื ืืชืื][9 ืฉื ืืชื][2449 ืฉื ื][1937 ืฉื ืื][2 ืฉื ืื ื][48 ืฉื ืืช][250 ืฉื ืืชื][1108 ืฉื ืช][4 ืฉื ืชื][8 ืฉื ืชื][68 ืฉื ืชื][294 ืฉื ืชืืื][1 ืฉื ืชื][46 ืฉื ืชื][6 ืฉื ืชื ื][3 ืฉืฉืื ื][1 ืฉืฉืื ืืช][2 ืฉืฉืื ืื][1 ืฉืฉื ื][1 ืฉืฉื ืืช][2 ืฉืฉื ืื][1 ืฉืฉื ืืช][4 ืฉืฉื ืช][1 ืฉืฉื ืชืืื][1 ืฉืชืฉื ื][8 ืชืฉื ื] |
ืฉื | 185 | 10250 | [1 ืึทึผืฉื][1725 ืืฉื][43 ืืฉืื][110 ืืฉืื][39 ืืฉืืืช][4 ืืฉืืืชืืื][1 ืืฉืืืชืื][15 ืืฉืื][28 ืืฉืืื][17 ืืฉืื][24 ืืฉืื][2 ืืฉืืช][1 ื"ืืฉื][1 ื"ืฉื][1 ืึทืฉึธึผืืึทืึดื][4 ืึธืฉึทึผืืึธึผื][28 ืืืฉืื][6 ืืืฉืืื][1 ืืืฉืื][2 ืืืฉืื][5 ืืืืฉืื][112 ืืืฉืืื][51 ืืืฉืืฉืื][224 ืื ืฉืื][671 ืืฉื][2 ืืฉืื][45 ืืฉืืืช][43 ืืฉืืื][1 ืืฉืื][17 ืืฉืื][3 ืืฉืืช][1 ื"ืืฉืื][1 ื"ื ืฉืื][1 ืึฐืฉึตืื][1 ืึผืฉึฐืืึธืึผ][1 ืึผืฉึฐืืืึน][3 ืืืฉื][1 ืืืฉืื][4 ืืืฉืื][1 ืืืฉืืืช][1 ืืืฉืืื][1 ืืืฉืื][11 ืืืืฉืื][1 ืืืืฉืืื][3 ืืืืฉืืื][8 ืืื ืฉืื][7 ืืืฉื][1 ืืืฉืื][1 ืืืฉืืืช][2 ืืืฉืืื][3 ืืืืฉืื][2 ืืืฉืื][25 ืืืฉื][1 ืืืฉืืฉืื][1 ืืืฉืืื][88 ืืืฉื][13 ืืืฉืืฉืื][25 ืื ืฉืื][1 ืืฉ"ืืฉื][1 ืืฉืืฉืื][1 ืืฉืืฉื][1 ืืฉืืื][1 ืืฉืืฉืื][302 ืืฉื][32 ืืฉืื][45 ืืฉืื][6 ืืฉืืืช][1 ืืฉืืืชืื][1 ืืฉืืืชืืื][2 ืืฉืื][2 ืืฉืื][8 ืืฉืื][2 ืืฉืฉื][1 ืืฉืฉืื][1 ืืชืฉืืื][1 ืืชืฉืืื ื][5 ืืฉืื][1 ื"ืืฉืืื][19 ืืืฉืื][1 ืืืฉืืื][1 ืืฉืืฉื][8 ืืฉืืฉืื][46 ืืฉื][4 ืืฉืื][10 ืืฉืื][1 ืืฉืืืชืืื][1 ืืฉืื][1 ืืฉืื][1 ืืฉืื][9 ืืฉืืช][2 ืืฉืฉื][1 ืืฉืฉืืื][48 ืืฉืื][476 ืืฉื][19 ืืฉืื][31 ืืฉืื][13 ืืฉืืืช][1 ืืฉืืืชืื][1 ืืฉืืืชืืื][10 ืืฉืืื][1 ืืฉืื][1 ืืฉืื][6 ืืฉืื][1 ืืืฉืืื][7 ืืืฉื][1 ืืืฉืืืช][4 ืืืฉืืื][1 ืืฉืืฉืื][1 ืืฉืื][102 ืืฉืืื][1 ืืฉืืฉืื][172 ืืฉื][5 ืืฉืื][13 ืืฉืื][5 ืืฉืืืช][1 ืืฉืืืชืืื][1 ืืฉืืืชืืื][1 ืืฉืื][1 ืืฉืืื][4 ืืฉืื][103 ืืฉืืฉืื][2 ืืฉืืช][483 ื ืฉืื][1 ืฉ"ื ืฉืื][4 ืฉึฐืืืึน][1 ืฉึฐืืืึนืช][2 ืฉึดืืื][1 ืฉึตืื][1 ืฉึถืื][1 ืฉึทืืึธึผื][3 ืฉึธืื][1 ืฉึธืืึทืึดื][1 ืฉึธืืึธึผื][2 ืฉืืื][2 ืฉืืฉืื][2 ืฉืืฉื][1 ืฉืืฉืื][2 ืฉืืฉืื][3 ืฉืืฉืืื][1 ืฉืืฉืื][3 ืฉืืฉื][2 ืฉืืฉืืืช][1 ืฉืืฉืืื][1 ืฉืืืื][6 ืฉืื][2 ืฉืืื][10 ืฉืืื][12 ืฉืืื][1 ืฉืืื ื][10 ืฉืืฉืื][2 ืฉืืืฉืื][1 ืฉืืฉืื][1 ืฉืืฉืื][1 ืฉืืฉืื][2 ืฉืืฉื][5 ืฉืืฉืื][1 ืฉืืฉืื][2896 ืฉื][396 ืฉืื][2 ืฉืืึผ][731 ืฉืื][213 ืฉืืืช][1 ืฉืืืชืื][20 ืฉืืืชืืื][2 ืฉืืืชืืื][42 ืฉืื][33 ืฉืืื][1 ืฉืื][52 ืฉืื][115 ืฉืื][6 ืฉืื ื][1 ืฉืืฉืื][4 ืฉืืฉืืฉืื][25 ืฉืืช][1 ืฉืืชื][2 ืฉืืชื][7 ืฉื ืฉืื][32 
ืฉืฉื][34 ืฉืฉืื][55 ืฉืฉืื][2 ืฉืฉืืืชืืื][10 ืฉืฉืื][1 ืฉืฉืื][1 ืฉืชืฉืื][4 ืชืฉืื] |
ืื | 184 | 6645 | [1 ื"ืืืืื][1 ื"ืืื][1 ืึทึผืึทึผืึดึผืื][1 ืึทึผืึธึผืึดืื][141 ืืื][5 ืืืื][341 ืืืื][8 ืืืืื][2 ืืืืืื][15 ืืืืื][81 ืืืืื][70 ืืืืื ื][2 ืืืื ื][24 ืืืืืื][1 ื"ืืื][1 ื"ืืฉืืื][1 ืึทืึธึผื][2 ืึทืึทึผืึดื][65 ืืืืื][4 ืืืื][1 ืืืืื][91 ืืืืื][730 ืืื][10 ืืืื][95 ืืืื][2 ืืืืื][320 ืืืืื][2 ืืืื][1 ืืืื ื][29 ืืืืื][24 ืืืืืื][46 ืืืืื][6 ืืืฉืืื][1 ืืืืื][1 ืื"ืืื][1 ืืืืื][467 ืืืื][17 ืืืืื][1 ืืืืื][112 ืืืฉืืื][4 ืืืฉืืื][1 ืืืฉืืืื][12 ืืืฉืืื][4 ืืืฉืืืื][3 ืืืฉืืืืื][1 ื"ืืืื][1 ืึฐืึธืึถืืึธ][1 ืึผืึทืึธึผืึดืื][5 ืืืื][11 ืืืืื][1 ืืืืืื][1 ืืืืืื][4 ืืืืืื][1 ืืืืืื ื][2 ืืืืืื][1 ืืืืืื][14 ืืืื][3 ืืืืืื][1 ืืืืืื][2 ืืืืืืื][7 ืืืืืื][11 ืืืืื][3 ืืืืฉืืื][1 ืืืืื][11 ืืื][8 ืืืื][1 ืืืืืื][5 ืืืืื][2 ืืืื][2 ืืืืืื][14 ืืืืื][2 ืืืฉืืื][1 ืืืืื][3 ืืืื][2 ืืืืื][30 ืืืืืื][1 ืืืืื][20 ืืืื][1 ืืืืืื][1 ืืืืืืื][4 ืืืืื][3 ืืืืื ื][1 ืืืฉืืื][4 ืืืฉืืื][2 ืืืฉืืืื][6 ืืฉืืื][1 ืืฉืืื][1 ืืฉืืืืื][5 ืึฐืึตื][1 ืึทืึดึผื][1 ืึธื][1 ืึธืึดืื][1 ืึธืึธึผื][4 ืืืื][446 ืื][2 ืืื][1 ืืื][404 ืืื][19 ืืืื][7 ืืืืื][63 ืืืื][747 ืืืื][179 ืืืื ื][1 ื"ืืฉืืื][1 ืืืืื][1 ืืื][1 ืืืื][36 ืืืื][16 ืืืืื][1 ืืืืื][1 ืืืืื ื][125 ืืืื][1 ืืืืื][1 ืืืืืื][1 ืืฉืืืื][10 ืืฉืืื][1 ื"ืื][1 ื"ืืืื][2 ื"ืืื][4 ืืืื][1 ืืืืืื][4 ืืืืื][118 ืืื][16 ืืืื][13 ืืืื][1 ืืืืื][3 ืืืืื][231 ืืืืื][7 ืืืืื ื][34 ืืืืืื][1 ืืืฉืืื][15 ื"ืืื][1 ืึดืึทึผื][2 ืึทืึดื][1 ืึธืึดื][22 ืืืื][3 ืืืืืื][1 ืืืืื][3 ืืืืื][584 ืืื][111 ืืืื][4 ืืืืื][2 ืืืืืื][13 ืืืืื][1 ืืืืื][2 ืืืืื][84 ืืืื][2 ืืืื ื][5 ืื"ืืื][3 ืืืื][8 ืืืื][1 ืืฉืื][102 ืืฉืืื][1 ืืฉืืืืื][31 ืืฉืืื][18 ืืฉืืืื][16 ืืฉืืืืื][1 ืืฉืืื][2 ืฉึดืืื][1 ืฉึธืืึทืึดื][2 ืฉืืื][12 ืฉืืื][5 ืฉืืืื][2 ืฉืืืืื][2 ืฉืืืืื][1 ืฉืืืืื ื][3 ืฉืืื][1 ืฉืืืื][6 ืฉืื][2 ืฉืืื][10 ืฉืืื][12 ืฉืืื][2 ืฉืืืื][4 ืฉืืืื][37 ืฉืืืืื][3 ืฉืืืืื][1 ืฉืืืืืื][33 ืฉืืื][1 ืฉืืืื][1 ืฉืืืืืื][2 ืฉืืืื][2 ืฉืืืื ื][1 ืฉืืืื][1 ืฉืืฉืืืื] |
ืื | 177 | 9575 | [1 ืึฐืึทื][1 ืึฑืึดื][4 ืึฑืึนืึดืื][1 ืึตื][1 ืึตืึดืื][1 ืึตืึถึผื][1 ืึตืึถืืึธ][1 ืึตืึทื][2 ืึตืึธืื][30 ืึถื][1 ืึถืึฐืึธื][2 ืึทื][11 ืืืืืื][10 ืืืืืื][1 ืืืืื][3213 ืื][1253 ืืื][5 ืืืืื][1 ืืืื][1355 ืืื][288 ืืื][324 ืืืื][175 ืืืืื][29 ืืืืื][512 ืืืื][6 ืืืื][2 ืืืืื][5 ืืืืื][4 ืืืื][1 ืืืืื][57 ืืืื][6 ืืืื ื][3 ืืื][3 ืืืืื][8 ืืื][152 ืืื][4 ื"ืื][2 ื"ืฉืืื][65 ืืื][19 ืืืื][1 ืืืืืื][12 ืืืื][7 ืืืื][1 ืืืืื][1 ืืืืืืื][1 ืืืืื][1 ืืืืื][1 ืืืืื][1 ื"ืฉืืืื][2 ืึธืึตืึถึผื][1 ืืึทืื][238 ืืื][138 ืืืื][2 ืืืืืื][43 ืืืื][4 ืืืื][68 ืืืืื][2 ืืืื][1 ืืืื][1 ืืืืืื][1 ืืืฉืื][1 ืืืฉืืื][5 ื"ืื][1 ื"ืฉืืื][2 ืึฐืึตืึถึผื][1 ืึฐืึถื][2 ืืืืืืื][86 ืืื][65 ืืืื][51 ืืืื][34 ืืืื][3 ืืืืื][4 ืืืืืื][3 ืืืืืื][7 ืืืืื][1 ืืืืื][1 ืืืืื][1 ืืืื][13 ืืืื][2 ืืืื][1 ืืืืื][1 ืืืืืื][3 ืืืื][2 ืืืืื][2 ืืืืืื][4 ืืืืื][2 ืืืืื][1 ืืืืืื][2 ืืืฉืืื][2 ืืืื][2 ืืืืื][2 ืืืืื][1 ืืืืืื][1 ืืืืื][1 ืืืืืื][1 ืืืื][2 ืืืืื][1 ืืืืื][2 ืืืฉืืื][5 ืืฉืื][1 ืืฉืืื][4 ืืฉืืื][7 ืืฉืืื][3 ืืฉืืืื][2 ืืฉืืืื][1 ื"ืื][1 ื"ืืื][65 ืืื][140 ืืืื][99 ืืืื][3 ืืืืื][7 ืืืืื][2 ืืืืื][7 ืืืื][2 ืืฉืืื][1 ืืฉืืื][1 ืืฉืืืื][1 ืืฉืืืืื][1 ื"ืื][51 ืืื][51 ืืืื][1 ืืืืืื][46 ืืืื][8 ืืืื][2 ืืืืืื][9 ืืืืื][1 ืืืืื][1 ืืืื][1 ืืืืื][3 ืืืืื][1 ื"ืืื][1 ืึดืฉึฐืืึธื][2 ืึตืึตืึธืื][15 ืืื][31 ืืืื][29 ืืืื][43 ืืืื][4 ืืืืื][3 ืืืืืื][1 ืืืืืื][11 ืืืืื][1 ืืืืื][5 ืืืืื][1 ืืืืืื][1 ืืืื][1 ืืืื][7 ืืืื][1 ืืืืืื][1 ืืืืื][40 ืืฉืื][6 ืืฉืืื][11 ืืฉืืื][32 ืืฉืืื][1 ืืฉืืืื][1 ืืฉืืืื][1 ืฉ"ืืื][1 ืฉ"ืฉืื][41 ืฉืื][70 ืฉืืื][29 ืฉืืื][4 ืฉืืื][29 ืฉืืืื][10 ืฉืืืืื][8 ืฉืืืืื][34 ืฉืืืื][1 ืฉืืืืื][10 ืฉืืื][2 ืฉืืืืื][14 ืฉืืืื][6 ืฉืืืื][101 ืฉืืื][10 ืฉืืืื][2 ืฉืืืื][18 ืฉืืืื][2 ืฉืืืืืื][1 ืฉืืฉืืื][1 ืฉืืฉืืืื] |
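A minimal sketch for sanity-checking group rows like the ones above. It assumes (this is inferred from the rows, not stated by HebMorph) that each row has the layout `stem | number of distinct forms | total occurrences | [count form][count form]...`. The function names and the English toy row are hypothetical.

```python
import re

# Assumed row layout, inferred from the rows above:
#   stem | distinct forms | total occurrences | [count form][count form]...
ITEM_RE = re.compile(r"\[(\d+) ([^\]]+)\]")

def parse_group_row(row):
    """Split one group row into stem, declared totals, and (form, count) pairs."""
    cells = [cell.strip() for cell in row.rstrip(" |").split(" | ", 3)]
    stem, n_forms, total, items = cells
    forms = [(form, int(count)) for count, form in ITEM_RE.findall(items)]
    return stem, int(n_forms), int(total), forms

def check_group_row(row):
    """Return True when the bracketed items agree with the declared totals."""
    _, n_forms, total, forms = parse_group_row(row)
    return len(forms) == n_forms and sum(count for _, count in forms) == total

# Toy English row standing in for a real Hebrew group.
row = "run | 3 | 10 | [6 run][3 running][1 runs] |"
print(parse_group_row(row))
print(check_group_row(row))
```

A check like this is mainly useful for spotting rows that were truncated or garbled when copying the analysis output around.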
Random Analyzed Terms
Here is a sample of 50 random analyzed terms. The bold term is the original word, the second column has the frequency of the term in the corpus, and the last column lists all the analyzed terms generated by HebMorph.
ืืืชืจืืื | 2 | ืืชืจืื | ืืืชืจืืื | ืจืืื | ืชืจืื |
ืืืืืื ืืช | 7 | ืืืืืื ืืช | ืืืื ื |
ืืขืืืชืื | 1 | ืืขืืืชื | ืืขืืืชืื |
ืืชื ืื | 1 | ืืชื ืื | ืชื |
ืืจืืืืช | 1 | ืืจืืืืช |
ืืกืืื | 11 | ืืกืืื | ืกืื | ืกืืื |
ืืฆื | 7 | ืืฆื | ืฆื |
ืืกืืจ | 6 | ืืกืืจ | ืกืืจ |
ืืืืืขืืช | 4 | ืืืืืขืืช | ืืืืขื |
ืืืจื | 2 | ืืจ | ืืืจื |
ืชื ืืืชื | 1 | ืชื ืืื | ืชื ืืืชื |
ืืืืืืช | 15 | ืืื | ืืืืืืช |
ืืืืื | 1 | ืืืืื |
ืคืจืืื | 9 | ืคืจืื | ืคืจืืื |
ืืกืืื ื | 1 | ืืกืืื ื |
ืืืกืคืงื | 1 | ืืกืคืงื | ืืืกืคืงื |
ืืืืคืจืืช | 9 | ืืืคืจ | ืืืืคืจืืช |
ืืืกื ืืจื | 1 | ืืืกื ืืจื |
ืฉืงืืืฅ | 2 | ืงืืืฅ | ืฉืงืืืฅ |
ืืืืจื | 3 | ืืืืจื | ืืืจ | ืืืจื |
ืกืืื"ื | 1 | ืกืืื\"ื |
ืคืืจืืืช | 2 | ืคืืจืืืช | ืคืจื |
ืฉืืืงืืื | 1 | ืืงืื | ืืืงืื | ืฉืืืงืืื |
ืืืืคืชืืืช | 1 | ืืืืคืชืืืช | ืืืคืชื |
ืืืืืืงืืช | 1 | ืืืืืงืืช | ืืืืืืงืืช |
ืืืชื | 5 | ืืืชืื | ืืืชื | ืืช | ืืชื | ืชื |
ืืืืื ืืช | 1 | ืืืืื ืืช | ืืืื | ืืืื ื |
ืืืืืืื ืืืกืง | 1 | ืืืืืืื ืืืกืง |
ืืื ืคืฅ | 1 | ืืื ืคืฅ | ื ืืคืฅ | ื ืคืฅ |
ืชืฉืก"ื | 35 | ืชืฉืก\"ื |
ืื ืืืก | 1 | ืื ืืืก |
ืืืืื ืืืื | 1 | ืืืืื ืืืื |
ืฉืืกืชืื | 1 | ืืกืชืื | ืฉืืกืชืื |
ืคืืืืืื | 7 | ืคืืืืืื |
ืฉืืืจืื | 2 | ืืจื | ื ืจืื | ืฉืืืจืื |
ืืฉื ืื ืืชื | 1 | ืืฉื ืื ืืชื | ืฉื ืื ืืช |
ืืืฆืช | 4 | ืืฅ | ืืฆื | ืืืืฅ | ืืืฆื | ืืืฆืช |
ืกืืืืืื ื | 9 | ืกืืืืืื ื |
ื"Vitamin | 1 | vitamin |
ืืื ืืกืืืืงืืื | 1 | ืืื ืืกืืืืงืืื |
ืฉืฉืื | 54 | ืฉืฉืื |
ืืืจืืฅ | 62 | ืืืจืืฅ | ืืจืืฅ | ืจืฅ |
ืืืืื | 6 | ืืืื | ืืืืื |
ืคืขืืืืช | 17 | ืคืขืืื | ืคืขืืืืช |
ืืืืื' | 1 | ืืืืื' |
ืืืื | 9 | ืื | ืืืื |
ืืืืจืืช | 7 | ืืืืจ | ืืืืจืืช |
ืืฉืืื | 1 | ืืฉืืื | ืฉืื | ืฉืื | ืฉืืื |
ืืืฆ'ืงืืง | 6 | ืืืฆ'ืงืืง |
ืงืืืืช | 1 | ืงืืืืช | ืงืืืช |
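A sample like the one above could be drawn roughly as follows. This is only a sketch: it assumes the analysis output has already been collected into a dictionary mapping each original word to its corpus frequency and its set of analyzed terms. The dictionary shape, the function name, and the English toy data (borrowed from the analogies earlier on this page) are hypothetical.

```python
import random

def sample_rows(analysis, k=50, seed=None):
    """Pick up to k random original words and format them as table rows.

    `analysis` maps original word -> (corpus frequency, set of analyzed terms).
    """
    rng = random.Random(seed)
    words = sorted(analysis)  # sort first so a fixed seed gives a stable sample
    chosen = rng.sample(words, min(k, len(words)))
    rows = []
    for word in chosen:
        freq, terms = analysis[word]
        rows.append(" | ".join([word, str(freq), *sorted(terms)]) + " |")
    return rows

# Toy data standing in for real HebMorph output.
toy = {
    "andiron": (2, {"andiron", "iron"}),
    "totherow": (1, {"totherow", "therow", "row"}),
    "bt": (5, {"bt"}),
}
for line in sample_rows(toy, k=2, seed=0):
    print(line)
```

Sampling over distinct original words (rather than over tokens) means rare forms are as likely to be picked as common ones, which fits the many frequency-1 rows in the table.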
Largest Number of Analyzed Terms
Here are the forms with the most analyzed terms. Note that they are all fairly rare: all but one occur only once or twice in the corpus.
ืืืื ืืืช | 1 | ืืืื ืืช | ืืืื ืืช | ืืืื ืช | ืืืืื ืืช | ืืืื | ืืืื ื | ืืืื | ืืืื ืืช | ืืืื ืืืช | ืืื ื | ืืื ืืช | ืืื ืืช |
ืืคืื ืื | 1 | ืืืคืืื | ืืืคื | ืืคื | ืืคืื | ืืคืื ื | ืืคืืื | ืืคืื ืื | ืืคื ื | ืืืคื ื | ืืคื ื | ื ืคื ื | ืคืื ื | ืคืื ื | ืคื ื |
ืืืืืื ืืช | 1 | ืืืื | ืืืื ืืช | ืืืื ืช | ืืืืื ืืช | ืืืื | ืืืื | ืืืื ื | ืืืื | ืืื ื | ืืื ืืช | ืืืืืื ืืช | ืืืืื | ืืืื ื |
ืืืืืจื | 2 | ืืืืจ | ืืืืจ | ืืืืจ | ืืืืจื | ืืืจ | ืืืจื | ืืืืืจื | ืืืืืจ | ืืืืจื | ืืืืืจ | ืืืจื | ื ืืืจ |
ืืืืื ืืืช | 1 | ืืืื ืืช | ืืืื ืืช | ืืืื ืช | ืืืืื ืืช | ืืืื | ืืืื ื | ืืืื | ืืืื ืืช | ืืื ื | ืืื ืืช | ืืื ืืช | ืืืืื ืืืช | ืืืื ื | ืืืืื ืืช |
ืืืืชืื | 1 | ืืืืื | ืืืืชื | ืืืื | ืืืืชืื | ืืืชืื | ืืืชื | ืื | ืืื | ืืืื | ืืืืชื | ืืชื | ื ืืชื |
ืืื ืืกืื | 1 | ืืืื ืก | ืืื ืืก | ืืื ืืกืื | ืืื ืกื | ืืื ืก | ืืื ืืก | ืืื ืก | ืื ืืก | ืื ืืกื | ืื ืก | ืื ืกืืื | ื ืื ืก | ื ืก |
ืืืืืื | 1 | ืืืืื | ืืืืื | ืืืื | ืืืืื | ืืืืืื | ืืืืื | ืืืื | ืืืื | ืืืืื | ืืื | ืืืื | ื ืืื |
ืืืืื ืื | 22 | ืืืื | ืืื ื | ืืื ื | ืืื ื | ืื | ืืื | ืืืื | ืืืืื | ืืืืื ืื | ืืืื | ืืื | ืืืืื |
ืืงืืืืืช | 2 | ืืืงืื | ืืงืืื | ืืงืืื | ืืงืืืืืช | ืงืืื | ืงืืืื | ืงืื | ืงืืื | ืงืืื | ืงืืื | ืงืืืืช | ืงืืืืืช | ืงืืื |
ืืืฉืืื | 1 | ืืืฉื | ืืืฉืื | ืืฉืื | ืืืฉืืื | ืืืื | ืืืฉื | ืืฉื | ืืฉืื | ืฉื | ืฉืื | ืฉืื | ืฉืืื |
ืืืื ืื | 1 | ืืื ื | ืืื | ืืื ื | ืื | ืื ื | ืื ืืื | ืืืื ืื | ืืื | ืืื ื | ืืื ืืื | ืืืื | ืืืืื | ืืืื |
ืืืืฉืจืช | 1 | ืืืฉืจ | ืืืฉืจื | ืืืฉืืจ | ืืฉืืจ | ืืฉืจื | ืืืฉืจ | ืืืืฉืจืช | ืืืฉืืจื | ืืฉืืจื | ืืฉืจื | ืืฉืจืช | ืฉืืจืช |
ืืกืืืื | 1 | ืืืกืื | ืืกืืื | ืืืกื | ืืืกืื | ืืืกื | ืืก | ืืกืื | ืืกืืืื | ืืกื | ืกืืื | ืกืืื | ืกื | ืกืื |
ืืขืจืืืืืช | 1 | ืืืขืจื | ืืขืจืื | ืืขืจืืืืืช | ืืขืจืื | ืืขืจืืช | ืขืืจืืช | ืขืจืื | ืขืจืืื | ืขืจื | ืขืจืื | ืขืจืื | ืขืจืืืืช |
ืฉืืืคืืื ื | 2 | ืืืคืืื | ืืืคื | ืืคืื | ืืคืื ื | ืืคืืื | ืืคื ื | ืืืคื ื | ืืคื ื | ื ืคื ื | ืคืื ื | ืคืื ื | ืคื ื | ืฉืืืคืืื ื | ืฉืืฃ |
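The ranking above can be reproduced with a short sketch like this one, again assuming a hypothetical dictionary that maps each original word to its corpus frequency and its set of analyzed terms (the names and toy data are illustrative, not part of HebMorph).

```python
import heapq

def most_analyzed(analysis, n=10):
    """Return the n original words that produce the most analyzed terms.

    `analysis` maps original word -> (corpus frequency, set of analyzed terms).
    Each result is a (word, frequency, number of analyzed terms) tuple.
    """
    top = heapq.nlargest(n, analysis.items(), key=lambda item: len(item[1][1]))
    return [(word, freq, len(terms)) for word, (freq, terms) in top]

# Toy data standing in for real HebMorph output.
toy = {
    "andiron": (2, {"andiron", "iron"}),
    "totherow": (1, {"totherow", "therow", "row"}),
    "bt": (5, {"bt"}),
}
print(most_analyzed(toy, n=2))
```

Keeping the frequency next to the term count makes it easy to see whether the most ambiguous forms are also rare, as they are in the table above.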
Live Demo
There's a live demo of Hebrew Wikipedia with unpacked HebMorph (with ICU folding) in labs.
Note that it contains the index of Hebrew Wikipedia, so it can show results and snippets, but all the links are red and none of the pages are available in labs.