TextCat uses a very lightweight model based on n-grams (rather than, say, dictionaries), so it doesn't care much about spaces since it doesn't care about words as words. It just looks at sequences of 1, 2, 3, 4, and 5 characters, and spaces are just another character. Spaces are useful for distinguishing languages because the characters that appear at the beginning or end of a word are more or less likely in different languages. For example, ng is much more likely at the end of a word in English than at the beginning, so "ng " is more characteristic of English than " ng", which mostly shows up in discussions of the sequence ng itself, or in names, like Nguyen. As expected, "ng " is the 124th most common n-gram in the training data for English, while " ng" comes in at #6361. In Vietnamese, "ng " is still the more common of the two, at #20, but " ng" comes in at #79, which is still very common.
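To make that concrete, here is a minimal sketch in Python of how character n-grams, spaces included, can be counted and ranked. This is an illustration, not the actual TextCat code, and the padding and normalization details are simplified.

```python
from collections import Counter

def char_ngrams(text, n_min=1, n_max=5):
    """Count character n-grams of length 1-5, treating spaces as ordinary characters."""
    # Pad with spaces so word-initial and word-final n-grams (" ng" vs. "ng ") are captured.
    text = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

# A language model is essentially a list of n-grams ranked by frequency in training data.
counts = char_ngrams("Nguyen was singing and laughing while reading")
ranked = [gram for gram, _ in counts.most_common()]
print(ranked.index("ng "), ranked.index(" ng"))  # word-final "ng" vs. word-initial "ng"
```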
You can try out TextCat with the demo. With the default settings, whatlanguageisthis? is identified as English, despite the lack of spaces.
There are some languages that could be identified with high accuracy by their scripts—Thai, Burmese, Korean, Japanese hiragana and katakana, Hebrew, Greek, and others. Some of those do get used for other languages (Japanese for Okinawan or Ainu, Hebrew for Yiddish), but those are rare in queries on English Wikipedia, for example.
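Just to illustrate how simple a script-based check could be, here's a hypothetical sketch that uses Unicode character names as a crude stand-in for script detection. None of this is in TextCat; the mapping table and the function are made up for illustration.

```python
import unicodedata

# Hypothetical mapping from script to the language it almost always means in query traffic.
# (Illustrative only: Hebrew script is also used for Yiddish, kana for Okinawan or Ainu, etc.)
SCRIPT_TO_LANGUAGE = {
    "THAI": "th",
    "MYANMAR": "my",     # Burmese
    "HANGUL": "ko",      # Korean
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "HEBREW": "he",
    "GREEK": "el",
}

def script_guess(text):
    """Return a language code if every letter in the text belongs to a single 'unique' script."""
    guesses = set()
    for ch in text:
        if not ch.isalpha():
            continue
        # The first word of a Unicode character name is a rough proxy for its script.
        name = unicodedata.name(ch, "")
        guesses.add(SCRIPT_TO_LANGUAGE.get(name.split()[0] if name else ""))
    return guesses.pop() if len(guesses) == 1 and None not in guesses else None

print(script_guess("สวัสดีครับ"))  # 'th'
```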
TextCat doesn't take the uniqueness of the writing system into account, and it really can't. A hybrid system could, but that's more complexity than we need most of the time. TextCat works well on unique alphabets and syllabaries because all of their characters are represented among the top thousand n-grams (and we use models of at least 3,000 n-grams). For Chinese, which is logographic, TextCat can get in trouble with short strings (like many queries) or strings of Chinese with a tiny sprinkling of Latin characters. Because there are so many Chinese characters, not all of them are in the Chinese language model. For relatively large samples of text, say a paragraph, it can tell apart Cantonese and Mandarin (which gets labelled "Chinese") most of the time, based on the different patterns of co-occurrence of the more common characters. (You can test that with snippets from the front page of the respective Wikipedias and the TextCat demo.) Japanese is relatively easy if it's a longer sample, because there will likely be hiragana and katakana. For very short strings of hanzi and kanji, it's much harder.
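TextCat is based on Cavnar & Trenkle's n-gram categorization approach, which ranks the n-grams of the input, compares those ranks to each language model's ranks, and penalizes n-grams that aren't in the model at all; that last part is a big reason why rare Chinese characters missing from the model hurt on short strings. Here's a rough sketch of that comparison, reusing char_ngrams from the earlier sketch; it's an illustration, not the actual implementation.

```python
def out_of_place_score(text, model_ranks, missing_penalty=None):
    """Lower is better. model_ranks maps an n-gram to its rank in a language model."""
    if missing_penalty is None:
        missing_penalty = len(model_ranks)
    # Rank the n-grams of the input text by frequency (char_ngrams is defined above).
    text_ranked = [gram for gram, _ in char_ngrams(text).most_common()]
    score = 0
    for rank, gram in enumerate(text_ranked):
        # N-grams absent from the model (e.g., a rare hanzi) get the maximum penalty.
        score += abs(rank - model_ranks.get(gram, missing_penalty))
    return score

def identify(text, models):
    """Pick the language whose ranked n-gram model is the closest match."""
    return min(models, key=lambda lang: out_of_place_score(text, models[lang]))
```

With models of a few thousand n-grams, every letter of a small alphabet or syllabary makes it into the ranking, so nothing in a Greek or Thai query gets the missing-n-gram penalty; a short Chinese query, on the other hand, can easily be full of characters the model has never seen.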
Because of this confusion on short strings, we had to disable Chinese detection on Japanese Wikipedia (and similarly, French detection on English Wikipedia) with the original implementation. I've been working on improvements that should allow us to re-enable both of those relatively soon!