Chinese search support – 中文搜索支持
Insiders adds experimental Chinese language support for the built-in search plugin – a feature that has been requested for a long time given the large number of Chinese users.
After the United States and Germany, China is the third-largest country of origin of Material for MkDocs users. For a long time, the built-in search plugin couldn't segment Chinese text properly, mainly because lunr-languages, which is used for search tokenization and stemming, lacks support for it. The latest Insiders release adds the long-awaited Chinese language support to the built-in search plugin.
Material for MkDocs終於支持中文了!文本被正確分割並且更容易找到。 (Material for MkDocs finally supports Chinese! Text is segmented correctly and is easier to find.)
This article explains how to set up Chinese language support for the built-in search plugin in a few minutes.
Configuration
Chinese language support for Material for MkDocs is provided by jieba, an excellent Chinese text segmentation library. If jieba is installed, the built-in search plugin automatically detects Chinese characters and runs them through the segmenter. You can install jieba with pip by running pip install jieba.
The next step is only required if you specified the separator configuration in mkdocs.yml. Text is segmented with zero-width space characters (\u200b), so it renders exactly the same in the search modal. Adjust mkdocs.yml so that the separator includes the \u200b character:
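A minimal sketch of such a configuration, assuming the search plugin is otherwise left at its defaults. The whitespace and hyphen entries in the character class are an assumed baseline; if you already use a custom separator, keep your own expression and only add \u200b to it:

```yaml
plugins:
  - search:
      # Assumed baseline: split on whitespace (\s) and hyphens (\-).
      # \u200b is the zero-width space used to mark the boundaries
      # between segmented Chinese words.
      separator: '[\s\u200b\-]'
```

The value is single-quoted so that YAML passes the backslashes through unchanged to the regular expression.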
That's all that is necessary.
Usage
If you followed the instructions in the configuration guide, Chinese words will now be tokenized using jieba. Try searching for 支持 to see how it integrates with the built-in search plugin.
Note that this is an experimental feature, and I, @squidfunk, am not proficient in Chinese (yet?). If you find a bug or think something can be improved, please open an issue.