

jannson/yaha: yaha

Posted by agaran at 2020-04-04

"Dumb ha" Chinese participle, faster or more accurate, is defined by you. Through simple customization, make the segmentation module more suitable for your needs. "Yaha" You can custom your Chinese Word Segmentation efficiently by using Yaha

P.S. There is also crfseg, a wrapper around CRF++. The code is still rough, so for now it is best suited for learning.

The new-word discovery feature used to be implemented in extra/seqword.cpp. It has since been upgraded, optimized, and split out into a standalone project: project address

Using multithreading and MapReduce-like ideas, it can process 50M+ of text and automatically extract technical terms, personal names, place names, and other words from it. The extracted words can then be added to the dictionary of the word segmentation library.
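To make the MapReduce-like idea concrete, here is a minimal Python sketch. It is not the code of the word discovery project: the function names (`count_ngrams`, `discover_candidates`), the parameters, and the plain frequency threshold are all assumptions made for illustration. The map step counts character n-grams in each chunk of text, and the reduce step merges the partial counts and keeps the frequent ones as candidate words.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Runs of CJK ideographs; n-grams never cross punctuation, Latin text or newlines.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def count_ngrams(chunk, max_len=4):
    """Map step (hypothetical helper): count every 2..max_len character n-gram in one chunk."""
    counts = Counter()
    for run in CJK_RUN.findall(chunk):
        for n in range(2, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


def discover_candidates(lines, min_count=2, max_len=4, workers=4, chunk_size=1000):
    """Reduce step (hypothetical helper): merge per-chunk counts, keep frequent n-grams."""
    # Split the input into chunks of chunk_size lines; each chunk is one map task.
    chunks = ["\n".join(lines[i:i + chunk_size])
              for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda c: count_ngrams(c, max_len), chunks):
            total.update(partial)
    # Assumption: a simple frequency cut-off stands in for real word scoring.
    return {gram: n for gram, n in total.items() if n >= min_count}
```

A real word discovery tool would normally score candidates with more than raw frequency, so the dictionary returned here is only a list of candidates to review before adding them to a segmentation dictionary. Note also that in CPython, threads do not speed up CPU-bound counting like this; swapping in `ProcessPoolExecutor` (with a module-level function instead of the lambda) is one way to use several cores.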

pip install yaha

QQ discussion group (shared with a VxWorks-like kernel project): 2749-83126

Code deployed on GAE: http://yahademo.appspot.com

Code deployed on SAE: http://yaha.sinaapp.com

The original address is no longer used: http://yaha.v-find.com/

Example code: https://github.com/jannson/yaha/blob/master/tests/test_cutter.py

Basic functions:

Available plug-ins:

Additional features:

I have been using it myself the whole time, and so far it has worked without problems. Finally, thanks to the author of Jieba: the current dictionary is copied directly from the Jieba project.