Since the first version of novel-pinyin was released, I have received some feedback.
The next version of novel-pinyin will try to complete the following to-do items:
1. Model modification. Change the estimate of P(P|W) from k/n to C(P,W)/C(W).
(C(P,W) is the count of the pinyin-word combination,
and C(W) is the count of the word.)
2. Dynamically adjust phrase positions according to bi-gram probabilities.
In the HMM model training process, each frequency adjustment is very small (1 or 6).
To magnify the position changes, replace the unigram with the bi-gram when possible.
3. Versioned data file format.
The data file format will change in the next release, so I will add a version file in
~/.scim/novel-pinyin to indicate the file format version.
When a different version is detected, the files from the old version will be flushed.
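
To make item 3 concrete, here is a minimal Python sketch of the version check. It is not the actual novel-pinyin code. The file name "version", the version string "2", and the function name ensure_data_version are all hypothetical; the post only says that a version file will live in ~/.scim/novel-pinyin. The sketch also assumes that a missing version file counts as a version mismatch.

```python
import shutil
from pathlib import Path

# Hypothetical names: the post does not specify the file name or version string.
VERSION_FILE_NAME = "version"
CURRENT_FORMAT_VERSION = "2"


def ensure_data_version(data_dir: Path, expected: str = CURRENT_FORMAT_VERSION) -> bool:
    """Flush data_dir if its recorded format version differs from `expected`.

    Returns True if old data was flushed, False if the directory was current.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    version_file = data_dir / VERSION_FILE_NAME

    # Assumption: a missing version file is treated as "different version".
    found = version_file.read_text().strip() if version_file.is_file() else None
    if found == expected:
        return False

    # Remove everything left behind by the old format.
    for entry in data_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Record the format version that new files will use.
    version_file.write_text(expected + "\n")
    return True
```

At startup this might be called as ensure_data_version(Path.home() / ".scim" / "novel-pinyin") before any user data files are opened.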
Optional:
skim integration.
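
For item 1 in the list above, the count-based estimate P(P|W) = C(P,W)/C(W) can be sketched in a few lines of Python. This is only an illustration, not novel-pinyin's implementation. The function name pinyin_given_word and the input format (a list of (pinyin, word) pairs taken from some annotated corpus) are assumptions of mine.

```python
from collections import Counter


def pinyin_given_word(observations):
    """Estimate P(P|W) = C(P,W) / C(W) from (pinyin, word) pairs.

    `observations` is a hypothetical input format: one (pinyin, word)
    tuple per occurrence of a word with that pronunciation.
    """
    observations = list(observations)
    pair_counts = Counter(observations)                        # C(P, W)
    word_counts = Counter(word for _, word in observations)    # C(W)
    return {
        (pinyin, word): count / word_counts[word]
        for (pinyin, word), count in pair_counts.items()
    }
```

For example, if the word 行 is seen three times with the pinyin "xing" and once with "hang", the estimate gives P(xing|行) = 3/4 and P(hang|行) = 1/4.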
Wednesday, May 14, 2008
2 comments:
Hi,
I am one of the authors of TouchPal (www.cootek.com / www.cootek.com/cnbbs), and I wrote the Chinese engine of the TouchPal Chinese edition.
Now we want to make it more intelligent, so I hope we can get acquainted, okay?
Teng Ren
Sorry, I only just saw your comment.
OK, nice to meet you.
My Email Address: alexepico at gmail dot com
Peng Wu