Wednesday, May 14, 2008

novel-pinyin 0.2.x wishlist

Since the first version of novel-pinyin was released, I have received some feedback.
The next version of novel-pinyin will try to complete the following tasks:
1. Model Modification. Change P(P|W) from k/n to C(P,W)/C(W),
where C(P,W) is the count of the pinyin-word pair and
C(W) is the count of the word.
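A minimal sketch of the new estimate, assuming the training data can be reduced to a list of observed (pinyin, word) pairs. The function name and the toy data are hypothetical, not taken from novel-pinyin. Each reading of a word gets weight in proportion to how often it was observed with that word.

```python
from collections import Counter

def estimate_pinyin_given_word(pairs):
    """Estimate P(P|W) = C(P,W) / C(W) from observed (pinyin, word) pairs.

    Hypothetical helper for illustration only.
    """
    pair_counts = Counter(pairs)                 # C(P,W): count of each pinyin-word pair
    word_counts = Counter(w for _, w in pairs)   # C(W): count of each word
    return {(p, w): c / word_counts[w] for (p, w), c in pair_counts.items()}

# Hypothetical toy data: "行" read as "xing" 3 times and "hang" once.
observations = [("xing", "行")] * 3 + [("hang", "行")] + [("hao", "好")] * 2
probs = estimate_pinyin_given_word(observations)
print(probs[("xing", "行")])  # 0.75
print(probs[("hang", "行")])  # 0.25
```

For every word, the probabilities of its readings sum to 1.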

2. Dynamically adjust phrase positions according to bi-gram probabilities.
In the HMM training process, each frequency adjustment is very small (1 or 6).
To magnify the position changes, use bi-gram instead of unigram statistics when possible.
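A rough sketch of "bi-gram when possible, unigram otherwise" applied to candidate ordering. It assumes unigram and bi-gram counts are available as simple counters. The function `rank_candidates` and the counts below are hypothetical and do not describe novel-pinyin's actual data structures.

```python
from collections import Counter

def rank_candidates(candidates, prev_word, unigrams, bigrams):
    """Order candidate phrases, preferring bi-gram evidence over unigram counts.

    Hypothetical helper: if the previous word has been seen before any
    candidate, rank by P(w | prev) = C(prev, w) / C(prev), breaking ties by
    unigram count; otherwise fall back to plain unigram counts.
    """
    prev_count = unigrams[prev_word]  # Counter returns 0 for unseen keys
    has_bigram = prev_count > 0 and any(bigrams[(prev_word, w)] for w in candidates)
    if has_bigram:
        key = lambda w: (bigrams[(prev_word, w)] / prev_count, unigrams[w])
    else:
        key = lambda w: unigrams[w]
    return sorted(candidates, key=key, reverse=True)

# Hypothetical counts for the pinyin "shi".
unigrams = Counter({"是": 50, "事": 30, "时": 20, "故": 10})
bigrams = Counter({("故", "事"): 8, ("故", "是"): 1})

print(rank_candidates(["是", "事", "时"], "故", unigrams, bigrams))  # ['事', '是', '时']
print(rank_candidates(["是", "事", "时"], "我", unigrams, bigrams))  # ['是', '事', '时']
```

After "故", the bi-gram evidence moves "事" to the top even though its unigram count is lower. A small frequency increment alone would need many adjustments to achieve the same reordering.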

3. Versioned Data File Format.
Because the data file format will change in the next release, I will add a version file in
~/.scim/novel-pinyin to indicate the file format version.
When a different version is detected, the old-version files will be flushed.
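A minimal sketch of the version check, assuming the version file is a plain-text file named `version` and that "flushed" means deleting the old data files. The file name, the version string "0.2", and `ensure_format_version` are all hypothetical.

```python
from pathlib import Path

FORMAT_VERSION = "0.2"  # hypothetical version string for the new file format

def ensure_format_version(data_dir, current=FORMAT_VERSION):
    """Flush data files whose on-disk format version differs from `current`.

    A missing version file (e.g. data written by 0.1.x) also counts as a
    mismatch. Only regular files are removed; subdirectories are left alone.
    Returns True if any old file was removed.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    version_file = data_dir / "version"  # hypothetical file name
    found = version_file.read_text().strip() if version_file.exists() else None
    flushed = False
    if found != current:
        for entry in data_dir.iterdir():
            if entry.is_file() and entry != version_file:
                entry.unlink()  # drop data written in the old format
                flushed = True
        version_file.write_text(current + "\n")  # record the new format version
    return flushed
```

At startup this could be called as `ensure_format_version(Path.home() / ".scim" / "novel-pinyin")` before any data files are loaded.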

Optional:
skim integration.

2 comments:

任腾's blog said...

Hi,
I am one of the authors of TouchPal (www.cootek.com / www.cootek.com/cnbbs), and I wrote the Chinese engine of the TouchPal Chinese edition.
Now we want to make it more intelligent, so I hope we can get acquainted. Is that okay?

Teng Ren

Alex Epico said...

Sorry, I only just saw your comment.
OK, nice to meet you.
My Email Address: alexepico at gmail dot com

Peng Wu