Wednesday, May 14, 2008

novel-pinyin 0.2.x wishlist

Now that the first version of novel-pinyin has been released, some feedback has come in.
The next version of novel-pinyin will try to complete the following tasks:
1. Model Modification. Change the estimate of P(P|W) from k/n to C(P,W)/C(W).
(C(P,W) is the count of the pinyin and word combination,
C(W) is the count of the word.)
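
A minimal sketch of the count-based estimate, in Python for brevity. The function name, the (word, pinyin) input format, and the sample counts are assumptions made for illustration, not novel-pinyin's actual code or data. The post does not define k and n, so only the new estimate is shown.

```python
from collections import Counter

def pronunciation_probs(observations):
    """Estimate P(pinyin | word) as C(pinyin, word) / C(word).

    `observations` is an iterable of (word, pinyin) pairs; this input
    format is hypothetical, chosen only to keep the example small.
    """
    pairs = list(observations)
    pair_counts = Counter(pairs)                      # C(P, W)
    word_counts = Counter(word for word, _ in pairs)  # C(W)
    return {
        (word, pinyin): count / word_counts[word]
        for (word, pinyin), count in pair_counts.items()
    }

# Made-up sample: one word seen four times with two pronunciations.
sample = [("行", "xing"), ("行", "xing"), ("行", "xing"), ("行", "hang")]
probs = pronunciation_probs(sample)
print(probs[("行", "xing")])  # 0.75
print(probs[("行", "hang")])  # 0.25
```

With this estimate, a pronunciation that shows up more often for a word gets a proportionally larger share of that word's probability.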

2. Dynamically adjust phrase positions according to bi-gram probabilities.
In the HMM model training process, the frequency adjustment is very small (1 or 6).
To magnify the position changes, replace the unigram with the bi-gram when possible.
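
One way to read "replace the unigram with the bi-gram when possible" is sketched below: score each candidate by its bi-gram value given the previous word when one exists, and fall back to the unigram value otherwise. The function name, the dictionary layout, and all numbers are assumptions for illustration; the real ordering logic in novel-pinyin may differ.

```python
def rank_candidates(prev_word, candidates, unigram, bigram):
    """Order candidates, preferring bi-gram scores over unigram scores.

    `unigram` maps word -> score and `bigram` maps (prev, word) -> score.
    Both tables are hypothetical stand-ins for the trained model.
    """
    def score(word):
        # Use the bi-gram entry when one exists, else the unigram entry.
        return bigram.get((prev_word, word), unigram.get(word, 0.0))

    return sorted(candidates, key=score, reverse=True)

# Made-up scores for two candidates that share the pinyin "shi".
unigram = {"是": 0.05, "师": 0.01}
bigram = {("老", "师"): 0.3}

print(rank_candidates(None, ["是", "师"], unigram, bigram))  # ['是', '师']
print(rank_candidates("老", ["是", "师"], unigram, bigram))  # ['师', '是']
```

In this toy case the context word moves the less frequent candidate to the front, which is the kind of magnified position change the item describes.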

3. Versioned Data File Format.
Because the data file format will change in the next release, I will add a version file in
~/.scim/novel-pinyin to indicate the file format version.
When a different version is detected, the files of the old version will be flushed.
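
The version check could look roughly like the sketch below. The marker file name "version", the version string "0.2", and the choice to treat a missing marker as an old format are all assumptions; only the ~/.scim/novel-pinyin directory comes from the plan above.

```python
from pathlib import Path

FORMAT_VERSION = "0.2"    # hypothetical format version string
VERSION_FILE = "version"  # hypothetical marker file name

def ensure_current_format(data_dir, expected=FORMAT_VERSION):
    """Flush old data files when the stored format version differs.

    Returns True if the directory was flushed, False if it was current.
    A missing marker file is treated as an old version (an assumption).
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    marker = data_dir / VERSION_FILE
    found = marker.read_text().strip() if marker.exists() else None
    if found == expected:
        return False
    # Remove every regular file, including a stale marker.
    for entry in list(data_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
    marker.write_text(expected + "\n")
    return True
```

At startup this would be called once with the user data directory, for example ensure_current_format(Path.home() / ".scim" / "novel-pinyin"), before any data file is opened.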

Optional:
skim integration.