Monday, August 25, 2008

Sorry for the bugs in the novel-pinyin 0.2.3 release.

As the first release of the 0.2.x series, I put novel-pinyin on SourceForge, but later I withdrew the package because some serious bugs were found.
During the Beijing Olympics, I finally released the novel-pinyin 0.2.3 package.
Thanks to lyman for reporting the bug in the initialization code.
Since novel-pinyin 0.2.3 has already been released and the fix is relatively small, I decided to release the fix as a separate patch.
To apply it, run the following command in the novel-pinyin-0.2.3 directory:
patch -p2 < ../../urgent-patch-fix-novel-pinyin-first-load.patch

PS:
By the way, the Chinese name of the input method has changed to 新智能拼音 (New Smart Pinyin), while the English name remains Novel Pinyin.

Friday, August 08, 2008

novel-pinyin 0.3.x wishlist

TODO Items:
1. Modify the pinyin large table to merge the scim-pinyin phrase library into gb_char.table.
2. Write phrase-to-token conversion (phrase_large_table).
3. Write n-gram segmentation to bootstrap phrase generation (replacing the current mmseg).
4. Learn from a larger corpus.
5. Entropy-based n-gram pruning.
6. Add support for professional phrase libraries.
7. Better fuzzy pinyin support (like ms-pinyin).

novel-pinyin 0.2.3 released

Done Items:
1. Import all scim-pinyin phrases as a corpus.
2. Better HMM parameter adjustment.
3. Better candidate adjustment.
4. Add version check.
5. Add data file corruption detection.
6. Protect against integer overflow.
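Item 6 does not say how the overflow protection works. One common approach for frequency counters is to clamp at the maximum value instead of letting the counter wrap around to a small number. Below is a minimal Python sketch of that idea; the 32-bit limit and the name saturating_add are assumptions made for illustration, and this is not the actual novel-pinyin code.

```python
# Assumption: frequencies are persisted as 32-bit unsigned integers.
UINT32_MAX = 2**32 - 1


def saturating_add(freq: int, delta: int, limit: int = UINT32_MAX) -> int:
    """Add delta to a stored frequency, clamping at limit instead of wrapping."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    # Compare against limit - delta so the check itself cannot exceed limit
    # (this matters in fixed-width languages such as C++).
    if freq <= limit - delta:
        return freq + delta
    return limit
```

Clamping keeps a very frequent phrase near the top, but phrases that all reach the limit lose their relative order; halving every count when one approaches the limit is another common option that preserves ordering.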

Todo Items:
An input pad module for temporarily entering Chinese characters by stroke lookup.
(Maybe this can be done in Hacker Week.)

Wednesday, May 14, 2008

novel-pinyin 0.2.x wishlist

Since the first version of novel-pinyin was released, I have received some feedback.
The next version of novel-pinyin will try to finish the following tasks:
1. Model modification. Change P(P|W) from k/n to C(P,W)/C(W).
(C(P,W) is the count of a pinyin-word combination,
and C(W) is the count of the word.)
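A minimal Python sketch of the new estimate, computing C(P,W) and C(W) from observed (word, pinyin) pairs and then P(P|W) = C(P,W)/C(W). The function names and the toy counts are hypothetical; this only illustrates the formula, not the novel-pinyin implementation.

```python
from collections import Counter


def train_counts(observations):
    """Count C(P,W) and C(W) from (word, pinyin) pairs seen in a corpus."""
    pair_counts = Counter()  # key: (pinyin, word) -> C(P,W)
    word_counts = Counter()  # key: word -> C(W)
    for word, pinyin in observations:
        pair_counts[(pinyin, word)] += 1
        word_counts[word] += 1
    return pair_counts, word_counts


def pinyin_given_word(pinyin, word, pair_counts, word_counts):
    """P(P|W) = C(P,W) / C(W); returns 0.0 for a word never seen."""
    total = word_counts[word]
    if total == 0:
        return 0.0
    return pair_counts[(pinyin, word)] / total
```

For example, if 长 was seen three times as chang and once as zhang, P(chang|长) = 3/4 = 0.75 and P(zhang|长) = 1/4 = 0.25.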

2. Dynamically adjust phrase positions according to bi-gram probabilities.
Because the frequency adjustment in the HMM training process is very small (1 or 6), phrase positions barely change.
To magnify the position changes, replace uni-gram with bi-gram probabilities when possible.
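To illustrate the idea, here is a rough Python sketch that ranks candidates by P(w|prev) when the word pair has been seen and falls back to the uni-gram P(w) otherwise. The class name CandidateRanker and this simple unsmoothed back-off are assumptions for illustration; a real implementation would need proper smoothing (for example interpolation or Katz back-off) so that the two kinds of score are comparable.

```python
from collections import Counter


class CandidateRanker:
    """Hypothetical ranker: prefer bi-gram P(w | prev) over uni-gram P(w)."""

    def __init__(self):
        self.unigram = Counter()  # word -> count
        self.bigram = Counter()   # (previous word, word) -> count

    def learn(self, words):
        """Update counts from one committed sentence (a list of words)."""
        for i, w in enumerate(words):
            self.unigram[w] += 1
            if i > 0:
                self.bigram[(words[i - 1], w)] += 1

    def score(self, prev, word):
        """Use P(word | prev) if the pair was seen, else fall back to P(word)."""
        if prev is not None and self.bigram[(prev, word)] > 0:
            return self.bigram[(prev, word)] / self.unigram[prev]
        total = sum(self.unigram.values())
        return self.unigram[word] / total if total else 0.0

    def rank(self, prev, candidates):
        """Sort candidates by descending score (stable for ties)."""
        return sorted(candidates, key=lambda w: self.score(prev, w), reverse=True)
```

After learning ["城", "市"], ["事"] three times, and ["我", "是"], rank(None, ["是", "事", "市"]) puts 事 first by uni-gram count, while rank("我", ...) moves 是 to the top because the bi-gram (我, 是) has been seen.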

3. Versioned Data File Format.
Since the data file format will change in the next release, I will add a version file in
~/.scim/novel-pinyin to indicate the file format version.
When a different version is detected, the old-version files will be flushed.
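A minimal Python sketch of the version check described above. The file name version, the version string, and the function name ensure_data_version are hypothetical; only the directory ~/.scim/novel-pinyin comes from the plan.

```python
import shutil
from pathlib import Path

DATA_FORMAT_VERSION = "1"  # hypothetical current data file format version


def ensure_data_version(data_dir: Path, version: str = DATA_FORMAT_VERSION) -> bool:
    """Flush user data files whose format version differs from `version`.

    Returns True if old files were flushed, False otherwise.
    """
    version_file = data_dir / "version"
    if version_file.is_file() and version_file.read_text().strip() == version:
        return False  # formats match: keep the user's files
    flushed = data_dir.exists()
    if flushed:
        shutil.rmtree(data_dir)  # old or unversioned format: flush everything
    data_dir.mkdir(parents=True)
    version_file.write_text(version + "\n")
    return flushed
```

On startup it would be called as ensure_data_version(Path.home() / ".scim" / "novel-pinyin"); the return value tells whether old files were flushed.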

Optional:
skim integration.

Tuesday, February 19, 2008

novel-pinyin 0.1.0 internal test

You can get the newest novel-pinyin 0.1.0 from the following URL:
http://download.opensuse.org/repositories/home:/wupeng/

The source code on sourceforge.net is missing the data file, so it will not run.
Please use the RPM from the URL above.

Thursday, February 14, 2008

2008 New Year!

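My own input method, Novel Pinyin, finally runs. It still has some bugs, but they don't have much impact. I am now writing my blog with the input method I wrote myself.
Starting next week, I will test the new input method among my colleagues.
First, this weekend, I need to get the RPM built on the openSUSE Build Service.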