July 19, 2009

Open Source Personal Chinese-English Dictionary

Filed under: Uncategorized — ktetaichinh @ 3:35 pm

I have created dictionary software that uses the Chinese-to-English dictionary data from CC-CEDICT. It is called Open Source Personal Chinese-English Dictionary (OSPCED). It’s not the most elegant acronym, but I do not think the name is taken. The major features of this software are listed below.

  1. Ability to view PinYin words in a tree structure.
  2. Ability to view all characters of a syllable (also organized in a tree structure).
  3. Ability to convert tone numbers to tone marks.
  4. Ability to search the dictionary data in English or PinYin.

OSPCED is open-source and licensed under the Apache 2.0 license. A binary distribution should be available by clicking here. The source code should be available by clicking here.

The binary distribution of OSPCED includes installation instructions. However, I will quickly go over how to set up and run OSPCED. First, you need at least version 1.4 of the Java Runtime Environment (JRE) to proceed. For more information on obtaining the JRE, click here. After you download and extract the zip file of the binary distribution, you should see the following files.

  • init.bat
  • run.bat
  • setup.bat
  • README.txt
  • ospced-0.1.jar
  • lucene-core-2.4.1.jar
  • derby.jar
  • cedict_ts.u8

The main files of concern are setup.bat and run.bat. You first have to execute setup.bat to create the database (Derby) and index (Lucene). Go have coffee because this step could take a while. Then, to run OSPCED, execute run.bat.
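For readers curious about what the setup step has to read: each non-comment line of cedict_ts.u8 follows the CC-CEDICT layout `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`. The following is only a minimal sketch of parsing one such line, not the code that setup.bat actually runs. The class name `CedictEntry` and its `parse` method are hypothetical, the sample line is illustrative, and the sketch uses generics, so it assumes a newer Java than the 1.4 minimum mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one CC-CEDICT entry (not a class taken from OSPCED). */
final class CedictEntry {
    final String traditional;
    final String simplified;
    final String pinyin;            // PinYin with tone numbers, e.g. "Zhong1 guo2"
    final List<String> definitions; // English equivalents

    private CedictEntry(String traditional, String simplified,
                        String pinyin, List<String> definitions) {
        this.traditional = traditional;
        this.simplified = simplified;
        this.pinyin = pinyin;
        this.definitions = Collections.unmodifiableList(definitions);
    }

    /**
     * Parses a line of the form: Traditional Simplified [pin1 yin1] /def 1/def 2/
     * Returns null for comment lines (starting with '#'), blank lines,
     * and lines that do not match the expected layout.
     */
    static CedictEntry parse(String line) {
        if (line == null) return null;
        String s = line.trim();
        if (s.isEmpty() || s.charAt(0) == '#') return null;

        int space = s.indexOf(' ');            // end of the traditional form
        int open = s.indexOf('[');             // start of the PinYin
        int close = s.indexOf(']', open + 1);  // end of the PinYin
        int slash = s.indexOf('/', close + 1); // start of the definitions
        if (space < 0 || open < space || close < 0 || slash < 0) return null;

        List<String> defs = new ArrayList<String>();
        for (String d : s.substring(slash + 1).split("/")) {
            if (!d.trim().isEmpty()) defs.add(d.trim());
        }
        return new CedictEntry(s.substring(0, space),
                               s.substring(space + 1, open).trim(),
                               s.substring(open + 1, close),
                               defs);
    }

    public static void main(String[] args) {
        // Illustrative line; the headwords are written as Unicode escapes
        // so the source file stays plain ASCII.
        CedictEntry e = parse("\u4E2D\u570B \u4E2D\u56FD [Zhong1 guo2] /China/Middle Kingdom/");
        System.out.println(e.pinyin + " -> " + e.definitions);
    }
}
```

Entries parsed this way could then be written to a database and a search index, which is the part of setup that takes the time.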

When OSPCED starts up, there is a tabbed pane with four tabs, one for each of the major features listed above. The following screenshot shows browsing through the entries of the dictionary using a tree. When an entry is clicked, its corresponding PinYin with tone marks, traditional characters, simplified characters, and English equivalents are displayed to the right.

Browsing the dictionary entries.

The next screenshot shows browsing the dictionary by syllable, again using a tree. When a syllable is clicked, all the characters that the syllable may represent are shown to the right in traditional and simplified form.

Browsing through the syllables.

The next screenshot shows the tone number to tone mark conversion feature of OSPCED. By no means is this feature a substitute for a legitimate input method editor (IME). The user types PinYin with tone numbers into the top text area. When the user hits the convert button at the bottom, the PinYin with tone numbers is parsed, converted to PinYin with tone marks, and displayed in the bottom text area.

Tone number to tone mark conversion.
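To give an idea of what such a conversion involves, here is a minimal sketch based on the usual PinYin placement rules: the mark goes on "a" or "e" if present, on the "o" of "ou", and otherwise on the last vowel. This is not OSPCED's own code. The class `ToneMarks` and its methods are hypothetical names, and the sketch assumes that ü is typed as "u:" (as in CC-CEDICT) or as "v", and that tone 5 means the neutral tone.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper (not part of OSPCED): "hao3" becomes "hao" with a caron on the a. */
final class ToneMarks {
    // Combining diacritics for tones 1-4: macron, acute, caron, grave.
    private static final char[] MARKS = {'\u0304', '\u0301', '\u030C', '\u0300'};

    /** Converts whitespace-separated syllables, e.g. "ni3 hao3". */
    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(convertSyllable(token));
        }
        return out.toString();
    }

    /** Converts one syllable; input without a trailing digit 1-5 is returned unchanged. */
    static String convertSyllable(String syllable) {
        if (syllable.isEmpty()) return syllable;
        char last = syllable.charAt(syllable.length() - 1);
        if (last < '1' || last > '5') return syllable;
        int tone = last - '0';

        // Assumption: u-umlaut is typed as "u:" (CC-CEDICT style) or "v".
        String body = syllable.substring(0, syllable.length() - 1)
                .replace("u:", "\u00FC").replace("U:", "\u00DC").replace('v', '\u00FC');
        if (tone == 5) return body;   // neutral tone carries no mark

        int pos = markPosition(body);
        if (pos < 0) return body;     // no vowel to mark
        String marked = body.substring(0, pos + 1) + MARKS[tone - 1] + body.substring(pos + 1);
        // NFC composes vowel + combining mark into a single precomposed character.
        return Normalizer.normalize(marked, Normalizer.Form.NFC);
    }

    /** Index of the vowel that takes the mark, or -1 if there is none. */
    static int markPosition(String body) {
        String lower = body.toLowerCase(Locale.ROOT);
        int a = lower.indexOf('a');
        if (a >= 0) return a;
        int e = lower.indexOf('e');
        if (e >= 0) return e;
        int ou = lower.indexOf("ou");
        if (ou >= 0) return ou;       // the o of "ou" takes the mark
        for (int i = lower.length() - 1; i >= 0; i--) {
            if ("iou\u00FC".indexOf(lower.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(convert("ni3 hao3"));
    }
}
```

With this sketch, `ToneMarks.convert("ni3 hao3")` yields nǐ hǎo, and `ToneMarks.convertSyllable("lu:4")` yields lǜ. Attaching a combining diacritic and normalizing to NFC avoids a hand-written lookup table of every marked vowel.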

The last two screenshots show the search feature. Users can search by entering English words or Chinese words (using PinYin with tone numbers).

Users can search the dictionary using English keywords.
Users can search the dictionary by entering Chinese words (PinYin with tone numbers).

Some possible improvements to OSPCED are as follows.

  • Use Java’s IME framework so that tone number to tone mark conversion behaves as it does in other IME products.
  • Make the search results more aesthetically pleasing.
  • Allow search using traditional and simplified characters.
  • Allow search using PinYin with tone marks.

There are a lot of Chinese-English dictionaries online. This Chinese-English dictionary software is different in that it is open source and desktop-based. I think users may find it handy to use, and I think programmers will find the API very friendly.
