Dict - Development



Jarome's projects
Sorry, English version is not ready yet, it is coming soon...:)

This page is for anyone with a basic knowledge of computer programming. You will not find answers to questions like "what is compilation" or "what is a resource" here. It is for everyone who wants to continue this project in any way, or to create other language variants of the mobile dictionary MDict.

Requirements:

  • Java virtual machine (JVM) compatible with Sun Java 1.4 or higher (5.0 recommended)
  • Sun Wireless Toolkit or NetBeans IDE with Mobility Pack
Note: You can use the Eclipse IDE with an appropriate plug-in. Personally, I think development with Eclipse is much quicker, but the configuration is not so straightforward and I had some trouble running the output MIDlet, so I used NetBeans in the last stage.

Download:


Description of MDict archive:

  • src - directory containing the source code of the MDict MIDlet
  • src-index - directory with the source code of the "MDictIndex" tool, which creates the resource index data for the target MIDlet
  • src-encz - directory with the source code of "EnCzData". The tool converts "RAW" English-Czech data to a format readable by the MDictIndex program.
  • res - directory with all resources: localization, icons, help files. The archive does not contain any dictionary data; you have to use the MDictIndex program to create them.
  • doc - basic documentation - this document :)

Compiling MDict application:

Import the res and src directories into your chosen environment (NetBeans or Eclipse). The MIDlet configuration can be CLDC-1.1 with profile MIDP-1.0. Prepare the dictionary data with the MDictIndex program and copy them into the res directory. Change any resources you like (locales, for example) in the res directory and compile the program. Enjoy...:)

Content of res directory:

  • language description of the MIDlet user interface (locales) - files "language_codelocale" ("enlocale", "czlocale"), standard text files in UTF-8 encoding (their format is outside the scope of this document)
  • language versions of the MIDlet help - files "language_codehelp.html" ("enhelp.html", "czhelp.html"), standard text files in UTF-8 encoding. A file can contain basic HTML tags. (Their format is outside the scope of this document.)
  • pictures identifying the dictionary languages - standard PNG files "language_code.png". A dictionary always has to contain two pictures, one for each of its languages (example: an English-Czech dictionary will have en.png and cz.png). The picture language code is described below in the section on creating dictionary data. The recommended picture resolution is about 20x20 pixels.
  • dictionary data files - files i0 to iXXX. The number of files depends on the dictionary size. Use the MDictIndex program to create them. (Data files are not included.)
Note: the "locale" and "help" files have nothing to do with the dictionary languages. Their language code matches the code returned by the call "locale = System.getProperty("microedition.locale")". The English versions of the help and locale files always have to be included.
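
The note above implies a lookup with an English fallback. A minimal sketch of that idea follows; it is not taken from the MDict sources. The class LocaleResources, its methods, and the rule of cutting the locale at "-" or "_" are assumptions made for illustration, and the code targets a desktop JVM (generics, java.util collections) rather than CLDC.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch: choose a locale/help resource name, falling back to English. */
final class LocaleResources {

    /** Hypothetical helper: reduce a value such as "en-US" to "en". */
    static String languageCode(String locale) {
        if (locale == null || locale.isEmpty()) {
            return "en"; // property not set: use the mandatory English files
        }
        int cut = locale.indexOf('-');
        if (cut < 0) {
            cut = locale.indexOf('_');
        }
        String lang = cut > 0 ? locale.substring(0, cut) : locale;
        return lang.toLowerCase(Locale.ROOT);
    }

    /** Returns e.g. "czlocale" if it is bundled, otherwise "enlocale". */
    static String pick(String locale, String suffix, Set<String> available) {
        String candidate = languageCode(locale) + suffix;
        return available.contains(candidate) ? candidate : "en" + suffix;
    }

    public static void main(String[] args) {
        Set<String> res = new HashSet<>(Arrays.asList(
                "enlocale", "czlocale", "enhelp.html", "czhelp.html"));
        // On a phone this property is set by the platform; on a desktop JVM it is usually null.
        String locale = System.getProperty("microedition.locale");
        System.out.println(pick(locale, "locale", res));
        System.out.println(pick("cz", "help.html", res));
        System.out.println(pick("de-DE", "locale", res));
    }
}
```

Because the English files are always present, the fallback never has to deal with a missing resource.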

Creating dictionary resource data

Program MDictIndex

The program creates index files for the MDict application. Its source code is in the src-index directory. To compile the program, run this command:
javac MDictIndex.java

You can run the program with these parameters:
java MDictIndex [-fs -fe -fm -ia -cs] -i input_file -o target_directory


Parameters:
  • -fs filter out abbreviations (example: AA - American Association)
  • -fa force the use of empty translations (those missing the second side of the translation). It is not well tested, and I think it is quite useless.
  • -fm (count) filter out translations which contain more words on each side than "count"
  • -ia index only by alphabetic characters, excluding numbers (not implemented yet)
  • -cs (size) size of index pages in the MIDlet. The standard size is 6000 bytes.
  • -i input_file input file; its format is described below
  • -o target_directory directory where the output index files are stored
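
To make the -fm switch more concrete, here is one possible reading of it as a sketch. It is not the MDictIndex implementation: the class WordCountFilter is hypothetical, words are assumed to be separated by whitespace, and an entry is assumed to be dropped as soon as either side exceeds the limit.

```java
/** Sketch of a word-count filter in the spirit of the -fm switch. */
final class WordCountFilter {

    /** Counts whitespace-separated words; an empty side counts as 0. */
    static int countWords(String side) {
        String trimmed = side.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Assumption: keep an entry only if neither side has more than maxWords words. */
    static boolean keep(String first, String second, int maxWords) {
        return countWords(first) <= maxWords && countWords(second) <= maxWords;
    }

    public static void main(String[] args) {
        System.out.println(keep("AA", "American Association", 2)); // true
        System.out.println(keep("one two three", "x", 2));         // false
    }
}
```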

Format of input file for MDictIndex

The data file is a standard text file in UTF-8 encoding. Each line is terminated by a newline character, written as <\n> from now on.

File structure:

Header - information about key mapping, code page etc. It consists of one or more lines in the format:
parameter_name:value<\n>

Parameters:
  • ExcludeChars: - all characters which should be excluded from the resulting dictionary; the parameter is optional
  • ExcludeCharsTo: - the characters listed above will be translated to these ones; the parameter is optional
  • Languages: - language codes for the dictionary data. These are the same codes used to name the pictures in the resource directory. Make sure the pictures are created before compilation. The parameter has the format ZZ,ZZ where "Z" is any alphanumeric character.
  • CharsetLowerCase: - all lower case alphabetic characters in sort order
  • CharsetUpperCase: - all upper case alphabetic characters in sort order. The characters correspond to those on the line above.
  • Numbers: - all characters which are treated as numeric. Usually "0-9".
  • OtherChars: - all other characters which are interpreted as spaces between words. These characters are not indexed. Note: the "space" character has to be added to this set, but the "tab" character must not. It will be added automatically.
  • OtherCharsNoWhiteSpace: - all characters which are not alphabetic but can be indexed and are not interpreted as white space
  • KeyMap1: - characters assigned to the keys of the mobile device in the order 0-9. Groups of characters are delimited by "tab" <\t>. The characters do not have to correspond to the charset definition.
  • KeyMap2: - same as the parameter above, not implemented yet
Note: the parameters Numbers, OtherCharsNoWhiteSpace, CharsetUpperCase, CharsetLowerCase and OtherChars, in this order, define the "CHARSET". Version 1.00 of MDict does not support more than 128 characters, including "tab" and "enter". These two characters are added automatically.
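
The ordering and the 128-character limit from the note can be sketched as follows. This is an illustration only: the class CharsetSketch is hypothetical, and the position at which "tab" and "enter" are added (here: at the end) is an assumption, since the document only says they are added automatically.

```java
/** Sketch: assemble the "CHARSET" in the documented order and check its size. */
final class CharsetSketch {

    static final int MAX_CHARS = 128; // limit stated for MDict 1.00

    static String build(String numbers, String otherNoWhiteSpace,
                        String upper, String lower, String otherChars) {
        // Documented order: Numbers, OtherCharsNoWhiteSpace,
        // CharsetUpperCase, CharsetLowerCase, OtherChars.
        String charset = numbers + otherNoWhiteSpace + upper + lower + otherChars
                + "\t\n"; // assumption: tab and enter are appended last
        if (charset.length() > MAX_CHARS) {
            throw new IllegalArgumentException(
                    "Charset has " + charset.length() + " characters, limit is " + MAX_CHARS);
        }
        return charset;
    }

    public static void main(String[] args) {
        String charset = build("0123456789", "'", "ABC", "abc", " -");
        System.out.println(charset.length());     // 21
        System.out.println(charset.indexOf('A')); // 11
    }
}
```

Checking the size before running the indexer gives an early error instead of a dictionary that the MIDlet cannot read.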

End of header:
DATA:<\n>

Dictionary data - one or more lines in the following format (<\t> means the "tab" character):
word or words in first language<\t>word or words in second language<\n>
Note: the last line is also terminated by the <\n> character.
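
The file structure described above (header lines, the DATA: marker, then tab-separated entries) can be read with a few lines of code. The following is a rough sketch, not the parser used by MDictIndex; the class InputFileSketch and the sample words are made up for illustration. Header values are deliberately not trimmed, because values such as KeyMap1 contain significant spaces and tabs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: split an MDictIndex-style input text into header and entries. */
final class InputFileSketch {

    final Map<String, String> header = new LinkedHashMap<>();
    final List<String[]> entries = new ArrayList<>();

    static InputFileSketch parse(String text) {
        InputFileSketch result = new InputFileSketch();
        boolean inData = false;
        for (String line : text.split("\r?\n")) {
            if (!inData) {
                if (line.equals("DATA:")) {
                    inData = true; // end of header
                    continue;
                }
                int colon = line.indexOf(':'); // values may contain ':' themselves
                if (colon < 0) {
                    throw new IllegalArgumentException("Bad header line: " + line);
                }
                result.header.put(line.substring(0, colon), line.substring(colon + 1));
            } else if (!line.isEmpty()) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    throw new IllegalArgumentException("Missing tab: " + line);
                }
                result.entries.add(new String[] {
                        line.substring(0, tab), line.substring(tab + 1) });
            }
        }
        if (!inData) {
            throw new IllegalArgumentException("DATA: marker not found");
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Languages:en,cz\nNumbers:0123456789\nDATA:\ndog\tpes\nhouse\tdum\n";
        InputFileSketch file = parse(sample);
        System.out.println(file.header.get("Languages")); // en,cz
        for (String[] e : file.entries) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

For a real file, the text could be loaded with new String(Files.readAllBytes(path), StandardCharsets.UTF_8) before calling parse.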

Example file:
ExcludeChars:öâôü<\n>
ExcludeCharsTo:oaou<\n>
Languages:en,cz<\n>
CharsetLowerCase:aábcčdďeéěfghiíjklľmnňoópqrřsštťuúůvwxyýzž<\n>
CharsetUpperCase:AÁBCČDĎEÉĚFGHIÍJKLĽMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ<\n>
Numbers:0123456789<\n>
OtherChars:- =/,()\."?<;!&:+>[]$^*%#_~|<\n>
OtherCharsNoWhiteSpace:`'Ž<\n>
KeyMap1: 0<\t>.1<\t>abcáč2<\t>defďéě3<\t>ghií4<\t>jkl5<\t>mnoňó6<\t>pqrsřš7<\t>tuvťúů8<\t>wxyzýž9<\n>
KeyMap2: 0<\t>.1<\t>abcáč2<\t>defďéě3<\t>ghií4<\t>jkl5<\t>mnoňó6<\t>pqrsřš7<\t>tuvťúů8<\t>wxyzýž9<\n>
DATA:<\n>
hello<\t>ahoj<\n>
potato<\t>brambora<\n>
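
The ExcludeChars and ExcludeCharsTo lines of the example file suggest a position-by-position character translation (ö to o, â to a, ô to o, ü to u). A small sketch of that reading follows; the class ExcludeCharsSketch is hypothetical, and the one-to-one pairing of the two strings is an assumption rather than documented MDictIndex behaviour. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
/** Sketch: translate excluded characters position by position. */
final class ExcludeCharsSketch {

    static String translate(String text, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("ExcludeChars and ExcludeCharsTo differ in length");
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int idx = from.indexOf(c);
            out.append(idx >= 0 ? to.charAt(idx) : c); // unlisted characters pass through
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String from = "\u00f6\u00e2\u00f4\u00fc"; // ö â ô ü, as in the example header
        String to = "oaou";
        System.out.println(translate("K\u00f6ln", from, to)); // Koln
    }
}
```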

Using the EnCzConvert program

With this program you can convert "RAW" dictionary data from the "GNU-FDL Anglicko-český dictionary" project to a format readable by MDictIndex.
To do this, run the command:
java EnCzConvert GNU_FDL_file.txt your_working_file
Note: you have to download the data in UTF-8 encoding.