This repository has been archived by the owner on Aug 18, 2018. It is now read-only.

Commit

Initial import from Google Code. This is Penelope v2.0.0
Alberto Pettarin committed Jun 30, 2014
1 parent 182e1d9 commit da36f47
Showing 27 changed files with 5,907 additions and 5 deletions.
5 changes: 3 additions & 2 deletions LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2014 Alberto Pettarin
Copyright (c) 2012-2014 Alberto Pettarin ([email protected])

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -18,4 +18,5 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.

102 changes: 99 additions & 3 deletions README.md
@@ -1,4 +1,100 @@
penelope
========
# Penelope

**Penelope** is a multi-tool for creating, editing and converting dictionaries, especially for eReader devices.

* Version: 2.0.0
* Date: 2014-06-30
* Developer: [Alberto Pettarin](http://www.albertopettarin.it/) ([contact](http://www.albertopettarin.it/contact.html))

With the current version you can:

* convert a dictionary FROM/TO the following formats:
    * Bookeen Cybook Odyssey (R/W)
    * Kobo (R index only, W unencrypted/unobfuscated only)
    * StarDict (R/W)
    * XML (R/W)
    * CSV (R/W)
* merge multiple dictionaries (of the same type) into a single dictionary
* define your own parser for each word/definition
* define your own collation function when outputting to Bookeen Cybook Odyssey format
* generate an EPUB file containing the index of a given dictionary (e.g., to cope with the lack of a search function on your eReader)

Please note that Penelope needs substantial code refactoring.
Unfortunately, I no longer have time to do that.
Please fork and improve.

Many people have asked for PRC/MOBI support.
Again, I no longer have time to do that.


### IMPORTANT UPDATE (2013-04-27)

Kobo issued a new firmware 2.5.1 (thanks!), which allows you to use unencrypted/unobfuscated dictionaries again, including those produced by Penelope. Some minor bugs in the UI/UX are still present, but at least the custom dictionaries are back!


### UPDATE (2013-04-23)

It seems that Kobo, with firmware 2.5.0, requires dictionaries to be encrypted/obfuscated. Hence, the dictionaries output by Penelope no longer work on Kobo devices. I contacted Kobo staff via Twitter, and they forwarded the notice to their development team. I hope they will fix the issue with a new firmware release soon. Meanwhile, if you need your custom-made dictionaries, you must stay with or revert to firmware 2.4.0.


## Usage

```
$ python penelope.py -h
$ python penelope.py -p foo -f en -t en
$ python penelope.py -p bar -f en -t it
$ python penelope.py -p "bar,foo,zam" -f en -t it
$ python penelope.py --xml -p foo -f en -t en
$ python penelope.py --xml -p foo -f en -t en --output-sd
$ python penelope.py -p bar -f en -t it --output-kobo
$ python penelope.py -p bar -f en -t it --output-xml -i
$ python penelope.py --kobo -p bar -f it -t it --output-epub
$ python penelope.py --odyssey -p bar -f en -t en --output-epub
$ python penelope.py -p bar -f en -t it --title "My EN->IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"
$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"
$ python penelope.py -p foo -f en -t en --collation custom_collation.py
$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n"
```

Please have a look at this web page for details:
http://www.albertopettarin.it/penelope.html

## License

**Penelope** is released under the MIT License since version 2.0.0 (2014-06-30).

Previous versions, hosted in a [Google Code repo](http://code.google.com/p/penelope-dictionary-converter/),
were released under the GNU GPL 3 License.


## Technical Notes

The current version runs under both Python 2 and Python 3,
and it has been tested under Linux (Debian, Fedora) and Windows (XP, 7).
Unfortunately, since I do not have any financial support for the project,
I cannot offer support for all the possible
values of the tuple (OS, Python version, console encoding).
Therefore, only problems running Penelope in a Linux environment
will receive full priority.


## Acknowledgments

Many thanks to:

* _uwelovesdonna_ for contributing ideas for improving the code and for setting up many pages of the project wiki;
* _Jens Sadowski_ for pointing out a bug with Unicode file names and for suggesting using multiset `dict()` instead of set `dict()`;
* _oldnat_ for pointing out a bug under Windows and Python 3;
* _Wolfgang Miller-Reichling_ for providing the code for reading CSV dictionaries;
* _branok_ for providing the idea and initial code for the German collation function;
* _pal_ for suggesting passing the `-l` switch to `MARISA_BUILD`;
* _Lukas Brückner_ for suggesting escaping `& < >` when outputting in XML format;
* _Stephan Lichtenhagen_ for suggesting forcing UTF-8 encoding on Python 3.


## Limitations and Missing Features

* No support for PRC/MOBI dictionaries
* Input files are assumed to be Unicode UTF-8 encoded
* Behavior depends on the current working directory (CWDIR dependent)

Penelope is a multi-tool for creating, editing and converting dictionaries, especially for eReader devices
Binary file added dictionary_index_epub/Chambers1908.epub
Binary file not shown.
Binary file added dictionary_index_epub/Websters1913.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-de.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-en.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-es.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-fr.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-it.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-ja.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-nl.epub
Binary file not shown.
Binary file added dictionary_index_epub/dicthtml-pt.epub
Binary file not shown.
49 changes: 49 additions & 0 deletions src/collation_de.py
@@ -0,0 +1,49 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'MIT'
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)'
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)'
__version__ = 'v2.0.0'
__date__ = '2014-06-30'
__description__ = 'Default collation function for penelope.py'

### BEGIN collate_function ###
# collate_function(string1, string2)
# compare string1 to string2
# return 0 if string1 == string2
# -1 if string1 < string2
# 1 if string1 > string2
def collate_function(string1, string2):
    # conversion to unicode and lower case (only for Python 2)
    #Python2#
    b1 = string1.decode('utf-8')
    #Python3# b1 = string1
    #Python2#
    b2 = string2.decode('utf-8')
    #Python3# b2 = string2
    b1 = b1.lower()
    b2 = b2.lower()
    # store strings with original accents for 2nd level collation
    c1 = b1
    c2 = b2

    # replace german accent characters by base characters for 1st level collation
    #Python2#
    for f in [ [u'ä', u'a'], [u'ö', u'o'], [u'ü', u'u'], [u'ß', u'ss'] ]:
    #Python3# for f in [ ['ä', 'a'], ['ö', 'o'], ['ü', 'u'], ['ß', 'ss'] ]:
        b1 = b1.replace(f[0], f[1])
        b2 = b2.replace(f[0], f[1])

    # 1st level collation
    if b1.encode('utf-16') == b2.encode('utf-16'):
        # 2nd level collation
        if c1.encode('utf-16') == c2.encode('utf-16'):
            return 0
        else:
            return -1 if c1.encode('utf-16') < c2.encode('utf-16') else 1
    # 1st level collation
    else:
        return -1 if b1.encode('utf-16') < b2.encode('utf-16') else 1
### END collate_function ###
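
The two-level comparison above can be tried outside Penelope with a minimal Python 3 sketch. It assumes plain `str` inputs and compares code points directly instead of UTF-16 encoded bytes, so it is a simplified stand-in rather than the file's exact behavior. The name `collate_de` and the sample word list are illustrative and not part of the repository; `functools.cmp_to_key` is the standard-library adapter for comparison-style functions.

```python
from functools import cmp_to_key


def collate_de(string1, string2):
    """Two-level German collation on Python 3 text strings (illustrative)."""
    # Second-level keys: lower-cased strings with umlauts intact.
    c1, c2 = string1.lower(), string2.lower()
    # First-level keys: umlauts and sharp s folded to base letters.
    b1, b2 = c1, c2
    for accented, base in (("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss")):
        b1 = b1.replace(accented, base)
        b2 = b2.replace(accented, base)
    # First level decides whenever the folded forms differ.
    if b1 != b2:
        return -1 if b1 < b2 else 1
    # Second level breaks ties using the accented forms.
    if c1 != c2:
        return -1 if c1 < c2 else 1
    return 0


words = ["Zug", "Äpfel", "Apfel", "Öl", "Ofen"]
sorted_words = sorted(words, key=cmp_to_key(collate_de))
print(sorted_words)
```

Folding at the first level keeps "Äpfel" next to "Apfel" rather than after "Zug", and the second level only orders words whose folded forms are identical.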

50 changes: 50 additions & 0 deletions src/collation_de3.py
@@ -0,0 +1,50 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'MIT'
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)'
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)'
__version__ = 'v2.0.0'
__date__ = '2014-06-30'
__description__ = 'Default collation function for penelope.py'

### BEGIN collate_function ###
# collate_function(string1, string2)
# compare string1 to string2
# return 0 if string1 == string2
# -1 if string1 < string2
# 1 if string1 > string2
def collate_function(string1, string2):
    # conversion to unicode and lower case (only for Python 2)
    #Python2# b1 = string1.decode('utf-8')
    #Python3#
    b1 = string1
    #Python2# b2 = string2.decode('utf-8')
    #Python3#
    b2 = string2
    b1 = b1.lower()
    b2 = b2.lower()
    # store strings with original accents for 2nd level collation
    c1 = b1
    c2 = b2

    # replace german accent characters by base characters for 1st level collation
    #Python2# for f in [ [u'ä', u'a'], [u'ö', u'o'], [u'ü', u'u'], [u'ß', u'ss'] ]:
    #Python3#
    for f in [ ['ä', 'a'], ['ö', 'o'], ['ü', 'u'], ['ß', 'ss'] ]:
        b1 = b1.replace(f[0], f[1])
        b2 = b2.replace(f[0], f[1])

    # 1st level collation
    if b1.encode('utf-16') == b2.encode('utf-16'):
        # 2nd level collation
        if c1.encode('utf-16') == c2.encode('utf-16'):
            return 0
        else:
            return -1 if c1.encode('utf-16') < c2.encode('utf-16') else 1
    # 1st level collation
    else:
        return -1 if b1.encode('utf-16') < b2.encode('utf-16') else 1
### END collate_function ###


25 changes: 25 additions & 0 deletions src/default_collation.py
@@ -0,0 +1,25 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'MIT'
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)'
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)'
__version__ = 'v2.0.0'
__date__ = '2014-06-30'
__description__ = 'Default collation function for penelope.py'

### BEGIN collate_function ###
# collate_function(string1, string2)
# compare string1 to string2
# return 0 if string1 == string2
# -1 if string1 < string2
# 1 if string1 > string2
def collate_function(string1, string2):
    b1 = bytearray(string1, 'utf-8').lower()
    b2 = bytearray(string2, 'utf-8').lower()
    if (b1 == b2):
        return 0
    else:
        return -1 if (b1 < b2) else 1
### END collate_function ###
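
A short Python 3 sketch of how this default comparison behaves on a few inputs. The function body mirrors the one above; the sample strings and variable names are illustrative. Note that `bytearray.lower()` only lowercases ASCII letters, so non-ASCII letters such as "Ä" keep their original case in the comparison.

```python
def collate_function(string1, string2):
    # Same logic as the default collation above: compare lower-cased UTF-8 bytes.
    b1 = bytearray(string1, "utf-8").lower()
    b2 = bytearray(string2, "utf-8").lower()
    if b1 == b2:
        return 0
    return -1 if b1 < b2 else 1


# ASCII letters are case-folded, so these compare as equal.
ascii_case = collate_function("Apple", "apple")
# "Ä" and "ä" encode to different bytes and are not folded together.
umlaut_case = collate_function("Äpfel", "äpfel")
# Every ASCII byte is smaller than the lead byte of a multi-byte UTF-8 sequence.
order = collate_function("zebra", "Äpfel")
print(ascii_case, umlaut_case, order)
```

This byte-wise ordering places all accented headwords after the ASCII ones, which is one reason a language-specific collation such as the German one may be preferable for some dictionaries.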

33 changes: 33 additions & 0 deletions src/default_parser.py
@@ -0,0 +1,33 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'MIT'
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)'
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)'
__version__ = 'v2.0.0'
__date__ = '2014-06-30'
__description__ = 'Parse the given definition list for penelope.py'

### BEGIN parse ###
# parse(data, type_sequence, ignore_case)
# parse the given list of pairs
# data = [ [word, definition] ]
# with type_sequence and ignore_case options,
# and outputs the following list:
# parsed = [ word, include, synonyms, substitutions, definition ]
#
# where:
# word is the sorting key
# include is a boolean saying whether the word should be included
# synonyms is a list of alternative strings for word
# substitutions is a list of pairs [ word_to_replace, replacement ]
# definition is the definition of word

# default implementation, just copy the content of the stardict dictionary
def parse(data, type_sequence, ignore_case):
    parsed_data = []
    for d in data:
        parsed_data += [ [ d[0], True, [], [], d[1] ] ]
    return parsed_data
### END parse ###
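
As an illustration of the interface documented above, here is a hypothetical custom parser that returns the same five-element entries. The trimming, the lower-casing when `ignore_case` is set, and the exclusion of empty definitions are assumptions made for this sketch, not behavior taken from Penelope itself; `type_sequence` is accepted but unused, and the value `"m"` in the call is only a placeholder.

```python
def parse(data, type_sequence, ignore_case):
    """Hypothetical custom parser: tidy headwords, skip empty definitions."""
    parsed_data = []
    for word, definition in data:
        # Sorting key: trimmed headword, lower-cased if requested (assumption).
        key = word.strip()
        if ignore_case:
            key = key.lower()
        # Mark entries whose definition is blank as not to be included.
        include = bool(definition.strip())
        # Keep the original spelling as a synonym when the key was altered.
        synonyms = [word] if key != word else []
        # No substitutions in this sketch; the definition is passed through.
        parsed_data.append([key, include, synonyms, [], definition])
    return parsed_data


sample = [["Foo ", "a placeholder name"], ["bar", "   "]]
result = parse(sample, "m", True)
print(result)
```

Saved to a file, a function like this would be passed with the `--parser` option shown in the README usage examples.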
