This repository has been archived by the owner on Aug 18, 2018. It is now read-only.
Commit
Initial import from Google Code. This is Penelope v2.0.0
Alberto Pettarin committed Jun 30, 2014
1 parent 182e1d9 commit da36f47
Showing 27 changed files with 5,907 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
The MIT License (MIT) | ||
|
||
Copyright (c) 2014 Alberto Pettarin | ||
Copyright (c) 2012-2014 Alberto Pettarin ([email protected]) | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
|
@@ -18,4 +18,5 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. | ||
SOFTWARE. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,100 @@ | ||
penelope | ||
======== | ||
# Penelope | ||
|
||
**Penelope** is a multi-tool for creating, editing and converting dictionaries, especially for eReader devices. | ||
|
||
* Version: 2.0.0 | ||
* Date: 2014-06-30 | ||
* Developer: [Alberto Pettarin](http://www.albertopettarin.it/) ([contact](http://www.albertopettarin.it/contact.html)) | ||
|
||
With the current version you can: | ||
|
||
* convert a dictionary FROM/TO the following formats: | ||
* Bookeen Cybook Odyssey (R/W) | ||
* Kobo (R index only, W unencrypted/unobfuscated only) | ||
* StarDict (R/W) | ||
* XML (R/W) | ||
* CSV (R/W) | ||
* merge multiple dictionaries (of the same type) into a single dictionary | ||
* define your own parser for each word/definition | ||
* define your own collation function when outputting to Bookeen Cybook Odyssey format | ||
* generate an EPUB file containing the index of a given dictionary (e.g., to cope with the lack of a search function on your eReader) | ||
|
||
Please note that Penelope needs substantial code refactoring. | ||
Unfortunately, I no longer have time to do that. | ||
Please fork and improve. | ||
|
||
Many people have asked for PRC/MOBI support. | ||
Again, I no longer have time to do that. | ||
|
||
|
||
### IMPORTANT UPDATE (2013-04-27) | ||
|
||
Kobo issued a new firmware 2.5.1 (thanks!), which allows you to use unencrypted/unobfuscated dictionaries again, including those produced by Penelope. Some minor bugs in the UI/UX are still present, but at least the custom dictionaries are back! | ||
|
||
|
||
### UPDATE (2013-04-23) | ||
|
||
It seems that Kobo, with firmware 2.5.0, requires the dictionaries to be encrypted/obfuscated. Hence, the dictionaries output by Penelope no longer work on Kobo devices. I contacted Kobo staff via Twitter, and they forwarded the notice to their development team. I hope they will fix the issue with a new firmware release soon. Meanwhile, if you need your custom-made dictionaries, you must stay with or revert to firmware 2.4.0. | ||
|
||
|
||
## Usage | ||
|
||
``` | ||
$ python penelope.py -h | ||
$ python penelope.py -p foo -f en -t en | ||
$ python penelope.py -p bar -f en -t it | ||
$ python penelope.py -p "bar,foo,zam" -f en -t it | ||
$ python penelope.py --xml -p foo -f en -t en | ||
$ python penelope.py --xml -p foo -f en -t en --output-sd | ||
$ python penelope.py -p bar -f en -t it --output-kobo | ||
$ python penelope.py -p bar -f en -t it --output-xml -i | ||
$ python penelope.py --kobo -p bar -f it -t it --output-epub | ||
$ python penelope.py --odyssey -p bar -f en -t en --output-epub | ||
$ python penelope.py -p bar -f en -t it --title "My EN->IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0" | ||
$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary" | ||
$ python penelope.py -p foo -f en -t en --collation custom_collation.py | ||
$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n" | ||
``` | ||
|
||
Please have a look at this web page for details: | ||
http://www.albertopettarin.it/penelope.html | ||
|
||
## License | ||
|
||
**Penelope** is released under the MIT License since version 2.0.0 (2014-06-30). | ||
|
||
Previous versions, hosted in a [Google Code repo](http://code.google.com/p/penelope-dictionary-converter/), | ||
were released under the GNU GPL 3 License. | ||
|
||
|
||
## Technical Notes | ||
|
||
The current version runs under both Python 2 and Python 3, | ||
and it has been tested under Linux (Debian, Fedora) and Windows (XP, 7). | ||
Unfortunately, since I do not have any financial support for the project, | ||
I cannot offer support for all the possible | ||
values of the tuple (OS, Python version, console encoding). | ||
Therefore, only problems running Penelope in a Linux environment | ||
will receive full priority. | ||
|
||
|
||
## Acknowledgments | ||
|
||
Many thanks to: | ||
|
||
* _uwelovesdonna_ for contributing ideas for improving the code and for setting up many pages of the project wiki; | ||
* _Jens Sadowski_ for pointing out a bug with Unicode file names and for suggesting using multiset `dict()` instead of set `dict()`; | ||
* _oldnat_ for pointing out a bug under Windows and Python 3; | ||
* _Wolfgang Miller-Reichling_ for providing the code for reading CSV dictionaries; | ||
* _branok_ for providing the idea and initial code for German collation function; | ||
* _pal_ for suggesting passing `-l` switch to `MARISA_BUILD`; | ||
* _Lukas Brückner_ for suggesting escaping `& < >` when outputting in XML format; | ||
* _Stephan Lichtenhagen_ for suggesting forcing UTF-8 encoding on Python 3. | ||
|
||
|
||
## Limitations and Missing Features | ||
|
||
* No support for PRC/MOBI dictionaries | ||
* Input files are assumed to be Unicode UTF-8 encoded | ||
* Depends on the current working directory (CWDIR) | ||
|
||
Penelope is a multi-tool for creating, editing and converting dictionaries, especially for eReader devices |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
__license__ = 'MIT' | ||
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)' | ||
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)' | ||
__version__ = 'v2.0.0' | ||
__date__ = '2014-06-30' | ||
__description__ = 'Default collation function for penelope.py' | ||
|
||
### BEGIN collate_function ### | ||
# collate_function(string1, string2) | ||
# compare string1 to string2 | ||
# return 0 if string1 == string2 | ||
# -1 if string1 < string2 | ||
# 1 if string1 > string2 | ||
def collate_function(string1, string2): | ||
    # conversion to unicode and lower case (only for Python 2) | ||
    #Python2# | ||
    b1 = string1.decode('utf-8') | ||
    #Python3# b1 = string1 | ||
    #Python2# | ||
    b2 = string2.decode('utf-8') | ||
    #Python3# b2 = string2 | ||
    b1 = b1.lower() | ||
    b2 = b2.lower() | ||
    # store strings with original accents for 2nd level collation | ||
    c1 = b1 | ||
    c2 = b2 | ||
|
||
    # replace german accent characters by base characters for 1st level collation | ||
    #Python2# | ||
    for f in [ [u'ä', u'a'], [u'ö', u'o'], [u'ü', u'u'], [u'ß', u'ss'] ]: | ||
    #Python3# for f in [ ['ä', 'a'], ['ö', 'o'], ['ü', 'u'], ['ß', 'ss'] ]: | ||
        b1 = b1.replace(f[0], f[1]) | ||
        b2 = b2.replace(f[0], f[1]) | ||
|
||
    # 1st level collation | ||
    if b1.encode('utf-16') == b2.encode('utf-16'): | ||
        # 2nd level collation | ||
        if c1.encode('utf-16') == c2.encode('utf-16'): | ||
            return 0 | ||
        else: | ||
            return -1 if c1.encode('utf-16') < c2.encode('utf-16') else 1 | ||
    # 1st level collation | ||
    else: | ||
        return -1 if b1.encode('utf-16') < b2.encode('utf-16') else 1 | ||
### END collate_function ### | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
__license__ = 'MIT' | ||
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)' | ||
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)' | ||
__version__ = 'v2.0.0' | ||
__date__ = '2014-06-30' | ||
__description__ = 'Default collation function for penelope.py' | ||
|
||
### BEGIN collate_function ### | ||
# collate_function(string1, string2) | ||
# compare string1 to string2 | ||
# return 0 if string1 == string2 | ||
# -1 if string1 < string2 | ||
# 1 if string1 > string2 | ||
def collate_function(string1, string2): | ||
    # conversion to unicode and lower case (only for Python 2) | ||
    #Python2# b1 = string1.decode('utf-8') | ||
    #Python3# | ||
    b1 = string1 | ||
    #Python2# b2 = string2.decode('utf-8') | ||
    #Python3# | ||
    b2 = string2 | ||
    b1 = b1.lower() | ||
    b2 = b2.lower() | ||
    # store strings with original accents for 2nd level collation | ||
    c1 = b1 | ||
    c2 = b2 | ||
|
||
    # replace german accent characters by base characters for 1st level collation | ||
    #Python2# for f in [ [u'ä', u'a'], [u'ö', u'o'], [u'ü', u'u'], [u'ß', u'ss'] ]: | ||
    #Python3# | ||
    for f in [ ['ä', 'a'], ['ö', 'o'], ['ü', 'u'], ['ß', 'ss'] ]: | ||
        b1 = b1.replace(f[0], f[1]) | ||
        b2 = b2.replace(f[0], f[1]) | ||
|
||
    # 1st level collation | ||
    if b1.encode('utf-16') == b2.encode('utf-16'): | ||
        # 2nd level collation | ||
        if c1.encode('utf-16') == c2.encode('utf-16'): | ||
            return 0 | ||
        else: | ||
            return -1 if c1.encode('utf-16') < c2.encode('utf-16') else 1 | ||
    # 1st level collation | ||
    else: | ||
        return -1 if b1.encode('utf-16') < b2.encode('utf-16') else 1 | ||
### END collate_function ### | ||
|
||
|
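The two-level German collation in the diff above can be tried out on its own. The following is a minimal Python 3 sketch, not the committed code: it compares `str` values directly instead of UTF-16 bytes (a simplification), and the helper `fold_german`, the constant `GERMAN_FOLDS` and the sample word list are hypothetical names introduced only for illustration. It assumes the comparison function is consumed through `functools.cmp_to_key`; how `penelope.py` itself calls `collate_function` is not shown in this diff.

```python
from functools import cmp_to_key

# Replacement pairs taken from the diff above.
GERMAN_FOLDS = [("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss")]


def fold_german(text):
    """Lowercase the text and replace German accented letters by base letters."""
    folded = text.lower()
    for accented, base in GERMAN_FOLDS:
        folded = folded.replace(accented, base)
    return folded


def collate_function(string1, string2):
    """Return -1, 0 or 1: folded form first, accented form as tie-break.

    Simplification (assumption): plain str comparison replaces the
    UTF-16 byte comparison used in the committed file.
    """
    key1 = (fold_german(string1), string1.lower())
    key2 = (fold_german(string2), string2.lower())
    return (key1 > key2) - (key1 < key2)


# Hypothetical sample data.
words = ["Zug", "Äpfel", "apfel", "Öl", "Ofen", "Straße", "Strasse"]
sorted_words = sorted(words, key=cmp_to_key(collate_function))
print(sorted_words)
# ['apfel', 'Äpfel', 'Ofen', 'Öl', 'Strasse', 'Straße', 'Zug']
```

The first level groups `Äpfel` with `apfel` and `Straße` with `Strasse`; the second level only decides the order inside each group.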
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
__license__ = 'MIT' | ||
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)' | ||
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)' | ||
__version__ = 'v2.0.0' | ||
__date__ = '2014-06-30' | ||
__description__ = 'Default collation function for penelope.py' | ||
|
||
### BEGIN collate_function ### | ||
# collate_function(string1, string2) | ||
# compare string1 to string2 | ||
# return 0 if string1 == string2 | ||
# -1 if string1 < string2 | ||
# 1 if string1 > string2 | ||
def collate_function(string1, string2): | ||
    b1 = bytearray(string1, 'utf-8').lower() | ||
    b2 = bytearray(string2, 'utf-8').lower() | ||
    if (b1 == b2): | ||
        return 0 | ||
    else: | ||
        return -1 if (b1 < b2) else 1 | ||
### END collate_function ### | ||
|
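To see what the byte-wise default collation above does in practice, here is a small Python 3 sketch. The function body mirrors the diff; the sample word list and the use of `functools.cmp_to_key` are assumptions made for illustration, since the diff does not show how `penelope.py` invokes the function.

```python
from functools import cmp_to_key


def collate_function(string1, string2):
    # Same logic as the diff above: compare lowercased UTF-8 bytes.
    b1 = bytearray(string1, "utf-8").lower()
    b2 = bytearray(string2, "utf-8").lower()
    if b1 == b2:
        return 0
    return -1 if b1 < b2 else 1


# Hypothetical sample data.
words = ["zebra", "Apple", "Éclair", "banana", "éclair"]
sorted_words = sorted(words, key=cmp_to_key(collate_function))
print(sorted_words)
# ['Apple', 'banana', 'zebra', 'Éclair', 'éclair']

# bytearray.lower() only maps ASCII A-Z, so "É" and "é" stay distinct.
print(collate_function("Apple", "apple"))    # 0: ASCII case is ignored
print(collate_function("Éclair", "éclair"))  # -1: non-ASCII case is not
```

Because the comparison works on UTF-8 bytes, every word starting with a non-ASCII letter sorts after all ASCII words, which is one reason a language-specific collation file may be preferable.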
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
__license__ = 'MIT' | ||
__author__ = 'Alberto Pettarin (alberto albertopettarin.it)' | ||
__copyright__ = '2012-2014 Alberto Pettarin (alberto albertopettarin.it)' | ||
__version__ = 'v2.0.0' | ||
__date__ = '2014-06-30' | ||
__description__ = 'Parse the given definition list for penelope.py' | ||
|
||
### BEGIN parse ### | ||
# parse(data, type_sequence, ignore_case) | ||
# parse the given list of pairs | ||
# data = [ [word, definition] ] | ||
# with type_sequence and ignore_case options, | ||
# and output the following list: | ||
# parsed = [ [ word, include, synonyms, substitutions, definition ] ] | ||
# | ||
# where: | ||
# word is the sorting key | ||
# include is a boolean saying whether the word should be included | ||
# synonyms is a list of alternative strings for word | ||
# substitutions is a list of pairs [ word_to_replace, replacement ] | ||
# definition is the definition of word | ||
|
||
# default implementation, just copy the content of the stardict dictionary | ||
def parse(data, type_sequence, ignore_case): | ||
    parsed_data = [] | ||
    for d in data: | ||
        parsed_data += [ [ d[0], True, [], [], d[1] ] ] | ||
    return parsed_data | ||
### END parse ### | ||
|
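A custom parser only has to respect the list layout described in the comment block above. The sketch below is a hypothetical variant, not part of the commit: it strips whitespace from the headword, excludes entries with an empty definition, and offers a lowercase synonym when `ignore_case` is false. How `penelope.py` uses the `include`, `synonyms` and `substitutions` fields beyond the comments in the diff is an assumption, and the `type_sequence` value `"m"` is used here only as a placeholder.

```python
def parse(data, type_sequence, ignore_case):
    """Hypothetical custom parser following the documented list layout."""
    parsed_data = []
    for word, definition in data:
        headword = word.strip()
        # Exclude entries whose definition is empty or whitespace only.
        include = bool(definition.strip())
        # Offer a lowercase variant as a synonym when case matters
        # and the headword is not already lowercase.
        synonyms = []
        if not ignore_case and headword.lower() != headword:
            synonyms.append(headword.lower())
        # No substitutions in this sketch, hence the empty list.
        parsed_data.append([headword, include, synonyms, [], definition])
    return parsed_data


# Hypothetical sample data in the [ [word, definition] ] input layout.
sample = [["Apple", "A fruit."], ["ghost", "   "]]
for entry in parse(sample, type_sequence="m", ignore_case=False):
    print(entry)
# ['Apple', True, ['apple'], [], 'A fruit.']
# ['ghost', False, [], [], '   ']
```

Saved to a file such as `foo_parser.py`, a function like this would be the kind of module the `--parser` option in the README usage examples refers to.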