feat: Convert the BMW encoding to JSON #16
base: main
docs/ConvertBMWToJSON.md
Outdated
that will serve as the foundation for implementing the BMW input method.

BMW encoding documents are in PDF format. These PDFs are composed by digitalized images of orginal
books. The coversio method is:
Typo: "coversio" -> "conversion".
Also, "digitized" is a better rendering than "digitalized".
docs/ConvertBMWToJSON.md
Outdated
books. The coversio method is:

1. Split every single page in a PDF into .jpg files
2. Use OCR library to extract texts from .jpg files
Do we have a working OCR library for this? Could we provide more detailed instructions?
docs/ConvertBMWToJSON.md
Outdated
BMW encoding documents are in PDF format. These PDFs are composed by digitalized images of orginal
books. The coversio method is:

1. Split every single page in a PDF into .jpg files
Split each page in the PDF into its own .jpg file
utils/README.md
Outdated
**File formats**

1. The content of any .txt file in the `source_txt_path` directory
Sample content of a .txt file in the ...
Thanks for the review, @amb26. All addressed and ready for another round.
fix: adding missing bci-av-ids in ../data/bmw.json
fix: making some fixes and adding missing bci-av-ids
Description
This pull request converts the BMW encoding to a JSON file for use in future development.
Steps to test
Refer to the document "Convert BMW encoding to JSON" for the conversion steps.
Additional information
Due to copyright concerns, the original BMW encoding files are not included in this pull request.
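
Since the source files are not part of this pull request, the following is only a rough sketch of what the last stage (gathering OCR'd .txt files into one JSON file) might look like. It is not the script in this PR. The function name `txt_dir_to_json`, the output shape (file stem mapped to a list of non-empty lines), and the placeholder text are all assumptions made for illustration; only the `source_txt_path` name comes from the README under review.

```python
import json
import tempfile
from pathlib import Path


def txt_dir_to_json(source_txt_path: Path, output_json_path: Path) -> dict:
    """Collect every .txt file in source_txt_path into one JSON file.

    Hypothetical output shape: {"<file stem>": ["<non-empty line>", ...]}.
    The real bmw.json layout in this PR may differ.
    """
    pages = {}
    # Sort the files so the output is stable across runs and platforms.
    for txt_file in sorted(source_txt_path.glob("*.txt")):
        text = txt_file.read_text(encoding="utf-8")
        # Drop blank lines and surrounding whitespace left over from OCR.
        pages[txt_file.stem] = [
            line.strip() for line in text.splitlines() if line.strip()
        ]
    output_json_path.write_text(
        json.dumps(pages, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return pages


if __name__ == "__main__":
    # Placeholder data only; no real BMW encoding content is used here.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source_txt"
        src.mkdir()
        (src / "page_001.txt").write_text(
            "placeholder line A\n\n  placeholder line B \n", encoding="utf-8"
        )
        out = Path(tmp) / "out.json"
        print(txt_dir_to_json(src, out))
```

Keeping the OCR output as plain lines per page leaves any later parsing (for example, attaching identifiers to each entry) as a separate step that can be reviewed on its own.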