Legco Hansard PDF Extractor converts Hong Kong Legislative Council (LegCo) Hansard PDFs into JSON.
- Download the latest jar from the releases page.
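- Run the following command, passing the URL of a Hansard PDF. Sample output (for the 15 May 2015 record named in its url field):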
java -jar hansard-parser.jar https://www.legco.gov.hk/yr17-18/chinese/counmtg/floor/cm20171025-confirm-ec.pdf
{
  "membersPresent": [
    "主席曾鈺成議員, G.B.S., J.P. THE PRESIDENT THE HONOURABLE JASPER TSANG YOK-SING, G.B.S., J.P."
  ],
  "membersAbsent": [
    "涂謹申議員 THE HONOURABLE JAMES TO KUN-SUN"
  ],
  "publicOfficersAttending": [
    "民政事務局局長曾德成先生, G.B.S., J.P. THE HONOURABLE TSANG TAK-SING, G.B.S., J.P. SECRETARY FOR HOME AFFAIRS"
  ],
  "clerksInAttendance": [
    "助理秘書長梁慶儀女士 MISS ODELIA LEUNG HING-YEE, ASSISTANT SECRETARY GENERAL"
  ],
  "speeches": [
    {
      "title": "全委會主席",
      "content": "早晨,全體委員會繼續審議《2015年撥款條例草案》的附表,現在繼續進行第6項辯論。 \n 陳家洛議員,請發言。",
      "sequence": 1,
      "bookmark": "SP_LC_CM_00008"
    }
  ],
  "date": "2015-5-15",
  "url": "https://www.legco.gov.hk/yr14-15/chinese/counmtg/floor/cm20150515-confirm-ec.pdf"
}
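If you consume this output from Java, it can help to mirror the JSON shape in a few types. The sketch below is only an illustration: the record names Hansard and Speech and the helper methods are hypothetical, not part of this project, and binding the JSON text to them is left to a JSON library of your choice (not shown). It needs Java 16+ for records and Stream.toList(). The date is kept as a string because the sample's "2015-5-15" has unpadded month and day, so it is parsed with a lenient pattern.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

// Hypothetical record mirroring the top-level JSON object produced by the extractor.
record Hansard(
        List<String> membersPresent,
        List<String> membersAbsent,
        List<String> publicOfficersAttending,
        List<String> clerksInAttendance,
        List<Speech> speeches,
        String date, // e.g. "2015-5-15": month and day are not zero-padded in the sample
        String url) {

    // "M" and "d" accept one or two digits, so both "2015-5-15" and "2017-10-25" parse.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-M-d");

    // Converts the "date" field into a LocalDate.
    LocalDate meetingDate() {
        return LocalDate.parse(date, DATE);
    }

    // Returns the speeches sorted by their "sequence" field.
    List<Speech> speechesInOrder() {
        return speeches.stream()
                .sorted(Comparator.comparingInt(Speech::sequence))
                .toList();
    }

    public static void main(String[] args) {
        // Placeholder values; real ones would come from the extractor's JSON.
        Hansard sample = new Hansard(
                List.of(), List.of(), List.of(), List.of(),
                List.of(new Speech("title", "content", 1, "SP_LC_CM_00008")),
                "2015-5-15",
                "https://www.legco.gov.hk/yr14-15/chinese/counmtg/floor/cm20150515-confirm-ec.pdf");
        System.out.println(sample.meetingDate()); // prints 2015-05-15
        for (Speech s : sample.speechesInOrder()) {
            System.out.println(s.sequence() + " " + s.bookmark());
        }
    }
}

// Hypothetical record for one entry of the "speeches" array.
record Speech(String title, String content, int sequence, String bookmark) {}
```

Sorting on sequence rather than trusting array order is a defensive choice; the sample alone does not say whether the array is always emitted in order.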
- Check out the latest source code
git clone git@github.com:g0vhk-io/legco-hansard-pdf-extractor.git
- Build with Gradle
./gradlew build
- Run from source (e.g. for debugging)
./gradlew run -Dexec.args=https://www.legco.gov.hk/yr12-13/chinese/counmtg/floor/cm1212-confirm-ec.pdf
- Build a single jar that bundles all dependencies
./gradlew shadowJar
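To call the extractor from another Java program, one option is to launch the jar as a child process with the same java -jar command used above. This is a minimal sketch under two assumptions: that the extractor writes its JSON to standard output (as the sample output suggests) and that this output is UTF-8. The class name RunExtractor and the jar path are hypothetical; point the path at the jar from the releases page or the one built by ./gradlew shadowJar.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that runs the extractor jar and captures its output.
final class RunExtractor {
    private RunExtractor() {}

    // Same command line as the "java -jar" example in this README.
    static List<String> command(String jarPath, String pdfUrl) {
        return List.of("java", "-jar", jarPath, pdfUrl);
    }

    // Runs the extractor and returns what it writes to standard output.
    // Assumptions: the JSON goes to stdout and is UTF-8 encoded.
    static String run(String jarPath, String pdfUrl) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(jarPath, pdfUrl))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show errors on this console
                .start();
        // Read stdout to the end before waiting, so a full pipe cannot block the child.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("extractor exited with status " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: RunExtractor <path-to-jar> <hansard-pdf-url>");
            return;
        }
        System.out.println(run(args[0], args[1]));
    }
}
```

Running the jar as a separate process avoids depending on the extractor's internal classes, which this README does not document.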
Please feel free to open an issue or pull request.
- Ho Wa Wong