Java program to extract reading articles and listening transcripts from the TPO app database.

scottpedia/TpoExtractor

TPO EXTRACTOR

This project dumps the internal database of the TPO application developed by zhan.com and releases all included TOEFL reading passages and listening transcripts as Microsoft Word documents.

This project is currently inactive and is not expecting any contributions. However, if you are interested in the source code, you are welcome to clone this repository and use it as you like, subject to the license terms listed under Author and License.

The TPO application is a test-simulation program made by a Chinese company that specializes in English training services for domestic students preparing for the TOEFL, IELTS and other exams. It contains a wide variety of learning materials that are helpful to those students. However, the proprietary application forbids any copying or exporting of this content, in order to gain market competitiveness for the parent company. This project aims to extract these materials from the application package programmatically and export them into accessible formats.

The database mentioned above contains almost all the data, or links to the data, used by the TPO application. This includes listening materials in the form of audio recordings, listening transcripts, translations of listening transcripts, etc. It is technically possible to dump all of them into readable formats (e.g. HTML, MS Word documents). So far, this project has extracted only the reading passages and listening transcripts from the database.
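
As a rough illustration of the "dump into a readable format" step, here is a minimal sketch that renders already-extracted items as a single HTML page. It is not the code used in this repository: the `PassageHtmlExporter` class, the `Passage` type and its `title`/`body` fields are hypothetical names chosen for the example. It also assumes the text has already been read out of the database, whose actual schema is not described here.

```java
import java.util.List;

/**
 * Illustrative sketch only: renders already-extracted passages as one HTML page.
 * The Passage type and its fields are hypothetical and do not mirror the
 * real TPO database schema.
 */
final class PassageHtmlExporter {

    /** Hypothetical in-memory shape of one extracted item. */
    static final class Passage {
        final String title;
        final String body;

        Passage(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    private PassageHtmlExporter() {}

    /** Escapes the characters that are significant in HTML text. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds one HTML document with a heading and paragraphs per passage. */
    static String toHtml(List<Passage> passages) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n<body>\n");
        for (Passage p : passages) {
            html.append("<h2>").append(escapeHtml(p.title)).append("</h2>\n");
            // Treat each non-empty line of the body as one paragraph.
            for (String line : p.body.split("\\R")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    html.append("<p>").append(escapeHtml(trimmed)).append("</p>\n");
                }
            }
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }
}
```

Calling `PassageHtmlExporter.toHtml(...)` with a list of `Passage` objects returns the page as a string, which can then be written to a file. Escaping the text first matters because passage content may contain characters such as `<` or `&` that would otherwise be read as markup.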

The software company that made the TPO application may in the future change the application package so that this kind of content extraction is no longer possible. As a precaution, I have archived the latest working version of the application with the Wayback Machine. To download it, click on this link. (You need a VPN to access the Wayback Machine's website.)

Most people in mainland China have poor technical skills, and the flow of information there is highly restricted. As a result, they are exploited and controlled by such for-profit companies. Education in China is badly distorted these days, partly because of these companies.

For instructions on regenerating the Word documents from the source code, please see the latest release notes. If you want to know more about how this program works, you can get in touch with me by email.

Document Preview

For a preview of these documents, go to this page.

Author and License

Copyright (C) 2018-2022 Scott X. Liang <[email protected]>
GPL3
Except where otherwise noted, the program in this repository is licensed under the GNU General Public License Version 3.
CC-BY-NC-SA
The released documents in this project are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.