mzgubic/autothesis

autothesis

Sit back, relax, and let machines write your thesis.

Set up the code and the environment

git clone git@github.com:mzgubic/autothesis.git
python3 -m venv autothesis
cd autothesis
pip install -r requirements.txt
source setup.sh

Download training data

Download the raw HTML of the web pages that link to the PDFs from:
https://cds.cern.ch/collection/ATLAS%20Theses?ln=en
and put the files in the raw_html folder.

Keep only the lines that contain links to the PDFs:

cd ${SRC}/data
. skim.sh
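
The contents of skim.sh are not shown here, so the following is only a rough Python sketch of what a skimming step like this could look like. It assumes the saved pages sit in one folder and that any line containing ".pdf" is worth keeping; the function name `skim_lines` and the `marker` parameter are hypothetical.

```python
from pathlib import Path


def skim_lines(raw_dir, marker=".pdf"):
    """Yield every line that contains `marker` from the files in `raw_dir`."""
    for path in sorted(Path(raw_dir).iterdir()):
        if not path.is_file():
            continue
        # Saved web pages can contain odd bytes, so undecodable ones are ignored.
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                if marker in line:
                    yield line.rstrip("\n")
```

For example, `list(skim_lines("raw_html"))` would collect the candidate lines from the raw_html folder.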

Clean up and extract the links

python extract_https.py
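
As an illustration of the extraction step, here is a minimal sketch that pulls PDF URLs out of the skimmed lines with a regular expression. It is not the code in extract_https.py: the pattern `PDF_URL` and the function `extract_pdf_links` are assumptions, and the pattern drops anything after ".pdf", such as a query string.

```python
import re

# Assumed pattern: an http(s) URL that ends in ".pdf" and contains
# no whitespace, quotes, or angle brackets.
PDF_URL = re.compile(r"https?://[^\s\"'<>]+?\.pdf", re.IGNORECASE)


def extract_pdf_links(lines):
    """Return the unique PDF URLs found in `lines`, in first-seen order."""
    seen = {}
    for line in lines:
        for url in PDF_URL.findall(line):
            seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)
```

Deduplicating here means that a thesis linked from several pages is downloaded only once.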

Download the pdf files

python download.py
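
A download loop along these lines might look like the sketch below. It assumes a list of URLs is already available and uses only the standard library; `local_name`, `download_all`, and the index-prefixed file naming are hypothetical choices, not taken from download.py.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url, index):
    """Build a file name from the URL path, prefixed with an index to avoid clashes."""
    name = Path(urlparse(url).path).name or "document.pdf"
    return f"{index:05d}_{name}"


def download_all(urls, out_dir, fetch=urllib.request.urlretrieve):
    """Download each URL into `out_dir`, skipping files that already exist."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        target = out / local_name(url, index)
        if not target.exists():
            try:
                fetch(url, str(target))
            except OSError as err:  # urllib's URLError is a subclass of OSError
                print(f"skipping {url}: {err}")
                continue
        saved.append(target)
    return saved
```

Skipping existing files lets an interrupted run be restarted without fetching everything again, and the `fetch` argument makes it possible to try the loop without network access.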

Clean and tokenise

Convert the PDFs to text and do some basic cleaning:

  • remove short lines (mostly text from figures)
  • remove table-of-contents lines
  • remove non-English documents

cd ${SRC}/scripts
python pdf2txt.py
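
The three cleaning rules listed above can be sketched as follows. This is an illustration under stated assumptions, not the logic of pdf2txt.py: the 40-character threshold, the dotted-leader pattern for table-of-contents lines, and the stop-word heuristic for spotting English are all guesses that would need tuning on real thesis text.

```python
import re

# Assumed thresholds and word list; tune them against real data.
MIN_LINE_LENGTH = 40
# A dotted leader followed by a page number, as in "1.2 Detector . . . . 12".
TOC_LINE = re.compile(r"(\.\s?){4,}\s*\d+\s*$")
ENGLISH_STOPWORDS = {"the", "of", "and", "in", "to", "is", "that", "for", "with", "are"}


def clean_lines(text):
    """Drop short lines and table-of-contents lines, keep the rest."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) < MIN_LINE_LENGTH:
            continue  # mostly labels and numbers from figures
        if TOC_LINE.search(stripped):
            continue  # table-of-contents entry
        kept.append(stripped)
    return kept


def looks_english(text, threshold=0.15):
    """Crude language check: share of words that are common English stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return False
    hits = sum(word in ENGLISH_STOPWORDS for word in words)
    return hits / len(words) >= threshold
```

A document would then be kept only if `looks_english` returns True for its text, and its lines would be passed through `clean_lines` before tokenising.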

Science

autothesis contributed to the following masterpiece in my thesis:

The Inner Detector in the hadronic electrode to the tight distribution of the lead to the converted in the tracks and the summary of the electrons are described for the group the group can be simplified in the tracking to the total to the predictions in the tracks as a constants in the distributions are shown