
Implement multitask training #25

Merged
merged 33 commits into master on Apr 13, 2020

Conversation

ivyleavedtoadflax
Contributor

@ivyleavedtoadflax ivyleavedtoadflax commented Mar 24, 2020

What this PR contains

  • Adds support for multitask training and prediction, following Add support for multi-task models #13, with a new split_parse command built on the same logic as the split and parse commands.
  • Makes load_tsv more robust to quotes in input data.
  • Combines model artefacts into a single pickle called indices.pickle. Note that this change may cause issues when attempting to load earlier (now outdated) model versions.
  • Adds handling for incorrect config path in split, parse and split_parse commands.
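As a rough illustration of the quote-handling fix, a loader built on Python's csv module can disable quote interpretation so that stray quotes in token columns cannot swallow tab delimiters. This is a sketch under that assumption, not the actual load_tsv implementation:

```python
import csv

def load_tsv_rows(path):
    """Read a TSV file without treating quote characters as special.

    Hypothetical helper illustrating the change described above:
    quoting=csv.QUOTE_NONE means a stray double quote inside a token
    column is kept as a literal character rather than opening a quoted
    field that runs over tab delimiters and newlines.
    """
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Drop blank lines, keep everything else verbatim
        return [row for row in reader if row]
```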

NOTE: This version includes changes both to the way that model artefacts are packaged and saved, and to the way that data are loaded and parsed from tsv files. This results in a significantly faster training time (c.14 hours -> c.0.5 hours), but older models will no longer be compatible. For compatibility you must use multitask models > 2020.3.19, splitting models > 2020.3.6, and parsing models > 2020.3.8. These models currently perform less well than previous versions (#27), but performance is expected to improve with experimentation, predominantly around sequence length, and with more annotated data.

How you can test it

# Create a test file:
cat > references.txt <<EOF
1 Sibbald, A, Eason, W, McAdam, J, and Hislop, A (2001). The establishment phase of a silvopastoral national network experiment in the UK. Agroforestry systems, 39, 39–53. 
2 Silva, J and Rego, F (2003). Root distribution of a Mediterranean shrubland in Portugal. Plant and Soil, 255 (2), 529–540. 
3 Sims, R, Schock, R, Adegbululgbe, A, Fenhann, J, Konstantinaviciute, I, Moomaw, W, Nimir, H, Schlamadinger, B, Torres-Martínez, J, Turner, C, Uchiyama, Y, Vuori, S, Wamukonya, N, and X. Zhang (2007). Energy Supply. In Metz, B, Davidson, O, Bosch, P, Dave, R, and Meyer, L (eds.), Climate Change 2007: Mitigation. Contribution of Working Group III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
EOF

# Test the default model that ships with the package in a new venv

virtualenv temp && source temp/bin/activate
pip install git+git://github.com/wellcometrust/deep_reference_parser.git@feature/ivyleavedtoadflax/multitask_2

# Run the splitter

python -m deep_reference_parser split "$(cat references.txt)"

# Run the parser

python -m deep_reference_parser parse "$(cat references.txt)"

# Run the multitask splitter/parser

python -m deep_reference_parser split_parse "$(cat references.txt)"

@ivyleavedtoadflax ivyleavedtoadflax self-assigned this Mar 24, 2020
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from aa7178d to a3323e5 Compare March 24, 2020 20:26
@codecov-io

codecov-io commented Mar 25, 2020

Codecov Report

Merging #25 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #25   +/-   ##
=======================================
  Coverage   80.94%   80.94%           
=======================================
  Files          17       17           
  Lines        1312     1312           
=======================================
  Hits         1062     1062           
  Misses        250      250           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e6658c...0e6658c. Read the comment docs.

* Don't require an argument, just output the predictions on the validation set by default
@ivyleavedtoadflax ivyleavedtoadflax changed the title Use lists for y case to allow multiple labels Implement multitask training Mar 25, 2020
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from e325e55 to 86817b2 Compare March 25, 2020 19:04
* Remove confusing max_words parameter
* Set max_len in init with default=250
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from 86817b2 to 57e7588 Compare March 25, 2020 19:05
Caused when loading Rodrigues data with load_tsv
* Bumps parser model to 2020.3.8
* Bumps splitter model to 2020.3.6 - note that this model does not perform
    better than the model it replaces at present, but the previous model is
    not compatible with breaking API changes implemented in version 2020.3.2
    of deep_reference_parser. Nonetheless it will be relatively easy
    to experiment with the splitter model to get a higher score, and in any case
    this individual splitter model is mostly superseded by the multitask model,
    and is just provided here for comparison.
lizgzil and others added 3 commits April 2, 2020 12:00
keras_contrib.utils.load_save_utils is a soft wrapper around keras.saving
which is not required in this relatively simple case.
@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 5, 2020

@lizgzil I've made some progress on this today. I'm not sure what caused the mismatch in dims, but it seems to be resolved if using a different config/model (2020.3.19 works ok). This may be because of changes in load_tsv made in this PR which are not reflected in 2020.3.18. In any case I think the way that the data were being handled in 2020.3.18 was not optimal - as you point out in #26 (comment).

I would be inclined to forget that model run, and to set the default to be 2020.3.19 for now. It doesn't perform as well as 2020.3.18, but I think with some experimentation you will be able to get the scores up in a future model run using this new logic, and for now it is important just to get it running.

Summary of what I have done today:

  • 3e1b20b : Whilst investigating the way that weights are loaded, I realised that keras_contrib.utils.load_save_utils is just a soft wrapper around a keras function. We no longer need the wrapper, so I use the keras function directly instead.

  • 20afa75 : Once I realised that the 2020.3.18 model was at fault, the next issue I ran into was that the DeepReferenceParser.predict() method cannot handle multiple outputs. This commit updates this method to handle a list of outputs, rather than a single output.

  • 77971e5 : This is WIP. This commit updates the split_parse command to use the new outputs, which are in a list, and to pretty print them. The ability to write to a file, and to extract a json as we discussed, with author, title, and year for each reference, is still lacking. You can see the results of this commit with the 2020.3.19 config with:

    python -m deep_reference_parser split_parse -tc configs/multitask/2020.3.19_multitask.ini "1. Upson, M. A. (2019). Agroforetsry is the best. Journal of Agroforestry. Vol 1. Issue 23. p123-679. 2. Upson, M. A. (2018). Another paper on Agroforestry. Geoderma. 1(3)12-56"
    
  • The tests are now failing for the simple parse and split cases. This is probably due to changes in the predict method - they probably need a bit of tweaking to ensure they still work for the single task model. Still...some progress!
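The predict() change in 20afa75 can be sketched roughly as follows; decode_predictions and its arguments are illustrative names, not the real DeepReferenceParser API. The key point is normalising the single-task case (one array) and the multitask case (a list of arrays, one per output head) to the same shape before decoding:

```python
import numpy as np

def decode_predictions(raw_outputs, index_to_label_maps):
    """Convert raw model outputs to label sequences.

    A single-output Keras model returns one array, while a multitask
    model returns a list of arrays (one per output head). Wrapping the
    single-array case in a list lets one decoding loop serve both.
    """
    if not isinstance(raw_outputs, list):
        raw_outputs = [raw_outputs]
    decoded = []
    for probs, idx2label in zip(raw_outputs, index_to_label_maps):
        # probs has shape (batch, seq_len, n_classes); argmax over the
        # class axis gives an integer label id per token
        label_ids = np.argmax(probs, axis=-1)
        decoded.append([[idx2label[i] for i in seq] for seq in label_ids])
    return decoded
```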

@lizgzil
Contributor

lizgzil commented Apr 6, 2020

@ivyleavedtoadflax woop! great the mismatch issue isn't happening :)
Classic that you fix something and then that causes the older things to break eh!

So do you think once this PR is closed, then I should train a model using the config for 2020.3.18 (i.e. use adam)? But for now I will just use the 2020.3.19 model if I need to.

@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 6, 2020

Yes, I would train a new model entirely. I've been doing some experimenting, and what is having more of an effect than anything is the sequence length. In this PR I added some logic to give us more fine-grained control of sequence length, both in data generation and in the model itself, and this has affected performance, so I think we might need to do a bit of experimentation to work out the best settings.

But for now, yes I would just use a model that works, to get the logic working.
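The sequence-length control mentioned above boils down to truncating or right-padding each token sequence to a fixed max_len before it reaches the model. A minimal sketch, assuming a pad_sequence helper and a "<PAD>" token that are illustrative rather than the actual package API:

```python
def pad_sequence(tokens, max_len, pad_token="<PAD>"):
    """Truncate or right-pad a token sequence to exactly max_len.

    Illustrative only: shows why max_len is a trade-off. Too short and
    long references are truncated; too long and batches are mostly
    padding, which slows training and can hurt the model.
    """
    return tokens[:max_len] + [pad_token] * max(0, max_len - len(tokens))
```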

ivyleavedtoadflax and others added 6 commits April 11, 2020 20:48
The `2020.3.19_multitask` model will replace `2020.3.18` as the default model.
Add multitask 3.18 tsvs to datasets in Makefile
…llcometrust/deep_reference_parser into feature/ivyleavedtoadflax/multitask_2
@ivyleavedtoadflax ivyleavedtoadflax linked an issue Apr 12, 2020 that may be closed by this pull request
This is a more reliable indicator than prediction size
NOTE: SplitParser functionality is not completely implemented, so these
tests will need to be edited once it is.
@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 12, 2020

Hey @lizgzil spent a little more time on this today:

  • b33c2b2 : Added some logic to give more helpful error handling when you pass the model an incorrect config path.
  • 5b17587 and 15306b4 : Add logic to fix failing tests for the split and parse commands. Essentially we just needed to subset the output from deep_reference_parser.predict, which is now a list of multiple outputs.
  • fdfe5d3 and 1be3864 : Switch to the 2020.3.19_multitask model as the default model.
  • 88f1a24 : Merged in your PR adjusting the Makefile recipes to download the 2020.3.19 data.
  • 8e3a155 and ad6d6bb : Adds tests for split_parse endpoint. Note that both the endpoint and the tests are currently not complete, functionality to return the parsed reference texts and attributes (authors, years, titles) is not yet implemented.
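The config-path handling in b33c2b2 amounts to checking the path before handing it to the config parser, so the user gets a clear message rather than a confusing downstream failure. A hypothetical sketch (load_config is an illustrative name, not the actual command wiring):

```python
import configparser
import os

def load_config(config_path):
    """Fail early with a clear message when the config path is wrong.

    Without this check, configparser silently ignores a missing file
    and the command fails later with an unrelated-looking error.
    """
    if not os.path.isfile(config_path):
        raise FileNotFoundError(
            f"Could not find config file at {config_path!r}. "
            "Check the path passed to the command."
        )
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    return cfg
```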

I think now that tests are passing, I'd be inclined to merge this in even though the split_parse command is not 100% complete, and then add the final functionality in a new smaller PR. There's a fair bit of refactoring that could be done too, and improving test coverage, which I think are best done in a future PR.

@ivyleavedtoadflax ivyleavedtoadflax marked this pull request as ready for review April 12, 2020 19:55
Contributor

@lizgzil lizgzil left a comment

Looks good apart from that data/multitask comment. I'm happy for you to merge this and then open the smaller PR

@ivyleavedtoadflax
Copy link
Contributor Author

should be data/multitask I assume?

🤦 yep - will fix

@ivyleavedtoadflax ivyleavedtoadflax merged commit bbca141 into master Apr 13, 2020
Successfully merging this pull request may close these issues.

Mismatching dimensions when predicting with multitask models
Add support for multi-task models