
Implement multitask training #25

Merged
merged 33 commits into master on Apr 13, 2020

Conversation

ivyleavedtoadflax
Contributor

@ivyleavedtoadflax ivyleavedtoadflax commented Mar 24, 2020

What this PR contains

  • Adds support for multitask training and prediction, following Add support for multi-task models #13, with a new split_parse command built on the same logic as the split and parse commands.
  • Makes load_tsv more robust to quotes in input data.
  • Combines model artefacts into a single pickle called indices.pickle. Note that this change may cause issues when attempting to load earlier (now outdated) model versions.
  • Adds handling for incorrect config path in split, parse and split_parse commands.
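As a rough illustration of the quote-handling fix, a loader built on Python's csv module can disable quote interpretation so that stray quotes in token columns cannot swallow tab delimiters. This is a sketch under that assumption, not the actual load_tsv implementation:

```python
import csv

def load_tsv_rows(path):
    """Read a TSV file without treating quote characters as special.

    Hypothetical helper illustrating the change described above:
    quoting=csv.QUOTE_NONE means a stray double quote inside a token
    column is kept as a literal character rather than opening a quoted
    field that runs over tab delimiters and newlines.
    """
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Drop blank lines, keep everything else verbatim
        return [row for row in reader if row]
```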

NOTE: This version includes changes both to the way that model artefacts are packaged and saved, and to the way that data are loaded and parsed from tsv files. This results in a significantly faster training time (c.14 hours -> c.0.5 hours), but older models will no longer be compatible. For compatibility you must use multitask models > 2020.3.19, splitting models > 2020.3.6, and parsing models > 2020.3.8. These models currently perform less well than previous versions (#27), but performance is expected to improve with experimentation, predominantly around sequence length, and with more annotated data.

How you can test it

# Create a test file:
cat > references.txt <<EOF
1 Sibbald, A, Eason, W, McAdam, J, and Hislop, A (2001). The establishment phase of a silvopastoral national network experiment in the UK. Agroforestry systems, 39, 39–53. 
2 Silva, J and Rego, F (2003). Root distribution of a Mediterranean shrubland in Portugal. Plant and Soil, 255 (2), 529–540. 
3 Sims, R, Schock, R, Adegbululgbe, A, Fenhann, J, Konstantinaviciute, I, Moomaw, W, Nimir, H, Schlamadinger, B, Torres-Martínez, J, Turner, C, Uchiyama, Y, Vuori, S, Wamukonya, N, and X. Zhang (2007). Energy Supply. In Metz, B, Davidson, O, Bosch, P, Dave, R, and Meyer, L (eds.), Climate Change 2007: Mitigation. Contribution of Working Group III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
EOF

# Test the default model that ships with the package in a new venv

virtualenv temp && source temp/bin/activate
pip install git+git://github.com/wellcometrust/deep_reference_parser.git@feature/ivyleavedtoadflax/multitask_2

# Run the splitter

python -m deep_reference_parser split "$(cat references.txt)"

# Run the parser

python -m deep_reference_parser parse "$(cat references.txt)"

# Run the multitask splitter/parser

python -m deep_reference_parser split_parse "$(cat references.txt)"

@ivyleavedtoadflax ivyleavedtoadflax self-assigned this Mar 24, 2020
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from aa7178d to a3323e5 Compare March 24, 2020 20:26
@codecov-io

codecov-io commented Mar 25, 2020

Codecov Report

Merging #25 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #25   +/-   ##
=======================================
  Coverage   80.94%   80.94%           
=======================================
  Files          17       17           
  Lines        1312     1312           
=======================================
  Hits         1062     1062           
  Misses        250      250           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e6658c...0e6658c. Read the comment docs.

* Don't require an argument, just output the predictions on the validation set by default
@ivyleavedtoadflax ivyleavedtoadflax changed the title Use lists for y case to allow multiple labels Implement multitask training Mar 25, 2020
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from e325e55 to 86817b2 Compare March 25, 2020 19:04
* Remove confusing max_words parameter
* Set max_len in init with default=250
@ivyleavedtoadflax ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/multitask_2 branch from 86817b2 to 57e7588 Compare March 25, 2020 19:05
Caused when loading Rodrigues data with load_tsv
* Bumps parser model to 2020.3.8
* Bumps splitter model to 2020.3.6 - note that this model does not perform
    better than the model it replaces at present, but the previous model is
    not compatible with breaking API changes implemented in version 2020.3.2
    of deep_reference_parser. Nonetheless it will be relatively easy
    to experiment with the splitter model to get a higher score, and in any case
    this individual splitter model is mostly superseded by the multitask model,
    and is just provided here for comparison.
lizgzil and others added 3 commits April 2, 2020 12:00
keras_contrib.utils.load_save_utils is a soft wrapper around keras.saving
which is not required in this relatively simple case.
@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 5, 2020

@lizgzil I've made some progress on this today. I'm not sure what caused the mismatch in dims, but it seems to be resolved if using a different config/model (2020.3.19 works ok). This may be because of changes in load_tsv made in this PR which are not reflected in 2020.3.18. In any case I think the way that the data were being handled in 2020.3.18 was not optimal - as you point out in #26 (comment).

I would be inclined to forget that model run, and to set the default to be 2020.3.19 for now. It doesn't perform as well as 2020.3.18, but I think with some experimentation you will be able to get the scores up in a future model run using this new logic, and for now it is important just to get it running.

Summary of what I have done today:

  • 3e1b20b : Whilst investigating the way that weights are loaded, I realised that keras_contrib.utils.load_save_utils is just a soft wrapper around a keras function. We no longer need the wrapper, so I use the keras function directly instead.

  • 20afa75 : Once I realised that the 2020.3.18 model was at fault, the next issue I ran into was that the DeepReferenceParser.predict() method cannot handle multiple outputs. This commit updates this method to handle a list of outputs, rather than a single output.

  • 77971e5 : This is WIP. This commit updates the split_parse command to use the new outputs, which are in a list, and to pretty print them. The ability to write to a file, and to extract a json as we discussed, with author, title, and year for each reference, is still lacking. You can see the results of this commit with the 2020.3.19 config with:

    python -m deep_reference_parser split_parse -tc configs/multitask/2020.3.19_multitask.ini "1. Upson, M. A. (2019). Agroforetsry is the best. Journal of Agroforestry. Vol 1. Issue 23. p123-679. 2. Upson, M. A. (2018). Another paper on Agroforestry. Geoderma. 1(3)12-56"
    
  • The tests are now failing for the simple parse and split cases. This is probably due to changes in the predict method - they probably need a bit of tweaking to ensure they still work for the single task model. Still...some progress!
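The predict() change in 20afa75 can be sketched roughly as follows; decode_predictions and its arguments are illustrative names, not the real DeepReferenceParser API. The key point is normalising the single-task case (one array) and the multitask case (a list of arrays, one per output head) to the same shape before decoding:

```python
import numpy as np

def decode_predictions(raw_outputs, index_to_label_maps):
    """Convert raw model outputs to label sequences.

    A single-output Keras model returns one array, while a multitask
    model returns a list of arrays (one per output head). Wrapping the
    single-array case in a list lets one decoding loop serve both.
    """
    if not isinstance(raw_outputs, list):
        raw_outputs = [raw_outputs]
    decoded = []
    for probs, idx2label in zip(raw_outputs, index_to_label_maps):
        # probs has shape (batch, seq_len, n_classes); argmax over the
        # class axis gives an integer label id per token
        label_ids = np.argmax(probs, axis=-1)
        decoded.append([[idx2label[i] for i in seq] for seq in label_ids])
    return decoded
```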

@lizgzil
Contributor

lizgzil commented Apr 6, 2020

@ivyleavedtoadflax woop! great the mismatch issue isn't happening :)
Classic that you fix something and then that causes the older things to break eh!

So do you think once this PR is closed, then I should train a model using the config for 2020.3.18 (i.e. use adam)? But for now I will just use the 2020.3.19 model if I need to.

@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 6, 2020

Yes, I would train a new model entirely. I've been doing some experimenting, and what is having more of an effect than anything is the sequence length. In this PR I added some logic to give us more fine-grained control of sequence length, both in data generation and in the model itself, and this has affected performance, so I think we might need to do a bit of experimentation to work out the best settings.

But for now, yes I would just use a model that works, to get the logic working.
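The sequence-length control mentioned above boils down to truncating or right-padding each token sequence to a fixed max_len before it reaches the model. A minimal sketch, assuming a pad_sequence helper and a "<PAD>" token that are illustrative rather than the actual package API:

```python
def pad_sequence(tokens, max_len, pad_token="<PAD>"):
    """Truncate or right-pad a token sequence to exactly max_len.

    Illustrative only: shows why max_len is a trade-off. Too short and
    long references are truncated; too long and batches are mostly
    padding, which slows training and can hurt the model.
    """
    return tokens[:max_len] + [pad_token] * max(0, max_len - len(tokens))
```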

ivyleavedtoadflax and others added 6 commits April 11, 2020 20:48
The `2020.3.19_multitask` model will replace `2020.3.18` as the default model.
Add multitask 3.18 tsvs to datasets in Makefile
…llcometrust/deep_reference_parser into feature/ivyleavedtoadflax/multitask_2
@ivyleavedtoadflax ivyleavedtoadflax linked an issue Apr 12, 2020 that may be closed by this pull request
This is a more reliable indicator than prediction size
NOTE: SplitParser functionality is not completely implemented, so these
tests will need to be edited once it is.
@ivyleavedtoadflax
Contributor Author

ivyleavedtoadflax commented Apr 12, 2020

Hey @lizgzil spent a little more time on this today:

  • b33c2b2 : Added some logic to give more helpful error handling when you pass the model an incorrect config path.
  • 5b17587 and 15306b4 : Add logic to fix failing tests for the split and parse commands. Essentially we just needed to subset the output from deep_reference_parser.predict, which is now a list of multiple outputs.
  • fdfe5d3 and 1be3864 : Switch to the 2020.3.19_multitask model as the default model.
  • 88f1a24 : Merged in your PR adjusting the Makefile recipes to download the 2020.3.19 data.
  • 8e3a155 and ad6d6bb : Adds tests for split_parse endpoint. Note that both the endpoint and the tests are currently not complete, functionality to return the parsed reference texts and attributes (authors, years, titles) is not yet implemented.
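The config-path handling in b33c2b2 amounts to checking the path before handing it to the config parser, so the user gets a clear message rather than a confusing downstream failure. A hypothetical sketch (load_config is an illustrative name, not the actual command wiring):

```python
import configparser
import os

def load_config(config_path):
    """Fail early with a clear message when the config path is wrong.

    Without this check, configparser silently ignores a missing file
    and the command fails later with an unrelated-looking error.
    """
    if not os.path.isfile(config_path):
        raise FileNotFoundError(
            f"Could not find config file at {config_path!r}. "
            "Check the path passed to the command."
        )
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    return cfg
```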

I think now that tests are passing, I'd be inclined to merge this in even though the split_parse command is not 100% complete, and then add the final functionality in a new smaller PR. There's a fair bit of refactoring that could be done too, and improving test coverage, which I think are best done in a future PR.

@ivyleavedtoadflax ivyleavedtoadflax marked this pull request as ready for review April 12, 2020 19:55
Contributor

@lizgzil lizgzil left a comment

Looks good apart from that data/multitask comment. I'm happy for you to merge this and then open the smaller PR

@ivyleavedtoadflax
Copy link
Contributor Author

should be data/multitask I assume?

🤦 yep - will fix

@ivyleavedtoadflax ivyleavedtoadflax merged commit bbca141 into master Apr 13, 2020
Successfully merging this pull request may close these issues.

Mismatching dimensions when predicting with multitask models
Add support for multi-task models