
Releases: nedap/deidentify

v0.7.3

05 May 13:00
5693cab

Full Changelog

Closed issues:

  • TypeError: __init__() missing 1 required positional argument: 'text' while loading the model #64
  • Language of the Evaluator's Tokenizer is set to English when asked for Dutch or French. #62
  • Paths when running training #57

Merged pull requests:

v0.7.2

03 Jun 15:30
f75b747

Full Changelog

Merged pull requests:

v0.7.1

15 Feb 20:11

Full Changelog

Closed issues:

  • Support spaCy 3 #51

Merged pull requests:

v0.7.0

16 Dec 18:56

Full Changelog

Merged pull requests:

  • Run integration tests for Python 3.6/3.7/3.8 on CI #48 (jantrienes)
  • Move pytest-cov and pylint configuration to setup.cfg #47 (jantrienes)
  • Remove obsolete conditional in CRF tagger #46 (jantrienes)
  • Add integration tests for FlairTagger and CRFTagger #45 (jantrienes)
  • Handle invalid model names in model lookup #44 (jantrienes)
  • Automate model download if not found in cache #43 (jantrienes)
  • Upgrade model training dependencies #42 (jantrienes)
  • Return dict for ignored sents in crf.predict_marginals #41 (jantrienes)
  • Add fine-tuning and embedding language flags #40 (AIessa)

model_crf_ons_tuned-v0.2.0

16 Dec 13:45
ac8c8b3
entity level            tp: 2844  - fp: 202   - fn: 793   - tn: 0     - precision: 0.9337 - recall: 0.7820 - f1-score: 0.8511
Address                 tp: 115   - fp: 23    - fn: 41    - tn: 0     - precision: 0.8333 - recall: 0.7372 - f1-score: 0.7823
Age                     tp: 15    - fp: 8     - fn: 26    - tn: 0     - precision: 0.6522 - recall: 0.3659 - f1-score: 0.4688
Care_Institute          tp: 126   - fp: 30    - fn: 90    - tn: 0     - precision: 0.8077 - recall: 0.5833 - f1-score: 0.6774
Date                    tp: 711   - fp: 46    - fn: 92    - tn: 0     - precision: 0.9392 - recall: 0.8854 - f1-score: 0.9115
Email                   tp: 9     - fp: 0     - fn: 1     - tn: 0     - precision: 1.0000 - recall: 0.9000 - f1-score: 0.9474
Hospital                tp: 5     - fp: 1     - fn: 5     - tn: 0     - precision: 0.8333 - recall: 0.5000 - f1-score: 0.6250
ID                      tp: 7     - fp: 3     - fn: 18    - tn: 0     - precision: 0.7000 - recall: 0.2800 - f1-score: 0.4000
Initials                tp: 86    - fp: 11    - fn: 92    - tn: 0     - precision: 0.8866 - recall: 0.4831 - f1-score: 0.6254
Internal_Location       tp: 21    - fp: 4     - fn: 34    - tn: 0     - precision: 0.8400 - recall: 0.3818 - f1-score: 0.5250
Name                    tp: 1665  - fp: 65    - fn: 276   - tn: 0     - precision: 0.9624 - recall: 0.8578 - f1-score: 0.9071
Organization_Company    tp: 62    - fp: 9     - fn: 74    - tn: 0     - precision: 0.8732 - recall: 0.4559 - f1-score: 0.5990
Other                   tp: 0     - fp: 0     - fn: 4     - tn: 0     - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 14    - fp: 1     - fn: 2     - tn: 0     - precision: 0.9333 - recall: 0.8750 - f1-score: 0.9032
Profession              tp: 5     - fp: 1     - fn: 37    - tn: 0     - precision: 0.8333 - recall: 0.1190 - f1-score: 0.2083
URL_IP                  tp: 3     - fp: 0     - fn: 1     - tn: 0     - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571

token level             tp: 3557  - fp: 225   - fn: 893   - tn: 1471820 - precision: 0.9405 - recall: 0.7993 - f1-score: 0.8642
Address                 tp: 176   - fp: 20    - fn: 60    - tn: 98177 - precision: 0.8980 - recall: 0.7458 - f1-score: 0.8149
Age                     tp: 29    - fp: 11    - fn: 30    - tn: 98363 - precision: 0.7250 - recall: 0.4915 - f1-score: 0.5858
Care_Institute          tp: 240   - fp: 51    - fn: 107   - tn: 98035 - precision: 0.8247 - recall: 0.6916 - f1-score: 0.7523
Date                    tp: 865   - fp: 44    - fn: 72    - tn: 97452 - precision: 0.9516 - recall: 0.9232 - f1-score: 0.9372
Email                   tp: 9     - fp: 0     - fn: 1     - tn: 98423 - precision: 1.0000 - recall: 0.9000 - f1-score: 0.9474
Hospital                tp: 9     - fp: 1     - fn: 5     - tn: 98418 - precision: 0.9000 - recall: 0.6429 - f1-score: 0.7500
ID                      tp: 7     - fp: 3     - fn: 18    - tn: 98405 - precision: 0.7000 - recall: 0.2800 - f1-score: 0.4000
Initials                tp: 89    - fp: 8     - fn: 96    - tn: 98240 - precision: 0.9175 - recall: 0.4811 - f1-score: 0.6312
Internal_Location       tp: 34    - fp: 5     - fn: 58    - tn: 98336 - precision: 0.8718 - recall: 0.3696 - f1-score: 0.5191
Name                    tp: 1950  - fp: 67    - fn: 282   - tn: 96134 - precision: 0.9668 - recall: 0.8737 - f1-score: 0.9179
Organization_Company    tp: 107   - fp: 13    - fn: 101   - tn: 98212 - precision: 0.8917 - recall: 0.5144 - f1-score: 0.6524
Other                   tp: 0     - fp: 0     - fn: 5     - tn: 98428 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 20    - fp: 1     - fn: 2     - tn: 98410 - precision: 0.9524 - recall: 0.9091 - f1-score: 0.9302
Profession              tp: 19    - fp: 1     - fn: 55    - tn: 98358 - precision: 0.9500 - recall: 0.2568 - f1-score: 0.4043
URL_IP                  tp: 3     - fp: 0     - fn: 1     - tn: 98429 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571

token (blind)           tp: 3652  - fp: 130   - fn: 798   - tn: 93853 - precision: 0.9656 - recall: 0.8207 - f1-score: 0.8873
ENT                     tp: 3652  - fp: 130   - fn: 798   - tn: 93853 - precision: 0.9656 - recall: 0.8207 - f1-score: 0.8873
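The precision, recall and F1 columns in these tables are consistent with the standard definitions computed from the tp/fp/fn counts (tn is not used by these three metrics). The overall "entity level" row is a micro-average: for model_crf_ons_tuned-v0.2.0 it equals the sum of the per-class rows (2844 tp, 202 fp, 793 fn). The minimal sketch below recomputes the scores from those counts. The `Counts` type and the `prf`/`micro_total` functions are illustrative names, not part of deidentify, and reporting 0.0 when a denominator is zero is an assumption that matches how the "Other" row is shown.

```python
from typing import Dict, NamedTuple, Tuple


class Counts(NamedTuple):
    """Per-class confusion counts (tn omitted: P/R/F1 do not use it)."""
    tp: int
    fp: int
    fn: int


def prf(c: Counts) -> Tuple[float, float, float]:
    """Precision, recall and F1; returns 0.0 where a denominator is zero."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_total(per_class: Dict[str, Counts]) -> Counts:
    """Micro-averaging: sum the raw counts over all classes before scoring."""
    return Counts(
        tp=sum(c.tp for c in per_class.values()),
        fp=sum(c.fp for c in per_class.values()),
        fn=sum(c.fn for c in per_class.values()),
    )


# Entity-level counts of model_crf_ons_tuned-v0.2.0, copied from the table above.
crf_entity = {
    "Address": Counts(115, 23, 41),
    "Age": Counts(15, 8, 26),
    "Care_Institute": Counts(126, 30, 90),
    "Date": Counts(711, 46, 92),
    "Email": Counts(9, 0, 1),
    "Hospital": Counts(5, 1, 5),
    "ID": Counts(7, 3, 18),
    "Initials": Counts(86, 11, 92),
    "Internal_Location": Counts(21, 4, 34),
    "Name": Counts(1665, 65, 276),
    "Organization_Company": Counts(62, 9, 74),
    "Other": Counts(0, 0, 4),
    "Phone_fax": Counts(14, 1, 2),
    "Profession": Counts(5, 1, 37),
    "URL_IP": Counts(3, 0, 1),
}

total = micro_total(crf_entity)
precision, recall, f1 = prf(total)
print(total, round(precision, 4), round(recall, 4), round(f1, 4))
# Counts(tp=2844, fp=202, fn=793) 0.9337 0.782 0.8511
```

Summing counts first (micro-averaging) weights each entity by frequency, so the frequent Name and Date classes dominate the overall score; a macro-average over classes would weigh rare classes such as ID or Profession equally.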

model_bilstmcrf_ons_large-v0.2.0

16 Dec 13:59
ac8c8b3
entity level            tp: 3184  - fp: 262   - fn: 453   - tn: 0     - precision: 0.9240 - recall: 0.8754 - f1-score: 0.8990
Address                 tp: 135   - fp: 19    - fn: 21    - tn: 0     - precision: 0.8766 - recall: 0.8654 - f1-score: 0.8710
Age                     tp: 27    - fp: 8     - fn: 14    - tn: 0     - precision: 0.7714 - recall: 0.6585 - f1-score: 0.7105
Care_Institute          tp: 148   - fp: 55    - fn: 68    - tn: 0     - precision: 0.7291 - recall: 0.6852 - f1-score: 0.7065
Date                    tp: 735   - fp: 48    - fn: 68    - tn: 0     - precision: 0.9387 - recall: 0.9153 - f1-score: 0.9269
Email                   tp: 10    - fp: 2     - fn: 0     - tn: 0     - precision: 0.8333 - recall: 1.0000 - f1-score: 0.9091
Hospital                tp: 7     - fp: 3     - fn: 3     - tn: 0     - precision: 0.7000 - recall: 0.7000 - f1-score: 0.7000
ID                      tp: 16    - fp: 3     - fn: 9     - tn: 0     - precision: 0.8421 - recall: 0.6400 - f1-score: 0.7273
Initials                tp: 111   - fp: 22    - fn: 67    - tn: 0     - precision: 0.8346 - recall: 0.6236 - f1-score: 0.7138
Internal_Location       tp: 27    - fp: 10    - fn: 28    - tn: 0     - precision: 0.7297 - recall: 0.4909 - f1-score: 0.5869
Name                    tp: 1860  - fp: 68    - fn: 81    - tn: 0     - precision: 0.9647 - recall: 0.9583 - f1-score: 0.9615
Organization_Company    tp: 78    - fp: 23    - fn: 58    - tn: 0     - precision: 0.7723 - recall: 0.5735 - f1-score: 0.6582
Other                   tp: 0     - fp: 0     - fn: 4     - tn: 0     - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 16    - fp: 0     - fn: 0     - tn: 0     - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession              tp: 11    - fp: 1     - fn: 31    - tn: 0     - precision: 0.9167 - recall: 0.2619 - f1-score: 0.4074
URL_IP                  tp: 3     - fp: 0     - fn: 1     - tn: 0     - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571

token level             tp: 3973  - fp: 262   - fn: 477   - tn: 1471783 - precision: 0.9381 - recall: 0.8928 - f1-score: 0.9149
Address                 tp: 210   - fp: 17    - fn: 26    - tn: 98180 - precision: 0.9251 - recall: 0.8898 - f1-score: 0.9071
Age                     tp: 44    - fp: 10    - fn: 15    - tn: 98364 - precision: 0.8148 - recall: 0.7458 - f1-score: 0.7788
Care_Institute          tp: 278   - fp: 69    - fn: 69    - tn: 98017 - precision: 0.8012 - recall: 0.8012 - f1-score: 0.8012
Date                    tp: 893   - fp: 47    - fn: 44    - tn: 97449 - precision: 0.9500 - recall: 0.9530 - f1-score: 0.9515
Email                   tp: 10    - fp: 2     - fn: 0     - tn: 98421 - precision: 0.8333 - recall: 1.0000 - f1-score: 0.9091
Hospital                tp: 11    - fp: 3     - fn: 3     - tn: 98416 - precision: 0.7857 - recall: 0.7857 - f1-score: 0.7857
ID                      tp: 16    - fp: 3     - fn: 9     - tn: 98405 - precision: 0.8421 - recall: 0.6400 - f1-score: 0.7273
Initials                tp: 120   - fp: 15    - fn: 65    - tn: 98233 - precision: 0.8889 - recall: 0.6486 - f1-score: 0.7500
Internal_Location       tp: 43    - fp: 9     - fn: 49    - tn: 98332 - precision: 0.8269 - recall: 0.4674 - f1-score: 0.5972
Name                    tp: 2161  - fp: 63    - fn: 71    - tn: 96138 - precision: 0.9717 - recall: 0.9682 - f1-score: 0.9699
Organization_Company    tp: 122   - fp: 22    - fn: 86    - tn: 98203 - precision: 0.8472 - recall: 0.5865 - f1-score: 0.6931
Other                   tp: 0     - fp: 0     - fn: 5     - tn: 98428 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 22    - fp: 0     - fn: 0     - tn: 98411 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession              tp: 40    - fp: 2     - fn: 34    - tn: 98357 - precision: 0.9524 - recall: 0.5405 - f1-score: 0.6896
URL_IP                  tp: 3     - fp: 0     - fn: 1     - tn: 98429 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571

token (blind)           tp: 4079  - fp: 156   - fn: 371   - tn: 93827 - precision: 0.9632 - recall: 0.9166 - f1-score: 0.9393
ENT                     tp: 4079  - fp: 156   - fn: 371   - tn: 93827 - precision: 0.9632 - recall: 0.9166 - f1-score: 0.9393

model_bilstmcrf_ons_fast-v0.2.0

16 Dec 13:50
ac8c8b3
entity level            tp: 3177  - fp: 314   - fn: 460   - tn: 0     - precision: 0.9101 - recall: 0.8735 - f1-score: 0.8914
Address                 tp: 130   - fp: 25    - fn: 26    - tn: 0     - precision: 0.8387 - recall: 0.8333 - f1-score: 0.8360
Age                     tp: 26    - fp: 11    - fn: 15    - tn: 0     - precision: 0.7027 - recall: 0.6341 - f1-score: 0.6666
Care_Institute          tp: 129   - fp: 58    - fn: 87    - tn: 0     - precision: 0.6898 - recall: 0.5972 - f1-score: 0.6402
Date                    tp: 742   - fp: 46    - fn: 61    - tn: 0     - precision: 0.9416 - recall: 0.9240 - f1-score: 0.9327
Email                   tp: 10    - fp: 0     - fn: 0     - tn: 0     - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Hospital                tp: 5     - fp: 3     - fn: 5     - tn: 0     - precision: 0.6250 - recall: 0.5000 - f1-score: 0.5556
ID                      tp: 12    - fp: 6     - fn: 13    - tn: 0     - precision: 0.6667 - recall: 0.4800 - f1-score: 0.5582
Initials                tp: 115   - fp: 31    - fn: 63    - tn: 0     - precision: 0.7877 - recall: 0.6461 - f1-score: 0.7099
Internal_Location       tp: 27    - fp: 10    - fn: 28    - tn: 0     - precision: 0.7297 - recall: 0.4909 - f1-score: 0.5869
Name                    tp: 1875  - fp: 91    - fn: 66    - tn: 0     - precision: 0.9537 - recall: 0.9660 - f1-score: 0.9598
Organization_Company    tp: 73    - fp: 27    - fn: 63    - tn: 0     - precision: 0.7300 - recall: 0.5368 - f1-score: 0.6187
Other                   tp: 0     - fp: 0     - fn: 4     - tn: 0     - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 16    - fp: 0     - fn: 0     - tn: 0     - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession              tp: 13    - fp: 4     - fn: 29    - tn: 0     - precision: 0.7647 - recall: 0.3095 - f1-score: 0.4407
SSN                     tp: 0     - fp: 2     - fn: 0     - tn: 0     - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
URL_IP                  tp: 4     - fp: 0     - fn: 0     - tn: 0     - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000

token level             tp: 3975  - fp: 315   - fn: 473   - tn: 1471702 - precision: 0.9266 - recall: 0.8937 - f1-score: 0.9099
Address                 tp: 204   - fp: 20    - fn: 32    - tn: 98175 - precision: 0.9107 - recall: 0.8644 - f1-score: 0.8869
Age                     tp: 44    - fp: 16    - fn: 15    - tn: 98356 - precision: 0.7333 - recall: 0.7458 - f1-score: 0.7395
Care_Institute          tp: 254   - fp: 69    - fn: 93    - tn: 98015 - precision: 0.7864 - recall: 0.7320 - f1-score: 0.7582
Date                    tp: 901   - fp: 44    - fn: 36    - tn: 97450 - precision: 0.9534 - recall: 0.9616 - f1-score: 0.9575
Email                   tp: 10    - fp: 0     - fn: 0     - tn: 98421 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Hospital                tp: 10    - fp: 3     - fn: 4     - tn: 98414 - precision: 0.7692 - recall: 0.7143 - f1-score: 0.7407
ID                      tp: 12    - fp: 6     - fn: 11    - tn: 98402 - precision: 0.6667 - recall: 0.5217 - f1-score: 0.5854
Initials                tp: 128   - fp: 18    - fn: 57    - tn: 98228 - precision: 0.8767 - recall: 0.6919 - f1-score: 0.7734
Internal_Location       tp: 46    - fp: 11    - fn: 46    - tn: 98328 - precision: 0.8070 - recall: 0.5000 - f1-score: 0.6174
Name                    tp: 2179  - fp: 94    - fn: 53    - tn: 96105 - precision: 0.9586 - recall: 0.9763 - f1-score: 0.9674
Organization_Company    tp: 119   - fp: 30    - fn: 89    - tn: 98193 - precision: 0.7987 - recall: 0.5721 - f1-score: 0.6667
Other                   tp: 0     - fp: 0     - fn: 5     - tn: 98426 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax               tp: 22    - fp: 0     - fn: 0     - tn: 98409 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession              tp: 42    - fp: 4     - fn: 32    - tn: 98353 - precision: 0.9130 - recall: 0.5676 - f1-score: 0.7000
URL_IP                  tp: 4     - fp: 0     - fn: 0     - tn: 98427 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000

token (blind)           tp: 4100  - fp: 192   - fn: 350   - tn: 93791 - precision: 0.9553 - recall: 0.9213 - f1-score: 0.9380
ENT                     tp: 4100  - fp: 192   - fn: 350   - tn: 93791 - precision: 0.9553 - recall: 0.9213 - f1-score: 0.9380

v0.6.1

13 Oct 10:06

Full Changelog

Merged pull requests:

  • Correctly handle whitespace in BIO to standoff conversion #39 (jantrienes)
  • Add flag to save final BiLSTM-CRF model when training on a train-subset #38 (jantrienes)
  • Expand scope of error handling in date parsing #37 (jantrienes)
  • Escape regex parameters during name replacements #36 (jantrienes)
  • Handle platform-specific issue with strftime/strptime #35 (jantrienes)
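PR #36 escapes regex parameters during name replacements. The underlying pitfall: a name or initial such as `J.` contains a regex metacharacter, and used as a raw pattern the `.` matches any character. The minimal sketch below illustrates the general fix with Python's `re.escape`. `replace_name` is a hypothetical helper, not deidentify's actual replacement code. Passing the surrogate through a callable, so that backslashes in it are not parsed as group references, is an extra precaution assumed here, not something the release notes state.

```python
import re


def replace_name(text: str, name: str, surrogate: str) -> str:
    """Hypothetical helper: replace every literal occurrence of `name`."""
    # re.escape turns "J." into "J\." so the dot only matches a literal dot.
    pattern = re.escape(name)
    # A callable replacement is inserted verbatim, so a surrogate containing
    # backslashes is not interpreted as a group reference such as "\1".
    return re.sub(pattern, lambda _match: surrogate, text)


text = "Jan sprak met J. de Vries"

# Unescaped: "J." also matches "Ja" at the start of "Jan".
print(re.sub("J.", "K.", text))        # K.n sprak met K. de Vries
# Escaped: only the initial "J." is replaced.
print(replace_name(text, "J.", "K."))  # Jan sprak met K. de Vries
```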

v0.6.0

10 Sep 06:06

Full Changelog

Merged pull requests:

  • Add customizable error handling for PHI replacement #34 (jantrienes)

v0.5.2

07 Sep 09:10

Full Changelog

Merged pull requests:

  • Bundle generator resources with python package via package_data #32 (jantrienes)