Releases: nedap/deidentify
v0.7.3
Closed issues:
- TypeError: __init__() missing 1 required positional argument: 'text' while loading the model #64
- Language of the Evaluator's Tokenizer is set to English when asked for Dutch or French. #62
- Paths when running training #57
Merged pull requests:
- update project dependencies in setup.py #65 (osmalpkoras)
- Use correct language for tokenizer during evaluation #63 (jantrienes)
- Pin pylint version #61 (jantrienes)
- Added German language support #60 (bbieniek)
v0.7.2
Merged pull requests:
- Add language parameter to evaluate_corpus #59 (jantrienes)
- Added French language #58 (bbieniek)
v0.7.1
Closed issues:
- Support spaCy 3 #51
Merged pull requests:
- Add Python 3.9 tests to CI pipeline #53 (jantrienes)
- Add spacy 3 compatibility #52 (jantrienes)
v0.7.0
Merged pull requests:
- Run integration tests for Python 3.6/3.7/3.8 on CI #48 (jantrienes)
- Move pytest-cov and pylint configuration to setup.cfg #47 (jantrienes)
- Remove obsolete conditional in CRF tagger #46 (jantrienes)
- Add integration tests for FlairTagger and CRFTagger #45 (jantrienes)
- Handle invalid model names in model lookup #44 (jantrienes)
- Automate model download if not found in cache #43 (jantrienes)
- Upgrade model training dependencies #42 (jantrienes)
- Return dict for ignored sents in crf.predict_marginals #41 (jantrienes)
- Add fine-tuning and embedding language flags #40 (AIessa)
model_crf_ons_tuned-v0.2.0
entity level tp: 2844 - fp: 202 - fn: 793 - tn: 0 - precision: 0.9337 - recall: 0.7820 - f1-score: 0.8511
Address tp: 115 - fp: 23 - fn: 41 - tn: 0 - precision: 0.8333 - recall: 0.7372 - f1-score: 0.7823
Age tp: 15 - fp: 8 - fn: 26 - tn: 0 - precision: 0.6522 - recall: 0.3659 - f1-score: 0.4688
Care_Institute tp: 126 - fp: 30 - fn: 90 - tn: 0 - precision: 0.8077 - recall: 0.5833 - f1-score: 0.6774
Date tp: 711 - fp: 46 - fn: 92 - tn: 0 - precision: 0.9392 - recall: 0.8854 - f1-score: 0.9115
Email tp: 9 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.9000 - f1-score: 0.9474
Hospital tp: 5 - fp: 1 - fn: 5 - tn: 0 - precision: 0.8333 - recall: 0.5000 - f1-score: 0.6250
ID tp: 7 - fp: 3 - fn: 18 - tn: 0 - precision: 0.7000 - recall: 0.2800 - f1-score: 0.4000
Initials tp: 86 - fp: 11 - fn: 92 - tn: 0 - precision: 0.8866 - recall: 0.4831 - f1-score: 0.6254
Internal_Location tp: 21 - fp: 4 - fn: 34 - tn: 0 - precision: 0.8400 - recall: 0.3818 - f1-score: 0.5250
Name tp: 1665 - fp: 65 - fn: 276 - tn: 0 - precision: 0.9624 - recall: 0.8578 - f1-score: 0.9071
Organization_Company tp: 62 - fp: 9 - fn: 74 - tn: 0 - precision: 0.8732 - recall: 0.4559 - f1-score: 0.5990
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 14 - fp: 1 - fn: 2 - tn: 0 - precision: 0.9333 - recall: 0.8750 - f1-score: 0.9032
Profession tp: 5 - fp: 1 - fn: 37 - tn: 0 - precision: 0.8333 - recall: 0.1190 - f1-score: 0.2083
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571
token level tp: 3557 - fp: 225 - fn: 893 - tn: 1471820 - precision: 0.9405 - recall: 0.7993 - f1-score: 0.8642
Address tp: 176 - fp: 20 - fn: 60 - tn: 98177 - precision: 0.8980 - recall: 0.7458 - f1-score: 0.8149
Age tp: 29 - fp: 11 - fn: 30 - tn: 98363 - precision: 0.7250 - recall: 0.4915 - f1-score: 0.5858
Care_Institute tp: 240 - fp: 51 - fn: 107 - tn: 98035 - precision: 0.8247 - recall: 0.6916 - f1-score: 0.7523
Date tp: 865 - fp: 44 - fn: 72 - tn: 97452 - precision: 0.9516 - recall: 0.9232 - f1-score: 0.9372
Email tp: 9 - fp: 0 - fn: 1 - tn: 98423 - precision: 1.0000 - recall: 0.9000 - f1-score: 0.9474
Hospital tp: 9 - fp: 1 - fn: 5 - tn: 98418 - precision: 0.9000 - recall: 0.6429 - f1-score: 0.7500
ID tp: 7 - fp: 3 - fn: 18 - tn: 98405 - precision: 0.7000 - recall: 0.2800 - f1-score: 0.4000
Initials tp: 89 - fp: 8 - fn: 96 - tn: 98240 - precision: 0.9175 - recall: 0.4811 - f1-score: 0.6312
Internal_Location tp: 34 - fp: 5 - fn: 58 - tn: 98336 - precision: 0.8718 - recall: 0.3696 - f1-score: 0.5191
Name tp: 1950 - fp: 67 - fn: 282 - tn: 96134 - precision: 0.9668 - recall: 0.8737 - f1-score: 0.9179
Organization_Company tp: 107 - fp: 13 - fn: 101 - tn: 98212 - precision: 0.8917 - recall: 0.5144 - f1-score: 0.6524
Other tp: 0 - fp: 0 - fn: 5 - tn: 98428 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 20 - fp: 1 - fn: 2 - tn: 98410 - precision: 0.9524 - recall: 0.9091 - f1-score: 0.9302
Profession tp: 19 - fp: 1 - fn: 55 - tn: 98358 - precision: 0.9500 - recall: 0.2568 - f1-score: 0.4043
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 98429 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571
token (blind) tp: 3652 - fp: 130 - fn: 798 - tn: 93853 - precision: 0.9656 - recall: 0.8207 - f1-score: 0.8873
ENT tp: 3652 - fp: 130 - fn: 798 - tn: 93853 - precision: 0.9656 - recall: 0.8207 - f1-score: 0.8873
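Each row above lists raw counts followed by the scores derived from them. As a minimal sketch, assuming the standard definitions of precision, recall and F1 (the helper name `prf` is hypothetical and not part of the deidentify API), the entity-level totals of model_crf_ons_tuned-v0.2.0 can be recomputed like this:

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from raw counts.

    Empty denominators yield 0.0, which matches rows such as "Other"
    where no true positives were found.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the "entity level" row of model_crf_ons_tuned-v0.2.0.
p, r, f1 = prf(tp=2844, fp=202, fn=793)
print(f"precision: {p:.4f} - recall: {r:.4f} - f1-score: {f1:.4f}")
# prints "precision: 0.9337 - recall: 0.7820 - f1-score: 0.8511"
```

A few published rows differ from this calculation in the fourth decimal of the F1 score, presumably because of rounding inside the evaluation script.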
model_bilstmcrf_ons_large-v0.2.0
entity level tp: 3184 - fp: 262 - fn: 453 - tn: 0 - precision: 0.9240 - recall: 0.8754 - f1-score: 0.8990
Address tp: 135 - fp: 19 - fn: 21 - tn: 0 - precision: 0.8766 - recall: 0.8654 - f1-score: 0.8710
Age tp: 27 - fp: 8 - fn: 14 - tn: 0 - precision: 0.7714 - recall: 0.6585 - f1-score: 0.7105
Care_Institute tp: 148 - fp: 55 - fn: 68 - tn: 0 - precision: 0.7291 - recall: 0.6852 - f1-score: 0.7065
Date tp: 735 - fp: 48 - fn: 68 - tn: 0 - precision: 0.9387 - recall: 0.9153 - f1-score: 0.9269
Email tp: 10 - fp: 2 - fn: 0 - tn: 0 - precision: 0.8333 - recall: 1.0000 - f1-score: 0.9091
Hospital tp: 7 - fp: 3 - fn: 3 - tn: 0 - precision: 0.7000 - recall: 0.7000 - f1-score: 0.7000
ID tp: 16 - fp: 3 - fn: 9 - tn: 0 - precision: 0.8421 - recall: 0.6400 - f1-score: 0.7273
Initials tp: 111 - fp: 22 - fn: 67 - tn: 0 - precision: 0.8346 - recall: 0.6236 - f1-score: 0.7138
Internal_Location tp: 27 - fp: 10 - fn: 28 - tn: 0 - precision: 0.7297 - recall: 0.4909 - f1-score: 0.5869
Name tp: 1860 - fp: 68 - fn: 81 - tn: 0 - precision: 0.9647 - recall: 0.9583 - f1-score: 0.9615
Organization_Company tp: 78 - fp: 23 - fn: 58 - tn: 0 - precision: 0.7723 - recall: 0.5735 - f1-score: 0.6582
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession tp: 11 - fp: 1 - fn: 31 - tn: 0 - precision: 0.9167 - recall: 0.2619 - f1-score: 0.4074
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571
token level tp: 3973 - fp: 262 - fn: 477 - tn: 1471783 - precision: 0.9381 - recall: 0.8928 - f1-score: 0.9149
Address tp: 210 - fp: 17 - fn: 26 - tn: 98180 - precision: 0.9251 - recall: 0.8898 - f1-score: 0.9071
Age tp: 44 - fp: 10 - fn: 15 - tn: 98364 - precision: 0.8148 - recall: 0.7458 - f1-score: 0.7788
Care_Institute tp: 278 - fp: 69 - fn: 69 - tn: 98017 - precision: 0.8012 - recall: 0.8012 - f1-score: 0.8012
Date tp: 893 - fp: 47 - fn: 44 - tn: 97449 - precision: 0.9500 - recall: 0.9530 - f1-score: 0.9515
Email tp: 10 - fp: 2 - fn: 0 - tn: 98421 - precision: 0.8333 - recall: 1.0000 - f1-score: 0.9091
Hospital tp: 11 - fp: 3 - fn: 3 - tn: 98416 - precision: 0.7857 - recall: 0.7857 - f1-score: 0.7857
ID tp: 16 - fp: 3 - fn: 9 - tn: 98405 - precision: 0.8421 - recall: 0.6400 - f1-score: 0.7273
Initials tp: 120 - fp: 15 - fn: 65 - tn: 98233 - precision: 0.8889 - recall: 0.6486 - f1-score: 0.7500
Internal_Location tp: 43 - fp: 9 - fn: 49 - tn: 98332 - precision: 0.8269 - recall: 0.4674 - f1-score: 0.5972
Name tp: 2161 - fp: 63 - fn: 71 - tn: 96138 - precision: 0.9717 - recall: 0.9682 - f1-score: 0.9699
Organization_Company tp: 122 - fp: 22 - fn: 86 - tn: 98203 - precision: 0.8472 - recall: 0.5865 - f1-score: 0.6931
Other tp: 0 - fp: 0 - fn: 5 - tn: 98428 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 22 - fp: 0 - fn: 0 - tn: 98411 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession tp: 40 - fp: 2 - fn: 34 - tn: 98357 - precision: 0.9524 - recall: 0.5405 - f1-score: 0.6896
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 98429 - precision: 1.0000 - recall: 0.7500 - f1-score: 0.8571
token (blind) tp: 4079 - fp: 156 - fn: 371 - tn: 93827 - precision: 0.9632 - recall: 0.9166 - f1-score: 0.9393
ENT tp: 4079 - fp: 156 - fn: 371 - tn: 93827 - precision: 0.9632 - recall: 0.9166 - f1-score: 0.9393
model_bilstmcrf_ons_fast-v0.2.0
entity level tp: 3177 - fp: 314 - fn: 460 - tn: 0 - precision: 0.9101 - recall: 0.8735 - f1-score: 0.8914
Address tp: 130 - fp: 25 - fn: 26 - tn: 0 - precision: 0.8387 - recall: 0.8333 - f1-score: 0.8360
Age tp: 26 - fp: 11 - fn: 15 - tn: 0 - precision: 0.7027 - recall: 0.6341 - f1-score: 0.6666
Care_Institute tp: 129 - fp: 58 - fn: 87 - tn: 0 - precision: 0.6898 - recall: 0.5972 - f1-score: 0.6402
Date tp: 742 - fp: 46 - fn: 61 - tn: 0 - precision: 0.9416 - recall: 0.9240 - f1-score: 0.9327
Email tp: 10 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Hospital tp: 5 - fp: 3 - fn: 5 - tn: 0 - precision: 0.6250 - recall: 0.5000 - f1-score: 0.5556
ID tp: 12 - fp: 6 - fn: 13 - tn: 0 - precision: 0.6667 - recall: 0.4800 - f1-score: 0.5582
Initials tp: 115 - fp: 31 - fn: 63 - tn: 0 - precision: 0.7877 - recall: 0.6461 - f1-score: 0.7099
Internal_Location tp: 27 - fp: 10 - fn: 28 - tn: 0 - precision: 0.7297 - recall: 0.4909 - f1-score: 0.5869
Name tp: 1875 - fp: 91 - fn: 66 - tn: 0 - precision: 0.9537 - recall: 0.9660 - f1-score: 0.9598
Organization_Company tp: 73 - fp: 27 - fn: 63 - tn: 0 - precision: 0.7300 - recall: 0.5368 - f1-score: 0.6187
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession tp: 13 - fp: 4 - fn: 29 - tn: 0 - precision: 0.7647 - recall: 0.3095 - f1-score: 0.4407
SSN tp: 0 - fp: 2 - fn: 0 - tn: 0 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
URL_IP tp: 4 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
token level tp: 3975 - fp: 315 - fn: 473 - tn: 1471702 - precision: 0.9266 - recall: 0.8937 - f1-score: 0.9099
Address tp: 204 - fp: 20 - fn: 32 - tn: 98175 - precision: 0.9107 - recall: 0.8644 - f1-score: 0.8869
Age tp: 44 - fp: 16 - fn: 15 - tn: 98356 - precision: 0.7333 - recall: 0.7458 - f1-score: 0.7395
Care_Institute tp: 254 - fp: 69 - fn: 93 - tn: 98015 - precision: 0.7864 - recall: 0.7320 - f1-score: 0.7582
Date tp: 901 - fp: 44 - fn: 36 - tn: 97450 - precision: 0.9534 - recall: 0.9616 - f1-score: 0.9575
Email tp: 10 - fp: 0 - fn: 0 - tn: 98421 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Hospital tp: 10 - fp: 3 - fn: 4 - tn: 98414 - precision: 0.7692 - recall: 0.7143 - f1-score: 0.7407
ID tp: 12 - fp: 6 - fn: 11 - tn: 98402 - precision: 0.6667 - recall: 0.5217 - f1-score: 0.5854
Initials tp: 128 - fp: 18 - fn: 57 - tn: 98228 - precision: 0.8767 - recall: 0.6919 - f1-score: 0.7734
Internal_Location tp: 46 - fp: 11 - fn: 46 - tn: 98328 - precision: 0.8070 - recall: 0.5000 - f1-score: 0.6174
Name tp: 2179 - fp: 94 - fn: 53 - tn: 96105 - precision: 0.9586 - recall: 0.9763 - f1-score: 0.9674
Organization_Company tp: 119 - fp: 30 - fn: 89 - tn: 98193 - precision: 0.7987 - recall: 0.5721 - f1-score: 0.6667
Other tp: 0 - fp: 0 - fn: 5 - tn: 98426 - precision: 0.0000 - recall: 0.0000 - f1-score: 0.0000
Phone_fax tp: 22 - fp: 0 - fn: 0 - tn: 98409 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
Profession tp: 42 - fp: 4 - fn: 32 - tn: 98353 - precision: 0.9130 - recall: 0.5676 - f1-score: 0.7000
URL_IP tp: 4 - fp: 0 - fn: 0 - tn: 98427 - precision: 1.0000 - recall: 1.0000 - f1-score: 1.0000
token (blind) tp: 4100 - fp: 192 - fn: 350 - tn: 93791 - precision: 0.9553 - recall: 0.9213 - f1-score: 0.9380
ENT tp: 4100 - fp: 192 - fn: 350 - tn: 93791 - precision: 0.9553 - recall: 0.9213 - f1-score: 0.9380
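The "token level" total rows appear to be micro-averages, meaning the per-class counts are summed before any score is computed. The sketch below checks this for model_bilstmcrf_ons_fast-v0.2.0; the `rows` dictionary is only an illustrative container for the numbers listed above, not a structure produced by the library.

```python
# (tp, fp, fn, tn) per class, copied from the token-level rows of
# model_bilstmcrf_ons_fast-v0.2.0.
rows = {
    "Address": (204, 20, 32, 98175),
    "Age": (44, 16, 15, 98356),
    "Care_Institute": (254, 69, 93, 98015),
    "Date": (901, 44, 36, 97450),
    "Email": (10, 0, 0, 98421),
    "Hospital": (10, 3, 4, 98414),
    "ID": (12, 6, 11, 98402),
    "Initials": (128, 18, 57, 98228),
    "Internal_Location": (46, 11, 46, 98328),
    "Name": (2179, 94, 53, 96105),
    "Organization_Company": (119, 30, 89, 98193),
    "Other": (0, 0, 5, 98426),
    "Phone_fax": (22, 0, 0, 98409),
    "Profession": (42, 4, 32, 98353),
    "URL_IP": (4, 0, 0, 98427),
}

# Sum each column across classes (micro-averaging).
tp, fp, fn, tn = (sum(column) for column in zip(*rows.values()))
print(f"tp: {tp} - fp: {fp} - fn: {fn} - tn: {tn}")
# prints "tp: 3975 - fp: 315 - fn: 473 - tn: 1471702"

# Scores are then computed once from the pooled counts.
micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
print(f"precision: {micro_precision:.4f} - recall: {micro_recall:.4f}")
# prints "precision: 0.9266 - recall: 0.8937"
```

Because true negatives are counted once per class, the pooled `tn` is much larger than the number of tokens in the test set. Frequent classes such as Name also dominate the pooled scores, so the per-class rows remain the better guide for rare categories.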
v0.6.1
Merged pull requests:
- Correctly handle whitespace in BIO to standoff conversion #39 (jantrienes)
- Add flag to save final BiLSTM-CRF model when training on a train-subset #38 (jantrienes)
- Expand scope of error handling in date parsing #37 (jantrienes)
- Escape regex parameters during name replacements #36 (jantrienes)
- Handle platform-specific issue with strftime/strptime #35 (jantrienes)
v0.6.0
Merged pull requests:
- Add customizable error handling for PHI replacement #34 (jantrienes)
v0.5.2
Merged pull requests:
- Bundle generator resources with python package via package_data #32 (jantrienes)