I'm trying to parse the NCBI Disease Corpus training set, but I get an error for mentions that map to multiple MeSH terms (e.g. "colon and some other cancers" -> D003110|D009369). Is there a way to handle this other than removing the lines that contain "CompositeMention"?
Dataset
10192393|t|A common human skin tumour is caused by activating mutations in beta-catenin.
10192393|a|WNT signalling orchestrates... but a small percentage of colon and some other cancers harbour...
10192393 15 26 skin tumour DiseaseClass D012878
10192393 443 449 cancer DiseaseClass D009369
10192393 483 496 colon cancers DiseaseClass D003110
10192393 539 565 adenomatous polyposis coli SpecificDisease D011125
10192393 567 570 APC SpecificDisease D011125
10192393 670 698 colon and some other cancers CompositeMention D003110|D009369
10192393 855 867 skin tumours DiseaseClass D012878
10192393 879 893 pilomatricomas SpecificDisease D018296
10192393 1021 1035 pilomatricomas SpecificDisease D018296
10192393 1210 1221 skin tumour DiseaseClass D012878
10192393 1262 1268 tumour Modifier D009369
10192393 1312 1326 pilomatricomas SpecificDisease D018296
10192393 1385 1392 tumours DiseaseClass D009369
10192393 1615 1622 tumours DiseaseClass D009369
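One option is to parse the annotation lines yourself and keep the concept IDs as a list. Below is a minimal sketch. It assumes each annotation line holds PMID, start offset, end offset, mention text, mention type, and a concept-ID field in which composite mentions join several IDs with "|", as in the CompositeMention line above. The names `Mention`, `parse_annotation`, and `ANNOTATION_RE` are hypothetical and are not part of the library that raised the error. The regex accepts either tabs or spaces between fields, because the corpus files are (as far as I know) tab-separated while the excerpt above shows spaces.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for a PubTator-style annotation line. The mention text
# may contain spaces, so it is matched non-greedily and the last two
# whitespace-free tokens are taken as the mention type and the concept-ID field.
ANNOTATION_RE = re.compile(
    r"^(?P<pmid>\d+)\s+(?P<start>\d+)\s+(?P<end>\d+)\s+"
    r"(?P<text>.+?)\s+(?P<type>\S+)\s+(?P<ids>\S+)$"
)


@dataclass
class Mention:
    pmid: str
    start: int
    end: int
    text: str
    mention_type: str
    concept_ids: list[str] = field(default_factory=list)

    @property
    def is_composite(self) -> bool:
        # A composite mention maps to more than one concept ID
        return len(self.concept_ids) > 1


def parse_annotation(line: str) -> Mention:
    """Parse one annotation line; raise ValueError for anything else
    (e.g. the "PMID|t|..." title and "PMID|a|..." abstract lines)."""
    match = ANNOTATION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Not an annotation line: {line!r}")
    return Mention(
        pmid=match["pmid"],
        start=int(match["start"]),
        end=int(match["end"]),
        text=match["text"],
        mention_type=match["type"],
        # Composite mentions carry several IDs joined by "|"; keep them all
        concept_ids=match["ids"].split("|"),
    )
```

Keeping `concept_ids` as a list leaves the choice to downstream code, e.g. whether a normalization prediction for a composite mention counts as correct if it matches any of the IDs or only if it matches all of them.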
Error
77 prev_line_type = curr_line_type
78 except Exception as e:
---> 79 raise Exception('ERROR occured when parsing line'
80 f' #{line_number}. Exception {e}')
82 if self.__document_being_read is not None:
83 self.corpus.append(self.__document_being_read)
Exception: ERROR occured when parsing line #8. Exception Unexpected content received on line #8, the line/data may have been corrupted. Content: '10192393 670 698 colon and some other cancers CompositeMention D003110|D009369
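If you'd rather keep using the existing parser, a workaround short of deleting the lines is to preprocess the file so that each composite mention becomes one line per concept ID, with the same span, type, and text. This is only a sketch. It assumes the parser rejected the line because of the "|"-joined ID field; the traceback does not say which part of the line it objected to, so if it also rejects the `CompositeMention` type, this alone will not be enough. `expand_composite_mentions` and `COMPOSITE_RE` are hypothetical names.

```python
import re
from typing import Iterable, Iterator

# Annotation lines start with "PMID<whitespace>" (title/abstract lines start
# with "PMID|t|" or "PMID|a|", so they never match). Group 1 is everything up to
# and including the separator before the last field; group 2 is a last field
# containing "|"-joined concept IDs.
COMPOSITE_RE = re.compile(r"^(\d+\s.*\s)(\S+\|\S+)$")


def expand_composite_mentions(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines unchanged, except composite-mention lines, which are
    split into one line per concept ID (same separator, same span)."""
    for line in lines:
        match = COMPOSITE_RE.match(line.rstrip("\r\n"))
        if match is None:
            yield line
            continue
        prefix, ids = match.groups()
        for concept_id in ids.split("|"):
            yield prefix + concept_id + "\n"
```

You could then write a cleaned copy with something like `dst.writelines(expand_composite_mentions(src))` and point the parser at that copy. The trade-off is that the same span now appears once per ID. That is harmless for concept normalization, but for NER training you would probably want to deduplicate spans so the mention isn't tagged twice.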