MER failing to identify entities in text #3

Cobollero · 2020-03-13T17:42:46Z

Hi!

Came across a bug where MER fails to identify entities if the entity is right next to a punctuation mark.

For example, in the following picture, Calcimycin is identified in the sentences "I like Calcimycin" and "I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase". It is not identified in "I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", which has a comma right after the word Calcimycin.

The code that I used to create the lexicon (mesh_lex) is the following:

import merpy

# Load the MeSH name -> ID mapping (each line is split on '=').
with open('MeSH_name_id_mapping.txt', encoding='utf-8') as finput_terms:
    l_terms = finput_terms.readlines()

dict_terms = {}
for i in l_terms:
    aux = i.split('=')
    dict_terms[aux[0].strip()] = aux[1].replace('\n','')

# Load the synonyms (each line is split on a tab: term, then comma-separated synonyms).
with open('mesh_terms_synonyms.txt', encoding='utf-8') as finput_terms:
    l_terms_syn = finput_terms.readlines()

dict_terms_synonyms = {}
for i in l_terms_syn:
    aux = i.split('\t')
    dict_terms_synonyms[aux[0]] = aux[1].replace('\n','')

# Map every synonym, and the term itself, to the ID of the term.
conv_dict = {}
for key, values in dict_terms_synonyms.items():
    l_synonyms = values.split(',')
    if key not in l_synonyms:
        l_synonyms.append(key)

    for i in l_synonyms:
        conv_dict[i.strip()] = dict_terms.get(key)

# Create the lexicon and its mappings, then process it so it can be queried.
merpy.create_lexicon(conv_dict.keys(), "mesh_lex")
merpy.create_mappings(conv_dict, "mesh_lex")
merpy.show_lexicons()
merpy.process_lexicon("mesh_lex")

# Examples
print(merpy.get_entities("I like abdominal injuries", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))

Here are the files with the entities:
MeSH_name_id_mapping.txt
mesh_terms_synonyms.txt
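
In case it helps anyone hitting the same problem, here is a rough sketch of a possible workaround, not a fix for MER itself. It assumes the miss is caused only by a punctuation mark directly after the entity, which I have not confirmed. The helper name mask_trailing_punctuation is made up for this example and is not part of merpy. It replaces each punctuation mark that is followed by whitespace (or the end of the text) with a single space, so the text keeps its length and character positions still line up with the original.

```python
import re

# Punctuation marks that are followed by whitespace or by the end of the text.
# Punctuation inside a token (e.g. the comma in "1,2-dichloroethane") is left alone.
_TRAILING_PUNCT = re.compile(r"[,.;:!?](?=\s|$)")


def mask_trailing_punctuation(text: str) -> str:
    """Hypothetical helper: replace trailing punctuation with one space each.

    The output has the same length as the input, so character positions
    in the masked text match positions in the original text.
    """
    return _TRAILING_PUNCT.sub(" ", text)


if __name__ == "__main__":
    original = ("I like Calcimycin, it is a good aurelia aurita "
                "and Temefos is awesome! abate lowercase")
    masked = mask_trailing_punctuation(original)
    print(masked)
    # Assumption: with merpy installed and "mesh_lex" already processed,
    # the masked text would be passed in place of the original:
    # print(merpy.get_entities(masked, "mesh_lex"))
```

The trade-off is that a lexicon term which itself ends in one of these punctuation marks would no longer match, so this only makes sense as a stopgap.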
