Medical Knowledge Graph
The main structure and part of the data come from https://github.com/liuhuanyong/QASystemOnMedicalKG.
python == 3.7.0
neo4j == 4.1.0
py2neo == 4.3.0
pymysql == 0.10.0
The project builds mainly on the structure of
https://github.com/liuhuanyong/QASystemOnMedicalKG.
On top of that, it adds a drug recommendation system; the drug data comes from
https://www.jiankangle.com/healthMall/category;jsessionid=FDC570E85A0BE68990B51FAB6307B454.
KG_data.py: matches diseases with medicines (the match quality is currently poor)
KG_parameters.py: stores the paths to the data files
KG_functions.py: stores helper functions used across the project
build_graph.py: uploads the structured data to the Neo4j database
classifier: classifies the intent of the patient's query
paser.py: translates the query into the required SQL query statement
searcher.py: runs the query and returns the requested information
drug_recommend.py: returns the recommended drugs
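As a rough illustration of the step from a classified intent to a query statement, the sketch below maps an intent label and an entity name to a parameterized query. Everything in it is an assumption made for illustration: the intent labels, the node labels, the relationship names, and the `build_query` helper are hypothetical and not taken from paser.py, and Cypher is assumed as the query language because the data is stored in Neo4j.

```python
# Hypothetical query templates keyed by intent label.
# Node labels and relationship names are illustrative assumptions.
QUERY_TEMPLATES = {
    "disease_symptom": (
        "MATCH (d:Disease)-[:has_symptom]->(s:Symptom) "
        "WHERE d.name = $name RETURN s.name"
    ),
    "disease_drug": (
        "MATCH (d:Disease)-[:recommend_drug]->(m:Drug) "
        "WHERE d.name = $name RETURN m.name"
    ),
}


def build_query(intent, entity):
    """Return (query, parameters) for a known intent, or None otherwise."""
    template = QUERY_TEMPLATES.get(intent)
    if template is None:
        return None
    # The entity is passed as a parameter instead of being formatted
    # into the query string, which avoids quoting problems.
    return template, {"name": entity}


result = build_query("disease_drug", "diabetes")
```

The returned pair could then be handed to a database call that accepts a query string and a parameter dictionary.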
The graph contains 9 kinds of nodes (including category of drug, drug, disease, symptom, food, department, check, and parts) and 12 kinds of relationships. In total, there are 28,000 entity nodes and 360,000 relationships.
The main method for classifying questions and recognizing named entities is string matching, which depends heavily on the size of the vocabulary and is not flexible.
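A minimal sketch of vocabulary-based string matching, to make the limitation concrete. The vocabularies, intent labels, and function names here are hypothetical placeholders and are not the project's actual word lists or code.

```python
# Hypothetical vocabularies; a real system would load much larger lists.
DISEASE_VOCAB = {"diabetes", "hypertension", "common cold"}
INTENT_KEYWORDS = {
    "disease_symptom": ["symptom", "sign of"],
    "disease_drug": ["drug", "medicine"],
}


def extract_entities(question, vocab):
    """Return the vocabulary terms that occur verbatim in the question."""
    text = question.lower()
    # Check longer terms first so multi-word names are listed first.
    ordered = sorted(vocab, key=len, reverse=True)
    return [term for term in ordered if term in text]


def classify_intent(question, intent_keywords):
    """Return every intent whose keyword list has a hit in the question."""
    text = question.lower()
    return [
        intent
        for intent, keywords in intent_keywords.items()
        if any(keyword in text for keyword in keywords)
    ]


question = "What medicine should I take for diabetes?"
entities = extract_entities(question, DISEASE_VOCAB)
intents = classify_intent(question, INTENT_KEYWORDS)

# A misspelled disease name is not in the vocabulary, so nothing is found.
missed = extract_entities("Which drug treats diabetis?", DISEASE_VOCAB)
```

The last call shows the weakness: any spelling or wording that is absent from the vocabulary produces no entity at all.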
Further Suggestions:
- Use an NER model such as BiLSTM+CRF, HMM, or another sequence model to extract entities. This would cover many more cases and avoid the limitation of vocabulary size.
- Apply a word-similarity method to map the entities extracted by the NER model to the standard keywords used in the SQL queries, then run the SQL query and return the required answer.
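The second suggestion could be sketched as follows. The source does not name a similarity measure, so this example swaps in character-level similarity from Python's standard `difflib` module as a stand-in; an embedding-based measure could take its place. The keyword list, the `normalize_entity` helper, and the cutoff value are assumptions for illustration.

```python
import difflib

# Hypothetical list of standard keywords known to the query layer.
STANDARD_DISEASES = ["diabetes", "hypertension", "common cold"]


def normalize_entity(mention, standard_terms, cutoff=0.8):
    """Map an extracted mention to the closest standard term, or None.

    Similarity is the difflib.SequenceMatcher ratio; the cutoff of 0.8
    is an arbitrary illustrative choice and would need tuning.
    """
    matches = difflib.get_close_matches(
        mention.lower(), standard_terms, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


normalized = normalize_entity("diabetis", STANDARD_DISEASES)
unknown = normalize_entity("xyz", STANDARD_DISEASES)
```

Returning `None` for mentions below the cutoff lets the caller fall back to a clarifying question instead of querying with a wrong keyword.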