Remove them.
- Assign each unique drugName and each unique condition an integer ID, and build a two-way mapping (name to ID and ID to name) for each feature. In the final output file, store the ID instead of the drugName/condition string.
- The earlier data analysis showed that some rows have a null condition entry; remove these rows.
- If possible, merge synonyms. It is unclear how these two features were collected; if they were entered by users, merging synonyms will be useful.
- Follow the preprocessing steps of this one, with one difference: we do not need to create the ‘sentiment_rate’ feature or implement the sentiment(review) function, since we are not doing a sentiment analysis task.
- Split each review into a list of words.
- Assign each distinct word a unique ID and build the same kind of two-way mapping. In the final output, store the word IDs, so that each review becomes a list of integers.
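The drugName/condition steps above (drop null conditions, merge synonyms, map names to IDs) might be sketched in plain Python as follows. This assumes each record is a dict with `drugName`, `condition` and `review` keys and that a missing condition shows up as `None` or a blank string. `SYNONYMS`, `canonicalize`, `BiMap` and `encode_drugs_conditions` are hypothetical names for illustration; the single synonym entry is a placeholder, and lowercasing plus whitespace collapsing is an assumed normalization, not a confirmed requirement.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table mapping normalized variants to one canonical name.
# The single entry is a placeholder; real entries must come from inspecting the data.
SYNONYMS: Dict[str, str] = {
    "high blood pressure": "hypertension",
}


def canonicalize(name: str, synonyms: Dict[str, str]) -> str:
    """Lowercase, collapse whitespace, then replace known synonyms."""
    key = " ".join(name.strip().lower().split())
    return synonyms.get(key, key)


class BiMap:
    """Two-way mapping between strings and consecutive integer IDs."""

    def __init__(self) -> None:
        self.to_id: Dict[str, int] = {}
        self.to_name: List[str] = []  # index = ID

    def add(self, name: str) -> int:
        # Return the existing ID if the name was already seen.
        if name not in self.to_id:
            self.to_id[name] = len(self.to_name)
            self.to_name.append(name)
        return self.to_id[name]


def encode_drugs_conditions(
    rows: List[dict], synonyms: Dict[str, str] = SYNONYMS
) -> Tuple[List[dict], BiMap, BiMap]:
    """Drop null-condition rows and replace drugName/condition with integer IDs."""
    drugs, conditions = BiMap(), BiMap()
    encoded = []
    for row in rows:
        condition = row.get("condition")
        # Skip rows whose condition is missing (None or blank).
        if condition is None or not condition.strip():
            continue
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["drugName"] = drugs.add(canonicalize(row["drugName"], synonyms))
        new_row["condition"] = conditions.add(canonicalize(condition, synonyms))
        encoded.append(new_row)
    return encoded, drugs, conditions


rows = [
    {"drugName": "Valsartan", "condition": "High Blood Pressure", "review": "..."},
    {"drugName": "valsartan", "condition": "Hypertension", "review": "..."},
    {"drugName": "Guanfacine", "condition": None, "review": "..."},
]
encoded, drugs, conditions = encode_drugs_conditions(rows)
print([(r["drugName"], r["condition"]) for r in encoded])  # [(0, 0), (0, 0)]
print(drugs.to_name, conditions.to_name)  # ['valsartan'] ['hypertension']
```

Because names are normalized before lookup, the ID-to-name direction returns the normalized spelling. If the data is loaded with pandas instead, `df.dropna(subset=["condition"])` removes the null-condition rows before encoding.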
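For the review steps (split into words, assign word IDs, store each review as a list of integers), a minimal sketch follows. It assumes a simple regex tokenizer (lowercase, keep runs of ASCII letters, digits and apostrophes) is acceptable; the preprocessing being followed may remove stop words or handle punctuation differently. HTML entities such as `&#039;` are decoded with `html.unescape` first, in case the raw reviews contain them. `tokenize`, `build_vocab` and `encode_reviews` are hypothetical helper names.

```python
import html
import re
from typing import Dict, List, Tuple

# Assumed tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(review: str) -> List[str]:
    """Split one review into lowercase word tokens."""
    # Decode HTML entities such as &#039; in case the raw text contains them.
    text = html.unescape(review).lower()
    return TOKEN_RE.findall(text)


def build_vocab(token_lists: List[List[str]]) -> Tuple[Dict[str, int], List[str]]:
    """Give each distinct word an ID in order of first appearance."""
    word_to_id: Dict[str, int] = {}
    id_to_word: List[str] = []  # index = ID, the reverse direction
    for tokens in token_lists:
        for word in tokens:
            if word not in word_to_id:
                word_to_id[word] = len(id_to_word)
                id_to_word.append(word)
    return word_to_id, id_to_word


def encode_reviews(
    reviews: List[str],
) -> Tuple[List[List[int]], Dict[str, int], List[str]]:
    """Turn each review into a list of word IDs, plus the two-way vocabulary."""
    token_lists = [tokenize(r) for r in reviews]
    word_to_id, id_to_word = build_vocab(token_lists)
    encoded = [[word_to_id[w] for w in tokens] for tokens in token_lists]
    return encoded, word_to_id, id_to_word


encoded, word_to_id, id_to_word = encode_reviews(
    ["It works well.", "It didn&#039;t work"]
)
print(encoded)  # [[0, 1, 2], [0, 3, 4]]
print([id_to_word[i] for i in encoded[1]])  # ['it', "didn't", 'work']
```

IDs follow order of first appearance; ordering the vocabulary by word frequency instead is a common alternative and only changes `build_vocab`.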