Sentiment Analysis for drugs/medicines
Use Case:
- The data contains text samples, each of which may mention one or more drugs. Each row is a unique combination of a text and a drug mention. The objective is to predict the sentiment of the texts in the test dataset, given the text and drug n
Solution:
- 1) The input dataset is "text". This unstructured data goes through raw data preprocessing followed by text preprocessing.
- 2) TF-IDF featurization converts the preprocessed text into vectors.
- 3) The sentiment class data is imbalanced.
- 4) Thus, the sentiment classes are over-sampled using SMOTE.
- 5) The misclassification error is plotted for each alpha value, and the best alpha value (the one with the lowest error) is used in the Naive Bayes classifier.
- 6) The Naive Bayes classifier is evaluated using a plotted confusion matrix.
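
The steps above can be sketched end to end in plain Python. This is a minimal from-scratch illustration, not the project's actual code, and it rests on several assumptions: "alpha" is taken to be the additive smoothing parameter of a multinomial Naive Bayes model, the alpha value is chosen on a small held-out validation set, the IDF uses a smoothed formula, and the SMOTE-style step interpolates between a minority sample and one of its nearest same-class neighbours. All function names (`fit_idf`, `vectorize`, `smote`, `train_nb`, `predict_nb`, `best_alpha`, `confusion_matrix`), the label names, and the toy texts are hypothetical. Plotting is left out; the per-alpha errors and the confusion matrix are returned as plain data that could be plotted.

```python
import math
import random
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for each whitespace token."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.lower().split()))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in sorted(df.items())}

def vectorize(docs, idf):
    """Dense TF-IDF vectors; tokens outside the fitted vocabulary are ignored."""
    vectors = []
    for d in docs:
        tf = Counter(d.lower().split())
        vectors.append([tf[t] * w for t, w in idf.items()])
    return vectors

def smote(X, y, k=2, seed=0):
    """SMOTE-style oversampling: grow each minority class to the majority size
    by interpolating between a sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pts = [x for x, lab in zip(X, y) if lab == label]
        if len(pts) < 2:
            continue  # a single sample has no neighbour to interpolate with
        for _ in range(target - count):
            base = rng.choice(pts)
            neighbours = sorted((p for p in pts if p is not base),
                                key=lambda p: math.dist(base, p))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out

def train_nb(X, y, alpha):
    """Multinomial Naive Bayes with additive smoothing alpha (alpha must be > 0)."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(len(rows) / len(X))
        model[c] = (log_prior, [math.log((t + alpha) / denom) for t in totals])
    return model

def predict_nb(model, X):
    """Return the class with the highest log-posterior score for each vector."""
    def score(c, x):
        log_prior, log_lik = model[c]
        return log_prior + sum(v * w for v, w in zip(x, log_lik))
    return [max(model, key=lambda c: score(c, x)) for x in X]

def best_alpha(X_tr, y_tr, X_val, y_val, alphas):
    """Misclassification error on the validation set for each alpha, plus the argmin."""
    errors = {}
    for a in alphas:
        pred = predict_nb(train_nb(X_tr, y_tr, a), X_val)
        errors[a] = sum(t != p for t, p in zip(y_val, pred)) / len(y_val)
    return min(errors, key=errors.get), errors

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

# Made-up toy data, for illustration only (imbalanced: 4 positive, 2 negative).
train_texts = ["this drug works great no side effects",
               "great relief with this medicine",
               "works well and i feel great",
               "really great results feel better",
               "terrible nausea and awful headache",
               "awful side effects felt terrible"]
train_labels = ["positive"] * 4 + ["negative"] * 2
val_texts = ["great relief feel better", "awful nausea terrible"]
val_labels = ["positive", "negative"]

idf = fit_idf(train_texts)
X_train, X_val = vectorize(train_texts, idf), vectorize(val_texts, idf)
X_bal, y_bal = smote(X_train, train_labels)  # oversample the training data only
alpha, errors = best_alpha(X_bal, y_bal, X_val, val_labels, [0.01, 0.1, 1.0])
y_pred = predict_nb(train_nb(X_bal, y_bal, alpha), X_val)
labels = sorted(set(train_labels))
cm = confusion_matrix(val_labels, y_pred, labels)
print("class counts after oversampling:", dict(Counter(y_bal)))
print("validation error per alpha:", errors)
print("confusion matrix (rows = true, cols = predicted):", cm)
```

In practice, library implementations are the more likely choice, for example scikit-learn's `TfidfVectorizer` and `MultinomialNB` together with `SMOTE` from imbalanced-learn. Note that this sketch does not L2-normalize the TF-IDF vectors, and that oversampling is applied only to the training data so that the validation error is measured on untouched samples.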