Sentiment analysis is a common Natural Language Processing (NLP) task. The goal is to analyze a text and predict whether its underlying sentiment is positive, negative or neutral. It can be approached with either supervised (labelled) or unsupervised (unlabelled) data.
Sentiment Analysis for drugs/medicines
Problem Statement:
Today, a brand's narrative is no longer built and controlled solely by the company that owns the brand. For this reason, companies continuously monitor blogs, forums and other social media platforms to gauge the sentiment around their own products and their competitors' products, and to learn how their brand resonates in the market. This analysis forms part of their post-launch market research. It is relevant to many industries, including pharmaceutical companies and their drugs.
Sentiment can be grouped into three major classes: positive, negative and neutral.
The data contains samples of text retrieved from various social media platforms. Each text can mention one or more drug names, and each row contains a unique combination of text and drug name. Note that the same text can carry a different sentiment for a different drug.
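To make the row layout concrete, here is a minimal Python sketch using hypothetical toy rows of the form (text, drug_name, sentiment); the rows, drug names and helper functions are illustrative assumptions, not taken from the actual dataset. It checks that each (text, drug) pair is unique and shows one text carrying different sentiments for different drugs.

```python
from collections import defaultdict

# Hypothetical toy rows (text, drug_name, sentiment); NOT from the real dataset.
rows = [
    ("Drug A cleared my symptoms but Drug B did nothing", "Drug A", "positive"),
    ("Drug A cleared my symptoms but Drug B did nothing", "Drug B", "negative"),
    ("Started Drug C today", "Drug C", "neutral"),
]

def check_unique_pairs(rows):
    """Return True if every (text, drug) pair occurs at most once."""
    pairs = [(text, drug) for text, drug, _ in rows]
    return len(pairs) == len(set(pairs))

def labels_per_text(rows):
    """Map each text to the set of sentiments it carries across drugs."""
    by_text = defaultdict(set)
    for text, _, label in rows:
        by_text[text].add(label)
    return dict(by_text)

print(check_unique_pairs(rows))  # True
print(labels_per_text(rows)[rows[0][0]])  # the two labels for the shared text
```

The second check makes the note above concrete: one text, two drugs, two different labels.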
Challenge:
The language used in this type of content is often not grammatically correct. Some users write sarcastically. Others cover several topics with different sentiments in a single post. Still others express their sentiment about a medicine only through comments and replies.
Solution:
The solution was built on the iVentura Machine Learning Platform, which provides a complete ecosystem for data scientists to build models without worrying about the underlying infrastructure and security. It suits both teams and individual data scientists.
To address this problem, the dataset is preprocessed, modelled and evaluated with suitable metrics to obtain the best outcome, as follows:
1) The input data is unstructured text, so it first goes through raw-data preprocessing followed by text preprocessing.
2) TF-IDF featurization converts the preprocessed text into numeric vectors.
3) The sentiment classes are imbalanced, so the minority classes are over-sampled using SMOTE (Synthetic Minority Over-sampling Technique).
4) The misclassification error is plotted for each candidate value of the smoothing parameter alpha, and the alpha with the lowest error is used in the Naive Bayes classifier.
5) A Multinomial Naive Bayes classifier (MultinomialNB) is used to predict the sentiment of each text.
6) The model is evaluated with a confusion matrix plot and the macro F1 score, and sentiment is then predicted on the test dataset.
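Steps 1 and 2 can be sketched in plain Python as below. The exact cleaning rules are not specified above, so the ones used here (HTML-tag and URL removal, lowercasing, keeping letters only, and a tiny hypothetical `STOPWORDS` set) are assumptions. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 with L2-normalised rows, which is also the default behaviour of scikit-learn's `TfidfVectorizer`; in practice that class would normally replace this hand-written version.

```python
import math
import re
from collections import Counter

# Assumed cleaning rules and stopword list; the original pipeline does not list its steps.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "it", "to", "of", "i", "my", "this"}

def preprocess(text):
    """Raw-data cleanup followed by text preprocessing; returns a list of tokens."""
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)       # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep lowercase letters only
    return [tok for tok in text.split() if tok not in STOPWORDS]

def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(doc, idf, vocab):
    """Raw term counts times IDF, then L2-normalised; terms outside vocab are ignored."""
    counts = Counter(doc)
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Hypothetical example posts.
texts = [
    "This drug is <b>great</b>, it helped a lot!",
    "Terrible side effects, see https://example.com",
    "Took the drug today.",
]
docs = [preprocess(t) for t in texts]
idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf_vector(doc, idf, vocab) for doc in docs]
print(docs[0])  # ['drug', 'great', 'helped', 'lot']
```

Terms that appear in many posts (here "drug") receive a lower IDF weight than rarer, more distinctive terms such as "great".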
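Step 3 can be illustrated with a from-scratch sketch of SMOTE: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest same-class neighbours. The function name `smote`, its parameters and the toy data are hypothetical; like the default `sampling_strategy` of imbalanced-learn's `SMOTE`, it raises every non-majority class to the majority-class count.

```python
import math
import random
from collections import Counter

def smote(X, y, k=5, seed=0):
    """Over-sample every non-majority class up to the size of the majority class.

    Each synthetic sample lies on the segment between a randomly chosen
    sample of that class and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue  # majority class, or too few samples to interpolate between
        for _ in range(target - count):
            i = rng.randrange(len(members))
            base = members[i]
            # Indices of the k nearest same-class neighbours, excluding base itself.
            neighbours = sorted(
                (j for j in range(len(members)) if j != i),
                key=lambda j: math.dist(base, members[j]),
            )[:k]
            other = members[rng.choice(neighbours)]
            gap = rng.random()
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out

# Hypothetical 2-D toy data: 4 "negative" rows, 2 "positive" rows.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, 1.0], [0.9, 1.0]]
y = ["negative"] * 4 + ["positive"] * 2
X_res, y_res = smote(X, y, k=1)
print(Counter(y_res))  # Counter({'negative': 4, 'positive': 4})
```

Over-sampling should be applied only to the training split, after TF-IDF featurization, so that synthetic points never leak into the data used for evaluation. With imbalanced-learn, the equivalent call is `SMOTE(k_neighbors=5).fit_resample(X, y)`.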
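Steps 4 and 5 can be sketched as a small multinomial Naive Bayes with additive smoothing `alpha`, plus a loop that measures the misclassification error for each candidate alpha and keeps the lowest. The source does not say how the error was measured; a held-out validation set is assumed here, and the class name, helper names, candidate alphas and toy count vectors are all hypothetical. The returned `errors` dictionary is what would be plotted against alpha. scikit-learn's `MultinomialNB(alpha=...)` implements the same model.

```python
import math
from collections import defaultdict

class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with additive (Lidstone) smoothing alpha > 0."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_features = len(X[0])
        self.classes_ = sorted(set(y))
        totals = {c: [0.0] * n_features for c in self.classes_}
        counts = defaultdict(int)
        for x, label in zip(X, y):
            counts[label] += 1
            totals[label] = [t + v for t, v in zip(totals[label], x)]
        # log P(class) from class frequencies
        self.log_prior_ = {c: math.log(counts[c] / len(y)) for c in self.classes_}
        # log P(feature | class) = log((N_cj + alpha) / (N_c + alpha * n_features))
        self.log_prob_ = {}
        for c in self.classes_:
            denom = sum(totals[c]) + self.alpha * n_features
            self.log_prob_[c] = [math.log((t + self.alpha) / denom) for t in totals[c]]
        return self

    def predict(self, X):
        def score(x, c):
            return self.log_prior_[c] + sum(v * lp for v, lp in zip(x, self.log_prob_[c]))
        return [max(self.classes_, key=lambda c: score(x, c)) for x in X]

def misclassification_error(model, X, y):
    """Fraction of samples whose predicted label differs from the true label."""
    preds = model.predict(X)
    return sum(p != t for p, t in zip(preds, y)) / len(y)

def select_alpha(X_train, y_train, X_val, y_val, alphas):
    """Return the alpha with the lowest validation error, plus all errors for plotting."""
    errors = {
        a: misclassification_error(MultinomialNaiveBayes(a).fit(X_train, y_train), X_val, y_val)
        for a in alphas
    }
    best = min(alphas, key=lambda a: errors[a])
    return best, errors

# Hypothetical toy feature vectors (e.g. counts of three indicative words).
X_train = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0], [1, 1, 3], [0, 0, 2]]
y_train = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
X_val = [[2, 0, 0], [0, 2, 1], [0, 0, 3]]
y_val = ["positive", "negative", "neutral"]

best_alpha, errors = select_alpha(X_train, y_train, X_val, y_val, [0.01, 0.1, 1.0, 10.0])
model = MultinomialNaiveBayes(best_alpha).fit(X_train, y_train)
print(best_alpha, errors)
```

Taking the minimum over the candidate list keeps the first alpha when several tie, which is one simple convention among several.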
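For step 6, the confusion matrix and macro F1 score reduce to simple counting, as in the sketch below (function names and toy labels are hypothetical). Macro F1 is the unweighted mean of the per-class F1 scores, so the minority sentiment classes count as much as the majority class. scikit-learn provides the same metrics as `confusion_matrix` and `f1_score(..., average="macro")`.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels and columns are predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); no TP gives F1 = 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical true and predicted labels.
labels = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive", "negative"]

print(confusion_matrix(y_true, y_pred, labels))  # [[2, 0, 0], [0, 1, 1], [0, 1, 1]]
print(round(macro_f1(y_true, y_pred, labels), 4))  # 0.6667
```

Here "negative" scores F1 = 1.0 while "neutral" and "positive" each score 0.5, so the macro average is 2/3.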