Doing Sentiment Analysis

Sentiment analysis is the contextual mining of text to identify and extract subjective information from source material. It helps a business understand the social sentiment around its brand, product, or service while monitoring online conversations. However, analysis of social media streams is usually restricted to basic sentiment analysis and count-based metrics. This only scratches the surface and misses the high-value insights that are waiting to be discovered. So what should a brand do to capture those insights?

With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably. Creative use of advanced artificial intelligence techniques can be an effective tool for doing in-depth research.

Problem Statement

The main objective of this internship project is to predict the sentiment of movie reviews obtained from the Internet Movie Database (IMDb). The dataset contains 50,000 movie reviews that have been pre-labeled with “positive” and “negative” sentiment class labels based on the review content. In addition, there are further movie reviews that are unlabeled.

The dataset can be obtained from http://ai.stanford.edu/~amaas/data/sentiment/, courtesy of Stanford University and Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. It is available as raw text as well as in an already processed bag-of-words format. We will only be using the raw labeled movie reviews for our analyses.
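As a rough sketch of how the raw labeled reviews might be read into memory, the snippet below assumes the archive has been extracted into a folder that keeps its `train/` and `test/` subfolders, each with `pos/` and `neg/` directories of `.txt` files. The function name `load_labeled_reviews` is our own choice for illustration and is not part of the project's utility scripts.

```python
from pathlib import Path


def load_labeled_reviews(root):
    """Read labeled reviews from an extracted dataset folder.

    Returns a list of (text, label) pairs, where label is
    "positive" or "negative".
    """
    root = Path(root)
    reviews = []
    # Labeled reviews are assumed to live under train/ and test/,
    # each with pos/ and neg/ subfolders of .txt files.
    for split in ("train", "test"):
        for folder, label in (("pos", "positive"), ("neg", "negative")):
            directory = root / split / folder
            if not directory.is_dir():
                continue  # tolerate a partially extracted archive
            for path in sorted(directory.glob("*.txt")):
                reviews.append((path.read_text(encoding="utf-8"), label))
    return reviews
```

The unlabeled reviews are simply ignored here, since only the labeled ones are used in our analyses.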

Our task is therefore to train our supervised models on 35,000 of the labeled reviews and to predict the sentiment of the remaining 15,000.
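The 35,000 / 15,000 split can be sketched as follows. This is a minimal illustration, not the project's own code: shuffling with a fixed seed is our assumption, and `split_reviews` is a hypothetical helper name.

```python
import random


def split_reviews(reviews, n_train=35000, seed=42):
    """Shuffle labeled reviews and split them into train and test lists."""
    shuffled = list(reviews)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]
```

With the full 50,000 labeled reviews and the default `n_train`, the second list holds the 15,000 reviews used for prediction.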

Supervised Learning

1. Setting up Dependencies

2. Text Normalisation (using Text_normalizer.py) & Feature Engineering

A text corpus consists of multiple text documents, and each document can range from a single sentence to a complete document with multiple paragraphs. Textual data, in spite of being highly unstructured, can be classified into two major types of documents. Factual documents typically contain statements or facts with no specific feelings or emotions attached to them. These are also known as objective documents. Subjective documents, on the other hand, contain text that expresses feelings, moods, emotions, and opinions.
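To give a feel for what text normalisation and feature engineering involve, here is a small sketch using only the Python standard library. It is not the contents of Text_normalizer.py. The stopword list is deliberately tiny, and the function names `normalize` and `bag_of_words` are assumptions made for this example.

```python
import html
import re
from collections import Counter

# A deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def normalize(text):
    """Strip HTML, lowercase, and drop non-letters and stopwords."""
    text = html.unescape(text)                      # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)            # remove tags such as <br />
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)
```

For example, `normalize("This movie was <br /> GREAT, great!")` yields the tokens `movie`, `great`, `great`, and `bag_of_words` turns them into per-word counts that a classifier can use as features.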

3. Model Training, Prediction and Evaluation (using Model_evaluation_util.py)

Here is the code of Supervised Learning

4. Summary

The model trained with traditional supervised learning achieves an F1-score of approximately 90% and an accuracy of approximately 90%.
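For readers who want to see how accuracy and F1-score are derived from predictions, the following sketch computes them by hand for the positive class. It is an illustration only and not the contents of Model_evaluation_util.py; an averaged F1 over both classes, as some reporting tools print, may differ slightly from this single-class value.

```python
def evaluate(true_labels, predicted_labels, positive="positive"):
    """Return accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```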

Unsupervised Lexicon Model

There are several popular lexicon models used for sentiment analysis. We will use the three lexicon models listed below:

· AFINN Lexicon

· SentiWordNet Lexicon

· VADER Lexicon

1. Setting up Dependencies

2. Sentiment Analysis using AFINN

Model Training, Prediction and Evaluation

Sentiment polarity is typically a numeric score that’s assigned to both the positive and negative aspects of a text document based on subjective parameters like specific words and phrases expressing feelings and emotion. Neutral sentiment typically has 0 polarity since it does not express any specific sentiment, positive sentiment will have polarity > 0, and negative sentiment will have polarity < 0. Of course, you can always change these thresholds based on the type of text you are dealing with; there are no hard constraints on this.
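The idea of summing word scores and applying a threshold can be sketched as follows. The mini-lexicon below is hypothetical and exists only for this example; its values are not taken from AFINN, which rates a much larger vocabulary. The names `polarity_score` and `label_from_score` are likewise our own.

```python
import re

# Hypothetical mini-lexicon for illustration only; not real AFINN values.
TOY_LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "awful": -3, "boring": -2}


def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of every word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)  # unknown words count as 0


def label_from_score(score, threshold=0):
    """Map a numeric polarity score to a sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Raising `threshold` widens the neutral band, which is one way to adjust the cut-offs for a particular kind of text.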

3. Sentiment Analysis using SentiWordNet

4. Sentiment Analysis using VADER

Here is the code of Unsupervised Lexicon Model

CONCLUSION

Comparing the overall F1-score and accuracy of the supervised ML model with those of the best unsupervised lexicon model, we conclude that supervised learning gives us a more accurate model than the unsupervised lexicon approach.
