Sentiment Analysis, also known as Opinion Mining, is a field of study that deals with the computational treatment of opinions, sentiments, evaluations, and emotions expressed in text. It is used to determine the polarity of a text, whether it is positive, negative, or neutral, and to quantify the degree of sentiment expressed. Sentiment analysis has become a critical tool for businesses, researchers, and governments to gain insights into public opinion and customer feedback, and to monitor social media for brand reputation management.
The Natural Language Toolkit (NLTK) is a popular open-source Python library that provides a comprehensive suite of tools for working with human language data. One of its most useful features for this task is SentimentIntensityAnalyzer, a ready-to-use, rule-based sentiment analyzer that needs no training on your own data.
The SentimentIntensityAnalyzer uses a lexicon- and rule-based approach. Each word in a sentence is looked up in a predefined sentiment lexicon to get a sentiment (valence) score, and a set of heuristics then adjusts those scores for negation, intensifiers such as "very", capitalization, punctuation, and contrastive conjunctions such as "but". The lexicon used by NLTK's SentimentIntensityAnalyzer is VADER (Valence Aware Dictionary and sEntiment Reasoner), which contains several thousand words, emoticons, and slang terms with their associated valence scores. These valences range from -4 (extremely negative) to +4 (extremely positive).
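To make the lookup step concrete, here is a minimal sketch assuming a tiny, hand-written lexicon. The TOY_LEXICON entries and their values are hypothetical and chosen only for illustration (they are not VADER's ratings), and word_valences is our own helper. Real VADER also adjusts these raw scores for context (negation, intensifiers, punctuation), which this sketch leaves out.

```python
import re

# Hypothetical mini-lexicon: word -> valence. Values are illustrative only,
# NOT the ratings stored in the real VADER lexicon.
TOY_LEXICON = {
    "love": 3.0,
    "rock": 1.5,
    "hate": -3.0,
    "terrible": -2.5,
    "boring": -1.5,
}


def word_valences(sentence):
    """Look up each lowercased token; words missing from the lexicon score 0."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(token, TOY_LEXICON.get(token, 0.0)) for token in tokens]


for token, valence in word_valences("I love this youtube video! You Rock."):
    print(f"{token:>8} {valence:+.1f}")
```

Unknown words score 0, which is why a sentence made up entirely of words outside the lexicon comes out neutral.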
Here is an example of sentiment analysis in Python using the Natural Language Toolkit (nltk) library:
```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# The VADER lexicon is not bundled with NLTK; download it once before first use.
nltk.download("vader_lexicon", quiet=True)

# Constructing the analyzer loads the lexicon, so create it once and reuse it.
analyzer = SentimentIntensityAnalyzer()


def sentiment_analysis(sentence):
    # Use the SentimentIntensityAnalyzer to compute the sentiment scores
    sentiment = analyzer.polarity_scores(sentence)
    # Categorize the sentiment as positive, negative, or neutral based on the compound score
    if sentiment["compound"] >= 0.05:
        sentiment_category = "Positive"
    elif sentiment["compound"] <= -0.05:
        sentiment_category = "Negative"
    else:
        sentiment_category = "Neutral"
    return sentiment, sentiment_category


# Test the sentiment analysis on some example sentences
sentences = [
    "I love this youtube video! You Rock.",
    "I hate this youtube video! You're Terrible.",
    "I feel so-so about your youtube videos.",
    "I feel so-so about your boring youtube videos.",
]
for sentence in sentences:
    sentiment, sentiment_category = sentiment_analysis(sentence)
    print("Sentence:", sentence)
    print("Compound score:", sentiment["compound"])
    print("Sentiment:", sentiment_category)
```
This code uses NLTK's SentimentIntensityAnalyzer to compute sentiment scores for each sentence. The polarity_scores method returns a dictionary with four values that describe the sentiment of the sentence: pos, neg, neu, and compound. The compound score is a single summary of the overall sentiment, ranging from -1 to 1: values close to 1 indicate positive sentiment, values close to -1 indicate negative sentiment, and values close to 0 indicate neutral sentiment. In this example, we use the compound score to categorize each sentence as positive, negative, or neutral, with the conventional cutoffs of +0.05 and -0.05 suggested in VADER's documentation.
What other tools are within SentimentIntensityAnalyzer?
The polarity_scores method of NLTK's SentimentIntensityAnalyzer returns a dictionary containing the following four values:
- pos: the proportion of the text that falls in the positive category, from 0 to 1.
- neg: the proportion of the text that falls in the negative category, from 0 to 1.
- neu: the proportion of the text that is neutral, from 0 to 1. Together, pos, neg, and neu add up to 1 (or very close to it, because of rounding).
- compound: the composite sentiment score, which summarizes the overall sentiment of the sentence. It is a float between -1 and 1, where positive values indicate positive sentiment, negative values indicate negative sentiment, and values close to 0 indicate neutral sentiment.
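The pos, neg, and neu proportions are derived from the per-word valences. The sketch below re-creates that step in plain Python, based on our reading of the sentiment-sifting logic in NLTK's nltk.sentiment.vader module. It assumes the valences have already been adjusted by VADER's rules and ignores the small punctuation-emphasis correction, so its numbers may differ slightly from polarity_scores. The name proportions is our own.

```python
def proportions(valences):
    """Approximate VADER-style pos/neg/neu from rule-adjusted word valences.

    Simplified sketch: the punctuation-emphasis correction is ignored.
    """
    pos_sum = 0.0
    neg_sum = 0.0
    neu_count = 0
    for v in valences:
        if v > 0:
            pos_sum += v + 1  # +1 compensates for neutral words each counting as 1
        elif v < 0:
            neg_sum += v - 1  # -1 does the same on the negative side
        else:
            neu_count += 1
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:  # no tokens at all
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


# Hypothetical adjusted valences for a five-word sentence
print(proportions([0.0, 3.0, 0.0, 0.0, -1.0]))
```

Because every token contributes to the same denominator, the three values behave like shares of one whole, which is why they sum to roughly 1.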
So in addition to the compound score, you can use the pos, neg, and neu scores to analyze a sentence's sentiment in more detail. For example, you could compare pos and neg to see which sentiment is more strongly expressed, or use neu to see how much of the text carries no sentiment at all.
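As one way to put these scores to work, the sketch below compares pos against neg in a dictionary shaped like polarity_scores output. The describe function and its margin of 0.1 are hypothetical choices for illustration, not part of NLTK, and the example dictionaries are hand-written rather than real analyzer output.

```python
def describe(scores, margin=0.1):
    """Compare pos and neg in a dict shaped like polarity_scores() output.

    Hypothetical helper, not part of NLTK. `margin` is an arbitrary cutoff:
    if pos and neg differ by less than it, the text is labeled "mixed".
    Returns (stronger_side, neutral_share).
    """
    diff = scores["pos"] - scores["neg"]
    if scores["pos"] == 0 and scores["neg"] == 0:
        stronger = "none"  # no sentiment-bearing words at all
    elif diff > margin:
        stronger = "positive"
    elif diff < -margin:
        stronger = "negative"
    else:
        stronger = "mixed"
    # neu is the proportion of the text that carries no sentiment
    return stronger, scores["neu"]


# Hand-written example dicts, not real analyzer output
print(describe({"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.6}))
print(describe({"neg": 0.3, "neu": 0.4, "pos": 0.3, "compound": 0.0}))
```

The "mixed" case is where these scores add the most over compound: in a review that praises one thing and criticizes another, positive and negative words can partly cancel and pull compound toward 0, while pos and neg both stay clearly above 0.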
In sentiment analysis, how is compound aggregated?
The compound score in NLTK's SentimentIntensityAnalyzer is not a weighted average of the pos, neg, and neu scores. Both the compound score and those three proportions are derived from the same word-level valence scores, which come from a lexicon-based approach: each word in a sentence is looked up in a predefined sentiment lexicon and given a valence score. For the compound score, these valences are adjusted by VADER's rules, summed, and the sum is normalized into the range -1 to 1.
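The normalization step can be written as a formula. In NLTK's nltk.sentiment.vader module, the rule-adjusted valence sum s (including any punctuation bonus) is mapped to the compound score with a constant α = 15 (the module's default), and the result is clipped to [-1, 1] as a safeguard. A worked example with a hypothetical sum of s = 2:

$$
\text{compound} = \frac{s}{\sqrt{s^{2} + \alpha}}, \qquad \alpha = 15
$$

$$
s = 2:\quad \frac{2}{\sqrt{4 + 15}} = \frac{2}{\sqrt{19}} \approx 0.459
$$

Because the denominator always exceeds |s|, the score approaches +1 or -1 as |s| grows but never quite reaches it, so strongly opinionated texts saturate near the ends of the range.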
In NLTK's SentimentIntensityAnalyzer, the sentiment lexicon is the VADER lexicon, which contains several thousand words, emoticons, and slang terms with their associated valence scores. Each valence is the mean of ratings given by human raters on a scale from -4 (extremely negative) to +4 (extremely positive).
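To compute the compound score for a sentence, each word's valence is first adjusted by VADER's heuristics: it is boosted or dampened by preceding intensifiers such as "very" or "slightly", flipped and scaled down by a nearby negation such as "not", boosted when the word is written in ALL CAPS amid mixed-case text, and weighted by its position relative to "but" (words before "but" count less, words after it count more). The adjusted valences are then summed, the sum is pushed further from zero by an emphasis bonus for exclamation marks (and for repeated question marks), and the result s is normalized as s / sqrt(s^2 + 15), which maps any real number into the range -1 to 1.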
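The aggregation can be sketched in plain Python. This is a simplified re-creation based on our reading of NLTK's nltk/sentiment/vader.py, not the library's own code: it assumes the per-word valences have already been adjusted by VADER's rules (negation, intensifiers, capitalization, "but"), which the sketch does not implement. The function names are illustrative, although NLTK's module does define a similar normalize function, and the punctuation constants should be checked against your installed version.

```python
import math


def punctuation_emphasis(text):
    """Bonus for '!' and '?', using the constants found in NLTK's vader.py."""
    ep_count = min(text.count("!"), 4)  # at most 4 exclamation marks count
    ep_amplifier = ep_count * 0.292
    qm_count = text.count("?")
    qm_amplifier = 0.0
    if qm_count > 1:  # a single '?' adds nothing
        qm_amplifier = qm_count * 0.18 if qm_count <= 3 else 0.96
    return ep_amplifier + qm_amplifier


def normalize(score, alpha=15):
    """Map an unbounded sum into [-1, 1]: score / sqrt(score^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def compound(valences, text):
    """Sum rule-adjusted valences, push the sum away from 0 by the
    punctuation bonus, then normalize and round to 4 decimals."""
    if not valences:
        return 0.0
    total = float(sum(valences))
    bonus = punctuation_emphasis(text)
    if total > 0:
        total += bonus
    elif total < 0:
        total -= bonus
    return round(normalize(total), 4)


# Hypothetical adjusted valences for a short text, not real VADER lexicon values
print(compound([2.0, 0.0], "Great video!"))
```

Because the bonus is added to the sum before normalization, punctuation moves compound further from 0 in whichever direction the words already point, but it can never change the sign, and a text whose valences sum to exactly 0 gets no bonus at all.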
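The exact formula is publicly available. VADER is open source, its rules and normalization are described in the original paper by Hutto and Gilbert (2014), and NLTK's implementation lives in the nltk.sentiment.vader module. Its authors report that it performs well, particularly on short social-media text.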