Data Analytics

Sentiment Analysis. How is it being solved today?

From a completely redesigned 3D Touch Home Button to the device’s blazing-fast A10 Fusion chip, the iPhone 7 is leagues away from its predecessor. Other particularly impressive features are its upgraded cameras, waterproof capabilities, and new colour options. While Apple might see the removal of the headphone jack as an innovation, many iPhone users have stated that its absence is a pretty big turn-off.

You were quick to read the colored phrases as positive, negative or mixed sentiment, right?

Monitoring data is one of the biggest challenges enterprises face as big data keeps growing and social media disrupts every industry. Nowadays, users turn to social media, blogs and customer complaint forums to share their experiences and to express dissatisfaction with poor products or services. Knowing and examining customer sentiment has become imperative. To do that, you need to analyze sentiments while handling diverse sources and different formats of structured and unstructured data.

 

Sentiment analysis, also known as opinion mining, is the process of computationally identifying and categorizing the opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.

 

Complexity –

-Sentiment analysis is not as easy as it seems. To a human reader it is just a matter of reading a statement and saying whether it is positive, negative or neutral. But automating the task takes fairly advanced machine learning algorithms. Things like sarcasm, poor spelling and lack of context (e.g. “the food was not that awful”, “the food was barely good”, “the food was too good”) make it difficult for a machine learning algorithm to understand the real sentiment of a text.
-Human agreement on sentiment – Don’t worry, finding the correct sentiment is not hard only for algorithms. When a group of people tag the sentiment of the same collection of reviews, they all agree on the same opinion only about 60 to 90% of the time.
-Next comes the topic or domain that the analysis is concerned with. While “cheap price” can be a good review for a restaurant, “cheap wood” can be a bad review for a furniture shop.

How do we solve it?     

Heavy terms like “machine learning” and “learning algorithm” may have left you wondering whether you need expertise in algorithm-based approaches that feel like pure science, especially if you have never dealt with them as a developer. Sadly, yes, you should know about them… but let’s not make it complicated. For starters, let’s discuss an introductory approach to sentiment analysis. These techniques come straight from our experience on real projects.

Python for sentiment analysis –

There are many other languages for data analysis, such as R and Java, but Python has the power of the Natural Language Toolkit (NLTK), an amazing library for working with natural language. It provides easy-to-use interfaces to corpora and lexical resources such as WordNet, along with text processing libraries for tokenizing, POS tagging, stemming and more. These features are very useful for manipulating data as our NLP tasks require.

Approach –

Based on the challenges in sentiment analysis, we sort the words that can occur in a statement into a handful of categories.

Generally, the sentiment of a statement is positive, negative or neutral. But wait a second: what about sarcasm, words that completely negate the meaning, or words such as adverbs that emphasize the sentiment?

We classify them as follows:

Positive –

good, great, amazing, fantastic.

– the food was great.

– martinis were good.

Negative –

bad, worst, horrible.

– it was the worst chicken i had ever had.

– the movie was bad.

Negation –

not, no longer, no way, couldn’t, shouldn’t, ain’t.

– the movie was not bad at all.

– we ain’t going there again.

Emphasizers –

Increment –

too, lot, vastly, more, many.

– the food was too good.

– it could have been a lot better.

Decrement –

little, acute, barely, hardly.

– we could hardly chew the food.

This classification is meant to catch every kind of word that expresses sentiment. But what do we do with these words?

In our approach, we have created general-purpose, dictionary-like files, one for each of these word categories. We can refresh these files from time to time using the large categorized word lists available on the internet.
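As a rough illustration of such dictionary files, here is a minimal sketch that assumes one plain-text file per category with one word or phrase per line. The file names (`positive.txt`, `negative.txt` and so on) and both function names are hypothetical; the article does not say how the files are actually laid out.

```python
from pathlib import Path


def load_word_list(path):
    """Read one word or phrase per line into a lowercase set, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def load_dictionaries(directory):
    """Load the five category files from a directory.

    The file names are an assumption made for this sketch.
    """
    categories = ["positive", "negative", "negation", "increment", "decrement"]
    return {name: load_word_list(Path(directory) / f"{name}.txt") for name in categories}
```

Keeping each category as a set makes the later "is this word in the dictionary?" checks cheap, and refreshing a category is just a matter of replacing its file.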

Tokenizing –

This creates the base structure for our sentiment score calculation: we have to extract the sentiment-bearing words before we can calculate a score.

– Each text is a list of sentences.

– Each sentence is a list of tokens (words).

Using NLTK we can get the text into the above format: we tokenize the statements into words and remove the unnecessary stop words.
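The target structure can be sketched as follows. In practice NLTK’s `sent_tokenize` and `word_tokenize` would normally do the splitting; this sketch swaps in plain regular expressions so that it runs without downloading any NLTK data. The `tokenize` function and the tiny `STOP_WORDS` set are hypothetical. One thing to watch: NLTK’s stock English stop word list includes words such as “not”, “no” and “too”, so it would need filtering before use here, otherwise the negators and emphasizers disappear before scoring.

```python
import re

# Hypothetical, deliberately tiny stop word list. It must not contain
# negators or emphasizers such as "not", "too" or "hardly".
STOP_WORDS = {"the", "a", "an", "was", "were", "is", "it", "to", "of", "and"}


def tokenize(text):
    """Return the text as a list of sentences, each a list of lowercase tokens."""
    # Normalize curly apostrophes so "couldn’t" and "couldn't" match the same entry.
    text = text.replace("\u2019", "'")
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    result = []
    for sentence in sentences:
        # Keep an inner apostrophe so "couldn't" and "ain't" stay single tokens.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            result.append(tokens)
    return result


print(tokenize("The food was too good. We could hardly chew the food!"))
# [['food', 'too', 'good'], ['we', 'could', 'hardly', 'chew', 'food']]
```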

Calculation –

This is the final and most important part of the process.

– Every tokenized word is checked for its presence in the available dictionary files (positive, negative, increment, decrement, negation).

Part 1 –

– We loop over the list of tokenized words and look each one up in the positive and negative dictionaries.

– For each match, we add 1 to the score for a positive word and subtract 1 for a negative word.
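Part 1 on its own can be sketched like this. The word sets are seeded with the example words listed earlier, standing in for the real dictionary files, and `base_score` is a hypothetical name.

```python
# Small stand-ins for the positive and negative dictionary files.
POSITIVE = {"good", "great", "amazing", "fantastic"}
NEGATIVE = {"bad", "worst", "horrible"}


def base_score(tokens):
    """Add 1 for every positive token and subtract 1 for every negative token."""
    score = 0
    for token in tokens:
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    return score


print(base_score(["food", "great"]))                    # 1
print(base_score(["worst", "chicken", "ever", "had"]))  # -1
print(base_score(["movie", "not", "bad", "at", "all"])) # -1
```

The last example scores “not bad at all” as negative, which is why negators and emphasizers also have to be taken into account.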

Part 2 –

– In the same pass we also check for negators and emphasizers: for the current word, every previous word is checked against the dictionaries.

(This covers cases such as “it was not as great as we expected”, “the fish was barely cooked to eat” and “they had many varieties in the menu”.)

– If such words are identified, then:

  1. Incremental words:

-the current score is multiplied by 2

-if there is no current score, it is set to +1

  2. Decremental words:

-the current score is divided by 2

-if there is no current score, it is set to -1

  3. Negating words:

-the current score is multiplied by -1

-if there is no current score, it is set to -1
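One possible reading of these rules is sketched below. It assumes that “current score” means the running score of the sentence, and that modifiers are applied in the order they appear, after the positive and negative words have been counted; the production code may order the checks differently. The word sets are stand-ins for the dictionary files, `sentence_score` is a hypothetical name, and multi-word negators such as “no longer” are left out because the lookup works on single tokens.

```python
# Small stand-ins for the five dictionary files (straight apostrophes assumed).
POSITIVE = {"good", "great", "amazing", "fantastic"}
NEGATIVE = {"bad", "worst", "horrible"}
NEGATION = {"not", "couldn't", "shouldn't", "ain't"}
INCREMENT = {"too", "lot", "vastly", "more", "many"}
DECREMENT = {"little", "acute", "barely", "hardly"}


def sentence_score(tokens):
    """Score one tokenized sentence using the Part 1 and Part 2 rules."""
    # Part 1: +1 per positive word, -1 per negative word.
    score = 0
    for token in tokens:
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1

    # Part 2: apply each modifier to the running score; when there is
    # no score yet (score == 0), fall back to the fixed value from the rules.
    for token in tokens:
        if token in INCREMENT:
            score = score * 2 if score else 1
        elif token in DECREMENT:
            score = score / 2 if score else -1
        elif token in NEGATION:
            score = -score if score else -1
    return score


print(sentence_score(["food", "too", "good"]))                   # 2
print(sentence_score(["movie", "not", "bad", "at", "all"]))      # 1
print(sentence_score(["we", "could", "hardly", "chew", "food"])) # -1
print(sentence_score(["they", "had", "many", "varieties"]))      # 1
```

Note that a sentence whose positive and negative words cancel out also counts as having “no current score” in this sketch, which may or may not be the intended behaviour.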

So, this is how we generalize the calculation of the sentiment score: a basic approach based on the kinds of words a statement may contain.

Isn’t that cool for starters interested in sentiment analysis? But this is only the very beginning.

There are many more machine learning techniques for sentiment analysis, but at an introductory level this approach works quite well. Keep improving your dictionaries over time, and have fun sentimentally analyzing.

 

Regardless of what SaaS vendors might claim, diving into sentiment analysis is not a task for amateurs, and data analytics projects should be handled by experienced data experts. Big data tools can provide unbiased insight into data generated from any source, supporting accurate decision making and implementation.

And if you want our help putting sentiment analysis into practice, just give us a shout!
