Over the past decade, there has been an exponential surge in online activity across the globe. The volume of posts made on the web every second runs into the millions. On top of this, the rise of social media platforms has led to a flood of content on the internet.
Social media is no longer just a place where people talk to each other; it has grown vast and serves many more purposes. It has become a medium where people:
- Express their interests.
- Share their views.
- Share their displeasures.
- Compliment companies for good service and criticize them for poor service.
So in this article, we are going to learn how to analyze what people are posting on social networks (Twitter) and build an application that helps companies understand their customers.
Before we dive further, let’s look at the table of contents of this article.
Table of contents:
- People’s emotions and how customers feel about the product
- How to create the Twitter app
- Sentiment analysis using Twitter tweets
  - Why sentiment analysis?
  - Challenges in performing sentiment analysis on Twitter tweets
- Implementing sentiment analysis application in R
  - Extracting tweets using Twitter application
  - Cleaning the tweets for further analysis
  - Getting sentiment score for each tweet
  - Segregating positive and negative tweets
- Conclusion
People’s emotions and how customers feel about the product
Social networks have grown from a mere chatting platform into a storehouse of data that could help companies solve many problems.
This data could help companies understand their customers better, see what competitors are doing, and find out what customers are saying about them.
Although, prima facie, social media looks like a storehouse of insights, extracting the relevant information from unstructured text is not easy. Analyzing textual data is always difficult because of the varied ways in which people write their posts.
Nevertheless, posts made by people on social media can be very expressive and help us understand their sentiments and emotions. Twitter, being one of the most popular social media platforms, is a place where people often go to express their emotions and sentiments about a brand, a product or a service.
How to create the Twitter app?
Twitter has made the task of analyzing tweets posted by users easier by developing an API which people can use to extract tweets and underlying metadata.
This API helps us extract Twitter data in a structured format, which can then be cleaned and processed further for analysis.
To create a Twitter app, you first need a Twitter account. Once you have created one, visit Twitter’s app page and create an application.
Enter the basic details such as the application name and description, along with a website name. You may enter any test website name as well. Once you have entered these details, you will get 4 keys and access tokens:
- Consumer Key (API Key)
- Consumer Secret (API Secret)
- Access Token
- Access Token Secret
These keys and tokens will be used to extract data from Twitter in R.
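Because these four values work like passwords, you may prefer not to paste them directly into a script. The following is a minimal sketch of one way to do that, assuming the values have been stored in environment variables. The helper name `read_twitter_keys` and the variable names are hypothetical; they are not part of any Twitter package.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of hard-coding them. The variable names are assumptions.
read_twitter_keys <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  keys <- Sys.getenv(vars, unset = NA)   # NA marks a variable that is not set
  names(keys) <- names(vars)
  missing <- names(keys)[is.na(keys) | keys == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  as.list(keys)
}

# Usage (after setting the environment variables):
# keys <- read_twitter_keys()
# keys$consumer_key
```

The returned list can then be passed, element by element, to the authentication call used later in the article.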
Sentiment Analysis Using Twitter tweets
Before going further into the technical aspects of sentiment analysis, let’s first understand why we even need it.
Why sentiment analysis?
Let’s take a company’s perspective and understand why a company would want to invest time and effort in analyzing the sentiment of posts. Analyzing each post and understanding the sentiment associated with it helps us find out which key topics or themes resonate well with the audience.
If the sentiment around a post is very positive, then people want to talk about the topic of that post. The topic could be a product, a service, a social message or anything else. Understanding this can help us decide what kind of posts the company should put on social media platforms to increase user engagement.
Also, analyzing the sentiment around a company over a period could help us relate its sales data to the overall sentiment. This lets us address questions such as:
- Was there a negative campaign at some point that led to negative sentiment about the company and, thereby, to a decline in sales during that period?
- Was there a huge spike in positive sentiment because a celebrity talked about the company’s product?
- Did that positive spike result in higher sales?
- What common themes run through the posts with negative sentiment?
- Is customer service a common topic among posts with strongly negative emotion?
All these questions could help us understand how customers perceive the company: what they are saying about its products, what they like and what they dislike.
I am sure you will agree with me when I say, “Sentiment analysis of tweets or social media posts can help companies better analyze customer feedback and opinion, and better position their strategy.”
Challenges in performing sentiment analysis on Twitter tweets
For all the use cases of sentiment analysis, there are a few challenges in analyzing tweets. The first one is data quality. The Twitter application helps us overcome this problem to an extent, because it returns the data in a structured format.
After basic cleaning of the data extracted through the Twitter app, we can use it to generate a sentiment score for each tweet. The second problem is understanding and analyzing the slang used on Twitter.
People write in different ways, and when posting on Twitter they are rarely concerned with correct spelling. They may also use a lot of slang: words that are not proper English but are common in informal conversation.
There is a lot of research going on in this area, and many people have developed slang dictionaries that map such words to their meaning. We won’t focus on this part in this article; we will use the standard dictionaries and packages available in R for sentiment analysis.
The third and biggest problem in sentiment analysis is decoding sarcasm. Since sentiment analysis works on the meanings of individual words, it is difficult to detect whether a post is sarcastic.
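To make the slang and sarcasm problems concrete, here is a small sketch of a word-counting scorer. It is a toy written for illustration only: the lexicon `toy_lexicon` and the function `score_text` are made up, and this is not how any particular R package computes its scores.

```r
# Toy lexicon (made up for illustration; real lexicons are far larger).
toy_lexicon <- c(great = 1, love = 1, good = 1,
                 bad = -1, terrible = -1, hate = -1)

# Score a text by summing the lexicon values of the words it contains.
# Words that are not in the lexicon contribute nothing.
score_text <- function(text, lexicon = toy_lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

score_text("The service was terrible")                  # plain negative wording
score_text("this phone is lit")                         # slang: no word is in the lexicon
score_text("Oh great, another delay. I love waiting.")  # sarcasm: only positive words are seen
```

The slang sentence scores zero because none of its words are known, and the sarcastic sentence receives a positive score even though the writer is clearly unhappy.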
Implementing sentiment analysis application in R
Now, we will try to analyze the sentiments of tweets made by a Twitter handle. We will develop the code in R step by step and see the practical implementation of sentiment analysis in R.
The code is divided into the following parts:
- Extracting tweets using Twitter application
- Cleaning the tweets for further analysis
- Getting sentiment score for each tweet
- Segregating positive and negative tweets
Extracting tweets using Twitter application
We will first install the packages we need. To extract tweets from Twitter, we need the ‘twitteR’ package.
The ‘syuzhet’ package will be used for sentiment analysis, while the ‘tm’ and ‘SnowballC’ packages are used for text mining and analysis.
```r
# Install required packages (only needed once)
install.packages("SnowballC")
install.packages("tm")
install.packages("twitteR")
install.packages("syuzhet")

# Load required packages
library("SnowballC")
library("tm")
library("twitteR")
library("syuzhet")
```
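Note that `installed.packages()` only lists what is already on your machine, while `install.packages()` is the function that downloads and installs a package. The two can be combined to install only what is missing. A minimal sketch follows; the helper name `missing_packages` is made up for this example.

```r
# Packages used in this article.
required <- c("SnowballC", "tm", "twitteR", "syuzhet")

# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```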
Next, we will invoke the Twitter API using the app we have created and the keys and access tokens we got through the app.
```r
# Authentication keys (placeholders; replace with your own values)
consumer_key    <- 'ABCDEFGHI1234567890'
consumer_secret <- 'ABCDEFGHI1234567890'
access_token    <- 'ABCDEFGHI1234567890'
access_secret   <- 'ABCDEFGHI1234567890'

# Authenticate with the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Request up to 200 tweets from the handle's timeline
tweets <- userTimeline("realDonaldTrump", n = 200)
n.tweet <- length(tweets)
```
We have invoked the Twitter app and extracted data from the Twitter handle ‘@realDonaldTrump’. We will now look at the format of the extract and the steps we need to take to clean the data.
Cleaning the tweets for further analysis
```r
# Convert the list of tweets to a data frame
tweets.df <- twListToDF(tweets)

head(tweets.df)
```
We get a total of 16 variables from the ‘userTimeline’ output once it is converted to a data frame; a snapshot of the sample data is shown below.
The field ‘text’ contains the tweet part, hashtags, and URLs. We need to remove hashtags and URLs from the text field so that we are left only with the main tweet part to run our sentiment analysis.
Our current text field looks like this:
```
> head(tweets.df$text)
[1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… https://t.co/4GQmdSmiRk"
[2] "This is your land, this is your home, and it’s your voice that matters the most. So speak up, be heard, and fight,… https://t.co/u09Brwnow3"
[3] "Just arrived at the Pensacola Bay Center. Join me LIVE on @FoxNews in 10 minutes! #MAGA https://t.co/RQFqOkcpNV"
[4] "On my way to Pensacola, Florida. See everyone soon! #MAGA https://t.co/ijwxVSYQ52"
[5] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… https://t.co/ujuFLRG8lc"
[6] "MAKE AMERICA GREAT AGAIN! https://t.co/64a93S07s7"
```
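This contains a lot of URLs, hashtags and other Twitter handles. We will remove all these using the gsub function. Note that each pattern ends in `.*`, so everything that follows the first URL, hashtag or handle in a tweet is removed as well.
```r
# ".*" matches to the end of the string, so each call removes everything
# from the first match onward, not just the URL, hashtag, or handle itself.
tweets.df2 <- gsub("http.*", "", tweets.df$text)  # URLs (also matches https)
tweets.df2 <- gsub("https.*", "", tweets.df2)     # kept from the original; no effect after the line above
tweets.df2 <- gsub("#.*", "", tweets.df2)         # hashtags
tweets.df2 <- gsub("@.*", "", tweets.df2)         # Twitter handles
```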
Our output now looks like this:
```
> head(tweets.df2)
[1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… "
[2] "This is your land, this is your home, and it’s your voice that matters the most. So speak up, be heard, and fight,… "
[3] "Just arrived at the Pensacola Bay Center. Join me LIVE on "
[4] "On my way to Pensacola, Florida. See everyone soon! "
[5] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… "
[6] "MAKE AMERICA GREAT AGAIN! "
```
Now, we have only the relevant part of the tweets and we can run our sentiment analysis part on the data.
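The patterns used above end in `.*`, which is why the third tweet lost its closing words after the handle. If you would rather remove only the URL, hashtag or handle itself, a more targeted set of patterns can be used. The sketch below is one possible variant, not the code used for the rest of this article; the function name `clean_tweet` is made up, and the scores shown later were computed with the original patterns, so results from this version would differ.

```r
# Remove only the URL, hashtag, or handle itself, not the rest of the tweet.
clean_tweet <- function(x) {
  x <- gsub("https?://\\S+", "", x)  # URLs
  x <- gsub("#\\w+", "", x)          # hashtags
  x <- gsub("@\\w+", "", x)          # Twitter handles
  x <- gsub("\\s+", " ", x)          # collapse repeated whitespace
  trimws(x)                          # drop leading and trailing spaces
}

clean_tweet("Just arrived at the Pensacola Bay Center. Join me LIVE on @FoxNews in 10 minutes! #MAGA https://t.co/RQFqOkcpNV")
# [1] "Just arrived at the Pensacola Bay Center. Join me LIVE on in 10 minutes!"
```

Because `gsub` is vectorized, the same function can be applied to the whole `text` column at once.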
Getting sentiment score for each tweet
We will first get the emotion scores for each of the tweets. The ‘get_nrc_sentiment’ function of the ‘syuzhet’ package scores each text on 10 dimensions: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) and two sentiments (negative and positive).
```r
word.df <- as.vector(tweets.df2)

# One row per tweet, one column per emotion or sentiment
emotion.df <- get_nrc_sentiment(word.df)

# Put the tweet text next to its emotion scores
emotion.df2 <- cbind(tweets.df2, emotion.df)

head(emotion.df2)
```
```
> head(emotion.df2)
  tweets.df2                                                                                                              anger anticipation disgust
1 We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION…      0            0       0
2 This is your land, this is your home, and it’s your voice that matters the most. So speak up, be heard, and fight,…       1            0       0
3 Just arrived at the Pensacola Bay Center. Join me LIVE on                                                                 0            0       0
4 On my way to Pensacola, Florida. See everyone soon!                                                                       0            0       0
5 “The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th…      0            0       1
6 MAKE AMERICA GREAT AGAIN!                                                                                                 0            0       0
  fear joy sadness surprise trust negative positive
1    0   1       0        0     3        0        2
2    1   0       0        0     0        1        1
3    0   0       0        0     1        0        2
4    0   0       0        0     0        0        0
5    1   0       0        0     1        1        1
6    0   0       0        0     0        0        0
```
The above output shows us the different emotions present in each of the tweets.
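A common next step is to add up each emotion across all tweets to see which ones dominate. The sketch below does this for the six rows printed above, re-entered by hand so that it runs without the Twitter data; the object names `emotion.sample` and `emotion.totals` are made up for this example.

```r
# The six rows of emotion scores printed above, re-entered by hand so the
# example runs without access to the Twitter data.
emotion.sample <- data.frame(
  anger        = c(0, 1, 0, 0, 0, 0),
  anticipation = c(0, 0, 0, 0, 0, 0),
  disgust      = c(0, 0, 0, 0, 1, 0),
  fear         = c(0, 1, 0, 0, 1, 0),
  joy          = c(1, 0, 0, 0, 0, 0),
  sadness      = c(0, 0, 0, 0, 0, 0),
  surprise     = c(0, 0, 0, 0, 0, 0),
  trust        = c(3, 0, 1, 0, 1, 0),
  negative     = c(0, 1, 0, 0, 1, 0),
  positive     = c(2, 1, 2, 0, 1, 0)
)

# Total score per emotion across the tweets, largest first.
emotion.totals <- sort(colSums(emotion.sample), decreasing = TRUE)
emotion.totals
```

With the full data, the same `colSums` call on `emotion.df` gives the totals over all tweets, and the result can be passed to `barplot` for a quick chart.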
Now, we will use the get_sentiment function to extract sentiment score for each of the tweets.
```r
# One sentiment score per tweet
sent.value <- get_sentiment(word.df)

# Tweet(s) with the highest score
most.positive <- word.df[sent.value == max(sent.value)]
most.positive

# Tweet(s) with the lowest score
most.negative <- word.df[sent.value <= min(sent.value)]
most.negative
```
```
> most.positive
[1] "Stock Market hits new Record High. Confidence and enthusiasm abound. More great numbers coming out!"
> most.negative
[1] "Horrible and cowardly terrorist attack on innocent and defenseless worshipers in Egypt. The world cannot tolerate t… "
```
Let us look at the score calculated for each of the tweets. In all, there are 154 tweets that we are evaluating, so there should be 154 scores, one for each tweet. A positive score indicates positive sentiment, a negative score indicates negative sentiment, and zero is neutral.
```
> sent.value
  [1]  1.55 -0.50  0.50  0.00 -0.60  0.50 -0.75  0.50  1.00  1.55  0.00 -1.00  1.85  0.00  0.50 -0.50  1.55  0.50  0.25  0.75  0.50  0.50  2.75
 [24]  0.85  0.75 -0.25 -0.50  0.40 -1.75 -1.75 -1.60  0.50 -1.65  0.75  1.00 -1.35  0.50  0.25 -2.60  0.00  1.15  0.25 -1.25 -0.50 -2.75 -1.10
 [47] -2.25  1.85  0.60  0.00  2.10  0.50 -0.25  3.05 -0.25 -0.75 -0.75  0.05 -0.85  0.00 -0.75  0.00  2.80  1.50  0.75  0.00 -0.05  0.65 -0.75
 [70] -0.50  2.25 -1.75  0.00  0.75  0.75  1.55  0.15  0.65  0.15  0.80  0.00 -0.10 -2.00 -3.25 -3.45 -0.10  0.00 -1.50  0.50  0.50  0.00  2.25
 [93]  1.55  0.80  0.50  0.00  2.35  0.30 -0.25  0.60  0.00  0.65  0.80  0.55  0.40  1.15 -0.10 -1.35  0.00  1.35 -1.00  0.00 -1.10 -1.10  0.00
[116] -1.15  1.95  1.50  1.55  0.00  0.50 -0.50 -0.75  0.50  0.75  0.70  0.25  0.75  1.25 -0.25 -1.95 -2.75  1.25 -0.75 -0.40  0.50  0.50 -0.50
[139]  0.00  2.85  1.25  0.50  1.50  0.50  0.40  0.00  0.50  0.50  1.00  1.00  2.05  0.25  0.50  0.50
```
Segregating positive and negative tweets
Now, we will segregate positive and negative tweets based on the score assigned to each of the tweets.
```
> positive.tweets <- word.df[sent.value > 0]
>
> head(positive.tweets)
[1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… "
[2] "Just arrived at the Pensacola Bay Center. Join me LIVE on "
[3] "MAKE AMERICA GREAT AGAIN! "
[4] "LAST thing the Make America Great Again Agenda needs is a Liberal Democrat in Senate where we have so little margin… "
[5] "Big crowd expected today in Pensacola, Florida, for a Make America Great Again speech. We have done so much in so s… "
[6] "I fulfilled my campaign promise – others didn’t! "
```
```
> negative.tweets <- word.df[sent.value < 0]
>
> head(negative.tweets)
[1] "This is your land, this is your home, and it’s your voice that matters the most. So speak up, be heard, and fight,… "
[2] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… "
[3] "Fines and penalties against Wells Fargo Bank for their bad acts against their customers and others will not be drop… "
[4] "Across the battlefields, oceans, and harrowing skies of Europe and the Pacific throughout the war, one great battle… "
[5] "National Pearl Harbor Remembrance Day – “A day that will live in infamy!” December 7, 1941"
[6] "Putting Pelosi/Schumer Liberal Puppet Jones into office in Alabama would hurt our great Republican Agenda of low on… "
```
```
> neutral.tweets <- word.df[sent.value == 0]
>
> head(neutral.tweets)
[1] "On my way to Pensacola, Florida. See everyone soon! "
[2] "Tonight, "
[3] "Today, the U.S. flag flies at half-staff at the "
[4] "Biggest Tax Bill and Tax Cuts in history just passed in the Senate. Now these great Republicans will be going for f… "
[5] "Our FIFTH 1K milestone of 2017!\n"
[6] "The only people who don’t like the Tax Cut Bill are the people that don’t understand it or the Obstructionist Democ… "
```
```r
# Alternate way to classify as Positive, Negative or Neutral tweets
category_senti <- ifelse(sent.value < 0, "Negative",
                         ifelse(sent.value > 0, "Positive", "Neutral"))

head(category_senti)
```
```
> head(category_senti)
[1] "Positive" "Negative" "Positive" "Neutral"  "Negative" "Positive"
> # Combine the cleaned tweet text, its category and its score
> category_senti2 <- cbind(tweets = word.df, category_senti, senti = sent.value)
> head(category_senti2)
     tweets category_senti senti
[1,] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… " "Positive" "1.55"
[2,] "This is your land, this is your home, and it’s your voice that matters the most. So speak up, be heard, and fight,… " "Negative" "-0.5"
[3,] "Just arrived at the Pensacola Bay Center. Join me LIVE on " "Positive" "0.5"
[4,] "On my way to Pensacola, Florida. See everyone soon! " "Neutral" "0"
[5,] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… " "Negative" "-0.6"
[6,] "MAKE AMERICA GREAT AGAIN! " "Positive" "0.5"
```
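Notice that the scores in the combined output above appear in quotes: `cbind` on character and numeric vectors returns a character matrix, so the scores are no longer numbers. If you want to keep sorting or averaging them, a data frame may be a better container. The sketch below illustrates this with the first six scores from the output above and shortened stand-in tweet text; the names `senti.df`, `scores` and `texts` are made up for this example.

```r
# First six scores from the output above, with shortened stand-in tweet text.
scores <- c(1.55, -0.50, 0.50, 0.00, -0.60, 0.50)
texts  <- c("We believe that ...", "This is your land ...", "Just arrived ...",
            "On my way ...", "The unemployment rate ...",
            "MAKE AMERICA GREAT AGAIN!")

# A data frame keeps each column's own type: text, number, label.
senti.df <- data.frame(
  tweet    = texts,
  score    = scores,
  category = ifelse(scores < 0, "Negative",
                    ifelse(scores > 0, "Positive", "Neutral")),
  stringsAsFactors = FALSE
)

# The score column stays numeric, so it can be sorted or averaged directly.
senti.df[order(senti.df$score), c("score", "category")]
mean(senti.df$score)
```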
So, now we have analyzed the Twitter handle of Donald Trump and obtained the sentiment of its tweets. The breakdown of the total number of tweets by sentiment is:
```
> table(category_senti)
category_senti
Negative  Neutral Positive
      49       20       85
```
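Counts are easier to compare across handles or time periods when they are expressed as shares. As a small sketch, the counts from the table above can be converted to percentages with `prop.table`; the object names `counts` and `shares` are made up for this example.

```r
# Counts taken from the table above.
counts <- c(Negative = 49, Neutral = 20, Positive = 85)

# Share of each category, as a percentage rounded to one decimal place.
shares <- round(100 * prop.table(counts), 1)
shares
```

With the real data, the same result can be obtained by applying `prop.table` directly to `table(category_senti)`.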
Conclusion
I’m sure you can now relate to the significance of sentiment analysis that I discussed at the beginning of the article.
Sentiment analysis can be extended much further, even to images. Although there are a lot of tools already available in the market, having practical knowledge of how the entire process works is beneficial.
Moreover, many of the available tools are expensive and do not offer the level of flexibility and customization that you can achieve by building your own solution in R.