Many articles in the literature apply linguistic and sentiment analysis to newspaper text, text from financial data services, and microblogging messages. The ease of collecting comprehensive datasets and the growing population of microblogging users make this a productive field of study for researchers.
Bagehot (1971) introduced a new idea to the literature. He emphasized the concept of information asymmetry and the effect of information on trading, without formalizing it in a mathematical model. This basic idea inspired many researchers to study information-based markets in the following years (Storkenmaier, 2011).
Glosten and Milgrom (1985) presented a formal model that developed Bagehot’s (1971) idea. They examined how market participants with superior information play a role in informing the market (Glosten & Milgrom, 1985).
Antweiler and Frank (2004), in a study published in The Journal of Finance, investigated 1.5 million messages posted on Yahoo Finance about 45 firms in the DJIA. They concluded that stock-related messages can be used to forecast stock market volatility significantly.
Among the early studies on this topic, Trahan and Bolster (1997) analyzed a popular financial news weekly, Barron’s, to examine the relation between stock prices and the magazine’s Investment News and Views section, as well as articles containing purchase recommendations. They found that second-hand information in the media had a significant positive effect on stock prices.
Read (2005) noted that sentiment classification seeks to determine whether a piece of text is positive or negative according to its author’s general feeling. The author demonstrated that the match between training and test data with respect to domain and time is also important, and presented preliminary experiments with data labelled with emoticons such as “:-)”1 and “:-(”2 to form a training set for sentiment classification. Emoticons can be independent of domain, topic and date (Read, 2005).
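The emoticon-based labelling idea can be sketched as follows. This is a minimal illustration, not Read's actual procedure: the emoticon lists, the function names and the rule of discarding texts with mixed or no emoticons are all assumptions made for the example.

```python
# Hypothetical emoticon inventories (assumption); a real study would use more.
POSITIVE_EMOTICONS = (":-)", ":)")
NEGATIVE_EMOTICONS = (":-(", ":(")


def label_by_emoticon(text):
    """Return (cleaned_text, label), or None if the text is unusable.

    A text is usable only if it contains emoticons of exactly one polarity.
    The emoticons are removed so a classifier cannot simply memorize them.
    """
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or mixed polarity
        return None
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, " ")
    label = "positive" if has_pos else "negative"
    return " ".join(text.split()), label


def build_training_set(texts):
    """Keep only the texts that could be labelled by their emoticons."""
    labelled = (label_by_emoticon(t) for t in texts)
    return [pair for pair in labelled if pair is not None]


if __name__ == "__main__":
    sample = ["great day :-)", "lost money again :-(", "no emoticon here"]
    for text, label in build_training_set(sample):
        print(label, "|", text)
```

Because the label comes from the emoticon rather than from topic-specific vocabulary, a training set built this way can in principle be collected for any domain or time period.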
Tetlock (2007) used the Wall Street Journal to investigate the effect of the media on the stock market. He constructed a media pessimism factor to forecast the price, volume and performance of DJIA stocks. The author concluded that negative news in the media influences returns, but that this effect is temporary. Moreover, trading volume is predictable when media pessimism is unusually high or low.
Tetlock et al., (2008)
Choudhury et al. (2008) developed a model of communication dynamics in the blogosphere. Using these dynamics, they identified notable correlations with stock market movement and observed that communication activity was significantly correlated with the stock market. They used two baseline methods and a Support Vector Machine3 (SVM), and they reported an average accuracy of 78% in predicting the magnitude of stock market movement and 87% in predicting its direction.
Giller (2009) examined a small dataset from an experiment in using Twitter to publicise a record of directional intraday index futures trades. Using the maximum likelihood ratio test, the author found a positive correlation between trading success and an increase in the number of followers.
Go et al. (2009) designed an algorithm that classifies Twitter messages as positive or negative with respect to a query term. The authors achieved high accuracy in classifying the sentiment of Twitter messages using machine learning methods.
Gloor et al. (2009) presented an unconventional study of social network analysis. The study introduced algorithms for mining the Web, blogs, and online forums to identify trends and find the investors who start these new trends. The authors showed a correlation between Web buzz and real-world events.
Word of mouth (WOM) is defined as the transfer of information from one individual to another by oral communication. Moreover, WOM is regarded as a more trustworthy source of information about brands and products. With developing technology, electronic WOM is now the online transfer of information from one internet user to another. Jansen et al. (2009) examined more than 150,000 microblog posts from social media sites containing branding discussions, sentiments, and perceptions. They aimed to clarify the impact of microblogging, as a form of electronic WOM, on brand information and brand relationships. Within this framework, they analysed the timing, frequency, and range of tweets. They concluded that microblogging is an online tool for both individuals and companies: for individuals it is a part of word-of-mouth communication, and for companies it is a tool to see the effect of their marketing strategy.
Bollen et al. (2010) collected tweets over time and related them to the DJIA. They aimed to measure collective mood states. For this purpose, the authors defined six mood dimensions4. Their results indicated that including public mood dimensions can significantly improve DJIA predictions. Moreover, they reported an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the Dow Jones index. In addition, the authors reduced the Mean Average Percentage Error by more than 6%.
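The two evaluation measures mentioned here, directional accuracy and percentage error, can be computed as in the sketch below. It assumes the standard definition of the mean absolute percentage error and a simple sign-agreement rule for direction; the function names and the toy numbers are illustrative and are not taken from Bollen et al.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def direction_accuracy(previous, actual, predicted):
    """Percentage of days on which the predicted and the actual close
    moved in the same direction relative to the previous close."""
    hits = sum(
        (a > prev) == (p > prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return 100.0 * hits / len(previous)


if __name__ == "__main__":
    previous = [10.0, 10.0, 10.0, 10.0]   # toy closing values, day t-1
    actual = [11.0, 9.0, 12.0, 8.0]       # toy closing values, day t
    predicted = [12.0, 11.0, 11.0, 9.0]   # toy model output for day t
    print(mape(actual, predicted))
    print(direction_accuracy(previous, actual, predicted))
```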
Sprenger et al. (2010) examined approximately 250,000 Twitter messages related to S&P 100 companies on a daily basis, using methods from computational linguistics and Naïve Bayesian classification. The authors reported that message volume, together with abnormal stock returns, contains valuable information for forecasting the following day's trading volume. Moreover, they pointed out that users who provide above-average investment advice are retweeted more often and have more followers.
Pak et al. (2010) presented a method for automatically collecting a corpus that can be used to train a sentiment classifier. They collected 300,000 text posts from Twitter, performed a linguistic analysis of the corpus for sentiment analysis and opinion mining purposes, and explained the phenomena they discovered. Using the corpus, they built a sentiment classifier that identifies positive, negative and neutral sentiment in a text. Their evaluations showed that the proposed method is effective and performs better than previously proposed methods (Pak & Patrick, 2010).
Castillo et al. (2011) studied the credibility of news information in a given set of tweets. They showed that messages about newsworthy topics can be separated from other kinds of messages, and that the level of social media credibility of newsworthy topics can be assessed automatically. A few authors write credible news, and this news is spread by other online users through re-posting. There are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80% (Castillo, et al., 2011).
Bollen et al. (2011) performed sentiment mining on Twitter posts. They used a psychometric instrument to extract six mood states, as in their 2010 study. This time, however, they used a different set of six moods (tension, depression, anger, vigor, fatigue, confusion). Twitter messages were aggregated daily, and a six-dimensional mood vector was computed from the postings (Bollen, et al., 2011). They analysed the specific effects of economic, political, cultural, social and other major events on the six-dimensional Profile of Mood States (POMS). They found that analyses of public mood can detect the emotional trends of society. Furthermore, these trends can provide indicators for predicting economic events.
Zhang et al. (2011) analysed Twitter posts in order to predict U.S. stock market indicators. They collected Twitter feeds for six months. In addition, they measured collective hope and fear each day and examined their relationship with stock market indicators. They reported a negative correlation between the emotional content of tweets and the Dow Jones, S&P 500 and NASDAQ indexes. On the other hand, the authors found a significant positive correlation with the Chicago Board Options Exchange Volatility Index. Moreover, they showed that when emotions on Twitter surge, that is, when people express a lot of hope, fear and worry, the Dow Jones index goes down the next day. In contrast, when people express less hope, fear and worry, the Dow Jones index goes up the next day. Therefore, tracking opinion on Twitter is a useful predictor of the next day's stock market (Zhang, et al., 2011).
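A next-day relationship of this kind is usually examined with a lagged correlation: the emotion measure on day t is paired with the index change on day t + 1. The sketch below assumes a plain Pearson correlation and made-up daily series; the function names and the data are hypothetical and do not reproduce the computation in Zhang et al.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, target, lag=1):
    """Correlate signal on day t with target on day t + lag."""
    if not 0 < lag < len(signal):
        raise ValueError("lag must be between 1 and len(signal) - 1")
    return pearson(signal[:-lag], target[lag:])


if __name__ == "__main__":
    # Toy data (assumption): share of emotional tweets and index change per day.
    emotional_share = [0.10, 0.14, 0.09, 0.20, 0.12, 0.18]
    index_change = [0.3, 0.5, -0.4, 0.6, -0.9, 0.2]
    print(lagged_correlation(emotional_share, index_change, lag=1))
```

A negative value for lag 1 would correspond to the pattern described above, where more emotional tweets on one day are followed by a falling index on the next.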
Rao and Srivastava (2012) investigated the relationship between Twitter messages and the price, volume and volatility of the DJIA, the NASDAQ-100 index, and 13 technology companies. They reported a correlation of 88% between Twitter sentiment and stock prices. They also presented an equation to predict stock returns with a high R-squared value (95.2%).
Mao et al. (2012) applied a regression model with exogenous input to forecast stock market indicators, using Twitter data as the exogenous input. The results showed that tweets are correlated with stock market indicators (Mao, et al., 2012). Moreover, they found that Twitter is a useful tool for forecasting the stock market.
Sprenger et al. (2014) presented a methodology to identify news events based on social media. They applied computational linguistics to more than 400,000 stock-related Twitter messages about S&P 500 companies and distinguished good news from bad news. They found that the returns before good news events are more pronounced than the returns before bad news events. They also showed that the stock market effect of news events differs across news categories.
Türkmenoğlu and Tantuğ (2014) compared lexicon-based and machine learning methods. They analysed a Turkish Twitter dataset and a movie dataset using these two approaches in order to show their strengths and weaknesses.
Çoban et al. (2015) collected Turkish Twitter messages to investigate classification methods for sentiment analysis. They evaluated the Naive Bayes, Multinomial Naive Bayes, k-Nearest Neighbors and Support Vector Machine learning algorithms. The n-gram model performed better than the bag-of-words model. Also, Naïve Bayes exhibited the best performance among the algorithms (Çoban, et al., 2015).
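The difference between the two feature models can be illustrated with a short sketch. It assumes whitespace tokenization and word bigrams; the n-gram configuration actually used by Çoban et al. is not stated here, so the choice of word-level bigrams and all function names are assumptions.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def bag_of_words(text):
    """Count each word, ignoring word order."""
    return Counter(tokenize(text))


def word_ngrams(text, n=2):
    """Count each run of n consecutive words, which keeps local word order."""
    tokens = tokenize(text)
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    a = "good not bad"
    b = "bad not good"
    print(bag_of_words(a) == bag_of_words(b))  # same words, same counts
    print(word_ngrams(a) == word_ngrams(b))    # different word order
```

The two example sentences contain the same words, so the bag-of-words model cannot tell them apart, while the bigram features differ. This is one plausible reason why an n-gram model can outperform bag-of-words on short texts.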
Ranco et al. (2015) collected 15 months of Twitter messages to examine the relations between Twitter sentiment, Twitter volume and the abnormal returns of the 30 companies in the DJIA index. They reported a significant dependence between abnormal returns and Twitter sentiment when Twitter volume reached its peak level. Furthermore, the authors showed that Twitter volume at its peak level can forecast the direction of stock returns.
Eliaçık and Erdoğan (2015) proposed a new user metric method. To test the method, they calculated the sentiment polarity of the financial community on social media. They analysed the weekly correlation between the BIST 100 index and the financial community’s sentiment. Using this new method, they found a significant linear relationship between the market and the financial community’s sentiment.
Dickinson and Hu (2015) performed opinion extraction on stock-related tweets in order to show a correlation between sentiment and stock price movement. They used n-gram and “word2vec” techniques to classify the tweets. They found a significant correlation between the stock market and sentiment. Microsoft and Walmart showed a positive correlation, whereas Goldman Sachs and Cisco Systems showed a strong negative correlation. They suggested that consumer-facing companies interact with sentiment differently from other companies.
Heston and Sinha (2016) used a dataset of more than 900,000 news stories. They tested whether news can predict stock returns, using textual analysis of news stories based on a neural network. They confirmed that daily news can forecast stock returns for a few days, a result that also confirms previous research. On the other hand, weekly news forecasts stock returns for one quarter (Heston & Sinha, 2016). After positive news stories, stock returns increase quickly, whereas negative stories produce a long-delayed response (Heston & Sinha, 2016).
Akgul et al. (2016) tested Twitter sentiment software. The program classifies each tweet as positive, negative or neutral. They used the program to assess which of the n-gram and lexicon methods performs better. They concluded that the lexicon method performs better.
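A lexicon method of the kind compared here can be sketched in a few lines: each tweet is scored by counting words from a positive and a negative word list. The two tiny English word lists, the punctuation handling and the function names are assumptions for illustration; they are not the lexicon or the software tested by Akgul et al.

```python
# Hypothetical toy lexicons (assumption); real lexicons hold thousands of entries.
POSITIVE_WORDS = {"good", "great", "gain", "rise"}
NEGATIVE_WORDS = {"bad", "poor", "loss", "fall"}


def lexicon_score(text):
    """Number of positive words minus number of negative words."""
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    return positives - negatives


def classify(text):
    """Map the lexicon score to one of three sentiment classes."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Great gain today!", "Bad loss.", "The market opened."]:
        print(classify(tweet), "|", tweet)
```

Unlike an n-gram classifier, this approach needs no labelled training data, but its quality depends entirely on the coverage of the word lists.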
Kaynar et al. (2016) analysed the performance of classification algorithms such as the Multi-Layer Perceptron, Support Vector Machines, the Centroid-Based Classifier and Naive Bayes. They used the content of Internet Movie Database (IMDb) reviews for the analysis. They found that the neural network and SVM5 performed better.
Kordonis et al. (2016) collected Twitter data and applied Bernoulli Naive Bayes and Support Vector Machine classifiers to analyse Twitter sentiment. As a result, they found a correlation between Twitter sentiment and stock prices.
Joshi et al. (2016) researched the relation between financial news articles and stock trends. They aimed to capture the relation between stock trends and news sentiment. Support Vector Machines6 and the Random Forest model7 showed good test performance. Naïve Bayes8 also performed relatively well, but not better than the other two (Joshi, et al., 2016). Overall, the authors stated that the accuracy of the prediction model is more than 80%.
Baykara and Gürtürk (2017) performed sentiment analysis on Twitter using a Bayes algorithm. They determined whether Twitter posts are positive, negative or neutral. In addition, they classified users’ Twitter messages as news, politics, or culture according to their content.
Kebabcı and Diri (2017) developed a system to classify Turkish tweets. They applied Naive Bayes and Support Vector Machines together for classification. They used the Hybrid TF-IDF method to summarise the classified tweets. In conclusion, the study was able to identify the public opinion of Twitter users.
Kürkçü (2017) aimed to determine the degree of interaction with online news media. To this end, she collected daily data from the official Twitter accounts of Turkish news agencies and newspapers. She calculated the ratio of “retweets” and “likes” on tweets in the 48 hours after a news item appeared in order to measure users’ interaction levels.
Zhang et al. (2017) aimed to handle full tweet content in different languages. They therefore developed a multilingual tweet classification method that supports over 40 languages. They encoded the sequence of characters in a tweet according to their UTF-8 codes and applied a character-based CNN classification method. They showed that the UniCNN model performs better than traditional methods and is fully language independent. Also, the code-based system does not require any tokenization or translation.
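The input step of such a character-based model can be sketched as follows. The example assumes that each tweet is turned into a fixed-length sequence of UTF-8 byte values, with 0 reserved for padding; the function name, the shift by one and the `max_len` parameter are illustrative assumptions, since the exact encoding used by UniCNN is not described here.

```python
def encode_tweet(text, max_len=16):
    """Map a tweet to a fixed-length list of integers in the range 0..256.

    Each UTF-8 byte (0..255) is shifted by one so that 0 can serve as the
    padding value. Longer tweets are truncated, which may cut a multi-byte
    character in the middle.
    """
    codes = [byte + 1 for byte in text.encode("utf-8")][:max_len]
    return codes + [0] * (max_len - len(codes))


if __name__ == "__main__":
    # The same function handles any script without tokenization or translation.
    for tweet in ["hi", "borsa yükseldi", "株価"]:
        print(encode_tweet(tweet, max_len=16))
```

Because every language is reduced to the same small byte alphabet, the network that consumes these sequences needs no language-specific vocabulary.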
Social media may affect the financial markets (Jadhav & Wakode, 2017). Accordingly, the authors calculated Twitter sentiment scores in their 2017 study. They tested different techniques in order to forecast the stock market. Moreover, Jadhav and Wakode (2017) sought to improve and accelerate computational performance.
3 A Support Vector Machine is a learning algorithm that analyses data for classification and regression analysis.
4 Calm, Alert, Sure, Vital, Kind, and Happy
5 Support Vector Machines
6 90% correctly classified
7 80% correctly classified
8 75% correctly classified