Analysing news sentiment using Clojure, Elasticsearch and Kibana.
I've been working on a project recently to do with analysing the underlying sentiment found in online news stories. I wanted to have a play with Kibana and build some "real time" visualisations without using log files.
So I wrote a small Clojure program (fewer than 100 lines) that would:
- Query various popular news sites and analyse words using an open source lexicon dataset.
- Filter out words that carry little sentiment, like "the".
- Sanitize any text that might contain HTML markup.
- Store the results in Elasticsearch.
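The Clojure source isn't published yet, so the following is only a minimal Python sketch of the same steps, not the author's code. It assumes a lexicon that maps each word to an integer score (AFINN-style lists work this way), a small stopword set, and that a story's score is the sum of its word scores. `LEXICON`, `STOPWORDS` and the document field names are hypothetical, and fetching the news sites is left out. The sketch strips HTML before scoring so that tag names never reach the lexicon.

```python
import json
import re
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical miniature lexicon: word -> integer sentiment score.
# A real run would load a full open source lexicon; these entries are illustrative.
LEXICON = {"good": 3, "great": 3, "happy": 3, "bad": -3, "crisis": -3, "killed": -3}

# Hypothetical stopword list: low-impact words dropped before scoring.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment, discarding tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>, whose contents are not text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(markup):
    """Return the visible text of `markup` with all tags removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(markup):
    """Sanitize, filter and score one headline or story.

    Assumption: the score is the sum of the lexicon values of the
    remaining words; unknown words count as 0.
    """
    words = [w for w in tokenize(strip_html(markup)) if w not in STOPWORDS]
    return sum(LEXICON.get(w, 0) for w in words)


def to_es_document(source, markup):
    """Build a JSON-serialisable document (hypothetical field names) for indexing."""
    return {
        "source": source,
        "text": strip_html(markup),
        "sentiment": score_text(markup),
        "@timestamp": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    doc = to_es_document("BBC News", "<p>A <b>great</b> day, a happy crowd</p>")
    print(json.dumps(doc, indent=2))
```

Each resulting document could then be sent to Elasticsearch's document index API (`POST /<index>/_doc`), with the `@timestamp` field giving Kibana a time axis to plot against.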
I've left it running for ~24 hours and the results are interesting, if not very scientifically accurate.
It has aggregated data from the following sources:
- BBC News
- Sky News
- Fox News
- Huffington Post
- Daily Mail
- CNN
- Telegraph
- Reuters
- Yahoo News
- Google News
All of the results below assume 0 is a neutral sentiment, <0 is a negative sentiment and >0 is a positive one.
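As a small illustration of that convention, here is a Python sketch that labels a score. The optional `tolerance` parameter is a hypothetical addition: averaged scores are rarely exactly 0, so it lets values near zero count as neutral. With the default of 0 it matches the convention above exactly.

```python
def sentiment_label(score, tolerance=0.0):
    """Label a score: 0 is neutral, below 0 negative, above 0 positive.

    `tolerance` (hypothetical) widens the neutral band to [-tolerance, tolerance].
    """
    if score > tolerance:
        return "positive"
    if score < -tolerance:
        return "negative"
    return "neutral"


print(sentiment_label(-2))                 # negative
print(sentiment_label(0.3, tolerance=0.5)) # neutral
```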
Daily Mail (MailOnline)
Everyone knows the Daily Mail is typical tabloid news, so I expected it to generate quite a negative sentiment. It was consistently the most negative.
Huffington Post
I hadn't realised it before, but the Huffington Post has quite a lot of positive news as well as some reasonably humorous articles. It was consistently the most positive. It also appears to publish a LOT, and very frequently, which makes the graph a bit jumpy.
BBC News
I expected the BBC to be quite consistent and to represent a good view of overall news sentiment. It was more negative than I expected; I guess that is just the world we live in.
All of the other news sources (Sky, Fox, CNN, Telegraph, Reuters, Yahoo, Google) trended much closer to a neutral sentiment.
Overall
I think the overall sentiment may be a little off because it includes the Huffington Post data. In the long run, I think the more sources I add, the better the data will be.
I'm planning to leave this running over a longer period to see how it gets on. It would be interesting for it to pick up a major news event; we will see.
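One way to see why a prolific source can pull the overall figure is to compare two ways of averaging. This Python sketch assumes the overall number is a plain average over every stored article (the post doesn't say how the dashboard computes it) and contrasts it with averaging each source first. The source names and scores are invented purely for illustration; they are not measured data.

```python
from statistics import mean


def pooled_mean(scores_by_source):
    """Average over every article, so sources that publish more get more weight."""
    all_scores = [s for scores in scores_by_source.values() for s in scores]
    return mean(all_scores)


def mean_of_source_means(scores_by_source):
    """Average each source first, then weight every source equally."""
    return mean(mean(scores) for scores in scores_by_source.values())


# Hypothetical per-article scores, purely illustrative (not measured data).
example = {
    "prolific_positive_source": [2, 3, 2, 3, 2, 3, 2, 3],
    "quiet_negative_source": [-2, -3],
    "quiet_neutral_source": [0, 1, -1],
}

print(pooled_mean(example))           # 15/13, roughly 1.15
print(mean_of_source_means(example))  # 0.0
```

With equal weighting per source, the one prolific positive source no longer dominates; adding more sources also dilutes any single outlier, which fits the expectation that more sources should give better data.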
For anyone interested, here is a screenshot of the Kibana dashboard.
I'll do a more technical post soon and make available the code once I'm finished.