Ethan Chiu My personal blog

Creating Realtime, Client-side Twitter Analysis Script for the United Timeline

I recently started working on a project that shows and analyzes real-time Conservative, Moderate, and Liberal viewpoints across social media called the United Timeline.

Here’s the mission statement: “The media constantly bombards us with polarizing rhetoric that support our own viewpoints. Without viewing other perspectives, we create an echo chamber that limits the extent to which we can have an open dialogue that critically engages with important topics and situations. Ignoring arguments from the other political side creates divisiveness, furthering the divide between Americans and bringing no real progress when it comes to reforms and policies in Congress and elsewhere.

The United Timeline provides real-time social media posts and analysis of liberal, moderate, and conservative viewpoints. Our goal is to bridge the divide between liberals, conservatives, and moderates by sharing the current conversation from each side on social media.”

Currently, we show what Twitter news feeds would look like from those different viewpoints.To achieve this, I started out creating the Twitter timelines for Conservative, Moderate, and Liberal viewpoints. This was pretty straightforward. I created a Twitter account and created three different lists containing the most prominent and intelligent pundits from each side. Then, I simply embedded them side by side.

After that, I wanted to generate word clouds of the most recent Tweets. One way that instantly popped up in my mind was to simply use Twitter’s API. Unfortunately, the API has limits that would easily be passed for real-time analysis. Another method was using Selenium to constantly scrape the most recent Tweets. This would require me to create a server with a backend and also might not work since Twitter blocks scraping after a certain limit.

So, I created a way to analyze the 20 most recent Tweets from each list all in the client-side (no backend). As a result, I don’t have to rely on backend processing or worry about server costs. To analyze these Tweets in real time, I first analyzed how Twitter’s embedded lists load onto a site. I saw that the embedded link transformed into a asynchronous script, meaning it loads in the background of the website as the user interacts with the list. As a result, I programmed a script which checks when the async Twitter script is loaded for each list. Afterwards, it grabs the top 20 Tweets’ text content using jQuery to identify the class names. Then, I use regex to eliminate any urls, @ tags, or punctuation so that only words are generated in the word clouds. After that, I calculate the word frequency and format that into a list that the wordcloud2 Javascript library can understand.

So, my script allows the user to see realtime graphs of the 20 most recent Tweets from each list every time he or she refreshes the page!

Here’s the code:

$( document ).ready(function() {
	var $canvas = $('#word_cloud');

	//Temp solution since Twitter uses async for embeded links.
	generateWordCloud();
});

//Edited https://stackoverflow.com/questions/30906807/word-frequency-in-javascript
function wordFreq(string) {
		// get rid of urls, @ mentions, punctionation + spaces
		var no_url = string.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '').replace(/\S*@\S*\s?/g, "").replace(/(?:(the|a|an|and|of|what|to|for|about) +)/g, "").replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"");
    var words = no_url.replace(/[.]/g, '').split(/\s/);
    var freqMap = {};
    words.forEach(function(w) {
        if (!freqMap[w]) {
            freqMap[w] = 0;
        }
        freqMap[w] += 1;
    });

    return freqMap;
}

function generateWordFreqs(raw_data, name) {
	var tweets = [];
	for(var i = 0; i<raw_data.length; i++){
		tweets.push(raw_data[i].innerText);
	}
	//Regex to elimate tags, etc. + join words into one coherent group
	tweets = tweets.join(' ');

	listA = wordFreq(tweets);
	var list = [];
	for (var key in listA)
	{
		list.push([key, listA[key]*5]);
	}
	WordCloud.minFontSize = "15px";
	WordCloud($('#' + name + '_word_cloud')[0], { list: list });
}

function generateWordCloud(){
	setTimeout(
	    function() {
	    	//Set up variables, get document values from iframes loaded asyncronously
				//Work around for not being able to explicitly call $("#twitter-widget-.....")
	    	var liberal = $(document.getElementById('twitter-widget-0').contentWindow.document).find(".timeline-Tweet-text");
	    	var moderate = $(document.getElementById('twitter-widget-1').contentWindow.document).find(".timeline-Tweet-text");
	    	var conservative = $(document.getElementById('twitter-widget-2').contentWindow.document).find(".timeline-Tweet-text");

	    	//Generate Word Clouds
				generateWordFreqs(liberal, "liberal");
				generateWordFreqs(moderate, "moderate");
				generateWordFreqs(conservative, "conservative");
	    }, 250);
}

Here’s the link where you can see this all in action. Screenshot United Timeline Analysis

Obviously, this isn’t the prettiest solution out there. I could of used recursion and for loops to eliminate some repetition, but I feel like this is the clearest explanation. For a full list of my other attempts for analyzing the Tweets in realtime completely client-side, please check out the Javascript file on Github.

Since this script only works for 20 of the most recent Tweets, I’m most likely going to create a Python backend to process more Tweets (like every Tweet in a day) and present different word clouds daily. This would also allow me to use libraries such as NTLK to filter out connecting words.

We are currently working on adding more analytic tools and integrations. This website is completely open source and open to contributions. We are currently working on adding more analytic tools and integrations. If you have any suggestions/concerns, feel free to open up an issue over here.