Twitter News Project
Members : Me and Rohan Anand
Current Status :
Have collected around 3 million tweets.
Have obtained 1 lakh (100,000) tweets with resolved URLs.
Running a script to resolve the URLs of more tweets.
Have completed the previously decided analysis and are doing further analysis (on the 1 lakh tweets):
- Temporal analysis of the users' tweeting patterns. We defined the notion
of a 'switch factor' per user, which is the number of times a user switches
back to a previous topic/category of tweet.
- This analysis shows that people are likely to tweet about one particular category
in one go and are unlikely to return to old topics later.
- Here is the plot suggesting the above idea: plot
- Preliminary results for different news sources, categories and subcategories.
- Ranking of the categories based on retweet count and retweet ratio.
- User-based analysis: calculating each user's interests from his/her tweeting pattern,
and clustering users based on their interests in different topics.
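The 'switch factor' described above can be sketched roughly as follows. This is only one reading of the definition, not necessarily the script we ran: it assumes each user's tweets are already labelled with a category and sorted by time, and it counts a switch whenever the user moves to a category they had already tweeted about and then left. The function name `switch_factor` and the category labels are made up for illustration.

```python
def switch_factor(categories):
    """Count how often a user returns to a category they had already left.

    `categories` is the time-ordered list of category labels of one
    user's tweets (an assumed input format).
    """
    seen = set()       # categories the user has tweeted about so far
    previous = None    # category of the previous tweet
    switches = 0
    for category in categories:
        if category != previous:
            # The user changed category; it is a "switch back" only if
            # the new category was already visited earlier.
            if category in seen:
                switches += 1
            seen.add(category)
        previous = category
    return switches


# Tweets in one go per category: no switch back.
print(switch_factor(["world", "world", "sports", "sports"]))
# One return to 'world' after tweeting about 'sports'.
print(switch_factor(["world", "sports", "world"]))
```

Under this reading, a low switch factor for most users matches the observation that people tweet about one category in one go.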
Sixth and Seventh Week Updates
Interesting results found :
- Did URL-based analysis on the 1 lakh tweets.
- Categorised the tweets based on the categories in the URL (currently only for nytimes).
- Ranked the categories based on the number of tweets.
- The most tweeted category is 'world', which reflects the global audience of nytimes.
- Movies and technology are not among the top ten.
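A minimal sketch of the URL-based categorisation and ranking, under the assumption that a resolved nytimes article URL has the form `/YYYY/MM/DD/<category>/.../<slug>`, so that the category is the first path segment after the date. The helper names `nytimes_category` and `rank_categories` and the sample URLs are hypothetical; URLs that do not fit the assumed pattern are simply skipped.

```python
from collections import Counter
from urllib.parse import urlparse


def nytimes_category(url):
    """Return the category segment of a nytimes article URL, or None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not (host == "nytimes.com" or host.endswith(".nytimes.com")):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed layout: year, month, day, category, ..., article slug.
    if len(parts) >= 5 and all(p.isdigit() for p in parts[:3]):
        return parts[3]
    return None


def rank_categories(urls):
    """Rank categories by the number of tweeted URLs, most tweeted first."""
    counts = Counter()
    for url in urls:
        category = nytimes_category(url)
        if category is not None:
            counts[category] += 1
    return counts.most_common()


sample = [
    "http://www.nytimes.com/2012/02/14/world/asia/example-a.html",
    "http://www.nytimes.com/2012/02/15/world/europe/example-b.html",
    "http://www.nytimes.com/2012/02/15/sports/example-c.html",
]
print(rank_categories(sample))
```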
Fifth Week Updates
- The URL-resolving script was quite expensive, taking 2 sec on average to resolve one URL.
- Tried various methods to optimize this process, but with no luck.
- So decided to start the analysis on a smaller sample set of 15,000 tweets with resolved URLs and
look at the results for bigger sets later.
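For reference, URL resolution of this kind can be sketched as below. This is not the script we ran, only an illustration using the Python standard library: each short URL is resolved by following redirects with a HEAD request, and several URLs are resolved at once from a thread pool so that network waits overlap. The names `resolve_url` and `resolve_many`, the timeout and the worker count are assumptions, and whether this actually helps depends on the proxy and on rate limits of the URL shorteners.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import Request, urlopen


def resolve_url(short_url, timeout=5):
    """Follow redirects with a HEAD request; return the final URL or None."""
    try:
        request = Request(short_url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError, HTTPException):
        # Network errors, HTTP errors and malformed URLs are all
        # treated as "could not resolve".
        return None


def resolve_many(short_urls, resolver=resolve_url, workers=20):
    """Resolve many URLs concurrently; returns {short_url: final_url_or_None}."""
    short_urls = list(short_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the inputs.
        return dict(zip(short_urls, pool.map(resolver, short_urls)))
```

The `resolver` parameter is there so the resolution step can be swapped out, for example with a cached lookup or with a stub when testing without network access.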
Fourth Week Updates
- Got a cloud service on a one-month free trial, which ran for one week and collected around 1 million new tweets.
- Started running the script on this data to get the full-length URLs. This is taking a lot of both time and proxy resources.
We need more internet proxy and computing resources for this.
- Thinking of keywords to use for classifying these URLs.
Third Week Updates
- Tried but couldn't start collecting data on the cloud because of technical infeasibility.
- Will use a live machine to get tweets as soon as we get hold of one.
- Have written a script to unshorten the URLs in the old Twitter data, so we now have the full-length URLs.
- Will start classifying these URLs.
Second Week Updates
More papers: Here
In bib format: Bib file
- CSC admin denied permission to get the proxy removed from the cloud.
- Will install a browser and log in from it to start polling data.
Some interesting Articles
First Week Updates (up till the meeting on Tue, 17th Jan)