Uncovering key characteristics of a Twitter community network

About the project

We collaborated with researchers from Pomona College to analyze network connections in Twitter social graphs.

Location:

United States

Industry:

Research

Services:

Data analytics

Business type:

University

Challenges

The research team reached out to us because their data analysis process was too slow for their needs.

The dataset consists of 6.5 TB of tweets. By the team's own estimates, a single analysis would take several months to complete, and it would have to be repeated every time they adjusted the parameters. They needed to cut that time drastically.

Solution

The logical first step was to make the dataset smaller by keeping only the data that is actually valuable for the research and discarding the rest, so we started with filtering. For example, the research did not use tweets containing pictures or tweets written in languages other than English.
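A filter like this can be expressed as a single streaming pass over the data. The sketch below is a minimal illustration, not the code used in the project: it assumes the tweets are stored as newline-delimited JSON with Twitter API v1.1-style fields (`lang`, `entities.media`, `extended_entities.media`), and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Assumption: tweets use Twitter API v1.1-style JSON, where attached
# pictures appear under "entities.media" or "extended_entities.media".
MEDIA_KEYS = ("entities", "extended_entities")


def has_media(tweet: dict) -> bool:
    """Return True if the tweet carries attached media such as pictures."""
    return any((tweet.get(key) or {}).get("media") for key in MEDIA_KEYS)


def keep_tweet(tweet: dict) -> bool:
    """Keep only English-language tweets without attached media."""
    return tweet.get("lang") == "en" and not has_media(tweet)


def filter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Stream newline-delimited JSON and yield the tweets worth keeping."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if keep_tweet(tweet):
            yield tweet


# Hypothetical sample input: one kept tweet, one non-English, one with a picture.
sample = [
    '{"id": 1, "lang": "en", "text": "hello"}',
    '{"id": 2, "lang": "hu", "text": "szia"}',
    '{"id": 3, "lang": "en", "entities": {"media": [{"type": "photo"}]}}',
    "",
]
print([t["id"] for t in filter_jsonl(sample)])  # [1]
```

Because the generator reads one line at a time, a pass like this never has to hold the whole dataset in memory.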

We also identified and deployed a file format that radically reduces the time required for analyzing the filtered data.

However, the value we provided went well beyond finding a suitable compression method: the new dataset also enabled the research team to reach new findings.

Instead of only analyzing the content of the tweets, the team could now trace how certain tweets spread, which highlights the ‘power users’: the accounts whose audiences engage with the subject matter the most.

Another important group consists of users who generate little engagement on their own posts but retweet a lot, which gives the topic more visibility.
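To make the two groups concrete, here is a minimal sketch of how users might be sorted into ‘power users’ and high-volume retweeters from per-user counts. The `UserStats` fields and the threshold values are hypothetical choices for illustration, not parameters from the research.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class UserStats:
    user: str
    posts: int          # original tweets on the topic (hypothetical field)
    engagement: int     # interactions received on those posts (hypothetical field)
    retweets_made: int  # retweets of other users' posts on the topic (hypothetical field)


def engagement_per_post(s: UserStats) -> float:
    """Average engagement per original post; 0.0 for users with no posts."""
    return s.engagement / s.posts if s.posts else 0.0


def classify(
    stats: Iterable[UserStats],
    min_power: float = 50.0,      # hypothetical threshold for power users
    max_amplifier: float = 5.0,   # hypothetical ceiling for low-engagement users
    min_retweets: int = 100,      # hypothetical retweet volume for amplifiers
) -> Tuple[List[str], List[str]]:
    """Split users into power users and amplifiers; others are left out."""
    power, amplifiers = [], []
    for s in stats:
        epp = engagement_per_post(s)
        if epp >= min_power:
            power.append(s.user)
        elif epp <= max_amplifier and s.retweets_made >= min_retweets:
            amplifiers.append(s.user)
    return power, amplifiers


users = [
    UserStats("alice", posts=20, engagement=4000, retweets_made=10),  # 200 per post
    UserStats("bob", posts=50, engagement=100, retweets_made=900),    # 2 per post
    UserStats("carol", posts=30, engagement=300, retweets_made=20),   # 10 per post
]
print(classify(users))  # (['alice'], ['bob'])
```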

We also found that smaller hubs of users who follow different influencers may be the best targets for information we want to spread. Because such a hub is connected to several bigger hubs, it can be more valuable than the influencer with the highest reach, whose followers rarely interact with other sources, so information posted there is unlikely to spread beyond that one hub.
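One simple way to look for such bridging users is to count how many distinct big hubs each smaller node is directly connected to. The sketch below assumes an undirected edge list of user interactions and a hand-picked set of hub accounts; both, and the `bridge_scores`/`bridges` names, are hypothetical. A more standard measure of the same idea is betweenness centrality (for example `networkx.betweenness_centrality`), which is considerably more expensive to compute on a large graph.

```python
from collections import defaultdict


def bridge_scores(edges, hubs):
    """Count, for every non-hub user, how many distinct hubs it is directly linked to."""
    adjacency = defaultdict(set)
    for a, b in edges:  # treat each interaction as an undirected link
        adjacency[a].add(b)
        adjacency[b].add(a)
    return {
        node: len(neighbours & hubs)
        for node, neighbours in adjacency.items()
        if node not in hubs
    }


def bridges(edges, hubs, min_hubs=2):
    """Return users linked to at least `min_hubs` hubs, most-connected first."""
    scores = bridge_scores(edges, hubs)
    candidates = [n for n, s in scores.items() if s >= min_hubs]
    return sorted(candidates, key=lambda n: (-scores[n], n))


# Hypothetical example: "y" is linked to both influencers, the others to one or none.
hubs = {"influencer_a", "influencer_b"}
edges = [
    ("influencer_a", "x"),
    ("influencer_a", "y"),
    ("influencer_b", "y"),
    ("influencer_b", "z"),
    ("y", "w"),
]
print(bridges(edges, hubs))  # ['y']
```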

In short, the researchers can identify who has the biggest influence in a given online community. These network hubs can be visualized in Cosmograph, a free and open-source web-based application. We are also continuing to develop the graph-building algorithm to make it even faster than it is now.
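As an illustration of what a graph-building step can look like, the sketch below turns retweets into a weighted edge list written as CSV, with each edge pointing from the original author to the retweeter, following the direction in which a tweet spreads. It is not the algorithm mentioned above: the v1.1-style fields (`retweeted_status`, `user.screen_name`) and the `source,target,weight` column layout are assumptions, so check Cosmograph's own documentation for the input format it expects.

```python
import csv
import io
from collections import Counter


def retweet_edges(tweets):
    """Count retweets per (original author, retweeter) pair.

    Assumption: v1.1-style fields, where a retweet embeds the original
    tweet under "retweeted_status" and authors under "user.screen_name".
    """
    counts = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if not original:
            continue  # original tweets and replies add no retweet edge
        author = original["user"]["screen_name"]
        retweeter = tweet["user"]["screen_name"]
        if author != retweeter:  # ignore self-retweets
            counts[(author, retweeter)] += 1
    return counts


def to_csv(counts):
    """Serialize the weighted edge list as source,target,weight CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


tweets = [
    {"user": {"screen_name": "bob"}, "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "bob"}, "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "carol"}, "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "alice"}, "text": "original post"},
]
print(to_csv(retweet_edges(tweets)), end="")
```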

Results

Our solution proved effective in speeding up the analysis, and with it the whole research process. It is currently used to analyze subsets of the complete database.

According to our measurements, the solution should be able to complete the analysis of the full database in a few hours, or at most a single day, instead of weeks!


© 2024 Lexunit Zrt.
