Harnessing the Power of Public Social Data

Introducing the New Social Signals Package

Analytics extends far beyond just business data. In our digitally interconnected societies, we are constantly creating and interacting with an immense volume of social data. At RepublicOfData.io, we are deeply committed to responsibly tapping into this vast resource. Our approach prioritizes ethical harvesting, encompassing transparency, privacy, and data minimization, to leverage this data for societal benefits.

We’re excited to announce the release of our social-signals package! It abstracts away the complexity of sourcing public social signals. Get a sneak peek with this quick 5-minute tour and tutorial.

Introducing the social-signals package

In the ever-evolving landscape of data product development, the integration of public data sources presents a unique set of challenges. These sources, ranging from the structured realms of government databases to the dynamic streams of social media platforms, hold untapped potential for those who can effectively harness them.

Examples of public data sources:

The ‘social_signals’ package is engineered to serve as the backbone for such endeavours, offering a robust framework to navigate the complexities of public data integration. This toolkit empowers data product builders to enrich their ecosystems with a diverse array of signals, from the media buzz of Twitter to the statistical depths of Data.gov.

Social signals GitHub repository

Our goal with social_signals is to provide a streamlined experience for capturing the pulse of public discourse and sentiment. This first release comes with an interactive tutorial to get you started, breaking down each component and demonstrating how to interlace these signals into your data fabric.

Social signals tutorial introduction — https://app.hex.tech/bca77dcf-0dcc-4d33-8a23-c4c73f6b11c3/app/c276cbc5-93c9-4eb3-9aa8-419533326730/latest

Our Social Movements data ecosystem

Let’s put all of this in context. What’s the purpose of social data and how can you harvest it to build your data products?

At RepublicOfData.io, we are building a data ecosystem in public to share our approach to building and managing a portfolio of data products and, hopefully, to inspire you to expand your data horizons.

The initial design of the Social Movements data ecosystem

Above is a diagram of our portfolio of data products for monitoring social movements in North America. Our objective is to generate data assets and interfaces that decode those movements: who their actors are, how they interact with each other, what they are advocating for, what actions they take, and so on.

The Social Signals package is our initial building block in that ecosystem. Its purpose is to abstract away the complexity of harvesting those social data sources. The next building block is the Social Movement Signals data product which will be a consumer of the package. Its role will be to harvest relevant data for our ecosystem’s objective and hydrate the other data products with high-quality and timely data assets. But more on that in later posts.

The Social Signals package in action

So what is the Social Signals package? It’s our open-source toolkit to abstract away the complexity of harvesting publicly available social data.

For this first iteration, we are providing abstractions to harvest GDELT data only. We are actively working on adding Wikipedia, X and Data.gov data as well. More on those in a future blog post.

We’ve put together an interactive tutorial for you to see the package in action. The first tab takes our GDELT module out for a spin.

Harvesting GDELT data using the Social Signals package — https://app.hex.tech/bca77dcf-0dcc-4d33-8a23-c4c73f6b11c3/app/c276cbc5-93c9-4eb3-9aa8-419533326730/latest

I’ve covered the GDELT data project many times already, but that’s just because it’s such a rich social data source that can be used for many purposes.

Because our purpose is to abstract away the complexity of sourcing GDELT, we have made some choices about how to fetch its data (BigQuery rather than the CSV files) and which endpoint to use (the raw GKG feed instead of the transformed events tables). This is not the final format for this module, but we think you can already get rich, valuable data from this abstraction alone.
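To make the BigQuery and GKG choice more concrete, here is a minimal sketch of the kind of query such an abstraction might issue. It is not the package's actual code. The table name (`gdelt-bq.gdeltv2.gkg_partitioned`), the column names, the `PROTEST` theme filter, the location pattern, and the `build_gkg_protest_query` helper are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: the public BigQuery table holding the raw GKG feed.
# This name is not taken from the social-signals package.
GKG_TABLE = "gdelt-bq.gdeltv2.gkg_partitioned"


def build_gkg_protest_query(day: date, country_code: str = "US", limit: int = 1000) -> str:
    """Build a SQL string selecting GKG records tagged with a protest theme.

    Hypothetical helper: it only builds the query text and never calls BigQuery.
    """
    # Reject anything that is not a two-letter code, since it is inlined in the SQL.
    if not (len(country_code) == 2 and country_code.isalpha()):
        raise ValueError("country_code must be a two-letter code")
    next_day = day + timedelta(days=1)
    return f"""
SELECT DocumentIdentifier AS article_url, Themes, Persons, Organizations
FROM `{GKG_TABLE}`
WHERE _PARTITIONTIME >= TIMESTAMP('{day.isoformat()}')
  AND _PARTITIONTIME < TIMESTAMP('{next_day.isoformat()}')
  AND Themes LIKE '%PROTEST%'
  AND Locations LIKE '%#{country_code.upper()}#%'
LIMIT {int(limit)}
""".strip()


sql = build_gkg_protest_query(date(2023, 11, 1))
print(sql)
```

Restricting the query to a single day's partition keeps the scan small, which matters because BigQuery bills by bytes read. The resulting string could then be handed to a BigQuery client, for example `Client.query` in the google-cloud-bigquery library, given credentials for a Google Cloud project.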

The tutorial takes you through a simple example in which we pull articles that cover protest events in the United States on a specific date. We then use that dataset to see which actors were covered most in those articles and how they relate to each other.
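The actor analysis described above can be sketched in a few lines. This is a rough illustration, not the tutorial's code. It assumes each article row carries a semicolon-delimited `Persons` field and treats two actors as related when they appear in the same article. The sample rows and the `split_actors` and `actor_stats` helpers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_actors(field):
    """Split a semicolon-delimited actor field into a sorted list of unique names."""
    if not field:
        return []
    return sorted({name.strip().lower() for name in field.split(";") if name.strip()})


def actor_stats(rows):
    """Count how many articles mention each actor and each pair of actors."""
    mentions = Counter()
    pairs = Counter()
    for row in rows:
        actors = split_actors(row.get("Persons"))
        mentions.update(actors)
        # Sorted input means each pair always appears in the same order.
        pairs.update(combinations(actors, 2))
    return mentions, pairs


# Made-up rows standing in for the harvested articles.
sample_rows = [
    {"Persons": "alice example;bob example"},
    {"Persons": "alice example;carol example;bob example"},
    {"Persons": None},
]

mentions, pairs = actor_stats(sample_rows)
print(mentions.most_common(3))
print(pairs.most_common(3))
```

Co-occurrence within an article is only a rough proxy for a relationship between actors, but it is often enough to draw a first network graph from the harvested data.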
