GDELT’s Analysis Service

I first noticed the Global Database of Events, Language, and Tone on New Year’s Day, which led me to write Foreign Policy’s Global Conversation Infographic. Three weeks later the service was mysteriously suspended, and five days after that we got an explanation of what happened.

Since the successful resolution of concerns regarding the data and processes used by GDELT, the project has made dramatic progress. The GDELT Analysis Service offers fourteen different visualizations and export of the associated data. I entered the keyword ‘Ukraine’, selected all data for 2014, and about a minute later I received an email with this link to a live heatmap and another link to a CSV file.

Ukraine Mentions During 2014

Next I used the timeline tool. I was hoping for a live page like the location heat map, but it just produces a static image and a CSV file.

Each GDELT event contains a date, geographic coordinates, the players involved, and the type of interaction. The content is regular enough that it could be mapped to an import process for Sentinel Visualizer, an intelligence sector link analysis tool. I had set out to do something similar with the Global Terrorism Database last summer, but there were license issues, and the dataset lacks the live feed feature that GDELT provides.
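As a rough illustration of that mapping, here is a minimal Python sketch that splits event rows into the entity and link records a link analysis import generally wants. The column names (`SQLDATE`, `Actor1Name`, `Actor2Name`, `EventCode`, `ActionGeo_Lat`, `ActionGeo_Long`) are my reading of the GDELT event layout, the sample rows are made up, and the output shape is my own assumption, not Sentinel Visualizer's actual import format.

```python
import csv
import io

# Made-up sample rows; real GDELT exports are tab-delimited and far wider.
SAMPLE = (
    "SQLDATE\tActor1Name\tActor2Name\tEventCode\tActionGeo_Lat\tActionGeo_Long\n"
    "20140222\tUKRAINE\tRUSSIA\t112\t50.45\t30.52\n"
    "20140301\tRUSSIA\tUKRAINE\t190\t44.95\t34.10\n"
    "20140302\tUKRAINE\t\t020\t50.45\t30.52\n"
)


def events_to_links(text):
    """Split event rows into a sorted list of entities and a list of links."""
    entities = set()
    links = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        source, target = row["Actor1Name"], row["Actor2Name"]
        if not source or not target:
            continue  # a link needs two named players
        entities.update([source, target])
        date = row["SQLDATE"]  # assumed to be YYYYMMDD
        links.append({
            "source": source,
            "target": target,
            "type": row["EventCode"],
            "date": f"{date[:4]}-{date[4:6]}-{date[6:]}",
            "lat": float(row["ActionGeo_Lat"]),
            "lon": float(row["ActionGeo_Long"]),
        })
    return sorted(entities), links


entities, links = events_to_links(SAMPLE)
```

Rows with only one named player are skipped here because a link needs two endpoints; a real import would probably keep them as single-entity events instead.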

GDELT is clearly going to grow services based on the live stream of content they have available. These services are likely to play to the strengths they have in defining and operating the stream. There should be plenty of room for follow-on qualitative analysis and integration of data sources external to their feed. Data sources such as spot commodity prices in locations trending towards trouble would be particularly helpful in spotting hazards before they get out of hand.

Global Database of Events, Language & Tone (GDELT) Is SAFE!

That was a long, uncomfortable silence after I posted GDELT’s Mysterious Demise, but we now have the particulars on what happened:

The bottom line is that GDELT is one of the very few event datasets in existence today that actually has all of the necessary permissions. The concerns that have recently been discussed were raised by two faculty members at the University of Illinois and were examined by a panel of faculty experts convened by the University of Illinois’ Vice Chancellor for Research. That panel formally cleared GDELT on behalf of that office stating “the Panel finds that it was not able to conclude that GDELT is founded on misappropriated … data or software.” With respect to concerns raised regarding the open source TABARI software that GDELT makes use of to create its CAMEO event records, the same panel explored concerns raised regarding its ownership and similarly found that “TABARI … has well known antecedents at another institution dating back to at least 2000 and therefore is not attributable to the [University of Illinois]”. While this whole situation would have been easily avoided with just a little communication and avoided a lot of unnecessary angst, the silver lining is that it has demonstrated just how widely-used and important GDELT has really become over the past year and we are tremendously excited to work with all of you in 2014 to really explore the future of “big data” study of human society.

I thought there might be a problem with either the underlying data or the software used; it turns out that both issues were raised by the University of Illinois professors who parted ways with the project.

This feels a bit like the USL vs. BSDi lawsuit, which freed Unix from AT&T’s clutches twenty years ago. A big, important data source is now out in the open in such a way that it cannot be put back. I have some financial records digging to do in the coming week, and the Montgomery County Council will remain a priority until the primary is over, but I am itching to wrestle the GDELT feed into some format I can personally use.

Geospatial Tools For Activists

When working with grassroots analysts, free software and services are key. I have long wished for a project that would fund a copy of Sentinel Visualizer I could keep, but the $5,000 cost for being able to handle temporal and geospatial data is very steep.

I have been inspecting campaign finance data for the Montgomery County Council, and the files are clean enough that all of it could be geocoded; the only question was what tool to use. Earlier today I noticed Sourcemap in a logistics discussion group on LinkedIn, and this just might be the solution we’ve been seeking. Here is an example map from the free service, showing the global sources of Nutella ingredients.

Nutella Sourcemap

If you sign up for the service it provides you a way to do free, public mappings using a simple spreadsheet format for input.

Lemonade Spreadsheet

We can treat financial contributions as raw materials and county council members as ‘legislation factories’. Someone would have to know enough to describe which developer goes with which specific project, and then we’d have a finished map from influence to development.
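To make the supply chain analogy concrete, here is a minimal Python sketch that turns contribution records into one-hop-per-row spreadsheet lines. The names and amounts are invented, and the `origin`, `destination`, and `description` columns are a hypothetical layout of my own, not Sourcemap's actual input format; geocoding of addresses is left out entirely.

```python
import csv
import io

# Hypothetical records: (contributor, council member, amount in USD, project).
CONTRIBUTIONS = [
    ("Developer A", "Council Member X", 4000, "Project 1"),
    ("Developer A", "Council Member Y", 1000, "Project 1"),
    ("Developer B", "Council Member X", 2500, "Project 2"),
    ("Developer C", "Council Member X", 500, "Project 1"),
]

FIELDS = ["origin", "destination", "description"]


def to_supply_chain_rows(contributions):
    """Emit one hop per row: contributor -> council member -> project."""
    rows = []
    for contributor, member, amount, project in contributions:
        rows.append({"origin": contributor, "destination": member,
                     "description": f"${amount} contribution"})
        rows.append({"origin": member, "destination": project,
                     "description": "council action"})
    # Drop repeated member -> project hops while keeping the original order.
    unique = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique


def to_csv(rows):
    """Serialize the hops as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_supply_chain_rows(CONTRIBUTIONS)
sheet = to_csv(rows)
```

The project column is the part that cannot be automated: it has to come from someone who knows which developer goes with which project.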

I can do the data handling component; what I need are some local hands and eyes who know more of the companies, personalities, and history behind the urbanization of Montgomery County.

GDELT’s Mysterious Demise

I wrote Foreign Policy’s Global Conversation Infographic on New Year’s Day. The content used to create the visualization was based on the Global Database of Events, Language, and Tone, commonly referred to as GDELT. The effort was suspended during the week ending January 17th via this terse announcement.

GDELT Suspension

There was an addendum to this which I didn’t include in the screen capture, but it mentioned Robin Kaler at the University of Illinois. I wrote her seeking additional information on the suspension and I received a response just moments ago.

GDELT Kaler

“serious questions about the origins of the source texts used to code GDELT”

I believe this means that whoever created the CAMEO coding was either not credited appropriately, or there may be an issue with using it in a derivative product. I ran into this late last year – I was going to republish the Global Terrorism Database packaged for use with Sentinel Visualizer, but this was not allowed. I was free to publish a set of scripts to accomplish this task; the issue was that the entities that fund that effort wanted a count of total users, so any derivative work had to be post-processing run by the user rather than repackaging.

I hope what we are seeing here is some sort of pause to clean house and/or make things right with regard to whatever coding material was incorrectly used. The volume and quality of content was extremely promising and I hope the suspension is just some misunderstanding that can be quickly corrected. I kept the archive of the 1979 – 2012 data so I can continue working on something that will handle the live feed when it returns.

Muckety: Names, Nodes & Relationships

As a response to Visualizing Graph Databases With Linkurious I received a tip that I should look at Muckety. This system provides social/network visualization with data from a variety of important sources.

I have looked at The Militarist Galaxy as documented by RightWeb in the past, so I chose one of the more noxious individuals, Frank Gaffney, as a starting point. I was pleased and surprised to see the Mother Jones articles on Groundswell were integrated.

Muckety Graph: Frank Gaffney

Gaffney shares the Groundswell link with Allen West, a former Congressman and a war criminal who is equally willing to have journalists abused in this country. I dug just a little further than this and my eyes opened wide at the connections which were revealed.

Muckety: Frank Gaffney & Allen West

A tool like Muckety doesn’t replace a desktop link analysis setup like Maltego, Gephi, or Sentinel Visualizer. What it does do in an easily accessible fashion is permit people who might otherwise never handle such technology to enter the name of an individual or organization that interests them, and immediately see important connections which would otherwise take hours of Googling, reading, and note taking.

Network analysis is to 2014 what social media was to 2009 – something that has specialists using it, but which will rapidly spread due to the powerful sense-making capabilities it offers when trying to understand complex interactions.

Visualizing Graph Databases With Linkurious

Yesterday a @kdnuggets tip led to SciCast: Gamified Technocracy and he hit another home run today with a pointer to Linkurious, a link analysis tool that works with the neo4j graph database. The beta signup leads to a live demo that contains much of the Internet Movie Database.

They suggested Clint Eastwood as an initial search term for the five-minute demo, which walks you through the basics. Anyone acquainted with link analysis/data visualization tools like Maltego or Gephi will find the environment quite familiar. I continued playing after completing the demo, starting with Laura Dern, an actress David Lynch favors – a good choice as there are a lot of overlaps in casting between his various movies.

Linkurious Laura Dern

Linkurious Blue Velvet

Linkurious David Lynch

Just a few mouse clicks were required to hunt up actors, actresses, and movies. The system produces the links automatically. This is a web demo of what will be a desktop product that connects to neo4j. Pricing for a single user is similar to Maltego; enterprise pricing is in line with the cost of a single seat for Sentinel Visualizer.

Linkurious Pricing

This is an exciting development for me. Maltego and Gephi are good for what they’re meant to do, but they have limits. Maltego entity types can be extended but the system starts to choke around the thousand node mark. Gephi can scale up to tens of thousands of nodes, but it is more general purpose, lacking the concept of node type. I had just picked up a free copy of the Graph Databases O’Reilly book earlier this week, which focuses on neo4j, and I am taking finding a visualization tool meant to work with it as a sign I am on the right track.
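To show why node types matter, here is a minimal sketch in plain Python of a typed graph and the kind of casting-overlap question I was clicking through in the demo. It is not the neo4j API or its query language, the handful of nodes and edges are typed in by hand rather than pulled from the Linkurious demo, and the `ACTED_IN` and `DIRECTED` labels are names I picked for illustration.

```python
from collections import defaultdict

# Each node carries a type label, the feature a typed graph store provides.
NODES = {
    "Laura Dern": "Person",
    "Kyle MacLachlan": "Person",
    "David Lynch": "Person",
    "Blue Velvet": "Movie",
    "Wild at Heart": "Movie",
}

# (source, relationship, target) triples.
EDGES = [
    ("Laura Dern", "ACTED_IN", "Blue Velvet"),
    ("Kyle MacLachlan", "ACTED_IN", "Blue Velvet"),
    ("Laura Dern", "ACTED_IN", "Wild at Heart"),
    ("David Lynch", "DIRECTED", "Blue Velvet"),
    ("David Lynch", "DIRECTED", "Wild at Heart"),
]


def people_by_movie(edges, relationship):
    """Group people by movie for one relationship type."""
    grouped = defaultdict(set)
    for source, rel, target in edges:
        if rel == relationship:
            grouped[target].add(source)
    return grouped


def shared_movies(person_a, rel_a, person_b, rel_b):
    """Movies where person_a has rel_a and person_b has rel_b."""
    a = people_by_movie(EDGES, rel_a)
    b = people_by_movie(EDGES, rel_b)
    return sorted(
        name for name, kind in NODES.items()
        if kind == "Movie" and person_a in a[name] and person_b in b[name]
    )


overlap = shared_movies("Laura Dern", "ACTED_IN", "David Lynch", "DIRECTED")
```

With untyped nodes the filter on `"Movie"` has nowhere to live, so people and films end up in one undifferentiated pile.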

SciCast: Gamified Technocracy

A little over a year ago I surveyed the reelection of Barack Obama, the mix of incredulity and rage from the fringe right, and I knew that my work in this area was complete. The GOP faced the stark choice of trimming its sails for the demographic headwinds or being reduced to a regional party, but it took the shutdown of 2013 before they realized they were going to have to actively clean house rather than passively waiting for time to do the job.

I set out looking for something new to do at the start of 2013 and I knew it wasn’t going to be domestic. I spent a lot of time studying the social and professional network of Wikistrat, e-International Relations, and other foreign policy related collectives. The response when I approached was pretty universal: “So, you are a college dropout and a hacker? Really?” I would not have predicted it at the beginning of the year, but the Terrorism Research & Analysis Consortium proved to be just the right size for my odd mix of skills and interests.

I was pleased with one anchor outlet like TRAC, but I see forces at work in the world, manifesting in social media, that indicate something is about to happen, though it’s hard to say exactly what. I happened to scan my Twitter timeline a bit ago, and one of those imminent somethings popped right out at me:

@kdnuggets hints about SciCast

I took a look at the Twitter account for the effort and I was amazed that something so new had scored a mention from data mining guru @kdnuggets. I got in on the ground floor – follower #70.

@SciCasters Twitter

What’s inside is as promising as a fledgling effort getting big name attention. Fourteen prediction areas, a clear sign of gamification in the form of a leaderboard, and it’s crowdsourced from the question creation through to the results.

SciCast Dashboard

When social media exploded onto the scene about five years ago the competing systems were clawing for market share. Twitter, Facebook, MySpace in decline except for music, Google+ trying to be ubiquitous and ending up intrusive and annoying, and a horde of also-rans that were mowed down by Facebook’s momentum and Twitter’s open format.

Fast forward five years: what social media platform am I using? Twitter, but it’s measured in minutes per day, as I passively observe leaders in various areas that interest me. I can go a month without logging into Facebook, but I have yet to make it a full quarter. LinkedIn is tolerable given the ability to lock up one’s contacts and mute chatterboxes.

Here are some trends I have noted:

One by one, email providers are requiring SMS verification before they’ll provide an account. The scammy, spammy nature of new social media accounts has much to do with this, as providers are seeing hordes of accounts registered and tended just long enough to make social media registrations work.

Someone gifted me a ScienceX profile and here I’m seeing one aspect of the future – people are willing to pay $15/year for high quality content and trolls aren’t willing to pay a $15-per-ban tax.

The price of entry for SciCast is intangible but the bar is set very high – if you’re not serious about scientific method and quality control you’ll sink like a stone. They don’t explicitly state it but I’m sure they have sterner measures for anyone who is intentionally disruptive.

Stepping back even further, the very concept of the nation state seems to be in decline, but corporate power is far from assured. None of them run without humans on the inside, and Edward Snowden has made it painfully clear what happens when those people begin to question the nature of their work. And those corporations all have to participate in some fashion in this melange of social networking.

We evolved as a species on predator-filled savannah, grouped together and task differentiated to maximize survival. Deprived of a secure place in the ranks of a union or on a stable corporate ladder, Randian theories glorifying hyper-individualism quickly fall away. The hunter-gatherer band has come back into fashion and networked humans are an acid bath for monolithic entities that draw the wrong sort of attention to themselves. SciCast is just the latest of a rainbow of threats to slow-moving, internally politicized companies that have to meet shareholder expectations.