Category Archives: Gephi

Swarmcast: Jihadist Netwar Tactics By Ali Fisher

Swarmcast: How Jihadist Networks Maintain a Persistent Online Presence has been lingering on my desktop for several weeks. Social network analysis papers often draw my eye, in particular when they focus on counter-insurgency methods, and this thirteen-page work summed up things I’ve known experientially for the last couple of years.

The report makes several references to Arquilla & Ronfeldt’s The Advent of Netwar (1996) and Networks and Netwars (2001), two seminal works on future conflict that are must-reads for anyone interested in this field.

The data science angle of the paper is quite familiar to me. This is clearly output from Gephi.

Jihadi Social Network

This paper contains significant knowledge of the particulars of the network, much like in Robust Action and the Rise of the Medici 1400 – 1434 by Padgett & Ansell, the famous paper on early Renaissance Florentine politics that every budding social network analyst will recognize, but which very few of us have read. There is implicit information on the structure and development of the network, but it’s cast in human terms, rather than in advanced social network modeling methods.

I’ve been delving deeper into those methods, as I limp through my third attempt at Matt Jackson’s Social and Economic Networks: Models and Analysis. The modeling component is mathematical; one does not come away from this class with a clutch of tools that lend themselves to screen shots for blog posts or papers, unlike the earlier undergraduate class taught by Lada Adamic, which has since vanished from Coursera.

This paper and my earlier reading today, The Turner Legacy by J.M. Berger, are conveying a message. I either need to go broader, which I am doing in the Tableau Specialization I just started, or go deeper. I’m at a point in this field where I need to buckle down and submit properly cited academic work to peer reviewed journals, or otherwise elevate my game.

Four more weeks till the election is over and the Jackson class finishes concurrent with that. The Tableau Specialization runs until April but that’s fairly low impact. Looks like Q4 of 2016 will follow my usual pattern; a time of introspection and deciding what to do next. Perhaps 2017 or 2018 will bring papers with “Neal Rauhauser” first among the authors. We shall see …

Foreign Policy Collectives: @LobeLog

Earlier this year I examined the social networks of a number of foreign policy oriented groups including Wikistrat and e-International Relations. This included probing their Twitter and LinkedIn usage. I also laid hands on RightWeb’s content and produced Militarist Influence On Foreign Policy, an exploration of the static profiles for over 300 militarists maintained by a watchdog organization.

Near the end of that process I subscribed to LobeLog, which I’ve found to be very good. Today I noticed that eight of their authors have Twitter profiles so I turned my system loose on them.

3,508 Discussion Peers For Eight Authors

400 Frequent Discussion Partners

Forty Four Accounts To Watch

So these final forty four are people who are important to the discussion – I recognize some of them from foreign policy reading and I assume the rest are academics and policy people. The cutoff was fifty or more mentions in the last 3,200 tweets.

@LobeLog Authors Influencee Network

I pasted the eight seed names into Maltego and then let my @Klout transforms work. I am a little surprised by the result – the only loop in here is the one I created in order to keep the original accounts near the center of the graph. This is an indication that the foreign policy discussion space is large. When we examine astroturf efforts we find self-referential loops by the second generation.
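The loop check described above can also be done programmatically once the influence graph is exported as an adjacency list. The following is a minimal sketch using a depth-first search; the function name and the dictionary layout are illustrative assumptions, not anything Maltego or Klout provides.

```python
def has_cycle(graph):
    """Return True if the directed graph {node: [successors]} contains a loop."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # reached a node already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    # Start a search from every node that has not been reached yet.
    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))
```

The search is recursive, so a very deep graph would need an iterative version or a higher recursion limit.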

There were over 1,400 hashtags referenced.

1,400 Hashtags

My parser has improved quite a bit since the last time I did this and I quickly narrowed down to just thirty three key hashtags that were being used.

Thirty Three Key Hashtags

What have we learned here?

I typed eight names into a text file, issued a single command, and fifteen minutes later I had the data used to produce these graphs. We can tell which other accounts they talk to, weighted by frequency, and we can determine who they influence according to Klout. We can also tell which topics concern them based on hashtag use weighted by frequency.

What can we do next?

I recognized some of the names as I was filtering the large list and in the final I see one person I know in real life and another that I know from a mailing list. These people are richly interconnected in a fairly transparent fashion.

I think the next step will be doing this for the much larger group listed on RightWeb, but that’s taking a while as I am having to dig for their Twitter accounts. Once I have that I will do some sort of composite graph, putting in all the foreign policy people and organizations I have identified, and I’m going to try to sort them into cliques.

What I really need here are a few foreign policy watchers who already pay close attention and who would be willing to either provide me API access to their account, or run a secondary account specifically to create who’s who lists. I have considered using a passive approach, just milking public lists, but for this to work I think there is an additional level required when classifying accounts. Lists made by users at this level tend to be inclusive – all experts on a given topic, rather than breaking them down to their viewpoints.

Visualizing The Global Terrorism Database

I received a cryptic note from a colleague earlier tonight:

“This one has time AND location data.”

The email contained a link to the Global Terrorism Database, maintained at the University of Maryland at College Park, an easy walk from a Green Line stop on the D.C. Metro. I poked around the site a bit and discovered that everything from 1970 through 2011 is available for download if you just fill out a form.

The total content is large so I pulled out the 5,066 events from 2011. There are an amazing 127 attributes for each event, but it’s a sparse row setup, very easy to process. I unrolled just a few key items – city, country, and region. This resulted in over 15,000 lines indexed with their twelve digit event IDs. The first rough visualization I did was immediately exciting in terms of what was visible.
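As a rough illustration of the unrolling step, the sketch below turns each event row into one line per attribute, keyed by event ID. The column names (`eventid`, `city`, `country_txt`, `region_txt`) are assumptions about the CSV export, and the two sample rows are made up.

```python
import csv
import io

def unroll(rows, columns=("city", "country_txt", "region_txt")):
    """Yield one (event_id, attribute_value) pair per non-empty attribute."""
    for row in rows:
        for col in columns:
            value = (row.get(col) or "").strip()
            if value:  # sparse rows: skip blank cells
                yield row["eventid"], value

# Tiny made-up sample standing in for the real CSV export.
sample = io.StringIO(
    "eventid,city,country_txt,region_txt\n"
    "201101010001,Exampleville,Examplestan,Example Region\n"
    "201101020002,,Examplestan,Example Region\n"
)
edges = list(unroll(csv.DictReader(sample)))
```

With three attributes per event, 5,066 events yield at most 15,198 lines, which is consistent with the "over 15,000" figure above.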

Global Terrorism Database Overall

Global Terrorism Database Russian Events

Global Terrorism Database AfPak Events

Global Terrorism Database Mideast & North Africa

So Russia’s North Caucasus insurgency popped right out, AfPak events sort into a group, and the Mideast and North Africa are close together. Gaza and the West Bank are a node between AfPak and the Mideast; the green thing off by itself at the lower right is Europe.

The files arrive as XLSX, so any spreadsheet can open them. I converted to CSV so I can handle the content with awk, python, or import to any of the tools I use. The next steps are immediately clear.

  • The format is neat, but inappropriate for what I want to do. Those files are going to have to be ‘unrolled’, basically breaking out the columns we want into individual files.
  • There is going to be some post processing to ensure that Gephi will behave when it imports content.
  • There will be a LOT of work to get this content into Sentinel Visualizer, but given that it has both time AND location data it’s going to be amazing once the process is complete.
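The post-processing for Gephi mentioned in the second bullet could look something like the sketch below. It assumes the edge list is loaded through Gephi's spreadsheet importer, which reads `Source`, `Target`, and `Weight` columns; the function name and sample pairs are hypothetical.

```python
import csv
import io
from collections import Counter

def write_gephi_edges(pairs, out):
    """Write (source, target) pairs as a weighted edge table for Gephi."""
    # Trim stray whitespace so "City A" and "City A " become the same node,
    # and fold repeated pairs into a single weighted edge.
    weights = Counter((str(s).strip(), str(t).strip()) for s, t in pairs)
    writer = csv.writer(out, lineterminator="\n")  # csv quotes embedded commas
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        if source and target:  # drop rows with an empty endpoint
            writer.writerow([source, target, weight])

buf = io.StringIO()
write_gephi_edges(
    [("e1", "City A"), ("e1", "City A "), ("e2", "City B, Region")], buf
)
```

Letting the `csv` module handle quoting matters here because place names often contain commas, which would otherwise shift columns on import.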

When I asked for a dataset with geographic information I was hoping someone might come across with a few hundred events spread out over a few months. Getting almost 60,000 events spread over twenty years is a treasure trove suitable for creating and testing methods similar to the Organization, Relationship & Contact Analyzer, an anti-gang law enforcement specific algorithm suite.

It isn’t clear from the Citing GTD link if I can massage the data into a workable import for Sentinel Visualizer and then distribute the database. I am going to write them and find out if this is a permissible use – I am sure this would be an excellent set of data for both the nuts and bolts training needed to handle SV, as well as counter-insurgency specific work.

Sentinel Visualizer: Serious Network Analysis

The first time I ever touched a link analysis tool was late in 2010. I downloaded the Maltego Community Edition, clicked my way through entering a pair of Twitter accounts, and despite the rate limited demo API the very first transform I ever ran immediately showed me something important. I was hooked.

I have had a paid license for the last two years and if you check the Maltego category here you can find a variety of posts showing just some of its uses. I recently took the Coursera SNA class, which is an excellent introduction to social network analysis, and I got to spend a bit of time with Gephi which is a more general purpose tool. Anything I have done recently, like this visualization of terror group names and their locations of operation, would have been done with Gephi.

Centers Of Terror Activity

My writing here has started to pay off and for the next three months I have a chance to work with Sentinel Visualizer from FMS. This is a law enforcement/counter insurgency grade visualization tool that can do many things which these other two tools cannot.

Sentinel Visualizer Social Network

This is a social network for some bad guys. Each of them has a ring of associates, and then in between there are people who are fixers, facilitators, or financiers for their activities. This is something you could discover with pen, paper, and patience, or you can apply a visualization tool. This particular graph could just as well have come from Maltego.

What Sentinel Visualizer offers that lesser tools do not are capabilities in two new realms – geospatial and temporal information handling.

Sentinel Visualizer understands where things happen in explicit detail. Maltego does have a ‘place’ entity, but it’s a simple little thing, just a city and country name required, maybe you put a street address. Sentinel Visualizer understands latitude and longitude, it understands how to find things that are geographically near each other, and it can provide detailed information for use in Google Maps.
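To make "geographically near each other" concrete, here is a minimal sketch of a proximity query using the haversine great-circle distance. It says nothing about how Sentinel Visualizer implements this internally; the function names and the `lat`/`lon` keys are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearby(events, lat, lon, radius_km):
    """Return the events whose coordinates fall within radius_km of a point."""
    return [
        e for e in events
        if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km
    ]
```

A plain city-and-country label, as in Maltego's place entity, cannot support this kind of query without first being geocoded.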

Sentinel Visualizer also understands when events happen in a flexible, powerful fashion. Time can be as specific as down to the second, or as broad as “some time last summer”. This network of bad guys didn’t just appear out of thin air, they met, one after the other. This can be visualized forward and backward in time, letting you examine what happened before, during, and after a specific event.
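One common way to model time that can be either exact or vague is to store every event as an interval with an earliest and latest bound. The sketch below illustrates that idea only; it is an assumption about how such data could be represented, not a description of Sentinel Visualizer's storage, and the dates are made up.

```python
from datetime import datetime

def overlaps(a, b):
    """True if two (start, end) intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]

# An event known to the second collapses to a zero-length interval.
exact = (datetime(2012, 7, 4, 14, 30, 5), datetime(2012, 7, 4, 14, 30, 5))
# "Some time last summer" becomes a three-month window.
vague = (datetime(2012, 6, 1), datetime(2012, 8, 31, 23, 59, 59))
```

With both kinds of event in the same form, "what happened before, during, and after" a given event reduces to comparing interval bounds.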

The system offers some other capabilities that aren’t a big deal for my specific project, since I’m the only one handling the data, but they do matter for larger scale use. Individual installs use a local SQL server, but the top level product uses a shared database server so multiple analysts can all use the same data. That system also supports free read only clients, so consumers can view finished content live in the system rather than waiting for static reports.

I’m very excited to have this opportunity to add a full featured LE/counter-insurgency grade tool to both my skill set and my resume. I can’t say anything about the details of the project itself, beyond the fact that it is something to do with food security, but the system is flexible enough that I will be able to round up other datasets and show off what it can do.

Terror From Above

I have published various data visualizations here using either Maltego or Gephi. These posts are promoted on LinkedIn, and after the first half dozen I began to get requests. One of these got me a very curious dataset – the names of 3,500+ cells, brigades, armies, movements, and revolutionary committees. The content is a bit dated, but fascinating – I have the group names, active locations, and leader names for every terror group that existed from about 2003 back to the 1950s.

This data turned up after I read and reviewed National Defense University’s Convergence. The source was a sociology professor who had noticed the review and asked if I could do anything with the information he had. There were 3,098 groups and 4,455 events at 271 distinct locations. The location could be anything from a specific city to a general region.

Global Terror Groups 1950 - 2003

I began with the assumption that 3,098 actors causing 4,455 events would be somewhat similar to the large groups of Twitter accounts and hashtags I have been handling, but this was not the case. It’s fairly easy to get a sorting by community by topic and association for tweets, but I came at this terror dataset from a variety of directions and I have not been happy with any of the visualization outcomes. I finally went through and tossed all of the nodes that made a single appearance and I began to see the patterns I expected.
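Tossing the nodes that make a single appearance amounts to a degree filter on the edge list. A minimal sketch, assuming the data is a list of (group, location) pairs; the function name is hypothetical.

```python
from collections import Counter

def drop_single_appearance(edges):
    """Keep only edges whose endpoints both appear in more than one edge."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return [(s, t) for s, t in edges if degree[s] > 1 and degree[t] > 1]
```

This is a single pass. Removing edges can leave new single-appearance nodes behind, so repeating the filter until nothing changes would give a stricter result (the 2-core of the graph).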

Centers Of Terror Activity

I had expected that the content would neatly sort into Europe, the Mideast, southeast Asia, and South America. There was collaboration between U.S. and European universities on the data collection, so France, Greece, Italy and the United Kingdom feature prominently in the set. The geographic sorting I expected never really came through until I tossed all of the low degree nodes, and even then what I see seems muddled. The modularity algorithm knew India and Pakistan go together, and it figured out that Lebanon, Israel, Gaza, and the West Bank are related, but at higher resolutions there were many pairings that I felt were very odd – why would Japan and obscure groups from North Africa be sorted into the same bucket?
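It can help to see what the modularity score actually rewards. The sketch below computes standard Newman modularity for a given grouping of an undirected, unweighted edge list, with a resolution multiplier on the expected-edges term. It is a simplified illustration rather than Gephi's implementation, and all names are assumptions.

```python
from collections import defaultdict

def modularity(edges, community, resolution=1.0):
    """Newman modularity Q of an undirected, unweighted edge list.

    community maps each node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (observed share of edges inside)
    #     minus resolution * (share expected if edges were placed at random).
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

The score only compares edge density against a random baseline. It knows nothing about geography, so two sparsely connected regions can land in the same bucket simply because neither has strong ties elsewhere.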

Convergence spells out the circumstances in which illicit networks will flock to a given location. The state has to be weak enough that bribes and the threat of violence can facilitate needed activities, but a state that has outright failed will lose illicit ventures just as it would legitimate business, albeit at a slightly slower rate.

Humans will abuse drugs, counterfeit currency, deal weapons, and traffic in their fellow man when state institutions are weak enough to permit this. There are divisions of labor – you don’t see a local warlord funding efforts via an extractive industry who is also an international money launderer. There are hubs of activity – financiers, facilitators, and fixers. If they can be located and neutralized their sudden absence often reveals they were more deeply connected than anyone suspected.

As for me, I am not sure what this means beyond a chance to explore a new class of dataset. The world is swimming in counter-insurgency and counter-terrorism analysts, and those folks are going to be trying to retool their skills to the growing data science sector as we wind down operations in Afghanistan and otherwise draw down our level of global engagement. I want to know how to tease meaning out of thousands of rows of data, but I don’t imagine I’m ever going to face something like this in my day to day work.

Wikistrat’s Current Conversations

I started looking at Wikistrat’s collective consulting effort several months ago. I have added a few similar groups but they are still the largest and busiest of the mix. A first cut examination of a social network involves the links you can discern from a static snapshot, which I did some time ago.

Three days ago I put the accounts of the thirty three members I found on Twitter into my capture system and this afternoon I checked to see what it had found. There were 58,889 tweets and 15,712 unique accounts mentioned during the timeframe. Some of the accounts are low volume and the oldest tweet dated back to August of 2011. 39,000 of the tweets are from 2013.

Friend relationships between accounts might tell you something, but who a social media account addresses directly is information on who and what is important in the moment. Size of the nodes here indicates relative volumes of tweets.

Wikistrat Analysts

And these are the accounts which they most often mention. As is common for influential professionals, much of what they say is directed to media outlets and the rest are others who also work in related fields. Keep in mind this is 0.58% of the total nodes and 3.68% of the total links. This isn’t about the content of the conversation, it’s about who the common conversational partners are. Some of these might be accounts we’d want to watch, since this tips us off that they are in some way important.

Those Mentioned By Wikistrat Analysts

My system picks up hashtags at the same time it’s getting mentions and it leaves an edge file for each. There were 6,943 unique user to hashtag mentions in those 58,889 tweets.
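Collecting mentions and hashtags in the same pass might be sketched as follows. This is not the capture system described here, just an illustration of the idea; the regular expressions are simplified and the function name is hypothetical.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")

def build_edges(tweets):
    """tweets: iterable of (author, text) pairs.

    Returns two Counters keyed by (author, mentioned_account) and
    (author, hashtag), with the number of occurrences as the edge weight.
    """
    mentions, hashtags = Counter(), Counter()
    for author, text in tweets:
        user = author.lower()  # treat handles as case-insensitive
        for name in MENTION_RE.findall(text):
            mentions[(user, name.lower())] += 1
        for tag in HASHTAG_RE.findall(text):
            hashtags[(user, tag.lower())] += 1
    return mentions, hashtags
```

The number of keys in the hashtag Counter corresponds to the count of unique user-to-hashtag pairs, and each Counter can be written out as its own edge file.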

Wikistrat Hashtag Usage

Filtering out the hashtags with a low in-degree gives us an overview of what’s important. Egypt, Syria, and Iran most of all, followed closely by Israel, Iraq, Turkey, and Afghanistan.

High Volume Wikistrat Hashtags

Our conclusion? Foreign policy experts like to talk about foreign policy, and the issues that they focus on are the same ones that appear in the mainstream media. No revelations in that. What is interesting is that I could easily take this data and time slice it, and it’s going to accumulate going forward.

There are several things I should do next if I want to improve predictive capacity.

  • Catalog news sources mentioned and monitor them, too.
  • Examine conversation partners, find out which ones are also good sources.
  • Select external data with a temporal component to correlate.

That last one is where the fun starts. I could tell North Africa was going to blow up during the fall of 2010 because of what happened in Russia and Pakistan that summer – epic fires, epic floods, and grain crop losses. Arab Spring started right on schedule.

If I can find a feed that provides commodity prices for wheat, rice, and a few other things, those numbers will precede disorder. If I can find a numeric feed of MODIS data and correlate that with crop production areas, this would be a fine barometer for planetary mood. This would actually be easier than finding a global pool of rain gauge data – turns out that this is a national secret in places where water is scarce.
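Whether one series really precedes another can be tested with a lagged correlation: shift the leading series forward and see at which lag it lines up best with the trailing one. A minimal sketch, with hypothetical function names and assuming two equally spaced series of the same length:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlation(leading, trailing, lag):
    """Correlate leading[t] with trailing[t + lag], for lag >= 0."""
    if lag:
        leading, trailing = leading[:-lag], trailing[lag:]
    return pearson(leading, trailing)
```

A constant series has zero variance and would raise a division error, so real feeds would need that case handled. A high correlation at some lag is also only a hint, not proof that prices drive unrest.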

There is injustice of all sorts all over the world, but if people are getting enough calories, protein, and vitamins, they will struggle along as best they can. If they get hungry, or even worse hungry and then really thirsty … things can change in a tremendous hurry. Scenes like this, from Tunisia, are going to be all too common.

Wikistrat’s Ongoing Conversation

Two months ago I posted the Wikistrat Investigation Summary. This group of 150+ professionals engages in distributed problem solving in the realm of foreign policy and they were the first one I profiled when I started exploring this area. I was using Maltego’s OpenCalais NER (named entity recognition) to parse LinkedIn profiles and Paterva’s transform servers to access their Twitter feeds.

Maltego, even with Paterva’s commercial transform servers, has performance limits, and I have done a variety of simple homebrew transforms to circumvent this. Once you clear those hurdles there are other problems that arise when you have very large volumes of content. I find Maltego graphs get cramped at two or three hundred entities, but I have Gephi files with 23,000+ entities and they’re quite manageable.

I have built a couple of tools for Twitter forensics over the last few weeks – an image capture system that can screenshot about 4,800 tweets per hour if I light up my whole cloud, and I have a tweet recorder that can do around 180,000 tweets per hour. I turned it loose on the Twitter accounts of thirty three Wikistrat analysts earlier today, but got impatient and pulled their files when it was only halfway done. These are the over 6,000 Twitter accounts I found mentioned in 27,000 captured tweets.

Wikistrat Analysts' Mentions

This is good for identifying who is most active, but once we know that we start excluding the accounts mentioned just once, then twice, until we narrow down to a few hundred nodes that we can visualize including their names.
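Raising the cutoff step by step until the graph is small enough to label can be automated. A minimal sketch, assuming a dictionary of mention counts per account; the function names are hypothetical.

```python
def smallest_cutoff(counts, max_nodes):
    """Return the lowest mention cutoff that leaves at most max_nodes accounts."""
    cutoff = 1
    # Raise the bar one mention at a time until few enough accounts remain.
    while sum(1 for n in counts.values() if n >= cutoff) > max_nodes:
        cutoff += 1
    return cutoff

def keep_at_least(counts, cutoff):
    """Return only the accounts mentioned at least cutoff times."""
    return {name: n for name, n in counts.items() if n >= cutoff}
```

For a target of a few hundred labelled nodes, `smallest_cutoff(counts, 300)` would give the threshold to apply before exporting to Gephi.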

Wikistrat Analysts Multiple Mentions

Once we narrow the field to those with five or more mentions all we see are the Wikistrat people and a roughly even mix of print media outlets and others who have something to say about foreign policy. Keep in mind only 20% of them have Twitter and those that do are professional in their demeanor – not much chatter, no fighting, their public faces are their product.

Wikistrat Analysts & Those Frequently Mentioned

I get hashtags at the same time I extract mentions. I was surprised to find a very large number of single use tags. Once I focused on those mentioned five or more times it became clear which regions mattered to the analysts I had captured.

Wikistrat Analysts Tag Usage

So what did we learn here? I think that Maltego’s inclusion of named entity recognition is very handy, but for large volume long term observation of the small, easily parsed messages from Twitter, a hand built solution is vastly more effective. Gephi’s automatic community detection, detailed methods for weighting nodes, and an extensible set of layout methods permit me to handle a hundred times the entities I can with Maltego.

What’s next? We can start with a group of Twitter accounts, learn who their associates are, and what they are discussing. Every tweet has a timestamp, but we’ve not really done anything about the temporal aspect of the conversations we isolate.
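A first step toward using the timestamps is simply bucketing tweets by month. The sketch below assumes the `created_at` string format used by the Twitter v1.1 API and an English locale for the day and month abbreviations; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumed format, e.g. "Wed Aug 27 13:08:45 +0000 2008"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"

def tweets_per_month(created_at_values):
    """Count tweets per calendar month, keyed as 'YYYY-MM' in date order."""
    buckets = Counter()
    for raw in created_at_values:
        stamp = datetime.strptime(raw, TWITTER_TIME)
        buckets[stamp.strftime("%Y-%m")] += 1
    return dict(sorted(buckets.items()))
```

The same bucketing applied to mention or hashtag edges would give one edge list per time slice, which is the shape of data a temporal view needs.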

I recently gave SplunkStorm a try and I was immediately taken with the value of this machine data indexing tool. Unlike its big brother, Splunk, the software as a service free account version permits up to six people to share the content, and it has room for up to six million tweets. I rigged my tweet recorder to create files that are just under the hundred megabyte upload limit – so a user with just a small amount of training can pick up a cache of nearly 600,000 tweets, load them into SplunkStorm, and explore at their leisure.
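Keeping each output file under an upload limit comes down to starting a new chunk whenever the next record would push the running size over the cap. A minimal sketch, with a hypothetical function name and assuming one tweet record per line:

```python
def chunk_lines(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size stays at or under max_bytes."""
    chunk, size = [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))  # measure bytes, not characters
        if chunk and size + line_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield chunk
```

A single line longer than `max_bytes` still ends up in a chunk of its own, so the cap should sit a little below the real limit. At nearly 600,000 tweets per hundred megabyte file, each record works out to roughly 170 bytes.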

The other thing I am going to try is this – the Gephi Graph Streaming plugin. If we have long term captures of friend relationships, mentions, and hashtags, we should be able to see how groups form and disperse, as well as the ebb and flow of conversations within them.