Category Archives: Wikistrat

Wikistrat Klout Topics Using Maltego

Yesterday in Wikistrat Klout Influencers Using Maltego we started with the roughly thirty Wikistrat analysts who had Twitter accounts and we obtained a list of their influencers. Today we’re going to dig a little deeper, making it easy to access their Klout pages and sorting them by the specific topics listed in their profiles.

Here’s our network from yesterday, Wikistrat people, their influencers, and the handful of accounts who influenced their influencers.

Three Generations Influencers Core

I ran the transform to gather their topics and then used circular layout. The green dots on the edge are topics claimed by just a single Twitter account; the ones in the central circle are associated with two or more accounts.
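The split between singleton and shared topics can be sketched in a few lines of Python. This is only an illustration of the idea, not the transform itself; the function name `split_topics` and the sample accounts and topics are made up.

```python
from collections import defaultdict


def split_topics(account_topics):
    """Split topics into those claimed by one account and those shared by several.

    account_topics maps an account name to an iterable of topic strings.
    Returns (singletons, shared): a set of topics with exactly one claimant,
    and a dict of topic -> sorted list of claimants for topics with two or more.
    """
    claimed_by = defaultdict(set)
    for account, topics in account_topics.items():
        for topic in topics:
            claimed_by[topic].add(account)

    singletons = {t for t, accts in claimed_by.items() if len(accts) == 1}
    shared = {t: sorted(accts) for t, accts in claimed_by.items() if len(accts) >= 2}
    return singletons, shared


# Made-up sample data, purely for illustration.
sample = {
    "@analyst_a": ["Egypt", "Syria", "Cooking"],
    "@analyst_b": ["Egypt", "Iran"],
    "@analyst_c": ["Syria", "Iran", "Egypt"],
}
singletons, shared = split_topics(sample)
```

In the circular layout, `singletons` would correspond to the outer ring and the keys of `shared` to the central circle.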

Three Generations Influencers Core With Topics

Here’s the network minus the singleton topics and arranged using force directed layout.

Three Generations Influencers Core & Shared Topics

Three Generations Influencers Core & Shared Topics Closeup

The transforms I cooked up this morning stumble periodically and I am not sure why, so I haven’t released them yet. They are extremely simple: just an API call and a few lines of code to move that content into Maltego entities.
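For readers who have not written one, here is a rough sketch of the "few lines of code" half of such a transform. It is not the unreleased code. It assumes the XML response layout used by Maltego local transforms (a MaltegoMessage wrapping a MaltegoTransformResponseMessage), it picks maltego.Phrase as the entity type for topics, and `fetch_topics` is a hypothetical stand-in for the API call that simply returns canned values.

```python
import xml.etree.ElementTree as ET


def topics_to_maltego_xml(topics):
    """Wrap a list of topic strings in a Maltego local-transform response."""
    message = ET.Element("MaltegoMessage")
    response = ET.SubElement(message, "MaltegoTransformResponseMessage")
    entities = ET.SubElement(response, "Entities")
    for topic in topics:
        # maltego.Phrase is an assumption; a custom topic entity would also work.
        entity = ET.SubElement(entities, "Entity", Type="maltego.Phrase")
        ET.SubElement(entity, "Value").text = topic
        ET.SubElement(entity, "Weight").text = "100"
    ET.SubElement(response, "UIMessages")
    return ET.tostring(message, encoding="unicode")


def fetch_topics(screen_name):
    """Hypothetical placeholder for the real API call; returns canned data."""
    return ["Egypt", "Foreign Policy"]


xml_out = topics_to_maltego_xml(fetch_topics("example_account"))
```

A local transform would print `xml_out` to standard output for Maltego to read.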

What have we accomplished here?

We can start with a small group of accounts and find their influencers. We’ll probably want to find the second generation of influencers as well to get a decent sized set. Then we can round up the topics for which they have influence. The path from start to finish takes a few minutes and spares us having to read their timelines in great detail in order to characterize them.

And there’s a related utility transform which takes a Twitter account and provides a URL to their Klout profile that includes their Klout score. The transform that does this probably needs a minimum Klout score filter, since we’d be using it to hunt for high value contributors in a given group.

Individual Klout Score URLs

Yesterday we made a bigger problem than we started with by pulling a bunch of new accounts into the graph. Today we can sort them by topic and we can get a simple visual clue about how influential they are which also provides access to their Klout scores with a single right click of the URL entity. This approach might be useful for an illicit network that overlapped into those with professional legitimacy, but when faced with questions about a group of people like the Wikistrat analysts we can quickly learn a good deal about them from the Klout API.

Wikistrat Klout Influencers Using Maltego

Wikistrat bills itself as “the world’s first Massively Multiplayer Online Consultancy (MMOC)”. This is the largest of a handful of hive minds I have studied that focus on foreign policy. The blowout from the Egyptian army takeover is something I have known was coming for about two weeks thanks to my subscription to NightWatch. I am curious to see who among their analysts was first discussing it on Twitter, and who else they involved.

Wikistrat Analysts

Things like this go through a progression: first the locals and knowledgeable observers in country detect something in the works, then the well connected analysts begin to talk, then the specialty reporting outlets, and finally the topic reaches mainstream news. Somewhere in the mix, around the time the specialty outlets begin running the story, the blogosphere will also pick it up and start playing with it. A while ago I wrote a Maltego transform for the premier reputation tracking system, Klout. I dusted it off this evening and applied it to Wikistrat. Only four out of the thirty accounts I have identified were not registered for Klout.

Wikistrat Analyst Influencers

I have been looking at the Wikistrat social network for a while now and the Twitter contingent is not large, so I recognize most of them by sight, and many of their conversation partners from the various examinations I have performed. The first surprise was that there were so few influencers that reached more than one member. I had assumed I would find at least some luminaries from the field, but this is not the case. It was also a bit surprising to see that there were no instances where one analyst influenced another. My take on that is that these guys don’t talk shop on Twitter.

Wikistrat High Degree Influencers

Thinking that this set was a high value source for foreign policy information I went through and manually separated them into organization role accounts and humans. The role accounts aren’t interesting in this context, since they are mostly broadcasters rather than conversation partners who would have influencers.

Wikistrat Analysts Influential People & Organizations

Once I had just the people I collected second generation influencers.

Wikistrat Second Generation Influencers

And when I checked those influenced by the core group I found no feedback loop at all. I think there are two explanations for this. The first is that this is a large universe, many players, and we’d start to see feedback loops if we took another step back. The problem with this is … two steps is a lot in a professional environment. The six degrees of separation meme is based on a sociology study in the 1960s and it more or less holds up no matter how large the system goes. There is a study of the Microsoft Messenger network where they had a set of two hundred million people and the average path length was about six. A professional network ought to have something similar to the Erdős Number for mathematicians – with an average distance between two people being four to five hops.

The other explanation is a little easier to swallow – Klout is seeing interactions but foreign policy is dense and there is an industry specific jargon. If the system can’t interpret the content of tweets it is less likely to automatically select influencers, and our test set are people who aren’t manually adding influencers to their profiles.
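Both checks discussed here, whether any feedback loop exists and how many hops separate two accounts, are easy to run once the influencer links are in an edge list. The sketch below is a plain Python illustration and not part of the Maltego workflow; the account names are invented, and an edge `(a, b)` is taken to mean "a influences b".

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an adjacency dict from (influencer, influenced) pairs."""
    graph = defaultdict(set)
    for influencer, influenced in edges:
        graph[influencer].add(influenced)
        graph[influenced]  # touch the key so every node has an entry
    return dict(graph)


def has_feedback_loop(graph):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True  # reached a node still on the current path
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


def hops(graph, start, goal):
    """Breadth-first hop count from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Invented three-generation sample with no loop.
edges = [("gen2_x", "gen1_a"), ("gen1_a", "analyst_1"), ("gen1_a", "analyst_2")]
no_loop = has_feedback_loop(build_graph(edges))
# Adding one edge back to the second generation closes a loop.
with_loop = has_feedback_loop(build_graph(edges + [("analyst_1", "gen2_x")]))
distance = hops(build_graph(edges), "gen2_x", "analyst_1")
```

The recursive search is fine for a graph of a few hundred accounts; a much larger capture would want an iterative version.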

Three Generations No Feedback

Circular layouts are a way to see who is in the middle of the action and who is on the edge. Here we see a core of actors and some small clusters on the edge.

Three Generations Circular Layout

I selected the inner circle, moved it to its own graph, and used a force directed layout. This particular phase didn’t provide a lot of new information but in general this is a place you’d linger if you had a new, complex network you were trying to understand, so I include it for the sake of completeness.

Three Generations Influencers Core

And finally we get a bit of a payoff. I recognize @texasinfrica; this one turns up in all sorts of discussions. The other eight are a couple of role accounts and a handful of new people.

Three Generations Nine New Players

What did we accomplish here? I can see a few things.

I’ve had this ability for a long time but tonight was the first time I’ve applied it to more than a few accounts for testing. This is a selection of all influencers, not just foreign policy sources, so we’re stuck with a manual slog if we want to constrain the results to just that sector. The Maltego transform servers are limping tonight, which I credit to the work created by the Egyptian coup, so I can’t get into what these guys are saying without putting them into my Twitter recorder. There are maybe two hundred accounts listed so it would be an overnight job to get them all recorded for the first time.

What comes next?

The Maltego Klout transform code can be adjusted to produce output suitable for Gephi with a very small amount of work. We could use the Klout number itself to weight Twitter accounts, we can graph the influencers or influencees, and we can pull topics and create an accounts-to-areas-of-expertise map.
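As a rough idea of what Gephi-ready output could look like, the sketch below writes a node table and an edge table as CSV. It assumes Gephi's spreadsheet import conventions (an `Id` column for nodes, `Source` and `Target` columns for edges); the `klout_score` column name, the edge direction, and the sample scores are my own choices for illustration, not real data.

```python
import csv
import io


def write_nodes(scores, out):
    """Write a Gephi node table: one row per account with its score."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Label", "klout_score"])
    for account in sorted(scores):
        writer.writerow([account, account, scores[account]])


def write_edges(influence_pairs, out):
    """Write a Gephi edge table from (influencer, influenced) pairs."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for influencer, influenced in influence_pairs:
        writer.writerow([influencer, influenced])


# Made-up sample values.
scores = {"analyst_a": 55, "outlet_x": 80}
pairs = [("outlet_x", "analyst_a")]

nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_nodes(scores, nodes_buf)
write_edges(pairs, edges_buf)
```

Swapping the `StringIO` buffers for files opened with `newline=""` gives two files to load into Gephi, where the score column can then drive node size.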

The Klout API rules are much tougher than those of Twitter or LinkedIn, where you can just create an application at will. I had to explain what I was doing and they were really helpful – my account has ten times the API credits of the individual/desktop accounts, and they agreed that if I could show some unique uses involving Maltego they would work with users that needed the higher capacity. They also limit caching to five to seven days, which is not good for forensics work or for long term studies of how clusters of accounts change over time.

Once we work through the access and data retention issues there are some really cool things that can be done – like launching a brand new Twitter account and taking daily snapshots of friends and followers as well as the derived Klout attributes. This data could be used to feed Sentinel Visualizer or the Gephi streaming plugin, producing an animation of an account’s growth and expansion into new topics.

I suppose the first order of the day here is encapsulating anything that can be done with the API in both a Maltego transform and something to output CSV for use with Gephi and Sentinel Visualizer. I will do this, publish the code to my Github, and then pick out something to study and write it up here.

Wikistrat’s Current Conversations

I started looking at Wikistrat’s collective consulting effort several months ago. I have added a few similar groups but they are still the largest and busiest of the mix. A first cut examination of a social network involves the links you can discern from a static snapshot, which I did some time ago.

Three days ago I put the accounts of the thirty three members I found on Twitter into my capture system and this afternoon I checked to see what it had found. There were 58,889 tweets and 15,712 unique accounts mentioned during the timeframe. Some of the accounts are low volume and the oldest tweet dated back to August of 2011. 39,000 of the tweets are from 2013.

Friend relationships between accounts might tell you something, but who a social media account addresses directly is information on who and what is important in the moment. Size of the nodes here indicates relative volumes of tweets.

Wikistrat Analysts

And these are the accounts which they most often mention. As is common for influential professionals, much of what they say is directed to media outlets and the rest are others who also work in related fields. Keep in mind this is 0.58% of the total nodes and 3.68% of the total links. This isn’t about the content of the conversation, it’s about who the common conversational partners are. Some of these might be accounts we’d want to watch, since this tips us off that they are in some way important.

Those Mentioned By Wikistrat Analysts

My system picks up hashtags at the same time it’s getting mentions and it leaves an edge file for each. There were 6,943 unique user to hashtag mentions in those 58,889 tweets.
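The extraction step can be approximated with two regular expressions. This is a simplified sketch rather than the capture system itself: the patterns are my own approximation of how mentions and hashtags are written, names are lowercased so that `#Egypt` and `#egypt` count as one tag, and the sample tweets are invented.

```python
import re

# Approximate patterns; the lookbehind skips things like email addresses.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_edges(tweets):
    """Return unique (author, mentioned) and (author, hashtag) pairs.

    tweets is an iterable of (author, text) tuples.
    """
    mention_edges, hashtag_edges = set(), set()
    for author, text in tweets:
        source = author.lower()
        for name in MENTION_RE.findall(text):
            mention_edges.add((source, name.lower()))
        for tag in HASHTAG_RE.findall(text):
            hashtag_edges.add((source, tag.lower()))
    return mention_edges, hashtag_edges


# Invented sample tweets.
tweets = [
    ("analyst_a", "Reading @Reuters on #Egypt and #Syria"),
    ("analyst_a", "More on #egypt via @reuters"),
    ("analyst_b", "#Iran talks, cc @analyst_a"),
]
mention_edges, hashtag_edges = extract_edges(tweets)
```

Each set can then be written out as its own edge file, one pair per line.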

Wikistrat Hashtag Usage

Filtering out the hashtags with a low in-degree gives us an overview of what’s important. Egypt, Syria, and Iran most of all, followed closely by Israel, Iraq, Turkey, and Afghanistan.

High Volume Wikistrat Hashtags

Our conclusion? Foreign policy experts like to talk about foreign policy, and the issues that they focus on are the same ones that appear in the mainstream media. No revelations in that. What is interesting is that I could easily take this data and time slice it, and it’s going to accumulate going forward.

There are several things I should do next if I want to improve predictive capacity.

  • Catalog news sources mentioned and monitor them, too.
  • Examine conversation partners and find out which ones are also good sources.
  • Select external data with a temporal component to correlate.

That last one is where the fun starts. I could tell North Africa was going to blow up during the fall of 2010 because of what happened in Russia and Pakistan that summer – epic fires, epic floods, and grain crop losses. Arab Spring started right on schedule.

If I can find a feed that provides commodity prices for wheat, rice, and a few other things, those numbers will precede disorder. If I can find a numeric feed of MODIS data and correlate that with crop production areas, this would be a fine barometer for planetary mood. This would actually be easier than finding a global pool of rain gauge data – it turns out that this is a national secret in places where water is scarce.

There is injustice of all sorts all over the world, but if people are getting enough calories, protein, and vitamins, they will struggle along as best they can. If they get hungry, or even worse, hungry and then really thirsty … things can change in a tremendous hurry. Scenes like this, from Tunisia, are going to be all too common.

Wikistrat’s Ongoing Conversation

Two months ago I posted the Wikistrat Investigation Summary. This group of 150+ professionals engages in distributed problem solving in the realm of foreign policy and they were the first group I profiled when I started exploring this area. I was using Maltego’s OpenCalais NER (named entity recognition) to parse LinkedIn profiles and Paterva’s transform servers to access their Twitter feeds.

Maltego, even with Paterva’s commercial transform servers, has performance limits, and I have done a variety of simple homebrew transforms to circumvent this. Once you clear those hurdles there are other problems that arise when you have very large volumes of content. I find Maltego graphs get cramped at two or three hundred entities, but I have Gephi files with 23,000+ entities and they’re quite manageable.

I have built a couple of tools for Twitter forensics over the last few weeks – an image capture system that can screenshot about 4,800 tweets per hour if I light up my whole cloud, and a tweet recorder that can do around 180,000 tweets per hour. I turned it loose on the Twitter accounts of thirty three Wikistrat analysts earlier today, but got impatient and pulled their files when it was only halfway done. These are the over 6,000 Twitter accounts I found mentioned in 27,000 captured tweets.

Wikistrat Analysts’ Mentions

This is good for identifying who is most active, but once we know that we start excluding the accounts mentioned just once, then twice, until we narrow down to a few hundred nodes that we can visualize including their names.

Wikistrat Analysts Multiple Mentions

Once we narrow the field to those with five or more mentions all we see are the Wikistrat people and a roughly even mix of print media outlets and others who have something to say about foreign policy. Keep in mind only 20% of the Wikistrat experts have Twitter accounts, and those that do are professional in their demeanor – not much chatter, no fighting, their public faces are their product.

Wikistrat Analysts & Those Frequently Mentioned

I get hashtags at the same time I extract mentions. I was surprised to find a very large number of single use tags. Once I focused on those mentioned five or more times it became clear which regions mattered to the analysts I had captured.

Wikistrat Analysts Tag Usage

So what did we learn here? I think that Maltego’s inclusion of named entity recognition is very handy, but for large volume long term observation of the small, easily parsed messages from Twitter, a hand built solution is vastly more effective. Gephi’s automatic community detection, detailed methods for weighting nodes, and an extensible set of layout methods permit me to handle a hundred times the entities I can with Maltego.

What’s next? We can start with a group of Twitter accounts, learn who their associates are, and what they are discussing. Every tweet has a timestamp, but we’ve not really done anything about the temporal aspect of the conversations we isolate.

I recently gave SplunkStorm a try and I was immediately taken with the value of this machine data indexing tool. Unlike its big brother, Splunk, the software as a service free account version permits up to six people to share the content, and it has room for up to six million tweets. I rigged my tweet recorder to create files that are just under the hundred megabyte upload limit – so a user with just a small amount of training can pick up a cache of nearly 600,000 tweets, load them into SplunkStorm, and explore at their leisure.
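Keeping each file under an upload limit is a simple chunking problem. The sketch below is not the recorder's actual code; it assumes one tweet per line of text, and the function name and the tiny limit used in the example are purely illustrative.

```python
def chunk_lines(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size, newlines included, fits max_bytes.

    A single line longer than max_bytes is still emitted, alone in its own chunk.
    """
    chunk, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        if chunk and size + line_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk


# Tiny limit so the behaviour is easy to see; each sample line costs 5 bytes.
chunks = list(chunk_lines(["aaaa", "bbbb", "cccc"], max_bytes=10))
```

For real use, `max_bytes` would be set a little below the service's stated limit and each chunk written to its own file.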

The other thing I am going to try is this – the Gephi Graph Streaming plugin. If we have long term captures of friend relationships, mentions, and hashtags, we should be able to see how groups form and disperse, as well as the ebb and flow of conversations within them.

Hashtag & Humans

Completing the process of entering the Wikistrat people into my base graph, which I entitled Wikistrat-Full-Organization, brought me five more Twitter accounts – for a total of thirty two.

Wikistrat Twitter Accounts

I used the transform to pull all the tweets for each account and I set it to return up to fifty.

32 Wikistrat Twitter accounts, up to 50 tweets from each.

Once I had the tweets I used a transform to extract the hashtags. What I was hoping for here was to discover topic specific hashtags, such as the names of countries or regions, where Wikistrat experts congregate for discussions.

Twitter accounts, tweets, and extracted hashtags.

I was waiting for long periods of time for results so I backed off to the smallest possible return – just twelve tweets per account. I pulled the hashtags from them, eliminated the tags that were associated with a single tweet, and came up with this.

A small number of Twitter accounts, tweets, and hashtags.

There was a small hashtag cluster – four tags that all had to do with the Mideast, and each had two or more tweets associated.

05-Mideast

There were a couple of food enthusiasts in the mix and a hashtag cluster associated with their discussion.

06-food

And then there was August Cole. Doesn’t say all that much, but used the same couple of hashtags for several things.

07-august-cole

This is a process demonstration and it worked well enough for our purposes. If I were going to do this on a regular basis or handle large volumes of data I would probably develop methods within Maltego and then code something to do the work, once I had a solid grasp of the problem.

Wikistrat: Full Network As Of 3/30/2013

I spent quite a bit of yesterday hand entering the 156 Wikistrat experts, capturing their names, Wikistrat profiles, LinkedIn profiles, and Twitter accounts.

The green dots are “Twitter affiliation entities”, in the language of the Maltego manual. The blue dots are URL entities. It isn’t terribly clear in this image, but Maltego has a method to ‘color’ entities, and this shows up as a tiny star. All of the Wikistrat profiles got a yellow star, while LinkedIn profiles are blue. This permits selection and operations on a subset of entities of the same type. The brown dots are Person entities, which own their respective URLs, and which have a link to the Wikistrat domain. We will see other Person entities turn up as a result of transforms, and having a central hub like that permits picking just the members via the ‘Select neighbors’ function.

Wikistrat Network: Experts, Wikistrat Profiles, LinkedIn Profiles & Twitter Accounts

This is an example of that coloring and neighbor selection in action. I selected all Twitter entities by type, then pulled them away from the core of the network. I used the yellow and blue tags to differentiate between Wikistrat profiles, which everyone has, and LinkedIn profiles.

Exploded view of four classes of entities in the Wikistrat Maltego graph.

I selected the Twitter entities and used the transform that finds their friends, then I dropped any that did not have at least two friends. This graph also begins to visually show something we already know from the dataset. The mix of brown Person entities and blue Wikistrat entities are the ones who only have a Wikistrat profile. The Twitter accounts are a separate group and the Person entities nearest to them are the ones that have Twitter accounts. The group of URL entities off to the side are LinkedIn profiles, which about a third of them have, and they belong to the nearest Person entities. The border between Twitter, LinkedIn, and the Person entities nearby are those that have accounts on both systems.

Seeing community structure like this after expanding on one unrelated subsection is counter-intuitive, but it’s an artifact of the ‘organic’ layout algorithm. Adding Twitter accounts triggered additional ‘repulsion’ between all nodes, spreading them out and providing a little insight.

Adding the Twitter inner circle to the Wikistrat graph.

This is an intermediate step in the process. I pulled the Person entities away from the rest of the network, then selected the Wikistrat profiles, the larger group of blue URL entities, and moved them away, too. The large multicolored muddle is the keywords from all Wikistrat profiles, extracted using OpenCalais NER (named entity recognition).

Wikistrat network after applying NER to profiles from the official site.

The NER results were all over the board, so I thought I would try to sharpen things up by doing the same with LinkedIn profiles. You can still discern a bit of separation between the Wikistrat profiles NER and the ones from LinkedIn.

Wikistrat network after applying NER to LinkedIn profiles.

I thought it would be nice to know where the Wikistrat people are located as well as which geographic areas they name as part of their expertise. I selected all pink Location entities and pulled them aside.

Locations detected with NER from Wikistrat and LinkedIn profiles.

I let the organic layout arrange the entities and I was surprised to see there weren’t any geographic hubs. I would have expected a large number of them to point to D.C. Instead almost all locations are connected to just a single entity.

Wikistrat organic layout with locations.

I manually separated the locations and Twitter accounts, and I put the Wikistrat URLs on top with the LinkedIn URLs below. This is a good visualization of poor information, at least in terms of location. I am going to hunt for a way to automate finding that information, but I suspect that if I truly want it this will be another hand entry job.

Wikistrat network with major classes of entities separated.

I will admit that I had high hopes when I fired up the Named Entity Recognition stuff. The Wikistrat profiles show the areas of expertise for each person and I was hoping that this would be a neat way of sorting them by their regions of interest.

I have a choice to make at this time. The Wikistrat profiles are resistant to NER but they are fairly regular in layout. I can write a simple parser that knows to look for the numbered areas of expertise in each profile, or I can hand enter them. Automating that processing would be pretty easy for me, but there would be some utility in examining all of the experts again by hand, as I am trying to describe how this group fits into the overall flow of foreign policy making.
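For the parser option, something along these lines would probably do. The profile layout is a guess: I am assuming the areas of expertise appear as lines that start with a number followed by a period or parenthesis, and the sample profile text below is invented, so the pattern would need adjusting against the real pages.

```python
import re

# Assumed layout: "1. Middle East" or "3) Counterterrorism", one item per line.
NUMBERED_ITEM_RE = re.compile(r"^[ \t]*(\d+)[.)][ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_expertise(profile_text):
    """Return the labels of numbered items found in a profile, in order."""
    return [label for _number, label in NUMBERED_ITEM_RE.findall(profile_text)]


# Invented profile text, for illustration only.
sample_profile = """Jane Doe
Areas of expertise:
1. Middle East
2. Energy security
3) Counterterrorism
Based in Example City
"""
areas = parse_expertise(sample_profile)
```

The resulting labels could be attached to each Person entity in place of the NER output.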

Wikistrat’s Analysts & Friends

I got fed up with transcribing people’s information into Maltego and decided I’d spend a little time digging.

This is a classic Twitter donut. Starting with the people directly connected to the Wikistrat Twitter account, pull all their friends. Any friend with two or more links will be in the inner circle, while those with a single attachment are in the outer ring.
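The donut is just a count of how many seed accounts link to each friend. A minimal sketch, with invented account names and a hypothetical `donut` helper, looks like this:

```python
from collections import Counter


def donut(friends_of):
    """Split friends into an inner circle (2+ links) and an outer ring (1 link).

    friends_of maps each seed account to an iterable of its friends.
    """
    link_count = Counter()
    for friends in friends_of.values():
        link_count.update(set(friends))  # count each seed at most once per friend

    inner = {friend for friend, n in link_count.items() if n >= 2}
    outer = {friend for friend, n in link_count.items() if n == 1}
    return inner, outer


# Invented sample data.
friends_of = {
    "analyst_a": ["outlet_x", "thinktank_y", "friend_1"],
    "analyst_b": ["outlet_x", "friend_2"],
    "analyst_c": ["outlet_x", "thinktank_y"],
}
inner, outer = donut(friends_of)
```

Raising the threshold from two links to three or more shrinks the inner circle in the same way as removing low-degree entities in Maltego.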

Wikistrat Analysts & Friends

Just eyeballing the same graph in ‘organic’ mode, trying to see if there are obvious communities. They do appear to exist.

Wikistrat Analysts & Friends Organic View

Remove all the single link friends except for the ones that began as outliers.

Wikistrat Analyst Core Group & Several Isolated Individuals

I looked closely at the loners and decided to remove their friends. Vikram here is an odd case – he’s the only non-Wikistrat associate that got a second link by being directly connected to the Wikistrat account itself.

Wikistrat ‘Loners’ & Vikram

I moved the inner ring to its own graph and switched to Maltego’s organic view. The red starred entities are the Wikistrat people, the rest are about 75% human and 25% role accounts for think tanks and news services.

Wikistrat Analysts, Friends & Organization Role Accounts

I removed the role accounts, hoping to see community in the layout. I thought I might find clusters of people who work for the same think tank as well as Wikistrat, or maybe communities of practice based on specific areas of expertise. Neither was in evidence, despite what the big picture seems to show. I am not saying that such divisions don’t exist, I’m saying that this sample doesn’t demonstrate them, probably because the sample is too small.

The Wikistrat account being present as a friend to all is not a fact, it’s a convenient fiction for me that helps to keep the full sized graph organized in a visually useful fashion. Removing it did not provide any great revelations either.

Seeking Communities By Removing The Wikistrat Hub

I slogged through entering all 156 Wikistrat experts after I took these screen shots. I discovered half a dozen new Twitter accounts and next I’m warming up to apply some Named Entity Recognition to the experts’ profiles on both the Wikistrat web site and on LinkedIn.