I spent quite a bit of yesterday hand entering the 156 Wikistrat experts, capturing their names, Wikistrat profiles, LinkedIn profiles, and Twitter accounts.
The green dots are “Twitter affiliation entities”, in the language of the Maltego manual. The blue dots are URL entities. It isn’t terribly clear in this image, but Maltego has a method to ‘color’ entities, and this shows up as a tiny star. All of the Wikistrat profiles got a yellow star, while LinkedIn profiles are blue. This permits selection and operations on a subset of entities of the same type. The brown dots are Person entities, which own their respective URLs, and which have a link to the Wikistrat domain. We will see other Person entities turn up as a result of transforms, and having a central hub like that permits picking just the members via the ‘Select neighbors’ function.
This is an example of that coloring and neighbor selection in action. I selected all Twitter entities by type, then pulled them away from the core of the network. I used the yellow and blue tags to differentiate between Wikistrat profiles, which everyone has, and LinkedIn profiles.
I selected the Twitter entities and used the transform that finds their friends, then I dropped any that did not have at least two friends. This graph also begins to visually show something we already know from the dataset. The mix of brown Person entities and blue Wikistrat entities are the ones who only have a Wikistrat profile. The Twitter accounts are a separate group and the Person entities nearest to them are the ones that have Twitter accounts. The group of URL entities off to the side are LinkedIn profiles, which about a third of the experts have, and they belong to the nearest Person entities. The Person entities on the border between the Twitter and LinkedIn groups are those that have accounts on both systems.
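For anyone who wants to reproduce the pruning step outside of Maltego, here is a minimal Python sketch. It assumes the friend links have been exported as (seed account, friend account) pairs, and it reads "at least two friends" as "linked to at least two of the seed accounts", which is my interpretation for this sketch. The function name and the sample account names are made up for illustration.

```python
from collections import defaultdict

def prune_friends(edges, min_links=2):
    """Keep only friend accounts linked from at least `min_links` seed accounts.

    `edges` is an iterable of (seed_account, friend_account) pairs.
    Returns a dict mapping each surviving friend to its sorted seed accounts.
    """
    seeds_by_friend = defaultdict(set)
    for seed, friend in edges:
        seeds_by_friend[friend].add(seed)
    return {
        friend: sorted(seeds)
        for friend, seeds in seeds_by_friend.items()
        if len(seeds) >= min_links
    }

# Hypothetical sample data, not taken from the actual graph.
edges = [
    ("alice", "thinktank"),
    ("bob", "thinktank"),
    ("alice", "newsdesk"),
    ("carol", "policywonk"),
    ("bob", "policywonk"),
]
kept = prune_friends(edges)
print(kept)
```

Using a set per friend means a duplicated link in the export does not count twice toward the threshold.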
Seeing community structure like this after expanding on one unrelated subsection is counter-intuitive, but it’s an artifact of the ‘organic’ layout algorithm. Adding Twitter accounts triggered additional ‘repulsion’ between all nodes, spreading them out and providing a little insight.
This is an intermediate step in the process. I pulled the Person entities away from the rest of the network, then selected the Wikistrat profiles, the larger group of blue URL entities, and moved them away, too. The large multicolored muddle is made up of the keywords from all Wikistrat profiles, extracted using OpenCalais NER (named entity recognition).
The NER results were all over the board, so I thought I would try to sharpen things up by doing the same with LinkedIn profiles. You can still discern a bit of separation between the NER results from the Wikistrat profiles and the ones from LinkedIn.
I thought it would be nice to know where the Wikistrat people are located as well as which geographic areas they name as part of their expertise. I selected all pink Location entities and pulled them aside.
I let the organic layout arrange the entities and I was surprised to see there weren’t any geographic hubs. I would have expected a large number of them to point to D.C. Instead, almost all locations are connected to just a single entity.
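The same check for geographic hubs can be done without the layout, by counting how many people point at each location. This is only a sketch: it assumes the Person-to-Location links have been exported as (person, location) pairs, and the function name and sample rows are invented for illustration.

```python
from collections import Counter

def location_hubs(person_locations, min_people=2):
    """Return locations linked to at least `min_people` distinct people.

    `person_locations` is an iterable of (person, location) pairs.
    Duplicate pairs are collapsed so each person counts once per location.
    """
    counts = Counter(location for _, location in set(person_locations))
    return {loc: n for loc, n in counts.items() if n >= min_people}

# Hypothetical sample rows, not the real dataset.
person_locations = [
    ("p1", "Washington, D.C."),
    ("p2", "London"),
    ("p3", "Washington, D.C."),
    ("p4", "Tel Aviv"),
    ("p1", "Washington, D.C."),  # duplicate link, counted once
]
hubs = location_hubs(person_locations)
print(hubs)
```

An empty result would match what the graph showed: locations connected to just a single entity. Note that this does not merge spelling variants of the same place, which extracted location names may well contain.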
I manually separated the locations and Twitter accounts, and I put the Wikistrat URLs on top with the LinkedIn URLs below. This is a good visualization of poor information, at least in terms of location. I am going to hunt for a way to automate finding that information, but I suspect that if I truly want it this will be another hand entry job.
I will admit that I had high hopes when I fired up the Named Entity Recognition stuff. The Wikistrat profiles show the areas of expertise for each person and I was hoping that this would be a neat way of sorting them by their regions of interest.
I have a choice to make at this time. The Wikistrat profiles are resistant to NER but they are fairly regular in layout. I can write a simple parser that knows to look for the numbered areas of expertise in each profile, or I can hand enter them. Automating that processing would be pretty easy for me, but there would be some utility in examining all of the experts again by hand, as I am trying to describe how this group fits into the overall flow of foreign policy making.
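The parser option might look something like the sketch below. I am assuming the profile text has already been saved as plain text and that each area of expertise sits on its own line starting with a number followed by a period or parenthesis; the real page markup may differ, and the sample profile here is invented.

```python
import re

# Matches lines such as "1. Middle East" or "3) Counterterrorism".
NUMBERED_ITEM = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")

def extract_expertise(profile_text):
    """Return the numbered areas of expertise found in a profile, in order."""
    areas = []
    for line in profile_text.splitlines():
        match = NUMBERED_ITEM.match(line)
        if match:
            areas.append(match.group(2))
    return areas

# Hypothetical profile text for illustration only.
sample_profile = """Jane Doe
Areas of expertise:
1. Middle East
2. Energy security
3) Counterterrorism
Joined 2012
"""
areas = extract_expertise(sample_profile)
print(areas)
```

Any line that does not begin with a number is ignored, so names, headings, and dates pass through without being mistaken for expertise entries.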