In this article, I want to persuade you of the real possibility, and high probability, that in the very near future remote entities will be able to target people’s online presence to capture and leverage their emotional states and feelings. This has some very extreme implications from a security and privacy perspective, and that is the scope I will adhere to in this article. On the flip side, the ideas presented here can also be leveraged to build powerful business decision-making and measurement capabilities, a topic that deserves its own space; I will cover it in a separate article in the next few days.
Before I go any further, I want to stress that the purpose of this article is neither to spread undue alarm nor to portray online social media as evil. I personally use the many avenues of online communication and collaboration embraced by Generation Y culture. Instead, the purpose of this article is to share some of my initial thoughts on the potential for abuse, specifically the mapping of individuals’ feelings online, and its possible implications.
We Feel Fine.
To begin with, I insist that you watch Jonathan Harris’ TED talk titled The Art of Collecting Stories:
In this talk, Jonathan describes his passion for making sense of the emotional world and his deep compassion for the human condition. Regardless of this particular article, Jonathan’s talk stands on its own. I think Jonathan’s ideas, projects, and aspirations are true works of art. His ideas are powerful enough to inspire a security professional such as me to look outside the oft-incestuous world of information security and to reach out and connect with other fields of science and understanding. In a small way, the material presented in this article is my attempt to do just that.
I invite you to visit We Feel Fine, one of Jonathan’s projects, which he co-founded with Sep Kamvar:
Since August 2005, We Feel Fine has been harvesting human feelings from a large number of weblogs. Every few minutes, the system searches the world's newly posted blog entries for occurrences of the phrases "I feel" and "I am feeling". When it finds such a phrase, it records the full sentence, up to the period, and identifies the "feeling" expressed in that sentence (e.g. sad, happy, depressed, etc.). Because blogs are structured in largely standard ways, the age, gender, and geographical location of the author can often be extracted and saved along with the sentence, as can the local weather conditions at the time the sentence was written. All of this information is saved.
The result is a database of several million human feelings, increasing by 15,000 - 20,000 new feelings per day. Using a series of playful interfaces, the feelings can be searched and sorted across a number of demographic slices, offering responses to specific questions like: do Europeans feel sad more often than Americans? Do women feel fat more often than men? Does rainy weather affect how we feel? What are the most representative feelings of female New Yorkers in their 20s? What do people feel right now in Baghdad? What were people feeling on Valentine's Day? Which are the happiest cities in the world? The saddest? And so on.
...
At its core, We Feel Fine is an artwork authored by everyone. It will grow and change as we grow and change, reflecting what's on our blogs, what's in our hearts, what's in our minds. We hope it makes the world seem a little smaller, and we hope it helps people see beauty in the everyday ups and downs of life.
Here is a video I uploaded to YouTube demonstrating We Feel Fine’s interface, including the ability to filter for specific targets (for example, feelings expressed by individuals in their 20s in Iraq):
Emotion Dashboard: Targeting Individuals.
The We Feel Fine project does not target specific individuals. The creators of the project imply that doing so would violate an individual's privacy:
Privacy: We Feel Fine only collects and displays data that was already posted publicly on the World Wide Web. We Feel Fine never associates individual human names with the feelings it displays, though it always provides a link to the blog from which any displayed sentence or picture was collected....
We Feel Fine is a work of art designed by well-meaning intellectuals. It has neither the capability nor the intention of intruding on any one person’s privacy, yet the project raised my awareness of the security and privacy implications of capturing the feelings (past and present) of individuals.
To further the discussion around the possibility and implications of capturing the feelings individuals project online, I decided to develop a proof-of-concept visualization tool that I will call Emotion Dashboard. This is not a production-ready tool of any sort, because I do not currently have the resources to develop such a thing. The goal of this tool (if it should even be called a tool) is to demonstrate my ideas and vision on this particular topic in order to encourage further discussion in the community. Here are the components of Emotion Dashboard:
- RSS. It consumes an RSS feed as its source of input. This feed can combine more than one source, stitched together using a service such as Yahoo Pipes:
In other words, the targeted individual’s online presence may include his or her Facebook profile updates, blogs, and Twitter messages. Updates from all of these sources can be combined into one RSS feed and supplied to Emotion Dashboard, which scans the feed from past to present (oldest entries first).
- Pulse. To visualize the individual’s emotional state from the past (oldest RSS entry) to the present, the tool includes a line graph at the top of the interface that trends upward when it finds a word expressing a happy (positive) emotion, and downward when it finds a word expressing a sad or angry (negative) emotion. To implement this feature, I leveraged the CSV file provided by the We Feel Fine project, located here: http://www.wefeelfine.org/data/files/feelings.txt. This file lists words that are commonly used to express feelings, and I labeled each word as positive- or negative-sounding based on my own judgment. Occurrences of these words are plotted on the line graph, and each point can be clicked to open a new browser window at the location where the word was found.
Immediately below the line graph is a solid bar that expresses the culmination of the individual’s overall mood. The color of this bar is yellow (happy), blue (sad), or red (angry); the hex codes for these colors are also taken from the We Feel Fine CSV file listed above.
I concede that this technique of merely grepping for words lacks context and is prone to an extremely high error rate. However, given my limited resources at this point, my goal is not to provide something readily usable in all cases, but to present a starting point for a possible approach, and the probable implications should it be extended with intelligent, grammar-based contextual analysis. Note that, even though the approach is vulnerable to a high error rate, it does, statistically speaking, become slightly more accurate the more words it consumes, presumably because isolated misreadings carry less weight in a larger sample.
- Word Cloud. Below the line graph is a simple word cloud containing words from the CSV list discussed above. As the RSS feed is analyzed from past to present, words in the cloud grow in size each time they recur.
The word cloud allows the user to analyze the words being used to express feelings as the Emotion Dashboard reads the RSS feed from past to present. The words in the cloud are colored based on the associated hex color codes present in the CSV file.
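The RSS component’s stitching step can be sketched in a few lines. This is a minimal sketch, not the original tool: it assumes each source is available as an RSS 2.0 document whose items all carry a `<pubDate>`, and the names `Entry`, `parse_rss`, and `merge_feeds` are hypothetical. Yahoo Pipes performed this merge as a hosted service; here it is done locally by tagging entries with their source and sorting oldest first.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime


@dataclass
class Entry:
    source: str          # hypothetical label, e.g. "blog" or "twitter"
    published: datetime  # parsed from the item's RSS <pubDate>
    text: str            # title and description joined (HTML is not stripped)


def parse_rss(xml_text: str, source: str) -> list[Entry]:
    """Extract <item> entries from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # cannot place an undated item on the timeline
        entries.append(Entry(source, parsedate_to_datetime(pub),
                             f"{title} {body}".strip()))
    return entries


def merge_feeds(feeds: dict[str, str]) -> list[Entry]:
    """Combine several feeds into one timeline, oldest entry first."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_rss(xml_text, source))
    return sorted(merged, key=lambda e: e.published)
```

Sorting on the parsed timestamp, rather than on feed order, is what lets entries from different venues interleave correctly on a single past-to-present timeline.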
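The Pulse graph and the mood bar both reduce to grepping each entry, oldest first, for words from a labeled lexicon. The sketch below is illustrative only: the small lexicon, its mood labels, and the hex colors are hypothetical placeholders, not the hand labels or the color values from the We Feel Fine file. `pulse` keeps a running score (+1 for happy words, -1 for sad or angry words) and records the entry index so a UI could link each point back to its source; `mood_bar` picks the most frequent mood overall.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical lexicon: the original tool hand-labeled words from the
# We Feel Fine feelings list; these entries are illustrative only.
MOOD_OF = {"happy": "happy", "nice": "happy", "sad": "sad", "lonely": "sad",
           "upset": "angry", "angry": "angry"}
POLARITY = {"happy": 1, "sad": -1, "angry": -1}  # sad and angry both pull down

# Placeholder hex codes, NOT the values from the We Feel Fine file.
BAR_COLOR = {"happy": "#FFD700", "sad": "#1E90FF", "angry": "#DC143C"}

WORD = re.compile(r"[a-z']+")


def feeling_words(entries):
    """Yield (entry_index, word) for each lexicon word, oldest entry first."""
    for idx, text in enumerate(entries):
        for word in WORD.findall(text.lower()):
            if word in MOOD_OF:
                yield idx, word


def pulse(entries):
    """Running score: +1 per positive word, -1 per negative word."""
    points, score = [], 0
    for idx, word in feeling_words(entries):
        score += POLARITY[MOOD_OF[word]]
        points.append((idx, word, score))  # idx lets a UI link back to the entry
    return points


def mood_bar(entries) -> Optional[tuple]:
    """Dominant mood and its bar color; ties go to the mood seen first."""
    counts = Counter(MOOD_OF[w] for _, w in feeling_words(entries))
    if not counts:
        return None
    mood, _ = counts.most_common(1)[0]
    return mood, BAR_COLOR[mood]
```

For example, `pulse(["I feel happy today", "so upset and sad"])` rises to 1 and then falls to -1, which is exactly the context-free behavior conceded above: "not happy" would still count as a positive hit.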
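The word cloud’s growth rule can be approximated by scaling each word’s font size with its frequency so far. This is a hedged sketch: the linear scale and the `min_pt`/`max_pt` bounds are assumptions, since the sizing rule of the original tool is not described. Calling `cloud_sizes` on successively longer prefixes of the oldest-first word list reproduces the effect of words growing as they recur.

```python
from collections import Counter


def cloud_sizes(words, min_pt=10.0, max_pt=48.0):
    """Map each feeling word to a font size proportional to its frequency.

    Linear scaling between min_pt and max_pt is an assumption; the
    original tool's sizing rule is not documented.
    """
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # all counts equal: avoid dividing by zero
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span
            for w, c in counts.items()}
```

With `["happy", "happy", "sad", "happy"]`, "happy" (three occurrences) is drawn at 48 points and "sad" (one occurrence) at 10 points.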
The following screenshot shows sample output for the online presence of an individual (whom we will call “Jack Smith” for the purposes of this discussion):
Here are some observations and implications:
- Jack’s initial online presence portrays his emotional state as positive (word-cloud: happy).
- Jack blogs about his friend being laid off from his job (word-cloud: layoff). This is a negative event.
- Feelings expressed by Jack on other venues where he has an online presence (for example, Twitter), on the same day as his blog entry about his friend’s layoff, are extremely negative (word-cloud: handicapped, upset), even though Jack is discussing other topics. This leads to the hypothesis that Jack’s overall mood is negative because he is influenced by his friend’s situation. If true, this hypothesis may allow a malicious third party to manipulate Jack’s negative state to influence his actions. However, for such a tactic to succeed, the third party would need to understand Jack’s personality, and in particular how he behaves in moments of stress. A third party could construct a personality profile of Jack by studying the content he authors across his online presence (blog, Twitter, Facebook, etc.) and correlating it with known personality analysis methodologies, for example, tests based on the Big Five personality traits:
Once enough information about Jack is collected to reasonably satisfy the requirements of such a test, Jack’s personality patterns can be determined, which may aid a malicious third party in exploiting his current emotional state. It is also plausible that this can be extended into automated, trigger-based capabilities. This is an extremely powerful idea: Jack may not be consciously aware of his negative mood, yet a third party may be able to detect it remotely with some degree of probability. The following is a screenshot of the results of a Big Five-like personality test (courtesy of Signal Patterns):
- Jack’s mood recovers to a positive state as time progresses, only to be pulled down briefly by further discussion of his friend’s layoff. This suggests that the aftershocks of his friend’s situation are still affecting him negatively.
- Eventually, Jack recovers to his average positive state (word-cloud: nice).
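The same-day observation about Jack amounts to grouping scored entries by calendar day, regardless of the venue they came from, and comparing each day against his average. A minimal sketch, assuming each entry has already been reduced to a hypothetical `(day, source, score)` tuple by some per-entry scoring step; the `margin` value is an arbitrary placeholder, not a calibrated threshold.

```python
from collections import defaultdict
from statistics import mean


def daily_mood(scored):
    """scored: (day, source, score) tuples; day is any sortable key,
    e.g. a datetime.date or an ISO date string. Returns {day: mean score}."""
    by_day = defaultdict(list)
    for day, _source, score in scored:
        by_day[day].append(score)  # venues are pooled deliberately
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def low_days(scored, margin=1.0):
    """Days whose mean score falls more than `margin` below the average day."""
    daily = daily_mood(scored)
    if not daily:
        return []
    baseline = mean(daily.values())
    return [day for day, m in daily.items() if m < baseline - margin]
```

Pooling venues is the point of the exercise: a negative blog post and negative tweets on the same day reinforce each other, whereas analyzing each source in isolation could miss the pattern.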
Case Study: Criminal Investigation and Analysis.
There are numerous security and privacy implications of the discussion at hand, and I am unlikely to succeed in enumerating them all. Instead, I want to present one particular case study that further illustrates the impact of this topic.
In this case study, I want to examine the following real incident: http://blog.mlive.com/chronicle/2008/07/excon_vents_pain_online_then_k.html
Ex-con vents pain online, then kills
OCEANA COUNTY -- Danlee Mead was apparently using his MySpace site to tell the world how unhappy and desperate he felt in the hours before he abducted and killed his wife, then turned a shotgun on himself.... Hours later, the depth of the ex-convict's anguish turned to violence.....
A cached copy of Danlee’s MySpace page suggests that, moments before he committed the violent act, he changed his profile to use more positive-sounding words, even though his overall thoughts remained negative. His prior profile also expressed negative feelings, but the words it used were more negative-sounding. Here is what his profile looks like when run through the analysis over time:
A few observations:
- Initially, Danlee’s MySpace profile frequently uses negative-feeling words (blue bar).
- His profile remains consistently negative over time (blue bar).
- The words used in his updated profile, which Danlee posted right before committing the crime, tip the mood bar to positive (yellow).
Following from these observations, it is easy to see how this type of analysis could be used by investigators, admittedly after the fact, to get a glimpse into a suspect's state of mind over time.
It may not be possible to use data from online social media to proactively predict the future behavior of all individuals, yet in this situation the perpetrator did have a prior criminal history. Perhaps a proactive approach targeted at the online social presence of known suspects could detect deviations from tuned thresholds, possibly automatically, based on a set of defined triggers. Such an approach seems more feasible for a set of individuals with known backgrounds, because the elements of their history can help tip the signal-to-noise ratio in favor of the signal.
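One possible reading of “deviations from tuned thresholds” is a rolling z-score: compare each day’s mood score with the mean and spread of a trailing window, and fire a trigger when it strays too far. The sketch below assumes a list of daily scores already exists; `window` and `threshold` are hypothetical tuning knobs, not values from any real system. It flags swings in either direction, since in the Danlee Mead case the warning sign was a sudden shift toward positive-sounding words.

```python
from statistics import mean, pstdev


def deviation_alerts(daily_scores, window=7, threshold=2.0):
    """Return (index, z) for each day whose score deviates from the
    trailing window's mean by more than `threshold` standard deviations."""
    alerts = []
    for i in range(window, len(daily_scores)):
        history = daily_scores[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue  # flat history: z is undefined (a real system needs a floor)
        z = (daily_scores[i] - mu) / sigma
        if abs(z) > threshold:  # abs(): sudden positive swings count too
            alerts.append((i, z))
    return alerts
```

A per-person baseline like this is where a known background helps: the trailing window encodes what “normal” looks like for that individual, so a trigger compares a person against themselves rather than against the population.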
Some Additional Thoughts.
The prior case study was just one illustration of the many impacts of using social media to capture the psyche of individuals. Here are some additional thoughts:
- There are positive and negative implications of targeting individuals (or groups). In the first case, it is easy to see how Jack’s online activity was used to gain a better understanding of his psychological state, and how this could be extended to aid manipulation and influence by a malicious entity. In the second case, it is clear how visualizing feelings expressed online may help investigators obtain further insight into a given case.
- The victim is the volunteer. Individuals with an online social presence willingly contribute data that can facilitate the mapping of their psyche. This is in contrast to the Orwellian model, in which information is extracted from victims intrusively.
- The data set is genuine. Most people do not over-edit their blog entries or Twitter messages to conceal emotions.
- The study of an individual’s online presence and its correlation to emotion and personality analysis is likely to remain probabilistic. This introduces the risk of unfair analysis. For example, what does it mean for an individual to be identified, and in turn judged, as someone with a 15% chance of being a psychopath?
- Online social privacy is an oxymoron. Social applications are, by definition, mutually beneficial to users within the system. If you sign up for a social networking application as Mickey Mouse to protect your identity, your friends will not be able to find you, which decreases the value of the system to you. The popular social networking sites often promise privacy by implementing controls on certain fields of data, yet as a user it is important to understand that there is implied and indirect information within the system (such as connections between networks, and the cases presented in this article) that cannot be concealed without destroying the core use cases of the social application.
To conclude, I sincerely hope this article facilitates further discussion of the topics presented. You may feel that some of my ideas are unlikely to come to fruition, or perhaps you find them extremely fantastical. Or perhaps you agree that the scenarios presented have a high probability of becoming relevant in the near future. I am obviously intrigued by the topic, and I’d be delighted to hear your thoughts.