
The dangers of social media monitoring


As colleges and universities get more heavily involved in social media, the next logical step (I hope it is, at least) is to monitor who mentions your school and what they are saying about it. There are hundreds of software programs that will constantly scan the Internet and log every mention of your school. They will even tag the comments as positive, negative or neutral.
Sounds great! But it’s only a first step. While there are some great software programs out there, they can’t replace a human. Here’s why.

1. Representation: So the software says 70% of comments about RPI are positive and 50% of comments about MIT are positive. Don’t assume that RPI is better liked. Who are those 70%? Who are those 50%? The Internet users responsible for user-generated content are not necessarily representative of the general public, a college’s student base or that of its competition. In 2009, Forrester Research found that while 80% of people in the US have access to the Internet, only 24% would be considered content-generators…those who blog, have a website, upload audio/video/photos, post articles, etc.

It is important to know who is out there creating comments, not just what those comments are.

2. Contextual confusion: It is extremely difficult to automate context. For example, software struggles to distinguish someone who just bought a rotten apple from someone who hates their new Apple computer. Text-scraping tools are getting better, but they are no replacement for human oversight of context.

For example:
This would come back as a negative comment: “Just went on a campus visit and, man, Ithaca College has one bad-ass music room.” The software wouldn’t recognize the slang use of ‘bad’.

This would come back as a positive comment: “I love being a Rutgers student. My favorite part is standing in the freezing rain waiting for the stupid bus to get here”. The software can’t contextualize sarcasm. Only a human can.
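
To see why both examples get mislabeled, here is a minimal sketch of word-counting sentiment scoring. It is not how any particular monitoring product works; the tiny word lists and the `naive_sentiment` function are hypothetical, made up purely for illustration. The point is only that a tool counting "good" and "bad" words has no way to see slang or sarcasm.

```python
import re

# Hypothetical mini-lexicons for illustration only; real tools use far larger ones.
POSITIVE = {"love", "favorite", "great"}
NEGATIVE = {"bad", "hate", "rotten", "stupid"}

def naive_sentiment(comment: str) -> str:
    """Label a comment by counting lexicon hits, ignoring context entirely."""
    # Lowercase and split into alphabetic words, so "bad-ass" becomes "bad", "ass".
    words = re.findall(r"[a-z]+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

slang = "Just went on a campus visit and, man, Ithaca College has one bad-ass music room."
sarcasm = ("I love being a Rutgers student. My favorite part is standing in "
           "the freezing rain waiting for the stupid bus to get here")

print(naive_sentiment(slang))    # the compliment is scored on the word "bad"
print(naive_sentiment(sarcasm))  # "love" and "favorite" outvote "stupid"
```

The slang compliment comes back negative and the sarcastic complaint comes back positive, which is exactly the kind of result a human reviewer would catch at a glance.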

3. Online Echo: Of the search results, how many are duplicate thoughts from the same user? Content generators rarely post to just one place. Instead, they post the same thought to Twitter, Facebook, a blog, a forum, a friend’s Facebook account (I know I do). Unless the wording is identical, the software can’t tell that the posts came from the same user and remove the duplicates.

There is software that can scrape data and ‘learn’ user attributes to identify them. But the pool of identified content-generators vs. anonymous content-generators? Well, it’s like comparing a puddle to the Pacific. And multiple postings of the same thought will skew and disrupt quantitative analysis. But if a human reviews the data, they will recognize when basically the same comment appears multiple times.
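
Software can at least help the human reviewer here by flagging posts that are worded almost the same. Below is a rough sketch using Python's standard `difflib` module. The `near_duplicates` function, the sample posts and the 0.8 threshold are all assumptions made up for this illustration, and the approach only compares wording: it cannot confirm that two posts came from the same person, so a human still has to make that call.

```python
from difflib import SequenceMatcher

def near_duplicates(posts, threshold=0.8):
    """Return index pairs of posts whose wording similarity meets the threshold."""
    # Normalize case and whitespace so trivial differences do not hide a match.
    cleaned = [" ".join(p.lower().split()) for p in posts]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            ratio = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
            if ratio >= threshold:  # 0.8 is an arbitrary, assumed cutoff
                pairs.append((i, j))
    return pairs

# Made-up sample posts, not real data.
posts = [
    "Loved the campus tour today, the music room is amazing!",
    "Loved the campus tour today - the music room is amazing",
    "Does anyone know when the bookstore opens?",
]

for i, j in near_duplicates(posts):
    print(f"Possible echo: post {i} and post {j}")
```

Comparing every pair like this gets slow on large data sets, and a rephrased version of the same thought will slip under any threshold, so treat the output as a list of candidates for review rather than a cleaned result.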

In summary: Relying on quantitative analysis alone is risky. A more robust method for analyzing user-generated content (UGC) is to use web-scraping and text analytics tools to gather the data and do the initial sort-through, and then to analyze the data qualitatively. UGC can be incredibly useful, but the real value lies in understanding what people are saying…not just how many people are saying it.