Facebook has emerged as one of the largest data companies in the world. The social media giant has more than 1.23 billion daily active users, many of whom post lots of personal information that’s analyzed not only by the company but also by independent scientists.
Those researchers include Jennifer Golbeck, a University of Maryland computer scientist who writes algorithms that produce a deeper understanding of individual users. She discussed her work with the San Diego Union-Tribune.
Q: Facebook users realize that the company knows certain things about them because we voluntarily post a lot of personal information. But you write algorithms that predict many things about these users — like a person’s age, their religion, their sexual orientation, their personality traits, even their intelligence.
How do these algorithms work?
A: They start with a bunch of data and what we call “ground truth.” For example, if you want to know if someone’s introverted based on what they tweet, you need lots of tweets from people who are known to be introverted and people who aren’t.
Then, you measure a bunch of things about the data. If you’re using tweets, it may be what hashtags they use, who they follow, what kinds of words they use, how often they use images, and so on. Then, you teach a machine learning algorithm how to tell the groups apart. Basically, you say “here’s a person we know is introverted. Here’s all the measurements for that person.”
You do this over and over with thousands of examples of introverts and non-introverts. The algorithm learns which features or combinations of features distinguish the groups. Then, you can theoretically feed it measurements for an unknown person and it will guess based on what it learned.
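The process Golbeck describes (labeled examples, measured features, a trained model, then a guess about an unknown person) can be sketched in a few lines of Python. This is only an illustration, not the method used in her research: the naive Bayes classifier, the feature names such as `hashtag:books`, and the four labeled examples are all assumptions invented for the sketch, and a real study would use thousands of examples.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each feature appears among people with each label."""
    label_counts = Counter()
    feature_counts = {}  # label -> Counter of features seen with that label
    vocab = set()
    for features, label in examples:
        label_counts[label] += 1
        feature_counts.setdefault(label, Counter()).update(set(features))
        vocab.update(features)
    return label_counts, feature_counts, vocab

def predict(model, features):
    """Guess the most likely label for a person with the given features."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    present = set(features) & vocab
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # how common the label is overall
        for f in vocab:
            # Smoothed estimate of how often this feature shows up for this label.
            p = (feature_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in present else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical "ground truth": made-up measurements for people with known labels.
labeled = [
    ({"uses_images", "hashtag:party", "follows_many"}, "non-introvert"),
    ({"hashtag:party", "follows_many"}, "non-introvert"),
    ({"hashtag:books", "word:quiet"}, "introvert"),
    ({"word:quiet", "hashtag:books", "uses_images"}, "introvert"),
]

model = train(labeled)
guess = predict(model, {"hashtag:books"})  # an "unknown" person
print(guess)  # introvert
```

With so little data the guess is only as good as the toy examples, which is why the approach depends on having many labeled people to learn from.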
Q: Most users click the “Like” button on Facebook. How important are those likes to coming up with a profile of an individual user?
A: The likes are interesting because they are always public – you can’t make them private no matter what your settings are. Research has shown that likes are useful to predict all kinds of demographics and behaviors. People can try it on themselves; the project is online at http://applymagicsauce.com/
Q: Do these algorithms do simple things like say, “Well, this user likes BBC America and NPR. Therefore, he or she must be well-educated”?
A: They can, but usually it is much more complex than that. In fact, most of the connections don’t make a lot of logical sense. For example, when one paper was published, liking the Facebook page for curly fries was a strong indicator of intelligence. It’s not because curly fries have anything to do with intelligence. There are a lot of complex social dynamics going on behind the scenes, and liking a page reflects those, which relate to personal attributes much more than the subject of a page does.
Q: Do the algorithms factor in the photos and videos that we post on Facebook? Many photos are rich in meaning.
A: They can! There are algorithms that work just on photos. There’s one recent study that can accurately identify signs that a person is depressed based on the composition of photos they post.
The interesting and scary thing is that we find signals of people’s personal attributes and mental states pretty much everywhere we look for them. This means we can discover things all over, which is exciting from a scientific perspective, but it also means people can’t hide from the algorithms. I personally think consent is critical as these move from research labs into the public sphere, but now, basically anyone can discover your secrets from your public social media data, and there’s nothing you can do about it. I find this very troubling.