At 6:45 a.m., the alarm on my mobile phone wakes me up. Eager to start the day, I carry my phone to the kitchen while I scan through my email and Facebook notifications. My phone’s GPS receiver and WiFi register the changes in location, logging my shift a few meters north and east. As I pour myself a cup of coffee and really start to get going, the phone’s accelerometer tracks how quickly I walk and the barometer registers when I’m going up the stairs. Because I have Google apps installed on my phone, Google has a record of all these data.
After breakfast, I’m ready to make my way to Stanford University.
The electricity company has put in a “smart” meter, which registers the decrease in electricity use as I turn off my lights and unplug my mobile devices. When I open the garage door, the meter detects the usage signature specific to it. Thus, as I pull my car out onto the street, my electricity provider has enough data to know I’m no longer at home.
Once my phone’s signal gets picked up by different cell towers along the way, my mobile phone carrier knows I’ve left home, too. On the road, a camera installed on a street corner takes a photo of my license plate in case I speed through a red light. Thankfully, I’m on my best behavior today so I won’t be greeted with a ticket in the mail.
But as I go on my way, my license plate is photographed again and again. Some of those cameras belong to the local government, while some belong to private companies that are analyzing the data to identify patterns of mobility—which they sell to police departments, land developers, and other interested parties.
When I get to Stanford, I use the EasyPark app on my phone to pay the parking fee. The money is automatically debited from my bank account, and the university parking team is notified that I’m paid up, so both the school and my bank can see that I’m on campus starting at 9:03 a.m.
When my phone stops moving at a car’s pace, Google infers this is where I have parked and logs the location, so that I can look it up in case I forget later. It’s also time to check my Metromile insurance app, which has been recording data about my drive from the car’s on-board diagnostic system. I can see in an instant that my fuel efficiency was lower today—nineteen miles per gallon—and that I spent $2.05 on gas for my commute.
After my day at Stanford, I’m planning to meet up with a new friend back in San Francisco. We “virtually” met each other when we both commented on a post by a mutual friend on Facebook, and liked each other’s take on the topic. It turned out we had more than thirty Facebook friends in common, more than enough reason to meet up.
Google Maps predicts that I’ll get to my new friend’s place at 7:12 p.m., and as usual the prediction is correct within a few minutes. As it happens, my friend lives above a store that sells tobacco products as well as various paraphernalia used for smoking marijuana. The GPS receiver on my smartphone doesn’t differentiate between the apartment and the store, however.
As far as my carrier and Google are concerned, I’ve ended my day with a visit to the head shop—a fact revealed to me by the ads Google shows when I check the weather forecast before going to bed.
GIVE TO GET
Welcome to the social data revolution. Every day, more than a billion people create and share social data like these. Social data is information about you, such as your movements, behavior, and interests, as well as information about your relationships with other people, places, products, even ideologies.
Some of these data are shared knowingly and willingly, as when you are signed in to Google Maps and type in your destination; others less so, often without much thought, part and parcel of the convenience of using the internet and mobile devices.
In some cases, it is clear that sharing data is a necessary condition for receiving services: Google can’t show you the best route to take if you don’t tell it where you are and where you want to go. In other cases, you might happily contribute information, as when you “like” a friend’s Facebook post or endorse a colleague’s work on LinkedIn simply because you want to reach out and support her in some way. Social data can be highly accurate, pinpointing your location to within less than a meter, but social data are often sketchy, in the sense of being incomplete.
For example, unless I sign in to an app that displays my smart meter’s readings (for instance, to be sure that I really did turn off all the lights in my house as I make my way to the airport), the electricity company knows when I am not at home, but nothing more than that. It’s a rough data point that may or may not be of much use to me. Similarly, as I was visiting my new friend in San Francisco, while my latitude and longitude were conveyed with precision, the inferences made about my activities that evening were utterly wrong. That’s even sketchier, in the sense that the data appeared quite exact but were very much an interpretation.
Sketchy data have a tendency to be incomplete, error-prone, and—occasionally—polluted by fraud.
Altogether—passive and active, necessary and voluntary, precise and sketchy—the amount of social data is growing exponentially. Today, the time it takes for social data to double in quantity is eighteen months. In five years, the amount of social data will have increased by about a factor of 10, or an order of magnitude, and after ten years, it will increase by about a factor of 100. In other words, the amount of data we created over the course of the entire year 2000 is now created over the course of a day.
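As a rough check on the doubling arithmetic above, here is a minimal Python sketch. It assumes steady exponential growth with the eighteen-month doubling time quoted in the text; the function names `growth_factor` and `years_to_grow` are illustrative and not taken from any library.

```python
import math


def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Factor by which a quantity grows over `years` if it doubles
    every `doubling_months` months (assumed constant)."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


def years_to_grow(factor: float, doubling_months: float = 18.0) -> float:
    """Years needed to grow by `factor` at the same assumed doubling time."""
    return math.log2(factor) * doubling_months / 12.0


if __name__ == "__main__":
    # Five years is 60 / 18 = 3.33 doublings: roughly a factor of 10.
    print(f"5 years:  x{growth_factor(5):.1f}")
    # Ten years is 6.67 doublings: roughly a factor of 100.
    print(f"10 years: x{growth_factor(10):.1f}")
    # Time for a year's worth of data to be produced in one day (factor 365).
    print(f"x365 takes about {years_to_grow(365):.1f} years")
```

Under this assumption, five years works out to slightly more than a tenfold increase and ten years to slightly more than a hundredfold, in line with the figures given. A different doubling time changes both numbers considerably, which is why it is kept as a parameter.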
At our current growth rate, in 2020 we’ll create that amount of data in less than an hour. It’s essential to understand that “social data” isn’t merely some trendy buzzword for social media. Many social media platforms have been designed for broadcasting. In the case of Twitter, communication is almost always moving in one direction, from a celebrity, authority, or marketer to the masses. Social data is far more democratic. You may share information about yourself, your company, your accomplishments, and your opinions through Twitter or Facebook, but your digital traces are much deeper and broader than that.
Your searches on Google, your purchases on Amazon, your calls on Skype, the minute-by-minute location of your mobile phone—all these and many more sources come together to produce a unique portrait of you as an individual.
Further, social data doesn’t end with you. You create and share data about the strength of your relationships with family, friends, and colleagues through your communication patterns; you create data alongside friends and strangers alike—for instance, when reviewing a product or tagging a photo on Instagram. You verify your identity when you set up an account on Airbnb, the platform for renting a room or house, using your Facebook profile in addition to a government-issued ID.
Social data are becoming embedded in homes with smart thermostats, in cars with navigational systems, and in workplaces with team-based software. Such data are beginning to feature in our classrooms and doctors’ offices. As mobile phones get loaded up with more sensors and apps, and new devices start tracking your behavior at home, in the mall, and on the job, you’ll have less and less ability to control the data that describe your daily routine—as well as your deepest wishes.
Data scientists become detectives and artists, painting iteratively clearer sketches of human behavior from our digital traces. These digital traces are examined and distilled to uncover our preferences, reveal trends, and make predictions, including about what you might buy.
During my tenure as chief scientist of Amazon, I worked with Jeff Bezos to develop the company’s data strategy and customer-centric culture. We ran a series of experiments to see if customers were happier with their purchases when they were shown editor-written versus consumer-written product reviews, and whether recommendations based on traditional demographic profiling or individual clicks were more successful.
We saw the power of genuine communication over manufacturer-sponsored promotions. The personalization tools we created for Amazon fundamentally changed how people decide what to purchase and became the standard in e-commerce.
Since leaving Amazon, I have taught courses on “The Social Data Revolution” to thousands of students, from undergraduates and graduate students at Stanford and the University of California–Berkeley to Chinese business students at Fudan University and China Europe International Business School in Shanghai, and Tsinghua University in Beijing.
I also continue to run the Social Data Lab, a group of data scientists and thought leaders that I founded in 2011. Over the past decade, in my work with corporations ranging from Alibaba and AT&T to Walmart and UnitedHealthcare, and at major airlines, financial services firms, and dating sites, I have been an advocate for sharing the decision-making power of data with customers and users—regular people like you and me.
No single person can wade through all of the data available today in an effort to make what we used to call an “informed” decision about some aspect of life. But who will have access to the tools that are necessary for leveraging data in service to our problems and needs?
Will the preferences, trends, and predictions extracted from data be available to only a few powerful organizations, or will they be available for anyone to use? What price will we have to pay to secure the dividends of our social data?
As we discover the value of social data, I believe we must focus not just on access but also on actions. We face some decisions many times each day, others just once in a lifetime. Indeed, the social data we create today have a long shelf life. The way we behave today may influence the choices we face in the decades to come. Few people have the ability to observe everything they do, or to analyze how their behavior might affect them, in the short or long term.
Social data analysis will allow us to better identify the possibilities and probabilities, but the final choice must be deliberate. One thing these technologies cannot do is decide what sort of future we want—as individuals or a society. The laws in place that protect individuals in many countries from discrimination in the workplace or health care may not exist tomorrow—and in some countries, they do not exist even today. Imagine that you opt to share that you’re worried about having high cholesterol with a health app or site in order to get advice about diet and exercise regimens. Could your worries be used against you in some way?
What if the law made it permissible to charge you a higher rate for medical care if you refused to stop eating deep-fried food and slouching on the couch after you’ve been presented with a menu of your health risks and recommendations for healthier choices?
What if a manager used a service to crawl the web for information about you, and then, based on what he learned, decided that your lifestyle isn’t a good match for a job at his company and he won’t consider your application? These are real risks.
If you were the only person creating and sharing data about yourself, you might be able to withhold information that you thought might be risky. It would cost you a lot of convenience, but it could be possible. However, we do not live in such a world. You have no control over much of the data about you. This fact will become more palpable as social data are utilized by businesses and governments to improve effectiveness and efficiency. Because social data are so democratic, the questions about how best to handle them touch each and every one of us.
Technology is moving fast, and the companies that collect and analyze our data are primarily in the business of creating and coding information, not creating and codifying principles. Many of those questions are being considered on an ad hoc basis, if they’re being considered at all. We should not leave decisions about principles that will deeply influence our future in the hands of the data companies.
We can agree to have all of these data collected, combined, aggregated, and analyzed so that we are in a better position to understand the trade-offs in decision-making. Human judgment is crucial to evaluating the trade-offs intrinsic to any important decision. Our lives should not be driven by data. They should be empowered by data.
PRINCIPLES FOR THE POST-PRIVACY AGE
As we’ve come to appreciate the increasing role of data in life, there have been several efforts to safeguard citizens’ interests. In the 1970s, the United States and Europe adopted broadly similar principles for the fair use of information. Individuals were told they had a right to know who collected what data about them, and how these data were being used. They could also correct data about themselves that were inaccurate. These protections are perversely both too strong and too weak for the world of new data sources and analytics that is being built today. They’re too strong because they assume it’s possible to keep tabs on all the data collected about you. Amazon might be able to explain in accessible terms exactly how the data the company collects about you are used. It might even be able to do so in a way that helps you make better decisions. But reviewing all this information would require investing a lot of time. How many of us would take the time to trawl through all the relevant data? Would it be useful to you to see how Amazon weighs each data point, or would you prefer to get a summary?
At the same time, these protections are too weak, because even if you could check every bit of data you have created and shared about yourself, you will not get a full picture of the data about you, which includes data created and shared by others, such as your family, friends, colleagues, and employers.
The businesses you visit online, as well as most of those you visit in the physical world, also create (and sometimes share) data. That goes for strangers on the street and a number of other organizations, public and private, with which you interact.
Who decides whether these data are accurate or inaccurate? Because data today come from so many perspectives, having the right to correct data about yourself doesn’t reach nearly far enough.
Finally, even accurate data can be used against you. With the massive quantitative and qualitative shifts in data creation, communication, and processing, the right to know and the right to correct are clearly insufficient. Thus far, the attempts to update these guidelines have focused almost entirely on maintaining individual control and privacy.
Unfortunately, this approach is born out of ideals and experiences that are technologically a century out of date. Standards of control and privacy also force individuals to enter an unfair contract with data companies. If you want your decision-making to be improved by data, you usually have to agree to having your data collected on the data collector’s terms.
Once you’ve done that, the data company has satisfied the legal requirement to give you individual “control,” regardless of how much choice you really had or the effects on your privacy.
If you want to maintain personal privacy, you can instead withhold your consent to data collection and forfeit your access to relevant data products and services, reducing the value you get from your data. Enjoy your individual control then.
Today, what we need are standards that allow us to assess the risks and rewards of sharing and combining data, and provide a means for holding companies accountable. After two decades working with data companies, I believe the principles of transparency and agency hold the most promise for protecting us from the misuse of social data while increasing the value that we are able to reap from them.
Transparency encompasses the right of individuals to know about their data: what it is, where it goes, and how it contributes to the result the user gets. Is the company observing you from the “dark” side of a one-way mirror, or does it also give you a window with a view to what it does with your data, so that you can judge whether (and when) the company’s interests are aligned with your own?
How much data about yourself do you have to share to receive a data product or service that you want? Historically, there has been a strong information asymmetry between institutions and individuals, with institutions having the advantage. Not only do institutions have more capacity to collect data about you, they can interpret your data in comparison to others’ data.
The balance between what you give and what you get needs to be clear to you. Consider how transparency is designed into the shopping experience at Amazon compared to the traditional relationship between customers and retailers. When you are about to buy an item, should a retailer remind you that you already bought it, potentially losing a sale in the process? At Amazon, if you try to buy a book you’ve already bought from the site, you’re greeted with the query “Are you sure? You bought this item already, on December 17, 2013.”
If you buy a track from an album of music and then decide to buy the rest of it, Amazon will “complete the purchase,” automatically deducting the amount you have already spent on the track from the current price for the album. Amazon surfaces and uses data about your purchasing history in these ways because the company wants to minimize customer regret.
Likewise, many airline frequent flyer programs now send you a reminder that your miles are about to expire rather than letting them quietly disappear from the company’s books. Unfortunately, transparency is far from the norm. Consider the far-too-typical experience of calling your favorite customer service center. At the start of the call, you’ll inevitably receive the warning: “This call may be recorded for quality assurance purposes.” You’ve got no choice: you must accept the company’s conditions if you want to talk to a representative.
Okay, but why is that recording accessible only to the business? What, really, does “quality assurance purposes” mean when only one side of the conversation is assured access to the record of what was agreed?
The principle of data symmetry would also give you, the paying customer, access to the recording. Whenever I hear that my call might be recorded, I announce to the customer service rep that I might also record the call for quality assurance purposes. Most of the time, the rep plays along. Occasionally, however, the rep hangs up. Of course, I could record the call without asking for the rep’s permission—which, I should note, is against the law in some places. Then, if I don’t get the service I was promised, I could appeal to a manager with my evidence in hand. If that still didn’t work, I could upload the audio file in the hopes that it might go viral and the company feels pressured to fix things quickly—as Comcast did when a customer tried to cancel services but was rebuffed again and again, finally succeeding after his recording started trending on Twitter.
You shouldn’t have to break the law to level the playing field in this way. To make transparency the new default, you need more information to be public, not less. But transparency isn’t enough; you also need agency.
Agency encompasses the right of individuals to act upon their data. How easy is it for you to identify the data company’s “default” settings, and are you allowed to alter them for whatever reason you like? Are you able to act upon the data company’s outputs in any way you choose, or are you gently nudged (or forcefully pushed!) toward only some options—mostly the options that are best for the company?
Can you play with the parameters and explore different scenarios to show a smaller or bigger range of possibilities? Agency is an individual’s power to make free choices based on the preferences and patterns detected by data companies. This includes the ability to ask data companies to provide information to you on your own terms.
On a fundamental level, agency involves giving people the ability to create data that are useful to them. Amazon wholeheartedly embraced uncensored customer reviews. It didn’t matter to the company if the reviews were good or bad, five stars or one, written out of a desire to gain approval from others or to achieve a lifelong dream of becoming a book critic.
What mattered was their relevance to other customers who were trying to figure out what to purchase. Reviews revealed whether a customer regretted a purchase even though she did not return the item for a refund. These data helped customers decide if a recommended product was the best choice for them. Amazon gave customers more agency. Many marketers talk about targeting, segmentation, and conversion. I don’t know about you, but I don’t want to be targeted, segmented, or sliced and diced. These aren’t expressions of agency. We can’t assume that the leaders of every company will, on their own, embrace the principles of transparency and agency. We must also go beyond these principles: we need delineated rights that help to spell out how to translate transparency and agency into tangible, hands-on tools.
If we can get data companies to agree to a set of meaningful rights and tools, it will lead to what I call “sign flips”—reversals in the traditional relationships between individuals and institutions. Amazon’s decision to let customers write most of the content about products is a sign flip, and the social data revolution will provide many more similar opportunities. As individuals gain more tools to help them make better decisions for themselves, old-fashioned marketing and manipulation are becoming less effective. Gone is the day when a company could tell a powerless customer what to buy. Soon, you will get to tell the company what to make for you. In some places, you already can.
Sign flips are an important element in how physicists see the world. They are often associated with phase transitions, where a change in an external condition results in an abrupt alteration in the properties of matter—water changing from a liquid into a gas when it is heated to the boiling point. The effect on society of the increasing amount of data can be compared to the increasing amount of heat on a physical system.
Under certain conditions—when data companies provide transparency and agency for users—a sign flip will take place that favors the individual over the institution; that is, it will benefit you, not the company, or the company’s chief marketing officer.
We the people all have a stake in the social data revolution. And if you want to benefit from social data, you must share information about yourself. Period. The value you reap from socializing data often comes in the form of better decision-making ability, when negotiating deals, buying products and services, getting a loan, finding a job, obtaining education and health care, and improving your community.
The price you pay and the risks you take in sharing data must at least be offset by the benefits you receive. Transparency about what data companies are learning and doing is essential. So, too, is your ability to have some control over data products and services. Otherwise, how could you possibly judge what you give against what you get?
BALANCING THE POWER
Information is at the center of power. Those who have more information than others almost always stand to benefit, like the proverbial used-car salesman who pushes a lemon on an unwitting customer. As communication and processing have become cheap and ubiquitous, there’s a lot more data—and a lot more risk of substantial information imbalances, since no individual can get a handle on all the data out there.
Much of the data being created and shared is about our personal lives: where we live, where we work, where we go; who we love, who we don’t, and who we spend our time with; what we ate for lunch, how much we exercise, and which medicines we take; what appliances we use in our homes and which issues animate our emotions.
Our lives are transparent to the data companies, which collect and analyze our data, sometimes engaging in data trafficking and too often holding data hostage for use solely on their terms. We need to have some say in how our data are changed, bartered, and sold, and set more of the terms on the use of our data.
Both sides—data creator and data company—must have transparency and agency. This will require a fundamental shift in how we think about our data and ourselves.
Excerpted from “Data for the People: How to Make Our Post-Privacy Economy Work for You” by Andreas Weigend. Copyright 2017. Available from Basic Books, an imprint of Perseus Books, a division of PBG Publishing LLC, a subsidiary of Hachette Book Group.