The moral problems of Big Data

Cory Doctorow linked to a great article by Alistair Croll about the civil rights implications of data collection -- and yes, data collection really is a civil rights issue. Croll explains,

“Personalization” is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you — right? That’s just better service.

There's a lot of information you can infer from the data corporations gather about their customers -- and a lot of damaging ways that information can be used.  In one case, Target accidentally outed pregnant teens to their families by mailing them personalized catalogs devoted almost entirely to things like baby carriages and diapers.

Croll raises the prospect of that sort of information being factored into decisions like bank loans or housing.  That's a big problem -- it means existing patterns of social dysfunction will be implicitly reinforced.

If I collect information on the music you listen to, you might assume I will use that data in order to suggest new songs, or share it with your friends. But instead, I could use it to guess at your racial background. And then I could use that data to deny you a loan.

It doesn't even matter whether anyone actually tries to guess your race.  If fans of a particular band are, on average, less likely to make their loan payments, then belonging to that musical subculture can unfairly affect your ability to get a loan.  And musical taste often does break down along the traditional lines of discrimination -- race, gender, sexuality.
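
To make the mechanism concrete, here's a toy sketch in Python (every name and number here is made up) of how a scoring rule that never sees a protected attribute can still discriminate through a correlated proxy:

```python
# Toy illustration of proxy discrimination. All applicants, features, and
# numbers are hypothetical; nothing here comes from a real lender.

# The lender observes only whether each applicant listens to "genre_x".
# The "demographic" field is the hidden attribute the lender never sees.
applicants = [
    {"name": "A", "listens_to_genre_x": True,  "demographic": "group_1"},
    {"name": "B", "listens_to_genre_x": True,  "demographic": "group_1"},
    {"name": "C", "listens_to_genre_x": False, "demographic": "group_2"},
    {"name": "D", "listens_to_genre_x": False, "demographic": "group_2"},
]

# Suppose historical data shows genre_x fans repay slightly less often.
# A scoring rule built on that correlation never mentions demographics...
def loan_score(applicant):
    return 0.55 if applicant["listens_to_genre_x"] else 0.75

# ...but if genre_x listenership tracks demographic lines, outcomes do too.
for a in applicants:
    decision = "approve" if loan_score(a) >= 0.6 else "deny"
    print(a["name"], a["demographic"], decision)

# Output: every member of group_1 is denied, even though the scoring
# function never looked at the "demographic" field at all.
```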

Eli Pariser discussed this issue in his book The Filter Bubble, which explores the many ways in which the massive amount of data companies gather about us is potentially (and often in practice) a very bad thing.

Ideally, people on the internet would be empowered to decide how their data is used.  But Croll points out that it's a lot easier to say that's a good thing than to actually make it happen:

The only way to deal with this properly is to somehow link what the data is with how it can be used. I might, for example, say that my musical tastes should be used for song recommendation, but not for banking decisions.

Tying data to permissions can be done through encryption, which is slow, riddled with DRM, burdensome, hard to implement, and bad for innovation. Or it can be done through legislation, which has about as much chance of success as regulating spam: it feels great, but it’s damned hard to enforce.
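
To make Croll's idea concrete, here's a minimal sketch (a hypothetical design, not any existing system) of what linking a piece of data to its permitted uses might look like in code:

```python
# A minimal sketch of data tagged with permitted uses. This is a
# hypothetical illustration of Croll's idea, not a real standard or library.
from dataclasses import dataclass, field

@dataclass
class PermissionedData:
    value: object
    allowed_uses: set = field(default_factory=set)

    def use_for(self, purpose):
        # Enforcement is the hard part Croll describes: nothing but this
        # check (or cryptography, or law) stops a holder from ignoring it.
        if purpose not in self.allowed_uses:
            raise PermissionError(f"data may not be used for {purpose!r}")
        return self.value

music_history = PermissionedData(
    value=["song_1", "song_2"],
    allowed_uses={"song_recommendation"},  # but not "banking_decision"
)

print(music_history.use_for("song_recommendation"))  # permitted use: fine

try:
    music_history.use_for("banking_decision")  # forbidden use
except PermissionError as e:
    print(e)
```

Of course, this only works if whoever holds the data actually runs the check -- which is exactly the enforcement problem Croll is pointing at.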

Croll calls it the civil rights issue of our generation. I think LGBTQ rights still tops it for urgency, and none of the old civil rights problems are entirely gone, but he's right that this is a massive issue, and it needs more attention.  Organizations with a lot of power have a bad record of looking out for the rights of the people they hold that power over.