Tim Harford gave a talk on the perils of big data at the Royal Statistical Society (RSS). Here is a short summary of one of its main arguments:
Hidden biases in data are a problem. Even the largest of datasets have bits of information missing. Quoting Microsoft researcher Kate Crawford, Harford said one might think they have all the data, but there will always be people missing from any dataset.
To illustrate this, Harford pointed to the City of Boston's Street Bump smartphone app, a clever idea for tackling the problem of potholes. Bostonians were encouraged to download the app and leave it running while driving, so that when a vehicle hit a pothole, the phone's accelerometer would record the bump and send the location to the city's public works department. What happened, of course, was that most of the potholes identified and fixed were in areas with younger, more affluent residents: areas where people owned smartphones and could download the app.
City officials might have thought they had found a way to record every pothole, but that wasn't the case. As Harford concluded: "Some might think we are now able to measure everything; that we can turn everything into numbers. But we need to be wise enough to know that is always an illusion."
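The Street Bump effect can be sketched with a toy calculation. The numbers below are invented purely for illustration and do not come from the talk or from Boston's data: two hypothetical neighborhoods have the same true number of potholes, but differ in how likely a pothole is to be driven over by someone running the app. The names `Neighborhood`, `app_coverage`, and `reported_potholes` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Neighborhood:
    name: str
    potholes: int        # true number of potholes (hypothetical)
    app_coverage: float  # fraction of potholes hit by an app user (hypothetical)


def reported_potholes(n: Neighborhood) -> int:
    """Potholes the city hears about: only those hit by someone running the app."""
    return round(n.potholes * n.app_coverage)


# Both areas have identical road conditions; only app usage differs.
neighborhoods = [
    Neighborhood("affluent", potholes=100, app_coverage=0.8),
    Neighborhood("less affluent", potholes=100, app_coverage=0.2),
]

reports = {n.name: reported_potholes(n) for n in neighborhoods}
total = sum(reports.values())

# Share of all reports coming from each area, as the city would see it.
share = {name: count / total for name, count in reports.items()}

for name in reports:
    print(f"{name}: {reports[name]} reported, {share[name]:.0%} of all reports")
```

Under these assumed figures the reports suggest that most potholes are in the affluent area, even though the roads are equally bad in both. The dataset looks complete from the inside; the missing potholes simply never appear in it.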