Data for good

Dino Pedreschi, Professor of Computer Science at the University of Pisa and co-lead of KDD Lab - Knowledge Discovery and Data Mining Laboratory, talks about forecasting, crisis response, customization and personal data.



Whenever we have data that are good enough to understand a phenomenon, we can build a model of this phenomenon and eventually predict it. This can be true for soccer matches; it could even be true for what we will dream tonight. If we have enough observations of our mind, of the neurological and cognitive phenomena that occur in our brain, and we have a full account of what we saw and imagined in the last few days, maybe it would even become possible to predict what we will dream tonight.


It's a matter of observation. If we have good enough information, we can become predictive. Maybe we can predict the next economic crisis before it happens. We have not been able to do so for the moment, meaning that we do not yet have good enough data and good enough theories for that. Does this mean that we don't need experts anymore, that we don't need humans in the loop anymore? No, this is absolutely false. It’s a hype that is totally misleading.



Advertising has really become the major industry, the first big industry for Big Data. Essentially with the same idea, right? Learn, again with artificial intelligence, from the personal data of each user what their preferences are, in order to make individualized marketing offers to people. It doesn't matter whether we are talking about clothes, travel, news articles to read, or interesting posts to follow.


The logic is always the same. We build your profile, we know your interests, and we try to organize a platform, a commercial or advertising platform, so that you are offered the kind of services and items that you would most probably like, because they are what you are most interested in. This is the queen application of big data so far. But I believe that if, 20 years from now, the people of the future look back at us in this moment, they will actually think: well, these people were really crazy; they had discovered such a huge energy source, such a big power, and they were using it only for a minor aspect. I mean an important aspect, but not the only aspect of society.


This is, in my opinion, entirely true. Already in the last 10 years, many other possible uses of this information have emerged, secondary uses of this information. Secondary means uses for which the data were not originally planned or collected. For instance, you can take the GPS tracks left by cars equipped with black-box navigation devices that have been installed in your car for insurance purposes, because you subscribed to a “pay as you drive” insurance product; as a side effect, you are delivering to the company a very, very detailed record of the trips that you make every day with your car. And by putting together all these trips at the city scale, or at a regional scale, you can create an amazing picture of traffic and mobility.


How our trips impact cities, what kind of timetable we follow to arrive at or leave the places where we go to work and study. What is the geography of the places that we describe simply by moving around, going about our everyday activities. You can create statistical information, like demographic information, no longer with sensors but with mobile phone data, which can be used to get a very precise idea of how many people are in any place, and also of how many commuters go from one city to another, how many visitors will visit a certain area, and how we could organize rescue and emergency response in case of natural disasters.



It is certainly true that, up to now, the most widespread business model for big data has been developed by big corporations, big web corporations that were able to serve amazingly large user bases, up to a billion users, and to centralize data about users, which allowed such companies to deliver very intelligent and very pervasive products. This is not the only possible model, and it is actually a kind of primitive model for data. It is, in a sense, following again the path that has been common in history for many new resources.


Imagine land in the Middle Ages, or industrial capacity in the 18th century. There is an initial phase where a few monopolists, or a few landowners, are essentially pushing the new economy, the new industrial revolution, towards a new path. But eventually, the very true part of this revolution is also to distribute the resource much more broadly in society, so that the new revolution can really flourish, be it an agricultural revolution, or an industrial revolution, or, like the one we are living through today, a digital revolution based on digital information. So while it is certainly true that, right now, we have big centralized monopolies for big data, it is also true that the picture is rapidly changing, towards a much more distributed ownership and distribution of information.