Petabytes [worth of data] allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

Or no, because correlation is NOT enough, and we shouldn't be satisfied with that sort of relationship. Isn't that the human folly we must overcome in the pursuit of scientific discovery: the need to fit everything to a pattern, to force one where none exists? Doesn't chaos theory explain the problem with pattern-based science? A break in the pattern at the smallest levels means there can't be a pattern on a higher order.
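A toy sketch of the problem (my own illustration, not from Anderson's piece): if you let an algorithm loose on enough variables, impressive-looking correlations appear in pure noise. The data below is entirely random, so every "pattern" found is spurious by construction.

```python
import numpy as np

# 50 observations of 1000 completely unrelated random variables.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 1000))

# Correlation of every variable with every other variable (1000 x 1000 matrix).
corr = np.corrcoef(data, rowvar=False)
np.fill_diagonal(corr, 0)  # ignore each variable's perfect self-correlation

# With ~500,000 pairs to search, some pair of noise columns will
# correlate strongly just by chance.
strongest = np.abs(corr).max()
print(f"Strongest pairwise correlation found in pure noise: {strongest:.2f}")
```

The point is not the exact number but that mining enough data guarantees "discoveries" even when, by construction, there is nothing to discover.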
I wonder if this piece is satire when I read the following:
The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

Is this a modest proposal on the abandonment of scientific method in favor of just writing stuff down? Does writing stuff down teach us anything?
If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
Update: This comment is the closest thing to a rational defense that I can sort of understand:
To me this "Google" approach is a very exciting extension to the cycle of knowledge.

I still find it, however, potentially very dangerous to abandon method for data madness.
Today's research is largely hypothesis-driven, which relies entirely on pre-existing knowledge. The problem here: in this way nothing new can be discovered, because the answer is always already formulated in the question.
Then there is the non-hypothesis-driven approach. Most great discoveries were actually INITIATED in this way. Darwin had no hypothesis in mind when he started his investigations. Mendel had no hypothesis to test. Nüsslein-Volhard had no hypothesis to test. Newton also had no hypothesis; how could he?
Non-hypothesis-driven science starts with an idea or theory and, after the right experiments and data analysis, allows one to formulate a hypothesis which can then be tested and support the theory. The problem with the non-hypothesis-driven approach is that one can still only find out about what one can formulate and think of (after Wittgenstein: "The limits of my language mean the limits of my world").
Then there is a third approach, which (not surprisingly) is not at all considered part of the cycle of knowledge: the accident. The list of accidental discoveries is long, famous examples being radioactivity, penicillin, and (closer to my own research in developmental biology) the Spemann organizer. The researchers had something completely different in mind when either the outcome of the experiment surprised them or something went wrong.
In these cases there was obviously no theory at hand. The researchers were just sufficiently awake to formulate one and start the cycle of knowledge.
And this is where the "Google" approach (or whatever Anderson calls it) kicks in. The only thing this Google approach does is force luck by sifting through these gargantuan masses of data, providing previously unthought-of correlations and, based on those, coming up with theories. This is perfectly valid and will surely lead to many, many new discoveries which will allow us to formulate new theories.
Very excitingly to me, this approach allows us to integrate "luck" into the cycle of knowledge and provides a rational means of entering it without any preconception. It surely does not put the "old-fashioned" theory and hypothesis aside.
And back to the original point of this post on confounding variables: I think that Anderson's original piece, like intelligent design, sounds so cozy and easy that it's a real danger to scientific progress. It's the second-biggest danger. The first is America's lack of emphasis on math and science education.
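Since confounding variables are the point here, a minimal sketch of what they do (my own synthetic example, not data from the post): a hidden factor z drives both x and y, so x and y correlate strongly even though neither causes the other, and the "pattern" vanishes once z is controlled for.

```python
import numpy as np

# Synthetic data: z is an unobserved confounder driving both x and y.
rng = np.random.default_rng(42)
z = rng.standard_normal(10_000)
x = z + 0.5 * rng.standard_normal(10_000)
y = z + 0.5 * rng.standard_normal(10_000)

# Naive correlation: looks like a strong x-y relationship.
raw_r = np.corrcoef(x, y)[0, 1]

# Control for z by regressing it out of both variables;
# the leftover (residual) correlation is what x and y share beyond z.
x_resid = x - z * (x @ z) / (z @ z)
y_resid = y - z * (y @ z) / (z @ z)
partial_r = np.corrcoef(x_resid, y_resid)[0, 1]

print(f"raw r = {raw_r:.2f}, r after controlling for z = {partial_r:.2f}")
```

A correlation-only, theory-free analysis would report the raw relationship and stop there; it takes a model of the system to know that z exists and should be conditioned on.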