I recently read an interesting article in The Economist entitled “Data, data everywhere”. The author has gathered some interesting (and impressive) figures about the amount of data being produced. I learned that astronomy is one of the domains where the most data are generated:
“WHEN the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy.”
An important conclusion is that data will soon need to be preprocessed before being stored. This is a necessity because the volume of data is growing faster than storage capacity. One of the issues, in my opinion, is how to select or aggregate the useful data before they are stored. Since you don’t always know in advance how you will use the data, it may be very difficult to perform this preprocessing to save disk space. Any thoughts on this topic?
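To make the trade-off concrete, here is a minimal sketch in Python of one way to aggregate before storing: keep a small running summary (count, mean, variance, min, max) instead of every raw reading. This is only an illustration and is not taken from the article; the names `RunningSummary` and `summarize` are hypothetical, and the sample values are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Fixed-size summary of a stream of readings (hypothetical helper)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0               # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        # Welford's online update: no raw value needs to be kept afterwards.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 when fewer than two readings were seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(readings):
    """Consume an iterable of numbers and return only the summary."""
    summary = RunningSummary()
    for value in readings:
        summary.add(value)
    return summary


if __name__ == "__main__":
    # Made-up readings, purely for illustration.
    s = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(s.count, s.mean, s.variance, s.minimum, s.maximum)
```

The summary takes the same space whether it covers eight readings or eight billion, which is where the disk space is saved. It also shows the difficulty: once the raw readings are discarded, a question that was not anticipated (the median, say, or the time of the highest reading) can no longer be answered.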
Read the full article in The Economist.