Data, data everywhere

May 5, 2010 by Sandro Saitta
Filed under: Uncategorized 

I recently read an interesting article in The Economist entitled “Data, data everywhere”. The author has gathered some interesting (and impressive) figures on the amount of data being generated. I learned that astronomy is certainly the domain where the most data are generated:

WHEN the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy.

An important conclusion is that it will soon be necessary to preprocess data before storing them, because the volume of data is growing faster than storage capacity. One of the issues, in my opinion, is how to select or aggregate useful data before using them. Since you don’t always know in advance how you will use the data, it may be very difficult to perform this preprocessing to save disk space. Any thoughts on this topic?

Read the full article from the Economist.


Comments

2 Comments on Data, data everywhere

  1. Sumit on Thu, 6th May 2010 1:03 pm
  2. Hey,

    I agree that the problem of information overload has risen significantly.
    The major problem comes in data preprocessing, as the data is available from different sources and in different forms (structured, semi-structured and unstructured).
    Recently I got to know about an Informatica ETL tool. I think this tool could be useful for preprocessing structured data from many different sources.
    It automatically reports the different properties of the data.
    For example, for each column it tells us the percentage of unique values and null values, the inferred data type, simple patterns present in the column, statistics (max length, min length, top 5 values, bottom 5 values) and much more information of this kind.
    It also lets you automatically store data in databases in a structured format.

    I think it could be a very helpful tool for data preprocessing.
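
The per-column properties listed above (share of unique and null values, inferred data type, length statistics, most frequent values) can be sketched in a few lines of Python. This is only an illustration built on the standard library, assuming the column arrives as a list of strings such as a CSV reader would produce. It does not show how the Informatica tool works, and the names `profile_column` and `infer_type` are hypothetical.

```python
from collections import Counter


def infer_type(values):
    """Guess a column type by trying int, then float, else falling back to str."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except (ValueError, TypeError):
            continue
    return "str"


def profile_column(values, top_n=5):
    """Return simple profiling statistics for one column (hypothetical helper)."""
    total = len(values)
    # Assumption: None and the empty string both count as null.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    lengths = [len(str(v)) for v in present]
    return {
        "pct_null": 100.0 * (total - len(present)) / total if total else 0.0,
        "pct_unique": 100.0 * len(counts) / total if total else 0.0,
        "inferred_type": infer_type(present) if present else "unknown",
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "top_values": counts.most_common(top_n),
    }


if __name__ == "__main__":
    column = ["3", "7", "", "7", None, "12"]
    for key, value in profile_column(column).items():
        print(key, value)
```

A profile like this is cheap to compute and can help decide which columns are worth storing in full and which could be aggregated or dropped.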

  3. Sandro Saitta on Mon, 10th May 2010 12:07 pm
  4. @Sumit: Thanks for your input. Feel free to share if you have feedback about other ETL tools.
