Note from the Editor:
The above graph compares the results from the 2007 poll with the results
from a similar 2006 KDnuggets Poll on largest database data-mined.
We see growth in the top end. In 2007, 37 respondents (22%) reported mining databases of 1 terabyte or more, about double the 11.5% who dealt with terabyte-size databases in 2006.
Also, the median largest database size mined in 2007 was in the 30-60 GB range, while in 2006 it was in the 2-4 GB range, a growth of about one order of magnitude!
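The year-over-year comparison can be checked with a few lines of arithmetic. This is only a rough sketch: the poll reports the medians as ranges, so taking the geometric midpoint of each range is an assumption made here, and the variable names are illustrative.

```python
import math

# Figures quoted in the editor's note.
share_2007 = 0.22    # share of respondents mining >= 1 TB in 2007
share_2006 = 0.115   # the same share in 2006

# Median "largest database" ranges in GB, as reported by the polls.
median_2007_gb = (30, 60)
median_2006_gb = (2, 4)


def geometric_midpoint(bounds):
    """Midpoint of a range on a log scale (an assumption, since only ranges are given)."""
    low, high = bounds
    return math.sqrt(low * high)


share_ratio = share_2007 / share_2006
median_ratio = geometric_midpoint(median_2007_gb) / geometric_midpoint(median_2006_gb)
orders_of_magnitude = math.log10(median_ratio)

print(f"Terabyte-share ratio, 2007 vs 2006: {share_ratio:.2f}x")
print(f"Median size ratio: {median_ratio:.1f}x "
      f"({orders_of_magnitude:.2f} orders of magnitude)")
```

Under that midpoint assumption the median grew by a bit more than one order of magnitude, consistent with the note above.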
Will Dwinell, Measuring Data Size
Measuring the size of the data is tricky, since:
- A given set of data may be stored more or less efficiently. This can
make several orders of magnitude of difference.
- Measured size will vary, depending on whether data in multiple tables
is joined or not.
- The original data set may be very much larger than one which is
actually digested downstream, giving activities like sampling and
TimManns, largest database analysed
The largest single database table I access is 900GB (62 days of data, 70
million rows per day), but I commonly access data from several different
tables. The sum of these tables is probably several terabytes.
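For scale, the 900GB figure can be broken down per row and per day. This is a back-of-the-envelope sketch only: it assumes decimal gigabytes (1 GB = 10^9 bytes), which the comment does not specify, and it ignores indexes, compression and other storage overhead.

```python
# Figures quoted in the comment above.
TABLE_SIZE_GB = 900
DAYS = 62
ROWS_PER_DAY = 70_000_000

# Assumption: decimal gigabytes; the comment does not say which unit is meant.
BYTES_PER_GB = 10**9

total_rows = DAYS * ROWS_PER_DAY
bytes_per_row = TABLE_SIZE_GB * BYTES_PER_GB / total_rows
gb_per_day = TABLE_SIZE_GB / DAYS

print(f"Total rows: {total_rows:,}")
print(f"Approximate bytes per row: {bytes_per_row:.0f}")
print(f"Approximate GB added per day: {gb_per_day:.1f}")
```

So the table holds a little over 4 billion rows at roughly 200 bytes each, growing by about 14-15 GB per day.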
The output of my analysis (churn, cross-sell, fraud detection, etc.)
is usually a single flat table of approx. 1GB, which is stored for a