Articles about data and storage were big last week. It started with this IDC paper forecasting that the amount of digital information we generate will exceed our ability to store it this year. Om Malik then observed that this is despite amazing improvements -- currently greater than 100% per year -- in disk density and capacity over the last 50 years. That's growth that's even faster and more disruptive than the Moore's Law curve underlying chip improvements.
Mike Gunderloy at WebWorkerDaily sees opportunity in this new explosion of data, with knowledge workers becoming the new magpies who pick through the huge pile of shiny data we are all creating [Ed. note: Mike gets the Blackfriars Oscar for best metaphor this year]. But while I love the metaphor, the report itself has none of the gravitas of the prior sizings done at Berkeley, largely due to absurd statements like this one in the executive summary:
"In 2007 the amount of information created will surpass, for the first time, the storage capacity available."
Now consider the meaning of that statement for a moment. As a society, we're generating more information. Granted. But we can't store it? How did it get here then? Where did it come from? IDC has a lovely graph illustrating what they mean, shown below:
The problem, though, is that it ignores what I see as the real reason that storage is growing so fast:
Most of our storage growth is due to information we don't want to keep or copies of existing information, not new information.
Now, to give IDC its due, they do mention this little detail in the full paper, as follows:
About one quarter of the digital universe is original (pictures recorded, keystrokes in an email, phone calls), while three quarters is replicated (emails forwarded, backed up transaction records, Hollywood movies on DVD).
A majority of these bits represent images, both moving and still. This is because one digital camera image can generate a megabyte or more of digital information, and video or digital TV can generate a dozen megabytes per second.
....
Not all of the bits in the digital universe will necessarily need to be stored - such as digital TV signals we watch but don't record, Web pages that disappear when we turn off our browser, or voice calls that are made digital in the network backbone for the duration of a call. On the other hand, we may want to store them. Personal video recorders and set-top boxes may store them temporarily, anyway; whether we program them to do so or not.
Much of the information crisis IDC highlights is data we may (and probably should) throw away, or copies of information that we already have. I, for one, never want to see the Head-On commercial ("Apply directly to the forehead!") again, so that's a terabyte or two of video saved right there.
So what should we take away from the IDC report? Well, how about the fact that we are making more and more copies of data? And we're doing that because while processing and storage are growing like gangbusters, Internet bandwidth to the edge of the network is growing at about half that rate. I graphed this trend about five years ago when I was at Forrester Research; I've reproduced the chart below:
Looking at the chart, we can see that processors get about 50% faster each year, disk storage gets about 62% bigger each year, yet the bandwidth to our desktop improves only about 27% a year. That means our devices are growing more storage because they can't get to the information stored on the Internet quickly enough and therefore have to make copies of that information to make it useful to consumers. Said another way, we don't have enough wireless broadband to stream music to us everywhere, so we buy iPods that store it. We don't have access to every movie we want on an airplane, so we bring movie copies on DVDs. We don't have enough bandwidth to our homes to provide us with on-demand versions of every TV show we want to watch, so we store TV shows on our TiVos. Storage and bandwidth are a tradeoff -- and at the moment, storage growth is making up for our paucity of consumer network bandwidth.
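For anyone who wants to see how quickly those annual rates pull apart, here is a minimal Python sketch that simply compounds the three figures quoted above. The rates come from the chart; the function names and the five-year horizon are my own choices for illustration, and the sketch assumes the rates hold steady year after year, which is a simplification.

```python
# Annual improvement rates as quoted in the text (fractions, not percent).
ANNUAL_RATES = {
    "processor": 0.50,
    "disk": 0.62,
    "bandwidth": 0.27,
}


def compound_growth(annual_rate, years):
    """Return the total growth multiple after compounding for `years` years."""
    return (1 + annual_rate) ** years


def storage_to_bandwidth_gap(years):
    """Return how many times further disk capacity has grown than bandwidth.

    Assumes both rates stay constant over the whole period.
    """
    disk = compound_growth(ANNUAL_RATES["disk"], years)
    bandwidth = compound_growth(ANNUAL_RATES["bandwidth"], years)
    return disk / bandwidth


if __name__ == "__main__":
    years = 5  # illustrative horizon, not taken from the chart
    for name, rate in ANNUAL_RATES.items():
        print(f"{name}: {compound_growth(rate, years):.1f}x after {years} years")
    print(f"disk vs. bandwidth gap: {storage_to_bandwidth_gap(years):.1f}x")
```

Under those assumptions, five years of compounding leaves disk capacity roughly eleven times larger while edge bandwidth is only a little over three times faster, which is the gap our devices paper over by keeping local copies.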
So is the explosion of information a crisis? Yes, but three quarters of that crisis is caused by not having ways to get information to where it is needed. Jonathan Schwartz, CEO of Sun Microsystems, recently said in a speech that it was faster to send a petabyte of data from San Francisco to Hong Kong by sailboat than through the Internet. We can assemble the petabyte of data easily. We can't deliver it where it is needed easily. And that's the true information crisis.
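The sailboat claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below computes how long a petabyte takes to move over a link of a given sustained speed. The link speeds are hypothetical values I picked for illustration; they are not from Schwartz's speech, and real transfers would be slower once protocol overhead and congestion are counted. I'm also assuming a decimal petabyte (10^15 bytes).

```python
PETABYTE_BYTES = 10**15   # decimal petabyte; an assumption, not from the speech
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, bits_per_second):
    """Return the days needed to send `num_bytes` at a sustained link rate.

    Ignores protocol overhead, retransmissions, and congestion.
    """
    seconds = num_bytes * 8 / bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical sustained link rates in bits per second.
    for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        print(f"{label}: {transfer_days(PETABYTE_BYTES, rate):.0f} days")
```

Even at a sustained gigabit per second, the petabyte takes about three months to arrive, so a boat does not have to be fast to be competitive.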