bill bennett

journalism + new media

Data storage: Most of it is junk

Computerworld reports global data grew by 281 exabytes in 2007.

An exabyte is one billion gigabytes, so this adds up to roughly 47GB of data for each of the world’s 6 billion people. That's equal to a stack of books piled some 30 metres high.
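As a quick check on the per-person arithmetic, here is a minimal sketch. It assumes decimal units (one exabyte is 10^18 bytes, one gigabyte is 10^9 bytes) and the round population figure of 6 billion; the variable names are illustrative only.

```python
# Back-of-envelope check, assuming decimal (SI) units and round numbers.
EXABYTE = 10**18   # bytes
GIGABYTE = 10**9   # bytes

data_growth_bytes = 281 * EXABYTE   # growth figure quoted from Computerworld
world_population = 6 * 10**9        # round figure used in this post

# Share of the new data per person, expressed in gigabytes.
per_person_gb = data_growth_bytes / world_population / GIGABYTE
print(f"{per_person_gb:.1f} GB per person")  # prints "46.8 GB per person"
```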

A lot of information.

Or maybe not. Storage experts believe anywhere from 80 to 90 percent of stored data is junk.

In 2002 I spoke to Rob Nieboer, who at the time was StorageTek’s Australian and New Zealand storage strategist. He said the vast bulk of data stored on company systems is worthless.

He said, “I haven’t met one person in the last three years who routinely deletes data. However, as much as 90 percent of their stored data hasn’t been accessed in months or years. According to the findings of a company called Strategic Research, when data isn’t accessed in the 30 days after it is first stored there’s only a two percent chance it will get used later.”
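One rough way to see how much of your own data falls outside that 30-day window is to add up the bytes in files whose last-access time is older than a cutoff. The sketch below is not the method used by Strategic Research; `stale_fraction` is a hypothetical helper, and it assumes the file system records access times faithfully. Many systems are mounted with options such as `relatime` or `noatime`, so treat the result as an estimate.

```python
import os
import time

def stale_fraction(root, days=30, now=None):
    """Return the fraction of bytes under `root` not accessed in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    total = stale = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_atime < cutoff:
                stale += st.st_size
    # An empty or missing tree has no data, so report zero rather than divide by zero.
    return stale / total if total else 0.0
```

Calling `stale_fraction("/some/archive")` on a directory of your choosing returns a value between 0 and 1; a result near 0.9 would match the figure Nieboer quotes.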

At the same time, many data files are stored many times over in the same file system. Nieboer said it is not unusual for there to be as many as 15 separate copies of the same file in a single system.
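Duplicate copies like these can be spotted by grouping files on a hash of their contents. The following is a minimal sketch, not a production deduplication tool: `find_duplicates` is a hypothetical helper, and it assumes matching SHA-256 digests mean identical files.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root, chunk_size=1 << 20):
    """Group files under `root` by content hash; return groups with more than one path."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    # Read in chunks so large files do not have to fit in memory.
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file; skip it
            by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each list in the result holds paths with identical contents, so a list of 15 paths would be the worst case Nieboer describes.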

Storage Parkinson’s Law

Rosemary Stark (interviewed in 2002 when she was Dimension Data’s national business manager for data centre solutions) said storage obeys Parkinson’s Law.

She said, “It’s a case of if you build it, they will come. Put together a system with 2GB of storage and pretty quickly it will fill up with data. Buy a system with 200GB of storage and that will also fill up before too long.”

Like Nieboer, Stark said there’s a huge problem with multiple copies of the same information, but she estimated the volume of unused archive material is closer to 80 percent.

But she said that 80 percent isn’t all junk. “It’s like the paper you keep on your desk. You don’t want it all, there may be a lot you can safely throw away but there are things you need to keep just in case you need them again later.”

Needles and haystacks

Although many companies focus on the cost of storing vast amounts of junk information, there’s a tendency to overlook the performance overhead imposed by unnecessary data. In simple terms, computer systems burn huge resources ploughing through haystacks of trash to find valuable needles of real information.

There are other inefficiencies. Stark said she has seen applications, for example databases, with, say, 300 terabytes of storage even though the real data might be only 50 terabytes.

This happens when systems managers set aside capacity for anticipated needs. This is like a mother buying outsize clothes on the grounds her child will eventually grow into them.

Nieboer said there are inherent inefficiencies in various systems.

Mainframe disks are typically only 50 percent full. With Unix systems, disks might be only 40 percent full; with Windows, this falls to 30 percent.
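Utilisation figures like these are simply used capacity divided by total capacity. The sketch below applies that to Stark's database example (50 terabytes of real data in 300 terabytes of storage) and then to whichever disk holds the current directory. `percent_used` is a hypothetical helper; `shutil.disk_usage` is part of Python's standard library.

```python
import shutil

def percent_used(used, capacity):
    """Return used capacity as a percentage of total capacity."""
    return 100 * used / capacity

# Stark's example: 50 terabytes of real data in 300 terabytes of allocated storage.
print(f"{percent_used(50, 300):.1f} percent")  # prints "16.7 percent"

# The same calculation for the disk holding the current directory.
usage = shutil.disk_usage(".")
print(f"{percent_used(usage.used, usage.total):.1f} percent")
```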

Written by Bill Bennett

April 11th, 2009 at 3:18 pm
