We store zettabytes of rubbish data

Last year the world created or replicated 64.2 zettabytes of data. The figure comes from IDC, a market research firm, although the original document is no longer online.

The figure is remarkable considering that three years earlier IDC was forecasting the 2020 number would be 44 zettabytes.

A zettabyte is a trillion gigabytes.

In part IDC puts the faster growth down to the Covid-19 pandemic: a “…dramatic increase in the number of people working, learning, and entertaining themselves from home.”

Ephemeral data

IDC says: “…less than 2 per cent of this new data was saved and retained into 2021 – the rest was either ephemeral (created or replicated primarily for the purpose of consumption) or temporarily cached and subsequently overwritten with newer data.”

Between now and 2025 the amount of data is set to grow at a compound annual rate of 23 per cent.
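To get a feel for what that compound rate implies, here is a rough sketch. It assumes the 23 per cent rate applies every year from the 64.2 ZB figure for 2020 through to 2025. The function name and the choice of base year are illustrative assumptions, not something taken from IDC's report.

```python
def project_data_volume(base_zb, annual_growth, years):
    """Compound a starting volume (in zettabytes) at a fixed annual growth rate."""
    return base_zb * (1 + annual_growth) ** years


# Assumptions: 2020 is the base year and the rate holds steady until 2025.
BASE_2020_ZB = 64.2
ANNUAL_GROWTH = 0.23

for year in range(2020, 2026):
    volume = project_data_volume(BASE_2020_ZB, ANNUAL_GROWTH, year - 2020)
    print(f"{year}: {volume:.1f} ZB")
```

Under those assumptions the 2025 total comes out at a little over 180 ZB, close to three times the 2020 figure.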

The fastest-growing source of data is the Internet of Things, not including surveillance video cameras. Social media is the second fastest-growing source.

Growing faster than we can cope with

IDC says the amount of data generated is growing faster than our capacity to store data. The world had around 6.7 ZB of storage capacity, which is growing at 19.2 per cent year on year.

Which means we save less and less of the generated data.
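The widening gap can be illustrated with a back-of-the-envelope calculation. This sketch assumes both growth rates stay constant, that the 6.7 ZB storage figure and the 64.2 ZB data figure describe the same starting year, and that comparing installed capacity with a year's worth of created data is a fair proxy. The function name is made up for illustration.

```python
def storage_share(storage_zb, data_zb, storage_growth, data_growth, years):
    """Return storage capacity as a fraction of annual data created after `years` years."""
    storage = storage_zb * (1 + storage_growth) ** years
    data = data_zb * (1 + data_growth) ** years
    return storage / data


# Assumptions: figures share a base year and both rates hold steady.
STORAGE_ZB = 6.7
DATA_ZB = 64.2
STORAGE_GROWTH = 0.192
DATA_GROWTH = 0.23

for years in range(0, 6):
    share = storage_share(STORAGE_ZB, DATA_ZB, STORAGE_GROWTH, DATA_GROWTH, years)
    print(f"after {years} years: {share:.1%} of created data could be stored")
```

On these numbers, capacity starts at a little over 10 per cent of the data created in a year and slips to around 9 per cent after five years.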

This is less of a problem than it might appear because a large fraction of data is useless. A decade ago experts found as much as 90 per cent of stored data was rubbish. It can include empty files, duplicates, sometimes many copies of the same file, and temporary files that were never deleted.