The basis of modern computing technologies is precise electronic and software systems, where errors are unacceptable. But it doesn’t seem like this is always the case. For example, why is there a difference between the promised and actual hard drive capacity? Is this just a marketing trick of “it’s close enough so they won’t notice”? Actually, there’s solid science behind this problem, with marketing being only one part of the issue.
Byte Prefixes and Magnitudes
One of the two biggest reasons for the difference between advertised and actual hard drive capacity is how large amounts of data are counted. Pretty much everyone knows roughly what a gigabyte and a terabyte are, since most of our data is measured in gigabytes and most hard drives are labeled in terabytes. Most people assume that, as with other units of measurement, a gigabyte contains 1000 megabytes and a terabyte contains 1000 gigabytes. Some of you may already know that most operating systems treat 1 GB as 1024 MB, because computers use base-2 (binary) numbers. This binary use of the prefixes is part of the JEDEC memory standards.
This is what creates the difference between advertised and reported capacity: advertisers use the decimal system, where 1 GB = 1000 MB, while most computers label bytes in the binary JEDEC system, where 1 GB = 1024 MB. It’s not simply a difference of 24 smaller units either, because the gap between the two systems compounds with every step up in magnitude.
|Byte prefix||Value in decimal system||Value in JEDEC system||Difference in bytes (% of JEDEC value)|
|Kilobyte, kB||1000 (10^3 bytes)||1024 (2^10 bytes)||24 (2.3%)|
|Megabyte, MB||1000 kB (10^6 bytes)||1024 kB (2^20 bytes)||48'576 (4.6%)|
|Gigabyte, GB||1000 MB (10^9 bytes)||1024 MB (2^30 bytes)||73'741'824 (6.9%)|
|Terabyte, TB||1000 GB (10^12 bytes)||1024 GB (2^40 bytes)||99'511'627'776 (9.1%)|
|Petabyte, PB||1000 TB (10^15 bytes)||1024 TB (2^50 bytes)||125'899'906'842'624 (11.2%)|
|Exabyte, EB||1000 PB (10^18 bytes)||1024 PB (2^60 bytes)||152'921'504'606'846'976 (13.3%)|
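If you want to check how the gap grows with each prefix, here is a minimal Python sketch. The function name prefix_gap and the choice to express the percentage relative to the binary value are assumptions made for this illustration.

```python
# Compare the decimal and binary readings of each byte prefix.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]


def prefix_gap(power):
    """Return (decimal_bytes, binary_bytes, gap_bytes, gap_percent).

    power is 1 for kilo, 2 for mega, and so on.
    """
    decimal = 1000 ** power
    binary = 1024 ** power
    gap = binary - decimal
    # The percentage is taken relative to the binary value (an assumption).
    return decimal, binary, gap, 100 * gap / binary


for power, name in enumerate(PREFIXES, start=1):
    decimal, binary, gap, percent = prefix_gap(power)
    print(f"{name:>4}: {gap:>27,} bytes apart ({percent:.1f}%)")
```

Because each step multiplies by 1000 on one side and by 1024 on the other, the gap compounds instead of staying at a fixed 24 units.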
The International Electrotechnical Commission (IEC) tried to resolve the difference in 1998 by creating special prefixes for binary multiples. Unfortunately, these prefixes didn’t catch on (probably because they sound silly), and both operating systems and drive manufacturers kept using the same decimal prefixes, each in their own way. This means that computers are technically wrong when they use the JEDEC binary-but-not-really counting system, where “gigabytes” actually represent gibibytes, and so on.
IEC’s special binary counting prefixes
|Prefix||Definition||Value in bytes|
|Kibibyte, KiB||1024 bytes||1'024 (2^10) bytes|
|Mebibyte, MiB||1024 kibibytes||1'048'576 (2^20) bytes|
|Gibibyte, GiB||1024 mebibytes||1'073'741'824 (2^30) bytes|
|Tebibyte, TiB||1024 gibibytes||1'099'511'627'776 (2^40) bytes|
|Pebibyte, PiB||1024 tebibytes||1'125'899'906'842'624 (2^50) bytes|
|Exbibyte, EiB||1024 pebibytes||1'152'921'504'606'846'976 (2^60) bytes|
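To see what unambiguous labeling would look like, here is a small sketch that formats a raw byte count with the IEC prefixes. The helper name format_iec and the two-decimal rounding are choices made for this example, not part of any standard.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_iec(num_bytes):
    """Format a byte count using binary (IEC) prefixes."""
    value = float(num_bytes)
    for unit in IEC_UNITS:
        # Stop once the value fits the current unit, or the units run out.
        if value < 1024 or unit == IEC_UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# A drive sold as "1 TB" holds 10**12 bytes.
print(format_iec(10 ** 12))
```

Labeled this way, the same drive is both 1 TB (decimal) and roughly 931 GiB (binary), and neither number is wrong.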
Regardless of who’s right or wrong here, the two readings of the same prefixes differ by roughly 10% in the terabyte-petabyte range, which translates into the following difference in capacity:
- 1 advertised (decimal) terabyte = 0.909 actual (JEDEC) terabytes
- 1 advertised (decimal) gigabyte = 0.931 actual (JEDEC) gigabytes
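The same conversion works for any drive size. Below is a rough sketch, assuming the drive holds exactly its advertised number of decimal bytes and the operating system divides by powers of 1024; reported_capacity is a hypothetical helper name.

```python
def reported_capacity(advertised, power):
    """Convert a capacity advertised in decimal units into the figure an
    operating system that counts in powers of 1024 would display.

    power: 3 for gigabytes, 4 for terabytes, and so on.
    """
    total_bytes = advertised * 1000 ** power
    return total_bytes / 1024 ** power


print(f"1 TB drive shows as {reported_capacity(1, 4):.3f} TB")
print(f"500 GB drive shows as {reported_capacity(500, 3):.1f} GB")
```

Real drives rarely contain exactly the advertised byte count, so treat the result as an estimate.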
None of this discrepancy is really the drive manufacturers’ fault, nor is it the computer engineers’ fault. It’s simply two numbering systems that got mixed up, which manufacturers have been able to use to their advantage by making their drives look bigger. While it can be used as a marketing trick, no capacity is actually missing because of these different numbering standards.
Background Features That Reduce Drive Capacity
While the mismatch between numbering systems makes roughly 9% of a drive’s capacity appear to be “missing” in the 1-8 TB range, there are a couple of background computing features and basic data structures that actually use up drive space. They don’t take up much, but in certain situations they may remove a few gigabytes from your hard drive’s maximum capacity.
Shadow Storage and Other Hidden Files
Don’t think that you’re the only one managing data on your hard drive, because Windows, for example, uses quite a bit of your storage for its own needs. Windows calls this “shadow storage” and uses it to store temporary files, as well as backups of both recently changed files and the system image. On average, shadow storage takes up 5-10 GB of the drive that holds the Windows system files, and it won’t show up in File Explorer. However, you can dig through file and drive properties to find the difference between the shown and the actual used space, or check the precise size of your shadow storage by running Command Prompt as administrator and typing vssadmin list shadowstorage.
While this is a specific feature of Windows, other operating systems, like macOS and Linux, also use some of your drive’s capacity for background work, reducing the space that is actually available to you.
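For a rough, cross-platform look at how much space is used without showing up as ordinary files, you can compare what the file system reports as used with the total size of the files you can actually see. The sketch below uses Python’s standard shutil.disk_usage and os.walk; the function names visible_file_bytes and hidden_overhead are made up for this example, and the result lumps together everything the script cannot see (shadow storage, file system metadata, files it has no permission to read), so it is an estimate, not a measurement of shadow storage alone.

```python
import os
import shutil


def visible_file_bytes(root):
    """Sum the sizes of all files reachable under root."""
    total = 0
    # os.walk silently skips directories it is not allowed to list.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                pass  # file vanished or access was denied
    return total


def hidden_overhead(root):
    """Used space on root's volume minus the files visible under root.

    Only meaningful when root is the top of a volume, such as a drive
    letter or a mount point.
    """
    usage = shutil.disk_usage(root)
    return usage.used - visible_file_bytes(root)
```

Calling hidden_overhead on a drive root (for example "C:\\" on Windows or "/" on Linux) walks every folder, so expect it to take a while on a full drive.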
Formatting and Data Allocation
Formatting a drive is necessary for the operating system to recognize and address the files on it. For the format to function, a decent amount of control data has to be stored on the drive. Not only that, for the drive to be able to find saved files, each of them needs an address, which is itself saved as data on the drive. For a single file this is just a simple number, but imagine saving tens of thousands of documents, text files and configuration files on the drive, and those numbers start making up a decent chunk of the information stored there.
The amount of space that formatting data takes up depends on how you use the drive, which file system it’s formatted with and so on, making it a difficult value to estimate. As a historical point of reference, such system files could take up almost a third of a FAT12 DOS-formatted drive (this is undoubtedly far less today). Since this is a “behind the curtains” process, the amount of storage that address and formatting data takes up is hidden from the user.
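To get a feel for the scale of per-file bookkeeping, here is a back-of-the-envelope sketch. The figure of 1024 bytes per file record is an assumption chosen for illustration; the real value depends on the file system and its settings, and both function names are hypothetical.

```python
def metadata_overhead(file_count, bytes_per_record=1024):
    """Estimate the space used by per-file bookkeeping records.

    bytes_per_record is an assumed figure, not a measured one.
    """
    return file_count * bytes_per_record


def overhead_share(file_count, drive_bytes, bytes_per_record=1024):
    """Express the estimated bookkeeping overhead as a percentage of the drive."""
    return 100 * metadata_overhead(file_count, bytes_per_record) / drive_bytes


files = 50_000
print(f"{metadata_overhead(files):,} bytes of records")
print(f"{overhead_share(files, 10 ** 12):.5f}% of a 1 TB drive")
```

Under these assumptions the records add up to tens of megabytes, which is noticeable on its own but tiny next to the gap caused by the two numbering systems.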