Try using a sledgehammer to pack 15 pounds of potatoes into a bag with a 5-pound capacity, and what do you end up with? Too much messy and disgusting material crammed into a vessel too small for the job and a lot of sloppy over-spill.
Unless you have the right sledgehammer. What does this have to do with computer storage? Plenty. And it all starts with a new data-reshaping capability of Seagate’s SandForce DuraWrite technology. Keep the numbers in mind: 15 pounds of data in a 5-pound bag. Or three units of data in a space designed for one. Its key. More on that later.
What does DuraWrite technology do again?
My blog Write Amplification (Part 2) talks about how DuraWrite data reduction technology can make more space for over provisioning. That translates into faster data transfers, longer flash memory endurance and lower power draw. What it has NOT done is increase the space available for data storage.
DuraWrite Virtual Capacity is like the Dr. Who TARDIS
Excuse my Dr. Who phone booth reference, but for those who know the TARDIS, it provides a great analogy. For those unfamiliar with the TARDIS, it is a fictional time machine that looks like a British police call box. It is very small by external appearances, but inside it is vast in its carrying capacity, taking occupants on an odyssey through time.
DuraWrite Virtual Capacity (DVC) is a new feature of our SandForce SSD controllers, and its a bit like the TARDIS. While there is no time travel involved, it does provide a lot more than can be seen. DVC takes advantage of data entropy (randomness) as data is written to the SSD. Some people like to think of it as data compression. Whatever you call it, the end result is the same — less data is written to the flash than what is written from the host. DuraWrite technology alone will increase the over provisioning of an SSD, but DVC increases the user data storage available in an SSD (not the over provisioning).
How much more space can it add?
The efficiency of DVC is inversely related to the entropy. High entropy data like JPEG, encrypted and similar compressed files do little to increase data capacity. In contrast, files like Microsoft Office, Outlook PST, Oracle databases, EXE, and DLL (operating system) files have much lower entropy and can increase usable storage space on the order of two to three times for the same physical flash memory. Yes, I said two to three times. Better still, that translates into a two to three times reduction in the net cost of the flash storage. Again, no typo: two to three times more affordable. Since most enterprise deployments of flash memory are limited by the cost per GB of the flash, this kind of advance has the potential to further accelerate flash memory deployments in the enterprise.
Why hasn’t this been done in the past?
Think TARDIS again. Step into the booth, and take a joy ride through space and time. It happens on-the-fly. Simple but, only in a fictional world. With on-the-fly data reduction and compression, the process is filled with complexities. The biggest problem is most operating systems don’t understand that the maximum capacity of a primary storage device (hard disk or solid state drive) can increase or decrease over time. However, open source operating systems can address the issue with customized drivers.
The other problem is any storage device that includes data reduction or compression must use a variable mapping table to track the location of the data on the device once data is reduced. Hard disk drives (HDDs) do not require any kind of mapping table because the operating system can write new data over old data. However, the lack of a mapping table prevents HDDs from supporting an on-the-fly data reduction and compression system.
All solid state drives (SSDs) using NAND flash feature a basic mapping table, typically called the flash translation layer (FTL). This mapping table is required because NAND flash memory pages cannot be rewritten directly, but must first be erased in larger blocks. The SSD controller needs to relocate valid data while the old data gets erased. This process, called garbage collection, uses the mapping table. However, the data reduction and compression system requires a mapping table that is variable in size. Most SSDs lack that capability, but not those using a SandForce controller, making SSDs with SandForce controllers perfectly suited to the job.
What use cases can be applied to DVC?
DVC can be used to increase usable data storage space or provide more cache capacity flexibility by two to three times. To create more usable data storage space, the operating system must be altered with new primary storage device drivers for it to understand the drives maximum capacity, which can fluctuate over time based on how much the data is reduced or compressed.
To support greater cache capacity flexibility, a host controller would manage the flash memory directly as a cache. The controller would isolate the flash memory capacity from the host so the operating system does not even see it. The dynamic cache capacity would increase cache performance at a lower price per GB depending upon the entropy of the data. The Seagate Nytro product line and some SandForce Driven program SSDs already support both of these use cases.
When will this appear in my personal computer?
While DVC is already being deployed and evaluated in enterprise datacenters around the world, the use in personal computers will take a bit longer due to the need to have the operating system changed with new storage device drivers that understand the fluctuating maximum capacity.
When these operating system changes come together, you will not need that sledgehammer to pack more data into your TARDIS (SSD). Now that’s a space odyssey to write home about.