2014/07/12

RAID++ and Storage Pools: Leveraging GPT partitions for Asymmetric Media Logical Volumes. Pt 1.

This is an exploration of addressing Storage problems posed by directly connected or (ethernet) networked drives, not for SAN-connected managed disks served by Enterprise Storage Arrays.

The Problem

One of the most important features of the Veritas Logical Volume Manager (LVM) circa 1995 was the ~1MB disk label that contained a full copy of the LVM information of the drive/volume and allowed drives to be renamed by the system or physically shuffled, intentionally or not.

Today we have a standard, courtesy of UEFI, for GUID Partition Tables (GPT) on Storage Devices, supported by all major Operating Systems. Can this provide similar, or additional, capability?
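
As a rough illustration of what a GPT already carries, the sketch below packs and parses the 92-byte GPT header defined in the UEFI specification and recovers the disk GUID, which stays the same no matter what device name the system assigns. The header here is synthetic: `build_header`, `parse_header` and the example LBA values are assumptions made for illustration, not part of any existing tool, and the partition entry array is not modelled.

```python
import struct
import uuid
import zlib

# GPT header layout per the UEFI specification (little-endian, 92 bytes).
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def build_header(disk_guid, backup_lba):
    """Build a synthetic GPT header for illustration (not read from a real disk)."""
    fields = [
        b"EFI PART",         # signature
        0x00010000,          # revision 1.0
        GPT_HEADER.size,     # header size in bytes
        0,                   # header CRC32, zero while it is being computed
        0,                   # reserved
        1,                   # LBA of this header
        backup_lba,          # LBA of the backup header
        34,                  # first usable LBA (assumes 512-byte sectors)
        backup_lba - 33,     # last usable LBA (assumes 512-byte sectors)
        disk_guid.bytes_le,  # disk GUID, stored mixed-endian
        2,                   # first LBA of the partition entry array
        128,                 # number of partition entries
        128,                 # size of each entry in bytes
        0,                   # CRC32 of the entry array (not modelled here)
    ]
    fields[3] = zlib.crc32(GPT_HEADER.pack(*fields))
    return GPT_HEADER.pack(*fields)


def parse_header(raw):
    """Return (disk_guid, backup_lba) after checking the signature and CRC."""
    fields = list(GPT_HEADER.unpack(raw[:GPT_HEADER.size]))
    if fields[0] != b"EFI PART":
        raise ValueError("not a GPT header")
    stored_crc, fields[3] = fields[3], 0
    if zlib.crc32(GPT_HEADER.pack(*fields)) != stored_crc:
        raise ValueError("GPT header CRC mismatch")
    return uuid.UUID(bytes_le=fields[9]), fields[6]
```

Calling `parse_header(build_header(some_guid, backup_lba=2047))` returns the same GUID and backup LBA that went in. On a real drive the same bytes would be read from LBA 1, and the GUID could be used as a key for the drive wherever it is plugged in.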


LVM didn't just solve the problems of providing Data Protection with RAID, tuning Performance by spreading IO's across spindles, and creating larger volumes than were available from single drives. It also allowed "live" changes that were previously difficult, error-prone and often time-consuming with "stop, reconfigure, rebuild, reboot". Viz:

  • "just in time" dynamically growing volumes and filesystems, reducing "orphaned" and wasted space from fixed, pre-allocated volumes. Prior to this, it was very easy to run out of space on one filesystem while more than enough space was available across the whole system.
  • On-the-fly replacement and upgrade of hard disks, by adding new drives as mirrors of physical drives, then removing the old drives from the mirror, once the data was synced.
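
The drive-replacement procedure in the last bullet can be sketched as a toy model. This is only an illustration of the attach, sync, detach ordering: the `Mirror` class and `replace_drive` function are hypothetical names, and no real volume manager's API is implied.

```python
class Mirror:
    """Toy mirrored volume: every member holds a full copy of the blocks."""

    def __init__(self, name, blocks):
        self.members = {name: list(blocks)}

    def attach(self, name):
        """Add a new, empty member; it is out of sync until resync() runs."""
        self.members[name] = []

    def resync(self, name, source):
        """Copy every block from a healthy member onto the new one."""
        self.members[name] = list(self.members[source])

    def in_sync(self, name, source):
        return self.members[name] == self.members[source]

    def detach(self, name, survivor):
        """Remove a member, refusing if the survivor is not a full copy."""
        if len(self.members) < 2:
            raise RuntimeError("cannot detach the last member")
        if not self.in_sync(survivor, name):
            raise RuntimeError("survivor is not synced yet")
        del self.members[name]


def replace_drive(mirror, old, new):
    """Swap drives while the volume stays online: attach, sync, then detach."""
    mirror.attach(new)
    mirror.resync(new, source=old)
    mirror.detach(old, survivor=new)
```

The check in `detach` is the important part of the ordering: the old drive can only leave the mirror once the new one holds a complete copy.
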

Computers have migrated out of the back-office, off our desktops and into our pockets to be mobile and constantly with us. They are rapidly becoming both pervasive and invisible. Alongside this, there's been a revolution in Storage, affecting most systems, from small (tablets, notebooks, desktops) to large servers running Virtual Machines.

Users know that devices are expendable, fungible and replaceable, but data is everything. It's not unreasonable for mobile phone users to demand loss-free Storage and, by extension, expect that of every system or service they use. Hard Disks arrived in 1956 with the IBM 305 RAMAC, replacing Magnetic Drums as large, fast external Storage and ushering in the current era of Storage.

There are now many choices for Storage Media (PCI-Flash, SSD's, 2.5" HDD's, 3.5" HDD's), different performance & reliability versions of each (Enterprise or NAS Flash and HDD's; fast (10,000RPM) or slow/energy-saving (5400RPM) drives; large 3.5" drives using Shingled Magnetic Recording (SMR), 5/6 platters or Helium filling; 2.5" drives with 1, 2, 3 or 4 platters), and a large choice of connection methods: SD-Card, USB, Thunderbolt/PCIe, SATA/eSATA, SAS and, increasingly, ethernet. All of these support removable drives.

A single system may have all or most of these options connected at one time or another, and within even small operations, all media and connection types can be expected.
Other media, such as Optical Disk and Enterprise Tape, do exist, but don't commonly support GPT's.

All of this needs to incorporate and transparently interoperate with "Cloud" storage and backup services. Both large Drive Vendors, Seagate and Western Digital, are releasing new drives with native Ethernet interfaces, forcing more changes in the world of Storage.

There are now also many more Storage system operations expected and commonly demanded and used than in 1995:
  • Snapshots
    • including "differences from a base snapshot", useful in provisioning VM images from a common base.
  • Replication
    • e.g. "rsync"
  • Clones
  • Versioning and associated Repositories
  • Migration, especially for VM images between physical hosts.
  • Continuous Data Protection (CDP)
    • For Disaster Recovery (DR) and Business Continuity, a near current off-site copy of the Storage, accessible by standby servers
  • Backups
    • Full, Incremental, Differential
    • Bare-metal recovery
  • Archives

It's no longer enough to rely on simple backups or a single Storage Array for an organisation's Data Holdings. It's now necessary to catalogue and index all the media and data, preferably in a single place (i.e. a Database) that is directly User Accessible. Forcing Restore requests to be serviced manually is expensive and time-consuming.

"Advanced" features are now demanded at all levels:

  • Compression
  • Long-range Compression & DeDupe (lrzip)
  • De-Duplication
  • Encryption, public-key based
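
To make two of these features concrete, here is a minimal sketch of De-Duplication combined with Compression, assuming fixed-size blocks keyed by their SHA-256 hash. This is not lrzip's algorithm, and the `store`/`load` functions, the in-memory `pool` dictionary and the 4096-byte block size are all assumptions chosen for illustration. Encryption is left out.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size


def store(data, pool):
    """Split data into blocks, keep each unique block once (compressed),
    and return the list of block hashes needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in pool:
            # First time this content is seen: compress and store it.
            pool[digest] = zlib.compress(block)
        recipe.append(digest)
    return recipe


def load(recipe, pool):
    """Rebuild the original data from its recipe of block hashes."""
    return b"".join(zlib.decompress(pool[digest]) for digest in recipe)
```

Storing three identical blocks followed by one different block yields a four-entry recipe but only two entries in `pool`, and `load` returns the original bytes.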
