The core technologies behind the Unified Storage Device's (USD) success in the marketplace are based on the novel, patented design of its file system. The Write Anywhere File Layout (WAFL) file system always writes new blocks on disk when a file is written, instead of overwriting already allocated blocks. This approach has the following advantages:
Taking a Snapshot copy of the system is simple. It requires only a copy of the root inode, a new "snapshot bitplane" file describing which blocks are in the active file system, and an updated "bitplane summary" file that is an OR of all the bitplanes for all Snapshot copies. Freeing blocks that are no longer allocated in any Snapshot copy is also relatively simple. Because data is always written to new blocks, a block in use is never overwritten; it becomes unused only when no Snapshot copy references it any longer. A binary diff of any bitplane file against the bitplane summary file therefore yields a bitmap of blocks that can be freed.
Writing is much faster. Because writes are always to new blocks, WAFL can aggregate writes into large stripes, instead of having to seek all over the disk for old blocks to rewrite.
Reading is faster. Because of WAFL's write allocation policies, blocks in a file can frequently be clustered near each other on a disk, resulting in fewer disk seeks per read. This happens whether or not the application lays out the file linearly.
The file system is always in a consistent state, eliminating the need for costly "chkdsk" operations.
Data can be recovered from Snapshot copies in the case of accidental deletion, and the file system can be reverted to a former state easily.
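The bitplane bookkeeping described in the list above can be sketched as follows. This is a minimal illustration, not the actual WAFL on-disk format: each bitplane is modeled as a Python integer whose bit n is set when block n is in use, and the function names are hypothetical.

```python
def summary(bitplanes):
    """OR together the bitplanes of all Snapshot copies (the 'bitplane summary')."""
    result = 0
    for plane in bitplanes:
        result |= plane
    return result


def freeable_after_delete(deleted, remaining, active):
    """Return a bitmap of blocks that only the deleted Snapshot copy was holding.

    deleted   -- bitplane of the Snapshot copy being removed
    remaining -- bitplanes of the Snapshot copies that are kept
    active    -- bitplane of the active file system
    """
    still_used = summary(remaining) | active
    # Bits set in the deleted bitplane but absent from everything else.
    return deleted & ~still_used
```

For example, with deleted = 0b0111, one remaining Snapshot copy 0b0110, and an active file system of 0b1100, only block 0 is held solely by the deleted copy, so the result is 0b0001.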
The USD stores incoming writes in battery-backed nonvolatile RAM (NVRAM) and also buffers a copy in main memory. It returns success to the application as soon as the data is safely in NVRAM, so write latencies are much lower than in conventional file systems. Every 10 seconds, or sooner under heavy load, WAFL performs a "consistency point" (CP) and initiates a "write episode". The file system writes all the data received since the last CP to disk and, at the very last moment, writes a new root inode. At that point, the changes are committed and the file system is in a consistent state. Contrary to popular misconception, write episodes are not staged out of the NVRAM, which would make them slow. The NVRAM is used as a redo log and is never read unless there has been a crash since the last CP. Incoming data is also kept in main memory, which ensures there is no single point of failure, and it is staged from main memory to disk. Because NVRAM is only a redo log, less data needs to be stored in it, especially for small writes.
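The interplay between the redo log, the memory buffer, and the consistency point can be modeled with a toy class. This is a sketch under simplifying assumptions: the class and method names are hypothetical, dictionaries stand in for disk blocks, and replacing the disk dictionary in one step stands in for writing the new root inode last.

```python
class LoggedStore:
    """Toy model: NVRAM as a redo log, main memory staged to disk at each CP."""

    def __init__(self):
        self.nvram_log = []  # redo log; read only during crash recovery
        self.memory = {}     # writes buffered since the last CP
        self.disk = {}       # consistent state as of the last CP

    def write(self, key, value):
        self.nvram_log.append((key, value))
        self.memory[key] = value
        return True  # acknowledged as soon as the write is logged

    def consistency_point(self):
        new_disk = dict(self.disk)
        new_disk.update(self.memory)  # staged from main memory, not from NVRAM
        self.disk = new_disk          # single switch-over: the "new root inode"
        self.memory.clear()
        self.nvram_log.clear()        # log entries are no longer needed

    def crash_and_recover(self):
        self.memory = {}              # main memory contents are lost
        for key, value in self.nvram_log:
            self.memory[key] = value  # replay the redo log
        self.consistency_point()
```

In this model the log is consulted only in crash_and_recover; during normal operation consistency_point never reads it.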
A Snapshot copy is basically a "promoted" CP, so Snapshot copies represent consistent views of the file system as well. It's important to understand that each additional Snapshot copy uses almost no extra space, so long as the file system is not changing rapidly between Snapshot copies. It's also important to distinguish between USD copy-on-write Snapshot copies and traditional copy-on-write Snapshot copies, which also don't consume extra space until a write happens. The traditional kind are called "copy-out-on-write" Snapshot copies herein. Copy-out-on-write Snapshot copies require the file system to copy out the block that's about to be written, do bookkeeping about where it went, and then write the changed block. When a file is written in WAFL, it just gets written to a new block as always. This is a much more efficient process, because only one block is written, not two (with corresponding parity costs). The WAFL scheme also does not require changes to Snapshot copy file systems, whereas copy-out-on-write schemes require updates to the inode blocks in every Snapshot copy the first time a block is changed.
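The difference in write cost can be made concrete with a small sketch. It counts data-block writes only and ignores metadata and parity; the function names and the list/dictionary representation of blocks and block maps are assumptions made for illustration.

```python
def copy_out_on_write(blocks, snapshot_area, index, data):
    """Traditional scheme: preserve the old block, then overwrite in place."""
    snapshot_area[index] = blocks[index]  # write 1: copy out the old contents
    blocks[index] = data                  # write 2: overwrite the live block
    return 2                              # data-block writes performed


def write_anywhere(blocks, active_map, file_offset, data):
    """Write-anywhere scheme: put the new data in a fresh block."""
    blocks.append(data)                        # the only data-block write
    active_map[file_offset] = len(blocks) - 1  # active map points at the new block
    return 1                                   # data-block writes performed
```

In the write-anywhere case, a Snapshot copy's block map is simply left alone: it keeps pointing at the old block, which is why the Snapshot copy itself needs no update.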
SnapRestore essentially reverses the process by which a Snapshot copy is taken, and turns a Snapshot copy into the live file system (all Snapshot copies newer than the one being reverted to are lost). This makes it possible, for instance, to revert a corrupted database in seconds or minutes instead of the hours usually spent watching tape reels spin.
SnapMirror copies Snapshot copies asynchronously to a remote USD over Ethernet, and mirrors changes in that file system back to the master as well. If a disaster occurs, the remote USD may be made the master copy of the mirror. The replication lag is short, on the order of minutes. Because most businesses can survive a loss of 5 minutes' worth of written data, this is the most widely used Disaster Recovery (DR) feature of the USD.
Some businesses with really critical data integrity needs, however, such as banks, use SyncMirror, a synchronous mirroring product, along with MetroCluster to guarantee that all transactions are safe. In such a setup, two clusters are set up as far as 100 km apart, and each contains a synchronous mirror of the other's data. This configuration has obvious performance limitations with respect to write latencies, but this cost is acceptable to a business that could conceivably lose a multibillion-dollar transaction during a disaster, were such precautions not taken.
Of particular interest to those deploying compute grids, FlexClone images are writable Snapshot copies that can be used to provision diskless workstations, among other uses. A single golden image can be used to boot hundreds of servers, with no additional storage used except what they write back to their OS image or need for paging space.
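The space-sharing behind a golden image can be sketched as an overlay: each clone reads through to the shared image and stores only the blocks it has written itself. This is an illustrative model with hypothetical names, not a description of how FlexClone is implemented internally.

```python
class Clone:
    """Writable view of a shared golden image; stores only its own changes."""

    def __init__(self, golden):
        self.golden = golden  # shared by all clones, never modified here
        self.delta = {}       # block number -> data written by this clone

    def read(self, block):
        # Prefer this clone's own copy, else fall through to the golden image.
        return self.delta.get(block, self.golden[block])

    def write(self, block, data):
        self.delta[block] = data

    def extra_blocks(self):
        """Number of blocks this clone consumes beyond the golden image."""
        return len(self.delta)
```

Hundreds of Clone objects created from the same golden list would each report zero extra blocks until they start writing.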
The NearStore product line, which used cheaper ATA disks, has now been merged back into the main USD product line, and is now simply an option one chooses when selecting a system. SATA drives are primarily used for online backup, both of USD data and of data stored in other host file systems. This allows consolidation of data from various systems into one location and reduces the need for tape. The NearStore technology also provides block-level deduplication (ASIS) for further storage efficiency. Because ATA drives are generally rated for lower duty cycles, NearStore configurations are not recommended as primary data stores, although some customers do use them that way.
NearStore configurations generally run SnapVault, which is a technology very much like SnapMirror that offers different capabilities, reduced synchronization between source and target, and a lower price tag. It is designed for online backup environments. RAID and WAFL protection make Nearline storage safer than storage on tape. Although tape might be necessary for long-period archival or remote offsite storage, Nearline storage can be used as the source for the tape dump, freeing the primary USDs of that load. Additionally, weekly and monthly Snapshot copies can be kept on the Nearline store, resulting in reduced frequency of tape backups. This reduces the need to buy new tape, a major expense at large shops.
A version of the NearStore product suite runs as a virtual tape library (VTL), with throughputs up to 2,400 MB/s. This allows data centers to move some or all of their backup activity to Nearline storage without disrupting existing backup practices, because the device emulates exactly the hardware and software signature of a large tape jukebox.
The MultiStore product allows virtual USDs, called vFiler units, to be provisioned on a single physical system. Each vFiler unit runs in its own administrative domain. This allows storage administrators to maintain control of physical USD resources and farm out administration of the data, users, and applications running against the storage on a per-vFiler basis. This is commonly used to segregate department-level resources in large companies without incurring the wasted space and IT resources that allocating a separate physical server to each department can entail. vFiler units are not completely virtualized storage systems, but a large majority of USD functionality is available on each vFiler unit.
The SnapLock product is designed to help customers comply with regulatory and compliance requirements, such as Sarbanes-Oxley and HIPAA. A SnapLock volume is a volume that can be mounted through any standard file protocol, but the atime stamp on the file is treated differently. When the administrator sets the atime on a file, that file cannot be deleted until the clock time is equal to or greater than the atime. Provisions for protecting the clock, as well as stringent auditing provisions, are included in the product suite, which has several levels, depending on the stringency of the compliance requirement.
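The retention rule amounts to a single comparison between the protected clock and the file's atime. The sketch below states it in code; the function name is hypothetical, and the protected clock is represented simply as a datetime passed in by the caller.

```python
from datetime import datetime, timezone


def can_delete(retention_atime, now):
    """A file may be deleted only once the clock has reached its atime."""
    return now >= retention_atime


# Example: a file retained until the start of 2030 (an arbitrary date).
retain_until = datetime(2030, 1, 1, tzinfo=timezone.utc)
before = datetime(2029, 12, 31, tzinfo=timezone.utc)
after = datetime(2030, 1, 2, tzinfo=timezone.utc)
```

Here can_delete(retain_until, before) is False and can_delete(retain_until, after) is True; the boundary case, where the clock equals the atime exactly, also permits deletion.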
The USD can be used in a gateway configuration as the V-Series product, which can currently serve data from all major storage array vendors' products. When used in such a configuration, V-Series treats LUNs exposed by the foreign arrays as disks. However, because the foreign array is presumed to be already offering RAID protection, the disks are striped in RAID 0 fashion, and no parity is calculated or stored.
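RAID 0 striping without parity is a pure address mapping from a logical block number to a LUN and an offset within it. The sketch below shows the generic textbook mapping; the function name, the stripe_unit parameter, and the round-robin layout are assumptions for illustration and are not taken from V-Series documentation.

```python
def locate(block, num_luns, stripe_unit=1):
    """Map a logical block number to (lun_index, block_offset_within_lun).

    stripe_unit is the number of consecutive blocks placed on one LUN
    before moving on to the next.
    """
    stripe_index, within = divmod(block, stripe_unit)
    lun = stripe_index % num_luns
    offset = (stripe_index // num_luns) * stripe_unit + within
    return lun, offset
```

With three LUNs and a stripe unit of one block, logical blocks 0, 1, 2 land on LUNs 0, 1, 2 at offset 0, and block 4 lands on LUN 1 at offset 1. No block is reserved for parity, so all capacity exposed by the LUNs is usable.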
Other characteristics of the USD and operating system include the following:
A remote management module that allows for remote problem resolution and advanced monitoring.
An AutoSupport feature that once a week sends a customer-chosen subset of data about the unit to the vendor for analysis and predictive maintenance.
An OS component called FlexShare that allows fencing of applications into performance domains, so critical loads get priority.
Integration with all major tape backup systems and applications, using NDMP (Network Data Management Protocol).
An integrated quota management system.
Full integration with all major NFS, CIFS, HTTP, and WebDAV clients, and with all major Fibre Channel and iSCSI initiators.
Related Topics