By Erik Ottem, Director of Product Marketing, Data Center Systems, Western Digital Corporation, and Vivek Tyagi, Director for India Business Development, SanDisk Brand Commercial Sales and Support, Western Digital Corporation
Abundance of Data
To accommodate the overwhelming amount of data being generated by billions of people, using millions of applications on billions of PCs, smartphones and mobile devices, data centers around the globe are changing. This abundance of data needs to be stored and manipulated so that value and intelligence can be extracted from it, often in real time, enabling an organization to achieve better customer service, internal and external support, decision-making and more, all of which can improve revenue and the bottom line.
The use of analytics, analytical reporting and historical trending on large datasets, and the ability to perform active archiving across multiple storage tiers, are two examples of why organizations want to retain their large datasets for the longer term. Their data has become their most valuable asset, and as such, IT organizations are under pressure to ensure that the data they store and protect is always available when required, and accurate when read.
Unfortunately, traditional methods such as off-site archiving or tape library recovery are no longer effective, or in many cases even permitted by today's businesses, given the service disruptions and performance slow-downs they cause. Faced with these demanding storage requirements, IT organizations are moving toward higher levels of data durability (long-term data protection) through more robust technologies, such as Object-Based Storage (OBS).
Object-Based Storage
Object-Based Storage is an alternative storage architecture (versus traditional NAS- or SAN-based architectures) that manages data as objects. Whether a document or presentation, audio or video file, image or photo, or other unstructured data, the entity is stored as a single object in OBS. Since the object also includes metadata, there is no need to structure the data in a hierarchy. Instead, the object (data and metadata) is placed in a flat address space (single namespace), which simplifies data access compared with traditional processes.
In NAS file-based architectures, data is stored in a folder that must be traversed each time it is accessed. In SAN-based architectures, the data is stored as a collection of block-based sectors with unique addresses. In either case, there is little or no information about the data stored that can simplify manageability or support increasing amounts of data at scale.
In an OBS architecture, a unique identifier is assigned to each object making it easier to index and retrieve data, find a specific object (such as a video or photo), or even leverage data analytics or other discovery techniques for a large volume of data at scale.
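The flat namespace and metadata-driven retrieval described above can be sketched in a few lines of Python. This is a hypothetical toy using an in-memory dict, not any vendor's API; the class and method names are illustrative, though real OBS systems expose similar put/get semantics.

```python
import uuid

class ObjectStore:
    """Minimal flat-namespace object store: data and metadata under one unique ID."""

    def __init__(self):
        self._objects = {}  # single flat address space; no folder hierarchy to traverse

    def put(self, data, **metadata):
        oid = str(uuid.uuid4())  # unique identifier assigned to each object
        self._objects[oid] = {"data": data, "metadata": metadata}
        return oid

    def get(self, oid):
        # direct lookup by identifier, regardless of how many objects are stored
        return self._objects[oid]

    def find(self, **criteria):
        """Locate objects by metadata query rather than by walking a directory tree."""
        return [oid for oid, obj in self._objects.items()
                if all(obj["metadata"].get(k) == v for k, v in criteria.items())]

# store a photo and a document side by side in the same namespace
store = ObjectStore()
photo_id = store.put(b"...jpeg bytes...", content_type="image/jpeg", camera="X100")
doc_id = store.put(b"quarterly report", content_type="application/pdf")
```

Because each object carries its own metadata, a query such as `store.find(content_type="image/jpeg")` returns matching objects without any hierarchy to navigate.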
What is Data Durability?
Data durability is all about long-term data protection. Traditional storage architectures typically use RAID (redundant array of independent disks), which in most configurations encodes parity for data recovery in the event of a drive failure and spreads the blocks across multiple storage devices (such as HDDs and SSDs). This approach lets the array survive the failure of one or two devices by rebuilding the data that was stored on the failed drive. A drive rebuild not only degrades performance while it runs, but the probability of other devices in the group failing places data integrity at risk. RAID rebuild times can take hours or even weeks, and if an unrecoverable read error occurs during the rebuild, data can be permanently lost, placing the business at risk.
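The parity mechanism behind a RAID rebuild can be illustrated with single-parity XOR (RAID-4/5 style), shown here for one stripe. The drive contents are hypothetical and reduced to two bytes each for clarity:

```python
from functools import reduce

def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# one stripe across three data drives plus one parity drive
drives = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
parity_drive = reduce(xor, drives)  # parity = D0 XOR D1 XOR D2

# drive 0 fails; the rebuild XORs every surviving drive with the parity.
# Every surviving block must be read back correctly for this to work,
# which is why an unrecoverable read error during a rebuild loses data.
survivors = drives[1:]
rebuilt = reduce(xor, survivors + [parity_drive])
```

After the rebuild, `rebuilt` equals the failed drive's original contents; in a real array this computation runs for every stripe on the drive, which is what makes rebuilds long and performance-degrading.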
Conversely, data durability in OBS is achieved through erasure coding, a technique that divides data, together with parity information, into chunks that are distributed across the storage pool. While RAID rebuilds a disk drive, OBS recovers data independently of the device. Some OBS solutions on the market take erasure coding one step further with dynamic data placement: the system adapts to drive or network failures by continuing to write objects, and seamlessly absorbs additional storage upgrades. There is typically no rebuild time or degraded performance, and no urgency to replace failed devices the moment a read error occurs; they can be replaced when it is convenient.
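Erasure coding itself can be sketched as a k-of-n code: an object is split into k data chunks, expanded to n coded chunks, and any k surviving chunks suffice to rebuild the object, no matter which devices held the lost ones. The sketch below uses Lagrange interpolation over a small prime field for readability; production systems use optimized Reed-Solomon arithmetic over GF(2^8), and all names here are illustrative.

```python
P = 257  # prime field large enough to hold byte values 0..255

def _lagrange_eval(points, x):
    """Evaluate the unique degree-(k-1) polynomial through `points` at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Turn k data symbols into n coded chunks; any k of them can rebuild the data."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return [(x, _lagrange_eval(points, x)) for x in range(1, n + 1)]

def decode(chunks, k):
    """Rebuild the original k symbols from any k surviving (x, value) chunks."""
    pts = chunks[:k]
    return [_lagrange_eval(pts, x) for x in range(1, k + 1)]

# 4 data symbols spread as 6 chunks across 6 hypothetical devices
data = [10, 20, 30, 40]
shares = encode(data, 6)
alive = [s for s in shares if s[0] not in (2, 5)]  # devices 2 and 5 fail
recovered = decode(alive, 4)
```

Note that recovery reads only k chunks from wherever they happen to live, which is why there is no device-level rebuild and no urgency to swap the failed hardware.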
An additional concern relating to long-term data access is protecting chunks of data as they lie dormant on storage devices in enterprise storage environments. Simply protecting against device failures (as is the case with RAID) does not protect against gradual failure of the bits stored, such as bit rot, where a portion of the storage device becomes unreadable, corrupted, or impossible to retrieve in its original, unaltered form.
OBS solutions protect against bit failures, like bit rot: if a given chunk becomes corrupt, a replacement chunk can be constructed from the parity information stored in the remaining chunks that constitute a self-healing object. Once again, it isn't necessary to rebuild or replace an entire drive, just the affected data. Combining erasure coding with a data-scrubbing application achieves extreme data durability, up to 17 nines, or in simpler terms, for every 100,000 trillion (10^17) objects stored, only one would be unreadable. It is for this reason that OBS is widely used in hyperscale data centers and cloud computing environments.
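Scrubbing plus self-healing can be shown in miniature: periodically recompute each chunk's checksum, and when one no longer matches, rebuild just that chunk from the surviving chunks and parity. The sketch below uses single-XOR parity and hypothetical chunk contents for brevity; real systems use multi-parity erasure codes.

```python
import hashlib
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# equal-length chunks of a hypothetical object, plus one XOR parity chunk
chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor_bytes, chunks)
digests = [hashlib.sha256(c).hexdigest() for c in chunks]  # stored at write time

# simulate silent corruption ("bit rot") of chunk 1 while it lies dormant
chunks[1] = b"BxBB"

# scrub pass: recompute checksums, find the bad chunk, and heal only that chunk
for i, c in enumerate(chunks):
    if hashlib.sha256(c).hexdigest() != digests[i]:
        good = [chunks[j] for j in range(len(chunks)) if j != i]
        chunks[i] = reduce(xor_bytes, good + [parity])  # rebuild from peers + parity
```

Only the affected chunk is rewritten; the drive itself stays in service, which is the self-healing behavior described above.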
Final Thoughts
Organizations face exponential growth in data that adversely affects storage solutions based on traditional architectures, which are not equipped to handle this deluge, and the loss of data can be catastrophic, significantly impacting a business's reputation and revenue. Retaining large datasets for the long term, and ensuring that the data stored is always available when required and accurate when read, requires a platform built on an Object-Based Storage system. The combination of erasure coding and the ability to seamlessly distribute data across multiple data center locations can cost-effectively support data at scale, at lower capital and operational expense, thanks to more efficient data protection than traditional storage architectures provide.