Multi-tiered Storage: Why and How

It is a well-established industry observation that data loses value as it gets older. Over time, the volume of infrequently used reference data grows much faster than the volume of active, frequently accessed data. Managing all of this data the same way leads to higher costs and sub-optimal use of your investment in storage infrastructure. The real need is to align your storage options with data growth according to how frequently the data is accessed. A single-tier, monolithic storage system yields diminishing returns as the volume of aging and stale data increases, because the associated storage costs rise prohibitively.

Not all Data is the Same – Why You Need Multiple Storage Levels

A fair estimate is that in a typical enterprise environment, active data makes up only 2-4% of the total at any given point in time. Aging data contributes close to 10%, and the rest is rarely used. The same is true of a SharePoint-based Enterprise Content Management system: the active BLOB content is only a small proportion of the terabytes of content it holds.


Figure 1: A Typical Taxonomy of BLOB Data in a SharePoint Environment

Given the cost of keeping aging or stale BLOBs on primary, high-end storage, where they occupy valuable space, it makes little sense to store all BLOBs the same way. Simply offloading the SharePoint BLOB payload from expensive, high-end transactional SQL Server storage to less expensive external storage may yield visible cost savings. The primary contributor to this saving is that aging and inactive BLOB data, which is part of the SharePoint content base but is hardly ever accessed, no longer occupies large tracts of expensive storage space. However, externalization is only one important aspect of BLOB storage management. How you manage the physical storage of the externalized BLOBs is the other.

If you keep all BLOBs, whether active, aging or stale, in a single primary tier of external storage, there is a fair chance you will get only a marginal benefit from externalization, because the cost of that relatively high-end tier keeps increasing. So if you have offloaded 90-95% of your data to a supposedly less expensive tier of storage and still see only marginal cost savings, you need to seriously rethink your storage strategy.
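To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The per-GB prices and tier names are made-up placeholders, not figures from any vendor, and the activity split simply picks 3% active, 10% aging and 87% rarely used to match the estimate quoted earlier.

```python
# Hypothetical monthly prices per GB; replace these with your own figures.
PRICE_PER_GB = {"tier1_san": 0.50, "tier2_nas": 0.15, "tier3_cloud": 0.03}

# Assumed share of externalized content by activity
# (about 3% active, 10% aging, the rest rarely used).
SHARE = {"active": 0.03, "aging": 0.10, "stale": 0.87}

# Which tier each activity class is placed on in the tiered layout.
PLACEMENT = {"active": "tier1_san", "aging": "tier2_nas", "stale": "tier3_cloud"}


def single_tier_cost(total_gb: float) -> float:
    """Monthly cost when every BLOB stays on the primary external tier."""
    return total_gb * PRICE_PER_GB["tier1_san"]


def multi_tier_cost(total_gb: float) -> float:
    """Monthly cost when each activity class sits on its matching tier."""
    return sum(
        total_gb * SHARE[activity] * PRICE_PER_GB[tier]
        for activity, tier in PLACEMENT.items()
    )
```

With these placeholder prices and 10 TB (10,000 GB) of content, `single_tier_cost(10_000)` comes to 5000 per month while `multi_tier_cost(10_000)` comes to about 561. The exact ratio depends entirely on your own prices and data mix.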

How Best to Manage your Storage Infrastructure – A Multi-tiered Hierarchical Storage System

An effective approach for the storage of externalized BLOBs is to structure the external storage as a hierarchy of multiple tiers in the form of a hierarchical storage management system (HSM), with one storage tier corresponding to one age-based category of BLOBs.


Figure 2: A Multi-tiered Hierarchical Storage System

Figure 2 above shows a multi-tiered hierarchical storage system. The hierarchy is based on the principle of cost versus data activeness: the more active the data, the more you are willing to spend on its storage and access performance. The most active BLOBs reside at Tier-1, which is high-end, fast-access storage and may be a file system or a SAN. Aging data is kept at Tier-2, which is typically NAS-based storage, and archived or seldom-accessed data at Tier-3, which can be cloud storage. With this strategy, you effectively push less active content to the cheaper tiers.
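The placement rule described above can be sketched in a few lines. This is only an illustration: the 30-day and 180-day cut-offs and the function name `choose_tier` are assumptions of mine, since the right boundaries depend on your own access patterns.

```python
from datetime import datetime, timedelta

# Hypothetical cut-offs separating active, aging and archived content.
ACTIVE_MAX_AGE = timedelta(days=30)
AGING_MAX_AGE = timedelta(days=180)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Map a BLOB to a tier based on how long ago it was last accessed."""
    age = now - last_accessed
    if age <= ACTIVE_MAX_AGE:
        return "Tier-1 (file system or SAN)"
    if age <= AGING_MAX_AGE:
        return "Tier-2 (NAS)"
    return "Tier-3 (cloud)"
```

For example, a BLOB last touched five days ago maps to Tier-1, while one untouched for a year maps to Tier-3.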

StorageEdge – An Effective Multi-tiered HSM Solution

StorageEdge has been built with this BLOB storage concept in mind. It provides multi-tiered storage that lets you keep your active content on the most expensive storage and archive older content to less expensive storage. Its intelligent archiving facility ensures that the primary storage is not overburdened with millions of documents. It also provides a fine-grained approach to BLOB archiving across multiple storage tiers, which lets you manage the movement and placement of BLOBs at an atomic level.


Figure 3: Configure a Storage Tier in StorageEdge

StorageEdge archives these SharePoint documents based on two criteria:

  1. Age: You can specify how long content stays in a tier before it is migrated to the next tier. This age can be specified separately for each storage tier, and you can have as many tiers as you want.
  2. Versioning: The other criterion is based on document versioning. Whenever you update a document, SharePoint does not overwrite the original but instead creates a new version and saves that. It preserves the older versions, so when you update a document multiple times, you end up with many versions of it. Usually everybody works on the latest copy of a document, so it is practical to archive the older versions to a cheaper storage tier and keep the latest version on your expensive storage tier. The subsequent tiers can simply be cheaper storage, either locally on your LAN or in cloud storage. For cloud storage, you may want to control your bandwidth, and for that StorageEdge provides a bandwidth throttling feature that handles this automatically. I'll discuss throttling in more detail later in a separate blog.
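The two criteria above can be pictured as a small decision function. This is a sketch of the idea only, not StorageEdge's actual implementation or API: the `Tier` and `BlobVersion` types, their fields, and the rule that older versions leave the first tier immediately are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tier:
    name: str
    max_age_days: Optional[int]  # None: content may stay here indefinitely


@dataclass
class BlobVersion:
    doc_id: str
    version: int
    is_latest: bool
    tier_index: int    # position of the current tier in the tier list
    days_in_tier: int  # how long this version has sat in that tier


def next_tier_index(blob: BlobVersion, tiers: List[Tier]) -> int:
    """Return the tier a version should live in after one archiving pass."""
    last = len(tiers) - 1
    if blob.tier_index >= last:
        return last  # already on the cheapest tier
    # Versioning criterion (assumed): older versions leave the first tier.
    if not blob.is_latest and blob.tier_index == 0:
        return 1
    # Age criterion: move on once the per-tier age limit is exceeded.
    limit = tiers[blob.tier_index].max_age_days
    if limit is not None and blob.days_in_tier > limit:
        return blob.tier_index + 1
    return blob.tier_index
```

With tiers of 30 days, 180 days and no limit, the latest version of a document stays in the first tier until it has been there more than 30 days, whereas a superseded version moves to the second tier on the next pass.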

StorageEdge also allows you to edit these criteria right from within SharePoint Central Administration (CA) to control the movement of BLOBs across tiers.
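As for the bandwidth throttling mentioned for cloud tiers, the general idea can be sketched with a simple pacing throttle. This is a generic technique, not a description of how StorageEdge does it internally; the class name `BandwidthThrottle` and its parameters are hypothetical.

```python
import time


class BandwidthThrottle:
    """Paces uploads so the average rate stays under max_bytes_per_sec."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        if max_bytes_per_sec <= 0:
            raise ValueError("max_bytes_per_sec must be positive")
        self.rate = max_bytes_per_sec
        self._clock = clock  # injectable so the class can be tested
        self._sleep = sleep
        self._next_allowed = clock()

    def wait_for(self, nbytes):
        """Block until nbytes may be sent; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the time slot this chunk occupies on the link.
        start = max(now, self._next_allowed)
        self._next_allowed = start + nbytes / self.rate
        return delay
```

An uploader would call `throttle.wait_for(len(chunk))` before sending each chunk to the cloud tier, so that archiving traffic does not saturate the link.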


Figure 4: StorageEdge Tiers in a Storage Profile

With multi-tiered storage, you can keep your storage costs aligned with business needs, because you can grow your storage options incrementally. You can expect considerably larger cost savings if your storage system continues to move offloaded BLOBs to ever less expensive tiers as they age. As a side benefit, it also improves SharePoint performance, because the most active storage (your Tier-1) no longer holds the huge number of documents that would otherwise have overwhelmed it.

How Does StorageEdge Help?

StorageEdge combines all the things discussed above and automatically improves SharePoint performance and scalability for you. Here is how it works:

  • Install StorageEdge: You install StorageEdge on all the WFE (web front-end) servers. StorageEdge installs itself as web modules in the SharePoint application and also configures an External BLOB Storage (EBS) adaptor. It also bundles an extremely fast in-memory cache (NCache) on the WFE servers.
  • Configure BLOB Externalization: Through a web interface, you configure StorageEdge to externalize all your BLOBs. You can specify an external storage location, which can be file system, SAN, or NAS storage. Cloud Storage is coming soon.
  • Configure BLOB Caching: You also configure StorageEdge to cache BLOBs, and you can specify how long they stay in the cache before they expire (are removed).
  • Configure List Caching: You also configure StorageEdge to cache the lists that you feel are frequently used, and how long they stay in the cache before they expire (are removed).
  • Monitor SharePoint Performance Improvements: You can start using StorageEdge immediately and monitor its progress through a rich set of PerfMon counters. You can also observe a noticeable improvement in SharePoint performance, with faster response times, through your regular ASP.NET PerfMon counters. And you'll see the same good response times even when you increase the user/transaction load on SharePoint.
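The expiration setting in the BLOB and list caching steps above amounts to a time-to-live on each cached item. Here is a minimal sketch of that idea, assuming a plain in-process dictionary; it is not the NCache API, and the `ExpiringCache` name and its methods are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Keeps items for ttl_seconds, then treats them as removed."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        """Store a value together with its expiry time."""
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it so it gets re-fetched
            return None
        return value
```

A caller that gets `None` back would fetch the BLOB or list from storage and `put` it into the cache again, which restarts its expiration clock.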


What to Do Next?