An ever-increasing volume of documents in SharePoint repositories poses a growing challenge: the length of time it takes to complete SharePoint backups. Storing BLOBs in the content databases makes the backend heavy, and backups run longer and longer as a result. The more serious implications include performance loss, inefficient use of resources, a greater likelihood of data loss because there are fewer recovery points, and failure to meet the backup/recovery windows set by your SLA commitments.
Users often delete documents from a document library by accident, and in some cases those documents have been removed from the SharePoint Recycle Bin as well. SharePoint comes with a set of native tools for backup and recovery, but experience with these tools shows that they don't help much in such a situation, as none of them provides document-level, granular content recovery. Let us look at some of these tools and their limitations.
BLOBs overwhelm SQL Server: SQL Server is a relational database that was designed to store structured relational data and not large binary data. Therefore, when SharePoint stores its documents (BLOBs) in SQL Server, it overwhelms the database and this slows down SharePoint performance considerably. More than 90% of the data stored in SQL Server for SharePoint is BLOB data.
Due to the inherent limitations of these tools, Microsoft doesn't recommend (http://technet.microsoft.com/en-us/library/cc263427.aspx) using these tools for
In such a situation, administrators have no choice but to use SQL Server database-specific tools and techniques, in tandem with these native tools, for reliable, timely backup/recovery. These, too, have their own limitations.
Firstly, SharePoint administration creates a single data file for its content database by default. BLOBs, which may eventually grow into terabytes, are all stored in that single data file unless it is manually split. Microsoft does not recommend data files larger than 200 GB, since large data files have serious performance issues. Backing up such large data files usually takes longer than the backup window allows and also has serious storage requirements. Keeping file sizes in check further introduces an element of periodic maintenance for administrators.
Secondly, content databases by design cannot span multiple file-groups; hence the benefit of parallel backup of file-groups available in SQL Server cannot be exploited.
Thirdly, the SQL backup should not be used to back up the search database, because the search indexes are not stored in SQL Server and cannot be synchronized with the search database after a database-only backup.
Also, a SQL restore requires that you manually reattach your databases to web applications after a recovery.
And last but not least, granularity is still a question mark, as a SQL restore cannot be used to recover at any level smaller than a database.
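To put rough numbers on the first limitation, the following sketch estimates how many data files a content database would need to stay under the 200 GB guideline, and how long a full backup would run. The function names, the 2 TB database size, the sustained throughput of 100 MB/s and the 4-hour window are illustrative assumptions, not figures from SharePoint or SQL Server. Only the 200 GB limit and the roughly 90% BLOB share come from the text above.

```python
import math

MAX_DATA_FILE_GB = 200.0  # per-file guideline cited in the text


def data_files_needed(content_gb: float, max_file_gb: float = MAX_DATA_FILE_GB) -> int:
    """Smallest number of data files so that no file exceeds max_file_gb."""
    return max(1, math.ceil(content_gb / max_file_gb))


def backup_hours(content_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream content_gb at a sustained throughput (1 GB = 1024 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return content_gb * 1024 / throughput_mb_per_s / 3600


def fits_window(content_gb: float, throughput_mb_per_s: float, window_hours: float) -> bool:
    """True if a full backup is expected to finish inside the backup window."""
    return backup_hours(content_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    total_gb = 2048.0               # assumed: 2 TB content database
    throughput = 100.0              # assumed: sustained MB/s to the backup target
    window = 4.0                    # assumed: backup window in hours
    remaining_gb = total_gb * 0.10  # assumed: size left once ~90% (the BLOBs) is externalized

    for label, size in (("with BLOBs", total_gb), ("BLOBs externalized", remaining_gb)):
        print(
            label,
            data_files_needed(size),
            round(backup_hours(size, throughput), 2),
            fits_window(size, throughput, window),
        )
```

Under these assumed figures the full database needs 11 data files and misses the 4-hour window, while the same database without its BLOBs needs 2 files and fits comfortably. Real throughput depends on the storage and network involved, so treat the result as an order-of-magnitude check only.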
Hence, the approaches discussed above may assist SharePoint administrators with backup and elementary recovery scenarios at best, and even that comes at a cost. So the question arises: can you survive in a world with long-running backup jobs? How long can you wait until your administrator finds and restores a lost document: minutes, hours, or days? Can you afford the unavailability of critical data for prolonged periods?
Now, when it comes to solutions that address the above-mentioned situations and the limitations of the backup/recovery options, a deeper analysis reveals that BLOBs are the culprits that aggravate the limitations of these tools. BLOBs account for most of the growth in the size of the content database. So externalizing BLOBs out of the database to external storage media can help you manage your backup/recovery requirements better.
Externalizing BLOBs means that BLOB backup and recovery is managed outside the database. The immediate benefits are no impact on the performance of the SharePoint infrastructure during backup/recovery, use of the backup/recovery features of the external storage media, no loss of data, small and manageable databases and, above all, timely completion of backup/recovery and granular restore.
So, a tool such as StorageEdge can provide you with an out-of-the-box solution to overcome SharePoint backup/recovery challenges by providing enterprise-grade blob externalization providers. As a first step, it takes charge of your BLOBs by moving them out of the content database.
Figure 1: Configuring Storage Tiers in StorageEdge for BLOB storage
StorageEdge takes on the responsibility of managing these BLOBs, including their backup/recovery. Some of the immediate benefits thus obtained are:
While StorageEdge provides a much better approach to SharePoint backup and recovery, separating the content database from the BLOBs means you have distinct elements that need to be backed up independently. These include the content database, which resides in SQL Server, and the BLOB index that StorageEdge maintains for the externalized BLOBs, which exists either separately or within the content database itself. Similarly, the externalized BLOBs may be sitting on file systems, NAS, SAN or cloud storage. So you need to be mindful of the order in which you plan your backup and recovery.
SharePoint performs two distinct operations on BLOB items: Add and Delete. An Update is handled by SharePoint as an Add operation. Consider a situation in which the externalized content is backed up first and the content database afterwards, and new content is added in between. StorageEdge will update the content database and the BLOB index with the new entries, but the BLOB backup will not contain the new content. If this backup is restored, an inconsistency will occur, since the content database and BLOB index will contain entries for content that is not present on the external storage. This gives rise to a dangling reference and generates a "file not found" type of error when the item is clicked from within SharePoint.
Similarly, if an item is deleted from SharePoint while the BLOBs are being backed up first, the item might already have been copied into the backup. On delete, the entry is removed from the content database, from the StorageEdge BLOB index and from the physical storage. However, if this backup is restored at some point, the deleted BLOB will be restored without a corresponding entry in the StorageEdge BLOB index or the content database, so it will not show up in SharePoint. Instead, it will reside as an orphan BLOB on the external storage. Repeated backups and restores of this kind may produce a large number of orphan BLOBs that occupy storage unnecessarily.
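The two failure modes above reduce to a set comparison between what the BLOB index references and what the external store actually holds. The sketch below illustrates that comparison only. It does not use any StorageEdge API, and the names `ConsistencyReport` and `compare_index_to_store`, along with the plain string IDs, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ConsistencyReport:
    dangling_references: List[str]  # referenced by the index, missing from the store
    orphan_blobs: List[str]         # present in the store, missing from the index

    @property
    def is_consistent(self) -> bool:
        return not self.dangling_references and not self.orphan_blobs


def compare_index_to_store(
    indexed_ids: Iterable[str], stored_ids: Iterable[str]
) -> ConsistencyReport:
    """Compare BLOB IDs known to the index with BLOB IDs found on external storage."""
    indexed, stored = set(indexed_ids), set(stored_ids)
    return ConsistencyReport(
        dangling_references=sorted(indexed - stored),
        orphan_blobs=sorted(stored - indexed),
    )


if __name__ == "__main__":
    # Hypothetical state after restoring a BLOB backup taken before the database backup:
    # "doc-3" was added after the BLOB backup, "doc-9" was deleted after it.
    index_after_restore = ["doc-1", "doc-2", "doc-3"]
    store_after_restore = ["doc-1", "doc-2", "doc-9"]
    report = compare_index_to_store(index_after_restore, store_after_restore)
    print(report.dangling_references)  # ['doc-3']
    print(report.orphan_blobs)         # ['doc-9']
```

In this example, "doc-3" would produce the "file not found" error in SharePoint, while "doc-9" would sit unused on the external storage.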
So, whenever you back up content externalized using StorageEdge, it is important that you back up the content database and the BLOB index first, before the BLOB store. Similarly, the restore must take place in the reverse order. Following this order is essential to avoid data inconsistencies.
Figure 2: Proper Order of Backup with StorageEdge Externalized Content
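The ordering described above can be enforced in a small orchestration script, so that nobody has to remember it at 2 a.m. The following is a minimal sketch under stated assumptions: the step names and the `run_backup`/`run_restore` helpers are hypothetical, and the callables stand in for whatever SQL Server, StorageEdge or storage-level commands you actually use. The text does not specify the relative order of the content database and the BLOB index, so the order between those two is also an assumption here.

```python
from typing import Callable, Dict, List, Sequence

# Databases first, BLOB store last; restore runs in the reverse order.
BACKUP_ORDER = ("content_database", "blob_index", "blob_store")
RESTORE_ORDER = tuple(reversed(BACKUP_ORDER))

Steps = Dict[str, Callable[[], None]]


def _run_in_order(order: Sequence[str], steps: Steps) -> List[str]:
    """Run one callable per element, strictly in the given order."""
    missing = [name for name in order if name not in steps]
    if missing:
        raise KeyError("no step registered for: " + ", ".join(missing))
    completed = []
    for name in order:
        steps[name]()  # an exception here stops the run before any later step
        completed.append(name)
    return completed


def run_backup(steps: Steps) -> List[str]:
    return _run_in_order(BACKUP_ORDER, steps)


def run_restore(steps: Steps) -> List[str]:
    return _run_in_order(RESTORE_ORDER, steps)


if __name__ == "__main__":
    # Placeholder steps that only print; replace them with real backup commands.
    demo = {name: (lambda n=name: print("processing", n)) for name in BACKUP_ORDER}
    run_backup(demo)
    run_restore(demo)
```

Because a failing step raises before the next one starts, a failed database backup never leaves you with a newer BLOB-store backup paired with an older database backup.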
The garbage cleaner of StorageEdge helps remove dangling references and orphan BLOBs, but you still need to be mindful of the fact that when you externalize your BLOBs, the order in which you back up and restore the various SharePoint elements is crucial.
In summary, externalizing BLOBs with StorageEdge can be an important step in managing SharePoint backup/recovery needs. It not only has the potential to help you meet backup/recovery window requirements by ensuring timely backups, but also helps the SharePoint infrastructure deliver the performance that end users expect. BLOB management becomes much easier, and granular data recovery helps SharePoint administrators ensure that critical data is available to the organization at all times.