No organization today is exempt from the need to retrieve data, whether for disaster recovery, legal matters, financial audits, or other business-specific use cases. Organizations have traditionally protected on-premises data by rotating backups through a schedule of retention policies. Older backup data is then moved to longer-term storage so that historical data is on hand if it is ever needed. As businesses shift more and more data to cloud environments, they must consider how specific data can be recalled, not only for disaster recovery but for other business needs as well.
Concerningly, data protection in the cloud is often misunderstood and, in some cases, not implemented at all. Two types of data are generally stored for data recovery and for access to historical records: data backups and data archives. In this post, we will take a closer look at the differences between the two. Do these two types of data storage compete with one another, so that organizations choose only one? How are they generally handled in on-premises environments? How do backups and archives differ in the public cloud? What mechanisms are used on-premises for each type of data? How can organizations solve the complexities of retrieving data in the cloud?
What is Data Backup?
The term backup may mean different things to different organizations. Some organizations may be happy to have a semi-recent copy of their data that can be retrieved for data recovery or review. However, for many of today’s organizations running 24×7 business-critical web infrastructures, backups may need to contain very recent copies of data from only a few minutes prior. Backups of systems typically contain the “hot” data, meaning the data that changes frequently. This is where your business-critical data generally resides, as it is the data currently being manipulated, accessed, or worked with in some form or fashion.
The reason a backup is termed as such is that backups contain the data that is typically restored when data is accidentally or intentionally deleted, corrupted, or otherwise made unreadable or inaccessible.
Typical scenarios that may call for backup data include the following:
- An employee accidentally deletes an important spreadsheet.
- The CEO inadvertently saves changes over a file he is working on.
- An employee maliciously and intentionally deletes a large amount of data from storage.
- An attacker breaches the perimeter network and intentionally deletes files.
- Ransomware infiltrates the network and corrupts files stored in shared storage.
All of the above scenarios would leave the data that is affected unreadable or inaccessible. Depending on the criticality of the data, the data loss event could constitute a disaster scenario to the business where a business continuity/disaster recovery or BC/DR plan kicks into action. In these types of events, organizations would be looking to restore the most recent data as quickly as possible to ensure the impact to the business is minimized.
With backups, the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO) come into play. These affect the SLAs that are in place for restoring normal business access to data. The RPO is the amount of data loss, measured in time, that the organization is willing to tolerate. Hourly backup snapshots will generally result in an RPO of 1 hour, since you could potentially lose up to an hour’s worth of data. The RTO defines the target time within which the data set from that recovery point must be restored. In other words, how long will it take to recover the data that has been chosen for restore? All of these factors weigh into backup planning, as they are directly involved in recovering from anything from a small data emergency (maybe a single file) all the way up to a full-fledged disaster (many files or entire sections of data have been deleted or made inaccessible).
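To make the RPO and RTO arithmetic concrete, here is a minimal Python sketch. The function names, the sample backup times, and the 100 MB/s restore throughput are assumptions for illustration only and do not come from any particular backup product.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return the time between the last completed backup and the failure."""
    completed = [t for t in backup_times if t <= failure_time]
    if not completed:
        raise ValueError("no backup completed before the failure")
    return failure_time - max(completed)


def meets_rpo(backup_interval, rpo):
    """A schedule can meet an RPO only if backups run at least that often."""
    return backup_interval <= rpo


def estimated_restore_seconds(data_gb, throughput_mb_per_s):
    """Rough restore time: data size divided by sustained restore throughput."""
    return data_gb * 1024 / throughput_mb_per_s


# Hypothetical hourly backups at 09:00, 10:00, and 11:00, failure at 11:45.
backups = [datetime(2024, 1, 1, hour) for hour in (9, 10, 11)]
lost = data_loss_window(backups, datetime(2024, 1, 1, 11, 45))
print(lost)  # 0:45:00
```

With hourly backups the loss here is 45 minutes, and in the worst case it approaches the full hour. Comparing an estimate such as `estimated_restore_seconds(500, 100)` against the RTO gives a first rough check on whether a restore target is realistic.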
Generally speaking, today’s mission-critical backups reside on disk. Traditional tape storage media has certainly not been eliminated. However, its role has transitioned from the primary target of backup storage to the archive storage media of choice for on-premises networks (more details below in the archive data discussion). The reason for this transition is that businesses today require ultra-fast RTOs for restoring their data. Tape is cost-effective and resilient and makes a good choice for archived data, but it is slow and cumbersome to restore from. Disk-based backups allow data to be accessed and restored very quickly. With disk-based backups, there is no time spent inventorying, cataloging, and changing out tapes to assemble a complete representation of your data. Disk-based recovery is limited only by the underlying disk infrastructure and network connectivity speeds.
As a loose rule of thumb, data in backups is 90 days old or newer. In most environments, the data targeted for recovery is the newest data available, but this is not always the case. Most modern backup solutions that target disk are able to keep multiple versions of the data contained in production systems. Why is this important? As a simple example, if a Microsoft Word document has been updated daily for the past 10 days and erroneous data was entered 3 days ago, the desirable version to recover may not be the newest one available. It may be the version from 3 to 4 days ago, depending on the specific time when the change occurred. Effective versioning of files in backup data allows recovering the correct representation of the data.
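The Word document example can be sketched in a few lines of Python. This is only an illustration: the function name and the daily 18:00 version timestamps are hypothetical, and a real backup product would expose its version history through its own interface.

```python
from datetime import datetime, timedelta


def version_to_restore(version_times, corrupted_at):
    """Return the newest version saved strictly before the bad change, or None."""
    clean = [t for t in version_times if t < corrupted_at]
    return max(clean) if clean else None


# Hypothetical: one version saved at 18:00 each day from Jan 1 to Jan 10.
versions = [datetime(2024, 1, 1, 18) + timedelta(days=d) for d in range(10)]
# The erroneous edit happened on the morning of Jan 8.
chosen = version_to_restore(versions, datetime(2024, 1, 8, 10))
print(chosen)  # 2024-01-07 18:00:00
```

The newest version (Jan 10) already contains the bad data, so the sketch picks the last version saved before the erroneous edit.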
Backups of Data On-Premises
As already touched upon, backups have been a mainstay of on-premises environments for decades. Backup technology and methodologies have certainly changed with the arrival of virtualized environments and disk-based backup technologies. What do backups on-premises typically look like?
In a typical enterprise environment today, there are several key aspects to how backups take place. Following the 3-2-1 backup best practice, most organizations perform virtual machine backups to disk at a regular interval, whether hourly or more often. Most on-premises backup solutions perform incremental backups of the changes that have happened since the last backup. This efficiently copies only the changed data and avoids a bulk copy of all the data from production, which would constitute a full backup. The incremental changes are versioned with each backup iteration, so data versioning provides the benefits mentioned earlier.
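The idea behind an incremental backup can be illustrated with a small Python sketch. Note the assumption: it detects changes by comparing SHA-256 hashes of whole files held in memory, which is a simplified stand-in. Commercial backup products commonly track changes at the block level instead, and the names used here are hypothetical.

```python
import hashlib


def incremental_backup(files, previous_manifest):
    """Return (changed_files, new_manifest) for one backup run.

    files: dict mapping a file name to its current content as bytes.
    previous_manifest: dict mapping a file name to its SHA-256 hex digest
    as recorded by the previous run (empty on the first run).
    """
    manifest = {name: hashlib.sha256(data).hexdigest()
                for name, data in files.items()}
    changed = {name: files[name] for name, digest in manifest.items()
               if previous_manifest.get(name) != digest}
    return changed, manifest


# First run: nothing recorded yet, so every file is copied (a full backup).
files = {"report.docx": b"v1", "budget.xlsx": b"v1"}
full, manifest = incremental_backup(files, {})

# Second run: only the file that changed since the last run is copied.
files["budget.xlsx"] = b"v2"
delta, manifest = incremental_backup(files, manifest)
print(sorted(delta))  # ['budget.xlsx']
```

Storing each run's `changed` set alongside its manifest is what makes versioned restores possible. The sketch does not handle deleted files, which a real solution would also need to record.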
Most organizations, using their data protection solution of choice, keep a number of days or weeks of backup data on disk for quick access and quick restores. As backup data ages, the oldest data is transitioned off disk-based systems to tape or even to low-cost cloud storage for long-term archival (we will look at this process in more detail a bit later).
Backups of Data in the Cloud
Cloud environments present new challenges for organizations deciding to migrate business-critical data and services to public cloud vendor environments. Organizations have struggled to make their legacy backup design paradigm fit the new cloud environments hosting their data. Cloud is so different from on-premises in terms of provisioning, management, and cost processes that it can be challenging for businesses to translate the processes they are accustomed to into methodologies that work in the cloud.
Time and again, businesses make the mistake of assuming that cloud infrastructure makes their data bullet-proof. This is NOT the case. Data in the cloud needs to be protected just as it is on-premises. In fact, data protection may be even more necessary in cloud environments: without a robust cloud security solution, organizations may not fully understand which users have granted which apps access to which data, and from which mobile devices. BYOD and third-party access to public cloud data, in environments such as Google’s G Suite and Microsoft’s Office 365, open up new threat vectors that most likely do not exist in on-premises environments.
Backups in cloud environments have sorely lagged behind their on-premises equivalents for many reasons. In the early days of adoption, customer demand for cloud data protection was not seen as a priority. However, once cloud overcame the first few years of fear about security and privacy, the floodgates opened and businesses began migrating to the cloud in droves. Market-leading data protection vendors with strong on-premises products have struggled to introduce solutions that give customers feature parity with on-premises solutions. Public cloud vendors have been extremely slow to introduce true native data protection features into their service offerings.
The best solution for businesses today who have migrated or are thinking of migrating to public cloud environments such as G Suite and Office 365 is a purpose-built, cloud-aware, third-party data protection solution that allows performing effective backups to protect all business-critical components in the cloud. There are many key facets to choosing a solution to perform backups of data in the cloud. Many of the characteristics and capabilities needed on-premises are also needed in the cloud, including:
- Automated backups
- Incremental copies
- Encrypted backups
- Retention capabilities
- Multiple storage options
Understanding and implementing cloud backup can be challenging without the right tools. However, it is essential that organizations take cloud backup seriously and pick the right, purpose-built, cloud-native solution that allows performing all the necessary tasks in protecting business-critical data. What about data archival? How is it different than data contained in backups?
What is Archive Data?
We have talked about data backup in some depth. The difference between backups and archiving has been alluded to, so let’s take a closer look. Data backup contains data that changes often. As described, this is the “hot” or active data presently being used by business operations and customers. Data archival differs from data backup in many ways, including its intent and purpose and the way in which it is accessed. Most businesses keep archival data for a very different use case than backups. While backups exist almost entirely for data recovery or disaster recovery, archive data is rarely used for data recovery. The intent of data archival is to preserve records and data regarding business operations. It is typically fiscal or customer-related data that is used in the event that legal or regulatory issues arise. A financial audit or other issue may require a business to prove that the data submitted at the time was legitimate, accurate, and not falsified.
Let’s consider a typical workflow of how data transitions through the lifecycle of existence from actual production data to backups, and then to archive data. As data is backed up and stored at regular intervals from the actual production environment, the data then starts the aging process. Typically, data backup, in general terms, contains data that is 90 days old or younger. Any data that passes beyond the 90-day mark is typically transitioned to some type of archive media such as tape. Keep in mind, this is only a general example of what is considered backup vs archive. Depending on the business needs and the actual use case of storing backups and rotating data to archives, these life cycles may be much longer or shorter than the 90-day period mentioned. Whatever length of this rotation from backup to archive, most organizations have some sort of schedule to rotate backups from their “hot” backup storage such as disk to their long-term storage (i.e. tape) after a certain period of time.
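The backup-to-archive lifecycle described above boils down to an age check. The following Python sketch assumes the general 90-day example from this post. The function name and the dates are hypothetical, and the window is a parameter because real lifecycles may be much longer or shorter.

```python
from datetime import date, timedelta


def storage_tier(created, today, backup_window_days=90):
    """Classify a restore point as 'backup' or 'archive' by its age in days."""
    age_days = (today - created).days
    return "backup" if age_days <= backup_window_days else "archive"


today = date(2024, 6, 30)
print(storage_tier(today - timedelta(days=29), today))  # backup
print(storage_tier(today - timedelta(days=91), today))  # archive
```

A scheduled job could run a check like this against every restore point and move anything classified as `archive` from disk to tape or low-cost cloud storage.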
Additionally, organizations today are intermingling on-premises infrastructure and cloud infrastructure in a hybrid approach. This may involve using cheap cloud storage such as Amazon’s Glacier as the archive location. Data stored in on-premises disk arrays or environments is copied to cloud storage for archiving instead of being rotated to tape. After this general look at the difference between data backup and data archiving, let’s take a closer look at the process on-premises.
Archive Data On-Premises
Following the process mentioned earlier, in today’s on-premises environments running virtualized infrastructure, incremental backups are taken of virtual machines and other infrastructure in the environment. These backups of VMs or file data are generally rotated to tape at some point. Often, organizations will take a weekly tape backup of virtual machines or file servers and keep these weekly tapes for a month.
Many organizations will keep a specific set of weekly tapes and place those in a DR facility or offsite location. That specific week of the month becomes the monthly tape set representing the data archived from disk to tape. Businesses may choose to keep a number of years of these monthly tapes in a vault for safekeeping for archive purposes. In this way, if an audit of data requires pulling SQL Server data from a year ago or more, the data exists on tape in the form of an archive of the virtual machine, database, or file folder, which can be restored for review.
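The weekly and monthly tape rotation can be sketched as a simple retention rule in Python. Several details are assumptions made for illustration: the first weekly tape of each month is treated as the monthly set, weekly tapes are kept for 31 days, and monthly tapes are kept for 3 years. Actual schedules vary by organization.

```python
from datetime import date, timedelta


def tapes_to_keep(weekly_tapes, today, weekly_days=31, monthly_years=3):
    """Return the sorted tape dates that the rotation scheme retains."""
    # Assumption: the first weekly tape written in a month is the monthly set.
    first_in_month = {}
    for tape in sorted(weekly_tapes):
        first_in_month.setdefault((tape.year, tape.month), tape)
    monthly = set(first_in_month.values())

    keep = []
    for tape in sorted(weekly_tapes):
        age_days = (today - tape).days
        if age_days <= weekly_days:
            keep.append(tape)  # recent weekly tape, still in rotation
        elif tape in monthly and age_days <= monthly_years * 365:
            keep.append(tape)  # older tape retained as the monthly archive
    return keep


# Hypothetical: a tape written every Friday from Jan 5 to Mar 29, 2024.
weekly = [date(2024, 1, 5) + timedelta(weeks=n) for n in range(13)]
kept = tapes_to_keep(weekly, today=date(2024, 3, 31))
print([t.isoformat() for t in kept])
```

Run at the end of March, this keeps the five March weekly tapes plus the January and February monthly tapes. The other weekly tapes can be reused.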
Those businesses making use of Amazon or other public cloud storage in a hybrid approach simply retrieve the backup located in cloud storage they are interested in viewing, and restore this backup on-premises for review. Making use of the hybrid approach alleviates the need to swap out tapes and all the physical implications of securing tape archives that come along with keeping physical media.
Security is certainly a topic that needs to be considered with both backup and archiving as backup data is simply a copy of real production data. If data is not properly secured using encryption and then also physically secured (tapes, hard drives, etc), a perpetrator could potentially gain physical access to those devices and then have access to the data they contain.
Archive of Cloud Data
Ironically, public cloud vendors have provided better mechanisms for long-term archive storage than for effective backups of the services, data, and other infrastructure their tenants consume. The main reason for this is legal obligations. However, the native archiving features that do exist are fairly limited when measured against what legal and auditing purposes require.
Google’s Vault solution allows organizations to manage, retain, search, and export email, Google Drive file content, and on-the-record chats. The Business and Enterprise G Suite packages allow you to archive corporate data from G Suite products including Gmail, Google Drive, Team Drives, Google Groups, and Hangouts Meet.
A major limitation of Google Vault is that it offers no direct restore functionality for the items it contains. Items found in Google Vault can only be searched, viewed, and exported. They cannot be restored to their original location, whether that is an email inbox, Drive, or Team Drive. Organizations undergoing an audit or litigation may need to go beyond searching, viewing, and exporting and restore data for easy access. This limitation can be a roadblock for organizations that rely on Vault and need to work efficiently with archived data.
Microsoft Office 365 has the Security & Compliance Center, which allows creating eDiscovery cases that offer many of the same features available to customers in Google Vault, such as searching and exporting. Again, there is no mention of restoring data with this tool.
In the cloud, are data backups, retrieval, archiving, and restoration any less important than they are on-premises? When you think about where infrastructure is being provisioned and where businesses today are looking to house resources, all of these operations are going to be critical for both disaster recovery and compliance.
Outside of the native tools that public cloud providers such as Google and Microsoft offer on their platforms, what tools can be used? Customers cannot archive cloud data to the same types of storage, such as tape, or with the same types of processes they use on-premises. What options do customers have? Let’s take a look at the purpose-built tools provided by Spinbackup that allow organizations to effectively perform data protection, disaster recovery, and data archiving in the G Suite and Office 365 public cloud environments.
Spinbackup – Backups, Archives, and Much More
Spinbackup provides the complete approach for protecting, archiving, and securing data. We will discuss the security aspect in just a bit. First, how does Spinbackup provide the answer for the needed tools to accomplish both backup and archiving in the public cloud? The Spinbackup data protection solution features the following:
- Automated backups – Automatically protect the public cloud as if it were an on-premises environment, using automatic backups.
- Versioned backups – Create and maintain multiple versions of business-critical data contained in either G Suite or Office 365. Roll a file back in time with an almost DVR-like process. Simply select the data and version of the file and recover the file to that specific version.
- Security – Spinbackup encrypts your data both in-flight and at-rest using industry standard encryption algorithms.
Spinbackup provides effective versioning of cloud backups
- Efficient Incremental backups – Spinbackup intelligently only copies the data that has changed. This means the backup data footprint is much smaller and the backup cycle can finish much more quickly.
- Retention – Retention is the area that really shines when thinking about both backup and archiving. With Spinbackup, data can be kept indefinitely, meaning that you can potentially never delete the backups. Also, data can be finely tuned to allow rolling off restore points after a specified number of months. This means the backup and archiving functionality is seamlessly rolled into one solution. There is no rolling of restore points to specialized media or other specialized process that has to happen to create effective archives.
Indefinite retention allows seamless flow from backup to archive
- Multiple storage options – There is an aspect of controlling your storage and storage media that does come into play with Spinbackup in a powerful way. Spinbackup’s solution gives you the ability to choose which public cloud you want to utilize for storing your backups. This allows for creating an extremely resilient and purposeful design in where data resides. This helps to effectively keep production data separate from backup data and adds an extra layer of resiliency. Customers have the choice to store data in either Google Cloud Storage or in Amazon S3 storage.
Spinbackup allows choosing the storage location for cloud-to-cloud backups
All of the above features allow Spinbackup to stand out in the field of public cloud data protection. By giving customers an all-in-one solution for both data backup and data archiving, Spinbackup provides a seamless mechanism for accomplishing both in a single management interface. Security was just briefly touched upon earlier. What security features do customers get with Spinbackup?
- Ransomware Protection – Protect against perhaps the most dangerous threat to business-critical data, data-destroying ransomware. Spinbackup protects against and automatically restores data affected by ransomware corruption.
- Data Leak Prevention – Protect data from leaving your organization in an unauthorized way.
- Insider Threat Prevention – Detect threats to security and data that may come from users on the inside.
- High Risk Third-party Apps Control – Prevent malicious third-party apps or those accessing sensitive data from continuing bad behavior.
- Alerting and Monitoring – Spinbackup provides a proactive approach including monitoring and alerting when it intelligently notices anomalies in the environment.
- Machine-Learning enabled intelligence – Powerful machine-learning is at the heart of Spinbackup’s solution that allows it to intelligently learn and make decisions based on user behavior and activity in the environment.
Data backup and archiving are both essential components of the overall data storage methodology. Organizations have used both approaches over the years in on-premises environments, and now, as they migrate to the cloud to run business-critical applications and store data, both are needed there as well. The mechanisms and tools used in on-premises environments are much different from what is available in the public cloud. While data contained in backups is considered “hot” data, or data that changes frequently, data stored long-term can be equally important when data must be discovered for legal or audit compliance reasons.
While businesses today are challenged to find natively provided tooling for backup and archiving from public cloud vendors, Spinbackup provides an all-in-one solution that gives businesses the tools they need to meet both their backup and archiving needs seamlessly. In addition to the data protection capabilities of the Spinbackup solution, organizations also get a machine-learning powered cybersecurity solution. This provides world-class capabilities for securing the G Suite and Office 365 public clouds.
Backup, protect, archive, discover, secure, and defend your data, all with one solution – Spinbackup!