Zerto is a software-only disaster recovery (DR) solution that provides VM-based backup and traditional site-to-site DR. It’s deployed on virtual machines for on-premises, cloud, and hybrid ransomware and disaster recovery plans.
For some enterprise businesses, Zerto is a solid DR solution. It provides near-synchronous replication and continuous data protection (CDP) by storing time-stamped copies of files in change journals on each machine, so users can recover data to a specific point in time after an outage, ransomware attack, or natural disaster. Zerto users can achieve RPOs of under 10 seconds and RTOs of a few hours. Zerto also offers a fully managed service: Disaster Recovery as a Service (DRaaS).
But Zerto has a few limitations to keep in mind when you’re looking for a disaster recovery solution:
- Site-to-site (point-to-point) architecture: Zerto replicates data between only two sites at a time, which could impact your RPOs and RTOs if you are replicating to more than two sites. And while it can perform one-to-many replication (e.g., from site 1 to sites 2 and 3), it’s limited to one source and, at most, three targets. It can’t replicate data from multiple endpoints to multiple backup sites.
- Scalability challenges: Zerto works well in smaller environments. But as environments grow, Zerto experiences more frequent replication failures that can require help from Zerto support to identify and fix.
- Lack of reliability: Zerto uses point-to-point replication, which creates single points of failure. And users have recently reported outages on some of Zerto’s features, such as their analytics platform.
- Storage costs: Zerto’s journal-based recovery requires a lot of storage space, which will increase the costs of your DR strategy.
In this article, we’ll dive deep into what works well with Zerto’s disaster recovery, what presents challenges for some businesses, and how it compares to our own file replication and sync solution, Resilio Connect.
Want to see how Resilio can enhance your DR and business continuity strategy? Schedule a demo.
Resilio Connect offers an alternative approach. Resilio’s DR solution provides real-time multi-site replication to achieve low RPOs and RTOs across two or more sites.
The solution enables “write anywhere” DR where files can be updated across any site concurrently. Through a proprietary P2P (peer-to-peer) replication architecture, Resilio enables all sites to be active.
Resilio’s DR solution also overcomes limitations with point-to-point replication architectures and TCP to enable an exceptionally fast DR solution that scales well across any number of physical or VM-based servers — across multiple sites. Thus, Resilio can be considered for both warm and hot-site disaster recovery.
Resilio is:
- Flexible: You can deploy Resilio with any type of storage (file, block, or object) or any type of server (physical, virtual, or containerized). You can use any cloud storage provider and deploy the solution on-premises and in hybrid cloud environments.
- Fast: Resilio’s P2P replication architecture scales performance by simply adding Resilio agents and increasing bandwidth. Resilio Connect is able to synchronize hundreds of endpoints 3–10x faster than traditional solutions like Zerto.
- Reliable and highly available: By default, there is no single point of failure using Resilio. All sites are protected by active-active high availability. Clients can be load-balanced across sites for fast client redirection. Unlike traditional active/passive solutions, Resilio Connect dynamically routes around outages with connection-loss recovery and checksums for data integrity.
- Scalable: Resilio scales organically to support environments of any size. And it can replicate files of any size, type, and number (450+ million in a single job).
- Multidirectional: Traditional replication solutions like Zerto can only sync one way, from a source to a target. Resilio can sync in any direction: one- or two-way, one-to-many, many-to-one, and many-to-many (full mesh).
- Versatile: You can use Resilio in a variety of different DR scenarios. You can run Resilio Connect in VMs or directly on physical systems, such as NAS devices. You can use it for real-time off-site backup, cloud DR, or branch office data protection. Many customers use Resilio for large-scale server synchronization and for cloud sync. You can also replicate all of your VMs using Resilio Connect.
- Efficient: Resilio is designed to be efficient and gives you granular control over how replication occurs in your environment. Resilio’s proprietary WAN acceleration technology can optimize transfers over any network (including high-latency, low-quality networks), and enables you to utilize any type of connection (VSATs, cell, Wi-Fi, IP connections).
Organizations in tech (Match, Microsoft), media (Turner Sports, CBS), retail (Mercedes-Benz, McDonald’s), logistics (Deutsche Aircraft, Airtec Global), gaming (2K Games, Bungie), and more use Resilio Connect to quickly and reliably sync their replication environments and protect data as part of a disaster recovery strategy. To learn more about how Resilio Connect can enhance your disaster recovery strategy, schedule a demo.
Deployment, Management, and Automation
A good disaster recovery solution should be flexible enough to support your current IT infrastructure (i.e., work with the devices, operating systems, and cloud storage platforms you use), as well as easy to manage and automate.
Zerto and Resilio Connect are similar in that they both:
- Are flexible software-only solutions that work by installing agents on the machines you want to replicate to and from.
- Can be deployed in on-prem, cloud, and hybrid/multi-cloud environments.
- Enable you to manage your entire replication environment from a centralized location that can be accessed via a web browser or console.
- Allow you to automate how replication and recovery of your applications occurs.
- Provide insight into replication and recovery through analytics.
However, there are some key differences in how the solutions are deployed and managed.
Deployment
Ideally, you should be able to deploy your DR solution directly on your existing infrastructure. This enables you to begin replicating and protecting your data as quickly as possible without the need to migrate data and invest in new hardware or storage platforms.
Devices
Whether deployed in the cloud or on-premises, Zerto agents work through virtual machines (VMs). Zerto supports replication between VMware vSphere, VMware Cloud Director, Microsoft Hyper-V, Microsoft Azure, Amazon Web Services (AWS), and IBM Cloud.
In addition to VMs (running on hypervisors such as VMware and Citrix), Resilio Connect agents can be installed directly on any device or cloud server, including servers, desktops, laptops, mobile devices (Resilio offers iOS and Android apps), and NAS/DAS/SAN devices.
Operating Systems
Zerto supports Windows, Linux (including Ubuntu), and macOS operating systems.
Resilio Connect works with all popular operating systems, such as Linux (including Ubuntu), Windows, macOS, FreeBSD, OpenBSD, and other Unix variants.
Cloud Storage Providers
Both Resilio Connect and Zerto support just about any cloud storage platform you want to use. Zerto supports 350+ cloud service providers.
Resilio Connect works with any S3-compatible cloud storage provider, such as AWS, Google Cloud Platform, Microsoft Azure, Wasabi, MinIO, Oracle, and more.
Management and Automation
An effective DR system should provide centralized management (so you can easily control replication and recovery processes across your entire environment from one location), automation (so these processes can occur with minimal human intervention), and analytics (so you can monitor and optimize performance, identify and troubleshoot issues, and ensure SLAs are being met). This is particularly true of hardware and cloud-agnostic platforms like Resilio Connect and Zerto, as they allow you to control all of your storage environments from one single interface, rather than investing in multiple solutions.
With Zerto, you can manage your replication environment through the Zerto User Interface in a web browser, the VMware vSphere Web Client, or a client console. Through the interface, you can:
- Program VM recovery settings
- Run live failover tests and DR tests
- Create pre- and post-recovery scripts to automate recovery
- Create recovery checkpoints
- Manage users and access
- Create Virtual Protection Groups (i.e., groups of machines that replicate at the same time and fail over to the same backup site)
- Plan workloads with monitoring and visibility
- Monitor recovery SLAs
- Forecast future infrastructure needs for another site or the cloud
Resilio Connect offers a similar portal through which you can manage and control your replication and recovery environment. But Resilio Connect was designed to give users granular control over their environment so they can configure replication to occur exactly as they want it to, maximize resource utilization, and minimize costs.
You can use Resilio’s Management Console to:
- Visualize, monitor, track, and configure all file replication globally for your disaster recovery services.
- Obtain visibility into your environment through an easy-to-use interface.
- Receive real-time performance metrics and notifications.
- Adjust replication parameters such as disk I/O threads, buffer size, packet size, file priorities, and more.
- Create bandwidth utilization profiles for each server that govern how much bandwidth the server can use at certain times of the day and on certain days of the week.
- Use Resilio’s powerful REST API to automate tasks through scripts and APIs.
All of your Resilio replication and sync jobs are fully automated for as many global locations and replication sites as needed — it’s a set-it-and-forget-it solution that works in the background.
Case Study: Deutsche Aircraft
Deutsche Aircraft manufactures commercial aircraft. DA switched from DFSR to Resilio Connect in order to sync millions of files across their Microsoft DFS namespace and protect business-critical data — which significantly reduced management time and increased efficiency.
“We have a 10Gbps network but prefer to use under 1Gbps for data transfer and replication. With Resilio, we’re able to keep that down to 250Mbps during the day and at night move back up to 1Gbps… Resilio Connect is much easier to manage than DFSR. Using the Resilio Connect management console, you can see everything you need to know. Everything is visible.”
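The day/night throttling described in the quote above could be modeled with a small time-based schedule. This is only a minimal sketch and not Resilio Connect’s configuration format: `PROFILE`, `DEFAULT_LIMIT_MBPS`, and `bandwidth_limit_mbps` are hypothetical names, and the 08:00 to 20:00 daytime window is an assumption (the quote only supplies the 250 Mbps and 1 Gbps figures).

```python
from datetime import time

# Hypothetical profile entries: (window start, window end, cap in Mbps).
# The 250 Mbps daytime cap and the 1000 Mbps night-time cap come from the
# quote; the 08:00-20:00 window itself is an assumed example.
PROFILE = [
    (time(8, 0), time(20, 0), 250),
]
DEFAULT_LIMIT_MBPS = 1000


def bandwidth_limit_mbps(now: time) -> int:
    """Return the bandwidth cap that applies at the given time of day."""
    for start, end, limit in PROFILE:
        if start <= now < end:
            return limit
    return DEFAULT_LIMIT_MBPS


midday_cap = bandwidth_limit_mbps(time(12, 0))
night_cap = bandwidth_limit_mbps(time(23, 0))
```

A real profile would likely also vary by day of the week, which could be added by keying the schedule on the weekday as well as the time.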
Replication Methodology
When choosing a DR solution, you must take the following aspects of your DR plan into consideration:
- RPO (Recovery Point Objective): What’s the maximum amount of data loss your organization/application can tolerate?
- RTO (Recovery Time Objective): What’s the maximum amount of downtime your organization/application can tolerate? How quickly do you need your system to be back online?
- DR/Failover Configuration: How do you want to configure the recovery process on your machines?
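To make the first two terms concrete, here is a small sketch that measures the RPO and RTO actually achieved in an incident and compares them against targets. The function names, timestamps, and target values are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated: datetime, outage: datetime) -> timedelta:
    """Data-loss window: changes made after the last replicated point are lost."""
    return outage - last_replicated


def achieved_rto(outage: datetime, service_restored: datetime) -> timedelta:
    """Downtime: how long the service was unavailable."""
    return service_restored - outage


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """True when the measured value is within the stated objective."""
    return achieved <= objective


# Illustrative incident: the last change replicated 8 seconds before the
# outage, and service was restored 20 minutes after it.
outage = datetime(2024, 1, 1, 12, 0, 0)
rpo = achieved_rpo(datetime(2024, 1, 1, 11, 59, 52), outage)
rto = achieved_rto(outage, datetime(2024, 1, 1, 12, 20, 0))

rpo_ok = meets_objective(rpo, timedelta(seconds=10))  # assumed 10-second target
rto_ok = meets_objective(rto, timedelta(minutes=15))  # assumed 15-minute target
```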
The replication methodology of your DR solution has a large impact on how well it can meet your requirements.
Replication Frequency
Many organizations and applications can’t tolerate much data loss when an outage or disaster occurs, and require a very low RPO. The more frequently a solution replicates data to a backup site, the lower your RPO will be.
Zerto provides near-synchronous, always-on replication. When changes are made to files, it replicates just the changed portions of files (almost) immediately. And through the Zerto User Interface, you can configure checkpoint frequency for each journal (i.e., how often the journal creates checkpoints that you can restore to when a disaster strikes). Because of Zerto’s continuous replication, it can achieve RPOs of under 10 seconds and RTOs of a few hours.
But while Zerto provides continuous replication, it performs near-synchronous replication — which creates a slight delay between when a change occurs and when it’s replicated. In other words, Zerto’s replication doesn’t actually occur in true real-time.
Resilio Connect, on the other hand, performs true real-time replication. It uses optimized checksum calculations (i.e., identification markers assigned to each file that change whenever the file changes) and notification events from the host OS to immediately detect and replicate file changes (you can also sync on a fixed schedule or perform manual syncs). Because of this, Resilio can achieve RPOs of under 5 seconds and RTOs within minutes of an outage, a difference that may seem trivial but is critical in high-activity production environments.
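The idea of detecting changes through checksums can be illustrated with a generic sketch. It is not Resilio’s implementation: it uses SHA-256 from Python’s standard library and scans a folder on demand, whereas a real-time system would be triggered by OS notification events (not shown here). The names `file_checksum` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(root: Path, known: dict[str, str]) -> list[str]:
    """Return relative paths whose checksum differs from `known`.

    `known` maps relative path to the last seen checksum and is updated
    in place, so a second call with no edits reports nothing.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = str(path.relative_to(root))
        checksum = file_checksum(path)
        if known.get(key) != checksum:
            known[key] = checksum
            changed.append(key)
    return changed
```

Calling `changed_files(Path("/data"), known)` repeatedly with the same `known` dictionary would report only files added or modified since the previous call; the `/data` path is just an example.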
Replication Architecture
The replication architecture of a DR replication solution has a large impact on the speed with which replication occurs as well as the scalability and reliability of the solution.
Zerto, like most traditional replication solutions, uses a point-to-point replication architecture. In a point-to-point architecture, replication can only occur between two machines at a time in one of two topologies:
- Hub and spoke: This consists of a hub device and several remote devices. The remote devices can’t communicate with each other, but the hub device can communicate with any of the remote devices. So all file transfers must first go to the hub device, which then replicates them to the remote devices one by one.
- Follow-the-sun: In this topology, replication occurs from one device to another sequentially — i.e., Device 1 shares files with Device 2; then Device 2 shares files with Device 3; and so forth.
Both topologies are slow, as replication can only occur from one device to one other device at a time. And hub and spoke replication adds an unnecessary step where data must first get sent to the hub server. These speed issues only compound as your replication environment grows and replication needs to occur across more devices.
These topologies are also unreliable, as they create single points of failure. In other words, if replication is interrupted between any devices (e.g., Device 1 to Device 2), then every other device must wait to receive files and full synchronization is delayed. And in the hub and spoke topology, if the hub device goes down, replication fails across your entire site.
Resilio’s P2P (peer-to-peer) replication architecture is fast, organically scalable, and highly reliable. With P2P replication, every device in your environment can communicate with each other and can share files simultaneously.
Resilio uses a process known as file chunking to break files down into multiple pieces that can transfer independently of each other. All endpoints in your system can work together to share file chunks simultaneously — enabling Resilio to synchronize your entire environment 3–10x faster than traditional solutions.
For example, imagine you need to sync a file across five servers. Resilio can break that file down into five chunks and begin sharing it across devices. Device 1 can share a file chunk with Device 2. As soon as it receives the first chunk, Device 2 can immediately begin sharing it with another device. Soon, every device in your system will be sharing file chunks concurrently.
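The five-server example can be turned into a toy simulation that counts transfer "rounds". This is a simplified model and not Resilio’s actual scheduling algorithm: it assumes every device can upload one chunk and download one chunk per round, and the function names are hypothetical.

```python
def hub_and_spoke_rounds(devices: int, chunks: int) -> int:
    """Only the source uploads, one chunk per round, to one device at a time."""
    return chunks * (devices - 1)


def p2p_rounds(devices: int, chunks: int) -> int:
    """Every device may upload one chunk per round to a peer that lacks it."""
    # Device 0 starts with the whole file; everyone else starts empty.
    have = [set(range(chunks))] + [set() for _ in range(devices - 1)]
    rounds = 0
    while any(len(held) < chunks for held in have):
        # Devices can only forward chunks they held when the round began.
        snapshot = [set(held) for held in have]
        receiving = set()  # each peer downloads at most one chunk per round
        for sender in range(devices):
            for receiver in range(devices):
                if receiver == sender or receiver in receiving:
                    continue
                missing = snapshot[sender] - have[receiver]
                if missing:
                    have[receiver].add(min(missing))
                    receiving.add(receiver)
                    break
        rounds += 1
    return rounds


hub_total = hub_and_spoke_rounds(devices=5, chunks=5)
p2p_total = p2p_rounds(devices=5, chunks=5)
```

In this model the hub needs one round per chunk per receiving device, while the peer-to-peer variant finishes in fewer rounds because devices start forwarding chunks as soon as they hold them.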
This enables Resilio Connect to provide replication that is:
Blazing Fast
Because of its P2P architecture, Resilio Connect can perform horizontal scale-out replication. We’ve seen replication speeds of 100+ Gbps per cluster. And our engineers were able to transfer a 1 TB dataset between Azure regions in 90 seconds.
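As a quick sanity check on the second figure, the average throughput implied by moving 1 TB in 90 seconds can be worked out directly. The sketch assumes 1 TB means 10^12 bytes; `throughput_gbps` is a hypothetical helper name.

```python
def throughput_gbps(size_bytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second."""
    return size_bytes * 8 / seconds / 1e9


# 1 TB (assumed to be 10**12 bytes) in 90 seconds is roughly 89 Gbps on average.
rate = throughput_gbps(1e12, 90)
```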
Organically Scalable
A P2P environment scales organically. Since every endpoint can share files with every other endpoint:
- Adding more endpoints only increases replication speed and available resources (e.g., bandwidth and compute power).
- You can non-disruptively sync and protect hundreds of millions of files across your environment and DR sites. This effectively turns every endpoint in your environment into a backup site.
Resilio can also replicate files of any size and number. We’ve successfully synced 450+ million files in a single job.
In contrast, Zerto experiences issues in large-scale environments. Some users have reported that Zerto doesn’t scale well. And Zerto’s journal-based recovery consumes lots of storage space, which increases the costs of your disaster recovery plan.
Highly Reliable
P2P replication is also one of the major keys to Resilio’s bulletproof reliability.
Unlike point-to-point solutions, P2P environments have no single points of failure. If any endpoint in your system goes down, the necessary files or services can be obtained from any other device.
Resilio’s P2P architecture enables it to achieve Active-Active High Availability for disaster recovery. It minimizes data loss via backups and active/passive or active/active failover strategies.
And in the event that a device or network goes down, Resilio Connect can dynamically route around the outage and ensure your data is still replicated to its target endpoint.
Replication Direction
Zerto is primarily a one-way replication solution — it replicates and protects data from a source to a backup site, and performs failover and failbacks. Zerto can also perform one-to-many replication, but it’s limited to one source replicating to (at most) three targets.
Resilio Connect can replicate files in any direction, such as one-way, two-way, one-to-many, many-to-one, and N-way sync.
N-way sync is particularly effective for many disaster recovery scenarios, such as:
- Remote work: Employees in different locations can collaborate on the same files and have all of their file changes synchronized across your entire environment, including your backup site. In the event of a disaster, every endpoint can work together to quickly restore any affected systems/endpoints.
- Websites and applications: Match (which owns dating websites/apps like Hinge and Tinder) uses Resilio Connect to sync data across their servers. Any change made by a user located anywhere in the world can be immediately distributed to every other server globally. In the event of a disaster, any server can be used as a backup site, and every server can work together to restore the affected servers.
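The replication directions listed above differ only in which pairs of endpoints exchange data. A small sketch makes this concrete by listing each topology as (source, target) pairs. The helper names and site names are hypothetical and are not part of any Resilio or Zerto API.

```python
from itertools import permutations


def one_to_many(source: str, targets: list[str]) -> list[tuple[str, str]]:
    """One source replicates to each target."""
    return [(source, target) for target in targets]


def many_to_one(sources: list[str], target: str) -> list[tuple[str, str]]:
    """Several sources consolidate onto a single target."""
    return [(source, target) for source in sources]


def full_mesh(endpoints: list[str]) -> list[tuple[str, str]]:
    """N-way sync: every endpoint replicates to every other endpoint."""
    return list(permutations(endpoints, 2))


sites = ["nyc", "london", "tokyo"]
fan_out = one_to_many("nyc", ["london", "tokyo"])
mesh = full_mesh(sites)  # 3 endpoints give 3 * 2 = 6 directed links
```

A full mesh of N endpoints has N × (N − 1) directed links, which is why every endpoint in an N-way sync can act as a source for any other.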
Case Study: VoiceBase
VoiceBase’s software provides speech-to-text transcriptions for audio and video. They use Resilio Connect for software distribution, enabling them to distribute large speech model files (50+ GB) to over 400 production servers every 2–4 weeks.
“Resilio Connect enables us to reliably distribute our code, specifically new language models in a fraction of the time. These copy jobs now take an hour, down from eight. Best of all, once Resilio Connect was installed, it just works: We never need to manually intervene in any way.”
WAN Acceleration
Disaster recovery in the cloud or for globally distributed physical sites requires transfer over WAN networks. Depending on your locations, backup may even need to happen over poor-quality networks in areas with little or no connectivity. In these scenarios, a good DR solution must have the ability to manage network traffic and work around network quality issues.
Zerto employs some traditional WAN optimization techniques, such as using compression algorithms to reduce the bandwidth required between sites. But if you want more WAN acceleration capabilities, you’ll need to invest in third-party WAN optimization software.
Resilio Connect uses a proprietary WAN acceleration protocol known as Zero Gravity Transport™ (ZGT).
ZGT enhances WAN transfers via:
- A congestion control algorithm that constantly probes the Round Trip Time to calculate the ideal data packet send rate. It uses this information to maintain a uniform packet send rate over time.
- Acknowledging received packets in groups, rather than sending a separate acknowledgement for each packet.
- Retransmitting lost packets in groups once per RTT, rather than after each packet is delivered.
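ZGT itself is proprietary, so the following is only a generic sketch of the three ideas in the list: adjusting the send rate from measured round-trip time, acknowledging packets in groups, and collecting losses for a batched retransmission. All names, the 10% RTT tolerance, and the 5% step are assumptions made for illustration and do not describe ZGT’s real behavior.

```python
def next_send_rate(rate_pps: float, base_rtt: float, current_rtt: float,
                   step: float = 0.05) -> float:
    """Nudge the packet send rate using the measured RTT.

    An RTT well above the baseline suggests queues are filling, so back off;
    an RTT near the baseline suggests spare capacity, so probe upward.
    """
    if current_rtt > base_rtt * 1.10:
        return rate_pps * (1 - step)
    return rate_pps * (1 + step)


def group_acks(received: list[int]) -> list[tuple[int, int]]:
    """Collapse received sequence numbers into inclusive (first, last) ranges."""
    ranges: list[tuple[int, int]] = []
    for seq in sorted(set(received)):
        if ranges and seq == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], seq)
        else:
            ranges.append((seq, seq))
    return ranges


def missing_packets(ranges: list[tuple[int, int]], highest_sent: int) -> list[int]:
    """Sequence numbers to retransmit together in the next round trip."""
    acked = {seq for first, last in ranges for seq in range(first, last + 1)}
    return [seq for seq in range(highest_sent + 1) if seq not in acked]


ack_ranges = group_acks([0, 1, 2, 5, 6, 9])
to_resend = missing_packets(ack_ranges, highest_sent=9)
```

Grouping acknowledgements into ranges keeps the reverse channel quiet on high-latency links, and the gaps between ranges identify which packets to resend as one batch.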
ZGT optimizes how data is transferred and fully utilizes any network connection. Because of this, you can use Resilio Connect with any type of connection, such as VSATs, cell (3G, 4G, 5G), Wi-Fi, and any IP connection.
Resilio can also be deployed at the edge of networks, in remote areas with little to no network coverage (such as in the ocean or in undeveloped countries with poor internet availability).
Organizations in remote locations can use Resilio Connect to sync data from the edge to the cloud, then sync across their entire replication environment (including to a backup site).
No matter your situation, Resilio Connect will fully utilize any connection and ensure your data is reliably delivered to its destination.
Case Study: MixHits Radio
MixHits Radio, a music streaming service for businesses (such as McDonald’s and Dunkin Donuts), uses Resilio Connect to sync their music and metadata in real time over WANs for all their web servers across the US.
“We have gone from spending 15 hours on average per week troubleshooting conflicts in the prior solution to spending no time at all with Resilio. We configure jobs once in the Resilio Connect Management Console and never have to look at it again.”
Disaster Recovery Configurations
Resilio Connect and Zerto are both software-only, agent-based solutions for disaster recovery. But understanding how each solution configures disaster recovery is crucial to determining how it will fit into your business continuity strategy.
Zerto DR Configuration
Zerto is installed only on VMs. For the public cloud, you must install the Zerto Cloud Appliance (ZCA). For on-premises environments, you must use the Zerto Virtual Replication (ZVR).
Both appliances consist of several key components:
- Zerto Virtual Manager (ZVM): The ZVM runs on a dedicated Windows VM and manages everything that’s required for replication between the protected sites and the recovery sites, except the actual replication of data.
- Virtual Replication Appliance (VRA): VRAs are installed on each hypervisor host on the protection and recovery sites. They manage data replication. The VRAs on the recovery sites maintain protected VM disks.
- Virtual Backup Appliance (VBA): The VBA is a Windows service that manages File-Level Recovery within Zerto.
- Zerto User Interface: This is the Zerto management interface, which is accessed via a browser, in VMware vSphere Web Client, or in a Client console.
- Zerto Elastic Journal: All data replicated to the recovery sites is copied into a journal, which saves time-stamped copies of data. In the event of a disaster or attack, you can restore data to any point in time saved in the journal.
Functionally, Zerto operates by replicating data site-to-site. The data on your devices is replicated to a designated backup site. Time-stamped copies of data are preserved in journals on each backup machine. And in the event of a disaster, you can recover data to any saved point in time — which is a capability that Resilio Connect doesn’t offer.
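Point-in-time recovery from a journal can be pictured with a toy data structure. This is a conceptual sketch only and says nothing about how Zerto’s journal is actually implemented; the `Journal` class and its methods are hypothetical.

```python
import bisect


class Journal:
    """Toy journal: keeps time-stamped copies so any saved point can be restored."""

    def __init__(self) -> None:
        self._timestamps: list[float] = []
        self._copies: list[bytes] = []

    def record(self, timestamp: float, data: bytes) -> None:
        # Assumes records arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._copies.append(data)

    def restore(self, point_in_time: float) -> bytes:
        """Return the latest copy recorded at or before `point_in_time`."""
        index = bisect.bisect_right(self._timestamps, point_in_time)
        if index == 0:
            raise LookupError("no checkpoint at or before the requested time")
        return self._copies[index - 1]


journal = Journal()
journal.record(10.0, b"version 1")
journal.record(20.0, b"version 2")
before_attack = journal.restore(15.0)  # latest copy saved at or before t=15
```

Because every recorded copy is retained until it ages out, the structure also hints at why journal-based recovery consumes storage in proportion to the change rate and the retention window.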
You can also create Virtual Protection Groups (VPGs). VPGs are groups of up to 3 machines that back up data at the same time and recover from the same backup site, enabling you to sync certain machines that need to function cohesively.
Resilio Connect DR Configuration
While Zerto offers a few capabilities that Resilio Connect doesn’t, Resilio provides more flexibility and reliability for disaster recovery strategies.
Resilio works by simply deploying Resilio agents on all of your endpoints. And you can configure how Resilio Connect replicates and syncs files for various DR scenarios, such as:
Scenario 1: Consolidating Data to Backup Center
You have multiple endpoints and want to sync data to one or more backup sites. You can program Resilio to sync all endpoints to all backup sites. Or you can configure Resilio to sync specific endpoints to specific backup sites.
In the event of a disaster, you can program endpoints to fail over to specific backup sites.
Scenario 2: Sync Data Across Your Entire Environment
You can sync every endpoint in your environment via Resilio’s N-way sync capabilities. This effectively turns every endpoint in your environment into a backup site.
You can configure Active-Active failover scenarios. And in the event of a disaster, every endpoint can work together to restore your application and achieve RPOs of under 5 seconds and RTOs within minutes of an outage.
Versatility
Resilio Connect is a much more versatile solution than Zerto and can be used for multiple use cases simultaneously, including:
- Disaster recovery: Any enterprise business can use Resilio to back up data to a data center and/or the cloud. Resilio can replicate data to any cloud, to and across multiple clouds, and across cloud regions.
- Web and app server sync: Businesses running websites and apps can use Resilio Connect to keep their servers synchronized. App updates can be quickly distributed across all globally distributed servers. Changes made by users on one server can be immediately distributed to every other server. And thousands of servers can be synchronized at all times concurrently.
- Software distribution: Software companies can use Resilio Connect to distribute updates in record time (VoiceBase reduced software distribution time by 88% with Resilio). And Resilio’s Active-Active High Availability enables them to minimize downtime and meet SLAs.
- Remote work: Hybrid/remote work organizations can use Resilio Connect to synchronize data across all globally distributed offices, remote sites, and backup sites. Resilio can enhance collaboration by enabling employees to work on the same files and have those files synced to every other endpoint immediately — so everyone always has the most up-to-date versions of files.
Resilio Connect can also be used as an efficient object storage gateway to provide low-latency access to files stored in any S3-compatible cloud storage. Resilio’s storage gateway enables organizations to enhance productivity and reduce costs via:
- Selective sync: You can automate syncs and control which files get synchronized to which endpoints. Employees don’t need to manually sync files and can focus on their work instead.
- Selective caching: Resilio allows you to choose which files you want to cache locally, so you can store frequently accessed files locally (providing employees with faster access to files and reducing data transfer fees) and store infrequently accessed files in long-term cloud storage (freeing up space on your on-prem devices).
- Partial downloads: Employees can perform full downloads or only download the portions of files they need. This gives them quicker access to files while also reducing data transfer costs.
- Edge deployment: Resilio’s WAN optimization technology enables organizations that work in remote locations at the edge of networks to sync and back up data for disaster recovery.
Case Study: Delirio Films
Delirio Films is a documentary production company whose projects require fast collaboration among multiple team members in different locations across the world. They use Resilio Connect — along with their media production tools — to predictably sync files across production sites and reduce IT management time.
“Remote work would be cumbersome and cost prohibitive without Resilio. By integrating Resilio Connect into our workflow, we’re able to meet demanding production schedules using top talent. Resilio gives us the flexibility to use our choice of tools, storage, and other investments we either already own or will need in the future.”
Learn more about how Resilio Connect syncs files and enables fast collaboration for Delirio Films.
Security
As a solution designed specifically for disaster recovery and ransomware resilience, one would think that Zerto would provide the most advanced security features available. But there are actually some gaping holes in Zerto’s security features.
Some of the good security features Zerto offers include:
- Multi-factor authentication and encryption keys to authenticate endpoints and ensure data is only delivered to your approved locations.
- Communication over secure channels and encryption of communications (via HTTPS) between the ZVM and its peer ZVMs, and between the ZVM and its local VRAs.
- Real-time encryption detection that enables you to detect and respond to ransomware and cyberattacks quickly.
However, Zerto doesn’t encrypt communications across networks, including data sent across WANs, and requires users to invest in VPNs or IPsec if they want network encryption.
Resilio Connect, by contrast, provides built-in, state-of-the-art security features that were reviewed by third-party security experts. These include:
- Mutual authentication: Every endpoint is required to provide an authentication key before it receives any files, ensuring data only gets delivered to approved endpoints.
- End-to-end encryption: Resilio encrypts data at rest and in transit via AES 256-bit encryption.
- Cryptographic data integrity validation: Resilio ensures data remains intact and uncorrupted via cryptographic validation.
- Forward secrecy: Resilio protects sessions using one-time session encryption keys.
- Permission controls: Resilio enables you to control who has access to specific files and folders.
- Ransomware protection: Resilio stores immutable copies of files in the cloud to protect against ransomware.
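Cryptographic data integrity validation, mentioned in the list above, can be illustrated with a keyed hash. This is a generic sketch using HMAC-SHA256 from Python’s standard library, not a description of Resilio’s actual scheme; the function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def sign_chunk(key: bytes, chunk: bytes) -> bytes:
    """Compute an authentication tag for a chunk of replicated data."""
    return hmac.new(key, chunk, hashlib.sha256).digest()


def verify_chunk(key: bytes, chunk: bytes, tag: bytes) -> bool:
    """Check a received chunk against its tag using a constant-time comparison."""
    return hmac.compare_digest(sign_chunk(key, chunk), tag)


key = b"example-shared-secret"  # illustrative only; real keys must be random
tag = sign_chunk(key, b"payload")
intact = verify_chunk(key, b"payload", tag)
tampered = verify_chunk(key, b"payl0ad", tag)
```

A receiver that gets a failed check can discard the chunk and request it again, so corrupted or altered data never reaches the target.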
Enhance Your DR Strategy with Resilio Connect
Resilio Connect enables you to enhance your DR strategy by providing:
- Fast, reliable replication: Resilio’s P2P replication architecture enables it to perform horizontal scale-out replication and achieve speeds of 100+ Gbps per cluster. It also eliminates single points of failure and allows Resilio to dynamically route around outages.
- Fast recovery: Resilio Connect can achieve RPOs of under 5 seconds and RTOs within minutes of an outage.
- Organic scalability: Every endpoint you add into a P2P environment only increases replication speed and resources. Resilio can support environments of any size, replicate large files, and replicate large numbers of files.
- Flexibility: Resilio works on any device, any OS, and in any cloud storage provider. You can install Resilio on your current IT infrastructure and be replicating in as little as 2 hours.
- Versatility: Resilio can be used for disaster recovery and server synchronization for remote work, software update distribution, web/app server sync, edge deployments, and more.
- Security: Resilio employs native security features that protect your data at rest and in transit.