Properly managing your AWS (Amazon Web Services) file cache can help you avoid issues and achieve benefits such as:
- Faster access and workflows: Businesses that work with large amounts of data (like large media files or big data workloads) can get fast access to necessary files and complete their tasks efficiently.
- Reduced costs: Caching data locally helps you avoid costly AWS egress fees.
- Updated files: Distributed teams that collaborate on files can ensure that all locally stored files are up to date and avoid redundancies.
The complexity and scale of data storage are growing faster than IT budgets and resources, especially for enterprise teams supporting multiple offices and hybrid workforces.
Refreshing the cache manually with the RefreshCache API can be a cumbersome process that involves managing cron jobs, scripts, AWS credentials, or AWS Lambda functions. Plus, determining which directories need to be refreshed is difficult. Luckily, AWS File Gateway recently released a new cache refresh feature that automates the process and makes cache management much easier.
In this article, we’ll describe how the new File Gateway cache refresh feature works and how to set up an automated cache refresh process in the File Gateway.
We’ll also discuss some of the challenges that persist with AWS File Gateway cache management (and File Gateways as a whole) — such as the fact that it only allows you to cache your most recently accessed files and can only be used to access files stored in the AWS cloud.
In the second part of this article, we’ll go over our own file synchronization and object storage gateway solution — Resilio Connect — and how it makes caching, synchronizing, and accessing files easier and more efficient.
Want to learn more about how Resilio provides fast, efficient access to all of your on-premises and cloud files? Schedule a demo with our team.
Resilio Connect is a high-performance file replication and synchronization system. One popular use case for Resilio is a file gateway for object storage. Resilio Connect can mount any type of storage — file, object, or block storage device. As a file gateway for object storage, Resilio presents objects as files — and enables rapid replication and efficient caching across multiple storage buckets.
In scenarios where you’re replicating object storage across regions or multiple sites, Resilio provides high-performance, low-latency replication, both in real time and on demand. It can keep hundreds of millions of objects and files current across sites over any distance. Its policy-driven approach allows granular control over cache management and data movement.
Unlike AWS File Gateway, Resilio is designed to be:
- Flexible: Resilio works by mounting agents directly on storage hardware or virtual machines. You can install it directly on your existing IT infrastructure and use it to access all files stored on-prem or in any cloud from one location.
- Efficient: Resilio has features that enable you to optimize cloud costs, such as selective sync and caching (in addition to caching recently accessed files, you can cache any files you want). And since you can use Resilio to access all files from one solution, you don’t need to invest in multiple storage gateways.
- Reliable: Resilio uses a peer-to-peer replication architecture, which eliminates single points of failure and ensures files are always synchronized as quickly as possible. And its proprietary WAN acceleration technology enables it to fully utilize any network (including VSATs, cell, Wi-Fi, and any IP connection) to optimize data transfer speeds.
- Automated: In Resilio, any UI action — including caching, hydration, and synchronization — can be automated using the API. You can also integrate with 3rd-party systems and trigger hydration or 3rd-party actions based on the cache’s state.
Organizations use Resilio Connect to ingest, sync, and access data for media workflows (Turner Sports, Innovative), gaming (Wargaming, Larian Studios), remote operations (Mercedes-Benz, Buckeye Power Sales), and more. If you want to learn how Resilio Connect can help efficiently cache, sync, and access data on-prem and in the cloud, schedule a demo with our team.
How to Manage AWS File Gateway Cache
AWS File Gateway (one of the components of AWS Storage Gateway) caches your most recently accessed files on local devices. Before the new cache refresh feature, you had to refresh this cache manually by calling the RefreshCache API from cron jobs, scripts, or AWS Lambda functions, and by managing the AWS credentials those jobs relied on.
Still, determining which directories needed to be refreshed was difficult. Many users would simply refresh the entire file share, creating unnecessary work for the Amazon S3 (Amazon Simple Storage Service) API and negatively impacting the performance of the file gateway.
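The manual approach described above can be sketched as follows. This is a minimal illustration, not a production script: it assumes a boto3 `storagegateway` client is passed in, the helper names `build_refresh_request` and `refresh_folders` are hypothetical, and the ARN and folder paths in the usage note are placeholders. It refreshes only the named folders rather than the whole share.

```python
from typing import Any, Dict, Iterable, List


def build_refresh_request(file_share_arn: str, folders: Iterable[str],
                          recursive: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for the Storage Gateway RefreshCache call."""
    # Deduplicate while preserving order so each folder is refreshed once.
    folder_list: List[str] = list(dict.fromkeys(folders))
    if not folder_list:
        # Require explicit folders so callers do not fall back to
        # refreshing the entire share by accident.
        raise ValueError("name at least one folder to refresh")
    return {
        "FileShareARN": file_share_arn,
        "FolderList": folder_list,
        "Recursive": recursive,
    }


def refresh_folders(client: Any, file_share_arn: str,
                    folders: Iterable[str]) -> Dict[str, Any]:
    """Call refresh_cache on a boto3 'storagegateway' client (or a stand-in)."""
    return client.refresh_cache(**build_refresh_request(file_share_arn, folders))
```

With boto3 installed and credentials configured, a cron job or Lambda function could call `refresh_folders(boto3.client("storagegateway"), share_arn, ["/projects/active"])`, where `share_arn` and the folder path are values from your own environment.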
File Gateway’s new cache refresh feature allows you to automatically refresh the metadata cache to stay current with changes in your S3 buckets.
How File Gateway’s Cache Refresh Feature Works
The new cache refresh feature makes managing your cache easier and provides you with more flexibility.
With this new feature, you can:
- Create multiple file shares for a single Amazon S3 bucket
- Sync the local cache with an S3 bucket based on access frequency
- Limit the number of S3 buckets necessary to manage the file shares on your File Gateway
- Define multiple prefixes for a single S3 bucket
- Map a single prefix to a single gateway file share
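The File Gateway cache refresh feature works as a time to live (TTL) timer that measures how long it has been since a directory’s contents were last refreshed. You set a time limit (anywhere from 5 minutes to 30 days) after which the cached directory is treated as stale and refreshed the next time it is accessed.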
For example, let’s assume you set the time limit to 2 hours. If a user accesses a directory after that 2-hour time limit has expired, the gateway will automatically refresh the cache. But if a user accesses the directory before the 2-hour time limit completes, the gateway will treat the cache as current and not perform a refresh.
This process ensures that your frequently accessed directories are automatically refreshed and synced with S3 buckets, while infrequently accessed directories are refreshed only when needed. This frees you up from cumbersome cache management processes and reduces unnecessary work on the gateway.
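The 2-hour example can be expressed as a small decision function. This is a toy model of the behavior described above, not the gateway’s actual implementation. It assumes the timer is measured from the directory’s last refresh, which is how the AWS API documentation describes the TTL, and the function name `needs_refresh` is hypothetical.

```python
from datetime import datetime, timedelta

# Allowed range for the refresh timer, per the text above.
MIN_TTL = timedelta(minutes=5)
MAX_TTL = timedelta(days=30)


def needs_refresh(last_refresh: datetime, accessed_at: datetime,
                  ttl: timedelta) -> bool:
    """Return True if accessing the directory now should trigger a refresh."""
    if not (MIN_TTL <= ttl <= MAX_TTL):
        raise ValueError("ttl must be between 5 minutes and 30 days")
    # The cache is stale only once more than `ttl` has elapsed.
    return accessed_at - last_refresh > ttl


if __name__ == "__main__":
    refreshed = datetime(2024, 1, 1, 9, 0)
    two_hours = timedelta(hours=2)
    # Access after 1 hour: the cache is treated as current.
    print(needs_refresh(refreshed, refreshed + timedelta(hours=1), two_hours))
    # Access after 3 hours: the directory is refreshed first.
    print(needs_refresh(refreshed, refreshed + timedelta(hours=3), two_hours))
```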
How to Set Up an Automated Refresh for File Gateway Cache
To set up your automated cache refresh process:
1. Open the AWS Storage Gateway console. Go to the “File shares” tab.
2. Choose a file share. Then click the “Actions” drop-down menu, and click “Edit share settings”.
3. A window will pop up that allows you to add values for your automated cache refresh.
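The same setting can also be applied programmatically. The sketch below assumes a boto3 `storagegateway` client, whose `update_nfs_file_share` and `update_smb_file_share` calls accept a `CacheAttributes` value containing `CacheStaleTimeoutInSeconds`. The helper names `cache_attributes` and `enable_automated_refresh` are hypothetical, and the client is passed in so the logic can be exercised without AWS access.

```python
from typing import Any, Dict

MIN_SECONDS = 5 * 60             # 5 minutes
MAX_SECONDS = 30 * 24 * 60 * 60  # 30 days


def cache_attributes(days: int = 0, hours: int = 0,
                     minutes: int = 0) -> Dict[str, int]:
    """Convert a refresh interval into the CacheAttributes structure."""
    seconds = ((days * 24 + hours) * 60 + minutes) * 60
    if not (MIN_SECONDS <= seconds <= MAX_SECONDS):
        raise ValueError("interval must be between 5 minutes and 30 days")
    return {"CacheStaleTimeoutInSeconds": seconds}


def enable_automated_refresh(client: Any, file_share_arn: str, protocol: str,
                             **interval: int) -> Any:
    """Set the automated cache refresh interval on an NFS or SMB file share."""
    params = {
        "FileShareARN": file_share_arn,
        "CacheAttributes": cache_attributes(**interval),
    }
    if protocol == "nfs":
        return client.update_nfs_file_share(**params)
    if protocol == "smb":
        return client.update_smb_file_share(**params)
    raise ValueError("protocol must be 'nfs' or 'smb'")
```

For example, `enable_automated_refresh(boto3.client("storagegateway"), share_arn, "nfs", hours=2)` would request a 2-hour interval for the share identified by `share_arn`, a value you would supply from your own account.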
Resilio Connect’s Fast, Efficient File Gateway Alternative
While the new S3 File Gateway cache feature provides more flexibility and automation for managing your AWS file cache, it still suffers from some issues that make it suboptimal for certain use cases.
For example, it only allows you to cache recently accessed files. And it doesn’t provide any additional features to increase efficiency and reduce data egress costs.
In addition, AWS File Gateway can only be used to access files stored in AWS. Organizations that store data in multi-cloud or hybrid cloud environments will need to buy and use multiple gateways to access their data.
Our file synchronization and gateway software system, Resilio Connect, provides a superior alternative to AWS File Gateway for organizations that need to extend their on-premises applications to the cloud.
It offers efficient file caching and access, enabling you to transform your on-premises servers and NAS devices into flexible storage gateways. Plus, it’s built for very large file systems and moves data at the best possible speeds, with all transfers being UDP-based and WAN-optimized.
Resilio Connect includes features that make it a superior solution for caching and syncing large datasets between any S3-compatible cloud object storage and your on-premises devices or data centers, such as:
- Efficiency and cost-reduction
- Versatility and granular control
- Reliable sync and access
- Bulletproof security
Efficient, Cost-Effective Caching and Access
Resilio Connect is designed to enhance the cost-effectiveness of storing data in the cloud. Unlike other gateways, our solution is built to reduce costs associated with data egress fees and multi-cloud storage.
AWS File Gateway only provides access to data stored in the AWS cloud. However, many organizations choose to store data in multiple clouds and on-premises storage solutions in order to increase redundancy and maintain high availability. To access their data, these businesses would need a separate gateway for each cloud platform.
But Resilio Connect is cloud vendor agnostic. You can use it with just about any cloud storage platform, such as AWS, GCP, Azure, Wasabi, MinIO, and more. You can deploy Resilio on your existing infrastructure at low cost, and manage and access all of your cloud and on-prem data from one location.
Resilio’s built-in security features mean that you won’t have to invest in 3rd-party security solutions or VPNs, as is the case with other gateway solutions.
And Resilio includes features that help you enhance productivity and reduce costs associated with data egress, such as:
- Selective caching: You can choose which files are stored on local devices, providing employees with faster access to the data they need and reducing data egress fees associated with downloading files.
- Selective sync: You can choose which specific files and folders sync to which endpoints. This ensures that files only sync to the destinations where they’re needed so you can reduce data transfer fees.
- Partial downloads: You can fully or partially download files and folders, so you can get quicker access to the files you need and reduce data egress costs.
- Automation: You can automatically sync, cache, download, and purge any file based on the policies you set.
Versatility and Granular Control Over Your Environment
File Gateway works by installing AWS agents on virtual machines on your hardware. And it only provides access to files stored in the AWS cloud.
But Resilio Connect is a much more versatile solution. It provides low-latency access to files via SMB (Server Message Block) and NFS (Network File System) protocols.
You can deploy Resilio agents directly onto your file servers, NAS/DAS/SAN systems, desktops, laptops, mobile devices (Resilio offers iOS and Android apps), and virtual machines (running on VMware, Citrix, and other hypervisors).
Resilio supports just about any:
- Operating system: Microsoft Windows, macOS, Linux, Unix, FreeBSD, OpenBSD, Ubuntu, and more.
- Cloud storage platform: AWS, Google Cloud Platform, Azure, Wasabi, MinIO, Backblaze, and more.
Resilio is easy to deploy and enables you to sync and access files across your entire on-prem and cloud environment from a single location, saving you from having to invest in and manage multiple gateways.
Plus, Resilio’s Management Console provides you with granular control over replication in your environment. You can:
- Create bandwidth profiles for each endpoint that govern how much bandwidth is allocated to each endpoint at certain times of the day and on certain days of the week.
- Manage every server in your environment, and manage up to 50,000 agents per console.
- Get insight into the status of individual replication jobs, with real-time performance metrics that can be sent by email or to webhooks.
- Review a history of all executed jobs.
- Establish job priorities.
- Automate replication jobs and script any type of functionality using Resilio’s REST API.
All end-users are provided with the same view of files from a unified interface that operates much like Microsoft OneDrive.
Case Study: Deutsche Aircraft
Commercial aircraft manufacturer Deutsche Aircraft switched from DFSR to Resilio Connect to sync their Microsoft DFS namespace, manage data access, secure mission-critical data, and increase efficiency.
“We have a 10Gbps network but prefer to use under 1Gbps for data transfer and replication. With Resilio, we’re able to keep that down to 250Mbps during the day and at night move back up to 1Gbps… Resilio Connect is much easier to manage than DFSR. Using the Resilio Connect management console, you can see everything you need to know. Everything is visible.”
Reliable File Synchronization and Access
Keeping data synchronized and up to date is one of the top concerns for businesses and IT teams.
AWS File Gateway synchronizes data using an unreliable point-to-point replication architecture that can be configured in one of two models:
- Hub and spoke: This model consists of a hub server and several remote servers. The remote servers can’t share files directly with each other. Instead, they must first transfer files to the hub server, which then syncs those files with each remote server one by one.
- Follow the sun: In this model, replication occurs from one server to another sequentially.
Both of these replication architectures are only as fast as their slowest endpoint. If replication fails or is delayed on one endpoint, it can delay synchronization for every other endpoint in your environment.
They also introduce single points of failure. If any endpoint goes down, it can delay sync for your entire environment. And if the hub server goes down in a hub and spoke architecture, then replication fails entirely.
Resilio Connect uses a blazing fast and highly reliable P2P (peer-to-peer) replication architecture that eliminates the challenges experienced by point-to-point systems.
In a P2P architecture, every server can communicate with every other server and take part in replication simultaneously, resulting in:
Faster Sync Speeds
Resilio uses a process known as file chunking to break files down into multiple chunks that can transfer independently of each other. Every server can share file chunks simultaneously.
For example, imagine you want to sync a file across five servers. Resilio can use file chunking to break that file down into five chunks. Server 1 can share the first file chunk with Server 2. While it’s waiting to receive the other four chunks, Server 2 can immediately share the first chunk with one of the other servers. Soon every server will be sharing chunks concurrently, allowing you to sync your system 3–10x faster than with AWS File Gateway or other point-to-point solutions.
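The idea behind file chunking can be illustrated with a short sketch. It is a simplified model and not Resilio’s actual protocol: the function names are hypothetical, and the example only shows that independently transferred chunks can arrive in any order and still be reassembled into the original file.

```python
import hashlib
import random
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_count: int) -> List[bytes]:
    """Split data into at most chunk_count roughly equal chunks."""
    if chunk_count < 1:
        raise ValueError("chunk_count must be at least 1")
    size = max(1, -(-len(data) // chunk_count))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def reassemble(received: Dict[int, bytes], total: int) -> bytes:
    """Rebuild the file once every chunk index has arrived."""
    missing = [i for i in range(total) if i not in received]
    if missing:
        raise ValueError(f"still waiting for chunks {missing}")
    return b"".join(received[i] for i in range(total))


if __name__ == "__main__":
    original = bytes(range(256)) * 4
    chunks = split_into_chunks(original, 5)

    # Chunks may arrive from different peers in any order.
    arrival_order = list(range(len(chunks)))
    random.shuffle(arrival_order)
    received = {index: chunks[index] for index in arrival_order}

    rebuilt = reassemble(received, len(chunks))
    same = hashlib.sha256(rebuilt).digest() == hashlib.sha256(original).digest()
    print("file intact:", same)
```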
As your environment grows, point-to-point solutions take longer to sync, as replication occurs from just one server to another.
But Resilio’s P2P architecture allows it to scale organically. More endpoints create more supply (i.e., bandwidth, CPU, etc.) and sync speeds increase as your environment grows. Resilio performs horizontal scale-out replication, allowing it to reach speeds of 100+ Gbps per server.
In fact, our engineers successfully transferred a 1 TB dataset across Azure regions in 90 seconds.
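As a quick sanity check on that figure, the average throughput implied by moving 1 TB in 90 seconds can be computed directly. The calculation below is only arithmetic on the numbers stated above. Whether “1 TB” means 10^12 bytes or 2^40 bytes is not specified, so both are shown.

```python
def average_throughput_gbps(bytes_transferred: float, seconds: float) -> float:
    """Average throughput in gigabits per second (1 Gbps = 1e9 bits/s)."""
    return bytes_transferred * 8 / seconds / 1e9


if __name__ == "__main__":
    decimal_tb = 10 ** 12  # 1 TB as 10^12 bytes
    binary_tb = 2 ** 40    # 1 TiB as 2^40 bytes
    print(round(average_throughput_gbps(decimal_tb, 90), 1))  # about 88.9
    print(round(average_throughput_gbps(binary_tb, 90), 1))   # about 97.7
```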
Case Study: VoiceBase
VoiceBase, a software platform that provides speech-to-text transcriptions for video and audio, uses Resilio Connect to distribute speech model updates (50+ GB files) across 400+ production servers each month.
“Resilio Connect enables us to reliably distribute our code, specifically new language models in a fraction of the time. These copy jobs now take an hour, down from eight. Best of all, once Resilio Connect was installed, it just works: We never need to manually intervene in any way.”
Learn more about how Resilio Connect helped VoiceBase reduce software distribution time by 88%.
Greater Reliability
Since every server in a P2P environment can communicate with each other, there are no single points of failure.
If any server or network goes down, Resilio can dynamically route around the outage and find the optimal path to deliver files to their destination. And if a transfer is interrupted, Resilio can perform a checksum restart to resume the transfer where it left off and will retry all transfers until they’re complete.
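One way to picture a checksum-based restart is shown below. It is a simplified sketch under stated assumptions, not Resilio’s implementation: it assumes the sender publishes a SHA-256 digest per chunk, and the function names are hypothetical. After an interruption, the receiver compares what it already holds against those digests and requests only the chunks that are missing or corrupted.

```python
import hashlib
from typing import List, Optional, Sequence


def chunk_digests(chunks: Sequence[bytes]) -> List[str]:
    """Compute the SHA-256 digest of every chunk on the sending side."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def chunks_to_resend(expected: Sequence[str],
                     received: Sequence[Optional[bytes]]) -> List[int]:
    """Return indexes of chunks that are absent or fail verification."""
    pending = []
    for index, digest in enumerate(expected):
        chunk = received[index] if index < len(received) else None
        if chunk is None or hashlib.sha256(chunk).hexdigest() != digest:
            pending.append(index)
    return pending


if __name__ == "__main__":
    source = [b"aa", b"bb", b"cc", b"dd"]
    # The transfer stopped early: chunk 1 is corrupted, chunks 2 and 3 never arrived.
    partial = [b"aa", b"XX", None]
    print(chunks_to_resend(chunk_digests(source), partial))  # [1, 2, 3]
```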
This type of reliability and resilience makes Resilio an excellent solution for hot-site disaster recovery. P2P replication enables it to provide Active-Active High Availability. And it can achieve sub-five-second RPOs and RTOs within minutes of an outage.
Proprietary WAN Acceleration Technology
Another key source of Resilio’s speed and reliability is its proprietary WAN acceleration protocol known as Zero Gravity Transport™ (ZGT).
Syncing data across cloud regions and geographically distributed teams often involves utilizing WAN networks, which suffer from high latency and varying degrees of packet loss.
And many organizations often operate on low-quality, unreliable networks — such as consumer grade networks in remote work scenarios and edge networks in edge deployments.
ZGT minimizes the impact of latency and packet loss, allowing you to fully utilize any network by optimizing traffic over it. It accomplishes this using:
- Congestion control algorithm: ZGT uses a congestion control algorithm that constantly probes the RTT (Round Trip Time) to calculate and maintain the ideal data packet send rate. This enables it to maintain a uniform packet distribution over time.
- Interval acknowledgements: Rather than acknowledging each packet receipt, ZGT sends acknowledgements for groups of packets.
- Delayed retransmission: Rather than immediately retransmitting lost packets, ZGT retransmits lost packets in groups once per RTT to reduce unnecessary retransmissions.
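Interval acknowledgements and grouped retransmission can be illustrated with a toy model. ZGT is proprietary, so the code below is not its wire format or algorithm. It is a sketch, with hypothetical function names, of how a receiver could summarize received packets as ranges and how a sender could derive one batch of packets to retransmit from that summary.

```python
from typing import Iterable, List, Tuple


def ack_ranges(received: Iterable[int]) -> List[Tuple[int, int]]:
    """Summarize received sequence numbers as inclusive (first, last) ranges."""
    ranges: List[Tuple[int, int]] = []
    for seq in sorted(set(received)):
        if ranges and seq == ranges[-1][1] + 1:
            # Extend the current range instead of acknowledging each packet.
            ranges[-1] = (ranges[-1][0], seq)
        else:
            ranges.append((seq, seq))
    return ranges


def missing_packets(ranges: List[Tuple[int, int]],
                    highest_sent: int) -> List[int]:
    """List every sent packet not covered by an acknowledged range."""
    acked = set()
    for first, last in ranges:
        acked.update(range(first, last + 1))
    return [seq for seq in range(highest_sent + 1) if seq not in acked]


if __name__ == "__main__":
    acks = ack_ranges([0, 1, 2, 4, 5, 8])
    print(acks)                      # [(0, 2), (4, 5), (8, 8)]
    # The sender retransmits these together rather than one at a time.
    print(missing_packets(acks, 9))  # [3, 6, 7, 9]
```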
ZGT enables you to use Resilio with any type of network connectivity, such as VSAT, Wi-Fi, cell (3G, 4G, 5G), and any IP connection.
It also allows you to ingest and sync data from the edge and areas with little connectivity. Our client Shifo uses Resilio to sync healthcare data across remote communities with underdeveloped networks, such as Uganda.
Bulletproof Native Security Features
Many file gateway solutions don’t include native security features to protect your data. This forces you to use 3rd-party security tools and VPNs.
But Resilio Connect includes built-in security features that were reviewed by 3rd-party security experts. Your data is protected by:
- End-to-end data encryption: Resilio encrypts data at rest and in transit using AES-256 encryption.
- Integrity validation: Resilio uses cryptographic data integrity validation to ensure files arrive at their destination intact and uncorrupted.
- Permission controls: In Resilio’s Management Console, you can control who is allowed to access specific files and folders.
- Mutual authentication: Before a transfer begins, each endpoint is required to provide an authentication key. This ensures your data is only delivered to approved destinations.
- Forward secrecy: Each session is protected with a one-time session encryption key.
Use Resilio’s Flexible, Efficient File Gateway
While the new AWS File Gateway feature makes managing cached volumes easier, Resilio Connect’s file gateway is a superior alternative for syncing and accessing data stored on-premises and in the cloud due to its:
- Efficiency: Resilio eliminates the need to invest in expensive hardware, multiple gateways, and security solutions. You can manage all of your data storage from one location. And it includes features (such as selective sync, selective cache, and partial downloads) that enable you to increase productivity and reduce cloud storage costs.
- Versatility and granular control: You can deploy Resilio agents on any on-premises storage device, cloud storage platform, and operating system. And you can use Resilio’s Management Console to obtain granular control over how files are replicated and accessed in your environment.
- Reliability: Resilio uses a P2P replication architecture that eliminates single points of failure and provides fast, resilient synchronization. It also uses a proprietary WAN acceleration protocol that allows it to optimize transfers over any network regardless of quality, latency, or packet loss.
- Native security: Resilio includes built-in security features that protect your data at rest and in transit and eliminate the need to invest in 3rd party security tools.
Organizations use Resilio Connect to ingest, sync, and access data for media workflows (Turner Sports, Innovative), gaming (Wargaming, Larian Studios), remote operations (Mercedes-Benz, Buckeye Power Sales), and more. If you want to learn how Resilio Connect can help efficiently sync, access, and cache data (based on automated policies) on-prem and in the cloud, schedule a demo with our team.
Frequently Asked Questions
What is AWS Storage Gateway?
AWS Storage Gateway is an AWS service that connects on-premises environments to AWS cloud storage via four types of gateways:
- Amazon S3 File Gateway: A file interface that enables you to access files stored as objects in Amazon S3 (via the SMB and NFS protocols) from your data center or from Amazon EC2.
- Amazon FSx File Gateway: A file interface that provides access to file shares stored in the cloud via the SMB protocol.
- Volume Gateway: Uses the iSCSI protocol to provide access to block storage volumes, and enables you to back up data as Amazon EBS snapshots.
- Tape Gateway: Presents an iSCSI-based virtual tape library (VTL) of virtual tape drives and stores your virtual tapes in Amazon S3.