Don’t Use S3 for Backups!

Below is a summary of issues with Amazon S3 and S3-compatible storage in general. While S3 serves a purpose in making files available online, it is far less effective when repurposed as backup storage.

The Biggest Problem First: It’s Just Storage

When you are dealing with backups for your Windows Servers, for example, you are concerned about being able to back up and restore the server effectively. You want a guarantee that if anything goes wrong, there is someone who can help. You would like to have someone available to look over your backups and backup settings to ensure all is well.

When you are dealing with S3 storage providers, such as Amazon S3, they are just providing you with a storage platform. Technical support is typically not included with the storage itself. Where it is available, it is expensive and generalized, not focused on the server backup solution you are implementing for your client or business.

Purchasing a complete server cloud backup solution is different: whatever happens, technical support is available to make sure everything is set up properly, running correctly, and restoring as it should. No matter what storage you use, when it’s part of the overall solution, you get the help you need when you need it. With separate providers, things get complicated, and you will not automatically get competent technical support at the moment you need it.

Costs

While S3 offers scalable and pay-as-you-go pricing, costs can add up, especially for large-scale storage or data transfer. Users need to carefully manage their storage and access patterns to optimize costs.

Data Consistency

Amazon S3 now provides strong read-after-write consistency for object writes, overwrites, and deletes. However, changes to bucket configuration (for example, enabling versioning for the first time) can take a while to propagate, and other S3-compatible providers do not necessarily offer the same guarantees.

Limited Performance for Small Objects

S3 might not be the best option for applications that require very low-latency access to small objects, as the overhead of establishing connections and performing authentication can be relatively high for small requests.

Limited Indexing and Search Capabilities

S3 is designed for simple storage and retrieval of objects but lacks advanced search and indexing features. Users who need to perform complex queries on their data may need to integrate additional services or tools.

Versioning Overhead

While versioning is a feature in S3, it can lead to increased storage costs and complexity. Multiple versions of the same object accumulate over time and result in higher expenses unless old versions are expired.

Limited Lifecycle Management

Although S3 provides lifecycle policies for managing objects, they may not be as feature-rich as some dedicated data management solutions. More advanced data lifecycle requirements may require additional tools or services.

Security Configuration Complexity

While S3 offers robust security features, the configuration options can be complex. Users must carefully manage access controls, bucket policies, and other security settings to ensure the proper protection of their data; otherwise, their backups may be visible to the entire world without their knowledge.
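
As a rough illustration of the kind of check this implies, the following Python sketch scans a bucket policy document for statements that allow access to everyone. The function name and the sample policy are made up for this example, and the check deliberately ignores Condition blocks, ACLs, and account-level public access settings, so treat it as a starting point rather than a complete audit.

```python
import json


def find_public_statements(policy_json: str) -> list[dict]:
    """Return Allow statements whose Principal is the wildcard "*".

    Simplification (assumption): Condition blocks are ignored, so a statement
    that is narrowed by a condition is still reported.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]

    public = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            public.append(stmt)
        elif isinstance(principal, dict):
            aws = principal.get("AWS")
            values = [aws] if isinstance(aws, str) else (aws or [])
            if "*" in values:
                public.append(stmt)
    return public


# Hypothetical policy for a bucket named "example-backups".
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-backups/*"},
        {"Sid": "BackupWriter", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-backups/*"},
    ],
})

risky = find_public_statements(sample_policy)
```

In this sample, only the statement with the Sid "PublicRead" is reported.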

No Native File System Semantics

S3 is an object storage service and does not provide native file system semantics. This can be a disadvantage for applications, such as backup services, that require file-level operations and traditional file system features.

Data Transfer Costs

Data transfer costs can be a significant factor, especially when moving large volumes of data in and out of S3, particularly across different AWS regions. This includes both data transfer within AWS and data transfer to and from the internet. Data transfer costs quickly add up when you perform a backup verification or restore operation on your backups.
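
To make the effect concrete, here is a small Python sketch that compares a month with no restores to a month with one full restore. The per-GB rates are placeholders invented for this example, not any provider's actual prices, and request fees are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder prices in USD; replace with your provider's current price list."""
    storage_per_gb_month: float
    egress_per_gb: float


def monthly_cost(stored_gb: float, downloaded_gb: float, rates: Rates) -> float:
    """Storage charge plus the charge for data downloaded during restores or verification."""
    return stored_gb * rates.storage_per_gb_month + downloaded_gb * rates.egress_per_gb


# Assumed example rates, chosen only to show the arithmetic.
example_rates = Rates(storage_per_gb_month=0.02, egress_per_gb=0.09)

quiet_month = monthly_cost(stored_gb=2000, downloaded_gb=0, rates=example_rates)
restore_month = monthly_cost(stored_gb=2000, downloaded_gb=2000, rates=example_rates)
```

With these made-up rates, a single full restore of a 2 TB backup set costs several times the monthly storage charge.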

Global Consistency

An S3 bucket lives in a single AWS region; data is spread across regions only when replication is configured. That replication is asynchronous, so achieving global consistency can be challenging, and this might impact applications that require strong consistency across regions.

No Native Search Functionality

S3 does not provide built-in search capabilities. This can be a problem when you need to search for specific files in order to restore them.
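
In practice, a search means downloading the object listing and filtering it yourself. The Python sketch below shows that client-side step, assuming the listing has already been fetched as a list of dictionaries with "Key" and "Size" fields; the keys and the helper name are invented for this example.

```python
from fnmatch import fnmatchcase


def search_listing(objects: list[dict], pattern: str) -> list[dict]:
    """Return listing entries whose key matches a shell-style pattern.

    The match runs entirely on the client; the storage service only
    supplied the raw listing.
    """
    return [obj for obj in objects if fnmatchcase(obj["Key"], pattern)]


# Hypothetical listing of a backup bucket.
listing = [
    {"Key": "server01/2024-05-01/C/Users/anna/report.docx", "Size": 48213},
    {"Key": "server01/2024-05-01/C/Users/anna/budget.xlsx", "Size": 90112},
    {"Key": "server01/2024-05-02/C/Users/ben/report.docx", "Size": 51007},
]

matches = search_listing(listing, "*/report.docx")
```

For a bucket with millions of objects, fetching the listing alone takes many requests, which is why backup products usually keep their own index.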

Limited Transaction Support

S3 is not a traditional database, and it doesn’t provide transactional support like a relational database. If your application requires ACID transactions, S3 may not be the best choice for certain types of data storage.

Access Control Complexity

While S3 offers access control mechanisms, managing fine-grained access control for a large number of users and resources can become complex. IAM policies, bucket policies, and access control lists (ACLs) need careful configuration and do not map to NTFS permissions in Windows, for example.

Latency

Although Amazon S3 is designed for high availability, its access latency might not be suitable for certain real-time applications. If your application requires extremely low-latency access to data, you might need to consider alternative solutions.

Limited Support for File Locking

S3 lacks built-in support for file locking mechanisms, which might be essential for applications that require exclusive access to files to prevent conflicts in concurrent updates. This can be an issue when the same bucket is used by multiple server backups.

Eventual Consistency for Overwritten Objects

Amazon S3 now provides strong consistency for overwrites as well as for new objects, but this was not always the case, and other S3-compatible providers may still be eventually consistent. With such providers, different clients can see different versions of the same object for a brief period after an overwrite.

Learning Curve

Understanding the full range of features, security configurations, and best practices for using S3 effectively may require some time and effort. This learning curve can be a consideration for teams new to AWS or cloud storage in general.

Limited Support for Retroactive Changes

Once an object is stored in S3, retroactively changing its storage class or encryption settings can be challenging, because each object has to be updated individually. This leads to complexity if you need to update these settings for a large volume of existing objects, and fees are added to the account each time an object has to be changed.

Performance Impact of Lifecycle Policies

While S3 provides lifecycle policies for automatic data management (e.g., moving objects to Glacier for archiving), these policies can have performance implications during execution, especially for large-scale operations.

API Request Costs

S3 charges for API requests, and the costs accumulate, especially with high-frequency or small-sized requests. Understanding your application’s access patterns and optimizing API usage is crucial to controlling costs, although costs can never be fully predicted or controlled.

Limited Native Encryption Management

While S3 supports server-side encryption, managing and rotating encryption keys may require additional solutions or integration with AWS Key Management Service (KMS), adding complexity to key lifecycle management.

Bucket Naming Constraints

S3 bucket names are globally unique, and once a name is chosen, it cannot be changed. This constraint can be a challenge in scenarios where naming conventions need to be revised or when dealing with a large number of teams managing their own buckets.
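
Because a name cannot be changed later, it is worth validating it before the bucket is created. The Python sketch below checks a subset of the commonly documented naming rules (length, allowed characters, no adjacent dots, not shaped like an IP address). It is an assumption-laden helper written for this article, not an official validator, and it cannot tell whether a name is already taken.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Check a subset of the bucket naming rules; global uniqueness is not checked."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    if ".." in name:  # adjacent dots are not allowed
        return False
    if _IP_PATTERN.fullmatch(name):  # must not be formatted like an IP address
        return False
    return True
```

For example, "backup-server01" passes these checks, while "Backup-Server01" (uppercase letters) and "192.168.1.10" (formatted like an IP address) do not.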

Object Size Limitations

While S3 supports very large objects, a single PUT operation is limited to 5 GB; anything larger, up to 5 TB, has to be sent as a multipart upload. This can be a consideration for applications dealing with extremely large files, such as backup solutions.
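
Backup software therefore has to plan how a large file is split into parts. The following Python sketch picks a part size under three assumed limits: at most 10,000 parts per upload, and parts between 5 MiB and 5 GiB. These values reflect the limits commonly documented for Amazon S3 and may differ for other providers; the function name and the 64 MiB default are choices made for this example.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed limits; verify them against your provider's documentation.
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * GIB


def plan_parts(size_bytes: int, preferred_part_size: int = 64 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for uploading an object of size_bytes."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    # Grow the part size when the preferred size would need more than MAX_PARTS parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE, math.ceil(size_bytes / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for the assumed multipart limits")
    return part_size, math.ceil(size_bytes / part_size)
```

A 1 GiB file with the default settings is split into 16 parts of 64 MiB each, while a multi-terabyte disk image forces a larger part size so that the part count stays within the limit.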

Third-Party Integration Challenges

Some third-party applications or tools may not seamlessly integrate with S3, and workarounds or custom development may be required to facilitate interoperability. For example, syncing your backup folder to another site is not so straightforward when dealing with S3.

Data Transfer Speed and Bandwidth

While S3 provides high availability, the speed at which data can be transferred in and out of S3 is subject to network bandwidth limitations. Large-scale data transfers may require careful planning to optimize performance.

Data Egress Costs

While uploading data to S3 is often free or relatively inexpensive, retrieving and transferring data out of S3 to the internet or other AWS regions can incur additional costs. This is an important consideration for applications with frequent data access patterns. In addition, some providers throttle egress speed at certain times.

Dependency on Internet Connectivity

Accessing S3 relies on internet connectivity. If your application is hosted in an environment with limited or unreliable internet connectivity, this could impact the reliability and availability of your backups.

Cross-Region Replication Complexity

While S3 supports cross-region replication for data redundancy and disaster recovery, managing and configuring cross-region replication can be complex. Additionally, data transfer costs between regions can add up.

S3 Transfer Acceleration Costs

While S3 Transfer Acceleration can speed up uploads to S3 by using Amazon CloudFront’s globally distributed edge locations, it comes with additional costs. Users should weigh the benefits against the associated expenses.

Limited Fine-Grained Access Control for Versioning

While S3 supports versioning, managing access controls for specific versions of an object can be challenging. Fine-grained access control for individual versions may require additional IAM roles or bucket policies.

Metadata Size Limitations

The metadata associated with each S3 object has size limitations. If your application requires extensive metadata for each object, you might need to consider alternative storage solutions or external databases for managing metadata.
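
A backup tool that attaches its own metadata can check the size before uploading. The Python sketch below assumes a 2 KB limit on user-defined metadata, measured as the total UTF-8 byte length of all keys and values, which is how Amazon S3 documents it. Other providers may count differently, and the helper names are made up for this example.

```python
# Assumed limit for user-defined metadata; verify it for your provider.
USER_METADATA_LIMIT_BYTES = 2 * 1024


def user_metadata_size(metadata: dict[str, str]) -> int:
    """Sum of the UTF-8 byte lengths of every key and value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())


def fits_in_object_metadata(metadata: dict[str, str]) -> bool:
    """True when the metadata stays within the assumed limit."""
    return user_metadata_size(metadata) <= USER_METADATA_LIMIT_BYTES
```

Anything that does not fit, such as a full list of NTFS permissions for a file, has to be stored in a separate object or an external database.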

Versioning Storage Costs

Enabling versioning increases storage costs, as each version of an object is stored separately. This can lead to higher-than-expected storage costs if versioning is enabled for a large number of objects with frequent updates.
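
One common way to cap this growth is a lifecycle rule that expires old versions. The Python sketch below only builds such a rule as a dictionary and prints it as JSON. The field names follow the S3 lifecycle configuration format, but the helper name, the rule ID, and the 30-day value are assumptions for this example, and applying the rule to a bucket is left to your tooling.

```python
import json


def noncurrent_expiration_rule(days: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration that deletes noncurrent versions after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{days}-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


config = noncurrent_expiration_rule(30)
print(json.dumps(config, indent=2))
```

Choosing the number of days is a trade-off between storage cost and how far back a restore can reach.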

Object Deletion Overhead

Deleting a large number of objects from an S3 bucket can take time, and the process might be subject to certain rate limitations. This can be a consideration for applications with high object turnover.
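
Bulk deletion is typically done in batches, since a multi-object delete request accepts at most 1,000 keys. The Python sketch below only prepares those batches; the payload shape mirrors the multi-object delete request, while the helper name is hypothetical and sending each batch is left to your S3 client.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_KEYS_PER_DELETE = 1000  # per-request limit of the multi-object delete operation


def delete_batches(keys: Iterable[str], batch_size: int = MAX_KEYS_PER_DELETE) -> Iterator[dict]:
    """Yield delete payloads of at most batch_size keys each."""
    if not 1 <= batch_size <= MAX_KEYS_PER_DELETE:
        raise ValueError("batch_size must be between 1 and 1000")
    iterator = iter(keys)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}
```

Cleaning up 2,500 expired backup files, for instance, takes three requests, and each of them may still be throttled.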

Limited Access Logging Capabilities

While S3 provides access logs that capture information about requests made to a bucket, the granularity of these logs might not be sufficient for some advanced auditing and monitoring requirements.

Versioning for All Objects in a Bucket

Once versioning is enabled for a bucket, it applies to all objects within that bucket. This lack of granularity might be a consideration for applications that require versioning for only specific subsets of data.

Data Retrieval Costs for Glacier and Glacier Deep Archive

If you use S3 Glacier or Glacier Deep Archive for long-term archival, retrieving data from these storage classes can incur additional costs and may have substantial retrieval time delays when you need to restore your backups.

Rate Limiting on S3 API Calls

S3 imposes rate limits on certain API operations, and exceeding these limits can result in temporary throttling. Understanding these limits is crucial for applications with high request rates, which are common during backup processes.
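
Clients usually respond to throttling by retrying with exponential backoff and jitter. The Python sketch below shows that pattern in isolation. ThrottledError is a hypothetical stand-in for whatever "slow down" error your S3 client raises, and the delay values are arbitrary defaults chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's throttling response."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ThrottledError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # random wait up to the current ceiling
```

The sleep function is passed in as a parameter so that the retry logic can be tested without actually waiting. Note that every retry lengthens the backup window.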

Limited Logging for Data Changes

While S3 provides access logs for tracking requests made to a bucket, it may not capture fine-grained information about changes to the content of an object. Detailed tracking of changes might require additional custom logging solutions.

Multipart Upload Overhead

While multipart uploads are useful for large object transfers, managing the multipart upload process incurs additional overhead. Abandoned or incomplete multipart uploads can result in storage costs and should be periodically reviewed and cleaned up.
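
A periodic review can be as simple as filtering the list of in-progress uploads by age. The Python sketch below assumes that list has already been fetched as dictionaries with "Key", "UploadId", and "Initiated" fields, which is the shape a multipart upload listing returns; the sample entries, the helper name, and the 7-day threshold are invented for this example.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads: list[dict], now: datetime, max_age_days: int = 7) -> list[dict]:
    """Return in-progress multipart uploads that were started more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [upload for upload in uploads if upload["Initiated"] < cutoff]


# Hypothetical listing of unfinished uploads.
uploads = [
    {"Key": "server01/disk-c.vhdx", "UploadId": "example-id-1",
     "Initiated": datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)},
    {"Key": "server02/disk-c.vhdx", "UploadId": "example-id-2",
     "Initiated": datetime(2024, 5, 19, 2, 0, tzinfo=timezone.utc)},
]

to_abort = stale_uploads(uploads, now=datetime(2024, 5, 20, tzinfo=timezone.utc))
```

Each upload reported this way still has to be aborted through the provider's API before its parts stop incurring storage charges.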

Data Transfer Acceleration Trade-offs

S3 Transfer Acceleration can improve upload speeds, but it comes with additional costs. Users should evaluate whether the performance gains justify the added expenses for their specific use cases.

Bucket Versioning Implications

Enabling versioning for a bucket affects not only storage costs but also the way data is managed. It can impact how objects are deleted, and developers need to be aware of the implications for their specific application logic.

S3 Select and Compression

When using S3 Select on compressed objects, the compression type and format should be compatible with S3 Select. Some compression formats may not be supported, and selecting data from certain types of compressed objects may result in additional processing costs.

Data Transfer Acceleration Endpoint

Using S3 Transfer Acceleration requires accessing a specific endpoint (e.g., .s3-accelerate.amazonaws.com). This endpoint might not be geographically optimized for all users, and some users might experience slower performance.

Bucket Versioning and S3 Transfer Acceleration Interaction

Enabling both versioning and Transfer Acceleration on a bucket might lead to increased costs and could impact the performance and behavior of your applications. It’s important to understand the interactions between these features.

Performance Variability

S3 performance can vary based on factors such as the geographical location of the bucket, the size of the objects, and the access patterns. Understanding these variations is important for applications with specific performance requirements.

Object Lock Limitations

While S3 provides Object Lock for data retention and protection against object deletion, it has certain limitations. Object Lock can be applied at the bucket or object level, and changing or removing the lock can be subject to restrictions.

Understanding S3 Transfer Costs

Apart from data transfer costs, users should be aware of the costs associated with S3 Transfer Acceleration and Cross-Region Replication, as they can contribute to the overall expense.

The Alternative: The BackupChain Cloud Backup Solution

Instead of purchasing storage from an S3 provider, consider using the BackupChain Cloud Backup service. You receive perpetually licensed backup software bundled with the cloud storage you need. Apart from the technical side, you also receive competent technical support that will help you set up, monitor, and restore your backups and be available whenever you need assistance. Have a look at BackupChain today and try it yourself.
