Artifact storage - AWS S3 vs AWS CodeArtifact
| Revision | Date | Description |
|---|---|---|
| | 27.08.2024 | Init document |
Introduction
Purpose
The purpose of this document is to compare two AWS services in terms of artifact storage.
Background
We have reached the point where a decision is needed on the future of our artifact storage. This document therefore compares the two shortlisted AWS services: AWS S3 and AWS CodeArtifact.
We agreed upon the following requirements:
Versioning is a MUST
At minimum, support for Maven, npm, Python, and ZIP packages
No need to scan artifacts/packages - everything will be used for image building, and images can be scanned by Amazon Inspector
The dependencies of each system (application) must be separated from those of other systems:
Developers of system A have access to libraries/dependencies/artifacts used in system A
Support for fine-grained access using ABAC
AWS S3
Amazon S3 is a highly scalable, durable, and secure object storage service offered by Amazon Web Services (AWS). It provides developers and businesses with a simple and efficient way to store and retrieve any amount of data from anywhere on the web. S3 is designed to be highly available, reliable, and cost-effective. Therefore, it is widely used for a variety of applications, including backup and restore, content storage and distribution, data archiving, data lakes, and serving static assets for websites and applications.
Key Features and Capabilities of AWS S3
Object Storage: S3 stores data as objects, which consist of the data itself, a key (unique identifier), and optional metadata. Objects can range in size from 0 bytes to 5 terabytes.
Scalability and Durability: S3 is designed for massive scalability and durability. It automatically scales to handle any amount of data and replicates data across multiple availability zones within a region to ensure high durability.
Data Availability: S3 provides strong read-after-write consistency in all AWS regions, so an object can be read immediately after it is uploaded. It offers high availability and is built to sustain both planned and unplanned events.
Security and Access Control: S3 offers various security features, including encryption at rest and in transit, access control policies, bucket policies, and integration with AWS Identity and Access Management (IAM) for fine-grained access control.
Data Lifecycle Management: S3 allows us to define lifecycle policies to automatically transition and manage data between storage classes based on predefined rules. This helps optimize costs and performance based on the data's lifecycle.
Versioning: S3 supports versioning, allowing us to keep multiple versions of an object. This feature helps with data recovery, maintaining historical records, and protecting against accidental overwrites or deletions.
Event Notifications: S3 can trigger events and send notifications (e.g., via Amazon Simple Notification Service) when specific operations occur, such as object creation, deletion, or replication, enabling real-time data processing and integration with other services.
Logging and Analytics: S3 provides logging capabilities to capture detailed access logs for auditing and analysis. It also integrates with services like Amazon CloudWatch and Amazon Athena for monitoring and analytics of S3 data.
Integration with AWS Services: S3 integrates with other AWS services such as AWS Lambda, AWS Glue, and Amazon Redshift, allowing us to build data processing and analytics pipelines.
Cost-Effective Storage Options: S3 offers different storage classes, including Standard, Intelligent-Tiering, Standard-IA (Infrequent Access), One Zone-IA, Glacier, and Glacier Deep Archive. Each class has different durability, availability, and cost characteristics to match specific use cases.
Pros and cons of using AWS S3 as code artifact storage
Pros
Scalability: S3 provides virtually unlimited storage capacity, allowing us to store numerous code artifacts without worrying about running out of space.
Durability: S3 offers high durability, ensuring that our code artifacts are stored safely and protected against data loss. S3 automatically replicates our data across multiple availability zones within a region.
Accessibility: S3 provides easy access to our code artifacts. We can retrieve artifacts using the AWS SDKs, CLI, or API, making it convenient to integrate with our build and deployment processes. However, we must remember that accessing S3 through language-native package managers might require plugins or workarounds.
Versioning: S3 supports versioning, which allows us to keep track of different versions of our code artifacts. This can be helpful when we need to roll back to a previous version or compare changes over time.
Security: S3 offers robust security features such as access control policies, bucket policies, fine-grained access control for both ABAC and RBAC, and encryption options. We can control access to our code artifacts and encrypt them to ensure data privacy and compliance.
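As a rough illustration of the ABAC point above, the sketch below builds an IAM identity policy that lets a principal read only the objects stored under a prefix matching its own tag. The bucket name `example-artifacts`, the tag key `system`, and the per-system prefix layout are assumptions made for this example, not decisions recorded in this document.

```python
import json


def s3_abac_read_policy(bucket: str, tag_key: str = "system") -> dict:
    """Build an identity policy scoped by a principal tag (ABAC).

    Assumes artifacts are stored under "<system>/..." prefixes and that
    each IAM principal carries a tag <tag_key>=<system>. Both are
    hypothetical conventions used only for this sketch.
    """
    # IAM policy variable; IAM substitutes the caller's tag value at request time.
    principal_tag = "${aws:PrincipalTag/" + tag_key + "}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnSystemArtifacts",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/{principal_tag}/*",
            },
            {
                "Sid": "ListOwnSystemPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{principal_tag}/*"]}
                },
            },
        ],
    }


policy = s3_abac_read_policy("example-artifacts")
print(json.dumps(policy, indent=2))
```

With a layout like this, a developer tagged `system=system-a` could read only keys under `system-a/`. The tagging scheme and prefix discipline would still have to be enforced by us.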
Cons
Latency: Retrieving code artifacts from S3 may introduce some latency, especially if we have large files or complex folder structures. This can impact the speed of our build and deployment processes.
Complexity: Setting up and managing S3 as code artifact storage requires some configuration and understanding of AWS services. It may involve configuring access control, permissions, and integrating S3 with our build and deployment tools.
Network Dependency: Storing and retrieving code artifacts from S3 relies on network connectivity. If there are network issues or limitations, it may affect our ability to access the artifacts when needed.
AWS CodeArtifact
AWS CodeArtifact is a fully managed artifact repository service provided by AWS. It offers a secure and scalable solution for managing and storing software packages and dependencies. CodeArtifact helps organizations simplify package management, improve developer productivity, and ensure reliable and efficient software artifact distribution.
Key Features and Capabilities of AWS CodeArtifact
Package Management: CodeArtifact supports popular package formats, including Maven, npm, Python (PyPI), and NuGet. It provides versioning, dependency management, and caching capabilities, allowing us to easily resolve, publish, and manage packages.
Secure and Private Repository: CodeArtifact enables us to create private repositories to securely store and manage our packages. It integrates with AWS Identity and Access Management (IAM), providing granular access control and allowing us to define fine-grained permissions for users and teams.
Artifact Tracing and Auditing: CodeArtifact provides traceability features, allowing us to track the usage of artifacts across deployments and applications. This helps in troubleshooting, auditing, and ensuring accountability for artifact consumption.
Cross-Region Replication: CodeArtifact supports cross-region replication, allowing us to replicate our repositories and packages across multiple AWS regions. This provides improved availability, disaster recovery, and reduced latency for artifact retrieval.
Enhanced Caching: CodeArtifact includes a built-in caching mechanism that can improve build and deployment speeds. It caches frequently accessed packages, reducing the need to retrieve packages from external sources and improving overall performance.
Compliance and Security: CodeArtifact is built with security in mind and meets industry standards and compliance requirements. It supports encryption at rest and in transit, and integrates with AWS CloudTrail for logging and monitoring.
Pros and cons of using AWS CodeArtifact as code artifact storage
Pros
Package Management: AWS CodeArtifact provides a fully managed artifact repository that supports popular package formats such as Maven, npm, Python (PyPI), and NuGet. It simplifies the management of dependencies and allows us to resolve and download packages efficiently.
Integration with AWS Ecosystem: CodeArtifact seamlessly integrates with other AWS services such as AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. This integration enables us to easily incorporate artifact management into our CI/CD workflows.
Fine-grained Access Control: CodeArtifact allows us to define granular access policies to control who can access and publish packages. We can enforce permissions based on AWS Identity and Access Management (IAM) roles and policies (RBAC) or attributes (ABAC), ensuring secure and controlled access to our artifacts.
Artifact Versioning: CodeArtifact supports versioning, allowing us to manage different versions of packages. This helps in tracking changes, rolling back to previous versions, and ensuring reproducibility in our software builds.
Enhanced Caching: CodeArtifact includes a built-in caching mechanism that can improve build and deployment speeds by caching frequently accessed packages. It reduces the need to retrieve packages from external sources, resulting in faster artifact resolution.
Artifact Tracing: CodeArtifact provides a traceability feature that allows us to track the usage of artifacts. We can see which applications and deployments are consuming specific packages, aiding in troubleshooting and auditing.
Cons
AWS Ecosystem Dependency: CodeArtifact is tightly integrated with AWS services. If we have a multi-cloud or hybrid environment, it may not be the ideal solution as it primarily caters to AWS-centric workflows.
Limited Package Formats: Although CodeArtifact supports popular package formats, it may not cover all package types. If we are going to use less common or specialized package formats in the future, we might need to explore alternative solutions.
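Limited Storage Options: CodeArtifact offers a single storage tier billed at a fixed, region-specific price per gigabyte per month of stored data. Unlike S3, it has no cheaper storage classes for rarely accessed artifacts.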
Comparison
| Feature | AWS S3 | AWS CodeArtifact |
|---|---|---|
| Package Management | No native support; general-purpose object storage | Full support; created specifically for package management |
| Access Control and Permissions | Offers access control policies and bucket policies to control access to artifacts, but configuration and maintenance require much more knowledge and resources. | Granular access control through IAM. Supports resource ARNs down to the package-name level. |
| Versioning and Artifact Tracing | Supported; does not provide artifact tracing | Supported; provides artifact tracing |
| Cost | Up to $0.0245/GB/mo | $0.05/GB/mo in the Ireland region (eu-west-1); $0.055/GB/mo in the other European regions; plus $0.065 per 10,000 requests per month to outside repositories |
Package Management:
AWS S3: S3 is a general-purpose object storage service and does not provide native package management capabilities. We would need to implement our own package management solution or rely on third-party tools.
AWS CodeArtifact: CodeArtifact is a fully managed artifact repository that supports popular package formats such as Maven, npm, Python (PyPI), and NuGet. It offers built-in package management features, making it easier to resolve and download packages.
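To give a sense of what "implementing our own package management solution" on S3 could involve, the sketch below maps package coordinates to an object key. The layout `<system>/<format>/<name>/<version>/<file>`, the `Artifact` class, and the `s3_key` function are hypothetical and serve only as an illustration. With S3 versioning enabled, repeated uploads to the same key would additionally be retained as object versions.

```python
from dataclasses import dataclass

# Formats taken from the requirements list; the identifiers are assumptions.
ALLOWED_FORMATS = {"maven", "npm", "pypi", "zip"}


@dataclass(frozen=True)
class Artifact:
    system: str          # owning system, e.g. "system-a"
    package_format: str  # one of ALLOWED_FORMATS
    name: str
    version: str
    filename: str


def s3_key(artifact: Artifact) -> str:
    """Return the object key for an artifact under the assumed layout."""
    if artifact.package_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {artifact.package_format}")
    parts = [
        artifact.system,
        artifact.package_format,
        artifact.name,
        artifact.version,
        artifact.filename,
    ]
    # Reject empty segments and embedded slashes so keys stay unambiguous.
    if any(not part or "/" in part for part in parts):
        raise ValueError("key segments must be non-empty and contain no '/'")
    return "/".join(parts)


example = Artifact("system-a", "maven", "core-lib", "1.2.0", "core-lib-1.2.0.jar")
print(s3_key(example))
```

Keeping the system name as the first key segment is a deliberate choice in this sketch: it lets prefix-based access rules separate one system's artifacts from another's.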
Access Control and Permissions:
AWS S3: S3 offers access control policies and bucket policies to control access to artifacts. However, it requires more manual configuration to manage fine-grained permissions and access for specific artifacts.
AWS CodeArtifact: CodeArtifact provides granular access control based on IAM roles and policies. It offers more streamlined and fine-grained permissions management for artifact access and publishing.
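The package-level granularity can be sketched as follows: the code builds a CodeArtifact package ARN and wraps it in an IAM policy statement that allows publishing versions of that one package. The region, account ID, domain, repository, and package names are placeholders invented for the example, and the choice of the `codeartifact:PublishPackageVersion` action is only one possible illustration.

```python
import json


def package_arn(
    region: str,
    account_id: str,
    domain: str,
    repository: str,
    package_format: str,
    name: str,
    namespace: str = "",
) -> str:
    """Build a CodeArtifact package ARN.

    The namespace segment stays empty for packages without one
    (for example an unscoped npm package), which yields "//" in the ARN.
    """
    return (
        f"arn:aws:codeartifact:{region}:{account_id}:package/"
        f"{domain}/{repository}/{package_format}/{namespace}/{name}"
    )


def publish_policy(arn: str) -> dict:
    """Allow publishing versions of a single package (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "codeartifact:PublishPackageVersion",
                "Resource": arn,
            }
        ],
    }


# All identifiers below are placeholders, not real resources.
arn = package_arn(
    "eu-central-1", "111122223333", "example-domain", "system-a", "npm", "core-lib"
)
print(json.dumps(publish_policy(arn), indent=2))
```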
Versioning and Artifact Tracing:
AWS S3: S3 supports versioning, allowing us to store and manage different versions of artifacts. However, it does not provide native artifact tracing capabilities.
AWS CodeArtifact: CodeArtifact supports artifact versioning and also provides artifact tracing features. We can track which applications and deployments are consuming specific packages, aiding in troubleshooting and auditing.
Cost:
AWS S3: up to $0.0245/GB per month
AWS CodeArtifact: $0.055/GB per month in Europe (Frankfurt)
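For a quick sense of scale, the storage prices above can be turned into monthly figures. This is a minimal sketch that uses only the two per-GB prices quoted in this document. It ignores request charges, data transfer, and S3 storage classes other than the quoted one, and the volumes are arbitrary examples.

```python
# Prices per GB per month, as quoted in this document.
S3_PRICE = 0.0245
CODEARTIFACT_PRICE = 0.055  # Europe (Frankfurt)


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Return the monthly storage cost in USD for the given volume."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * price_per_gb


# Arbitrary example volumes, not measurements of our artifact footprint.
for size_gb in (10, 100, 1000):
    s3 = monthly_storage_cost(size_gb, S3_PRICE)
    ca = monthly_storage_cost(size_gb, CODEARTIFACT_PRICE)
    print(f"{size_gb:>5} GB: S3 ${s3:.2f}/mo, CodeArtifact ${ca:.2f}/mo")

ratio = CODEARTIFACT_PRICE / S3_PRICE
print(f"CodeArtifact / S3 price ratio: {ratio:.2f}")
```

At these prices the relative difference is a bit over 2x, but the absolute difference stays small unless the stored volume is large.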
Conclusion
AWS S3 is a cheap, general-purpose object storage service. It will suit our needs, but it was not made for artifact storage. Choosing AWS S3 will therefore require some tinkering on our side to make package managers capable of downloading packages directly from S3 (e.g. Maven requires a specific plugin). Additionally, configuring fine-grained access control is much more complex than in AWS CodeArtifact.
By contrast, AWS CodeArtifact was created with artifact storage in mind. It supports the most common package management systems (Maven, npm, Python) out of the box. If something isn't supported, we can upload the dependency as a ZIP file. Additionally, it supports fine-grained access control through IAM: we can specify resources by ARN down to the package-name level. However, we must keep in mind that AWS CodeArtifact is roughly twice as expensive as AWS S3 per gigabyte stored.