S3 provides developers and IT teams with secure, durable, highly scalable object storage. It is easy to use, with a simple web service interface to store and retrieve any amount of data from anywhere on the web.
- Object-based storage: objects are flat files (PDF, PNG, DOC, etc.).
- Data is spread across multiple devices and facilities; it is designed to withstand failure.
- Allows you to upload files from 0 bytes to 5 terabytes
- Unlimited storage
- Files are stored in buckets (similar to folders).
- S3 has a universal namespace: bucket names must be globally unique.
- For example, in https://s3-eu-west-1.amazonaws.com/fahadjameel:
- s3 -> service
- eu-west-1 -> region
- fahadjameel -> bucket name
- A successful upload returns an HTTP 200 response.
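The pieces above assemble into a full endpoint; a small sketch showing both the legacy path-style URL and the virtual-hosted-style URL (using the example bucket name from these notes):

```python
# Build S3 URLs from the service, region, and bucket-name parts
# described above.
service = "s3"
region = "eu-west-1"
bucket = "fahadjameel"

# Legacy path-style URL: https://s3-<region>.amazonaws.com/<bucket>
path_style = f"https://{service}-{region}.amazonaws.com/{bucket}"
print(path_style)  # https://s3-eu-west-1.amazonaws.com/fahadjameel

# Virtual-hosted-style URL: https://<bucket>.s3-<region>.amazonaws.com
virtual_hosted = f"https://{bucket}.{service}-{region}.amazonaws.com"
print(virtual_hosted)  # https://fahadjameel.s3-eu-west-1.amazonaws.com
```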
Data consistency model
When you PUT a new object into S3 (upload), you get read-after-write consistency: you can read it right away. When you overwrite (PUT to an existing key) or delete an object, you only get eventual consistency, because S3 is distributed across multiple devices and locations; if you update an object and read it immediately afterwards, you might get the new data or the old. Updates are atomic: you get either the new version or the old version, never partial or corrupted data.
Simple key-value store
S3 is object-based and consists of:
- Key – the name of the object (keys are stored in lexicographical order).
- Value – the data itself, made up of a sequence of bytes.
- Version Id – versioning
- Metadata – data about data
- Sub-resources – exist underneath an object; there are two:
- Access Control List – who has access to this object or bucket
- Torrent – supports the BitTorrent protocol
- Built for 99.99% availability of the S3 platform, and Amazon guarantees it.
- Amazon guarantees 99.999999999% (11 nines) durability for information stored in S3.
- Tiered storage classes are available.
- Lifecycle management – e.g. after 30 days, you can automatically move files to a different tier.
- Secure data using Access Control List and Bucket Policies.
- S3 Standard:
- 99.99% availability.
- Data is stored redundantly across multiple devices in multiple facilities.
- Designed to sustain the loss of 2 facilities concurrently.
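To make the 11-nines durability figure concrete, a quick back-of-the-envelope calculation (the ten-million object count is an arbitrary illustration, not an AWS figure):

```python
# 11 nines of durability = probability 0.99999999999 that a given
# object survives a year.
durability = 0.99999999999
annual_loss_prob = 1 - durability  # about 1e-11

objects_stored = 10_000_000  # hypothetical: ten million objects
expected_losses_per_year = objects_stored * annual_loss_prob
print(expected_losses_per_year)  # ~0.0001, i.e. roughly one object per 10,000 years
```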
- S3 – IA (Infrequent Access):
- For data that is accessed less frequently but requires rapid access when needed.
- 99.9% availability.
- Lower storage fee than S3 Standard, but you are charged a retrieval fee.
- Reduced Redundancy Storage:
- For data that is easily reproducible.
- Durability drops to 99.99%, with 99.99% availability of objects over a given year.
- Very cheap
- Glacier:
- Extremely low cost storage service, used for data archival; storage as low as $0.01 per GB.
- Optimized for data that is infrequently accessed.
- Retrieval takes 3-5 hours.
S3 charges
You are charged for the amount of storage used and the number of requests, and for the following:
- Storage Management – you can add tags to uploaded objects, which allows you to track S3 costs per tag.
- Data Transfer – incoming data is free; moving data around within S3 (e.g. between regions) is not.
- Transfer Acceleration – enables fast transfer of files over long distances between end users and S3. It takes advantage of Amazon CloudFront's edge locations: as data arrives at an edge location, it is routed to S3 over an optimized network path.
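Transfer Acceleration is enabled per bucket, and clients use it by pointing at the bucket's accelerate endpoint instead of the regional one; a small sketch (reusing the example bucket name from earlier in these notes):

```python
bucket = "fahadjameel"

# Regular regional endpoint (virtual-hosted-style):
standard = f"https://{bucket}.s3-eu-west-1.amazonaws.com"

# Transfer Acceleration endpoint: requests hit the nearest CloudFront
# edge location and travel to S3 over an optimized network path.
accelerate = f"https://{bucket}.s3-accelerate.amazonaws.com"
print(accelerate)  # https://fahadjameel.s3-accelerate.amazonaws.com
```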
Encryption
- Client Side – encrypt everything locally, then upload.
- Server Side – three options:
- SSE-S3 – Server-Side Encryption with Amazon S3 managed keys
- SSE-KMS – Server-Side Encryption with AWS KMS managed keys
- SSE-C – Server-Side Encryption with customer-provided keys
- Control access to buckets using either bucket ACLs or bucket policies.
- By default, all buckets are private, and all objects stored inside them are private.
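A bucket policy is a JSON document attached to the bucket. A minimal sketch of one that grants public read access to every object (the bucket name is hypothetical, and this is an illustration of the policy shape, not a recommendation; buckets are best left private):

```python
import json

# Hypothetical bucket name, for illustration only.
bucket = "example-bucket"

# Minimal public-read bucket policy (sketch).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}
print(json.dumps(policy, indent=2))
```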
Versioning
- Once versioning is turned on, it cannot be disabled, only suspended.
- If a file is updated, both the old and the new versions are kept. This can use up space rapidly if you have many files that are frequently updated.
- MFA Delete – prevents objects from being accidentally deleted; adds another layer of security.
- All objects are stored, even deleted ones.
- Great for backing up items.
- Integrates easily with lifecycle rules.
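The versioning-plus-lifecycle combination above can be sketched as a rule that expires noncurrent versions, again as a Python dict in the shape boto3's `put_bucket_lifecycle_configuration` accepts (rule ID and the 90-day window are hypothetical):

```python
# Sketch of a lifecycle rule that cleans up old object versions,
# in the shape accepted by boto3's put_bucket_lifecycle_configuration.
lifecycle = {
    "Rules": [
        {
            "ID": "expire-old-versions",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},      # apply to the whole bucket
            # Permanently remove versions 90 days after they stop
            # being the current version.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        }
    ]
}
```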