e2fyi.utils.aws.s3

Utilities for interacting with S3 buckets.

Module Contents

Classes

S3Bucket

S3Bucket is an abstraction of the actual S3 bucket with methods to interact with the actual S3 bucket, and some utility methods.

e2fyi.utils.aws.s3.T
e2fyi.utils.aws.s3.StringOrBytes
e2fyi.utils.aws.s3.ALLOWED_DOWNLOAD_ARGS = ['VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer']
e2fyi.utils.aws.s3.ALLOWED_UPLOAD_ARGS = ['ACL', 'CacheControl', 'ContentDisposition', 'ContentEncoding', 'ContentLanguage', 'ContentType', 'Expires', 'GrantFullControl', 'GrantRead', 'GrantReadACP', 'GrantWriteACP', 'Metadata', 'RequestPayer', 'ServerSideEncryption', 'StorageClass', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'SSEKMSKeyId', 'WebsiteRedirectLocation']
class e2fyi.utils.aws.s3.S3Bucket(name: str, get_prefix: Callable[[str], str] = _noop, s3client: boto3.client = None)

S3Bucket is an abstraction of the actual S3 bucket with methods to interact with the actual S3 bucket (e.g. list objects inside the bucket), and some utility methods.

Prefix rules can also be set when the S3Bucket object is created, i.e. to enforce a particular prefix rule for a particular bucket.

Example:

from e2fyi.utils.aws import S3Bucket

# prints key for all resources with prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/"):
    print(resource.key)

# prints key for the first 2,000 resources with prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/", max_objects=2000):
    print(resource.key)

# creates a s3 bucket with prefix rule
prj_bucket = S3Bucket(
    "some_bucket", get_prefix=lambda prefix: "prj-a/%s" % prefix
)
for resource in prj_bucket.list("some_folder/"):
    print(resource.key)  # prints "prj-a/some_folder/<resource_name>"

# get obj key in the bucket
print(prj_bucket.create_resource_key("foo.json"))  # prints "prj-a/foo.json"

# get obj uri in the bucket
# prints "s3a://some_bucket/prj-a/foo.json"
print(prj_bucket.create_resource_uri("foo.json", "s3a://"))

# create S3Resource in bucket to read in
foo = prj_bucket.create_resource("foo.json", "application/json")
# read "s3a://some_bucket/prj-a/foo.json" and load as a dict (or list)
foo_dict = foo.load()

# create S3Resource in bucket and save to "s3a://some_bucket/prj-a/foo.json"
prj_bucket.create_resource("foo.json", obj={"foo": "bar"}).save()

Creates a new instance of s3 bucket.

Args:

name (str): name of the bucket.

get_prefix (Callable[[str], str], optional): function that takes a filename and returns the full path to the resource in the bucket.

s3client (boto3.Session.client, optional): use a custom boto3 s3 client.
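The get_prefix argument is any callable that maps a filename to a key. As a minimal sketch of one way to build such a callable, the hypothetical helper make_project_prefix below (not part of e2fyi) returns a prefix function for a single project and guards against doubled slashes:

```python
from typing import Callable


def make_project_prefix(project: str) -> Callable[[str], str]:
    """Hypothetical helper: build a get_prefix callable for one project."""
    # Normalise the project name so it never starts or ends with "/".
    root = project.strip("/")

    def get_prefix(filename: str) -> str:
        # Drop leading slashes so the resulting key never contains "//".
        return "%s/%s" % (root, filename.lstrip("/"))

    return get_prefix


get_prefix = make_project_prefix("prj-a")
print(get_prefix("foo.json"))       # prj-a/foo.json
print(get_prefix("/some_folder/"))  # prj-a/some_folder/
```

The resulting callable should be usable wherever a lambda is shown in the examples above, e.g. S3Bucket("some_bucket", get_prefix=make_project_prefix("prj-a")).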

create_resource_key(self, filename: str) → str

Creates a resource key based on the configured prefix.

Example:

from e2fyi.utils.aws import S3Bucket

s3 = S3Bucket(name="foo", get_prefix=lambda x: "bar/%s" % x)

print(s3.create_resource_key("hello.world"))  # > bar/hello.world
Args:

filename (str): path for the file.

Returns:

str: key for the resource in s3.

create_resource_uri(self, filename: str, protocol: str = 's3a://') → str

Creates a resource URI based on the S3 bucket name and the configured prefix.

Example:

from e2fyi.utils.aws import S3Bucket

s3 = S3Bucket(name="foo", get_prefix=lambda x: "bar/%s" % x)

print(s3.create_resource_uri("hello.world"))  # > s3a://foo/bar/hello.world
Args:

filename (str): path for the file.

protocol (str, optional): protocol for the uri. Defaults to “s3a://”.

Returns:

str: uri string for the resource.
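The example output above suggests the URI is the protocol, the bucket name, and the prefixed key joined together. The following stand-in function is only a sketch of that composition under that assumption; sketch_resource_uri is a hypothetical name and this is not the library's implementation:

```python
from typing import Callable


def sketch_resource_uri(
    bucket: str,
    filename: str,
    get_prefix: Callable[[str], str] = lambda name: name,
    protocol: str = "s3a://",
) -> str:
    """Assumed composition: <protocol><bucket>/<prefixed key>."""
    # The key is whatever the prefix rule returns for the filename.
    key = get_prefix(filename)
    return "%s%s/%s" % (protocol, bucket, key)


def bar_prefix(name: str) -> str:
    # Same rule as the lambda used in the example above.
    return "bar/%s" % name


print(sketch_resource_uri("foo", "hello.world", bar_prefix))
# s3a://foo/bar/hello.world
print(sketch_resource_uri("foo", "hello.world", bar_prefix, protocol="s3://"))
# s3://foo/bar/hello.world
```

Changing only the protocol argument is enough to produce URIs for tools that expect "s3://" rather than "s3a://".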

list(self, prefix: str = '', within_project: bool = True, max_objects: int = -1) → Generator[S3Resource[bytes], None, None]

Returns a generator that yields the S3Resource objects inside the S3Bucket that match the provided prefix.

Example:

from e2fyi.utils.aws import S3Bucket

# prints key for all resources with prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/"):
    print(resource.key)

# prints key for the first 2,000 resources with prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/", max_objects=2000):
    print(resource.key)

# creates a s3 bucket with prefix rule
prj_bucket = S3Bucket(
    "some_bucket", get_prefix=lambda prefix: "prj-a/%s" % prefix
)
for resource in prj_bucket.list("some_folder/"):
    print(resource.key)  # prints "prj-a/some_folder/<resource_name>"
Args:

prefix (str, optional): prefix that the returned resources must match. Defaults to “”.

within_project (bool, optional): [description]. Defaults to True.

max_objects (int, optional): max number of objects to return. Negative or zero means all objects will be returned. Defaults to -1.

Returns:

Generator[S3Resource[bytes], None, None]: generator over the S3Resource objects in the bucket that match the prefix.

Yields:

S3Resource[bytes]: a resource in the bucket that matches the prefix.
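The max_objects rule (negative or zero means no limit) can be illustrated without touching S3. The sketch below applies the same rule to any iterable; take is a hypothetical helper written for this illustration and is not how the library necessarily implements the cap:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take(items: Iterable[T], max_objects: int = -1) -> Iterator[T]:
    """Yield at most max_objects items; zero or negative means all items."""
    if max_objects <= 0:
        # No cap requested: pass every item through lazily.
        yield from items
    else:
        # islice stops pulling from the source once the cap is reached.
        yield from islice(items, max_objects)


keys = ("some_folder/%d.json" % i for i in range(5))
print(list(take(keys, max_objects=2)))
# ['some_folder/0.json', 'some_folder/1.json']
```

Because the helper is itself a generator, nothing is fetched from the source until the caller starts iterating, which is the same lazy behaviour the Yields section describes.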

create_resource(self, filename: str, content_type: str = '', obj: Any = None, protocol: str = 's3a://', metadata: Dict[str, str] = None, pandas_kwargs: dict = None, **kwargs) → S3Resource

Creates a new instance of S3Resource bound to the current bucket.

Example:

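from e2fyi.utils.aws import S3Bucket

prj_bucket = S3Bucket(
    "some_bucket", get_prefix=lambda prefix: "prj-a/%s" % prefix
)

# create S3Resource in bucket to read in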
foo = prj_bucket.create_resource("foo.json", "application/json")
# read "s3a://some_bucket/prj-a/foo.json" and load as a dict (or list)
foo_dict = foo.load()

# create S3Resource in bucket and save to "s3a://some_bucket/prj-a/foo.json"
prj_bucket.create_resource("foo.json", obj={"foo": "bar"}).save()
Args:

filename (str): name of the resource.

content_type (str, optional): mime type. Defaults to “application/octet-stream”.

obj (Any, optional): python object to convert into a resource. Defaults to None.

protocol (str, optional): protocol. Defaults to “s3a://”.

stream (Union[io.StringIO, io.BytesIO, IO[StringOrBytes]], optional): content of the resource. Defaults to None.

metadata (dict, optional): metadata for the object. Defaults to None.

pandas_kwargs: Any additional args to pass to pandas.

**kwargs: Any additional args to pass to S3Resource.

Returns:

S3Resource: an S3Resource related to the active S3Bucket.

upload(self, filepath: str, body: Any, content_type: str = 'text/plain', protocol: str = 's3a://', metadata: dict = None, **kwargs) → Any

Deprecated since v0.2.0. Use S3Resource.save instead.