`e2fyi.utils.aws.s3_stream`¶

Provides S3Stream which represents the data stream to and from S3 buckets.

Module Contents¶

Classes¶

S3Stream

S3Stream represents the data stream of an S3 resource, and provides static methods to convert any Python object into a stream.

e2fyi.utils.aws.s3_stream.StringOrBytes¶

class e2fyi.utils.aws.s3_stream.S3Stream(stream: Union[IO, BinaryIO, io.BytesIO, io.StringIO], content_type: str = 'application/octet-stream')¶

S3Stream represents the data stream of an S3 resource, and provides static methods to convert any Python object into a stream. It is generally used with S3Resource to upload resources to, or download resources from, S3 buckets.

Examples:

import io

import pandas as pd

from e2fyi.utils.aws import S3Stream
from pydantic import BaseModel

# create a s3 stream
stream = S3Stream(io.StringIO("random text"), "text/plain")
print(stream.read())        # prints "random text"
print(stream.content_type)  # prints "text/plain"

# string
stream = S3Stream.from_any("hello world")
print(stream.read())        # prints "hello world"
print(stream.content_type)  # prints "text/plain"

# dict
stream = S3Stream.from_any({"foo": "bar"})
print(stream.read())        # prints '{"foo": "bar"}'
print(stream.content_type)  # prints "application/json"

# pandas dataframe as csv
df = pd.DataFrame([{"key": "a", "value": 1}, {"key": "b", "value": 2}])
stream = S3Stream.from_any(df, index=False)  # do not include index column
print(stream.read())        # prints string as csv format for the dataframe
print(stream.content_type)  # prints "application/csv"

# pandas dataframe as json
stream = S3Stream.from_any(df, output_as="json", orient="records")  # orient dataframe as records
print(stream.read())        # prints string as json list for the dataframe
print(stream.content_type)  # prints "application/json"


# pydantic model
class Person(BaseModel):
    name: str
    age: int
stream = S3Stream.from_any(Person(name="william", age=21))
print(stream.read())        # prints '{"name": "william", "age": 21}'
print(stream.content_type)  # prints "application/json"


# any other python objects
class Pet:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
stream = S3Stream.from_any(Pet(name="kopi", age=1))
print(stream.read())        # prints some binary bytes
print(stream.content_type)  # prints "application/octet-stream"

Creates a new S3Stream.

Args:

stream (Union[IO, BinaryIO, io.BytesIO, io.StringIO]): any stream object.
content_type (str, optional): mime type for the data in the stream. Defaults to “application/octet-stream”.

read(self, size=-1) → StringOrBytes¶: duck-typing for a readable stream.

seek(self, offset: int, whence: int = 0) → int¶

Duck-typing for a readable stream. See https://docs.python.org/3/library/io.html

Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. The default value for whence is SEEK_SET. Values for whence are:

SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive

SEEK_CUR or 1 – current stream position; offset may be negative

SEEK_END or 2 – end of the stream; offset is usually negative

Return the new absolute position.
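The three whence modes can be tried on a plain standard-library stream. This is a minimal sketch that uses io.BytesIO as a stand-in for whatever stream an S3Stream wraps; it assumes the wrapper forwards seek unchanged, which the text above suggests but does not state outright.

```python
import io

# io.BytesIO stands in for the wrapped stream; offsets are in bytes.
buf = io.BytesIO(b"0123456789")

pos_set = buf.seek(3, io.SEEK_SET)   # 3 bytes from the start
byte_at_3 = buf.read(1)              # reads one byte, position moves to 4

pos_cur = buf.seek(-2, io.SEEK_CUR)  # 2 bytes back from position 4
pos_end = buf.seek(-1, io.SEEK_END)  # 1 byte before the end
last_byte = buf.read()               # reads the final byte
```

Each call returns the new absolute position, so pos_set, pos_cur and pos_end can be compared directly with tell().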

seekable(self) → bool¶: Whether the stream is seekable.

tell(self) → int¶: Return the current stream position.

close(self) → 'S3Stream'¶: Close the resource stream.

get_value(self) → StringOrBytes¶: Retrieve the entire contents of the S3Resource.
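"Duck-typing" here means the object only needs to expose the methods a file-like consumer calls. The sketch below is an illustration under that assumption, not the S3Stream source: TypedStream is a hypothetical class that pairs a stream with a content type and forwards the reader methods listed above.

```python
import io
import shutil


class TypedStream:
    """Hypothetical stand-in for a stream wrapper; not the S3Stream source."""

    def __init__(self, stream, content_type="application/octet-stream"):
        self.stream = stream
        self.content_type = content_type

    # Each method simply forwards to the wrapped stream.
    def read(self, size=-1):
        return self.stream.read(size)

    def seek(self, offset, whence=0):
        return self.stream.seek(offset, whence)

    def seekable(self):
        return self.stream.seekable()

    def tell(self):
        return self.stream.tell()


wrapped = TypedStream(io.BytesIO(b"payload"), "text/plain")
sink = io.BytesIO()
shutil.copyfileobj(wrapped, sink)  # only calls wrapped.read(...)
copied = sink.getvalue()
position = wrapped.tell()
```

Because shutil.copyfileobj only calls read on its source, the wrapper can be passed wherever a readable file object is expected while still carrying its content type alongside.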

classmethod from_any(cls, obj: Any, content_type: str = '', output_as='csv', **kwargs) → 'S3Stream'¶

Returns an S3Stream from any Python object.

Args:

obj (Any): any python object.
content_type (str, optional): mime type. Defaults to “”.
output_as (str, optional): format to output as if obj is a pandas object. Defaults to “csv”.
**kwargs: additional keyword arguments to pass to pandas.to_csv, pandas.to_json, json.dumps or joblib.dump methods depending on the input object.

Returns:

S3Stream: S3Stream object.

static from_file(filepath: str, content_type: str = '') → 'S3Stream'¶

Returns an S3Stream from a file. If content_type is not provided, python-magic will be used to infer the mime type from the file data.

Args:

filepath (str): path to the file.
content_type (str, optional): mime type of the file. Defaults to “”.

Returns:

S3Stream: S3Stream object.
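Inferring a mime type "from the file data" generally means inspecting the leading bytes of the file rather than its name. The sketch below is a toy stand-in for python-magic, not a use of it: sniff_content_type and the three-entry SIGNATURES table are hypothetical, and the fallback to “application/octet-stream” is an assumption.

```python
import os
import tempfile

# Hypothetical, deliberately tiny signature table; python-magic knows far more.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_content_type(filepath: str, content_type: str = "") -> str:
    """Return content_type if given, else guess it from the leading bytes."""
    if content_type:
        return content_type
    with open(filepath, "rb") as handle:
        head = handle.read(16)
    for magic_bytes, mime in SIGNATURES:
        if head.startswith(magic_bytes):
            return mime
    return "application/octet-stream"  # assumed fallback


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "report.bin")
    with open(path, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    guessed = sniff_content_type(path)
    explicit = sniff_content_type(path, "text/plain")
```

An explicit content_type wins over the guess, which mirrors the documented behaviour of only inferring when no type is provided.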

classmethod from_object(cls, obj: Union[str, bytes, dict, BaseModel, object], content_type: str = 'application/octet-stream', **kwargs) → 'S3Stream'¶

Returns an S3Stream from any string, bytes, dict, pydantic model, or any other Python object.

Dicts and pydantic models will be converted into a JSON string stream with json.dumps and the content type “application/json”.

Anything that is not a string, bytes, dict, or pydantic model will be converted into a pickle binary stream with joblib.

Any extra keyword arguments will be passed to json.dumps or joblib.

See: https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html#joblib.dump

Args:

obj (Union[str, bytes, dict, pydantic.BaseModel, object]): the string, bytes, dict, pydantic model, or other python object to convert.
content_type (str, optional): mime type of the stream. Defaults to “application/octet-stream”.
**kwargs: additional keyword arguments to pass to joblib.dump or json.dumps.

Returns:

S3Stream: S3Stream object.
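The conversion rules described for from_object can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: to_stream is a hypothetical helper, pickle is swapped in for joblib, the pydantic branch is left out, and the content type chosen for raw bytes is a guess based on the documented default.

```python
import io
import json
import pickle


def to_stream(obj, **kwargs):
    """Hypothetical helper returning (stream, content_type); pickle replaces joblib."""
    if isinstance(obj, str):
        return io.StringIO(obj), "text/plain"
    if isinstance(obj, bytes):
        # Assumed content type for raw bytes.
        return io.BytesIO(obj), "application/octet-stream"
    if isinstance(obj, dict):
        # Extra keyword arguments go to json.dumps.
        return io.StringIO(json.dumps(obj, **kwargs)), "application/json"
    # Fallback: serialize any other object into a binary stream.
    return io.BytesIO(pickle.dumps(obj)), "application/octet-stream"


text_stream, text_type = to_stream("hello world")
json_stream, json_type = to_stream({"foo": "bar"}, sort_keys=True)
blob_stream, blob_type = to_stream((1, 2, 3))
json_text = json_stream.read()
restored = pickle.loads(blob_stream.read())
```

Text inputs end up in a text stream and everything else in a binary stream, which is why a reader has to be prepared for either str or bytes.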

static from_pandas(df: Union[pd.DataFrame, pd.Series], output_as: str = 'csv', **kwargs: dict) → 'S3Stream'¶

Returns an S3Stream object from a pandas dataframe or series. When output_as is “csv”, the content type will be “application/csv”; otherwise it will be “application/json”.

Example:

import pandas as pd

from e2fyi.utils.aws.s3_stream import S3Stream

# create some pandas dataframe
df = pd.DataFrame([...])

# create a csv stream, and don't output an index column.
csv_stream = S3Stream.from_pandas(df, index=False)

# create a json stream - output as records
json_stream = S3Stream.from_pandas(df, output_as="json", orient="records")

Args:

df (Union[pd.DataFrame, pd.Series]): pandas dataframe or series.
output_as (str, optional): either “csv” or “json”. Defaults to “csv”.
**kwargs: additional keyword arguments to pass to either pandas.to_csv or pandas.to_json methods.

Returns:

S3Stream: S3Stream object.
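To make the two output shapes concrete, the sketch below builds them with the standard library instead of pandas. It only approximates what index=False and orient="records" are expected to produce for the two-row table used in the class examples; the exact whitespace pandas emits may differ.

```python
import csv
import io
import json

rows = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]

# CSV without an index column: a header row, then one line per record.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["key", "value"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# "records" orientation: a JSON list with one object per row.
json_text = json.dumps(rows)
```

The CSV form carries the column names once in the header, while the records form repeats them in every object, which is worth keeping in mind for large frames.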

static from_io(stream: Union[IO, BinaryIO, io.StringIO, io.BytesIO], content_type: str = 'application/octet-stream') → 'S3Stream'¶

Returns an S3Stream object from an io stream.

Args:

stream (Union[IO, BinaryIO, io.StringIO, io.BytesIO]): any stream object.
content_type (str, optional): mime type of the stream. Defaults to “application/octet-stream”.

Returns:

S3Stream: S3Stream object.
