Add S3File input type
di committed Jul 31, 2015
1 parent 99b1476 commit 5502cc1
Showing 5 changed files with 135 additions and 55 deletions.
106 changes: 64 additions & 42 deletions README.rst
@@ -10,18 +10,15 @@ file.
 Features
 --------

-- **Write validation schemas in plain-old Python**
+**Write validation schemas in plain-old Python**
+No UI, no XML, no JSON, just code.

-No UI, no XML, no JSON, just code.
+**Write your own validators**
+Vladiate comes with a few by default, but there's no reason you can't write
+your own.

-- **Write your own validators**
-
-Vladiate comes with a few by default, but there's no reason you can't
-write your own.
-
-- **Validate multiple files at once**
-
-Either with the same schema, or different ones.
+**Validate multiple files at once**
+Either with the same schema, or different ones.

 Documentation
 -------------
Expand Down Expand Up @@ -188,63 +185,88 @@ Built-in Validators

Vladiate comes with a few common validators built-in:

- *class* ``Validator``
*class* ``Validator``

Generic validator. Should be subclassed by any custom validators. Not to
be used directly.

*class* ``CastValidator``

Generic "can-be-cast-to-x" validator. Should be subclassed by any
cast-test validator. Not to be used directly.

*class* ``IntValidator``

Validates whether a field can be cast to an ``int`` type or not.

:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.

Generic validator. Should be subclassed by any custom validators. Not to
be used directly.
*class* ``FloatValidator``

- *class* ``CastValidator``
Validates whether a field can be cast to an ``float`` type or not.

Generic "can-be-cast-to-x" validator. Should be subclassed by any
cast-test validator. Not to be used directly.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.

- *class* ``IntValidator``
*class* ``SetValidator``

Validates whether a field can be cast to an ``int`` type or not.
Validates whether a field is in the specified set of possible fields.

- ``empty_ok=False``
:``valid_set=[]``:
List of valid possible fields
:``empty_ok=False``:
Implicity adds the empty string to the specified set.

Specify whether a field which is an empty string should be ignored.
*class* ``UniqueValidator``

- *class* ``FloatValidator``
Ensures that a given field is not repeated in any other column. Can
optionally determine "uniqueness" with other fields in the row as well via
``unique_with``.

Validates whether a field can be cast to an ``float`` type or not.
:``unique_with=[]``:
List of field names to make the primary field unique with.

- ``empty_ok=False``
*class* ``EmptyValidator``

Specify whether a field which is an empty string should be ignored.
Ensure that a field is always empty. Essentially the same as an empty
``SetValidator``. This is used by default when a field has no
validators.

- *class* ``SetValidator``
*class* ``Ignore``

Validates whether a field is in the specified set of possible fields.
Always passes validation. Used to explicity ignore a given column.

- ``valid_set=[]``
Built-in Input Types
^^^^^^^^^^^^^^^^^^^^

List of valid possible fields
Vladiate comes with the following input types:

- ``empty_ok=False``
*class* ``VladInput``

Implicity adds the empty string to the specified set.
Generic input. Should be subclassed by any custom inputs. Not to be used
directly.

- *class* ``UniqueValidator``
*class* ``LocalFile``

Ensures that a given field is not repeated in any other column. Can
optionally determine "uniqueness" with other fields in the row as well
via ``unique_with``.
Read from a file local to the filesystem.

- ``unique_with=[]``
:``filename``:
Path to a local CSV file.

List of field names to make the primary field unique with.
*class* ``S3File``

- *class* ``EmptyValidator``
Read from a file in S3. Uses the `boto <https://github.com/boto/boto>`_
library. Optionally can specify either a full path, or a bucket/key pair.

Ensure that a field is always empty. Essentially the same as an empty
``SetValidator``. This is used by default when a field has no
validators.
:``path=None``:
A full S3 filepath (e.g., ``s3://foo.bar/path/to/file.csv``)

- *class* ``Ignore``
:``bucket=None``:
S3 bucket. Must be specified with a ``key``.

Always passes validation. Used to explicity ignore a given column.
:``key=None``:
S3 key. Must be specified with a ``bucket``.

Testing
~~~~~~~
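
The ``unique_with`` option described in the README text above can be illustrated with a small standalone sketch. Everything here is a stand-in written for illustration: `UniqueSketch`, `count_failures`, and the local `ValidationException` are hypothetical names, not Vladiate's own classes, and the code is written for Python 3 even though the commit targets Python 2. The only things taken from the commit are the `validate(field, row=row)` call shape and the idea of counting failures.

```python
class ValidationException(Exception):
    """Stand-in for the exception a validator raises on bad input."""


class UniqueSketch(object):
    """Hypothetical illustration of the documented uniqueness check."""

    def __init__(self, unique_with=None):
        self.unique_with = unique_with or []
        self.seen = set()

    def validate(self, field, row=None):
        row = row or {}
        # The uniqueness key is the field plus any companion fields in the row.
        key = tuple([field] + [row[name] for name in self.unique_with])
        if key in self.seen:
            raise ValidationException("{!r} was already seen".format(key))
        self.seen.add(key)


def count_failures(validator, column, rows):
    """Run one validator over one column and count the rejected rows."""
    failures = 0
    for row in rows:
        try:
            validator.validate(row[column], row=row)
        except ValidationException:
            failures += 1
    return failures


rows = [
    {"id": "1", "region": "us"},
    {"id": "1", "region": "eu"},
    {"id": "1", "region": "us"},
]
```

With these rows, checking `id` alone rejects the second and third rows, while checking `id` together with `region` rejects only the third.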
4 changes: 3 additions & 1 deletion setup.py
@@ -25,10 +25,12 @@ def run_tests(self):
         errno = pytest.main(self.pytest_args)
         sys.exit(errno)

+
 def readme():
     with open('README.rst') as f:
         return f.read()

+
 setup(
     name='vladiate',
     version=version,
@@ -60,7 +62,7 @@ def readme():
     packages=find_packages(exclude=['examples', 'tests']),
     include_package_data=True,
     zip_safe=False,
-    install_requires=[],
+    install_requires=['boto'],
     tests_require=['pytest'],
     cmdclass={'test': PyTest},
     entry_points={
34 changes: 34 additions & 0 deletions vladiate/inputs.py
@@ -1,3 +1,8 @@
+import io
+import boto
+from urlparse import urlparse
+
+
 class VladInput(object):
     ''' A generic input class '''

@@ -22,3 +27,32 @@ def open(self):

     def __repr__(self):
         return "{}('{}')".format(self.__class__.__name__, self.filename)
+
+
+class S3File(VladInput):
+    ''' Read from a file in S3 '''
+
+    def __init__(self, path=None, bucket=None, key=None):
+        if path and not any((bucket, key)):
+            self.path = path
+            parse_result = urlparse(path)
+            self.bucket = parse_result.netloc
+            self.key = parse_result.path
+        elif all((bucket, key)):
+            self.bucket = bucket
+            self.key = key
+            self.path = "s3://{}{}".format(self.bucket, self.key)
+        else:
+            raise ValueError(
+                "Either 'path' argument or 'bucket' and 'key' argument must be set.")
+
+    def open(self):
+        s3 = boto.connect_s3()
+        bucket = s3.get_bucket(self.bucket)
+        key = bucket.new_key(self.key)
+        contents = key.get_contents_as_string()
+        ret = io.BytesIO(bytes(contents))
+        return ret
+
+    def __repr__(self):
+        return "{}('{}')".format(self.__class__.__name__, self.path)
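
The argument handling in `S3File.__init__` can be tried without boto or network access. The sketch below pulls that logic into a hypothetical helper, `resolve_s3_location`, which is not part of the commit. It assumes the intent is for the bucket/key form to yield the same `s3://` path string as the path form, and it imports `urlparse` in a way that works on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as used in the commit


def resolve_s3_location(path=None, bucket=None, key=None):
    """Return (bucket, key, path) from either a full path or a bucket/key pair."""
    if path and not any((bucket, key)):
        # "s3://some.bucket/some/key.csv" -> netloc is the bucket,
        # and the URL path (leading slash included) is the key.
        parsed = urlparse(path)
        return parsed.netloc, parsed.path, path
    if all((bucket, key)):
        return bucket, key, "s3://{}{}".format(bucket, key)
    raise ValueError(
        "Either 'path' argument or 'bucket' and 'key' argument must be set.")
```

Note that under this parsing the key keeps its leading slash, which is why the bucket/key form in the tests passes `'/some/s3/key.csv'` rather than `'some/s3/key.csv'`.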
23 changes: 23 additions & 0 deletions vladiate/test/test_inputs.py
@@ -0,0 +1,23 @@
+import pytest
+
+from ..inputs import *
+
+
+@pytest.mark.parametrize('kwargs', [
+    ({'path':'s3://some.bucket/some/s3/key.csv'}),
+    ({'bucket':'some.bucket', 'key':'/some/s3/key.csv'}),
+])
+def test_float_validator_works(kwargs):
+    S3File(**kwargs)
+
+
+@pytest.mark.parametrize('kwargs', [
+    ({}),
+    ({'path':'s3://some.bucket/some/s3/key.csv', 'bucket':'some.bucket'}),
+    ({'path':'s3://some.bucket/some/s3/key.csv', 'key':'/some/s3/key.csv'}),
+    ({'bucket':'some.bucket'}),
+    ({'key':'/some/s3/key.csv'}),
+])
+def test_float_validator_fails(kwargs):
+    with pytest.raises(ValueError):
+        S3File(**kwargs)
23 changes: 11 additions & 12 deletions vladiate/vlad.py
@@ -49,18 +49,17 @@ def validate(self):
             for field, value in self.validators.iteritems() if not value
         })

-        with self.source.open() as csvfile:
-            reader = csv.DictReader(csvfile)
-            self.missing_fields = set(reader.fieldnames) - set(self.validators)
-            if not self.missing_fields:
-                for line, row in enumerate(reader):
-                    for field_name, field in row.iteritems():
-                        for validator in self.validators[field_name]:
-                            try:
-                                validator.validate(field, row=row)
-                            except ValidationException, e:
-                                self.failures[field_name][line].append(e)
-                                validator.fail_count += 1
+        reader = csv.DictReader(self.source.open())
+        self.missing_fields = set(reader.fieldnames) - set(self.validators)
+        if not self.missing_fields:
+            for line, row in enumerate(reader):
+                for field_name, field in row.iteritems():
+                    for validator in self.validators[field_name]:
+                        try:
+                            validator.validate(field, row=row)
+                        except ValidationException, e:
+                            self.failures[field_name][line].append(e)
+                            validator.fail_count += 1

         if self.missing_fields:
             self.logger.info("\033[1;33m" + "Missing..." + "\033[0m")
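
The validation loop in this hunk only needs an object that `csv.DictReader` can iterate over, so it can be exercised against an in-memory file. The following is a minimal Python 3 sketch under stated assumptions: `run_validation`, `IntSketch`, and the local `ValidationException` are hypothetical stand-ins, and `failures` is assumed to be a nested `defaultdict` keyed by field name and then line index, which is what the `self.failures[field_name][line].append(e)` call suggests.

```python
import csv
import io
from collections import defaultdict


class ValidationException(Exception):
    """Stand-in for the exception raised by a failing validator."""


class IntSketch(object):
    """Hypothetical validator: the field must be castable to int."""

    def __init__(self):
        self.fail_count = 0

    def validate(self, field, row=None):
        try:
            int(field)
        except ValueError:
            raise ValidationException("{!r} is not an int".format(field))


def run_validation(source, validators):
    """Mirror the loop above: find unvalidated columns, then check each cell."""
    reader = csv.DictReader(source)
    # Columns present in the file that have no validators registered.
    missing_fields = set(reader.fieldnames) - set(validators)
    failures = defaultdict(lambda: defaultdict(list))
    if not missing_fields:
        for line, row in enumerate(reader):
            for field_name, field in row.items():
                for validator in validators[field_name]:
                    try:
                        validator.validate(field, row=row)
                    except ValidationException as e:
                        failures[field_name][line].append(e)
                        validator.fail_count += 1
    return missing_fields, failures


validators = {"a": [IntSketch()], "b": [IntSketch()]}
missing, failures = run_validation(io.StringIO("a,b\n1,x\n2,3\n"), validators)
```

Here only the first data row of column `b` fails; if the file has a column with no entry in `validators`, the rows are not checked at all and the column name is returned in `missing_fields`.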
