Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: partial @streaming blob support for Python #747

Open
wants to merge 719 commits into
base: main-1.x
Choose a base branch
from

Conversation

robin-aws
Copy link
Contributor

@robin-aws robin-aws commented Dec 12, 2024

Adds partial support for the @streaming Smithy trait on blobs, depending on a actions-and-streaming-stdlibs branch of Dafny that adds an Actions standard library, and only implemented when targeting Python so far. Event streaming is not technically supported yet because more codegen changes are necessary, but this will be a relatively small delta, as the Dafny types involved are already handled.

The motivation behind merging this incomplete work is largely to get the test model coverage into CI, so that it doesn't regress as the feature is completed, especially when Dafny verification is perturbed in future versions. The next priority will be to complete the Dafny branch and merge it to Dafny mainline.

Note that smithy-dafny will still emit a DANGER validation event saying that @streaming is not supported, which is intentional in order to guard against using the feature in production before it is fully completed: the test models exercising it explicitly suppress this event.

  • Added a dafny submodule under TestModels/dafny-dependencies/dafny tracking the actions-and-streaming-stdlibs branch, just to access the contents of Source/DafnyStandardLibraries/src/Std/Actions etc. This copy of Dafny is never built or used for any other building tasks.
  • Added a build task under StandardLibrary to build a .doo file from a subset of the Dafny standard library files above (using DafnyStandardLibraries-smithy-dafny-subset.toml). This is then used as a --library argument when verifying/translating polymorph libraries. This approach allows the standard library source to be built with different but compatible options such as --reads-clauses-on-methods.
    • This is guarded by a USE_DAFNY_STANDARD_LIBRARIES Makefile variable, since a newer Dafny is required for this to work.
  • Bumped the high watermark Dafny version tested in CI to 4.9.1, which is necessary since the standard library above uses newer features like @-attributes.
  • Modified Dafny and Python codegen to emit references to the ByteStream and RewindableByteStream types, and removed the previous workaround to first .read() all data in Python.
    • Because streams have mutable state, the Dafny code generation needs to take them into account when generating things like modifies clauses. I haven't yet implemented this, instead using a patch file to add the missing specification to the Streaming test model code.
  • Modified the aws-sdk/s3 test model and fleshed the Streaming test model out to actually test the feature.
  • Removes not-grep, since our globs are actually completely broken and the tool doesn't support our current code layout, and it was complaining about the contents of the dafny submodule. See License header scanning is broken #763.

Reviewer note: it is worth reading the contents of the pending Actions and Streams Dafny libraries, but they are still under construction and I'm not claiming they will not change before being merged. Interface feedback is welcome but is not intended to be a hard requirement in scope for this review.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

lucasmcdonald3 and others added 30 commits May 20, 2024 10:26
…-aws/smithy-dafny into robin-aws/generate-everything-mode

# Conflicts:
#	codegen/smithy-dafny-codegen/src/main/resources/templates/src/Index.dfy
@robin-aws robin-aws changed the title Robin aws/streaming test models python feat: @streaming support for Python (WIP) Jan 16, 2025
@robin-aws robin-aws changed the title feat: @streaming support for Python (WIP) feat: @streaming blob support for Python (WIP) Jan 21, 2025
@robin-aws robin-aws changed the title feat: @streaming blob support for Python (WIP) feat: partial @streaming blob support for Python Jan 21, 2025
@robin-aws robin-aws marked this pull request as ready for review January 21, 2025 18:22
@robin-aws robin-aws requested a review from a team as a code owner January 21, 2025 18:22
Copy link
Contributor

@lucasmcdonald3 lucasmcdonald3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly just reviewed Python.
Nothing blocking, just some suggestions/questions.

This project does not build. The `software.amazon.polymorph.smithydafny` project does not support code generation for the "streaming" trait.

Once the Polymorph code generator supports code generation for this trait, these files should be extended to complete the type implementation.
This project only currently works on Python: the Dafny code generation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also note that someone will need to copy-paste these from the TestModels' StandardLibrary to the MPL's StandardLibrary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan after merging this is actually to merge those so that only the smithy-dafny copy exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Let's chat when you do. Crypto Tools might need to update our release process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure! I don't think it will change much, just that MPL components will depend on the StandardLibrary in the smithy-dafny submodule instead of it's own copy

from smithy_python.interfaces.blobs import ByteStream
from smithy_dafny_standard_library.internaldafny.generated.Std_Streams import ByteStream as DafnyByteStream, RewindableByteStream as DafnyRewindableByteStream
from smithy_dafny_standard_library.internaldafny.generated.Std_Enumerators import Enumerator
from smithy_dafny_standard_library.internaldafny.generated.Std_Wrappers import Option, Option_Some, Option_None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd -- in my local this is just Wrappers. Did something change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the code under the Dafny standard libraries are namespaced under Std::, so this is that copy of Wrappers.dfy rather than the local one from dafny-lang/libraries

return self.dafny_byte_stream.Seek(position)


class StreamingBlobAsDafnyDataStream(DafnyByteStream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for others' clarification maybe:

Suggested change
class StreamingBlobAsDafnyDataStream(DafnyByteStream):
class StreamingBlobAsDafnyDataStream(DafnyByteStream):
"""Wrapper class allowing a native streaming blob to be accessed from Dafny streaming code."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I'll add some more comments in general in fact

return bytes(next.value) if next.is_Some else bytes()


class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream):
class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream):
"""Wrapper class allowing a Dafny byte stream to be accessed and used in native code as a rewindable byte stream."""

@@ -0,0 +1,58 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by the intention behind the implementation.
Should/could these methods extend the generated code (like a typical extern), or are should these methods be in separate classes and not linked to the generated code?
Right now, generated Dafny won't have these methods. But they could, if you want to set that up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comments you left clear this up (and I'll integrate and expand on them and add more comments at a wider scope too): one class wraps the Python streaming concept up with the Dafny-generated one, and the other wraps up the Dafny-generated one with Python's interface. So they are externs in both directions.

from smithy_dafny_standard_library.internaldafny.generated.Std_Wrappers import Option, Option_Some, Option_None


class DafnyByteStreamAsByteStream(ByteStream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to help me understand what's going on -- feel free to discard if these are already clarified elsewhere

Suggested change
class DafnyByteStreamAsByteStream(ByteStream):
class DafnyByteStreamAsByteStream(ByteStream):
"""Wrapper class allowing a Dafny byte stream to be accessed and used in native code as a byte stream."""

self.dafny_byte_stream = dafny_byte_stream

def read(self, size: int = -1) -> bytes:
# TODO: assert size is -1, buffer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest adding an __iter__ method to these classes so they're broadly recognized as streams.
Notably, Pandas will say these classes aren't "file-like" because they don't have an __iter__ method: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.api.types.is_file_like.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed since __iter__ splits on newlines and I think there's no use for that, I'll leave this out for now.

Copy link
Contributor

@seebees seebees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. Are you trying to get this in as is?

nameonly chunkSize: CountingInteger
)
datatype ChunksOutput = | ChunksOutput (
nameonly bytesOut: ByteStream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is from Std.Streams right?
I'm wondering if it would be better to not open these modules? I get why, but coming in and reading the code I have to imagine what is going on.

method CountBits ( config: InternalConfig , input: CountBitsInput )
returns (output: Result<CountBitsOutput, Error>)
{
var counter := new Folder<BoundedInts.bytes, int>(0, (sum, byte) => sum + BytesBitCount(byte));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, This is an unfortunate name. I thought it was a directory at first :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YUP, especially as it's main use is an example of an accumulator I'm definitely renaming it

// SPDX-License-Identifier: Apache-2.0
// Do not modify this file. This file is machine generated, and any changes to it will be overwritten.
include "../Model/SimpleStreamingTypes.dfy"
module SimpleStreamingImpl refines AbstractSimpleStreamingOperations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
module SimpleStreamingImpl refines AbstractSimpleStreamingOperations {
module {:options "/functionSyntax:4" } SimpleStreamingImpl refines AbstractSimpleStreamingOperations {

Comment on lines 28 to 29
// Should really have the Folder fail instead,
// but this is a simpler correct approach.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I would like to see the other approach. For a friend :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the link elsewhere for Errors in general.


}

module Chunker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why you put this here, but it should be in a different file

{
var s: seq<BoundedInts.bytes> := [[0x0], [0x1, 0x2], [0x3], [], [0x4, 0x5]];
var e := new Enumerators.SeqEnumerator(s);
var stream := new EnumeratorDataStream(e, 5 as BoundedInts.uint64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total count of all bytes, which (currently) must be known ahead of time. Added the parameter name at least.

Comment on lines 42 to 43
var s: seq<BoundedInts.bytes> := [[0x0], [0x1, 0x2], [0x3], [], [0x4, 0x5]];
var stream := new Enumerators.SeqEnumerator(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these just vestigial copy/paste?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to block,
but how do we get/pass errors?

There are also no Chunk tests.

I would also like some tests that could show that it really is a stream. Like have a sleep or something? Not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For errors in general I think the Action abstraction can handle anything you need, and I added a big explicit design callout on that: https://github.com/dafny-lang/dafny/pull/6074/files#diff-cb5145d4b584ea7656360f05c946523c48c5651815cea269b469a855844c3ed2R53

For ByteStream specifically I think there does need to be errors in the stream of chunks somehow, and I added a small comment on that one. It will be a little tricky since a Smithy byte stream cannot produce modeled errors in the middle of the stream, only unmodelled things like ConnectionDropped etc. I supposed it will have to be an opaque object just like the service-wide error.

Added the missing Chunk test.

Demonstrating more stream-ness will need more Python adaptors or other languages to land, so I'll leave that for the next batch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only the one test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason, failure to update the wrapped copy. Would love to find a way to not have to manually curate the set of tests to run in either runner.

Comment on lines +72 to +73
assume {:axiom} fresh(MyBody.value.Repr);
assume {:axiom} MyBody.value.Valid();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice having the targeted axioms! Thanks.
The only even more targeted we have found to be a little nicer is to put this in a by construction exactly where it is needed. e.g. Collect() by {... :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants