-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: partial @streaming blob support for Python #747
base: main-1.x
Are you sure you want to change the base?
Conversation
…-aws/smithy-dafny into robin-aws/generate-everything-mode # Conflicts: # codegen/smithy-dafny-codegen/src/main/resources/templates/src/Index.dfy
…aming-test-models
…aming-test-models
…obin-aws/streaming-test-models
…aming-test-models
…obin-aws/streaming-test-models-python # Conflicts: # .github/workflows/smithy-polymorph.yml
…obin-aws/streaming-test-models-python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly just reviewed Python.
Nothing blocking, just some suggestions/questions.
This project does not build. The `software.amazon.polymorph.smithydafny` project does not support code generation for the "streaming" trait. | ||
|
||
Once the Polymorph code generator supports code generation for this trait, these files should be extended to complete the type implementation. | ||
This project only currently works on Python: the Dafny code generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also note that someone will need to copy-paste these from the TestModels' StandardLibrary to the MPL's StandardLibrary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan after merging this is actually to merge those so that only the smithy-dafny copy exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Let's chat when you do. Crypto Tools might need to update our release process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure! I don't think it will change much, just that MPL components will depend on the StandardLibrary in the smithy-dafny submodule instead of it's own copy
from smithy_python.interfaces.blobs import ByteStream | ||
from smithy_dafny_standard_library.internaldafny.generated.Std_Streams import ByteStream as DafnyByteStream, RewindableByteStream as DafnyRewindableByteStream | ||
from smithy_dafny_standard_library.internaldafny.generated.Std_Enumerators import Enumerator | ||
from smithy_dafny_standard_library.internaldafny.generated.Std_Wrappers import Option, Option_Some, Option_None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd -- in my local this is just Wrappers
. Did something change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the code under the Dafny standard libraries are namespaced under Std::
, so this is that copy of Wrappers.dfy rather than the local one from dafny-lang/libraries
return self.dafny_byte_stream.Seek(position) | ||
|
||
|
||
class StreamingBlobAsDafnyDataStream(DafnyByteStream): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for others' clarification maybe:
class StreamingBlobAsDafnyDataStream(DafnyByteStream): | |
class StreamingBlobAsDafnyDataStream(DafnyByteStream): | |
"""Wrapper class allowing a native streaming blob to be accessed from Dafny streaming code.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I'll add some more comments in general in fact
return bytes(next.value) if next.is_Some else bytes() | ||
|
||
|
||
class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream): | |
class RewindableDafnyByteStreamAsByteStream(DafnyByteStreamAsByteStream): | |
"""Wrapper class allowing a Dafny byte stream to be accessed and used in native code as a rewindable byte stream.""" |
@@ -0,0 +1,58 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by the intention behind the implementation.
Should/could these methods extend the generated code (like a typical extern), or are should these methods be in separate classes and not linked to the generated code?
Right now, generated Dafny won't have these methods. But they could, if you want to set that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comments you left clear this up (and I'll integrate and expand on them and add more comments at a wider scope too): one class wraps the Python streaming concept up with the Dafny-generated one, and the other wraps up the Dafny-generated one with Python's interface. So they are externs in both directions.
from smithy_dafny_standard_library.internaldafny.generated.Std_Wrappers import Option, Option_Some, Option_None | ||
|
||
|
||
class DafnyByteStreamAsByteStream(ByteStream): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments to help me understand what's going on -- feel free to discard if these are already clarified elsewhere
class DafnyByteStreamAsByteStream(ByteStream): | |
class DafnyByteStreamAsByteStream(ByteStream): | |
"""Wrapper class allowing a Dafny byte stream to be accessed and used in native code as a byte stream.""" |
self.dafny_byte_stream = dafny_byte_stream | ||
|
||
def read(self, size: int = -1) -> bytes: | ||
# TODO: assert size is -1, buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest adding an __iter__
method to these classes so they're broadly recognized as streams.
Notably, Pandas will say these classes aren't "file-like" because they don't have an __iter__
method: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.api.types.is_file_like.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed since __iter__
splits on newlines and I think there's no use for that, I'll leave this out for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good. Are you trying to get this in as is?
nameonly chunkSize: CountingInteger | ||
) | ||
datatype ChunksOutput = | ChunksOutput ( | ||
nameonly bytesOut: ByteStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is from Std.Streams
right?
I'm wondering if it would be better to not open these modules? I get why, but coming in and reading the code I have to imagine what is going on.
method CountBits ( config: InternalConfig , input: CountBitsInput ) | ||
returns (output: Result<CountBitsOutput, Error>) | ||
{ | ||
var counter := new Folder<BoundedInts.bytes, int>(0, (sum, byte) => sum + BytesBitCount(byte)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, This is an unfortunate name. I thought it was a directory at first :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YUP, especially as it's main use is an example of an accumulator I'm definitely renaming it
// SPDX-License-Identifier: Apache-2.0 | ||
// Do not modify this file. This file is machine generated, and any changes to it will be overwritten. | ||
include "../Model/SimpleStreamingTypes.dfy" | ||
module SimpleStreamingImpl refines AbstractSimpleStreamingOperations { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
module SimpleStreamingImpl refines AbstractSimpleStreamingOperations { | |
module {:options "/functionSyntax:4" } SimpleStreamingImpl refines AbstractSimpleStreamingOperations { |
// Should really have the Folder fail instead, | ||
// but this is a simpler correct approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I would like to see the other approach. For a friend :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the link elsewhere for Errors in general.
|
||
} | ||
|
||
module Chunker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get why you put this here, but it should be in a different file
{ | ||
var s: seq<BoundedInts.bytes> := [[0x0], [0x1, 0x2], [0x3], [], [0x4, 0x5]]; | ||
var e := new Enumerators.SeqEnumerator(s); | ||
var stream := new EnumeratorDataStream(e, 5 as BoundedInts.uint64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total count of all bytes, which (currently) must be known ahead of time. Added the parameter name at least.
var s: seq<BoundedInts.bytes> := [[0x0], [0x1, 0x2], [0x3], [], [0x4, 0x5]]; | ||
var stream := new Enumerators.SeqEnumerator(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these just vestigial copy/paste?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to block,
but how do we get/pass errors?
There are also no Chunk tests.
I would also like some tests that could show that it really is a stream. Like have a sleep or something? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For errors in general I think the Action abstraction can handle anything you need, and I added a big explicit design callout on that: https://github.com/dafny-lang/dafny/pull/6074/files#diff-cb5145d4b584ea7656360f05c946523c48c5651815cea269b469a855844c3ed2R53
For ByteStream specifically I think there does need to be errors in the stream of chunks somehow, and I added a small comment on that one. It will be a little tricky since a Smithy byte stream cannot produce modeled errors in the middle of the stream, only unmodelled things like ConnectionDropped etc. I supposed it will have to be an opaque object
just like the service-wide error.
Added the missing Chunk test.
Demonstrating more stream-ness will need more Python adaptors or other languages to land, so I'll leave that for the next batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only the one test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No good reason, failure to update the wrapped copy. Would love to find a way to not have to manually curate the set of tests to run in either runner.
assume {:axiom} fresh(MyBody.value.Repr); | ||
assume {:axiom} MyBody.value.Valid(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice having the targeted axioms! Thanks.
The only even more targeted we have found to be a little nicer is to put this in a by
construction exactly where it is needed. e.g. Collect() by {...
:)
Adds partial support for the
@streaming
Smithy trait on blobs, depending on aactions-and-streaming-stdlibs
branch of Dafny that adds an Actions standard library, and only implemented when targeting Python so far. Event streaming is not technically supported yet because more codegen changes are necessary, but this will be a relatively small delta, as the Dafny types involved are already handled.The motivation behind merging this incomplete work is largely to get the test model coverage into CI, so that it doesn't regress as the feature is completed, especially when Dafny verification is perturbed in future versions. The next priority will be to complete the Dafny branch and merge it to Dafny mainline.
Note that
smithy-dafny
will still emit aDANGER
validation event saying that@streaming
is not supported, which is intentional in order to guard against using the feature in production before it is fully completed: the test models exercising it explicitly suppress this event.dafny
submodule underTestModels/dafny-dependencies/dafny
tracking theactions-and-streaming-stdlibs
branch, just to access the contents ofSource/DafnyStandardLibraries/src/Std/Actions
etc. This copy of Dafny is never built or used for any other building tasks.StandardLibrary
to build a .doo file from a subset of the Dafny standard library files above (usingDafnyStandardLibraries-smithy-dafny-subset.toml
). This is then used as a--library
argument when verifying/translating polymorph libraries. This approach allows the standard library source to be built with different but compatible options such as--reads-clauses-on-methods
.USE_DAFNY_STANDARD_LIBRARIES
Makefile variable, since a newer Dafny is required for this to work.@
-attributes.ByteStream
andRewindableByteStream
types, and removed the previous workaround to first.read()
all data in Python.modifies
clauses. I haven't yet implemented this, instead using a patch file to add the missing specification to theStreaming
test model code.aws-sdk/s3
test model and fleshed theStreaming
test model out to actually test the feature.dafny
submodule. See License header scanning is broken #763.Reviewer note: it is worth reading the contents of the pending Actions and Streams Dafny libraries, but they are still under construction and I'm not claiming they will not change before being merged. Interface feedback is welcome but is not intended to be a hard requirement in scope for this review.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.