-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support chunked_vector and other json/httpd changes #2647
base: master
Are you sure you want to change the base?
Conversation
In the exception handler we accessed exception.message which doesn't exist in general, so this would throw a new exception obscuring the true error. Fix it by just using str(e).
Factor out some common code for constructing the URL and doing the GET into _do_query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
"description": "Whether to return the response as a stream_object", | ||
"required": true, | ||
"allowMultiple": false, | ||
"type": "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was about to suggest to use "boolean". but ironically, it turns out that the "string" type with enum contraints is indeed simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to use boolean if you prefer. I actually used enum since I did a quick check and dind't see any existing uses of boolean but looking at seastar-json2code.py a boolean
type is indeed supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's less surprising to have a boolean here.
@amnonh hi Amnon, could you please take a look as well? |
if (&rhs != this) { | ||
clear(); | ||
std::copy_n(rhs.begin(), rhs.size(), std::back_inserter(*this)); | ||
shrink_to_fit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In copy() you decided to use reserve() and here shrink_to_fit(). Is there a reason?
By the way, maybe these two functions can call each other instead of having two versions of the copying algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I'm pretty sure that bulk copy can be done more efficiently than copying the items one by one, but I don't know if it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nyh for your feedback!
In copy() you decided to use reserve() and here shrink_to_fit(). Is there a reason?
In the copy()
case, we know the destination object (ret
) is empty and we know the target size, so we can just reserve that size, it's the easy case.
In the assignment operation the LHS may already have allocated memory and there are sort of three primary cases as I see it:
- The LHS has less capacity than the RHS (including LHS empty).
- The LHS has "equal or slightly more" capacity than RHS.
- The LHS has much higher capacity than RHS (including RHS empty).
The goal is not not reallocate memory if not necessary (case 2) and not leave the container with a capacity totally out of line with its contents (case 3). The chosen approach does that. reserve()
does not do that: it never shrinks the container. Arguably it is possible to be more efficient in case 2, currently we destroy all objects and then copy the new ones in, but it would also be possible to move the source objects into the LHS directly without destruction (then destroy any LHS objects in the suffix).
By the way, maybe these two functions can call each other instead of having two versions of the copying algorithm?
The reason for the two versions is as above, so it seems like the common code is really only the copy_n
line (if you accept the validity of the argument above, that is)?
By the way, I'm pretty sure that bulk copy can be done more efficiently than copying the items one by one, but I don't know if it matters.
Yes, and it is easiest in case 2, where the LHS is already big enough, then we can simply use std::copy
with iterators and internally this does the right thing (e.g., lowered to memcpy
depending on the characteristics of the involved objects). For the other cases it is not so simple: certainly it can be done but it requires working with unitialized memory, or assuming the type T
has a default constructor and then default constructing the LHS to the right size and doing a copy, which is definitely cheaper for primitive types, but puts additional requirements on the type.
In general it doesn't seem like existing routines in chunked_fifo are optimized to that level, but I'm happy to go this route if you think that's what's required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. No, I don't think that super-optimizing this code is very important. By definition, making a copy of a whole vector is already non-optimal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By definition, making a copy of a whole vector is already non-optimal...
Indeed, and here we are adding the copy assignment operator only because that's a requirement of the json elements in seastar: we auto-generate a copy constructor (which internally uses assignment) so we require all types used as json elements to have at least an assignment operator.
@@ -178,8 +179,9 @@ public: | |||
chunked_fifo(chunked_fifo&& x) noexcept; | |||
chunked_fifo(const chunked_fifo& X) = delete; | |||
~chunked_fifo(); | |||
chunked_fifo& operator=(const chunked_fifo&) = delete; | |||
chunked_fifo& operator=(const chunked_fifo&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more symmetrical to require x = y.copy(), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind allowing both the copy constructor and assignment operator. C++ is a copyful language, and pretending it isn't usually just makes life harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more symmetrical to require x = y.copy(), no?
Yeah, but ... the main problem I'm trying to solve is that json2code generates a "copy constructor" which expects all the elements of the object to be copy-assignable, here's an example from api.json in this repo:
my_object(const my_object& e) {
register_params();
var1 = e.var1;
var2 = e.var2;
enum_var = e.enum_var;
}
As shown, this copy constructor does element-wise assignment (not sure why it doesn't use member init in order to use copy-ctor instead), so that's why I'm adding the assignment operator: to make the above compile with chunked_fifo as an element.
Actually copy()
is not needed here at all, it is vestigial from a different approach I tried first, though we do use this copy()
pattern in Redpanda, exactly as you wanted: the container class declares move-assignment, but not move-copy, so you'd use x = y.copy()
if you wanted to "force" a copy-assignment.
I'm happy to whatever here:
- This change as is
- This change but remove any traces of copy()
- Add full copy support to
chunked_fifo
and update the code generation to use the copy-ctor - Remove copy-assignment, but leave
copy()
and update the code generation to callcopy()
explicitly (I had a change along these lines originally)
Please advise.
I agree that containers without copy are kind of anti-C++ though I know having them as saved us many unecessary copies in Redpanda as it forces you to get move working everywhere (with copy()
as an escape hatch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also 5. change json2code to call a helper template function, which defaults to a regular copy, but which we can override to do something else for specific types.
But I think supporting the copy constructor is the path of least friction (though it opens the door to bad surprises).
We handle unexpected copies by having our small-scale performance tests monitor allocation count (and task count, and instruction count) per op and watching for changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think supporting the copy constructor is the path of least friction (though it opens the door to bad surprises).
Sounds good to me.
Push 44e087a adds full copy support to chunked_fifo, and removes copy()
. This actually simplifies the fix to the dangling code that @nvartolomei pointed out: currently it's hard to fully support move-only types all the way down the serialization hierarchy as we have only a virtual write(ostream) const
method on json elements: this is not suitable for move-only types, which need a && overload but then can't even compile the const
method, so it all gets very messy (e.g., the code generator would need to track whether the current object was "tainted" anywhere by a move-only type and then stub out the const write method to throw an exception, or implement a different interface or something.
We handle unexpected copies by having our small-scale performance tests monitor allocation count (and task count, and instruction count) per op and watching for changes.
I love this idea. Is this source-available so I can peek at it?
include/seastar/core/chunked_fifo.hh
Outdated
@@ -190,6 +192,9 @@ public: | |||
inline void pop_front() noexcept; | |||
inline bool empty() const noexcept; | |||
inline size_t size() const noexcept; | |||
// Return a new chunked_fifo which is a copy of this one, which | |||
// is useful as this class does not allow copy creation or assignment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous change allows assignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is toast as copy()
is removed now.
static future<> write(output_stream<char>& s, const Range& range) { | ||
return do_with(std::move(range), [&s] (const auto& range) { | ||
static future<> write(output_stream<char>& s, Range&& range) { | ||
return do_with(std::forward<Range>(range), [&s] (const auto& range) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the original code pretends to move but actually copies.
The new code moves when it's safe.
I'm now regretting my decision not to allow random containers like chunked_vector into Seastar (or maybe, the decision to allow the json stuff in). It's an important building block for intermediate layers like this json stuff. |
_elements.clear(); | ||
for (auto i : list) { | ||
push(i); | ||
} | ||
return *this; | ||
} | ||
virtual future<> write(output_stream<char>& s) const override { | ||
return formatter::write(s, _elements); | ||
auto t = const_cast<json_list_template<T, Container> *>(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm puzzled by this cast and the intention behind it.
- Why are we moving data from a const object?
- If an optimization is possible/desirable why not add a
virtual future<> write(...) &&
overload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also https://en.cppreference.com/w/cpp/language/const_cast notes: "Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye, this is definitely wrong and left over from my earlier experimentation: will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed in 44e087a
This changes adds copy assignment, and construction to chunked_fifo. These methods were missing from chunked_fifo. Copy construction in particular is a possible performance footgun which is perhaps why it was not offered in the past, or perhaps it was simply that chunked_fifo is a self-described minimalist class and copy wasn't needed. This is a precursor to using chunked_fifo in the json autogenerated classes (seastar-json2code.py).
Recently the json::formatter support was improved, allowing any Range to be formatted using formatter::write(). However, this changed the old behavior which took std::vector by value and moved it into the write method to using a const Range&, which prevents moving (we did call std::move in the same place as before but given we pass a const ref this does nothing). Change to this take a forwarding reference Range&& and use std::forward to pass the range object to do_with, which enables write to both (a) work with move-only types, and (b) avoid copying where possible even for copyable types.
stream_object takes an object by value and returns a stream function which when called with an output stream, writes the captured value to the stream. This mechanism did not work for move only objects. Enhance it to do so by moving the captured object into formatter::write.
There are a variety of http function handler types, e.g., returning string or json, async or sync. Response objects may be streaming or not, and we didn't handle the streaming case for several handler permutations. To fix, we need to check if body_writer is set (the streaming case) and use that if so for the streaming case. Do this in a utility method common to all the cases where it applies.
Stream responses were broken for a few handler types (see previous fix) so add test coverage for the streaming case: in every json2code test case do a streaming and non-streaming call to the endpoint and ensure their results are identical.
Remove unused <time.h> and <sstream> headers.
542ac71
to
44e087a
Compare
tests/unit/rest_api_httpd.cc
Outdated
// This demonstrate enum conversion | ||
obj.enum_var = v; | ||
return obj; | ||
stream_enum is_streaming =str2stream_enum(req.query_parameters.at("stream_enum")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to add spaces around =
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eagle eyes 🦅 !
Fixed in 9f22206.
Currently json2code only supports std::vector as a list type, but this results in unavoidable large allocations for even modest response sizes, e.g., is is not uncommon for sizeof(elem_type) for these vectors to be 100 - 1000 bytes for simple to moderately complex response types, so then a mere ~1200 to ~120 objects in the vector are enough to result in allocations > 128K, a problem for seastar applications which must avoid large allocations (because of fragmentation). To avoid this problem, support chunked_fifo as a second type for lists in json2code. Use the type "chunked_array" instead of "array" to use it and the generated code will use chunked_fifo instead of vector. Also adds tests for the new use case in the json2code test.
44e087a
to
9f22206
Compare
This series came out of an effort to squash large allocations in a json httpd endpoint in redpanda: the basic idea is to optionally use chunked_fifo instead of std::vector as the underlying type in the json2code generation.
Some other bugs I ran into along the the way are also fixed, and even code that doesn't use chunked_fifo should get some benefits from the addition of move support on the json response hot path.
Test cases added where I thought it made sense.