Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support chunked_vector and other json/httpd changes #2647

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 28 additions & 2 deletions include/seastar/core/chunked_fifo.hh
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
#ifndef SEASTAR_MODULE
#include <algorithm>
#include <cassert>
#include <iterator>
#include <type_traits>
#include <seastar/util/modules.hh>
#endif
Expand Down Expand Up @@ -176,10 +177,11 @@ public:
public:
chunked_fifo() noexcept = default;
chunked_fifo(chunked_fifo&& x) noexcept;
chunked_fifo(const chunked_fifo& X) = delete;
chunked_fifo(const chunked_fifo&);
~chunked_fifo();
chunked_fifo& operator=(const chunked_fifo&) = delete;
chunked_fifo& operator=(const chunked_fifo&);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be more symmetrical to require x = y.copy(), no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind allowing both the copy constructor and assignment operator. C++ is a copyful language, and pretending it isn't usually just makes life harder.

Copy link
Contributor Author

@travisdowns travisdowns Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be more symmetrical to require x = y.copy(), no?

Yeah, but ... the main problem I'm trying to solve is that json2code generates a "copy constructor" which expects all the elements of the object to be copy-assignable, here's an example from api.json in this repo:

    my_object(const my_object& e) {
        register_params();
        var1 = e.var1;
        var2 = e.var2;
        enum_var = e.enum_var;
    }

As shown, this copy constructor does element-wise assignment (not sure why it doesn't use member init in order to use copy-ctor instead), so that's why I'm adding the assignment operator: to make the above compile with chunked_fifo as an element.

Actually copy() is not needed here at all, it is vestigial from a different approach I tried first, though we do use this copy() pattern in Redpanda, exactly as you wanted: the container class declares move-assignment, but not move-copy, so you'd use x = y.copy() if you wanted to "force" a copy-assignment.

I'm happy to whatever here:

  1. This change as is
  2. This change but remove any traces of copy()
  3. Add full copy support to chunked_fifo and update the code generation to use the copy-ctor
  4. Remove copy-assignment, but leave copy() and update the code generation to call copy() explicitly (I had a change along these lines originally)

Please advise.

I agree that containers without copy are kind of anti-C++ though I know having them as saved us many unecessary copies in Redpanda as it forces you to get move working everywhere (with copy() as an escape hatch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also 5. change json2code to call a helper template function, which defaults to a regular copy, but which we can override to do something else for specific types.

But I think supporting the copy constructor is the path of least friction (though it opens the door to bad surprises).

We handle unexpected copies by having our small-scale performance tests monitor allocation count (and task count, and instruction count) per op and watching for changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think supporting the copy constructor is the path of least friction (though it opens the door to bad surprises).

Sounds good to me.

Push 44e087a adds full copy support to chunked_fifo, and removes copy(). This actually simplifies the fix to the dangling code that @nvartolomei pointed out: currently it's hard to fully support move-only types all the way down the serialization hierarchy as we have only a virtual write(ostream) const method on json elements: this is not suitable for move-only types, which need a && overload but then can't even compile the const method, so it all gets very messy (e.g., the code generator would need to track whether the current object was "tainted" anywhere by a move-only type and then stub out the const write method to throw an exception, or implement a different interface or something.

We handle unexpected copies by having our small-scale performance tests monitor allocation count (and task count, and instruction count) per op and watching for changes.

I love this idea. Is this source-available so I can peek at it?

chunked_fifo& operator=(chunked_fifo&&) noexcept;
inline bool operator==(const chunked_fifo& rhs) const;
inline void push_back(const T& data);
inline void push_back(T&& data);
T& back() noexcept;
Expand Down Expand Up @@ -296,6 +298,25 @@ chunked_fifo<T, items_per_chunk>::chunked_fifo(chunked_fifo&& x) noexcept
x._nfree_chunks = 0;
}

template <typename T, size_t items_per_chunk>
inline
chunked_fifo<T, items_per_chunk>::chunked_fifo(const chunked_fifo& rhs)
: chunked_fifo() {
std::copy_n(rhs.begin(), rhs.size(), std::back_inserter(*this));
}

template <typename T, size_t items_per_chunk>
inline
chunked_fifo<T, items_per_chunk>&
chunked_fifo<T, items_per_chunk>::operator=(const chunked_fifo& rhs) {
if (&rhs != this) {
clear();
std::copy_n(rhs.begin(), rhs.size(), std::back_inserter(*this));
shrink_to_fit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In copy() you decided to use reserve() and here shrink_to_fit(). Is there a reason?
By the way, maybe these two functions can call each other instead of having two versions of the copying algorithm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I'm pretty sure that bulk copy can be done more efficiently than copying the items one by one, but I don't know if it matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nyh for your feedback!

In copy() you decided to use reserve() and here shrink_to_fit(). Is there a reason?

In the copy() case, we know the destination object (ret) is empty and we know the target size, so we can just reserve that size, it's the easy case.

In the assignment operation the LHS may already have allocated memory and there are sort of three primary cases as I see it:

  1. The LHS has less capacity than the RHS (including LHS empty).
  2. The LHS has "equal or slightly more" capacity than RHS.
  3. The LHS has much higher capacity than RHS (including RHS empty).

The goal is not not reallocate memory if not necessary (case 2) and not leave the container with a capacity totally out of line with its contents (case 3). The chosen approach does that. reserve() does not do that: it never shrinks the container. Arguably it is possible to be more efficient in case 2, currently we destroy all objects and then copy the new ones in, but it would also be possible to move the source objects into the LHS directly without destruction (then destroy any LHS objects in the suffix).

By the way, maybe these two functions can call each other instead of having two versions of the copying algorithm?

The reason for the two versions is as above, so it seems like the common code is really only the copy_n line (if you accept the validity of the argument above, that is)?

By the way, I'm pretty sure that bulk copy can be done more efficiently than copying the items one by one, but I don't know if it matters.

Yes, and it is easiest in case 2, where the LHS is already big enough, then we can simply use std::copy with iterators and internally this does the right thing (e.g., lowered to memcpy depending on the characteristics of the involved objects). For the other cases it is not so simple: certainly it can be done but it requires working with unitialized memory, or assuming the type T has a default constructor and then default constructing the LHS to the right size and doing a copy, which is definitely cheaper for primitive types, but puts additional requirements on the type.

In general it doesn't seem like existing routines in chunked_fifo are optimized to that level, but I'm happy to go this route if you think that's what's required here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. No, I don't think that super-optimizing this code is very important. By definition, making a copy of a whole vector is already non-optimal...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By definition, making a copy of a whole vector is already non-optimal...

Indeed, and here we are adding the copy assignment operator only because that's a requirement of the json elements in seastar: we auto-generate a copy constructor (which internally uses assignment) so we require all types used as json elements to have at least an assignment operator.

}
return *this;
}

template <typename T, size_t items_per_chunk>
inline
chunked_fifo<T, items_per_chunk>&
Expand Down Expand Up @@ -455,6 +476,11 @@ chunked_fifo<T, items_per_chunk>::emplace_back(Args&&... args) {
++_back_chunk->end;
}

template <typename T, size_t items_per_chunk>
inline bool chunked_fifo<T, items_per_chunk>::operator==(const chunked_fifo& rhs) const {
return size() == rhs.size() && std::equal(begin(), end(), rhs.begin());
}

template <typename T, size_t items_per_chunk>
inline void
chunked_fifo<T, items_per_chunk>::push_back(const T& data) {
Expand Down
28 changes: 16 additions & 12 deletions include/seastar/http/function_handlers.hh
Original file line number Diff line number Diff line change
Expand Up @@ -82,31 +82,22 @@ public:
function_handler(const request_function & _handle, const sstring& type)
: _f_handle(
[_handle](std::unique_ptr<http::request> req, std::unique_ptr<http::reply> rep) {
rep->_content += _handle(*req.get());
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
return append_result(std::move(rep), _handle(*req.get()));
}), _type(type) {
}

function_handler(const json_request_function& _handle)
: _f_handle(
[_handle](std::unique_ptr<http::request> req, std::unique_ptr<http::reply> rep) {
json::json_return_type res = _handle(*req.get());
rep->_content += res._res;
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
return append_result(std::move(rep), _handle(*req.get()));
}), _type("json") {
}

function_handler(const future_json_function& _handle)
: _f_handle(
[_handle](std::unique_ptr<http::request> req, std::unique_ptr<http::reply> rep) {
return _handle(std::move(req)).then([rep = std::move(rep)](json::json_return_type&& res) mutable {
if (res._body_writer) {
rep->write_body("json", std::move(res._body_writer));
} else {
rep->_content += res._res;

}
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
return append_result(std::move(rep), std::move(res));
});
}), _type("json") {
}
Expand All @@ -122,6 +113,19 @@ public:
});
}

private:
// send the json payload of result to reply, return the reply pointer
static future<std::unique_ptr<http::reply>> append_result(
std::unique_ptr<http::reply>&& reply,
json::json_return_type&& result) {
if (result._body_writer) {
reply->write_body("json", std::move(result._body_writer));
} else {
reply->_content += result._res;
}
return make_ready_future<std::unique_ptr<http::reply>>(std::move(reply));
}

protected:
future_handler_function _f_handle;
sstring _type;
Expand Down
6 changes: 3 additions & 3 deletions include/seastar/json/formatter.hh
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ namespace internal {

template<typename T>
concept is_map = requires {
typename T::mapped_type;
typename std::remove_reference_t<T>::mapped_type;
};

template<typename T>
Expand Down Expand Up @@ -357,8 +357,8 @@ public:
*/
template<std::ranges::input_range Range>
requires (!internal::is_string_like<Range>)
static future<> write(output_stream<char>& s, const Range& range) {
return do_with(std::move(range), [&s] (const auto& range) {
static future<> write(output_stream<char>& s, Range&& range) {
return do_with(std::forward<Range>(range), [&s] (const auto& range) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the original code pretends to move but actually copies.

The new code moves when it's safe.

if constexpr (internal::is_map<Range>) {
return write(s, state::map, std::ranges::begin(range), std::ranges::end(range));
} else {
Expand Down
4 changes: 2 additions & 2 deletions include/seastar/json/json_elements.hh
Original file line number Diff line number Diff line change
Expand Up @@ -368,8 +368,8 @@ std::function<future<>(output_stream<char>&&)> stream_range_as_array(Container v
template<class T>
std::function<future<>(output_stream<char>&&)> stream_object(T val) {
return [val = std::move(val)](output_stream<char>&& s) mutable {
return do_with(output_stream<char>(std::move(s)), T(std::move(val)), [](output_stream<char>& s, const T& val){
return formatter::write(s, val).finally([&s] {
return do_with(output_stream<char>(std::move(s)), T(std::move(val)), [](output_stream<char>& s, T& val){
return formatter::write(s, std::move(val)).finally([&s] {
return s.close();
});
});
Expand Down
5 changes: 2 additions & 3 deletions scripts/seastar-json2code.py
Original file line number Diff line number Diff line change
Expand Up @@ -677,9 +677,8 @@ def parse_file(param, combined):
if (combined):
fprintln(combined, '#include "', base_file_name, ".cc", '"')
create_h_file(data, hfile_name, api_name, init_method, base_api)
except:
type, value, tb = sys.exc_info()
print("Error while parsing JSON file '" + param + "' error ", value.message)
except Exception as e:
print("Error while parsing JSON file '" + param + "' error " + str(e))
sys.exit(-1)

if "indir" in config and config.indir != '':
Expand Down
12 changes: 12 additions & 0 deletions tests/unit/api.json
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,18 @@
"VAL2",
"VAL3"
]
},
{
"name": "stream_enum",
"description": "Whether to return the response as a stream_object",
"required": true,
"allowMultiple": false,
"type": "string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was about to suggest to use "boolean". but ironically, it turns out that the "string" type with enum contraints is indeed simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to use boolean if you prefer. I actually used enum since I did a quick check and dind't see any existing uses of boolean but looking at seastar-json2code.py a boolean type is indeed supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's less surprising to have a boolean here.

"paramType": "query",
"enum": [
"no",
"yes"
]
}
]
}
Expand Down
Loading