Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ceph octopus 19.06.0 45 g7744693c #1

Open
wants to merge 692 commits into
base: master
Choose a base branch
from

Conversation

ronen-fr
Copy link
Owner

@ronen-fr ronen-fr commented Aug 8, 2019

No description provided.

espindola and others added 30 commits February 12, 2019 18:40
… being monitored

In request_preemption we have a loop waiting for need_preempt to
return true. For that to happen, the kernel must update the
preemption ring.

The ring is updated any time a request finishes, but
request_preemption cannot assume that there is anything left to do. It
must ensure that timerfd is in the preemption ring.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
When a BOOST_REQUIRE(a && b && c) fails, it is not clear if a, b or c
is false. It is better to write it as 3 BOOST_REQUIREs.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
With this change, the `futures` test will produce an executable named
`tests/unit/futures_test`.

This executable can be referenced by name, or with a qualified alias:

    ninja -C build/release tests/unit/futures_test

or

    ninja -C build/release test_unit_futures

Fixes scylladb#604
With this change, the executable and target are now
`tests/perf/fstream_perf`.
With this change, the executable and target are now `demos/ip_demo`.
Server hold shared pointer to all its connection which ensures that it
can stop and wait for all of them when terminated and that connection
will not disappear prematurely since there is always one shared pointer
to it. But if streaming is not enabled for a server the insertion code
does not work: it inserts only one connection and drops all others
because they all have the same id. Fix it by generating unique id for
all connection even if streaming is not enabled.

Message-Id: <[email protected]>
With this change, if code internal to the Seastar repository references
a deprecated declaration then we will issue a warning but not trigger an
error.

This is consistent with the way Seastar behaved prior to the switch to
CMake.

Fixes scylladb#602

Tests: dist (release)
Signed-off-by: Jesse Haber-Kucharsky <[email protected]>
Message-Id: <7335649f868c36ff76b6c0bf21677d96441bc9e4.1550080427.git.jhaberku@scylladb.com>
The future::then() and future::then_wrapped() templates are usually called
with a distinct Func type per invocation (since these types are usually
lambdas). That means that the code in then() and then_wrapped() is compiled
as different functions for each invocation. Moreover, Func::operator() is
invoked twice, so if the compiler starts inlining, even more code is compiled.
The result is an explosion of generated code and compile time.

To combat this, provide a mode where Func is type-erased before the meat
of then() and then_wrapped() is called. That reduces the amount of generated
code significantly. We also call it just once to discourage the compiler
from attempting to inline in. This results in a 37% decrease in generated
code (and presumably compile time) on a large source file.

Note that type erasure is partial: we still have a version of then() for
each future specialiation, and for each result type. But there's far fewer
of these combinations that individual invocations of these functions.

Message-Id: <[email protected]>
We only want Doxygen to search the files in `gen/include`; not
`gen/src`.
Without this change, the generated HTML from Doxygen for a class such as
`seastar::future` says

    #include <future.hh>

With this change, it says

    #include <seastar/core/future.hh>
There are header files which do not contain Doxygen mark-up (i.e., are
not documented) but which contain definitions which need to be processed
by Doxygen in order to successfully parse header files which *are*
documented.

With this change, the rendered Doxygen HTML includes previously-missing
documentation for classes like `seastar::future`.

Fixes scylladb#605
"
These patches fix some issues with the Seastar API
documentation generated by Doxygen.
"

* 'jhk/fix_doxygen/v1' of https://github.com/hakuch/seastar:
  docs: Ensure all header files are searched by Doxygen
  docs: Show proper header file names for classes
  docs: Search generated headers, not sources
scheduling_group provides better functionality (inheritance along continuation
chains, submit_to() calls, and even RPC) and better performance as it doesn't
require a high precision timer per thread. It is time to deprecate
thread_scheduling_group with a view to retiring it later.
Message-Id: <[email protected]>
Since we have a "Dev" mode, the previous names of
`Seastar_DEVELOPMENT_*` were a little confusing.

To be consistent with the CMake target-properties of `*_CXX_FLAGS` and
`*_COMPILE_DEFINITIONS`, we rename the variables like so:

    Seastar_DEVELOPMENT_FLAGS   --> Seastar_PRIVATE_CXX_FLAGS
    Seastar_DEVELOPMENT_DEFINES --> Seastar_PRIVATE_COMPILE_DEFINITIONS

We also rename the amended `seastar` target from `seastar_with_flags` to
`seastar_private`.

Signed-off-by: Jesse Haber-Kucharsky <[email protected]>
Message-Id: <fa84add04c2f4d0facb5dd7f2e1af0a4073a1d25.1550250959.git.jhaberku@scylladb.com>
We are referencing cfg attributes but cfg is moved already by then. This
seems to work because the duration type moves are not destructive but it is
still confusing for readers and not a good practice.

Found this by code inspection while searching for something else.

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>
The earliest version of fmt we support based on its public API is 4.0.0.
However, we hit fmtlib/fmt#744 and so the
minimum released version that works is 5.0.0.

Fixes scylladb#559

Signed-off-by: Jesse Haber-Kucharsky <[email protected]>
Message-Id: <500d8681015f800c24f4c110e8404e38bb2aae67.1550609990.git.jhaberku@scylladb.com>
There is a typo in the implementation of posix_file_impl::read_dma
with `iovec` input: it writes `iovec` instead of the data it points to.

Message-Id: <[email protected]>
In debug mode, there's no need to do BOOST_REQUIRE() check.
The parameter DEBUG should be SEASTAR_DEBUG according to CMakeLists.txt.

Jira: ENTWLS-853

Change-Id: I85ec71f1356adc80b18c5bdadda845891294284e
Signed-off-by: Richael Zhuang <[email protected]>
Message-Id: <[email protected]>
deleter append should not "unwind" chained deleters as it
causes early "free" of objects referenced by "shared" deleter.

When deleter is shared only leading "impl" refcount is updated, while
"chained" deleters lifecycle is managed by "leading" deleter "impl" via
"next" chain. Therefore unwinding of this chain during append will
limit lifecycle of chained object to "deleter" on which "append" is called,
completely ignoring "shared" delter created in "share".

Signed-off-by: Solganik Alexander <[email protected]>
Message-Id: <[email protected]>
This reverts commit 168a7f5, does not
build:

../../tests/unit/deleter_test.cc:19:1: error: "/*" within comment [-Werror=comment]
 /* Copyright (C) 2019 Lightbits Labs Ltd. - All Rights Reserved
…er shard

The connection load balancing has two paths: one where the non-listening (ap)
shard is already waiting to a connection, and another where the listening
(main) shard got a connection first, and is queueing it until such time as
the non-listening shard gets a round to dequeue it. In the second case, we
forgot to pass along the conntrack handle, and so we did not account for it
in the load balancer data structures. This can cause a shard which used the second
path to later receive more than its fair share.

Tests: unit(dev), some manual tests with httpd.
Message-Id: <[email protected]>
README.md suggests that Seastar builds by default with C++14, and special
steps need to be taken to build with C++17. But today, it's actually the
other way around, so let's correct the instructions.

Also, since most users will probably build Seastar using "configure.py"
and not use "cmake" directly (for building Seastar itself), explain the
configure.py option to use a different dialect.

I left the part about SEASTAR_USE_STD_OPTIONAL_VARIANT_STRINGVIEW mostly
unchanged. It should probably be rewritten to be clearer, but I think it's
still mostly correct.

Signed-off-by: Nadav Har'El <[email protected]>
Message-Id: <[email protected]>
static_assert was introduced in C++11 as taking two arguments - the
expression to check, and an error message. Only C++17 added the option
of dropping the second argument (and just showing which expression failed),
but Seastar is supposed to be usable for C++14, so we shouldn't use this
feature.

I found three occurances of single-argument static_assert() that crept into
our code based, and fixed them in this patch.

Signed-off-by: Nadav Har'El <[email protected]>
Message-Id: <[email protected]>
Currently the CLI help message never shows the network stack options or
DPDK settings. This is because the help message is generated once by
Boost::program_options in the constructor of app-template, before
smp::configure is called. This patch separates the reactor configuration
routine from the registration of network stacks and their respective
CLI options.

Message-Id: <[email protected]>
A partial introduction to memory allocation in Seastar (there's still a lot
to be added in the future), and also to foreign_ptr.

Signed-off-by: Nadav Har'El <[email protected]>
Message-Id: <[email protected]>
deleter append should not "unwind" chained deleters as it
causes early "free" of objects referenced by "shared" deleter.

When deleter is shared only leading "impl" refcount is updated, while
"chained" deleters lifecycle is managed by "leading" deleter "impl" via
"next" chain. Therefore unwinding of this chain during append will
limit lifecycle of chained object to "deleter" on which "append" is called,
completely ignoring "shared" delter created in "share".

Signed-off-by: Solganik Alexander <[email protected]>
Message-Id: <[email protected]>
Custom targets in CMake do not automatically propogate file-level
dependencies.

This is explained in a blog post [1] in the section titled "File-level
dependencies of custom targets are not propagated".

Without this change, modifying `tutorial.md` and building the
`doc/split` output would not regenerate `tutorial.html`.

Fixes scylladb#614

[1] https://samthursfield.wordpress.com/2015/11/21/cmake-dependencies-between-targets-and-files-and-custom-commands/

Signed-off-by: Jesse Haber-Kucharsky <[email protected]>
Message-Id: <ede82cc3713de768e344c277987e7928664b3037.1551116263.git.jhaberku@scylladb.com>
And add unit tests for aligned_alloc and temporary_buffer<char>::aligned()

Signed-off-by: Benny Halevy <[email protected]>
Message-Id: <[email protected]>
"
This series adds suffixes to target names (tests, demos) so
that invoking the build tool like

    ninja -C build/release tests/unit/futures_test

works as expected.
"

* 'jhk/target_names/v1' of https://github.com/hakuch/seastar:
  build: Add the "_demo" suffix to demo. executables
  build: Add the "_perf" suffix to perf. test executables
  build: Add the "_test" suffix to test executables
…s already closed

close() should not fail when other side already closed the socket.

Fixes: scylladb#606

Message-Id: <[email protected]>
espindola and others added 29 commits June 6, 2019 07:58
GCC's alias analysis could not figure out that it could reuse the
pointer to the future.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
GCC's alias analysis could not figure out that it could reuse the
pointer to the promise.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
By defining a few non template base classes we avoid duplicating code
for each instantiation and can move two functions out of line.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
"
There are quite a few ideas about futures floating around (not
checking need_preempt, templating over T instead of T..., etc).

I would like to avoid having to rebase my patches over that, so this
series extracts only the simpler ones in an independent series that is
hopefully not too controversial.

The net result of this series is

* build/release/scylla builds 1.146 times faster (from 17:44.58 to 15:28.40)
* The tps of 'perf_simple_query --duration 16 --smp 1 -m4G' increases
  by 1.025 times (from 134,289.71 to 137,675.10)
* The size of the .text section of build/release/scylla gets 1.266
  times smaller (from 58,290,644 to 46,015,012 bytes)

Each patch is also an independent improvement IMHO. Some are better
style (no auto&& in lambda), some are more idiomatic c++ (an object
knows how to construct itself) and others are common practice (moving
common code out of a template).
"

* 'cleanups-future-v5' of https://github.com/espindola/seastar:
  future: Move code out of line
  future: Avoid alias in the future move constructor
  future: Avoid alias in the promise move constructor
  future: Use the future_state move constructor in the future move constructor
  future: Refactor duplicated code into a take_exception helper
  future: Reduce code duplication
  future: Avoid a few std::exception_ptr copies and moves
  future: Add more future_state constructors
  future: Delete set_urgent_value and set_urgent_exception
  future: Replace state::forward_to with promise::set_urgent_state
  future: Don't check for self move assignment
  future: delete continuation::set_state(std::tuple)
  future: delete ready_future_from_tuple_marker
  future: Remove redundant set_value overloads
  future: Simplify promise::set_value
  future: avoid auto in lambda
  future: pass a reference to schedule
  future: make some functions noexcept
Introduce a mode that is going to be returned by a tuner that doesn't
'care' (and doesn't configure) any IRQ binding.

Signed-off-by: Vlad Zolotarov <[email protected]>
…modes_set) method

This method returns a combined tuning mode for a given set of modes.

Signed-off-by: Vlad Zolotarov <[email protected]>
This tuner is going to be responsible for tuning the general parts
of the system that are not related to a particular device.

Currently it tunes nothing.

Signed-off-by: Vlad Zolotarov <[email protected]>
We would like to be able to enforce that the preferred clocksource
is used (tsc for x86 architectures). Not doing so can cause applications
that rely heavily on timestamps to take a performance penalty.

If the clocksource is available but not used, maybe there is a reason to
do so (for instance, it may not be trusted), in which case the user must
explicitly select that the clock must be set throught the --tune-clock
option.

To do the tuning, we will leverage the newly created System tuner.

Signed-off-by: Glauber Costa <[email protected]>
This fixes a use after free of the cache during shutdown. With that
fixed we only need to annotate two functions with no_sanitize_address
for this to work in debug mode.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
Seastar crashes when running on Arm with native network stack enabled.

It's caused by a code snippet like "func(free(ptr), use(ptr))", which
is a function call with two parameters. First parameter is a function
to free a ptr, while the second parameter dereferences the ptr.

This code works okay on x86 as parameters are passed from right to left
in stack pushing order. So use(ptr) is called before free(ptr).

But it fails on Arm as parameters are passed from left to right on Arm
to leverage abundant general purpose registers. So free(ptr) will be
called before use(ptr), which leads to crash.

To show the issue, below code outputs "ba" on x86, but "ab" on Arm.

printf("", printf("a"), printf("b"));

Message-Id: <[email protected]>
When iotune looks for the directory mount point it iterates 'backward' in
directory tree until it will find directory for which device id in stat
structure changes. In case when tuned directory is mounted at root then
iotune ends up in infinite loop.

Signed-off-by: Michal Maslanka <[email protected]>
Message-Id: <[email protected]>
Leaving unprocessed work items in the smp queues can trigger the
LeakSanitizer.

Signed-off-by: Duarte Nunes <[email protected]>
Message-Id: <[email protected]>
The sanitizer flags were already marked as public in cmake, but for
projects using pkgconfig instead of cmake the link flags were not
being exported.

With this patch pkgconfig is consistent with cmake and we can drop
deplicated code from scylla's configure.py.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
submit_to()'s documentation explained that it extends the life of a temporary
function object passed to it, but didn't explain where this object is
eventually destructed - on the target CPU, or on the calling CPU.
Since this became a debate on the Seastar mailing list, obviously it
wasn't clear what really happens - so we should clarify this in the
submit_to() documentation.

Signed-off-by: Nadav Har'El <[email protected]>
Message-Id: <[email protected]>
"
Because some seastar applications are  timestamp-intensive, the choice
of the clocksource can have a very big impact on performance. As with
other performance sensitive options, we would like perftune to be in
charge of enforcing good standards.

Because clocksource setting is not related to neither disks nor nic,
this patchset introduces first a new, System tuner, and on top of that
builds the infrastructure to tune the clocksource.
"

* 'clocksource-tune-v3-1' of https://github.com/vladzcloudius/seastar:
  SystemTune: tune the clocksource
  perftune.py: initial commit for a new tuner - SystemPerfTuner
  perftune.py: introduce a static PerfTunerBase.SupportedModes.combine(modes_set) method
  perftune.py: introduce a 'no_restrictions' tuning mode
For now this is just Debug + -Os, but it will probably become more
specialized over time.

The objective of this mode is to be able to run tests with sanitizers
faster and, more importantly, be able to run sanitized scylla in more
realistic use cases.

On my machine ctest takes 3:48.21 total on a Debug build and 2:51.41
on a Sanitize build.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
…Device ones

pyudev.Device.from_device_number() and pyudev.Device.from_device_file()
are deprecated since 0.18.

Let's use the new API instead and avoid annoying warning messages.

Signed-off-by: Vlad Zolotarov <[email protected]>
Message-Id: <[email protected]>
The sanitizer flags were being exported in pkgconfig even in modes
that didn't use them.

Signed-off-by: Rafael Ávila de Espíndola <[email protected]>
Message-Id: <[email protected]>
Remove this unused variable, whose purpose it already carried out upon
the lambda's destruction.

Signed-off-by: Duarte Nunes <[email protected]>
Message-Id: <[email protected]>
It is a pity that the sanitize mode is still so slow for seastar, but
a quick look at the perf output of perf_simple_query doesn't show
anything obvious other than trying to reduce malloc/free traffic.

Message-Id: <[email protected]>
Add flush() function to sink class that ensures that any buffered data
is sent out. Current implementation is simple, it just waits for all
data to be sent, since sink does not buffer indefinitely, but this may
change in the feature.

Message-Id: <[email protected]>
Applications may want to know for how long they have been running.  It
is possible to aquire this information from /proc, but since we already
record start time we can expose this faster using a simpler operation.

There will be differences of uptime in each shard, but this seems like
a reasonable restriction for seastar applications.

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>
…ice ones

pyudev.Device.from_device_number() and pyudev.Device.from_device_file()
are deprecated since 0.18.

Let's use the new API instead and avoid annoying warning messages.

Signed-off-by: Vlad Zolotarov <[email protected]>
Message-Id: <[email protected]>
- Move to new memory subsystem
  https://doc.dpdk.org/guides/rel_notes/release_18_05.html
  * Leverage VFIO+IOMMU to use IOVA, not PA, for DMA.
  * Direct map(IOVA=VA) each core's whole heap space at startup.

- Move to new offloads API
  DPDK ethdev offloads API has changed since:
    commit ce17edd ("ethdev: introduce Rx queue offloads API")
    commit cba7f53 ("ethdev: introduce Tx queue offloads API")

- Other changes
  * Drop explicit enabling CRC strip offload. From 18.11 release notes:
      The default behavior of CRC strip offload has changed in this
      release. Without any specific Rx offload flag, default behavior
      by a PMD is now to strip CRC.
  * Replace PKT_RX_VLAN_PKT with PKT_RX_VLAN. From 17.11 release notes:
      The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT have been
      removed since their behavior was not properly described.
      Two mbuf flags have been added to indicate that the VLAN identifier
      has been saved in in the mbuf structure. For instance:
        If VLAN is not stripped and TCI is saved: PKT_RX_VLAN
        If VLAN is stripped and TCI is saved: PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED
  * Set RSS offload rules per dev_info
      commit aa1a6d8 (ethdev: force RSS offload rules again)
  * Move xstats initialization after port started to fix errors on
    mellanox nics.
  * Update port id to 16 bits per 17.11 changes.
  * Replace deprecated function rte_eth_dev_count() with
    rte_eth_dev_count_avail().
  * Update DPDK build scripts and configs. Add "--whole-archive" when
    generating DPDK enabled applications.
  * Drop frag0 cross page checking as IOVA is always continuous.

Message-Id: <[email protected]>
Right now, we have the following code that tries to enable ntuple
offloading

        try:
            run_one_command(['ethtool','-K', iface, 'ntuple', 'on'], stderr=subprocess.DEVNULL)
            print("ok")
        except:
            print("not supported")

This always fails, and we've been essentially ignoring believing that this
is just the way the NIC is.

The reason for the failure is more subtle:

we pass an 'stderr' argument to run_one_command, while the function
expects 'my_stderr' in its signature. If we take this code out of the
try-catch block, here's what we see:

Trying to enable ntuple filtering HW offload for enp30s0...ERROR: run_one_command() got an unexpected keyword argument 'stderr'. Your system can't be tuned until the issue is fixed.

Instead of the usual sneaky:

Trying to enable ntuple filtering HW offload for enp30s0...not supported

This patch fixes the keyword so this won't happen
(also maybe we shouldn't be using a catch all here)

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>
which prevents copy elision. see
https://en.cppreference.com/w/cpp/language/copy_elision

and this change silences warning of

iostream-impl.hh:40:37: warning: redundant move in return statement
[-Wredundant-move]
   40 |                     return std::move(buffer);
      |                            ~~~~~~~~~^~~~~~~~

Signed-off-by: Kefu Chai <[email protected]>
Message-Id: <[email protected]>
as there is a chance that we need to stick with an ancient linker which
will fail to link with "-fsanitize=undefined" option:

/usr/bin/ld: unrecognized option '--push-state--no-as-needed'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status

for instance, the `ld` from binutils on ubuntu 16.04 fails to work.

Signed-off-by: Kefu Chai <[email protected]>
we should pass -pthread to compiler and linker, if pthread_setname_np()
is used. for cmake 3.1 and up, we need to link against Threads::Threads.

Signed-off-by: Kefu Chai <[email protected]>
to silence warning like

warning: unused variable ‘ret’ [-Wunused-variable]

Signed-off-by: Kefu Chai <[email protected]>
ronen-fr pushed a commit that referenced this pull request Jan 7, 2020
This reverts commit 33406cf. It
introduces memory leaks:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7fb773b389d7 in operator new(unsigned long) (/lib64/libasan.so.5+0x10f9d7)
    #1 0x108f0d4 in seastar::reactor::poller::~poller() ../src/core/reactor.cc:2879
    ceph#2 0x11c1e59 in std::experimental::fundamentals_v1::_Optional_base<seastar::reactor::poller, true>::~_Optional_base() /usr/include/c++/9/experimental/optional:288
    ceph#3 0x118f2d7 in std::experimental::fundamentals_v1::optional<seastar::reactor::poller>::~optional() /usr/include/c++/9/experimental/optional:491
    ceph#4 0x108c5a5 in seastar::reactor::run() ../src/core/reactor.cc:2587
    ceph#5 0xf1a822 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/core/app-template.cc:199
    ceph#6 0xf1885d in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) ../src/core/app-template.cc:115
    ceph#7 0xeb2735 in operator() ../src/testing/test_runner.cc:72
    ceph#8 0xebb342 in _M_invoke /usr/include/c++/9/bits/std_function.h:300
    ceph#9 0xf3d8b0 in std::function<void ()>::operator()() const /usr/include/c++/9/bits/std_function.h:690
    ceph#10 0x1034c72 in seastar::posix_thread::start_routine(void*) ../src/core/posix.cc:52
    ceph#11 0x7fb7738804e1 in start_thread /usr/src/debug/glibc-2.30-13-g919af705ee/nptl/pthread_create.c:479

Reported-by: Rafael Avila de Espindola <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.