
[fix] fix dispatch table copy #24

Open · wants to merge 1 commit into base: develop
Conversation

@Jonas-Heinrich Jonas-Heinrich commented Sep 3, 2023

Hi,

While comparing QPL to TUM Umbra's related functionality, I noticed that the former has a large overhead for small numbers of tuples. After investigating with VTune and the microbenchmark below, I believe I found a typo that leads to an accidental copy of the function pointer table in input_stream_t::initialize_sw_kernels. Here's the benchmark:

#include "qpl/qpl.h"
#include <benchmark/benchmark.h>

#include "../DataDistribution.hpp"
#include "../Utils.hpp"

int main() {
   // ExecutionContext: pointer wrapper with destructor
   ExecutionContext fixture(qpl_path_software);
   qpl_job* job = reinterpret_cast<qpl_job*>(fixture.getExecutionContext());

   uint32_t numTuples = 1;
   double selectivity = 0.5;
   uint32_t inBitWidth = 8;
   qpl_out_format outFormat = qpl_out_format::qpl_ow_32;

   // private utilities, does what you would expect
   LittleEndianBufferBuilder builder(inBitWidth, numTuples);
   Dataset dataset = UniformDistribution::generateDataset(
      builder,
      inBitWidth,
      (1ul << inBitWidth) - 1,
      numTuples,
      umbra::VectorizedFunctions::Mode::Eq,
      selectivity);
   std::vector<uint8_t> destination;
   destination.resize(divceil(32 * numTuples, 8));

   for (size_t i = 0; i < 1'000'000'000; i++) {
      if (i % 1'000'000 == 0) {
         std::cout << i << std::endl;
      }

      // Parameterize jobs.
      {
         job->parser = builder.getQPLParser();
         job->next_in_ptr = dataset.data.data();
         job->available_in = dataset.data.size();
         job->next_out_ptr = destination.data();
         job->available_out = static_cast<uint32_t>(destination.size());
         job->op = map_umbra_to_qpl_op(umbra::VectorizedFunctions::Mode::Eq);
         job->src1_bit_width = inBitWidth;
         job->num_input_elements = numTuples;
         job->out_bit_width = outFormat;
         auto [param_low, param_high] = dataset.predicateArguments->template getValues<uint32_t, 2>();
         job->param_low = param_low;
         job->param_high = param_high;
         job->flags = static_cast<uint32_t>(QPL_FLAG_OMIT_CHECKSUMS | QPL_FLAG_OMIT_AGGREGATES);
      }

      qpl_status status = qpl_execute_job(job);
      if (status != QPL_STS_OK) {
         ERROR("An error occurred during job execution: " << status);
      }

      const auto indicesByteSize = job->total_out;
      const auto bytesPerHit = divceil(32, 8);
      const auto qplHits = indicesByteSize / bytesPerHit;
      if (outFormat != qpl_ow_nom && qplHits != dataset.predicateHits) {
         ERROR("Result does not fit expectations: " << qplHits << " != " << dataset.predicateHits);
      } else if (outFormat == qpl_ow_nom && job->total_out != divceil(numTuples, 8)) {
         ERROR("Result does not fit expectations: " << job->total_out << " != " << divceil(numTuples, 8));
      }

      benchmark::DoNotOptimize(job->total_out);
      benchmark::DoNotOptimize(status);
      benchmark::DoNotOptimize(destination);
      benchmark::ClobberMemory();
   }
}

The benchmark was run for 15 s on an i9-13900K. VTune summary before the PR:

[Screenshot 2023-09-03 at 12 45 58]

After the PR:

[Screenshot 2023-09-03 at 13 39 15]

Jonas-Heinrich commented Sep 3, 2023

After further investigation, I noticed that the issue is also present elsewhere. The second force push now covers those locations as well (found by searching for core_sw::dispatcher::kernels_dispatcher::get_instance()).

@mzhukova mzhukova self-assigned this Sep 5, 2023
@mzhukova (Contributor)

Hi @Jonas-Heinrich, thank you for the contribution!
The team will review and do thorough testing on our side in the upcoming weeks.

@mzhukova mzhukova added the enhancement New feature or request label Sep 18, 2023