Skip to content

Latest commit

 

History

History
262 lines (191 loc) · 8.7 KB

README.md

File metadata and controls

262 lines (191 loc) · 8.7 KB
======================================================================
                 __________  ____  ____  _________
                / ____/ __ \/ __ \/ __ \/ ____/   |
               / / __/ /_/ / / / / /_/ / /   / /| |
              / /_/ / ____/ /_/ / _, _/ /___/ ___ |
              \____/_/    \____/_/ |_|\____/_/  |_|
                  The Greenplum Query Optimizer
              Copyright (c) 2015, Pivotal Software, Inc.
            Licensed under the Apache License, Version 2.0
======================================================================

Welcome to GPORCA, the Greenplum Next Generation Query Optimizer!

To understand the objectives and architecture of GPORCA please refer to the following articles:

Want to Contribute?

GPORCA supports various build types: debug, release with debug info, release. You'll need CMake 3.1 or higher to build the gporca_test and gpos_test utilities to test GPORCA. Get it from cmake.org, or your operating system's package manager.

First Time Setup

Pre-Requisites

GPORCA uses the following library:

  • GP-Xerces - Greenplum's patched version of Xerces-C 3.1.X

Installing GP-Xerces

GP-XERCES is available here. The GP-XERCES README gives instructions for building and installing.

Build and install GPORCA

ORCA is built automatically with GPDB as long as --disable-orca is not used.

Test GPORCA

To test GPORCA, first go into the gporca directory:

cmake -GNinja -H. -Bbuild

To run all GPORCA tests, simply use the ctest command from the build directory after build finishes.

cd build
ctest

Much like make, ctest has a -j option that allows running multiple tests in parallel to save time. Using it is recommended for faster testing.

ctest -j8

By default, ctest does not print the output of failed tests. To print the output of failed tests, use the --output-on-failure flag like so (this is useful for debugging failed tests):

ctest -j8 --output-on-failure

To run only the previously failed ctests, use the --rerun-failed flag.

ctest -j8 --rerun-failed --output-on-failure

To run a specific individual test, use the gporca_test executable directly.

./server/gporca_test -U CAggTest

To run a specific minidump, for example for ../data/dxl/minidump/TVFRandom.mdp:

./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp

Note that some tests use assertions that are only enabled for DEBUG builds, so DEBUG-mode tests tend to be more rigorous.

Adding tests

Most of the regression tests come in the form of a "minidump" file. A minidump is an XML file that contains all the input needed to plan a query, including information about all tables, datatypes, and functions used, as well as statistics. It also contains the resulting plan.

A new minidump can be created by running a query on a live GPDB server:

  1. Run these in a psql session:
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;
set optimizer_enable_constant_expression_evaluation=off;
  1. Run the query in the same psql session. It will create a minidump file under the "minidumps" directory, in the master's data directory:
$ ls -l $MASTER_DATA_DIRECTORY/minidumps/
total 12
-rw------- 1 heikki heikki 10818 Jun 10 22:02 Minidump_20160610_220222_4_14.mdp
  1. Run xmllint on the minidump to format it better, and copy it under the data/dxl/minidump directory:
xmllint --format $MASTER_DATA_DIRECTORY/minidumps/Minidump_20160610_220222_4_14.mdp > data/dxl/minidump/MyTest.mdp
  1. Add it to the test suite, in server/src/unittest/gpopt/minidump/CICGTest.cpp
--- a/server/src/unittest/gpopt/minidump/CICGTest.cpp
+++ b/server/src/unittest/gpopt/minidump/CICGTest.cpp
@@ -217,6 +217,7 @@ const CHAR *rgszFileNames[] =
                "../data/dxl/minidump/EffectsOfJoinFilter.mdp",
                "../data/dxl/minidump/Join-IDF.mdp",
                "../data/dxl/minidump/CoerceToDomain.mdp",
+               "../data/dxl/minidump/Mytest.mdp",
                "../data/dxl/minidump/LeftOuter2InnerUnionAllAntiSemiJoin.mdp",
 #ifndef GPOS_DEBUG
                // TODO:  - Jul 14 2015; disabling it for debug build to reduce testing time

Alternatively, it could also be added to the proper test suite in server/CMakeLists.txt as follows:

--- a/server/CMakeLists.txt
+++ b/server/CMakeLists.txt
@@ -183,7 +183,8 @@ CPartTbl5Test:
 PartTbl-IsNullPredicate PartTbl-IsNotNullPredicate PartTbl-IndexOnDefPartOnly
 PartTbl-SubqueryOuterRef PartTbl-CSQ-PartKey PartTbl-CSQ-NonPartKey
 PartTbl-LeftOuterHashJoin-DPE-IsNull PartTbl-LeftOuterNLJoin-DPE-IsNull
-PartTbl-List-DPE-Varchar-Predicates PartTbl-List-DPE-Int-Predicates;
+PartTbl-List-DPE-Varchar-Predicates PartTbl-List-DPE-Int-Predicates
+Mytest;

Update tests

In some situations, a failing test does not necessarily imply that the fix is wrong. Occasionally, existing tests need to be updated. There is now a script that allows for users to quickly and easily update existing mdps. This script takes in a logfile that it will use to update the mdps. This logfile can be obtained from running ctest as shown below.

Existing minidumps can be updated by runing the following:

  1. Run ctest -j8.

  2. If there are failing tests, run

ctest -j8 --rerun-failed --output-on-failure | tee /tmp/failures.out
  1. The output file can then be used with the fix_mdps.py script.
gporca/scripts/fix_mdps.py --logFile /tmp/failures.out

Note: This will overwrite existing mdp files. This is best used after committing existing changes, so you can more easily see the diff. Alternatively, you can use gporca/scripts/fix_mdps.py --dryRun to not change mdp files

  1. Ensure that all changes are valid and as expected.

Advanced Setup

How to generate build files with different options

Here are a few build flavors (commands run from the ORCA checkout directory):

# debug build
cmake -GNinja -D CMAKE_BUILD_TYPE=Debug -H. -Bbuild.debug
# release build with debug info
cmake -GNinja -D CMAKE_BUILD_TYPE=RelWithDebInfo -H. -Bbuild.release

Explicitly Specifying GP-Xerces For Build

GP-XERCES

It is recommended to use the --prefix option to the Xerces-C configure script to install GP-Xerces in a location other than the default under /usr/local/, because you may have other software that depends on Xerces-C, and the changes introduced in the GP-Xerces patch make it incompatible with the upstream version. Installing in a non-default prefix allows you to have GP-Xerces installed side-by-side with unpatched Xerces without incompatibilities.

You can point cmake at your patched GP-Xerces installation using the XERCES_INCLUDE_DIR and XERCES_LIBRARY options like so:

However, to use the current build scripts in GPDB, Xerces with the gp_xerces patch will need to be placed on the /usr path.

cmake -GNinja -D XERCES_INCLUDE_DIR=/opt/gp_xerces/include -D XERCES_LIBRARY=/opt/gp_xerces/lib/libxerces-c.so ..

Again, on Mac OS X, the library name will end with .dylib instead of .so.

How to debug the build

Show all command lines while building (for debugging purpose)

ninja -v -C build

Extended Tests

Debug builds of GPORCA include a couple of "extended" tests for features like fault-simulation and time-slicing that work by running the entire test suite in combination with the feature being tested. These tests can take a long time to run and are not enabled by default. To turn extended tests on, add the cmake arguments -D ENABLE_EXTENDED_TESTS=1.

How to Contribute

We accept contributions via Github Pull requests only.

ORCA has a style guide, try to follow the existing style in your contribution to be consistent.

A set of clang-format-based rules are enforced in CI. Your editor or IDE may automatically support it. When in doubt, check formatting locally before submitting your PR:

CLANG_FORMAT=clang-format src/tools/fmt chk

For more information, head over to the formatting README.

We follow GPDB's comprehensive contribution policy. Please refer to it here for details.