====================================================================== __________ ____ ____ _________ / ____/ __ \/ __ \/ __ \/ ____/ | / / __/ /_/ / / / / /_/ / / / /| | / /_/ / ____/ /_/ / _, _/ /___/ ___ | \____/_/ \____/_/ |_|\____/_/ |_| The Greenplum Query Optimizer Copyright (c) 2015, VMware, Inc. or its affiliates. Licensed under the Apache License, Version 2.0 ======================================================================
Welcome to GPORCA, the Greenplum Next Generation Query Optimizer!
To understand the objectives and architecture of GPORCA please refer to the following articles:
- Orca: A Modular Query Optimizer Architecture for Big Data.
- Profiling Query Compilation Time with GPORCA
- Improving Constraints In ORCA
Want to Contribute?
GPORCA supports various build types: debug, release with debug info, release. You'll need CMake 3.1 or higher to build the gporca_test and gpos_test utilities to test GPORCA. Get it from cmake.org, or your operating system's package manager.
ORCA is built automatically with GPDB as long as --disable-orca
is not used.
To test GPORCA, first go into the gporca
directory:
cmake -GNinja -H. -Bbuild
ninja -C build
To run all GPORCA tests, simply use the ctest
command from the build directory
after build finishes.
cd build
ctest
Much like make
, ctest
has a -j option that allows running multiple tests in
parallel to save time. Using it is recommended for faster testing.
ctest -j8
By default, ctest
does not print the output of failed tests. To print the
output of failed tests, use the --output-on-failure
flag like so (this is
useful for debugging failed tests):
ctest -j8 --output-on-failure
To run only the previously failed ctests, use the --rerun-failed
flag.
ctest -j8 --rerun-failed --output-on-failure
To run a specific individual test, use the gporca_test
executable directly.
./server/gporca_test -U CAggTest
To run a specific minidump, for example for ../data/dxl/minidump/TVFRandom.mdp
:
./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp
Note that some tests use assertions that are only enabled for DEBUG builds, so DEBUG-mode tests tend to be more rigorous.
Most of the regression tests come in the form of a "minidump" file. A minidump is an XML file that contains all the input needed to plan a query, including information about all tables, datatypes, and functions used, as well as statistics. It also contains the resulting plan.
A new minidump can be created by running a query on a live GPDB server:
- Run these in a psql session:
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;
set optimizer_enable_constant_expression_evaluation=off;
- Run the query in the same psql session. It will create a minidump file under the "minidumps" directory, in the master's data directory:
$ ls -l $MASTER_DATA_DIRECTORY/minidumps/
total 12
-rw------- 1 heikki heikki 10818 Jun 10 22:02 Minidump_20160610_220222_4_14.mdp
- Run xmllint on the minidump to format it better, and copy it under the data/dxl/minidump directory:
xmllint --format $MASTER_DATA_DIRECTORY/minidumps/Minidump_20160610_220222_4_14.mdp > data/dxl/minidump/MyTest.mdp
- Add it to the test suite, in server/src/unittest/gpopt/minidump/CICGTest.cpp
--- a/server/src/unittest/gpopt/minidump/CICGTest.cpp
+++ b/server/src/unittest/gpopt/minidump/CICGTest.cpp
@@ -217,6 +217,7 @@ const CHAR *rgszFileNames[] =
"../data/dxl/minidump/EffectsOfJoinFilter.mdp",
"../data/dxl/minidump/Join-IDF.mdp",
"../data/dxl/minidump/CoerceToDomain.mdp",
+ "../data/dxl/minidump/Mytest.mdp",
"../data/dxl/minidump/LeftOuter2InnerUnionAllAntiSemiJoin.mdp",
#ifndef GPOS_DEBUG
// TODO: - Jul 14 2015; disabling it for debug build to reduce testing time
Alternatively, it could also be added to the proper test suite in server/CMakeLists.txt
as follows:
--- a/server/CMakeLists.txt
+++ b/server/CMakeLists.txt
@@ -183,7 +183,8 @@ CPartTbl5Test:
PartTbl-IsNullPredicate PartTbl-IsNotNullPredicate PartTbl-IndexOnDefPartOnly
PartTbl-SubqueryOuterRef PartTbl-CSQ-PartKey PartTbl-CSQ-NonPartKey
PartTbl-LeftOuterHashJoin-DPE-IsNull PartTbl-LeftOuterNLJoin-DPE-IsNull
-PartTbl-List-DPE-Varchar-Predicates PartTbl-List-DPE-Int-Predicates;
+PartTbl-List-DPE-Varchar-Predicates PartTbl-List-DPE-Int-Predicates
+Mytest;
In some situations, a failing test does not necessarily imply that the fix is wrong. Occasionally, existing tests need to be updated. There is now a script that allows for users to quickly and easily update existing mdps. This script takes in a logfile that it will use to update the mdps. This logfile can be obtained from running ctest as shown below.
Existing minidumps can be updated by running the following:
-
Run
ctest -j8
. -
If there are failing tests, run
ctest -j8 --rerun-failed --output-on-failure | tee /tmp/failures.out
- The output file can then be used with the
fix_mdps.py
script.
gporca/scripts/fix_mdps.py --logFile /tmp/failures.out
Note: This will overwrite existing mdp files. This is best used after
committing existing changes, so you can more easily see the diff.
Alternatively, you can use gporca/scripts/fix_mdps.py --dryRun
to not change
mdp files
- Ensure that all changes are valid and as expected.
Here are a few build flavors (commands run from the ORCA checkout directory):
# debug build
cmake -GNinja -D CMAKE_BUILD_TYPE=Debug -H. -Bbuild.debug
# release build with debug info
cmake -GNinja -D CMAKE_BUILD_TYPE=RelWithDebInfo -H. -Bbuild.release
If you want to build with a custom version of Xerces, is recommended to use the
--prefix
option to the Xerces-C configure script to install Xerces in a
location other than the default under /usr/local/
, because you may have other
software that depends on the platform's version of Xerces-C. Installing in a
non-default prefix allows you to have GP-Xerces installed side-by-side with
unpatched Xerces without incompatibilities.
You can point cmake at your custom Xerces installation using the
XERCES_INCLUDE_DIR
and XERCES_LIBRARY
options like so:
However, to use the current build scripts in GPDB, Xerces will need to be placed on the /usr path.
cmake -GNinja -D XERCES_INCLUDE_DIR=/opt/gp_xerces/include -D XERCES_LIBRARY=/opt/gp_xerces/lib/libxerces-c.so ..
Again, on Mac OS X, the library name will end with .dylib
instead of .so
.
Show all command lines while building (for debugging purpose)
ninja -v -C build
We accept contributions via Github Pull requests only.
ORCA has a style guide, try to follow the existing style in your contribution to be consistent.
A set of clang-format-based rules are enforced in CI. Your editor or IDE may automatically support it. When in doubt, check formatting locally before submitting your PR:
CLANG_FORMAT=clang-format src/tools/fmt chk
For more information, head over to the formatting README.
A set of clang-tidy-based checks are enforced in CI. Your editor or IDE may support displaying ClangTidy diagnostics. When in doubt, check formatting locally before submitting your patches:
CLANG_TIDY=clang-tidy-12 src/tools/tidy build.debug
See our Clang-Tidy README for details about how to invoke the tidy script.
We follow GPDB's comprehensive contribution policy. Please refer to it here for details.