Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: ADBDEV-6630 (for CI only) #1170

Closed
wants to merge 26 commits into from
Closed

DRAFT: ADBDEV-6630 (for CI only) #1170

wants to merge 26 commits into from

Conversation

whitehawk
Copy link

DRAFT: ADBDEV-6630 (for CI only)

whitehawk and others added 25 commits November 7, 2024 20:20
This patch adds a skeleton for a new extension gpcontrib/gp_orca with empty
_PG_init and _PG_fini. The extension will be filled in consequent commits.
Plus this patch updates 'shared_preload_libraries' value in the
postgresql.conf.sample file if ORCA is enabled.
GPMemoryProtect_TrackStartupMemory() contained compile-time dependency on Orca.
It prevented moving of all Orca's code into a shared lib and making its work
transparent to the GPDB core.

This patch reworks the logic of the GPMemoryProtect_TrackStartupMemory(). Now
extension should call GPMemoryProtect_RequestAddinStartupMemory() at its
_PG_init(), if the extension affects the per-process startup committed memory.
Such call is added to the Orca's _PG_init(). Orca related code is removed from
the GPMemoryProtect_TrackStartupMemory().
Problem:
Orca's code was strongly coupled with the standard_planner() code. It introduced
difficulties for moving Orca's functionality into a shared lib.

This patch:
1. moves the invocation of Orca planner out from the standard_planner() into a
hook in gp_orca shared lib;
2. moves the init procedure of Orca to the first call of the hook (thus the init
is done when the backend has already been initialized including its memory
protection means);
3. moves the deinit procedure of Orca into a callback, that is registered at the
init (thus the deinit is done right before ShutdownPostgres() is called, where
it was before).
4. adds clang-format config file
Problem:
Explain component had compile-time dependencies on Orca. It introduced
difficulties for moving Orca's functionality into a shared lib.

This patch:
1. Adds explain hook implementation to the gp_orca extension. It handles DXL
output format of explain command.
2. Removes all Orca related code from explain component in the GPDB core.
3. Adds function standard_ExplainOneQuery() that now contains the default
explain code (thus it can be invoked from the hook).
4. Removes 'planGen' field from 'PlannedStmt' structure. Its type was enum, and
it had only 2 values: Planner and Optimizer (for Postgres planner and Orca
planner respectively). It was not suitable for a general planner plugged in from
an extension.
5. Adds 'plannerName' field to 'PlannedStmt' structure. It is now used to output
the name of the external planner, if it is used.
6. Updates 'compute_jit_flags()' due to changes above. Now gp_orca extension
defines its own implementation for computing JIT parameters. Orca related logic
for JIT parameters is removed from the core.
7. Adds new ABI ignore file, as the 'PlannedStmt' structure is updated.
…ce (#1126)

Problem:
Functions in gp_optimizer_functions.c had a compile-time dependency on Orca.
It introduced difficulties for moving Orca's functionality into a shared lib.

This patch:
1. Moves implementation of 'DisableXform()', 'EnableXform()',
'LibraryVersion()' into gp_orca shared lib.
2. Updates implementation of wrapper functions 'enable_xform()', 
'disable_xform()', 'gp_opt_version()'. Now they try to load the underlying 
functions from a shared lib. Attempt to load a symbol from a shared lib is 
wrapped into PG_TRY & PG_CATCH (with all necessary shenanigans), as the lib may 
be missing. In this case we catch the error, and show an appropriate message.
3. Updates tests due to change of the message if Orca is not enabled.

Note: we do not move the wrapper's implementation into gp_orca extension, as
they are already a part of the system catalog. And we'd like to avoid changing
the system catalog.
Problem:
File 'guc_gp.c' contained a compile-time dependency on Orca, related to
'optimizer' GUC. It prevented moving of all Orca's code into a shared lib and
making its work transparent to the GPDB core.

This patch:
1. Sets the default value for 'optimizer' GUC to 'false'.
2. Adds enabling of 'optimizer' GUC in gp_orca extension shared lib.
3. Updates check in 'check_optimizer()' function. Now it utilizes runtime check
of planner_hook presence to distinguish if an external planner is available or
not.

Note: the semantic of 'optimizer' GUC has changed. Previously it was used
solely to enable & disable Orca. Now it is used to enable & disable any general
external planner. The name of GUC hasn't been changed, because too many places
in code and infrastructure rely on it, and we aim not to blow up the size of
the patch.
Decouple ORCA from the pg_hint_plan extension.

This change moves the definition of plan_hint_hook from ORCA, which is going to
be an extension to the planner. Although this hook is currently used only in
ORCA, it makes it possible to use it with any planner that needs to get a hint
list from an extension, such as pg_hint_plan.

Additionally, we make this hook, plan_hint_hook, typed and return a pointer to
HintState, as it actually does. There is no reason to keep it generic and
returning void *.
…1135)

Problem:
If a session was started with disabled 'optimizer' GUC, and EnableXform() or
DisableXform() was called from a query, the query failed with a crash. The issue
was found on 'rowhints' test.

Cause:
The functions mentioned above tried to access internal Orca's structures, that
were not initialized, as the init was done during first planning only if
'optimizer' was on.

Fix:
Init Orca when planner is invoked first time regardless the value of
'optimizer'.

Note:
One can think that it is better to perform Orca init somewhen at the backend
init. But we'd like to keep Orca init after GPMemoryProtect_Init() (as it was
before Orca moved to an extension). And there is no existing standard hook to
call after GPMemoryProtect_Init() and before PostgresMain() enters its main
loop. Thus it is where it is.

No new tests are added, as existing 'rowhints' test can catch this issue.
This patch renames gp_orca shared lib into orca, as we'd like to avoid using
any 'gp' prefix now. Also it updates file names and internal namings to avoid
the prefix.
Problem:
'explain_format' test failed if 'optimizer' GUC was set to 'on'.
'auto_explain' test failed with any 'optimizer' GUC value.

Cause:
Explain command outputs values of GUCs that affect planning if their value
differ from the default. Default value for 'optimizer' GUC has been changed to
'off' recently. So, the output of the tests has been changed as well.

Fix:
The patch updates matchsubs in 'explain_format' test to exclude handling of
'optimizer' GUC value and introduces respective updates in the files with
expected output. For 'auto_explain' test, it just updates the files with
expected output.
Problem description:
Tests for 'pg_stat_statements', launched with 'optimizer=on', failed with error:
"FATAL: External planner is not registered".

Root cause:
Setup and teardown steps of 'pg_stat_statements' configured values of
'shared_preload_libraries' without taking into account its previous value. Thus,
it was set to 'pg_stat_statements' at test start, and to empty string after the
tests. And 'orca' lib was lost. It caused the error.

Fix:
Now setup and teardown steps take into account previous value of
'shared_preload_libraries'.
As a part of the bigger task of separating Orca into a Postgres extension, this
patch:
1. Moves 'src/backend/gporca/' and 'src/backend/gpopt/' folders with source
files into the extension.
2. Moves related headers into the extension.
3. Moves 'src/backend/optimizer/plan/orca.c' into the extension. As the
extension already contains 'orca.c' file with hooks implementation, old 'orca.c'
is renamed to 'orca_entry.c'.
4. All obj file names of newly added files are added into OBJS of
'gpcontrib/orca/Makefile'. It is mandated by the build infrastructure for
extensions (called PGXS). Old makefiles in 'gporca' and 'gpopt' are removed, as
no more needed. But CMake files are preserved, allowing to launch 'gporca' unit
tests.
5. Updates ABI ignore files, as some symbols are now missing in the core.
6. Moves OptimizerMemoryContext into the extension, as now it is referenced only
by the code in the extension.
7. Updates 'fmt' and 'tidy' tools and moves them into the extension, as they
were used only for Orca code.
Note: 'src/tools/vagrant' also contains some references to Orca code, but it
looks too obsolete to be workable (as it references the old gporca git repo,
when Orca wasn't a part of the core). So it is left unchanged.
8. Updates Readme and infrastructure files due to all changes above.
Problem:
After recent changes related to moving Orca into a shared lib, 'gpconfig'
behave test started to fail.

Cause:
Fail could be reproduced with the following commands:
"gpconfig -c optimizer -v on"
"gpstop -ar"

'gpconfig' alters 'postgresql.conf' by adding 'optimizer = on' there. So, the
issue appeared when we tried to set 'optimizer' GUC from the 'postgresql.conf'
file. At this moment shared preload libraries are yet not processed, so the
planner hook is yet not registered. It led to an error in 'check_optimizer()'
validation function when setting 'optimizer'.

Fix:
In case planner hook is not registered, issue an error in 'check_optimizer()'
only if source is more than PGC_S_ARGV. So, for cases when 'optimizer' is set
from the config file, env variable or postmaster's cmd argument, we allow
setting 'optimizer' to 'on' even if planner hook is yet not registered, as it
can't be registered at this stage, only later.
USE_ORCA define is no longer used. This patch removes generation of this define.
File 'configure.in' is modified manually. Other files are re-generated by
'autoconf' and 'autoheader' tools.
Move Orca's GUCs into the extension

This patch:
1. Moves all GUCs, that are related to Orca and used inside it, into Orca
extension.
2. Updates minirepro utility tool. This tool generates a detailed information
related to the monitored SQL commands being executed, including non-default
GUCs. The patch updates the query used to retrieve non-default GUCs. The update
is needed, as, after moving the "optimizer_*" GUCs, their 'category' in
'pg_settings' table had been changed to 'Customized Options' value. Without this
change the respective regression test 'minirepro' fails.
3. Adds all the touched GUC names into the file with ignored symbols for ABI
check, as they no longer exist in the main executable.
4. Removes GUC 'optimizer_partition_selection_log' completely, as it is not used
anywhere.

Note. Following GUCs were not moved, as they are used inside the core:
 - 'optimizer_enable_query_parameter'
 - 'optimizer_analyze_root_partition'
 - 'optimizer_analyze_midlevel_partition'
 - 'optimizer_replicated_table_insert'
 - 'optimizer'
 - 'optimizer_control'
@whitehawk whitehawk closed this Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants