Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide non-necessary symbols from shared object #1136

Merged
merged 1 commit into from
Aug 27, 2024

Conversation

ksivaman
Copy link
Member

Description

libtransformer_engine currently exposes all symbols globally, which can lead to symbol resolution conflicts when certain libraries are exported after TE's shared object file is loaded dynamically, for .e.g.,

pip install rdkit==2024.3.3
python -c "import transformer_engine; from rdkit import Chem"

Leads to the following error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refractor

Changes

  • Limit exposing unnecessary symbols in libtransformer_engine.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
@ksivaman ksivaman requested a review from ptrendx August 26, 2024 20:13
@ksivaman ksivaman added 1.10.0 bug Something isn't working labels Aug 26, 2024
@ksivaman ksivaman self-assigned this Aug 26, 2024
@ksivaman ksivaman added the build Build system label Aug 26, 2024
@ksivaman
Copy link
Member Author

/te-ci

@@ -0,0 +1,4 @@
{
global: *nvte*; *transformer_engine*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need * before nvte or transformer_engine? Shouldn't it be nvte_* instead (since that is the format of our C APIs)? I'm also not convinced we need to expose anything with transformer_engine, could you give examples of the APIs we would want to expose that way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The * before nvte and transformer_engine is needed because some symbols are mangled, for e.g. _Z14nvte_unpermuteP....
  2. transformer_engine is necessary because we use some functionality such as transformer_engine::getenv in the frameworks, so we need to define the symbol for it to load the framework shared objects.

Given how both *transformer_engine* and (to a lesser extent) *nvte* patterns are specific to TE, I think it's safe to continue exposing these globally.

Copy link
Member

@ptrendx ptrendx Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Anything that is mangled is a bug since we advertise C API. In the case of permute it is because there is no extern "C" decorator around those functions. Also we are using nullptr instead of NULL or 0 - apparently C23 introduced it but I don't think we want to expect people to have such new C standards. @phu0ngng could we have it fixed?
  2. I thought we do not do this anymore and instead compile that function in each framework extension separately. @timmoon10 could you comment?

@ksivaman
Copy link
Member Author

Pipeline 17865970

@ksivaman ksivaman merged commit 4ddb0a7 into NVIDIA:main Aug 27, 2024
14 of 15 checks passed
BeingGod pushed a commit to BeingGod/TransformerEngine that referenced this pull request Aug 30, 2024
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: beinggod <[email protected]>
ptrendx pushed a commit that referenced this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.10.0 bug Something isn't working build Build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants