Mangling global names breaks OS X #2

Open
comex opened this issue Oct 31, 2014 · 4 comments

@comex
comex commented Oct 31, 2014

  // Mangle globals with the standard mangler interface for LLC compatibility.
  if (const GlobalValue *GV = dyn_cast<GlobalValue>(Operand)) {
    SmallString<128> Str;
    Mang->getNameWithPrefix(Str, GV, false);
    return CBEMangle(Str.str().str());
  }

OS X binaries prefix all C symbols with a leading underscore, so this code ends up producing functions like int _main(void). When trying to compile the result with a standard C compiler, of course, another underscore gets added, making it fail to link.

I'm not sure exactly what LLC compatibility entails, but this doesn't seem to be the right thing to do.

Disclaimer: I manually updated the code to a newer version of LLVM, but as only trivial changes were required I don't think it affects this issue.
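
For illustration, a minimal sketch of undoing the prefix before the name is emitted. `stripGlobalPrefix` is a hypothetical helper, not part of the backend or LLVM, and passing the prefix character in as a parameter is an assumption about how the target's prefix would be obtained:

```cpp
#include <string>

// Hypothetical helper, not part of the C backend or LLVM: drop one leading
// platform prefix character (e.g. '_' on OS X) from an already-mangled name.
// A Prefix of '\0' means the target adds no prefix, so nothing is removed.
std::string stripGlobalPrefix(const std::string &Mangled, char Prefix) {
  if (Prefix != '\0' && !Mangled.empty() && Mangled[0] == Prefix)
    return Mangled.substr(1);
  return Mangled;
}
```

With something like this, a `_main` produced for OS X would be emitted as `main`, and the C compiler would add the underscore back when it compiles the generated code.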

@rtc-draper
Collaborator

Thanks for the issue report. Will add it to the punch list.

@edanor
edanor commented Dec 5, 2014

Hello everyone,

I'm very interested in this backend and this issue is a real blocker for me.

The simple workaround I have used is to add defines that map the prefixed names back to the plain ones. Something like this seems to work fine:
#define _main main

I have made this workaround a bit heavier by having the backend automatically include a header that redefines these decorated names:

CBackend.cpp (~1795)
// get declaration for alloca
Out << "/* Provide Declarations */\n";
Out << "#include <stdarg.h>\n"; // Varargs support
Out << "#include <setjmp.h>\n"; // Unwind support
Out << "#include <limits.h>\n"; // With overflow intrinsics support.
// WA: function names are mangled with underscore
// a workaround would be to unmangle names inside C backend
Out << "#include \"redefs.h\"\n";

And the redefs.h file itself looks like this:
#ifndef REDEFS_H
#define REDEFS_H

#define _main main
#define _printf printf
#define _atoi atoi

#endif /* REDEFS_H */

This works fine for small programs that don't have external linkage dependencies.

Now, why is it a very bad workaround? When compiling a larger piece of backend output (~30000 lines) with GCC (4.9.0), I get an awful lot of errors like these:

Error: junk `@0' after expression
Error: invalid character '@' in mnemonic

The problem is either with the assembler, which cannot handle decorated names, or rather with the backend itself. I believe the backend should remove all decorations that LLVM imposes on names before emitting the C code, so that the output really can be compiled with a standard C compiler. The problem already shows up with main, but since '_' is a valid nondigit character in an identifier, it is not so severe there. However, when LLVM decorates names with '@', the C backend generates code that does not conform to the ANSI C standard and is thus incorrect (this is only my opinion; alas, there are many standard and nonstandard C compilers that might tolerate this!).

From what I have seen of the LLVM code, there is no simple method for unmangling the names. Any suggestions on how this can be handled? I could try doing the implementation, but I am not an LLVM expert and some guidance would be appreciated.
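
A minimal sketch of the kind of removal meant above, assuming the '@' comes from a Windows-style `name@N` argument-size suffix (an assumption based only on the `@0` in the assembler errors, not something verified). `stripAtSuffix` is a hypothetical helper, not an LLVM or backend function:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: remove a trailing "@<digits>" suffix, assuming it is
// a Windows-style argument-size decoration (e.g. "_foo@0" -> "_foo").
// Names without such a suffix are returned unchanged.
std::string stripAtSuffix(const std::string &Name) {
  std::string::size_type At = Name.rfind('@');
  // No '@', a leading '@' only, or nothing after the '@': leave as-is.
  if (At == std::string::npos || At == 0 || At + 1 == Name.size())
    return Name;
  // Everything after the last '@' must be decimal digits.
  for (std::string::size_type I = At + 1; I < Name.size(); ++I)
    if (!std::isdigit(static_cast<unsigned char>(Name[I])))
      return Name;
  return Name.substr(0, At);
}
```

This only handles the suffix; a leading prefix character would still have to be dropped separately, and the calling convention would have to be expressed some other way in the generated C.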

Thanks,
Przemek.

@twalters25
Contributor

We are still in the process of acquiring a Mac to address this issue. In the meantime, do you have any specific test cases related to this issue that would help us get started? They would be greatly appreciated.

@edanor
edanor commented Dec 10, 2014

You don't need OS X to observe this problem. The '_' also appears on both Linux and Windows. I checked that the problem with '@' in the names appears when compiling on Windows using a MinGW setup. Since that setup is not fully reliable, I suggest trying to build this on Linux.

The code, together with a makefile for the sample I was experimenting with, can be found at:

https://github.com/edanor/bzip2.git

Type 'make gcc', 'make clang' or 'make fields' to build it using different toolchains. The last option generates the temp/llvm_cbe.c file using the backend and tries to compile it to a binary using GCC.

There are multiple things wrong with the generated file.
