Linker puts the wrong source file path in symbol files #1354

Open
rdos314 opened this issue Dec 1, 2024 · 15 comments

@rdos314
Member

rdos314 commented Dec 1, 2024

ASM source, and C source not compiled for debug, get the module name in the symbol file instead of the full path name. As a result, the debugger cannot find the source code. The linker writes the correct path to map files, so the issue must be with the linker.

@rdos314 rdos314 added bug WLINK Linker labels Dec 1, 2024
@jmalak jmalak removed the WLINK Linker label Dec 2, 2024
@jmalak
Member

jmalak commented Dec 2, 2024

The issue is probably with DIP, which doesn't read the source file name properly.
Two records now exist in the object file: the first with the module name and the second with the file name.
It looks like the second record is not read properly and the first one is read instead.
The module name is now a logical module name and no longer contains the file name as it did before.

@jmalak
Member

jmalak commented Dec 2, 2024

#1350 analyses this in more detail: the issue is probably in the Dwarf writer code used by the code generator.
The issue with assembly modules is a little different: they don't use the Dwarf writer, so it looks like a DIP issue.
There could also be an issue in the wlink code that generates auxiliary Dwarf data, in that it doesn't select the source file name correctly.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

It can't be DIP. Both headers contain the same problem.

Here are hex-dumps of the sym-file (wdump -B0 test.sym)

Old Watcom:
Open Watcom Executable Image Dump Utility
Version 2.0 beta Dec 22 2023 03:37:00 (64-bit)
Copyright (c) 2002-2023 The Open Watcom Contributors. All Rights Reserved.
Portions Copyright (c) 1984-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See https://github.com/open-watcom/open-watcom-v2#readme for details.

offset = 00000000, length = 00000263
0000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 ELF
0010: 02 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00
0020: 34 00 00 00 00 00 00 00 34 00 20 00 00 00 28 00 4 4 (
0030: 06 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
0060: 01 00 00 00 00 00 00 00 00 00 00 00 24 01 00 00 $
0070: 55 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 U
0080: 00 00 00 00 0D 00 00 00 01 00 00 00 00 00 00 00
0090: 00 00 00 00 79 01 00 00 2A 00 00 00 00 00 00 00 y *
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 1B 00 00 00
00B0: 01 00 00 00 00 00 00 00 00 00 00 00 A3 01 00 00
00C0: 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 T
00D0: 00 00 00 00 27 00 00 00 01 00 00 00 00 00 00 00 '
00E0: 00 00 00 00 F7 01 00 00 1C 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 36 00 00 00 6
0100: 03 00 00 00 00 00 00 00 00 00 00 00 13 02 00 00
0110: 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @
0120: 00 00 00 00 51 00 00 00 02 00 00 00 00 00 04 01 Q
0130: 45 3A 5C 72 64 6F 73 5C 6B 65 72 6E 65 6C 5C 74 E:\rdos\kernel\t
0140: 65 73 74 5C 74 65 73 74 2E 61 73 6D 00 56 31 2E est\test.asm V1.
0150: 30 20 57 41 54 43 4F 4D 00 00 00 00 00 03 0A 00 0 WATCOM
0160: 00 00 00 74 65 73 74 5F 66 75 6E 63 00 03 0C 00 test_func
0170: 00 00 00 69 6E 69 74 00 00 01 11 01 03 08 25 08 init %
0180: 10 10 00 00 02 11 01 03 08 25 08 00 00 03 0A 00 %
0190: 11 01 3F 0C 03 08 00 00 04 34 00 11 01 3F 0C 03 ? 4 ?
01A0: 08 00 00 50 00 00 00 02 00 30 00 00 00 01 01 FF P 0
01B0: 04 0A 00 01 01 01 01 00 00 00 00 00 45 3A 5C 72 E:\r
01C0: 64 6F 73 5C 6B 65 72 6E 65 6C 5C 74 65 73 74 5C dos\kernel\test
01D0: 74 65 73 74 2E 61 73 6D 00 00 00 00 00 00 05 02 test.asm
01E0: 00 00 00 00 03 09 33 10 03 04 0F 14 14 15 20 20 3
01F0: 18 03 06 3F 00 01 01 18 00 00 00 02 00 00 00 00 ?
0200: 00 04 00 00 00 00 00 2D 00 00 00 00 00 00 00 00 -
0210: 00 00 00 00 2E 64 65 62 75 67 5F 69 6E 66 6F 00 .debug_info
0220: 2E 64 65 62 75 67 5F 61 62 62 72 65 76 00 2E 64 .debug_abbrev .d
0230: 65 62 75 67 5F 6C 69 6E 65 00 2E 64 65 62 75 67 ebug_line .debug
0240: 5F 61 72 61 6E 67 65 73 00 2E 73 68 73 74 72 74 _aranges .shstrt
0250: 61 62 00 54 49 53 00 00 00 00 00 00 00 00 00 63 ab TIS c
0260: 02 00 00

New Watcom:

Open Watcom Executable Image Dump Utility
Version 2.0 beta Nov 28 2024 02:46:40 (64-bit)
Copyright (c) 2002-2024 The Open Watcom Contributors. All Rights Reserved.
Portions Copyright (c) 1984-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See https://github.com/open-watcom/open-watcom-v2#readme for details.

offset = 00000000, length = 00000233
0000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 ELF
0010: 02 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00
0020: 34 00 00 00 00 00 00 00 34 00 20 00 00 00 28 00 4 4 (
0030: 06 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
0060: 01 00 00 00 00 00 00 00 00 00 00 00 24 01 00 00 $
0070: 3D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =
0080: 00 00 00 00 0D 00 00 00 01 00 00 00 00 00 00 00
0090: 00 00 00 00 61 01 00 00 2A 00 00 00 00 00 00 00 a *
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 1B 00 00 00
00B0: 01 00 00 00 00 00 00 00 00 00 00 00 8B 01 00 00
00C0: 3C 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <
00D0: 00 00 00 00 27 00 00 00 01 00 00 00 00 00 00 00 '
00E0: 00 00 00 00 C7 01 00 00 1C 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 36 00 00 00 6
0100: 03 00 00 00 00 00 00 00 00 00 00 00 E3 01 00 00
0110: 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @
0120: 00 00 00 00 39 00 00 00 02 00 00 00 00 00 04 01 9
0130: 74 65 73 74 00 56 31 2E 30 20 57 41 54 43 4F 4D test V1.0 WATCOM
0140: 00 00 00 00 00 03 0A 00 00 00 00 74 65 73 74 5F test_
0150: 66 75 6E 63 00 03 0C 00 00 00 00 69 6E 69 74 00 func init
0160: 00 01 11 01 03 08 25 08 10 10 00 00 02 11 01 03 %
0170: 08 25 08 00 00 03 0A 00 11 01 3F 0C 03 08 00 00 % ?
0180: 04 34 00 11 01 3F 0C 03 08 00 00 38 00 00 00 02 4 ? 8
0190: 00 18 00 00 00 01 01 FF 04 0A 00 01 01 01 01 00
01A0: 00 00 00 00 74 65 73 74 00 00 00 00 00 00 05 02 test
01B0: 00 00 00 00 03 09 33 10 03 04 0F 14 14 15 20 20 3
01C0: 18 03 06 3F 00 01 01 18 00 00 00 02 00 00 00 00 ?
01D0: 00 04 00 00 00 00 00 2D 00 00 00 00 00 00 00 00 -
01E0: 00 00 00 00 2E 64 65 62 75 67 5F 69 6E 66 6F 00 .debug_info
01F0: 2E 64 65 62 75 67 5F 61 62 62 72 65 76 00 2E 64 .debug_abbrev .d
0200: 65 62 75 67 5F 6C 69 6E 65 00 2E 64 65 62 75 67 ebug_line .debug
0210: 5F 61 72 61 6E 67 65 73 00 2E 73 68 73 74 72 74 _aranges .shstrt
0220: 61 62 00 54 49 53 00 00 00 00 00 00 00 00 00 33 ab TIS 3
0230: 02 00 00
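
For readers decoding the dumps: in both files the first section header (at 0x5C) places .debug_info at file offset 0x124, and its size field drops from 0x55 to 0x3D. That is exactly the 24-byte difference between the 28-character path and the 4-character module name. The sketch below shows how the name can be located in those bytes. It is a minimal illustration, not code from wdump or wlink: `cu_name` and `CU_HEADER_SIZE` are hypothetical names, and it assumes a 32-bit DWARF 2 compile unit whose first DIE has a one-byte abbreviation code and stores DW_AT_name inline as its first attribute, which is what the .debug_abbrev bytes in these dumps indicate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 32-bit DWARF 2 compile unit header:
 * unit_length (4) + version (2) + debug_abbrev_offset (4) + address_size (1). */
#define CU_HEADER_SIZE 11

/* Hypothetical helper: returns a pointer to the compile unit name inside a
 * .debug_info section, or NULL if the data is too short, the version is not 2,
 * or the name is not NUL-terminated within the buffer.
 * Assumption: the first DIE uses a one-byte abbreviation code and its first
 * attribute is DW_AT_name in DW_FORM_string form. */
static const char *cu_name( const unsigned char *info, size_t size )
{
    size_t      name_off = CU_HEADER_SIZE + 1;  /* skip header and abbrev code */
    unsigned    version;

    assert( info != NULL );
    if( size <= name_off ) {
        return( NULL );
    }
    version = info[4] | ( info[5] << 8 );       /* little-endian 16-bit field */
    if( version != 2 ) {
        return( NULL );
    }
    if( memchr( info + name_off, '\0', size - name_off ) == NULL ) {
        return( NULL );
    }
    return( (const char *)info + name_off );
}

/* First 17 bytes of .debug_info (file offset 0x124) copied from the new dump. */
static const unsigned char new_info[] = {
    0x39, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x01,
    0x74, 0x65, 0x73, 0x74, 0x00
};
```

Calling `cu_name( new_info, sizeof( new_info ) )` yields the string "test", the module name. The same offset in the old dump holds E:\rdos\kernel\test\test.asm.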

@jmalak
Member

jmalak commented Dec 2, 2024

You are right, what I wrote before about DIP is nonsense.
DIP only reads the info generated by the linker, and that info is wrong.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

I've done a bit of debugging of wlink, and I can conclude that in the ProcPubsSect function in objcalc.c, the head element contains head->name.u.ptr (containing "test"), and it is this element that is written to the symbol file in the DwarfAddModule function in dbgdwarf.c.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

It seems like if you compile the ASM source with the old Watcom, then wlink will add the correct path. So, the incorrect path appears to depend on contents of the obj file.

It's the IdentifyObject() function called in DoPass1 in procfile.c that returns either the module name (new object file) or the correct path name (old object file).

In objomf.c, PassCmd1 function, I can see that ProcTHEADR reads the correct pathname from the object file as one of the first actions, but this must then get overwritten, since when this info is read with IdentifyObject, it instead contains the module name.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

I notice that ProcTHEADR actually doesn't do anything useful. It only reads the full pathname into a local variable, and then discards it. A fix that solves the problem is to use the name and link it to CurrMod->name.u.ptr.

This works for a single file, but is not the final solution:

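static void ProcTHEADR( void )
/****************************/
{
    /* static buffer: holds only one name, so this works for a single file */
    static char name[256];
    int         sym_len;

    if( CurrMod->omfdbg == OMF_DBG_CODEVIEW ) {
        sym_len = *ObjBuff++;           /* length byte of the THEADR name */
        if( sym_len == 0 ) {
            BadObject();
        }
        memcpy( name, ObjBuff, sym_len );
        name[sym_len] = '\0';
        CurrMod->name.u.ptr = name;     /* replace module name with full path */
    }
}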

I can see that upon entry to PassCmd1, CurrMod->name.u.ptr is "test", which is the module name. The problem is that this is never overridden for the ASM source or when an object is not compiled for debugging.

@jmalak
Member

jmalak commented Dec 2, 2024

It is not as simple as it looks. Multiple debug info formats exist on both input and output.
C code can use the DWARF, Watcom or CodeView formats, or a simple line-number format. Wasm uses only line numbers and the source file name in the THEADR OMF record. The linker must combine multiple sources and also generate multiple output formats. Handling the source file name is more complex, because there is the module name used in libraries and the source file name, which can change if the generated code comes from multiple source files. I will try to look into this problem and fix it.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

Actually, the line numbers generated for macros that are included in ASM source are wrong. The line numbers refer to another file, but the debugger thinks they relate to the current ASM file, which creates strange behavior. Still, I'm used to that so it's not a big problem. The current problem with missing source is a big problem though. It makes it more or less impossible to debug device drivers and ASM source.

Looking at the history of ProcTHEADR, the current code was introduced in 1.7.0, but at that time it did not use the name either.

Anyway, from my understanding of how the debugger works, it cannot handle symbol files that have the module name as the source path. Therefore, the default that the linker uses will ALWAYS malfunction if passed to the debugger. If the linker uses this information for internal operation, perhaps it would be better to add another name that must always be the pathname and that is written to the symbol file?

@jmalak
Member

jmalak commented Dec 2, 2024

Yes, I did that with multiple THEADR records for the OMF format: the first record is the module name, which can differ from the source file name, and the second should always contain the full source file name. For DWARF it is not a problem; the source file name should always be available. A problem can arise if you have an OMF source module (with line numbers only) that the linker converts to the DWARF output format. This is the current situation for ASM OMF modules. It looks like there is a bug somewhere that loses the source file name for the DWARF output format.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

Since it works with old object files, I think something in ASM OMF modules has changed, which triggers a latent problem in the linker. I think the best solution would be to fix OMF modules in the linker rather than find out what change to the OMF modules triggered the issue.

@rdos314
Member Author

rdos314 commented Dec 2, 2024

This solves the issue for any number of ASM sources (but results in memory leaks):

#include "liballoc.h"

static void ProcTHEADR( void )
/****************************/
{
    char    *name;
    int     sym_len;

    if( CurrMod->omfdbg == OMF_DBG_CODEVIEW ) {
        sym_len = *ObjBuff++;           /* length byte of the THEADR name */
        if( sym_len == 0 ) {
            BadObject();
        }
        /* one heap copy per module; it is never freed, hence the leak */
        name = (char *)lib_calloc( 1, sym_len + 1 );
        memcpy( name, ObjBuff, sym_len );
        name[sym_len] = '\0';
        CurrMod->name.u.ptr = name;     /* replace module name with full path */
    }
}
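
One way the leak mentioned above could be avoided is to let the module entry own its name and release the previous copy whenever it is replaced. The following is only a sketch under that assumption. `module_entry` and `set_module_name` are hypothetical stand-ins, the standard `malloc`/`free` replace whatever allocator wlink really uses, and in wlink the initial name may not be heap-allocated at all, in which case it must not be passed to `free`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the linker's module entry; the real wlink
 * structure is different. The entry owns `name` (heap-allocated or NULL). */
typedef struct {
    char    *name;
} module_entry;

/* Replace mod->name with a copy of the length-prefixed name at `rec`
 * (first byte is the length, as in an OMF THEADR record).
 * Returns 0 on success, -1 on an empty name or allocation failure;
 * on failure the old name is left untouched. */
static int set_module_name( module_entry *mod, const unsigned char *rec )
{
    size_t  len;
    char    *copy;

    assert( mod != NULL && rec != NULL );
    len = rec[0];
    if( len == 0 ) {
        return( -1 );
    }
    copy = malloc( len + 1 );
    if( copy == NULL ) {
        return( -1 );
    }
    memcpy( copy, rec + 1, len );
    copy[len] = '\0';
    free( mod->name );      /* release the previous name so repeats don't leak */
    mod->name = copy;
    return( 0 );
}
```

The owner still has to call `free( mod->name )` once when the module entry itself is discarded.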

@jmalak
Member

jmalak commented Dec 2, 2024

The correct functions for getting the module name and source file name are IdentifyObject and GetOMFName.
This uses the cached OMF THEADR record. In WASM the THEADR records are created in writepass1stuff by the write_header function.
The wlink IdentifyObject function (or another one) should be modified to read the first two OMF records: if there is only one THEADR, the source file name should be derived from the first record; if a second exists, it holds the full source file name. The module name is mostly the same as the source file base name, but an ASM directive can change it to any value, and then you cannot derive the source file name from the module name. The second THEADR record (if it exists) always contains the true source file name, not the module name.

@rdos314
Member Author

rdos314 commented Dec 3, 2024

As a temporary fix, my build system now uses an old version of wasm until this is fixed, which appears to work correctly.

@jmalak
Member

jmalak commented Dec 3, 2024

OK.
I will check and fix it.
It looks like the module name is still used somewhere instead of the source file name, because the old version used only the module name.
