Civilization 6, Factorio, Stellaris crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed #705

Open
zhaoweny opened this issue Nov 8, 2024 · 60 comments

Comments

@zhaoweny

zhaoweny commented Nov 8, 2024

Your system information

  • Steam Runtime Version: (steam version 1730853027, steam-runtime_0.20241024.105847)
  • Distribution (e.g. Ubuntu 18.04): Arch Linux
  • Link to your full system information (Help -> Steam Runtime Diagnostics) in a Gist: see link
  • Have you checked for system updates?: yes - fresh install and up to date
  • What compatibility tool are you using?: Steam Linux Runtime
  • What versions are listed in steamapps/common/SteamLinuxRuntime/VERSIONS.txt? 0.20240806.0
  • What versions are listed in steamapps/common/SteamLinuxRuntime_soldier/VERSIONS.txt? 0.20240917.101880
  • What versions are listed in steamapps/common/SteamLinuxRuntime_sniper/VERSIONS.txt? 0.20240916.101795

Please describe your issue in as much detail as possible:

When launching Factorio on my physical Arch Linux machine, it crashes almost immediately. I tried reinstalling Steam, then reinstalling Arch Linux and Steam again, but the issue persists.

So far I have found 3 workarounds:

  • run steam with CLI flag -compat-force-slr off
  • modify the launch option to steam-runtime-launch-options -- %command% and configure container runtime to None
  • launch Factorio from a fresh Arch Linux virtual machine - factorio can launch normally

FYI same issue on Factorio Forum

attaching slr log file as requested: slr-app427520-t20241108T233119.log

Steps for reproducing this issue:

  • From a fresh Arch Linux install, install steam and Factorio
  • launch Factorio from steam library page

expected behavior: game loads up loading screen, lands me on main menu

actual behavior:

  • the game crashes without writing a game-specific log file
  • in the system journal, the crash handler reports that the game received a SIGSEGV
@smcv
Contributor

smcv commented Nov 8, 2024

launch Factorio from a fresh Arch Linux virtual machine

To clarify, is that running it by copying the installed game files and running it like an independent non-Steam game, without Steam being involved at all?

Or do you have both Steam and Factorio installed in the VM?

@smcv
Contributor

smcv commented Nov 8, 2024

modify the launch option to steam-runtime-launch-options -- %command% and configure container runtime to None

Since you've already discovered that developer tool: what happens if you switch the container runtime to SteamLinuxRuntime_sniper? Does that help any? (If you don't already have sniper installed, you can get it by running steam steam://install/1628350)

I don't see any immediately obvious problems in Factorio with bundled libraries or anything like that. Presumably it's making some sort of assumption about the host system that isn't true any more when it runs in a container, but it's hard to say what that assumption would be.

We've had people running Factorio successfully in Steam Linux Runtime 1.0 (scout) in the past (#262) but presumably it doesn't work in all system configurations.

@TTimo
Collaborator

TTimo commented Nov 8, 2024

I'm on Arch and Factorio launches just fine in scout SLR fwiw.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

About Arch Linux VM with Steam and Factorio:

I installed Steam in that virtual machine, installed Factorio through Steam, then launched Factorio from Steam.
Sadly, the VM is now gone because of the clean reinstall of Arch Linux.

About the developer tool:

No, changing runtime does not solve this issue for me.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I made a test user with useradd instead of a systemd-homed managed one (my normal account is managed by systemd-homed), and Factorio works for that user even with SLR. So the difference between not working and working can be summarized as:

user home directory | uses container runtime | works as expected
btrfs subvolume     | yes                    | yes
systemd-homed       | yes                    | no
systemd-homed       | no                     | yes

The systemd-homed managed user directory is located at /home/username.homedir and is bind-mounted to /home/username on unlock (I assume).

@smcv
Contributor

smcv commented Nov 8, 2024

Interesting...

Your SLR log says we're using /home/zhaow as your home directory. Is /home/zhaow a "real" directory, rather than a symbolic link? Does it contain everything that you think it should normally contain?

I also notice this in your log:

   0.000 Error MessageDialog.cpp:218: Unable to show message dialog. SDL Error: [zenity reported error or failed to launch: 255]

so maybe this means there's a problem with X11 or Wayland?

Do other Steam games launch successfully in the same runtime? Floating Point is a good one to test, because it's very small (and is free).

@smcv
Contributor

smcv commented Nov 8, 2024

It would be useful if you can get a new log with STEAM_LINUX_RUNTIME_VERBOSE=1 in addition to STEAM_LINUX_RUNTIME_LOG=1.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

about the /home/zhaow directory: it's a real directory as far as I can tell. here's some relevant console output:

$ file /home/zhaow
/home/zhaow: directory

$ realpath /home/zhaow
/home/zhaow

$ ls /home
zhaow  zhaow.homedir  zwydbg

$ mount | grep /home
/dev/nvme0n1p2 on /home type btrfs (rw,relatime,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@home)
/dev/nvme0n1p2 on /home/zhaow type btrfs (rw,nosuid,nodev,relatime,idmapped,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=264,subvol=/@home/zhaow.homedir)

I'll collect logs as soon as possible. Please wait for a moment

@smcv
Contributor

smcv commented Nov 8, 2024

it's a real directory as far as I can tell

Yes, I agree. (The reason I asked is that symlinks sometimes break container frameworks, including ours - but your home directory isn't a symlink, so that should be OK)

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I collected 3 log files this round:

In the console log you can see that I launched Factorio with the developer tool steam-runtime-launch-options and tested each runtime in turn. The last option, None, is known good and works as expected.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I'm abandoning this issue and I'll move to a normal non-homed user. Seems systemd-homed is not stable enough for my use cases :(

@zhaoweny zhaoweny closed this as completed Nov 8, 2024
@smcv
Contributor

smcv commented Nov 8, 2024

I notice that you're using AMDVLK, which has been known to cause weird issues in the past. We generally recommend Mesa's Vulkan driver for AMD GPUs.

I also notice that you're using a dual-GPU setup (discrete + integrated GPUs, both AMD) which can sometimes have weird effects.

It's weird that it makes a difference whether you're using systemd-homed, though... I wouldn't have expected that to have an effect.

When you tested without systemd-homed, was it with the same $HOME contents? Or is it possible that you might have been comparing a pre-existing user with non-default configuration in $HOME to a new user with all settings at defaults?

If someone can reproduce a similar issue, the next step would probably be to see whether this affects all games or just Factorio, and either reproduce the crash with something open-source that we can analyze (like maybe xterm), or strace something that is crashing to get an idea of what it's doing immediately before the crash.

@adomaskizogian

@smcv I can reproduce the issue. running steam -compat-force-slr off does solve it.

ubuntu 24.10. Up to date.
6.11.0-8-generic

how can I help

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

after some sleep, I'm back :)

AMDVLK

This is because the archinstall script selects this Vulkan driver by default. It's now uninstalled.

dual gpu setup

my system is 7800X3D + 7900XTX, with monitor plugged into GPU directly, which means the integrated GPU is mostly idle.

$HOME content

it's slightly different - the zwydbg user is freshly created with useradd -m --btrfs-subvolume-home, but my normal zhaow user is also fresh, due to re-installation of Arch Linux

strace logs or something open source for analysis

I'm working on this. I obtained an strace log a while back when I reported this issue to the Factorio Forum. I'll try to get some fresh logs and reproduce the problem with another native title.

I should note that Floating Point works inside SLR; next target would be Dota 2 for me (since it's effectively open source to you guys, right?)

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

some test result, by launching each game from steam:

  • Tiny Glade: works as expected
  • Euro Truck Simulator 2: works as expected
  • Don't Starve Together: works as expected
  • Dota 2: works as expected
  • Counter Strike 2: works as expected
  • Don't Starve: works as expected
  • Portal 2: works as expected
  • Civ 6: crashes and produces a coredump
  • Stellaris: game launcher works, the game crashes when click play from the launcher
  • Team Fortress 2: works as expected
  • Left 4 Dead 2: works as expected
  • Surviving Mars: works as expected
  • Stardew Valley: works as expected

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

I ran Civ 6 and Factorio with strace -tt -ff -o _log_dir_/strace.log %command% and got strace logs for each game:

strace-logs.zip

According to the strace man page, you can use strace-log-merge to combine these logs.

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

I think Stellaris has the same issue as Civ 6 and Factorio. The Paradox launcher for Stellaris starts, but the game fails to launch. Here are the strace logs for Stellaris:
stellaris-strace_logs.zip

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

One additional note: I tried to launch gdbserver inside the runtime to debug Factorio, but it failed with an error (something like unknown register ymm0h), even with SLR_sniper, which is annoying.

@zhaoweny
Author

re-opening this since my setup for reproducing the issue is still valid, hope we can solve this mystery together :)

@zhaoweny zhaoweny reopened this Nov 11, 2024
@smcv
Contributor

smcv commented Nov 11, 2024

Civ 6: crashes and produces a coredump

It's probably best if you can open a separate issue for this: if your issue with Factorio does not affect most of your games, then it seems likely that Civ 6, Factorio and Stellaris have different things going wrong.

And if I'm wrong about that and there is a common root cause, closing issues as duplicates is much easier than understanding an issue thread that has three separate conversations about three separate bugs :-)

Stellaris: game launcher works, the game crashes when click play from the launcher

Similarly this is probably best as its own issue.

@smcv
Contributor

smcv commented Nov 11, 2024

Civ 6 has had known compatibility problems in the past because it bundles a lot of libraries that it shouldn't, so definitely open a separate issue for that one instead of discussing Civ 6 further on this particular issue.

@zhaoweny
Author

zhaoweny commented Nov 12, 2024

OK, let’s focus on Factorio for now, as I’m not currently playing Civ 6 or Stellaris.

Could you please guide me on how to properly set up gdbserver with Steam Linux Runtime enabled to obtain a stack trace for the SIGSEGV error when launching Factorio? I’ve attempted this before, but my last try resulted in an internal error from gdbserver—specifically, it mentioned an unknown register ymm0h or something similar. I suspect that the game is utilizing AVX2 registers, and the gdbserver bundled with the Steam Linux Runtime may be outdated.

@zhaoweny
Author

I downloaded the steam-runtime SDK (soldier, according to /doc/reporting-steamlinuxruntime-bugs.md).
Then I started a shell with env PRESSURE_VESSEL_SHELL=instead steam-runtime-launch-options -- %command% from steam.
Finally I ran gdb bin/x64/factorio to launch factorio inside the SDK runtime.

I got this stack trace with SIGSEGV (finally!)

(gdb) bt
#0  Paths::getSystemWriteData () at /tmp/factorio-build-EZorjK/src/Paths.cpp:259
#1  0x00000000013bb28c in PathMacroReplacer::apply[abi:cxx11](re2::StringPiece const*) const () at /tmp/factorio-build-EZorjK/src/Info/PathMacroReplacer.cpp:12
#2  0x00000000023dcbec in ReplacerWrapper::operator()[abi:cxx11](re2::StringPiece const*) const () at /tmp/factorio-build-EZorjK/src/Info/MacroReplacer.cpp:12
#3  std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ReplacerWrapper&, re2::StringPiece const*> ()
    at /opt/gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:61
#4  std::__invoke_r<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ReplacerWrapper&, re2::StringPiece const*> ()
    at /opt/gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:116
#5  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*), ReplacerWrapper>::_M_invoke(std::_Any_data const&, re2::StringPiece const*&&) () at /opt/gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:291
#6  0x00000000012496ab in std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*)>::operator()(re2::StringPiece const*) const () at /opt/gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:591
#7  RegexUtil::replace<2u>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, re2::RE2 const&, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*)> const&) () at /tmp/factorio-build-EZorjK/src/Util/RegexUtil.hpp:17
#8  MacroReplacer::replace () at /tmp/factorio-build-EZorjK/src/Info/MacroReplacer.cpp:30
#9  0x00000000021c5881 in GlobalContext::init () at /tmp/factorio-build-EZorjK/src/GlobalContext.cpp:332
#10 0x00000000021db366 in MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) () at /tmp/factorio-build-EZorjK/src/MainLoop.cpp:286
#11 0x00000000021e837b in fmain () at /tmp/factorio-build-EZorjK/src/Main.cpp:1348
#12 0x00000000024241be in main () at /tmp/factorio-build-EZorjK/src/Main.cpp:1370

I'll report this issue to the Factorio developers and try to dig a little deeper.

@smcv
Contributor

smcv commented Nov 12, 2024

I see you've figured out a way to get a backtrace while I was writing this, but for completeness...

Could you please guide me on how to properly set up gdbserver with Steam Linux Runtime enabled to obtain a stack trace for the SIGSEGV error when launching Factorio?

To get a stack trace, it's often simpler if you can use a post-mortem crash analysis tool like systemd-coredump rather than fighting with gdb. Since you say you're an Arch user, https://wiki.archlinux.org/title/Core_dump might be useful.

Or, the next best thing is:

  1. Get a shell inside the game container.
  2. Run the game like gdbserver 127.0.0.1:12345 ./bin/x64/factorio
  3. Connect an external gdb to the gdbserver, e.g. see https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/blob/main/docs/slr-for-game-developers.md#attaching-a-debugger-by-using-gdbserver

@smcv
Contributor

smcv commented Nov 12, 2024

I don't currently have access to Factorio the full game, but for what it's worth, the demo is working fine for me on Arch Linux under SLR 1.0.

However, I haven't yet tried it with a user that is managed by systemd-homed.

One thing I notice from your backtrace:

at /opt/gcc-13.2.0/include/...

This seems like it indicates that Factorio was compiled with a third-party compiler, and not with one of the ones we provide in the Steam Runtime SDK. The demo shows signs of having been compiled with the same compiler.

This hopefully shouldn't be a problem: the demo is statically linked with libstdc++, which probably means the full game is the same.

Looking at the demo executable with objdump -T -x, it looks like it's accidentally exporting libstdc++ data symbols like std::__cxx11::numpunct<char>::id, which is a possible cause of crashes if these symbols "interpose" symbols from the dynamically-linked libstdc++ that will be pulled in by your graphics drivers. If the full game is the same, this might be something that the developers should look into - hiding those symbols from the dynamic symbol table would probably be safer.

I don't have any real evidence that this is the reason for your crash, though.

@smcv
Contributor

smcv commented Nov 12, 2024

It's weird that it makes a difference whether you're using systemd-homed, though...

This is just speculation, but one thing that occurs to me is that systemd-homed creates a user with a large numeric uid (on my test system, my normal user has uid 1000 but the user created via systemd-homed has uid 60032) so if some component assumes that a uid will fit into a signed 16-bit integer, systemd-homed would break that assumption?
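
To make the 16-bit speculation concrete, here is a small sketch of the range check it implies. The helper name fits_in_int16 is made up for illustration; nothing in this thread shows that any of the affected components actually stores a uid in a 16-bit integer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a numeric id can be stored in a signed
// 16-bit integer (maximum 32767) without overflowing.
constexpr bool fits_in_int16(std::uint32_t id)
{
  return id <= static_cast<std::uint32_t>(std::numeric_limits<std::int16_t>::max());
}

// The two uids mentioned above: a conventional one and a systemd-homed one.
static_assert(fits_in_int16(1000), "uid 1000 fits in int16_t");
static_assert(!fits_in_int16(60032), "uid 60032 does not fit in int16_t");
```

A uid such as 60032 is above 32767, so code making that assumption would see a truncated or negative value.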

@smcv
Contributor

smcv commented Nov 12, 2024

#0 Paths::getSystemWriteData () at /tmp/factorio-build-EZorjK/src/Paths.cpp:259

If the Factorio developers can tell us what's happening in that function (and, more specifically, around that line), that would probably be the most useful piece of information here.

@zhaoweny
Author

zhaoweny commented Nov 12, 2024

I think I got this minimized down to a non-game example. I searched around the Internet and landed on this post, which indicates that Factorio might be using getpwuid to obtain a passwd entry for a user, at least around the time of the original post (2021).
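
A non-game check for this could look roughly like the sketch below. It assumes a POSIX system, and the helper names uid_has_passwd_entry and describe_current_user are invented for illustration; they are not taken from Factorio or from the linked post. It only reports whether getpwuid() can resolve the calling user's uid.

```cpp
#include <pwd.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true if the passwd database has an entry for this uid.
bool uid_has_passwd_entry(uid_t uid)
{
  return getpwuid(uid) != nullptr;
}

// Hypothetical helper: one-line summary of what getpwuid() says about the
// calling user, without dereferencing a NULL result.
std::string describe_current_user()
{
  const uid_t uid = getuid();
  const struct passwd* pw = getpwuid(uid);
  if (pw == nullptr) {
    return "uid " + std::to_string(uid) + " has no passwd entry";
  }
  return "uid " + std::to_string(uid) + " -> home " + (pw->pw_dir ? pw->pw_dir : "(none)");
}
```

Printing the result of describe_current_user() as the affected user, once on the host and once inside the container, should show whether the lookup behaves differently in the two environments.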

@smcv
Contributor

smcv commented Nov 12, 2024

My ideal solution would be to not run the game against the steam Linux runtime at all - we provide a standalone version of the game and it works great.

I'm sure it works great today, but the goal of the Steam Linux Runtime is that it still works in 10 years' time, and that's hard to achieve in a standalone Linux binary - assumptions about the underlying system that seem completely reasonable today are not going to remain true forever.

@smcv
Contributor

smcv commented Nov 12, 2024

I searched around the Internet and landed myself on this post. Which indicate that Factorio is using getpwuid to obtain a passwd entry for a user, at least around the time of original post (2021).

That's consistent with my theory in #705 (comment), and confirms that users do expect $HOME to be used as a higher precedence than whatever getpwuid() says.

@raiguard

raiguard commented Nov 12, 2024

However, if Factorio doesn't take HOME into account, relies on getpwuid(getuid()) or similar, and also doesn't take into account the possibility that getpwuid() might fail, then that would explain the symptoms we're seeing.

You are correct. Here is the entire contents of Paths::getSystemWriteData() on Linux:

Filesystem::Path Paths::getSystemWriteData()
{
  struct passwd* pw = getpwuid(getuid());
  const char* homedir = pw->pw_dir;
  return Filesystem::Path(homedir + std::string("/.factorio"));
}

Ironically enough, I actually did catch this flaw a few months ago, but the fix didn't get merged because it was bundled with a few other changes that were rejected (the change being that we would use $XDG_DATA_HOME instead of ~/.factorio by default, but was rejected because of potentially wreaking havoc with steam cloud saves, among other things). That was bad branch etiquette on my part.

I'm sure it works great today, but the goal of the Steam Linux Runtime is that it still works in 10 years' time, and that's hard to achieve in a standalone Linux binary - assumptions about the underlying system that seem completely reasonable today are not going to remain true forever.

Point. I tend to try not to think about the day when I inevitably stop working on Factorio. :)

@smcv
Contributor

smcv commented Nov 12, 2024

  struct passwd* pw = getpwuid(getuid());
  const char* homedir = pw->pw_dir;

Yeah, that's the segfault I expected: if getpwuid() fails, it will return NULL, and then the next line is a NULL dereference. I'd suggest something more like this (untested!):

  const char* homedir = getenv("HOME");
  if (!homedir) {
    uid_t uid = getuid();
    errno = 0;
    struct passwd* pw = getpwuid(uid);
    if (!pw) {
      errx(1, "Unable to find uid %lu: %s", (unsigned long) uid, errno ? strerror(errno) : "not found");
      /* or whatever way you prefer to handle fatal errors */
    }
    homedir = pw->pw_dir;
  }
  ...

(The error behaviour of getpwuid() is odd - it can return 0 without setting errno)
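
For reference, the same idea as a self-contained C++ sketch. The function names resolve_home_dir and system_write_data_dir are hypothetical, and throwing std::runtime_error is only one possible way to report the failure; this is not the code that was eventually merged into Factorio.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <pwd.h>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: prefer $HOME, fall back to the passwd database,
// and fail loudly instead of dereferencing NULL.
std::string resolve_home_dir()
{
  const char* env_home = std::getenv("HOME");
  if (env_home != nullptr && env_home[0] != '\0') {
    return env_home;
  }

  const uid_t uid = getuid();
  errno = 0;  // getpwuid() can return NULL without setting errno
  const struct passwd* pw = getpwuid(uid);
  if (pw == nullptr || pw->pw_dir == nullptr) {
    const int saved_errno = errno;
    throw std::runtime_error(
        "Unable to find home directory for uid " + std::to_string(uid) + ": " +
        (saved_errno != 0 ? std::strerror(saved_errno) : "not found"));
  }
  return pw->pw_dir;
}

// Hypothetical counterpart of Paths::getSystemWriteData(), returning a
// plain string rather than Filesystem::Path.
std::string system_write_data_dir()
{
  return resolve_home_dir() + "/.factorio";
}
```

With HOME set, the passwd database is never consulted, so a uid that is missing from /etc/passwd no longer matters on the common path.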

@smcv
Contributor

smcv commented Nov 12, 2024

@zhaoweny or @kisak-valve, can we perhaps retitle this to something like Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed now that we know why it's crashing?

I'll look at mitigating this from the SLR side.

@zhaoweny zhaoweny changed the title Factorio crashes with SIGSEGV after recent steam client update, which enables scout runtime by default Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed Nov 12, 2024
@zhaoweny
Author

I edited the title as you suggested, but I'd like to add that it's the same behavior across different Steam Linux Runtime versions.

@smcv
Contributor

smcv commented Nov 12, 2024

I edited the title as you suggested, but I'd like to add that it's the same behavior across different Steam Linux Runtime versions.

That makes sense, it's a problem with SLR in general rather than that version specifically. (But SLR 1.0 is (currently) the only one that is available for running Factorio without using unsupported tweaks, because SLR 3.0 is only meant to be for games whose developers have specifically told us they want a newer runtime, like CS2 and Retroarch.)

@smcv
Contributor

smcv commented Nov 12, 2024

In the short term, a workaround for this is to append a record for the systemd-homed-managed user to /etc/passwd, making sure to replace the home directory field with the user's intended home directory.

In some brief testing on Arch, the result of getent passwd "$(id -nu)" will show a home directory of /, which is unsuitable.

For example, on my test system, getent passwd says:

usinghomed:x:60032:60032:usinghomed:/:/usr/bin/systemd-home-fallback-shell

but when I log in as usinghomed, I get HOME=/home/usinghomed. So I appended this to /etc/passwd as a workaround:

usinghomed:x:60032:60032:usinghomed:/home/usinghomed:/usr/bin/systemd-home-fallback-shell

Obviously this workaround loses a few of the benefits of systemd-homed, so it would be better to make SLR mitigate this failure mode (in progress) or to teach Factorio to use $HOME.
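
The edit to the passwd record can be sketched as a small string transformation. The function name with_home_dir is made up for illustration, and the sketch assumes the usual seven colon-separated passwd(5) fields (name, password, uid, gid, gecos, home, shell).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return a copy of one passwd(5) record with its
// home directory field (the sixth of seven fields) replaced.
std::string with_home_dir(const std::string& passwd_line, const std::string& home)
{
  // Split the record on ':' and keep empty fields.
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type colon = passwd_line.find(':', start);
    if (colon == std::string::npos) {
      fields.push_back(passwd_line.substr(start));
      break;
    }
    fields.push_back(passwd_line.substr(start, colon - start));
    start = colon + 1;
  }
  if (fields.size() != 7) {
    throw std::invalid_argument("expected 7 colon-separated fields");
  }

  fields[5] = home;

  // Join the fields back together with ':' separators.
  std::string result = fields[0];
  for (std::size_t i = 1; i < fields.size(); ++i) {
    result += ':';
    result += fields[i];
  }
  return result;
}
```

Applied to the getent output above with /home/usinghomed as the new home directory, this produces the line that was appended to /etc/passwd.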

@smcv
Contributor

smcv commented Nov 13, 2024

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that Factorio expects to see, instead of passing through the one from the host system as-is.

I prototyped this and it seems to resolve the crash, at least for the demo.

If you're comfortable with using unreleased software, you can try this out by replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/jobs/800334/artifacts/raw/_build/pressure-vessel-bin.tar.gz. It would be useful if a user of systemd-homed could verify this with the full game.

This change will hopefully be part of the next Steam Linux Runtime 2.0 beta when it has been through review and more testing. Because of the way the container runtime works internally, this would be a change to SLR 2.0, and not SLR 1.0 as you might expect.

[note to self: this is !767 v4]
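
As a rough illustration of generating the file programmatically: the sketch below appends a record for the current user only when the host's passwd text has none for that uid. It is not pressure-vessel's actual implementation; the function names, the reuse of the user name as the gecos field, and the choice of shell are assumptions for the example.

```cpp
#include <string>
#include <sys/types.h>

// Hypothetical helper: true if any record in passwd_text has this uid
// as its third colon-separated field.
bool passwd_has_uid(const std::string& passwd_text, uid_t uid)
{
  const std::string wanted = std::to_string(uid);
  std::string::size_type pos = 0;
  while (pos < passwd_text.size()) {
    std::string::size_type eol = passwd_text.find('\n', pos);
    if (eol == std::string::npos) {
      eol = passwd_text.size();
    }
    const std::string line = passwd_text.substr(pos, eol - pos);
    const auto c1 = line.find(':');
    const auto c2 = (c1 == std::string::npos) ? c1 : line.find(':', c1 + 1);
    const auto c3 = (c2 == std::string::npos) ? c2 : line.find(':', c2 + 1);
    if (c3 != std::string::npos && line.substr(c2 + 1, c3 - c2 - 1) == wanted) {
      return true;
    }
    pos = eol + 1;
  }
  return false;
}

// Hypothetical helper: return passwd text that is guaranteed to contain a
// record for uid, appending a synthesized one if the host's text has none.
std::string passwd_with_user(const std::string& host_passwd, const std::string& name,
                             uid_t uid, gid_t gid,
                             const std::string& home, const std::string& shell)
{
  if (passwd_has_uid(host_passwd, uid)) {
    return host_passwd;
  }
  std::string out = host_passwd;
  if (!out.empty() && out.back() != '\n') {
    out += '\n';
  }
  out += name + ":x:" + std::to_string(uid) + ":" + std::to_string(gid) + ":" +
         name + ":" + home + ":" + shell + "\n";
  return out;
}
```

The home argument would come from the user's real $HOME rather than from getent, since getent can report / for systemd-homed users as noted earlier in the thread.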

@smcv
Contributor

smcv commented Nov 13, 2024

@adomaskizogian, I don't have enough information about your system or your situation to guess whether you were experiencing the same bad interaction between systemd-homed and Factorio that was originally reported here, or something different.

If your issue was the same thing originally reported here, then the pressure-vessel build in #705 (comment) should hopefully resolve it.

Or, if that isn't it, please open a separate issue with the info/logs that are requested by the issue template, and we can look into that separately.

@smcv
Contributor

smcv commented Nov 13, 2024

@zhaoweny:

Civ 6: crashes and produces a coredump

Looking at your strace logs, I think you might be correct to have thought that this is actually the same issue as Factorio, either in Civ 6 itself or in some library that it uses. The end of the log for process 134315 looks like the same order of operations I would expect from what Factorio does:

12:30:21.674785 getuid()                = 60104
12:30:21.674811 newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=505, ...}, 0) = 0
12:30:21.674839 newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=420, ...}, 0) = 0
12:30:21.674864 openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
12:30:21.674889 fstat(3, {st_mode=S_IFREG|0644, st_size=505, ...}) = 0
12:30:21.674911 read(3, "# /etc/nsswitch.conf\n#\n# Example"..., 4096) = 505
12:30:21.674937 read(3, "", 4096)       = 0
12:30:21.674958 fstat(3, {st_mode=S_IFREG|0644, st_size=505, ...}) = 0
12:30:21.674980 close(3)                = 0
12:30:21.675005 openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
12:30:21.675029 fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0
12:30:21.675051 lseek(3, 0, SEEK_SET)   = 0
12:30:21.675073 read(3, "root:x:0:0::/root:/usr/bin/bash\n"..., 4096) = 1307
12:30:21.675101 read(3, "", 4096)       = 0
12:30:21.675121 close(3)                = 0
12:30:21.675144 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---

So it would be useful if you could retry Civ 6 with the pressure-vessel build from #705 (comment), or with the workaround from #705 (comment).

Stellaris: game launcher works, the game crashes when click play from the launcher

Stellaris shows a similar pattern, so it would be useful if you could retry Stellaris in a similar way.

@emberfade

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that Factorio expects to see, instead of passing through the one from the host system as-is.

I prototyped this and it seems to resolve the crash, at least for the demo.

If you're comfortable with using unreleased software, you can try this out by replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/jobs/800334/artifacts/raw/_build/pressure-vessel-bin.tar.gz. It would be useful if a user of systemd-homed could verify this with the full game.

I use systemd-homed and am affected by the crash as well. I can verify this fixes the issue and Factorio starts.

@zhaoweny
Author

replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build

I was busy playing Factorio last night (It's a great game!). I will try this fix tonight when I get home.

@zhaoweny
Author

I tested Civ 6, Stellaris, and Factorio (full game, version 2.0.17). They all work with the pressure-vessel fix. Thank you for your hard work and excellent support!

@smcv
Contributor

smcv commented Nov 14, 2024

@zhaoweny: Would you be able to get a backtrace from Civ 6 and Stellaris, with a method similar to what you did for Factorio in #705 (comment) ? If we can find out where their similar pattern is happening (in the main executable, or in some library that they use), that would give us better information to report to those games' developers.

You can use Properties → Installed Files → Verify integrity on Steam Linux Runtime 2.0 (soldier) to get it back to the version that has this bug.

@zhaoweny
Author

backtrace for stellaris and civ6

Sure, here's a backtrace (and a small section of disassembled code) for Stellaris:

Program received signal SIGSEGV, Segmentation fault.
0x000000000337df2b in GetUserDir(char const*, char*, int) ()
(gdb) bt
#0  0x000000000337df2b in GetUserDir(char const*, char*, int) ()
#1  0x000000000337e5e9 in VFSGetDefaultUserDir(char const*) ()
#2  0x00000000012eae08 in StartVFS(CString&, char const*, bool, CPdxArray<CString, int>&) ()
#3  0x00000000012f0032 in RunGame(int, char**) ()
#4  0x00000000012ea38c in main ()
(gdb) disassemble 
Dump of assembler code for function _Z10GetUserDirPKcPci:
   0x000000000337df10 <+0>:     push   %r15
   0x000000000337df12 <+2>:     push   %r14
   0x000000000337df14 <+4>:     push   %rbx
   0x000000000337df15 <+5>:     sub    $0x20,%rsp
   0x000000000337df19 <+9>:     mov    %rsi,%r14
   0x000000000337df1c <+12>:    mov    %rdi,%r15
   0x000000000337df1f <+15>:    call   0x120e7c0 <getuid@plt>
   0x000000000337df24 <+20>:    mov    %eax,%edi
   0x000000000337df26 <+22>:    call   0x120f1b0 <getpwuid@plt>
=> 0x000000000337df2b <+27>:    mov    0x20(%rax),%rsi

Here's Civ 6 under the same Steam Linux Runtime, with a bit of backtrace and some disassembled code:

(gdb) bt
#0  0x0000000002caedac in ?? ()
#1  0x0000000002caef57 in ?? ()
#2  0x0000000002caf13a in ASL::ASL_GetAspyrDataPath() ()
#3  0x0000000002cb03e2 in ?? ()
#4  0x0000000002cb018d in ASL::ASL_GetJsonData(char const*) ()
#5  0x0000000002ccc829 in ?? ()
#6  0x0000000002ccc267 in ASL::Internal::Prefs::Prefs() ()
#7  0x0000000002cce8c6 in ASL::Internal::Prefs& ASL::ASL_Singleton<ASL::Internal::Prefs>::Create<ASL::Internal::Prefs>(long) ()
#8  0x0000000002cafb7f in ASL::ASL_Main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&&, bool) ()
#9  0x0000000002caf7e2 in main ()

(gdb) display/20i ($pc - 0x10)
2: x/20i ($pc - 0x10)
   0x2caed9c:   push   %rbx
   0x2caed9d:   mov    %rdi,%r15
   0x2caeda0:   call   0x2c62d10 <getuid@plt>
   0x2caeda5:   mov    %eax,%edi
   0x2caeda7:   call   0x2c63780 <getpwuid@plt>
=> 0x2caedac:   mov    0x20(%rax),%r14

If disassembling code is not welcome here, please tell me :P
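
Both disassemblies fault on a load from offset 0x20 of the pointer returned by getpwuid(), and the earlier strace log shows si_addr=0x20. A quick way to check that this offset corresponds to pw_dir is the sketch below; the function name pw_dir_offset is made up for illustration. On x86-64 Linux with glibc, struct passwd starts with two pointers, two 32-bit ids and one more pointer, so the result should be 0x20 there. Other platforms may lay the struct out differently.

```cpp
#include <cstddef>
#include <pwd.h>

// Hypothetical helper: byte offset of pw_dir inside struct passwd on the
// platform this is compiled for. Reading pw->pw_dir through a NULL pw
// touches exactly this address.
constexpr std::size_t pw_dir_offset()
{
  return offsetof(struct passwd, pw_dir);
}
```

If the value is 0x20 on the affected systems, then all three crashes are consistent with reading pw_dir through a NULL struct passwd pointer.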

@smcv
Contributor

smcv commented Nov 14, 2024

Thanks! I was half expecting you to report two matching backtraces, indicating that Aspyr and Paradox were both linking to (or perhaps even bundling) the same utility library; but it seems that instead, they've each made the same mistake independently.

Do I assume correctly that both of those are somewhere inside their respective games' main executables?

@smcv
Contributor

smcv commented Nov 14, 2024

Reported to Aspyr, for Civ 6 (ticket 233908) and to Paradox, for Stellaris (ticket 308296). I'm assuming we don't need a support ticket for Factorio since a developer is already in this conversation.

For best robustness I'm hoping we can get this fixed from both sides, in SLR and in the affected games.

@raiguard

I have merged the fix into Factorio - the game will now prefer $HOME over the results of getpwuid and will have a better error message if the directory can't be determined.

However, I am unable to test this because, as I mentioned before, I am on vacation in Japan with limited resources. I would kindly ask those affected by this to test the next experimental release (2.0.20) when it is released and let me know if there are issues.

@Ealrann

Ealrann commented Nov 18, 2024

@raiguard
I just tested with the new 2.0.20, it's working 🎉
No need to run steam with -compat-force-slr off anymore to play Factorio
Thank you

@smcv
Contributor

smcv commented Nov 19, 2024

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that [Civ 6, Factorio < 2.0.20 and Stellaris expect to see], instead of passing through the one from the host system as-is.

If you're comfortable with using unreleased software, you can try this out by replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/jobs/800334/artifacts/raw/_build/pressure-vessel-bin.tar.gz. It would be useful if a user of systemd-homed could verify this with the full game.

There has been an update to Steam Linux Runtime 2.0 (soldier) with some unrelated changes, and this particular change unfortunately missed the boat for that update. This means that if you are using this test-build as a workaround, you will need to reapply it now.

I'm hoping to get this fixed from the SLR side in the next SLR 2.0 beta.

I've also been in contact with the Civ 6 and Stellaris support teams, so I'm hoping we can get this fixed from their side as well (by having them make a change similar to the one in Factorio 2.0.20).

@kisak-valve or @zhaoweny, I think at this point it could be useful to retitle this issue so the title says "Civilization 6, Factorio, Stellaris" instead of just Factorio - we're now quite confident that they are all basically the same issue in different codebases.

@kisak-valve kisak-valve changed the title Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed Civilization 6, Factorio, Stellaris crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed Nov 19, 2024
@zhaoweny
Author

I've tested the 2.0.20 build of Factorio, and this issue is fixed for me. I will try Civ 6 and Stellaris when the new SLR beta comes out.

@smcv
Contributor

smcv commented Dec 5, 2024

I'm hoping to get this fixed from the SLR side in the next SLR 2.0 beta.

This change went out in today's beta, which is identified as depot 0.20241127.109699 in VERSIONS.txt.

@smcv
Contributor

smcv commented Dec 5, 2024

For completeness, if there are any games affected by this that use Steam Linux Runtime 3.0 'sniper' (I don't think there are), they would be fixed by today's SLR 3.0 beta, which is identified as 0.20241127.109710 in VERSIONS.txt.

@zhaoweny
Author

zhaoweny commented Dec 7, 2024

I updated my Steam client to the beta version of SLR, and I can verify that both Civ 6 and Stellaris work as expected with the new SLR, while Factorio resolved this issue from their side. I'd consider this bug is solved, thank you!

@smcv
Contributor

smcv commented Dec 9, 2024

I'd consider this bug is solved

Thanks for re-testing. Let's leave the issue open until the SLR-side change gets into the stable (non-beta) branch, and then close it.

I haven't heard back from the Civ 6 or Stellaris developers since reporting this, but hopefully they will eventually apply fixes similar to the one in Factorio (which is desirable for other reasons, not just because of this issue).
