Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] SearchHost.exe may freeze caused by FileUtil::CreateDirectory #1076

Open
hiroyuki-komatsu opened this issue Oct 12, 2024 · 1 comment

Comments

@hiroyuki-komatsu
Copy link
Collaborator

hiroyuki-komatsu commented Oct 12, 2024

There are some reports that SearchHost.exe freezes with Mozc built with BRANDING=GoogleJapaneseInput.

Through bisect investigations, I figured out 68e99df is the change of this issue. It modified FileUtil::CreateDirectory.

We should update FileUtil::CreateDirectory again to address the problem.

As a tricky point, this issue is reported only on BRANDING=GoogleJapaneseInput and channel_dev=0.

References:

yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This commit affects only mozc_tip{32,64}.dll build with debug mode.
There must be no behavior change in release build.

Currently 'Mozc_tsf_ui.log' is created only when NDEBUG is defined.
With this commit we stop creating it even when NDEBUG is defined.
This addresses a crash issue in debug builds discussed in google#1077.

See google#856 about why absl::LocalTimeZone cannot be used in Windows right
now.

Note that this commit may also help us diagnose google#1076, where
  Windows.Storage.OneCore.dll
looks to be intercepting certain Win32 file I/O API calls in
AppContainer processes then trigger RoInitialize as needed.
Creating 'Mozc_tsf_ui.log' only in debug builds can make our debugging
more complicated.
yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This commit affects only mozc_tip{32,64}.dll build with debug mode.
There must be no behavior change in release build.

Currently 'Mozc_tsf_ui.log' is created only when NDEBUG is defined.
With this commit we stop creating it even when NDEBUG is defined.
This addresses a crash issue in debug builds discussed in google#1077.

See google#856 about why absl::LocalTimeZone cannot be used in Windows right
now.

Note that this commit may also help us diagnose google#1076, where
  Windows.Storage.OneCore.dll
looks to be intercepting certain Win32 file I/O API calls in
AppContainer processes then trigger RoInitialize as needed.
Creating 'Mozc_tsf_ui.log' only in debug builds can make our debugging
more complicated.
yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This attemps to avoid the deadlock issue discussed in google#1076.

From what I can observe on Windows 11 23H2 (22631.4317), it looks to be
dangerous for us to call File I/O Win32 APIs such as CreateFile from
AppContainer processes unless we are confident that the target file/dir
is accessible to the AppContainer process. When such an access is not
allowed at the ACL level, the debugger shows that

  Windows.Storage.OneCore.dll

tries to forward it to a broker process (e.g. via BrokeredCreateFile2).
The issue is that Windows.Storage.OneCore.dll is a WinRT component,
which means that just calling CreateFile() results in dyanamically
invoking RoInitialize() internally, which can confuse TSF runtime as we
have already seen in google#856.

To summarize, the safest option is to ensure that the target file/dir is
always accessible to AppContainer so that

  Windows.Storage.OneCore.dll

will not be involved if the file/dir is opened with CreateFile() from
MozcTip{32,64}.dll.

If this commit is not sufficient to address google#1076, we then need to take
care of other File I/O APIs such as CreateDirectoryW() and
GetFileAttributes().
yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This attempts to avoid the deadlock issue discussed in google#1076.

From what I can observe on Windows 11 23H2 (22631.4317), it looks to be
dangerous for us to call File I/O Win32 APIs such as CreateFile from
AppContainer processes unless we are confident that the target file/dir
is accessible to the AppContainer process. When such an access is not
allowed at the ACL level, the debugger shows that

  Windows.Storage.OneCore.dll

tries to forward it to a broker process (e.g. via BrokeredCreateFile2).
The issue is that Windows.Storage.OneCore.dll is a WinRT component,
which means that just calling CreateFile() results in dynamically
invoking RoInitialize() internally, which can confuse TSF runtime as we
have already seen in google#856.

To summarize, the safest option is to ensure that the target file/dir is
always accessible to AppContainer so that

  Windows.Storage.OneCore.dll

will not be involved if the file/dir is opened with CreateFile() from
MozcTip{32,64}.dll.

If this commit is not sufficient to address google#1076, we then need to take
care of other File I/O APIs such as CreateDirectoryW() and
GetFileAttributes().
yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This attempts to avoid the deadlock issue discussed in google#1076.

Our current hypothesis to explain google#1076 is that "broadFileSystemAccess"
capability given to SearchHost.exe is playing an interesting role.

When TIP DLL calls CreateFileW API to open "config1.db", the process
itself does not have sufficient  permission to open it. Then
"broadFileSystemAccess" capability will take place and

  Windows.Storage.OneCore.dll

will attempts brokered file access after initializing the thread with
RoInitialize() when not yet initialized. If this happens in a certain
situation, TSF runtime gets confused and may start re-initializing Mozc
TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from
     CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

Given that whether the target app has "broadFileSystemAccess" capability
or not is completely out of Mozc TIP's control, the safest option would
be to ensure that the target file/dir is always accessible to
AppContainer processes.

As the first step, this commit makes 'config1.db' readable from
AppContainer processes. There are two cases when the file permission
of 'config1.db' is updated.

 A. When the installer is installing a new version of Mozc.
 B. When 'config1.db' is updated by mozc_server or mozc_tool.

If this commit is not sufficient to address google#1076, we then need to take
care of other cases such as calling GetFileAttributes() against the
user profile directory.
yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This attempts to avoid the deadlock issue discussed in google#1076.

Our current hypothesis to explain google#1076 is that "broadFileSystemAccess"
capability given to SearchHost.exe is playing an interesting role.

When TIP DLL calls CreateFileW API to open "config1.db", the process
itself does not have sufficient  permission to open it. Then
"broadFileSystemAccess" capability will take place and

  Windows.Storage.OneCore.dll

will attempts brokered file access after initializing the thread with
RoInitialize() when not yet initialized. If this happens in a certain
situation, TSF runtime gets confused and may start re-initializing Mozc
TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from
     CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

Given that whether the target app has "broadFileSystemAccess" capability
or not is completely out of Mozc TIP's control, the safest option would
be to ensure that the target file/dir is always accessible to
AppContainer processes.

As the first step, this commit makes 'config1.db' readable from
AppContainer processes. There are two cases when the file permission
of 'config1.db' is updated.

 A. When the installer is installing a new version of Mozc.
 B. When 'config1.db' is updated by mozc_server or mozc_tool.

If this commit is not sufficient to address google#1076, we then need to take
care of other cases such as calling GetFileAttributes() against the
user profile directory.
@yukawa
Copy link
Collaborator

yukawa commented Oct 14, 2024

While I still cannot reproduce the issue locally, based on the information I got my current hypothesis is that broadFileSystemAccess capability given to SearchHost.exe is playing an interesting role.

When Mozc TIP DLL calls CreateFileW API to open config1.db, the process itself does not have sufficient permission to open it. Then broadFileSystemAccess capability takes place and Windows.Storage.OneCore.dll attempts brokered file access after initializing the thread with RoInitialize() when not yet initialized. If this happens in a certain situation, TSF runtime gets confused and may start re-initializing Mozc TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

The on-going issue looks to be a deadlock due to non-reentrant lock.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP acquires a non-reentrant lock.
  3. Mozc TIP calls CreateFileW()
  4. The system invokes RoInitialize() before returning from CreateFileW()
  5. TSF calls back into Mozc TIP before returning from RoInitialize()
  6. Now Mozc TIP is handling a reentrant callback.
  7. Mozc TIP tries to acquire the same non-reentrant lock, which causes a dead lock.

To confirm that SearchHost.exe has broadFileSystemAccess capability, you can find the following SID in its process.

  • S-1-15-3-1024-3247244612-4072385457-573406302-3159362907-4108726569-214783218-394353107-2658650418

yukawa added a commit to yukawa/mozc that referenced this issue Oct 14, 2024
This attempts to avoid the deadlock issue discussed in google#1076.

Our current hypothesis to explain google#1076 is that "broadFileSystemAccess"
capability given to SearchHost.exe is playing an interesting role.

When TIP DLL calls CreateFileW API to open "config1.db", the process
itself does not have sufficient  permission to open it. Then
"broadFileSystemAccess" capability will take place and

  Windows.Storage.OneCore.dll

will attempts brokered file access after initializing the thread with
RoInitialize() when not yet initialized. If this happens in a certain
situation, TSF runtime gets confused and may start re-initializing Mozc
TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from
     CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

Given that whether the target app has "broadFileSystemAccess" capability
or not is completely out of Mozc TIP's control, the safest option would
be to ensure that the target file/dir is always accessible to
AppContainer processes.

As the first step, this commit makes 'config1.db' readable from
AppContainer processes. There are two cases when the file permission
of 'config1.db' is updated.

 A. When the installer is installing a new version of Mozc.
 B. When 'config1.db' is updated by mozc_server or mozc_tool.

If this commit is not sufficient to address google#1076, we then need to take
care of other cases such as calling GetFileAttributes() against the
user profile directory.
yukawa added a commit to yukawa/mozc that referenced this issue Oct 15, 2024
One of the biggest challenge while investigating google#1076 is that it has
been reliably reproducible only on certain users' environments. One idea
to accelerate the debugging in such a scenario is to ship debug symbol
file for Mozc TIP DLL (e.g. mozc_tip64.dll) to users. Then we can ask
the reporter to use tools like Process Explorer and Process Hacker to
dump the call stack of the thread that got stuck.

With this commit we actually start deploying mozc_tip64.dll.pdb to
users.

The storage impact looks to be acceptable overall as long as we strip
private symbols [1]. Here are actual values taken in my local
environment.

 * Mozc64.msi: + 584 kB
 * mozc_tip64.dll.pdb: + 6,116 kB

Note that this commit does not fully take care of Bazel build. While the
symbol file is actually deployed, there remain the following known
issues only in Bazel build.

 * private symbols are not yet stripped out.
 * the symbol file name embedded in mozc_tip64.dll is mozc_tip.pdb
   rather than mozc_tip64.dll.pdb.

 [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/public-and-private-symbols
yukawa added a commit to yukawa/mozc that referenced this issue Oct 15, 2024
This attempts to avoid the deadlock issue discussed in google#1076.

Our current hypothesis to explain google#1076 is that "broadFileSystemAccess"
capability given to SearchHost.exe is playing an interesting role.

When TIP DLL calls CreateFileW API to open "config1.db", the process
itself does not have sufficient  permission to open it. Then
"broadFileSystemAccess" capability will take place and

  Windows.Storage.OneCore.dll

will attempts brokered file access after initializing the thread with
RoInitialize() when not yet initialized. If this happens in a certain
situation, TSF runtime gets confused and may start re-initializing Mozc
TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from
     CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

Given that whether the target app has "broadFileSystemAccess" capability
or not is completely out of Mozc TIP's control, the safest option would
be to ensure that the target file/dir is always accessible to
AppContainer processes.

As the first step, this commit makes 'config1.db' readable from
AppContainer processes. There are two cases when the file permission
of 'config1.db' is updated.

 A. When the installer is installing a new version of Mozc.
 B. When 'config1.db' is updated by mozc_server or mozc_tool.

If this commit is not sufficient to address google#1076, we then need to take
care of other cases such as calling GetFileAttributes() against the
user profile directory.
hiroyuki-komatsu added a commit that referenced this issue Oct 15, 2024
* This is a temporary workaround to avoid freeze of the host application.
* #1076

PiperOrigin-RevId: 685979550
hiroyuki-komatsu pushed a commit that referenced this issue Oct 15, 2024
This commit affects only mozc_tip{32,64}.dll build with debug mode.
There must be no behavior change in release build.

Currently 'Mozc_tsf_ui.log' is created only when NDEBUG is defined.
With this commit we stop creating it even when NDEBUG is defined.
This addresses a crash issue in debug builds discussed in #1077.

See #856 about why absl::LocalTimeZone cannot be used in Windows right
now.

Note that this commit may also help us diagnose #1076, where
  Windows.Storage.OneCore.dll
looks to be intercepting certain Win32 file I/O API calls in
AppContainer processes then trigger RoInitialize as needed.
Creating 'Mozc_tsf_ui.log' only in debug builds can make our debugging
more complicated.

PiperOrigin-RevId: 686025719
yukawa added a commit to yukawa/mozc that referenced this issue Oct 15, 2024
One of the biggest challenge while investigating google#1076 is that it has
been reliably reproducible only on certain users' environments. One idea
to accelerate the debugging in such a scenario is to ship debug symbol
file for Mozc TIP DLL (e.g. mozc_tip64.dll) to users. Then we can ask
the reporter to use tools like Process Explorer and Process Hacker to
dump the call stack of the thread that got stuck.

With this commit we actually start deploying mozc_tip64.dll.pdb to
users.

The storage impact looks to be acceptable overall as long as we strip
private symbols [1]. Here are actual values taken in my local
environment.

 * Mozc64.msi: + 584 kB
 * mozc_tip64.dll.pdb: + 6,116 kB

Note that this commit does not fully take care of Bazel build. While the
symbol file is actually deployed, there remain the following known
issues only in Bazel build.

 * private symbols are not yet stripped out.
 * the symbol file name embedded in mozc_tip64.dll is mozc_tip.pdb
   rather than mozc_tip64.dll.pdb.

 [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/public-and-private-symbols
hiroyuki-komatsu pushed a commit that referenced this issue Oct 16, 2024
This attempts to avoid the deadlock issue discussed in #1076.

Our current hypothesis to explain #1076 is that "broadFileSystemAccess"
capability given to SearchHost.exe is playing an interesting role.

When TIP DLL calls CreateFileW API to open "config1.db", the process
itself does not have sufficient  permission to open it. Then
"broadFileSystemAccess" capability will take place and

  Windows.Storage.OneCore.dll

will attempts brokered file access after initializing the thread with
RoInitialize() when not yet initialized. If this happens in a certain
situation, TSF runtime gets confused and may start re-initializing Mozc
TIP.

  1. TSF calls back into Mozc TIP
  2. Mozc TIP calls CreateFileW()
  3. The system invokes RoInitialize() before returning from
     CreateFileW()
  4. TSF calls back into Mozc TIP before returning from RoInitialize()
  5. Now Mozc TIP is handling a reentrant callback.

Given that whether the target app has "broadFileSystemAccess" capability
or not is completely out of Mozc TIP's control, the safest option would
be to ensure that the target file/dir is always accessible to
AppContainer processes.

As the first step, this commit makes 'config1.db' readable from
AppContainer processes. There are two cases when the file permission
of 'config1.db' is updated.

 A. When the installer is installing a new version of Mozc.
 B. When 'config1.db' is updated by mozc_server or mozc_tool.

If this commit is not sufficient to address #1076, we then need to take
care of other cases such as calling GetFileAttributes() against the
user profile directory.

PiperOrigin-RevId: 686325184
hiroyuki-komatsu pushed a commit that referenced this issue Oct 16, 2024
One of the biggest challenge while investigating #1076 is that it has
been reliably reproducible only on certain users' environments. One idea
to accelerate the debugging in such a scenario is to ship debug symbol
file for Mozc TIP DLL (e.g. mozc_tip64.dll) to users. Then we can ask
the reporter to use tools like Process Explorer and Process Hacker to
dump the call stack of the thread that got stuck.

With this commit we actually start deploying mozc_tip64.dll.pdb to
users.

The storage impact looks to be acceptable overall as long as we strip
private symbols [1]. Here are actual values taken in my local
environment.

 * Mozc64.msi: + 584 kB
 * mozc_tip64.dll.pdb: + 6,116 kB

Note that this commit does not fully take care of Bazel build. While the
symbol file is actually deployed, there remain the following known
issues only in Bazel build.

 * private symbols are not yet stripped out.
 * the symbol file name embedded in mozc_tip64.dll is mozc_tip.pdb
   rather than mozc_tip64.dll.pdb.

 [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/public-and-private-symbols

PiperOrigin-RevId: 686360089
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants