Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: Use NtQueryDirectoryFile to enumerate file entries #36

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rpaquay
Copy link
Contributor

@rpaquay rpaquay commented May 20, 2019

Windows: Use NtQueryDirectoryFile to enumerate file entries

  • NtQueryDirectoryFile is more efficient than FindFindFile/FindNextFile
  • Include a minimal copy of "ntifs.h" containing the definitions required
    to call NtQueryDirectoryFile and RtlNtStatusToDosError.
  • Use a new set of JNI entry point to maximize performance. Instead
    of c++ code calling back into Java, we use a DirectByteBuffer to share
    native memory between C++ and Java, making the JNI interface less chatty
    and more efficient. This is about 20% faster than using Java callbacks.
  • This is available on Windows Vista/Windows Server 2008 and later.
  • Add entry point to check if new API is supported

@lacasseio
Copy link

@rpaquay Could you remove the commits already submitted as PR for #34 and #37. It makes it harder to review the PR.

Copy link

@lacasseio lacasseio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't completely reviewed the Nt* calls but here are a couple comments:

  • Why can't this be used with WINDOWS_MIN? It's probably safe to say this library is only going to run on Windows Vista/7+
  • Given the previous assumption, there should be no concern over NtQueryDirectoryFile availability

@@ -81,6 +94,7 @@ bool is_path_absolute_unc(wchar_t* path, int path_len) {

//
// Returns a UTF-16 string that is the concatenation of |prefix| and |path|.
// The string must be deallocated with a call to free().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can be moved to #34

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lacasseio, WINDOWS_MIN is XP, not vista. The library currently works on XP, and I wouldn't change that in this PR. Possibly something we can look at changing later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, the method NtQueryDirectoryFile seems to be Windows XP+ a well: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntifs/nf-ntifs-ntquerydirectoryfile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is supposedly supported on XP, but I have no way of testing it, and some of the other PRs require access to Vista+ only APIs (GetFileInformationByHandleEx).

Overall, I think it is just less confusing to keep this new behavior (symlinks, NtQueryDirectoryFile, file ID) Vista+ only, given we already have a mechanism to enforce that (i.e. WINDOWS_MIN)

#else
// Open file for directory listing
wchar_t* pathStr = java_to_wchar_path(env, path, result);
if (pathStr == NULL) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was an out of memory scenario, the code will crash before reaching this condition. It's most likely this condition is useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem removing these checks, it is just that it is good practice to check for out of memory conditions. It could be, for example, that the String passed in from Java code is big (let's say 1MB), so converting it to UTF-16 may fail because allocating another 1MB fails. It does not necessarily mean the process/machine is out of memory.

That said, all other calls don't check the return value of "malloc", so I can remove these checks if you still think they are useless.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right allocation should be checked. As you noted, there is no check in the original question. It may just get optimized out too.

@rpaquay
Copy link
Contributor Author

rpaquay commented May 30, 2019

@rpaquay Could you remove the commits already submitted as PR for #34 and #37. It makes it harder to review the PR.

I am not sure how to do that. Each PR builds on the previous one. There are submitted in a way that only the last commit of each PR is the actual change (the previous commits are from previous PRs). If I re-submit these PRs as stand alone PRs, they will contains a single commit with all the changes from the previous PRs, so it will be more difficult to review. Maybe I am missing something?

@rpaquay
Copy link
Contributor Author

rpaquay commented May 30, 2019

I haven't completely reviewed the Nt* calls but here are a couple comments:

  • Why can't this be used with WINDOWS_MIN? It's probably safe to say this library is only going to run on Windows Vista/7+
  • Given the previous assumption, there should be no concern over NtQueryDirectoryFile availability

I followed the same convention as b964354.

@lacasseio
Copy link

Some commit could be changed to avoid depending on each other but it would require manual editing or a final PR to enable everything afterward. I will leave it to Adam for the final review.

@rpaquay
Copy link
Contributor Author

rpaquay commented May 31, 2019

I am open to doing anything that makes reviewing easier. As I mentioned, I created these PRs so that only the last commit is the actual change, the previous commits are commits from the previous PRs. My "plan" was to rebase each PR as they are merged into master, so that the final diff is straightforward.

I am also open to abandoning these PRs and wait for each one to be merged before re-submitting the next one, so things are less confusing (i.e. diffs would always be directly against master).

I would rather not make each PRs independent of each other, because I think this will create churn that eventually makes things even more confusing.

Please let me know how you would like me to proceed.

* NtQueryDirectoryFile is more efficient than FindFindFile/FindNextFile
* Include a minimal copy of "ntifs.h" containing the definitions required
  to call NtQueryDirectoryFile and RtlNtStatusToDosError.
* Use a new set of JNI entry point to maximize performance. Instead
  of c++ code calling back into Java, we use a DirectByteBuffer to share
  native memory between C++ and Java, making the JNI interface less chatty
  and more efficient. This is about 20% faster than using Java callbacks.
* This is available on Windows Vista/Windows Server 2008 and later.
* Add entry point to check if new API is supported
@rpaquay rpaquay force-pushed the add-ntquerydirectoryfile-support branch from 1e21238 to cc92066 Compare August 5, 2019 19:59
@rpaquay
Copy link
Contributor Author

rpaquay commented Aug 7, 2019

@adammurdoch

I rebased this PR on top of master, and fixed it to make sure all tests pass. There were 2 failures due to new tests added for checking that "readdir" throws the right exception when called on a non-existing file. This has been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants