Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explainer for FileSystemHandle.getUniqueId() #46

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

a-sully
Copy link
Collaborator

@a-sully a-sully commented Aug 10, 2022

@a-sully a-sully added the addition/proposal New features or enhancements label Aug 10, 2022
@domenic
Copy link
Member

domenic commented Aug 11, 2022

Minor nit: should be Id, not ID. https://w3ctag.github.io/design-principles/#casing-rules

@a-sully a-sully changed the title Add explainer for FileSystemHandle::getUniqueID() Add explainer for FileSystemHandle::getUniqueId() Aug 11, 2022
Copy link
Contributor

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "Id" and not "Uri"?

UniqueID.md Outdated Show resolved Hide resolved
@a-sully
Copy link
Collaborator Author

a-sully commented Aug 17, 2022

Why "Id" and not "Uri"?

This identifier the browser provides is not "universal", since it will provide different strings to different sites for the same file. "Id" is a bit more generic, so I went with that.

Happy to consider other names for this method, though!

@martinthomson
Copy link
Contributor

The "universality" in URIs does not uniqueness in correspondance, either from the reference to the referred object or the converse. file:///etc/passwd is universal, even if the same reference refers to different objects. And it is very common to see multiple references (or aliases) for the same resource on the web.

@a-sully
Copy link
Collaborator Author

a-sully commented Aug 20, 2022

I'm not well-versed in this debate so bear with me here, but my intuition is that "URI" implies it can be used to fetch the resource, which is not the case here. The browser provides the identifier, but after that it's only useful as a tool for the developer to structure their databases.

If we planned to expose a getHandleFrom{Id|Uri}() method, then I agree that "URI" might be the right term. But that's explicitly a non-goal of this proposal...

@annevk
Copy link
Member

annevk commented Aug 29, 2022

  1. This repository is not a good place to host explainers I think. At least that's not something we've traditionally done for WHATWG standards.
  2. If for some reason we wanted this field to be a URL, we shouldn't call it URI: https://url.spec.whatwg.org/#url-apis-elsewhere.

@a-sully
Copy link
Collaborator Author

a-sully commented Aug 29, 2022

  1. This repository is not a good place to host explainers I think. At least that's not something we've traditionally done for WHATWG standards.

Happy to move the explainer elsewhere. Where would you prefer it be hosted?

  1. If for some reason we wanted this field to be a URL, we shouldn't call it URI: https://url.spec.whatwg.org/#url-apis-elsewhere.

Since this method provides a completely opaque identifier, I would assume we don't want it called URL or URI? (which to me implies use for fetching)

@annevk
Copy link
Member

annevk commented Aug 31, 2022

I'm not sure. Maybe @domenic has thoughts on this. Personal repository seems fine to me. Eventually the examples and specification text should become part of the specification through a PR.

(As for URL vs ID, I was mainly replying to the earlier conversation. I'm not sure I see the use for a URL either.)

@domenic
Copy link
Member

domenic commented Aug 31, 2022

I personally think explainers are fine in WHATWG repos, e.g. we have them in whatwg/streams. They can be a crutch, however, leading to people avoiding putting the useful text into the spec itself; so please do avoid that failure mode.

IMO it should be up to the repo editors to decide on things like that.

@jesup
Copy link
Contributor

jesup commented Aug 31, 2022

So... "IDs should not be stable after clearing browsing data. In other words, this
should not be a truly persistent, unclearable identifier"

But hashing the output of resolve() provides a unique identifier already (a full path to the root of the OPFS store), and would remain the same regardless of clearing.

We should be clear about what we're buying with this, and what it actually accomplishes. Obfuscating the hash value here doesn't actually add any security/privacy, at least in the OPFS case. Adding restrictions that don't actually help users just complicates the implementations with no benefit. Are there any risks here?

Also, if we were to expose our internal IDs (with or without hashing/salting/etc) for objects, that would lock us into the existing ID scheme since these need to be stable; to be safe we'd implement an entirely independent ID scheme which could be guaranteed to be stable for all time (or we'd have to blow away older OPFS files on an internal ID scheme change). But that's more complexity yet (though we could defer the complexity there until we need to do the first change to the ID scheme, if ever)

@domenic
Copy link
Member

domenic commented Aug 31, 2022

Expose our internal IDs

If at all possible, I would recommend that the standard require IDs to be returned as v4 (random) UUIDs, which act as a facade over any internal IDs. This guarantees a cross-browser-compatible format which developers can't take any dependency on.

@a-sully
Copy link
Collaborator Author

a-sully commented Sep 1, 2022

You're correct that in the OPFS case this doesn't add anything. The scenarios where this does matter are for files outside OPFS, where this adds significant new capabilities in a privacy-preserving way (i.e. without exposing the full file path).

So... "IDs should not be stable after clearing browsing data. In other words, this should not be a truly persistent, unclearable identifier"

But hashing the output of resolve() provides a unique identifier already (a full path to the root of the OPFS store), and would remain the same regardless of clearing.

As mentioned in the explainer, it's currently not possible to distinguish between two files of the same name in different directories for files outside of the OPFS (since the site may not have access to a shared ancestor directory).

We should be clear about what we're buying with this, and what it actually accomplishes. Obfuscating the hash value here doesn't actually add any security/privacy, at least in the OPFS case. Adding restrictions that don't actually help users just complicates the implementations with no benefit. Are there any risks here?
Also, if we were to expose our internal IDs (with or without hashing/salting/etc) for objects, that would lock us into the existing ID scheme since these need to be stable; to be safe we'd implement an entirely independent ID scheme which could be guaranteed to be stable for all time (or we'd have to blow away older OPFS files on an internal ID scheme change). But that's more complexity yet (though we could defer the complexity there until we need to do the first change to the ID scheme, if ever)

The idea is that the hash is based on things that are stable: the handle type and path, the origin, and some salt that hangs around until the user clears site data. The biggest risk I see is that some future change (something analogous to the Origin -> StorageKey migration, say) would necessitate changing the ID to incorporate this new information... but users can invalidate the ID by clearing site data anyways. For sites using the ID in a client-side database, this is behavior they'll have to be resilient against regardless. So this doesn't feel like a huge risk?

Your concerns do make sense. That said, we do have a need for this for outside-OPFS scenarios. I appreciate your thinking through this with pragmatic suggestions like deferring the complexity until we do need to change the scheme. But before we commit to anything, do you have alternate suggestions?

@a-sully
Copy link
Collaborator Author

a-sully commented Sep 1, 2022

If at all possible, I would recommend that the standard require IDs to be returned as v4 (random) UUIDs, which act as a facade over any internal IDs. This guarantees a cross-browser-compatible format which developers can't take any dependency on.

Seems reasonable to me. I'll add this to the explainer soon if no one objects

@jesup
Copy link
Contributor

jesup commented Sep 30, 2022

random UUIDs could help avoid accidental exposure of info, and as mentioned avoid accidental dependence on a particular hashing format by developers using this.

Does this buy any advantage to spec and implement for OPFS? As you say,

You're correct that in the OPFS case this doesn't add anything. The scenarios where this does matter are for files outside OPFS, where this adds significant new capabilities in a privacy-preserving way (i.e. without exposing the full file path).

So what is the reasoning to add this to OPFS?

Is it "developers can assume getUniqueId() is available to use whether using OPFS or WICG FS Access?" Does that usecase matter to developers?

@a-sully
Copy link
Collaborator Author

a-sully commented Oct 4, 2022

Thinking about this a bit more, I do think there's value to this method within the OPFS. At least there might be, depending on what we decide getUniqueId() returns a unique ID to. As mentioned here #39 (comment), it's unspecified at this point whether a FileSystemHandle is conceptually a path or a reference...

If it's just a path, getUniqueId() provides no value within the OPFS, since calling resolve() on the root directory generates a unique ID (the file path).

If it's a reference, getUniqueId() could track a file even it's moved.

Is it "developers can assume getUniqueId() is available to use whether using OPFS or WICG FS Access?" Does that usecase matter to developers?

I can't speak for all developers, but I would imagine this does. At the very least it would be surprising if the method was only available for handles outside of the OPFS. By adding the getUniqueId() to the WICG spec, the method would be available on all handles, including those in the OPFS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements
Development

Successfully merging this pull request may close these issues.

5 participants