[FR] Ability to get all descendants #2285

Open
wvanderdeijl opened this issue Jan 20, 2025 · 5 comments
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@wvanderdeijl

We would love to have the ability to query all descendants of a document (or collection) even when the intermediate documents do not exist.

For example, let's say /my-collection/my-doc/my-sub-collection/my-sub-doc (and many siblings) exists in Firestore, but /my-collection/my-doc does not exist. It's then not possible to query those documents from the sub-collection. Most of the pieces seem to exist in QueryOptions.forKindlessAllDescendants that is used from RecursiveDelete.getAllDescendants but those are all private.

We currently achieve this by getting the FirestoreClient from @google-cloud/firestore/types/v1 by poking in some internal variables of the public Firestore client and then invoking the runQuery method on that client:

client.runQuery({
    parent: `projects/${projectId}/databases/${databaseId}/documents/${rootDoc.path}`,
    structuredQuery: {
        from: [{ allDescendants: true }],
        orderBy: [{ field: { fieldPath: '__name__' } }],
    },
});

But this is a hassle: we get raw protobuf responses that we have to deserialize ourselves, and we lose all the other good stuff that the public Firestore client normally does.
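To illustrate the kind of work that the raw responses push onto the caller, here is a minimal sketch of turning raw Firestore `Value` messages into plain JavaScript values. The `RawValue` shape and the `decodeValue` and `decodeFields` helpers are hypothetical names for this example, not part of the library. The sketch assumes each raw value has exactly one variant set, and it only covers a handful of variants.

```typescript
// Assumed subset of the Firestore `Value` message; the real message has more
// variants (timestampValue, bytesValue, referenceValue, geoPointValue, ...).
interface RawValue {
  nullValue?: unknown;
  booleanValue?: boolean | null;
  integerValue?: string | number | null;
  doubleValue?: number | null;
  stringValue?: string | null;
  arrayValue?: { values?: RawValue[] | null } | null;
  mapValue?: { fields?: Record<string, RawValue> | null } | null;
}

// Hypothetical helper: convert one raw value into a plain JavaScript value.
function decodeValue(value: RawValue): unknown {
  if (value.booleanValue != null) return value.booleanValue;
  // 64-bit integers often arrive as strings; Number() loses precision above 2^53.
  if (value.integerValue != null) return Number(value.integerValue);
  if (value.doubleValue != null) return value.doubleValue;
  if (value.stringValue != null) return value.stringValue;
  if (value.arrayValue != null) return (value.arrayValue.values || []).map(decodeValue);
  if (value.mapValue != null) return decodeFields(value.mapValue.fields || {});
  if (value.nullValue != null) return null;
  throw new Error('Value variant not covered by this sketch');
}

// Hypothetical helper: decode the `fields` map of a raw document.
function decodeFields(fields: Record<string, RawValue>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(fields)) {
    result[key] = decodeValue(fields[key]);
  }
  return result;
}
```

The public client additionally handles references, timestamps, converters and snapshot metadata, which is why going through the raw client is only a stopgap.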

Another alternative is to manipulate allDescendants directly in the QueryOptions (which also works):

const collection = firestore.collection('my-collection');
assert('_queryOptions' in collection);
const qo = collection._queryOptions;
assert(typeof qo === 'object');
assert(!!qo);
assert('allDescendants' in qo);
qo.allDescendants = true;
const { docs } = await collection.get();
expect(docs.map(d => d.ref.path)).toEqual([nestedDoc.path]);

It would be wonderful if a .allDescendants() method could be added to CollectionReference that sets allDescendants in QueryOptions to true. The rest of the handling seems to be there.

@wvanderdeijl wvanderdeijl added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jan 20, 2025
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Jan 20, 2025
@wvanderdeijl
Author

Would you be willing to take a Pull Request for this? If so, I could try to create one.

@wu-hui wu-hui self-assigned this Jan 21, 2025
@wu-hui
Contributor

wu-hui commented Jan 21, 2025

Hi @wvanderdeijl

Thanks for poking around!! This is great to see.

The reason we do not have this as an official API is that it would not work with onSnapshot, because our backend does not support it. We are, however, looking into ways to bring this feature into the public API, perhaps in a different form.

In the meantime, please continue to use your workaround. I will post something here when we do have it as a public API.

@wvanderdeijl
Author

Ah, I see. That makes the public API design a bit more tricky. Exposing this as something similar to AggregateQuery would remove onSnapshot, as it should, but it would also remove the possibility to build on the query with .limit, .startAfter, etc.

Exposing it as a full-blown Query, on the other hand, would give all these nice query building methods, but would also expose onSnapshot, which it should not. So you would need something like a DescendantQuery that extends or wraps Query but does not have onSnapshot, and whose query building methods all return another DescendantQuery instead of a Query.

For our own workaround we'll just live with a Query object that throws an error when using onSnapshot. But I understand that is not acceptable for the official public API.
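The wrapping idea from the previous paragraphs could look roughly like the sketch below. Everything in it is hypothetical: `QueryLike` is a tiny stand-in for the real `Query` (only `limit`, `get` and `onSnapshot`), `ArrayQuery` is an in-memory fake used to make the example self-contained, and `DescendantQuery` is the proposed name from this thread, not an existing class.

```typescript
// Minimal stand-in for the parts of a query used here (an assumption for this sketch).
interface QueryLike<T> {
  limit(n: number): QueryLike<T>;
  get(): Promise<T[]>;
  onSnapshot(onNext: (docs: T[]) => void): () => void;
}

// In-memory fake so the sketch runs without a Firestore backend.
class ArrayQuery<T> implements QueryLike<T> {
  constructor(private readonly docs: T[]) {}
  limit(n: number): QueryLike<T> {
    return new ArrayQuery(this.docs.slice(0, n));
  }
  get(): Promise<T[]> {
    return Promise.resolve(this.docs);
  }
  onSnapshot(onNext: (docs: T[]) => void): () => void {
    onNext(this.docs);
    return () => {};
  }
}

// Hypothetical wrapper: forwards query building and get(), but deliberately
// does not expose onSnapshot. Builder methods re-wrap their result so the
// caller never gets hold of a plain query again.
class DescendantQuery<T> {
  constructor(private readonly inner: QueryLike<T>) {}
  limit(n: number): DescendantQuery<T> {
    return new DescendantQuery(this.inner.limit(n));
  }
  get(): Promise<T[]> {
    return this.inner.get();
  }
}
```

With a wrapper like this, calling `onSnapshot` on a descendant query is a compile-time error rather than a runtime one, at the cost of having to forward every builder method by hand.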

For others interested in a similar workaround in their own codebase: be aware of how allDescendants:true interacts with kindless:true in the QueryOptions.
You typically start a Query by taking a CollectionReference and treating it as a query. Under the hood, a query uses a "parent" and a "collectionName". The "parent" in this scenario is the database root or a DocumentReference.

When querying with allDescendants:true and the default of kindless:false, you recursively query all child documents from that root, with the restriction that the direct parent collection of each found document has the same name as the collection from your query. So starting from a collection at /collection/doc/subCollection and then performing an allDescendants query with kindless=false searches for /collection/doc/**/subCollection/anyDoc.

When setting both allDescendants and kindless to true, the name of the collection from the query is completely ignored. So starting this query from the same collection /collection/doc/subCollection would search for /collection/doc/**/anyDoc.
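The two cases above can be modelled as a small path predicate. This is only a sketch of the matching rule as described in this comment, not the backend's implementation; `matchesDescendantQuery` and its parameters are hypothetical names.

```typescript
// Split a slash-separated Firestore path into its non-empty segments.
function pathSegments(path: string): string[] {
  return path.split('/').filter(segment => segment.length > 0);
}

// Hypothetical model of which documents an allDescendants query returns.
// parentDocPath: the query "parent" ('' stands for the database root).
// collectionId: the collection name the query was started from.
// kindless: when true, collectionId is ignored entirely.
function matchesDescendantQuery(
  parentDocPath: string,
  collectionId: string,
  kindless: boolean,
  candidateDocPath: string,
): boolean {
  const parent = pathSegments(parentDocPath);
  const candidate = pathSegments(candidateDocPath);
  // Document paths have an even number of segments, and a descendant
  // must be strictly deeper than the parent.
  if (candidate.length % 2 !== 0 || candidate.length <= parent.length) {
    return false;
  }
  // The candidate must live somewhere below the parent.
  for (let i = 0; i < parent.length; i++) {
    if (candidate[i] !== parent[i]) return false;
  }
  if (kindless) return true;
  // Non-kindless: the direct parent collection must match collectionId.
  return candidate[candidate.length - 2] === collectionId;
}
```

For example, with parent `collection/doc` and collection name `subCollection`, the document `collection/doc/other/x` only matches when `kindless` is true, while `collection/doc/other/x/subCollection/b` matches in both modes.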

This had us confused for a while, and we think it is an easier mental model to initiate this type of query from a DocumentReference. We now have a utility method to build an allDescendants query with that DocumentReference as its root. You can supply an optional collection name to that method to restrict the direct parent collection name of the found documents. Whether that optional argument is given determines if the query is kindless or not.

Having that utility method means we have to start from a DocumentReference and cannot use the database root as origin to find all documents in the entire database. If you want to do that with a restriction on the direct parent collection of the found documents, this would just be a collectionGroup query, which already exists. So the one thing we are missing is an allDescendants query without restriction on the collection name for the entire database. But that could easily be added with a similar utility function that does not take a DocumentReference as its input.

So, for the eventual public API design I feel it would be nice to have an allDescendants method on DocumentReference with a single optional collectionName argument. This method would create a DescendantQuery object with allDescendants set to true. When the collectionName is undefined, the DescendantQuery would not have a restriction on the collection name and would have kindless set to true. When the collectionName is specified, the DescendantQuery would not be kindless and would have a restriction on the collection name.

A similar allDescendants method might also be added to the Firestore class itself to build a DescendantQuery to get all documents of the entire database. The user could then further restrict that query with a where clause, limits, etc.

@pavadeli

> When querying with allDescendants:true and the default of kindless:false, you recursively query all child documents from that root, with the restriction that the direct parent collection of each found document has the same name as the collection from your query. So starting from a collection at /collection/doc/subCollection and then performing an allDescendants query with kindless=false searches for /collection/doc/**/subCollection/anyDoc.
>
> When setting both allDescendants and kindless to true, the name of the collection from the query is completely ignored. So starting this query from the same collection /collection/doc/subCollection would search for /collection/doc/**/anyDoc.

So that gives us a way to query for all anyDocs in:

  • /collection/doc/**/subCollection/anyDoc and
  • /collection/doc/**/anyDoc

That is, find all docs that live under a certain parent doc (or the root database I guess), with or without a certain collection-name as direct parent. It does not give us a way to find all docs that live under a certain collection (/collection/**/anyDoc).

It turns out there is a way to do that, which you can find in the RecursiveDelete#getAllDescendants method. The source code explains it quite well:

// To find all descendants of a collection reference, we need to use a
// composite filter that captures all documents that start with the
// collection prefix. The MIN_KEY constant represents the minimum key in
// this collection, and a null byte + the MIN_KEY represents the minimum
// key is the next possible collection.
const nullChar = String.fromCharCode(0);
const startAt = collectionId + '/' + REFERENCE_NAME_MIN_ID;
const endAt = collectionId + nullChar + '/' + REFERENCE_NAME_MIN_ID;
query = query
    .where(FieldPath.documentId(), '>=', startAt)
    .where(FieldPath.documentId(), '<', endAt);

You can look at the complete method for more inspiration if you need anything like that.
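To see why that range captures exactly the documents under one collection, here is a rough, self-contained model of the bounds and of path ordering. Several things in it are assumptions rather than facts from the library's public API: the value of the private `REFERENCE_NAME_MIN_ID` constant, the rule that legacy numeric IDs of the form `__id<N>__` sort before all string IDs, and plain JavaScript string comparison as a stand-in for the backend's byte ordering. All helper names are hypothetical.

```typescript
// Assumed value of the library's private REFERENCE_NAME_MIN_ID constant.
const ASSUMED_MIN_ID = '__id-9223372036854775808__';

// Same construction as in the quoted source, relative to the query parent.
function collectionDescendantBounds(collectionId: string, minId: string = ASSUMED_MIN_ID) {
  const nullChar = String.fromCharCode(0);
  return {
    startAt: collectionId + '/' + minId,
    endAt: collectionId + nullChar + '/' + minId,
  };
}

// Returns the number inside a `__id<N>__` segment, or undefined for string IDs.
function numericId(segment: string): number | undefined {
  const match = /^__id(-?\d+)__$/.exec(segment);
  return match ? Number(match[1]) : undefined;
}

// Simplified segment ordering: numeric IDs first (assumption), then strings.
function compareSegments(a: string, b: string): number {
  const na = numericId(a);
  const nb = numericId(b);
  if (na !== undefined && nb !== undefined) return na - nb;
  if (na !== undefined) return -1;
  if (nb !== undefined) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compare two relative paths segment by segment; a shorter prefix sorts first.
function comparePaths(a: string, b: string): number {
  const left = a.split('/');
  const right = b.split('/');
  for (let i = 0; i < Math.min(left.length, right.length); i++) {
    const result = compareSegments(left[i], right[i]);
    if (result !== 0) return result;
  }
  return left.length - right.length;
}

// True when `path` falls in the half-open range [startAt, endAt).
function inBounds(path: string, bounds: { startAt: string; endAt: string }): boolean {
  return comparePaths(path, bounds.startAt) >= 0 && comparePaths(path, bounds.endAt) < 0;
}
```

Under these assumptions, every path whose first segment is exactly the collection name lands inside the range, while a sibling collection such as `my-collection2` sorts after the null-byte upper bound and is excluded.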

@wvanderdeijl
Author

wvanderdeijl commented Jan 27, 2025

We have created a utility function that works in most situations. Perhaps this is helpful to other people looking into this:

import { Query } from '@google-cloud/firestore';
import { QueryOptions } from '@google-cloud/firestore/build/src/reference/query-options';
import assert from 'assert';

/**
 * Builds a `Query` that recursively returns all descendant documents of a given document. When `collection` is given, it only returns
 * documents whose immediate parent collection has this name. Please note that this parent collection does not have to be a direct child
 * of the given `DocumentReference` since the query is recursive.
 *
 * Descendant documents will be found even if the given `DocumentReference` itself, or any intermediate documents, do not actually exist.
 *
 * The returned query does not support live queries and `onSnapshot` will throw a runtime error.
 *
 * When the query was constructed without a `collection` argument, you cannot use `withConverter` on it as that will re-introduce an (internal)
 * predicate on the `collectionId` with a non-existing collection.
 */
export function allDescendants(parent: FirebaseFirestore.DocumentReference, collection?: string) {
    // determine if this will be a "kindless" query meaning without restriction on the name of the direct parent (collection) of the found
    // documents.
    const kindless = collection === undefined;
    // build a query from the document reference and optionally restrict the name of the collection owning the found document(s)
    const query = parent.collection(kindless ? 'unused' : collection);
    assert('_queryOptions' in query, 'Firestore query always has private _queryOptions');
    // cast Query as its constructor is `protected` and we need a public constructor for TypeScript to not complain.
    const PublicQuery = Query as unknown as {
        new (
            firestore: FirebaseFirestore.Firestore,
            options: QueryOptions<unknown, FirebaseFirestore.DocumentData>,
        ): FirebaseFirestore.Query;
    };
    // construct a new Query instance with a new QueryOptions instance similar to what Query does internally in the other query building
    // methods.
    return new PublicQuery(
        query.firestore,
        (query._queryOptions as QueryOptions<unknown, FirebaseFirestore.DocumentData>).with({
            allDescendants: true,
            kindless,
        }),
    );
}

Also have a look at the unit tests for the (Rust-based) Firestore Emulator to see how such an allDescendants query behaves when you continue building on it with additional predicates.


3 participants