Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to Make Postgraphile Treat One-to-Many Relationship as Such #2331

Closed
machineghost opened this issue Jan 22, 2025 · 24 comments
Closed

How to Make Postgraphile Treat One-to-Many Relationship as Such #2331

machineghost opened this issue Jan 22, 2025 · 24 comments

Comments

@machineghost
Copy link
Contributor

machineghost commented Jan 22, 2025

Summary

I'm trying to query for "groups with recommendations", which should be a one-to-many relationship. In other words, I expect Postgraphile to return:

group100: {
   recommendation: [recommendation6, recommendation7]
},
group200 ...

However, Postgraphgile returns duplicate groups, each with a single (different) recommendation (ie. a one-to-one relationship):

group100: {
    recommendation: [recommendation6]
},
group100: {
    recommendation: [recommendation7]
}

Background

My recommendation_groups table is connected to my recommendations table via a grouped_content join table. For instance, If group 100 has recommendations 6 and 7, and child group 200, there would be grouped_content records of:

 parent_id        | recommendation_id               | recommendation_group_id
 100              | 6                               | NULL
 100              | 7                               | NULL
 100              | NULL                            | 200

I also have a recursive_grouped_content function which recurses through the grouped_content records to find recommendations that are (eg.) in groups that are in groups that are in the original group.

Problem View

With that helper I created a view to selects groups and their recommendations:

SELECT 
    rg.*
    r.id AS recommendation_id,
FROM recommendation_groups rg
LEFT JOIN recursive_grouped_content(rg.id) rgc ON rgc.parent_id = rg.id
LEFT JOIN recommendations r ON rgc.recommendation_id = r.id

That query does return multiple rows for each group/recommendation combo ... but it returns:

group100 | recommendation6
group100 | recommendation7

and I can't figure out how to get Postgraphile to combine the results. I'm not sure how I can structure the query differently, or what smart comment I can add, so any help would be appreciated.

Additional context

In my dreams there would be somewhere in the docs that clearly explains how Postgraphile decides to convert (or not) a flat result set into nested returned objects. Presumably, it looks at the structure of the results, meta information(?), and any smart comments, and then there's some algorithm for how it uses them?

I'd be willing to volunteer again to help write such docs.

@github-project-automation github-project-automation bot moved this to 🌳 Triage in V5.0.0 Jan 22, 2025
@benjie
Copy link
Member

benjie commented Jan 22, 2025

In my dreams there would be somewhere in the docs that clearly explains how Postgraphile decides to convert (or not) a flat result set into nested returned objects.

I think you're overthinking what PostGraphile does.

Consider:

create table forums (id serial, name text);
create table posts (id serial, forum_id int not null references forums, body text);

PostGraphile would see the FK from posts to forums and would infer from that that a Post belongs to a Forum, and a Forum has many Posts.

Now consider:

create table forums (id serial, name text);
create table forum_details (id serial, forum_id int not null references forums unique, description text);

Note this one is similar except there's a unique; as such PostGraphile knows there's at most one ForumDetail for each Forum (and each ForumDetail belongs to a single Forum).

PostGraphile does not think in terms of "converting a flat result set into nested returned objects" - there is no "flat result set" as far as it's concerned.

PostGraphile only understands relationships one layer at a time. In V5 you can use refs to make shortcuts to go further (helping you achieve things like many-to-many relationships) but that has to be manually configured.


For me to fully understand your issue and give suggestions, you'll need to share your table definitions, relationships, some sample data, the GraphQL query you've written, the data you're getting back, and what you'd like it to be. I don't quite follow what you have at the moment, and your join table seems a bit weird with all the nulls (e.g. without further information/explanation I don't understand why you don't have two join tables, one for each responsibility; or reference the tables directly via foreign keys).

@machineghost
Copy link
Contributor Author

First off, thanks so much for the help! I was trying not to overwhelm you with useless details, but since you asked ...

For me to fully understand your issue and give suggestions, you'll need to share your table definitions, relationships, some sample data, the GraphQL query you've written, the data you're getting back, and what you'd like it to be.

table definitions, relationships

NOTE: recommendations and recommendation groups have other fields that I didn't include (eg. a title, IDs from other tables, etc.) because they don't impact this issue.

    Table "public.recommendations"
   Column    |           Type           | Collation | Nullable |                   Default                   
-------------+--------------------------+-----------+----------+---------------------------------------------
 id          | integer                  |           | not null | nextval('recommendations_id_seq'::regclass)
(other fields, not relevant)
                                   Table "public.recommendation_groups"
           Column           |           Type           | Collation | Nullable |                      Default                      
----------------------------+--------------------------+-----------+----------+---------------------------------------------------
 id                         | integer                  |           | not null | nextval('recommendation_groups_id_seq'::regclass)
(other fields, not relevant)
   Table "public.grouped_content"
         Column          |  Type   | Collation | Nullable |                      Default                      
-------------------------+---------+-----------+----------+---------------------------------------------------
 recommendation_id       | integer |           |          | 
 recommendation_group_id | integer |           |          | 
 id                      | integer |           | not null | nextval('genre_recommendations_id_seq'::regclass)
 index                   | integer |           |          | 
 parent_id               | integer |           |          | 
 type                    | text    |           |          | 
Indexes:
    "grouped_content_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "grouped_content_type_check" CHECK (type = ANY (ARRAY['game'::text, 'resource'::text]))
Foreign-key constraints:
    "grouped_content_parent_recommendation_group_id_foreign" FOREIGN KEY (parent_id) REFERENCES recommendation_groups(id) ON DELETE CASCADE
    "grouped_content_recommendation_group_id_foreign" FOREIGN KEY (recommendation_group_id) REFERENCES recommendation_groups(id) ON DELETE CASCADE
    "grouped_content_recommendation_id_foreign" FOREIGN KEY (recommendation_id) REFERENCES recommendations(id) ON DELETE CASCADE
View "public.recommendation_groups_with_recommendations"

SELECT
    rg.id,
    (other irrelevant fields from recommendation_groups)
    r.id AS recommendation_id,
    gc.type
 FROM recommendation_groups rg
     LEFT JOIN LATERAL recursive_grouped_content(rg.id) rgc (
         recommendation_id,
         recommendation_group_id,
         id,
         index,
         parent_id,
         type
     ) ON rgc.parent_id = rg.id
     LEFT JOIN recommendations r
         ON rgc.recommendation_id = r.id
     JOIN grouped_content gc
         ON gc.recommendation_id = r.id AND gc.parent_id = rg.id

View smart tags:

      @foreignKey (recommendation_id) references recommendations(id)

the GraphQL query you've written

query Foo {
  recommendationGroupsWithRecommendations(condition: { id: 20 }) {
    id
    title
    recommendation {
      id
      title
  }
}

the data you're getting back

{
  "data": {
    "recommendationGroupsWithRecommendations": [
      {
        "id": 20,
        "title": "Foo",
        "recommendation": {
          "id": 216,
          "title": "bar"
          }
        }
      },
      {
        "id": 20,
        "title": "Foo",
        "recommendation": {
          "id": 332,
          "title": "baz",
        }
    }
  ]
}

what you'd like it to be

{
  "data": {
    "recommendationGroupsWithRecommendations": [
      {
        "id": 20,
        "title": "Foo",
        "recommendation": [{
          "id": 216,
          "title": "bar"
          },
         {
          "id": 332,
          "title": "qux",
        }
    }
  ]
}

I don't understand why you don't have two join tables, one for each responsibility; or reference the tables directly via foreign keys

Because I made it that way years ago and not everyone has the benefit of being a PostgreSQL expert when they design their database ;) At the time I was doing the same basic thing with both recommendations and groups (nesting them in groups), so one table for both seemed to make sense. In hindsight, two separate tables may have been simpler, but I now have other functions and views built around grouped_content.

Summary: Three tables, recommendations, recommendation_groups, and (weirdly complex join table) grouped_content. The first two have IDs, while the last has foreign keys to those IDs (recommendation_id and recommendation_group_id), making both children of a one-to-many relationship. grouped_content also has a foreign key on parent_id (back to recommendation_groups.id), creating a one-to-many relationship from the groups to their content. My view of all three has a further @foreignKey (recommendation_id) references recommendations(id) which I thought would create a one-to-one connection to the (many) recommendations being brought in by the one-to-many foreign key.

So I have three one-to-many relationships, and a "one-to-one" off one of them ... and yet when I query a view that joins based on those relationships, I wind up with separate records for each recommendation ... instead of a single group with multiple recommendations inside it.

@benjie
Copy link
Member

benjie commented Jan 23, 2025

So PostGraphile doesn't understand this schema in the way that you do; you'll have to teach it:

-- 1. Create a view with **one row for each list entry that you want in the output**:
create view unique_recommendation_groups_with_recommendations as
  select
    id, 
    array_agg(recommendation_id) as recommendation_ids
  from recommendation_groups_with_recommendations
  group by id;

-- 2. Create a function that implements the relationship **on each of the above rows**:
create function unique_recommendation_groups_with_recommendations_recommendations(
  row unique_recommendation_groups_with_recommendations
) returns setof recommendations as $$
  select *
  from recommendations
  where id = any(row.recommendation_ids)
$$ language sql stable;

@benjie
Copy link
Member

benjie commented Jan 23, 2025

(Note: the function name is the view name with _recommendations appended since we want our nested field to be called recommendations.)

@benjie
Copy link
Member

benjie commented Jan 23, 2025

[semi-automated message] Thanks for your question; hopefully we're well on the way to helping you solve your issue. This doesn't currently seem to be a bug in the library so I'm going to close the issue, but please feel free to keep requesting help below and if it does turn out to be a bug we can definitely re-open it 👍

You can also ask for help in the #help-and-support channel in our Discord chat.

@benjie benjie closed this as completed Jan 23, 2025
@github-project-automation github-project-automation bot moved this from 🌳 Triage to ✅ Done in V5.0.0 Jan 23, 2025
@machineghost
Copy link
Contributor Author

machineghost commented Jan 24, 2025

Thank you so much!

However, getting back to:

me: In my dreams there would be somewhere in the docs that clearly explains how Postgraphile decides to convert (or not) a flat result set into nested returned objects.

you: I think you're overthinking what PostGraphile does.

I think this very ticket shows that PostGraphile does a lot, and it seems simple and clear to you because you wrote it, but it's opaque to an outside dev. All the conversion from SQL results => JSON happens "behind the scenes" ... which makes documentation explaining what it does so valuable to developers.

In short, if a dev can't find the answer you provided here (in the docs), I'd argue you shouldn't be wasting your time answering questions from jerks like me ... without also asking that we improve the docs, so others don't need to ask such the questions in the future :)

@machineghost
Copy link
Contributor Author

machineghost commented Jan 24, 2025

I'm just imagining something simple like this (no new page or anything):

When it comes to tables Postgraphile handles one-to-many relationships perfectly, but when it comes to functions and views (which don't have actual foreign keys, but return IDs with foreign key relationships) Postgraphile can get confused about the relationship type.

If it does, instead of returning a foreign key relationship column directly (which Postgraphile might mistakenly consider to be a one-to-one relationship, eg. some_id), you can specify a one-to-many relationship by making the function/view return an aggregate array (ie. array_agg(some_id)).

@machineghost
Copy link
Contributor Author

machineghost commented Jan 24, 2025

Also, the provided solution doesn't work. First off, PostgreSQL doesn't seem to like the variable name "row"

  • syntax error at or near "row"

That's easy enough to fix though (row => recs):

create function unique_recommendation_groups_with_recommendations_recommendations(
  recs unique_recommendation_groups_with_recommendations
) returns setof recommendations as $$
  select *
  from recommendations
  where id = any(recs.recommendation_ids)
$$ language sql stable;

... only I renamed things, so my version actually looks like:

CREATE OR REPLACE FUNCTION public.recommendation_groups_with_recommendation(
  recs pg_wrapped_recommendation_groups_with_recommendations
)
RETURNS SETOF recommendations
LANGUAGE sql
STABLE
AS $function$
    SELECT *
    FROM recommendations
    WHERE id = any(recs.recommendation_ids)
$function$
;

That version builds, but I wasn't sure how to run it in PostgreSQL; when I tried:

SELECT *
FROM recommendation_groups_with_recommendations(
    SELECT * FROM pg_wrapped_recommendation_groups_with_recommendations
);

I got:

ERROR: syntax error at or near "SELECT"
LINE 3: SELECT * FROM pg_wrapped_recommendation_groups_with_reco...

I then tried going straight to Postgraphile:

query Foo {
  recommendationGroupsWithRecommendations(condition: {id: 20}) {
    id
    recommendation {
      id
    }
  }
}

... but I got:

{
  "errors": [
    {
      "message": "operator does not exist: integer[] = integer",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "recommendationGroupsWithRecommendations"
      ]
    }
  ],
  "data": {
    "recommendationGroupsWithRecommendations": null
  }
}

It would appear Postgraphile doesn't like this line of SQL:

where id = any(row.recommendation_ids)

(or in my version):

where id = any(recs.recommendation_ids)

because id is an integer but recommendation_ids is an array_agg(recommendation_id)? I'm not sure why that would be a problem, as any seems to be used properly here?

P.S. My pg_wrapped_recommendation_groups_with_recommendations VIEW is just:

CREATE VIEW pg_wrapped_recommendation_groups_with_recommendations AS
  SELECT id, ARRAY_AGG(recommendation_id) AS recommendation_ids
  FROM unwrapped_recommendation_groups_with_recommendations
  GROUP BY id
`

and the unwrapped version is just the original VIEW that joins everything:

SELECT
   rg.id,
   r.id AS recommendation_id,
   gc.type
-- FROM ... recommendations, recommendation_groups and grouped_content, all join-ed

@machineghost
Copy link
Contributor Author

machineghost commented Jan 26, 2025

I gave up and just made the query return a pair of recommendations/groups. It's not ideal, but it works, and after losing so many hours to this I just can't sink any more in.

Thanks for your help, but man do I wish Postgraphile had better docs! It's the only weak spot in an otherwise amazing library.

@benjie
Copy link
Member

benjie commented Jan 27, 2025

Thank you for sharing your frustrations and your path, it'll be really valuable if we can get to the crux of where your confusion or lack of understanding or broken expectations are coming from such that we can improve the documentation. Unfortunately I've not yet grasped where this is originating, so I'm really not sure how to improve the documentation to address it for you; so with that in mind let's keep digging to see if we can figure it out together...


You can better think of what PostGraphile does not in terms on the SQL that it actually runs, but the equivalent SQL that it could run if it were written with traditional GraphQL resolvers and we didn't care about things like the N+1 problem.

In general, each field either selects a row (or rows) from the database, or accesses an attribute on the result. (This isn't always true, but applies to many of the situations.) Selecting a row (or rows) looks like:

select *
from table
where <conditions>

so:

query Foo {
  recommendationGroupsWithRecommendations(condition: {id: 20}) {
    id
    recommendation {
      id
    }
  }
}

works like this...

First we evaluate recommendationGroupsWithRecommendations(condition: {id: 20}):

select rgwr.*
from recommendation_groups_with_recommendations as rgwr
where rgwr.id = 20;

The definition of your view seems to yield multiple rows with the same id value (I think?), so this yields a list of rows.

We then loop through each of these rows as rgwr and execute its selection set.

To satisfy id we just return the rgwr.id value from the result.

To satisfy the recommendation field we need to ask the DB again for each rgwr.recommendation_id we see:

select r.*
from recommendations r
where r.id = ${rgwr.recommendation_id};

finally we look at the resulting r record and we resolve the id field in its selection set by simply returning r.id.

All of this comes out of how GraphQL itself works (layer by layer resolution via the "resolver" model), understanding of which is assumed as required knowledge by the documentation.

(Quick pause to ask yourself: is this the part you're struggling with? Have you already read https://graphql.org/learn/execution/ or do we need to call this out more in the docs?)


I'm just imagining something simple like this (no new page or anything):

When it comes to tables Postgraphile handles one-to-many relationships perfectly, but when it comes to functions and views (which don't have actual foreign keys, but return IDs with foreign key relationships) Postgraphile can get confused about the relationship type.

This isn't really true; there's no confusion.

Views don't have relationships, but you can add fake foreign key constraints to create relationships and they work in the exact same way that they do for tables - PostGraphile doesn't really see a difference between views and tables (a view is just a table with no constraints as far as PostGraphile is concerned). And the smart tags that add fake foreign key constraints to views can also be used to add fake foreign key constraints to tables, because PostGraphile doesn't differentiate between views and tables.

Functions don't have foreign key relationships themselves, but they do return types, and if the type they return is a composite type (including table and view types) then it will use the same GraphQL type to represent the result, which means it will support all the same relationships because that's how GraphQL works.

If it does, instead of returning a foreign key relationship column directly (which Postgraphile might mistakenly consider to be a one-to-one relationship, eg. some_id), you can specify a one-to-many relationship by making the function/view return an aggregate array (ie. array_agg(some_id)).

PostGraphile only understands things a layer at a time, since this is how GraphQL operates, and PostGraphile thinks views are tables (it knows nothing of your underlying definitions).

(Quick pause to ask yourself: are you expecting PostGraphile to "pierce the veil" of a view and look at its underlying definition in terms of the joins/etc it performs? PostGraphile does not do this, but if you were expecting it to, this would be a great clarifying note to add to the documentation.)

PostGraphile sees all foreign key relationships as one to many UNLESS the columns in the FK constraint also have a unique constraint, in which case it's seen as a one-to-one. I don't think it's possible for PostGraphile to be "mistaken" on this since these are logical assertions unless you have additional logic enforcing them somewhere else (in which case how would PostGraphile be able to know that?).

(Quick pause to ask yourself: do you see the relationships on views as different to those on tables? How so?)

If you were to replace your view with an equivalent table (containing the same columns and data), maybe add some foreign keys, and then render it via an entity relationship diagram (ERD) perhaps it would be more obvious what PostGraphile is doing. You can think of PostGraphile as browsing an ERD one link at a time, each link becoming a field in the resulting GraphQL schema.


SELECT *
FROM recommendation_groups_with_recommendations(
    SELECT * FROM pg_wrapped_recommendation_groups_with_recommendations
);

this should instead be:

SELECT
  r.*,
  pg_wrapped_recommendation_groups_with_recommendations_recommendations(r)
FROM pg_wrapped_recommendation_groups_with_recommendations as r

see the documentation on computed column functions here for why the naming is such:

https://postgraphile.org/postgraphile/current/computed-columns

I've also created a PR to add details on how to call computed column functions yourself in PostgreSQL for those not already familiar with the pattern:

https://github.com/graphile/crystal/pull/2337/files


I think this very ticket shows that PostGraphile does a lot

PostGraphile does a lot; I've been rebuilding it for V5 since 2020 so I definitely know that!

it seems simple and clear to you because you wrote it, but it's opaque to an outside dev.

The basic relationship browsing has been in place since the early versions of PostGraphQL which is before my time as maintainer of the project (I joined towards the end of v3 IIRC). When I first started looking at the project I immediately grokked exactly what PostGraphQL did, so it's not just because I wrote it (since I hadn't at the time even contributed), but presumably because you and I met the project with different experiences and expectations. You'll need to help me slip into your shoes to help me to track down where our paths differed to help me to fix the docs for you and people that have followed a similar path to you.

Aside: Interestingly, though the SQL PostGraphile issues to the database to query these relationships has changed many times over the years (and will likely continue to evolve), the actual structure in the GraphQL schema and the resulting data from querying them is pretty much the same as it has always been - we've added more features like filtering, ordering, aggregates, computed column functions, refs, polymorphism, etc etc; but the core parts, which is I think the parts that you're actually interacting with in this question, have been in place since V1.


If any of this gives you an a-ha moment, please let me know what it was so I can better incorporate it into the docs. If none of it does, then I'll need further help to understand what you're getting stuck on so that we can improve the documentation together.

@machineghost
Copy link
Contributor Author

I realized (after I clicked on one of your links to the v5 docs), that much of my frustration was with Postgraphile's v4 docs. Upgrading to v5 is on my to do list ... after I finish getting this set of endpoints completed ... but that certainly isn't Postgraphile's fault! In fact, the new docs seem to address a lot of my concerns: the one page you linked to had its own full (ie. SQL + GraphQL) example, whereas I could only count four such complete examples in all of the v4 docs.

As for your (great) detailed explanation, I was following along until this:

Functions don't have foreign key relationships themselves, but they do return types, and if the type they return is a composite type (including table and view types) then it will use the same GraphQL type to represent the result, which means it will support all the same relationships because that's how GraphQL works.

How does Postgraphile know? Let's say I have two tables foo and bar in my database with id and created_date columns: if I return a TABLE(id INTEGER, created_date TIMESTAMP) from a function, would Postgraphile think it's a foo or a bar? How does it decide?

Does the naming of these columns matter at all (eg. could I return a foo_id and foo_created_date to ensure it treats it as a foo type?) Is there any difference between SELECT foo.id, SELECT id (FROM foo), SELECT foo.id AS id, in terms of how the type will be determined?

What if I have a foo type with three columns, and my function only returns two: will it recognize it as a partial foo type, or treat it as a completely new type (and not recognize it)? Does it matter if I have fifty columns and only return forty-nine?

PostGraphile does not do this, but if you were expecting it to, this would be a great clarifying note to add to the documentation.)

I'd argue that it shouldn't matter what I expect: explicitly telling users whether views follow foreign key relationships costs virtually nothing (the time to write a sentence) and beats not telling them ... and making them come here to ask you ;)

PostGraphile sees all foreign key relationships as one to many UNLESS the columns in the FK constraint also have a unique constraint, in which case it's seen as a one-to-one.

I feel like this belongs on (and almost contradicts) https://postgraphile.org/postgraphile/next/relations:

PostGraphile automatically discovers relations between database tables by inspecting their foreign keys, and uses these to build relations into the generated GraphQL schema. If you're using the PgIndexBehaviorsPlugin (enabled by default) then PostGraphile will also look at the database indexes and only include relations that don't require a table-scan to implement.
...
PostGraphile detects and exposes one-to-one, one-to-many and many-to-one relations automatically. Many-to-many relationships can be handled with the many-to-many relations plugin.

As for:

I've also created a PR to add details on how to call computed column functions yourself in PostgreSQL for those not already familiar with the pattern:

https://github.com/graphile/crystal/pull/2337/files

I'm a little confused by this:

person.person_full_name as full_name

How is the function getting called here? Is it something to do with this unfamiliar syntax:

```sql {3,5}

In any case, I definitely think an explanation of how to debug computed columns would be valuable on https://postgraphile.org/postgraphile/next/computed-columns and/or https://postgraphile.org/postgraphile/next/debugging!

@benjie
Copy link
Member

benjie commented Jan 27, 2025

I realized (after I clicked on one of your links to the v5 docs), that much of my frustration was with Postgraphile's v4 docs. Upgrading to v5 is on my to do list ... after I finish getting this set of endpoints completed ... but that certainly isn't Postgraphile's fault! In fact, the new docs seem to address a lot of my concerns: the one page you linked to had its own full (ie. SQL + GraphQL) example, whereas I could only count four such complete examples in all of the v4 docs.

This link: https://postgraphile.org/postgraphile/current/computed-columns ?

That's a copy of https://www.graphile.org/postgraphile/computed-columns/ and contains almost exactly the same content (including, apparently, the bugs in which line is syntactically highlit at the bottom... oops!)

(Edited to update: I've just published the changes in #2337 so they are no longer the same.)

It's the V4 link; V5 links have /next/ rather than /current/ in the URL. We're moving over to versioned documentation and @jemgillam's been doing an excellent job of porting the V4 docs across.

How does Postgraphile know? Let's say I have two tables foo and bar in my database with id and created_date columns: if I return a TABLE(id INTEGER, created_date TIMESTAMP) from a function, would Postgraphile think it's a foo or a bar? How does it decide?

Neither; PostGraphile sees that as an anonymous type and will assign it an autogenerated name, something like <function_name>_record (with GraphQL type e.g. FunctionNameRecord). It can't know which of those tables it's meant to be, and a change to either of those tables (e.g. adding a column) would break the association, so it would not be safe to guess or even to state it via smart tag.

Instead, you should always return named types (or sets thereof) from functions. This is mentioned in the customization overview that you helped write:

https://postgraphile.org/postgraphile/next/customization-overview/#adding-a-field-to-a-function-derived-type

but I couldn't find many other cases of returns table (and advising not to use it!) in the docs. We should improve that. Filed: #2338

I'd argue that it shouldn't matter what I expect: explicitly telling users whether views follow foreign key relationships costs virtually nothing (the time to write a sentence) and beats not telling them ... and making them come here to ask you ;)

Documenting every single thing that PostGraphile doesn't do might take a while, and would reduce the signal:noise ratio of the docs. If it's unlikely that people will have misaligned expectations then I'd argue that adding a sentence is a distraction and more likely to confuse the reader than help.

We do have a page dedicated to views which does state some of this:

however PostgreSQL lacks the powerful introspection capabilities on views that it has on tables, so we cannot easily automatically infer the relations
-- V5: https://postgraphile.org/postgraphile/next/views
-- V4: https://www.graphile.org/postgraphile/views/

If it did trip you up somewhere else, then a PR to that location would be appreciated.

I feel like this belongs on (and almost contradicts) postgraphile.org/postgraphile/next/relations:

Ah! So that's where it tripped you up! A PR to that page pointing to the views page and noting that we automatically support relationships via foreign key constraints on tables, and that views do not have foreign key constraints and thus do not get automatic relationships would be good!

I'm a little confused by this:

person.person_full_name as full_name

How is the function getting called here? Is it something to do with this unfamiliar syntax:

```sql {3,5}

No it's nothing to do with the markdown line highlighting syntax; you were reading the raw markdown - perhaps it renders better in the webpage:

https://postgraphile.org/postgraphile/next/computed-columns#:~:text=PostgreSQL%27s%20documentation%20on%20function%20calls%20notes%3A

Note in particular the bolded text in the quote block from the PostgreSQL documentation, namely that col(table) and table.col are equivalent for both column access and function calls. You may want to stick to fn(table) for a function call and table.col for column access, but that's personal preference - PostgreSQL sees them as equivalent.

In any case, I definitely think an explanation of how to debug computed columns would be valuable on postgraphile.org/postgraphile/next/computed-columns and/or postgraphile.org/postgraphile/next/debugging!

A PR doing this, or linking to the new section in the docs, would be welcome. Perhaps add a heading to https://postgraphile.org/postgraphile/next/computed-columns just above "PostgreSQL's documentation on function calls notes:" so that you can cross-link more easily?

@benjie
Copy link
Member

benjie commented Jan 27, 2025

I realised that what I said about named types might not be entirely hitting home, so consider this:

create table users (id serial primary key);
create table pets (id serial primary key, user_id int not null references users);

create function every_other_user() returns setof users as $$
  select users.*
  from users
  where id % 2 = 1;
$$ language sql stable;

You could query this with something like:

{
  everyOtherUser {
    nodes {
      id
      petsByUserId {
        nodes {
          id
        }
      }
    }
  }
}

Notice the returns setof users means that we're returning a literal users record, so we can use the FKeys from the table users because it's literally the same named type. (If we were to add a column to table users, the function would also be expected to return this new column.) This therefore allows you to navigate from the function directly to pets because the function is returning literal users.

When we execute this with PostGraphile we might do it in two separate queries, or we might do a join, or a subquery - how we handle it is up to us - but it'd be equivalent of:

select users.* from every_other_user() as users;

and then for each users:

select pets.* from pets where pets.user_id = ${users.id};

Note we don't (directly) query the users table at all, but the effect of querying the function is equivalent to querying the table.

This is one of many many reasons that I prefer to use functions over views. It's very rare that I'd use a view, and that may well contribute to why they have less attention paid to them in the documentation - in the same way that I don't use Docker much so I don't write about it much. Fortunately it's OSS, so I don't have to be the expert on everything, others can step in and fill in the gaps that they care about 🙌

@machineghost
Copy link
Contributor Author

Documenting every single thing that PostGraphile doesn't do might take a while, and would reduce the signal:noise ratio of the docs. If it's unlikely that people will have misaligned expectations then I'd argue that adding a sentence is a distraction and more likely to confuse the reader than help.

I have to respectfully disagree with this sentiment.

Yes, signal to noise ratio matters, and of course you don't want to add superfluous info ... but you don't get good docs (with a good ratio) by leaving out relevant information. You get good docs the same way as you get any good writing: by throwing everything but the kitchen sink in, and then revising it over time.

You can't just skip straight to "the signal" because (as the person with the most Postgraphile experience on the planet) you're by definition the worst equipped to imagine less experienced users' needs. I can't speak for anyone but myself, but I strongly suspect most would agree: the Postgraphile docs need more. More pages, more details, more examples, more revision.

Your focus feels like it's on trying to gatekeep out the "bad stuff" ... but I'd argue it should be on trying to add more good stuff (and then periodically revisit/revise to address signal concerns).

We do have a page dedicated to views which does state some of this:

PostgreSQL lacks the powerful introspection capabilities on views that it has on tables, so we cannot easily automatically infer the relations.

This seems to contradict my experiences. In general, Postgraphile does what you say, but before I asked this question I asked another one here: https://stackoverflow.com/questions/79363059/make-view-in-postgraphile-return-nested-objects/79376327#79376327 (I'll summarize so you don't have to go read it).

Basically, I asked how I could make a view that selected from "table1 and table2" (ie. recommendation_group and recommendation tables) and returned table2 as an object (ie. recommendation { ...}) in the API. I asked because I had accidentally gotten Postgraphile to do exactly that on a brand new view I'd made with no tags ... and then I did something to mess it up, and I couldn't get it back (thus the question).

Now I'm willing to grant this might have been some bug related to restarting my devsite/caching/whatever ... but for a brief moment Postgraphile did return non-anonymous value from a view, so unless it was a true anomaly it seems like PG does some kind of type lookup... no?

namely that col(table) and table.col are equivalent for both column access and function calls

Ahhhh ... I was not familiar with that syntax; that explains things!

As a side note, I think this is part of the challenge of documenting Postgraphile: on the one hand you have to assume your users have a basic level of SQL proficiency ... but at the same time the 33% (or whatever %) of PG users who are Javascript devs with only basic CRUD SQL proficiency are going to be completely mystified by stuff like volatile functions or row level security policies.

It'd be great if Postgraphile had little "aside" sections in the docs, detailing stuff that's really about PostgreSQL, not Postgraphile ... but is likely enough to be foreign to an inexperienced SQL dev that it gets clarified.

A PR to that page

Before I write any (more) documentation for v5, I think I need to update to using it myself ;)

But to summarize for the future (for me or anyone else):

@benjie
Copy link
Member

benjie commented Feb 3, 2025

You can't just skip straight to "the signal" because (as the person with the most Postgraphile experience on the planet) you're by definition the worst equipped to imagine less experienced users' needs. I can't speak for anyone but myself, but I strongly suspect most would agree: the Postgraphile docs need more. More pages, more details, more examples, more revision.

Your focus feels like it's on trying to gatekeep out the "bad stuff" ... but I'd argue it should be on trying to add more good stuff (and then periodically revisit/revise to address signal concerns).

$ cd postgraphile/website/postgraphile && wc -w *.md */*.md
[...]
 85770 total

The PostGraphile documentation currently has roughly 85,000 words - that's the same number of words as many adult fiction novels. (And that's just the PostGraphile V5 docs, it doesn't include the V4 docs, the Graphile Build docs, the pg-sql2 docs, the Grafast or Grafserv docs, or the docs for graphile-config, graphile-export, pg-introspection and all our various other projects that may or may not be relevant to usage of PostGraphile.) Revising a document like this and ensuring that it remains true and up to date gets harder and harder the larger it gets, especially as the software continues to evolve. I'm a slow reader, I read at roughly talking pace, so reviewing the docs is particularly time consuming for me - not to mention that I'm quite fastidious about ensuring that the text is both correct and, as much as possible, unambiguous. I could write every conversation I've ever had about PostGraphile (such as the above) into the documentation, but I don't think that would serve anyone, and it would certainly make it so that I relied on others to fix the docs when things change because I don't have time to read all that! I do make sure that conversations like this are searchable such that people can find them if they need to, and they can also act as a basis for refining the docs if such updates are deemed necessary/useful.

Once it's established that someone doesn't understand something about PostGraphile, I try and ensure that the documentation has answers to that lack of understanding. Writing more words for the sake of adding words to the documentation serves no-one - there must be a reason, they must serve a purpose that is not already served - hence my question to you: were you, specifically, expecting PostGraphile to parse, interpret, and understand the underlying SELECT statement used to define a VIEW such that it can figure out what all the different parts of the view mean and then somehow represent them as a complex mixture of inferred types and relationships or something like that? So far, you seem to have dodged the question by saying it doesn't matter whether or not you thought it did this, but this is such a drastically complex thing to expect PostGraphile to do (and I've no idea how it would do it, or even what your expected outcome of this would be) that it's really hard to write any documentation about it without confusing regular readers who do not have this expectation. It's still not clear to me whether anyone has ever had this expectation, or even what this expectation would concretely be.

Now I'm willing to grant this might have been some bug related to restarting my devsite/caching/whatever ... but for a brief moment Postgraphile did return non-anonymous value from a view, so unless it was a true anomaly it seems like PG does some kind of type lookup... no?

PostGraphile represents a view in the same way that it would a table - as the type that PostgreSQL defines for the view (or table), as detailed in the system catalogs - you can look at https://pgdoc.link/pg_class, https://pgdoc.link/pg_type and https://pgdoc.link/pg_attribute to get a sense for what we know about a view/table if you like. (Also: https://pgdoc.link/pg_constraint, though as mentioned views don't have constraints, but tables do.)


Interlude: Benjie spends some time re-reading the description of the issue to try and intuit what's going on


Basically, I asked how I could make a view that selected from "table1 and table2" (ie. recommendation_group and recommendation tables) and returned table2 as an object (ie. recommendation { ...}) in the API.

I think I might know what happened here... You have a view that returns a list of columns (it doesn't matter that it queries table1/table2 under the hood, PostGraphile cannot understand this1) and you marked it up with a foreign key to one of these tables. Thus it could traverse the relationship in one direction:

-- Create some "users"
create table users (id serial primary key);
insert into users (id) select i from generate_series(5,15) i;

-- create some "animals"
create table animals (id serial primary key);
insert into users (id) select i from generate_series(3,8) i;

-- Create a view that returns some numbers
create view my_view as
select i as user_id, j as animal_id
from generate_series(1, 10) i,
     generate_series(1, 10) j;

-- Tell PostGraphile that these numbers relate to records from users/animals tables
comment on view my_view is E'@foreignKey (user_id) references users\n@foreign key (animal_id) references animals';

You can then query it like:

{
  allMyViews {
    nodes {
      userId
      animalId
      userByUserId { id }
      animalByAnimalId { id }
    }
  }
}

(Note that the select statement in my_view just selects a series of numbers, and those records don't necessarily actually exist, so some of the results in the query will come up null.)

I think this is the "returned table2 as an object (ie. recommendation { ...})" that you mentioned - i.e. you're seeing a nested field with details of a record here via a (fake) foreign key constraint you've added.

and then I did something to mess it up, and I couldn't get it back (thus the question).

My guess is that you either:

  • accidentally overwrote the comment on the view (thus removing the fake foreign key constraint), or
  • replaced the view (removing the comment, and thus the foreign key), or
  • edited the comment on the view and broke the smart comment syntax which is not very forgiving ("Note that the parser is deliberately very strict currently [...]").

To summarise, my guess is you specifically told PostGraphile "pretend that this view (which PostGraphile sees as a table with no constraints) has a foreign key constraint on it," and so PostGraphile did. This is as described in the third sentence of the view documentation.

PostgreSQL lacks the powerful introspection capabilities on views that it has on tables, so we cannot easily automatically infer the relations.

This seems to contradict my experiences.

Does the above description of what may have happened seem likely? Perhaps you forgot that you added the smart tag, and then thought that PostGraphile inferred it automatically for you, and that was the source of your expectation that PostGraphile could somehow understand your view definition? If so, any suggestion as to what we could add to the documentation to solve this?

Or, perhaps you broke the smart comment syntax, and that caused it to no longer be recognized as smart tags. I'd love help building a "linter" for this, so that people can spot where smart tags may have gone wrong. Long term (V6 or thereafter) I plan to overhaul the smart tag system to have a "registry" such that smart tags get registered with a defined syntax, validations, and scopes (e.g. is this smart tag relevant to columns, tables, functions, constraints, types, or a mixture?) and then the system can more easily spot when they're invalid/corrupted/typo'd - but as you mention, PostGraphile is already very big so I've had to scope this out of V5 due to lack of engineering resources.

As a side note, I think this is part of the challenge of documenting Postgraphile: on the one hand you have to assume your users have a basic level of SQL proficiency ... but at the same time the 33% (or whatever %) of PG users who are Javascript devs with only basic CRUD SQL proficiency are going to be completely mystified by stuff like volatile functions or row level security policies.

Indeed! That's why we have the required knowledge page that outlines things like this and where to learn more about them, and also the PostgreSQL Schema Design guide which walks you through start to finish covering all these topics and more, and we also have "advice" sections at the bottom of various pages, and hyperlinks within the docs to other docs explaining foreign concepts in more detail, and callouts near code that may be unfamiliar to explain it (such as the quote block immediately above the syntax you were unfamiliar with). If you ever find somewhere that doesn't have such a callout or link for syntax or concepts that are unfamiliar, please raise an issue linking to the text in question and we'll tag it with "add-to-docs" to remind ourselves to revise that area - or, even better, send a PR!


We're often complimented on the depth and breadth of our documentation, it's something we put a lot of effort into and that I'm quite proud of - but documentation is never "done" and so we're always looking out for ways to improve it, hence moving to a new framework for versioned documentation in the new website, and doing a top to bottom overhaul of many of the documentation pages due to the restructuring of V5's internals, both of which has involved a significant investment of time. There will be a lot more to do on this front before the release of V5 too!

Thanks for your bullet points, I've filed issues about them:

Footnotes

  1. "PostgreSQL lacks the powerful introspection capabilities on views that it has on tables, so we cannot easily automatically infer the relations" -- https://www.graphile.org/postgraphile/views/

@machineghost
Copy link
Contributor Author

it's something we put a lot of effort into and that I'm quite proud of

As you should be; please do not take me saying that the docs could be improved as saying "the docs are crap" or anything of that nature!

but documentation is never "done"

In a sense, but in a sense I'd disagree.

Once it's established that someone doesn't understand something about PostGraphile, I try and ensure that the documentation has answers to that lack of understanding.

To me, this is the simplest metric of whether any library's docs are "done": when someone asks a basic-to-intermediate question (in Stack Overflow or GitHub Issues), can it be answered with (essentially):

Here, read this

Maybe a little more explanation is required, but the point is that if you can say that, your docs are done (...until someone asks a new basic-to-intermediate question that they don't answer).

I'm sympathetic to your desire to not add endless text to the docs. But, I'd argue that further additions are warranted, per my metric: my question wasn't advanced- or expert-level, and you were unable to answer it with something to the effect of "it's in the docs, go look at this link".

You have a view that returns a list of columns (it doesn't matter that it queries table1/table2 under the hood, PostGraphile cannot understand this) and you marked it up with a foreign key to one of these tables.

Almost. Initially, I had no foreign key smart comments/tags on the view, and I somehow got objects back in the API. But there was an issue (I can't even remember what), so I modified the view. When I did, I lost the objects, and messing with the view didn't bring them back, so I tried adding the @foreignKey.

In other words, your example code should be an accurate approximation, except that initially I didn't have the last line (I added it later to try and get objects). And since this sounds like it might not be how Postgraphile is supposed to work, I'm fully willing to admit I might have observed a caching issue or something similar (which would explain why I couldn't get it back when I messed with my view)!

I swear though, while it may have been a glitch, I did see a view return objects without a @foreignKey; that's what sent me down the road of asking the Stack Overflow question.

I think this is the "returned table2 as an object (ie. recommendation { ...})" that you mentioned - i.e. you're seeing a nested field with details of a record here via a (fake) foreign key constraint you've added.

That makes complete sense.

My guess is that you either:

Look, we're programmers, so we know all about "user error". I'm an error-prone user, so get as many grains of salt ready as you need :)

But, I swear: initially I didn't have a smart comment, and I somehow saw foo { bar not foo_bar_id. If that defies how Postgraphile works, it must have been a fluke, which isn't even worth thinking about further ... but I can only report what I saw.

Does the above description of what may have happened seem likely? Perhaps you forgot that you added the smart tag, and then thought that PostGraphile inferred it automatically for you, and that was the source of your expectation that PostGraphile could somehow understand your view definition? If so, any suggestion as to what we could add to the documentation to solve this?

Again, it all fits, except that initially I didn't have a tag/comment ... but that sounds anamolous enough to not be worth any further brain cycles on your part.

I'd love help building a "linter" for this

Cool idea, but personally I've never had a problem with the smart tags, so this feature wouldn't be valuable to me.

I run into lots of confusion with other parts of the library, but the comments/tags have always made perfect sense to me ... well, except I never can remember why there are two terms (comment/tag) and what the difference between them is; that confuses me ;)

Thanks for your bullet points, I've filed issues about them:

Cool; if I find some time (after I update to Postgraphile 5) I might help with them, but I can't commit to anything.

@benjie
Copy link
Member

benjie commented Feb 5, 2025

PostGraphile’s plugin system is really powerful so it could be that a plugin did what you describe; but short of that I’m at a loss…

(dramatic pause for thought, realisation slowly dawning across his face)

… unless you had one of the view’s “columns” actually BE a record? Like:

Create view as select a.id, b from a, b;

The thought of doing this has never even crossed my mind before, not in the 8 years of me being involved in the project, and I’m on mobile so can’t even check it, but now I strongly suspect that’s what you did! You were probably writing a list of columns and for one of them you put users,username instead of users.username or something like that and the entire record came through!

This is actually pretty cool. Please tell me that’s what you did! If so: you’ve discovered a feature I didn’t even know were had; fancy documenting it for the next person?

For smart tags vs comments: tags is the general concept and can be applied through comments, tags files, plugins, etc; comments is specific to the DB syntax used to apply them in the DB directly. See https://www.graphile.org/postgraphile/smart-tags/#the--character for more details

@machineghost
Copy link
Contributor Author

machineghost commented Feb 5, 2025

unless you had one of the view’s “columns” actually BE a record?

I don't think I did it intentionally, but ... at the time I was playing around with changing the query from a standard JOIN query to a:

SELECT FROM foo, bar WHERE foo.id = bar.id

... which was a new pattern I found on Stack Overflow. It's very possible that ...

You were probably writing a list of columns and for one of them you put users,username instead of users.username or something like that and the entire record came through!

This very well could have been what happened!

This is actually pretty cool. Please tell me that’s what you did! If so: you’ve discovered a feature I didn’t even know were had; fancy documenting it for the next person?

Well again, even I don't know what I did (which is why I asked Stack Overflow/you) ... but I can play with it and see. It'd be fun if I stumbled onto a new pattern.

For smart tags vs comments: tags is the general concept and can be applied through comments, tags files, plugins, etc; comments is specific to the DB syntax used to apply them in the DB directly.

It's so clear and easy when you say it like that: Smart tags are Postgraphile annotations (with several implementation options), while smart comments are just (one of several) ways to implement smart tags 😃 A sentence like that would be great in the docs, except ...

... I'd suggest that a better way to remove the confusion would be to just call smart comments "smart tag comments".

We can agree that "Smart Tag" is a unique and meaningful idea in Postgraphile, and that it makes sense to have its own terminology.

But a smart comment is just a specific way of implementing a "smart tag". You don't have terms for the other implementation options (Smart File Tags, Smart Plugin Tags, Custom Graphile Engine Tags), so unless your goal is to save four characters of typing at the expense of clarity ... why not just retire the term "Smart Comment"?

@machineghost
Copy link
Contributor Author

machineghost commented Feb 5, 2025

Reporting back: it worked! 😃

My (dummy) view:

CREATE VIEW test_view AS
    SELECT recommendations, group_children 
    FROM recommendations, group_children
    WHERE recommendations.id = group_children.recommendation_id

Resulting valid Postgraphile query:

  testViews {
    groupChildren {
      recommendationId
      recommendationGroupId
      index
    }
    recommendations {
      id
    }
  }

Time to update my self-answer to https://stackoverflow.com/questions/79363059/make-view-in-postgraphile-return-nested-objects/79415547 ...

In terms of expressivity I think this is really cool: it doesn't get much simpler/clearer than "select foo in the DB, get an (object) foo endpoint in Postgraphile".

However, I'm a little hesitant to start adopting this new pattern (as I'm now tempted to do), because it's new and unexpected. Can you forsee any obvious downsides to it?

@benjie
Copy link
Member

benjie commented Feb 5, 2025

... I'd suggest that a better way to remove the confusion would be to just call smart comments "smart tag comments".

I mooted this a while back for exactly these reasons, but ultimately we decided against it. Smart comments existed before smart tags existed - at first you could only set them using database comments, but then we added more ways so had to broaden it - and I think there's even a proposal to introduce smart comments to PostgREST (though I'm not sure how far that got, or even if it was actually filed on the PostgREST repo). Most of our legacy users are more familiar with the term "smart comments" than "smart tags"... We did consider calling them "smart comment tags" too, but that also was disliked. I also considered just calling them "tags" (smart comments in the DB can include tags and a description) but that was insufficiently descriptive.

Reporting back: it worked! 😃

Hah! Makes sense - we just use the types that Postgres reports the "columns" are using, and that column is a record type! I bet it works with arrays too, e.g.

create view foo as
  select users as user, array_agg(posts) as posts
  from users
  left join posts
  on posts.author_id = users.id;

probably yields something like:

type Foo {
  user: User
  posts: [Post]
}

However, I'm a little hesitant to start adopting this new pattern (as I'm now tempted to do), because it's new and unexpected. Can you forsee any obvious downsides to it?

The main downside would be that it forces the DB to select the entire record even if only a subset of the columns are needed; but other than that I don't see any major issue. But also I've not really thought about it, and I'm not aware of people using it, so I don't have much data to pull from.

@benjie
Copy link
Member

benjie commented Feb 5, 2025

(I submitted some minor corrections to your SO answer.)

@machineghost
Copy link
Contributor Author

The main downside would be that it forces the DB to select the entire record even if only a subset of the columns are needed; but other than that I don't see any major issue. But also I've not really thought about it, and I'm not aware of people using it, so I don't have much data to pull from.

Cool. I don't think I'm to the point with my data where a few extra columns in the server selection (that the client will never even see) could have any kind of impact, so I think it's safe for me to start playing with it.

(I submitted some minor corrections to your SO answer.)

Oh that was you 😄 I accepted the edit, but (and I fully admit this was nit-picky of me) I did change "record" back to "table" at one point (although I added a parenthetical right after about it actually selecting records).

My thinking was that from the query's standpoint, you are "selecting a table" (in SELECT foo FROM foo the second foo is clearly a table, so it seems to me the first is also). But of course, it's really both, as the results coming back will be records (thus my parenthetical).

@benjie
Copy link
Member

benjie commented Feb 6, 2025

I think “table record” may be best; previously you said something like “select the whole table” which sounded like every row, rather that everything from a single row. Hopefully your parenthetical covers any confusion 👍

@machineghost
Copy link
Contributor Author

That's definitely better/more accurate terminology; I updated the answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

No branches or pull requests

2 participants