Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling on creating property graphs #26

Open
4 of 5 tasks
Dtenwolde opened this issue Feb 16, 2023 · 9 comments
Open
4 of 5 tasks

Error handling on creating property graphs #26

Dtenwolde opened this issue Feb 16, 2023 · 9 comments
Assignees
Labels
enhancement New feature or request low priority

Comments

@Dtenwolde
Copy link
Contributor

Dtenwolde commented Feb 16, 2023

Should throw a catalog/Binder exception

  • Internal exception when creating property graph on non-existing table #103
  • Column of a table does not exist
    - [ ] FK of a table does not exist
    - [ ] Primary keys are not unique
  • PK of referenced source table does not exist
  • PK of referenced destination table does not exist
    - [ ] FK of source of edge table does not exist
    - [ ] Multiple PKs are referenced from source/destination table (How would this work?)
  • Ensure everything is case insensitive
@Dtenwolde Dtenwolde transferred this issue from cwida/duckdb-pgq Jun 26, 2023
Dtenwolde pushed a commit that referenced this issue Dec 5, 2023
@Dtenwolde Dtenwolde changed the title Create a test case that creates a property graph on non-existing tables Error handling on creating property graphs Mar 22, 2024
@Dtenwolde Dtenwolde added the enhancement New feature or request label Mar 25, 2024
Copy link

github-actions bot commented Sep 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 7, 2024
@Dtenwolde Dtenwolde self-assigned this Sep 7, 2024
@dentiny
Copy link
Contributor

dentiny commented Dec 3, 2024

Hi @Dtenwolde , I would like to know more about duckdb and pgq; I'm wondering if you could provide me some code pointers, and if I could take some of the low-priority tickets mentioned in the issue? Thank you!

@Dtenwolde
Copy link
Contributor Author

Hi @dentiny, thanks for showing interest! This issue is partially outdated, I haven't found the time yet to verify all of the cases above actually have the correct behaviour of throwing an error. In case you want to work on this I would greatly appreciate it!
The code to verify the property graph is here: https://github.com/cwida/duckpgq-extension/blob/main/src/core/functions/table/create_property_graph.cpp.
Relevant test cases are here: https://github.com/cwida/duckpgq-extension/blob/main/test/sql/create_pg
Here are some instructions to run the tests (I use CLion so if you have issues I can help out): https://www.notion.so/duckpgq/Contribution-guide-d4f31c025cb44cb2bb477666cdd9e86f

Alternatively, if you want to work on implementing a new feature I think supporting more graph algorithms would be nice, I have a list here: #132. The list is mostly a guideline, so if you have another algorithm in mind that's not on the list then let me know :)
In case you want to implement a graph algorithm, I'd take a look at how PageRank is implemented: https://github.com/cwida/duckpgq-extension/blob/main/src/core/functions/table/pagerank.cpp and https://github.com/cwida/duckpgq-extension/blob/main/src/core/functions/scalar/pagerank.cpp.
The query builds an on the fly CSR data structure and then uses that as the graph representation for the algorithm.

Let me know what you'd like to work on and reach out if you have any questions! :)

@dentiny
Copy link
Contributor

dentiny commented Dec 6, 2024

Thank you so much for your kind word and guidance! I will try to get myself familiar with the proj and see if I could help with the error handling.

@Dtenwolde
Copy link
Contributor Author

Hi @dentiny I just found an issue that's quite related to this one and should be a good, concrete first issue to work on if you want: #176 :)

@dentiny
Copy link
Contributor

dentiny commented Dec 12, 2024

Hi @dentiny I just found an issue that's quite related to this one and should be a good, concrete first issue to work on if you want: #176 :)

Thank you so much for the concrete task and instruction! I will work on it this weekend!

@dentiny
Copy link
Contributor

dentiny commented Dec 22, 2024

I check a few items listed in the PR description.

Preparation:

D CREATE TABLE city (  id bigint PRIMARY KEY,  name varchar);CREATE TABLE person (  id bigint PRIMARY KEY,  name varchar);CREATE TABLE livesIn (  personid bigint,  cityid bigint);CREATE TABLE follows (  p1id bigint,  p2id bigint);
D -CREATE PROPERTY GRAPH SocialNetwork  VERTEX TABLES (    person,     city  )  EDGE TABLES (    follows    SOURCE KEY (p1id) REFERENCES person (id)                  DESTINATION KEY (p2id) REFERENCES person (id),    livesin    SOURCE KEY (personid) REFERENCES person (id)                DESTINATION KEY (cityid) REFERENCES city (id));
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ 0 rows  │
└─────────┘
  • Column of a table does not exist
D -SELECT count(id)FROM  GRAPH_TABLE (socialNetwork     MATCH p = ANY SHORTEST (p1:unknown_col WHERE p1.name='Bob') -[f:follows]-> *(p2:person)        -[l:livesIn]->(c:city WHERE c.name='Utrecht')    COLUMNS (p2.id));
Binder Error: The label unknown_col is not registered in property graph SocialNetwork

D -CREATE PROPERTY GRAPH SocialNetwork_second  VERTEX TABLES (    person,     city  )  EDGE TABLES (   follows    SOURCE KEY (p1id) REFERENCES person (nonexistent_col)                  DESTINATION KEY (p2id) REFERENCES person (id),    livesin    SOURCE KEY (personid) REFERENCES person (id)                DESTINATION KEY (cityid) REFERENCES city (id));
Invalid Error: Primary key nonexistent_col does not exist in table person
  • PK of referenced source table does not exist
D -CREATE PROPERTY GRAPH SocialNetwork  VERTEX TABLES (    person,     city  )  EDGE TABLES (    follows    SOURCE KEY (p1id) REFERENCES nonexistent_person (id)                  DESTINATION KEY (p2id) REFERENCES person (id),    livesin    SOURCE KEY (personid) REFERENCES person (id)                DESTINATION KEY (cityid) REFERENCES city (id));
Invalid Error: Table 'nonexistent_person' not found in the property graph SocialNetwork.
  • PK of referenced destination table does not exist
D -CREATE PROPERTY GRAPH SocialNetwork  VERTEX TABLES (    person,     city  )  EDGE TABLES (    follows    SOURCE KEY (p1id) REFERENCES person (id)                  DESTINATION KEY (p2id) REFERENCES nonexistent_person (id),    livesin    SOURCE KEY (personid) REFERENCES person (id)                DESTINATION KEY (cityid) REFERENCES city (id));
Invalid Error: Table 'nonexistent_person' not found in the property graph SocialNetwork.
  • Ensure everything is case insensitive
D -SELECT count(id)FROM  GRAPH_TABLE (socialNetwork     MATCH p = ANY SHORTEST (p1:PERSON WHERE p1.name='Bob') -[f:follows]-> *(p2:person)        -[l:livesIn]->(c:CITY WHERE c.name='Utrecht')    COLUMNS (p2.id));
┌───────────┐
│ count(id) │
│   int64   │
├───────────┤
│         0 │
└───────────┘

As for other cases:

  • FK of a table does not exist
  • Primary keys are not unique
  • FK of source of edge table does not exist

I feel like neither the property graph creation nor the the query has nothing to do with pgq extension, which should be handled by duckdb itself.

Let me know if there's anything else I could do. :)

@Dtenwolde
Copy link
Contributor Author

Thanks for checking!

FK of a table does not exist
Primary keys are not unique
FK of the source of the edge table does not exist

This is indeed something that should be handled by DuckDB, not by DuckPGQ.

All the other points seem to be handled properly.

One small note with regards to "Column of a table does not exist":

-SELECT count(id)FROM  GRAPH_TABLE (socialNetwork     MATCH p = ANY SHORTEST (p1:unknown_col WHERE p1.name='Bob') -[f:follows]-> *(p2:person)        -[l:livesIn]->(c:city WHERE c.name='Utrecht')    COLUMNS (p2.id));

Binder Error: The label unknown_col is not registered in property graph SocialNetwork

This now checks if the label is correctly registered unknown_col. However, this is not a column in the table.

But with your example, I just checked and the non-existing column seems to be handled properly:

⚫◗ SELECT count(coldoesnotexist) FROM  GRAPH_TABLE (SocialNetwork     MATCH p = ANY SHORTEST (p1:person WHERE p1.name='Bob') -[f:follows]-> *(p2:person)        -[l:livesIn]->(c:city WHERE c.name='Utrecht')    COLUMNS (p2.id));
Run Time (s): real 0.009 user 0.001437 sys 0.003755
Binder Error: Referenced column "coldoesnotexist" not found in FROM clause!
Candidate bindings: "unnamed_graphtable.id"
LINE 1: SELECT count(coldoesnotexist) FROM  GRAPH_TABLE (Soc...

There's a feature in SQL/PGQ where users can specify which properties can be used in a MATCH statement for a vertex or edge table, see here.
So for instance, given the following schema:

CREATE TABLE Person(
	id BIGINT PRIMARY KEY, 
    name VARCHAR
);

INSERT INTO Person VALUES (0, 'Foo'), (1, 'Bar');

CREATE PROPERTY GRAPH g VERTEX TABLES (Person PROPERTIES (id));  -- We only select the ID column here as property, not name

SELECT * FROM GRAPH_TABLE (g MATCH (p:Person));

┌───────┬─────────┐
│  id   │  name   │
│ int64 │ varchar │
├───────┼─────────┤
│     0 │ Foo     │
│     1 │ Bar     │
└───────┴─────────┘

According to the official specification, this result is incorrect and should only return the id column.

SELECT * FROM GRAPH_TABLE (g MATCH (p:Person) COLUMNS (p.id, p.name));

This should throw an error message stating, property "NAME" is never defined. (This is what Oracle does, at least.)

I don't expect this to be a high-priority issue, but it's worth noting that this should be fixed sometime. I think we'd have to add a check here to see if the column is actually registered for the given vertex or edge table, and if not throw an error. In match.cpp we transform the PGQ syntax into a relational query plan, which can be quite complex. But I think this extra check shouldn't be too involved. If you want to, you can work on this.

Another relatively simple issue: #187
Something else that I think could be very useful would be adding more graph algorithms, Neo4j has a nice list: https://neo4j.com/docs/graph-data-science/current/algorithms/. If you want to implement any of these, let me know and I can give you some pointers (I have been working on triangle counting recently)

Other issues often involve translating the query plan (#183, #112) or adding changes to the parser (#43, #14) which are more complex.

Finally, I am on holiday for the next two weeks, but I'll still check my mail but I might not do as much coding :)

@dentiny
Copy link
Contributor

dentiny commented Dec 23, 2024

This now checks if the label is correctly registered unknown_col. However, this is not a column in the table.
But with your example, I just checked and the non-existing column seems to be handled properly

Thank you for the correction and checking!

I don't expect this to be a high-priority issue, but it's worth noting that this should be fixed sometime. I think we'd have to add a check here to see if the column is actually registered for the given vertex or edge table, and if not throw an error. In match.cpp we transform the PGQ syntax into a relational query plan, which can be quite complex. But I think this extra check shouldn't be too involved. If you want to, you can work on this.

Another relatively simple issue: #187

Good to know! I would like to work on these.

Something else that I think could be very useful would be adding more graph algorithms

I haven't digged into the graph algorithm, I would like to take a look first.

Finally, I am on holiday for the next two weeks, but I'll still check my mail but I might not do as much coding :)

Happy holiday Daniël!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request low priority
Projects
None yet
Development

No branches or pull requests

2 participants