documentation updates (#19)

* cleanup hash logic, make `rdf_graph` private
* checkpoint
* more cleanup
* update action versions
* checkpoint
* update tests
* checkpoint: green tests
* checkpoint: cases `14_2` and `15` are failing
* fix: make `__hash()` public
* cleanup `conftest`
* fix `v_count/e_count` assertions, more test cleanup
* new: case 15_1, 15_2, 15_3
* cleanup case 13
* checkpoing: all tests green (except 1 flaky) 15_2 RPT is flaky, need to revisit
* update case 14_1
* cleanup tests
* checkpoint: `__process_subject_predicate_object`
* fix lint
* fix conftest
* set `continue-on-error`
* update case `15_2` (still flaky)
* fix: `type` instead of `isinstance`
* update case `14_1`
* update tests
* update case `container.ttl`
* update `test_main` add `+ RDFGraph()` hack, update `test_pgt_container`
* new: `pgt_remove_blacklisted_statements`, `pgt_parse_literal_statements`, remove `adb_col_blacklist` so many todos...
* fix flake
* new: case 13_1 and 13_2
* new: case 14_3
* new: case 15_4
* update tests
* fix: rdf namespacing
* fix case 7 and native graph
* new: `adb_col_statements`, `write_adb_col_statements` (pgt)
* update 13_2, 15_2, 14_3
* new test cases, use `pytest.xfail` on flaky assertions tests should be green now...
* flake ignore can't reproduce
* new: `explicit_metagraph`, optimize `fetch_adb_docs`
* fix typo
* cleanup: `**adb_kwargs`
* doc cleanup
* cleanup
* cleanup: `flatten_reified_triples`
* cleanup: progress/spinner bars via `rich`
* more `rich` cleanup
* update cases 10, 14.3, 15_4
* final main checkpoint: `arango_rdf`
* minor cleanup
* new: `__pgt_process_rdf_literal`
* new: `serialize` as a conversion mode
* new: `test_open_intelligence_graph`
* fix lint
* fix: `statements`, `rdf_graph` ref
* cleanup: `write_adb_col_statements`
* initial commit
* fix lint
* update notebook
* checkpoint
* fix lint
* Create .readthedocs.yaml
* Update README.md
* Update requirements.txt
* fix: code block warning
* cleanup
* nit
* fix hyperlinks
* fix docstring
ArangoDB-Community · Jan 23, 2024 · 6aa9ecf
1 parent ff23744
commit 6aa9ecf

Showing 28 changed files with 56,821 additions and 6,613 deletions.
diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
@@ -0,0 +1,29 @@
+name: Docs
+
+on:
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+
+    name: Docs
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Fetch all tags and branches
+        run: git fetch --prune --unshallow
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: pip install .[dev] && pip install -r docs/requirements.txt
+
+      - name: Generate Sphinx HTML
+        run: cd docs && make html
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -19,10 +19,10 @@ jobs:
           python-version: "3.10"
 
       - name: Install release packages
-        run: pip install setuptools wheel twine setuptools-scm[toml]
+        run: pip install build twine
 
       - name: Build distribution
-        run: python setup.py sdist bdist_wheel
+        run: python -m build
 
       - name: Publish to Test PyPi
         env:

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,29 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/conf.py
+  fail_on_warning: true
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#    - pdf
+#    - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+   install:
+   - requirements: docs/requirements.txt
diff --git a/README.md b/README.md
@@ -29,6 +29,7 @@ Resources to get started:
 * [RDF Primer](https://www.w3.org/TR/rdf11-concepts/)
 * [RDFLib (Python)](https://pypi.org/project/rdflib/)
 * [One Example for Modeling RDF as ArangoDB Graphs](https://www.arangodb.com/docs/stable/data-modeling-graphs-from-rdf.html)
+
 ## Installation
 
 #### Latest Release
@@ -41,69 +42,73 @@ pip install git+https://github.com/ArangoDB-Community/ArangoRDF
 ```
 
 ##  Quickstart
-Run the full version with Google Colab: <a href="https://colab.research.google.com/github/ArangoDB-Community/ArangoRDF/blob/main/examples/ArangoRDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
+<a href="https://colab.research.google.com/github/ArangoDB-Community/ArangoRDF/blob/main/examples/ArangoRDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
 ```py
 from rdflib import Graph
 from arango import ArangoClient
 from arango_rdf import ArangoRDF
 
-db = ArangoClient(hosts="http://localhost:8529").db("_system_", username="root", password="")
+db = ArangoClient().db()
 
 adbrdf = ArangoRDF(db)
 
-g = Graph()
-g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")
-
-# RDF to ArangoDB
-###################################################################################
+def beatles():
+    g = Graph()
+    g.parse("https://raw.githubusercontent.com/ArangoDB-Community/ArangoRDF/main/tests/data/rdf/beatles.ttl", format="ttl")
+    return g
+```
 
-# 1.1: RDF-Topology Preserving Transformation (RPT)
-adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, overwrite_graph=True)
+### RDF to ArangoDB
 
-# 1.2: Property Graph Transformation (PGT) 
-adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)
+**Note**: RDF-to-ArangoDB functionality has been implemented using concepts described in the paper
+*[Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches](https://arxiv.org/abs/2210.05781)*. So we offer two transformation approaches:
 
-g = adbrdf.load_meta_ontology(g)
+1. [RDF-Topology Preserving Transformation (RPT)](https://arangordf.readthedocs.io/en/docs/rdf_to_arangodb_rpt.html)
+2. [Property Graph Transformation (PGT)](https://arangordf.readthedocs.io/en/docs/rdf_to_arangodb_pgt.html)
 
-# 1.3: RPT w/ Graph Contextualization
-adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
+```py
+# 1. RDF-Topology Preserving Transformation (RPT)
+adbrdf.rdf_to_arangodb_by_rpt(name="BeatlesRPT", rdf_graph=beatles(), overwrite_graph=True)
 
-# 1.4: PGT w/ Graph Contextualization
-adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
+# 2. Property Graph Transformation (PGT) 
+adbrdf.rdf_to_arangodb_by_pgt(name="BeatlesPGT", rdf_graph=beatles(), overwrite_graph=True)
+```
 
-# 1.5: PGT w/ ArangoDB Document-to-Collection Mapping Exposed
-adb_mapping = adbrdf.build_adb_mapping_for_pgt(g)
-print(adb_mapping.serialize())
-adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, adb_mapping, contextualize_graph=True, overwrite_graph=True)
+### ArangoDB to RDF
 
-# ArangoDB to RDF
-###################################################################################
+```py
+# pip install arango-datasets
+from arango_datasets import Datasets
 
-# Start from scratch!
-g = Graph()
-g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")
-adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)
+name = "OPEN_INTELLIGENCE_ANGOLA"
+Datasets(db).load(name)
 
-# 2.1: Via Graph Name
-g2, adb_mapping_2 = adbrdf.arangodb_graph_to_rdf("Beatles", Graph())
+# 1. Graph to RDF
+rdf_graph = adbrdf.arangodb_graph_to_rdf(name, rdf_graph=Graph())
 
-# 2.2: Via Collection Names
-g3, adb_mapping_3 = adbrdf.arangodb_collections_to_rdf(
-    "Beatles",
-    Graph(),
-    v_cols={"Album", "Band", "Class", "Property", "SoloArtist", "Song"},
-    e_cols={"artist", "member", "track", "type", "writer"},
+# 2. Collections to RDF
+rdf_graph_2 = adbrdf.arangodb_collections_to_rdf(
+    name,
+    rdf_graph=Graph(),
+    v_cols={"Event", "Actor", "Source"},
+    e_cols={"eventActor", "hasSource"},
 )
 
-print(len(g2), len(adb_mapping_2))
-print(len(g3), len(adb_mapping_3))
-
-print('--------------------')
-print(g2.serialize())
-print('--------------------')
-print(adb_mapping_2.serialize())
-print('--------------------')
+# 3. Metagraph to RDF
+rdf_graph_3 = adbrdf.arangodb_to_rdf(
+    name=name,
+    rdf_graph=Graph(),
+    metagraph={
+        "vertexCollections": {
+            "Event": {"date", "description", "fatalities"},
+            "Actor": {"name"}
+        },
+        "edgeCollections": {
+            "eventActor": {}
+        },
+    },
+)
 ```
 
 ##  Development & Testing
@@ -123,76 +128,3 @@ def pytest_addoption(parser):
     parser.addoption("--username", action="store", default="root")
     parser.addoption("--password", action="store", default="")
 ```
-
-## Additional Info: RDF to ArangoDB
-
-RDF-to-ArangoDB functionality has been implemented using concepts described in the paper *[Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches](https://arxiv.org/abs/2210.05781)*.
-
-In other words, `ArangoRDF` offers 2 RDF-to-ArangoDB transformation methods:
-1. RDF-topology Preserving Transformation (RPT): `ArangoRDF.rdf_to_arangodb_by_rpt()`
-2. Property Graph Transformation (PGT): `ArangoRDF.rdf_to_arangodb_by_pgt()`
-
-RPT preserves the RDF Graph structure by transforming each RDF Statement into an ArangoDB Edge.
-
-PGT on the other hand ensures that Datatype Property Statements are mapped as ArangoDB Document Properties.
-
-```ttl
-@prefix ex: <http://example.org/> .
-@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
-ex:book ex:publish_date "1963-03-22"^^xsd:date .
-ex:book ex:pages "100"^^xsd:integer .
-ex:book ex:cover 20 .
-ex:book ex:index 55 .
-```
-
-| RPT | PGT |
-|:-------------------------:|:-------------------------:|
-| ![image](https://user-images.githubusercontent.com/43019056/232347662-ab48ebfb-e215-4aff-af28-a5915414a8fd.png) | ![image](https://user-images.githubusercontent.com/43019056/232347681-c899ef09-53c7-44de-861e-6a98d448b473.png) |
-
---------------------
-### RPT
-
-
-The `ArangoRDF.rdf_to_arangodb_by_rpt` method will store the RDF Resources of your RDF Graph under the following ArangoDB Collections:
-
-    - {graph_name}_URIRef: The Document collection for `rdflib.term.URIRef` resources.
-    - {graph_name}_BNode: The Document collection for`rdflib.term.BNode` resources.
-    - {graph_name}_Literal: The Document collection for `rdflib.term.Literal` resources.
-    - {graph_name}_Statement: The Edge collection for all triples/quads.
-
---------------------
-### PGT
-
-In contrast to RPT, the `ArangoRDF.rdf_to_arangodb_by_pgt` method will rely on the nature of the RDF Resource/Statement to determine which ArangoDB Collection it belongs to. This is referred as the **ArangoDB Collection Mapping Process**. This process relies on 2 fundamental URIs:
-
-1) `<http://www.arangodb.com/collection>` (adb:collection)
-    - Any RDF Statement of the form `<http://example.com/Bob> <adb:collection> "Person"` will map the Subject to the ArangoDB "Person" document collection.
-
-2) `<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>` (rdf:type)
-    - This strategy is divided into 3 cases:
-
-        1. If an RDF Resource only has one `rdf:type` statement,
-            then the local name of the RDF Object is used as the ArangoDB
-            Document Collection name. For example,
-            `<http://example.com/Bob> <rdf:type> <http://example.com/Person>`
-            would create an JSON Document for `<http://example.com/Bob>`,
-            and place it under the `Person` Document Collection.
-            NOTE: The RDF Object will also have its own JSON Document
-            created, and will be placed under the "Class"
-            Document Collection.
-
-        2. If an RDF Resource has multiple `rdf:type` statements,
-            with some (or all) of the RDF Objects of those statements
-            belonging in an `rdfs:subClassOf` Taxonomy, then the
-            local name of the "most specific" Class within the Taxonomy is
-            used (i.e the Class with the biggest depth). If there is a
-            tie between 2+ Classes, then the URIs are alphabetically
-            sorted & the first one is picked.
-
-        3. If an RDF Resource has multiple `rdf:type` statements, with none
-            of the RDF Objects of those statements belonging in an
-            `rdfs:subClassOf` Taxonomy, then the URIs are
-            alphabetically sorted & the first one is picked. The local
-            name of the selected URI will be designated as the Document
-            collection for that Resource.
---------------------
diff --git a/arango_rdf/__init__.py b/arango_rdf/__init__.py
@@ -1 +1,2 @@
+from arango_rdf.controller import ArangoRDFController  # noqa: F401
 from arango_rdf.main import ArangoRDF  # noqa: F401
diff --git a/arango_rdf/controller.py b/arango_rdf/controller.py
@@ -10,9 +10,19 @@
 
 
 class ArangoRDFController(AbstractArangoRDFController):
-    """ArangoDB-RDF controller.
+    """Controller used in RDF-to-ArangoDB (PGT).
 
-    You can derive your own custom ArangoRDFController.
+    Responsible for handling how the ArangoDB Collection Mapping Process
+    identifies the "ideal RDFS Class" among a selection of RDFS Classes
+    for a given RDF Resource.
+
+    The "ideal RDFS Class" is defined as an RDFS Class whose local name best
+    represents the RDF Resource in question. This local name will be
+    used as the ArangoDB Collection name that will store **rdf_resource**.
+
+    `Read more about how the PGT ArangoDB Collection Mapping
+    Process works here
+    <./rdf_to_arangodb_pgt.html#arangodb-collection-mapping-process>`_.
     """
 
     def __init__(self) -> None:
@@ -28,42 +38,19 @@ def identify_best_class(
         """Find the ideal RDFS Class among a selection of RDFS Classes. Essential
         for the ArangoDB Collection Mapping Process used in RDF-to-ArangoDB (PGT).
 
-        The "ideal RDFS Class" is defined as an RDFS Class whose local name can be
-        used as the ArangoDB Document Collection that will store **rdf_resource**.
+        `Read more about how the PGT ArangoDB Collection Mapping
+        Process works here
+        <./rdf_to_arangodb_pgt.html#arangodb-collection-mapping-process>`_.
+
+        The "ideal RDFS Class" is defined as an RDFS Class whose local name best
+        represents the RDF Resource in question. This local name will be
+        used as the ArangoDB Collection name that will store **rdf_resource**.
 
         This system is a work-in-progress. Users are welcome to overwrite this
         method via their own implementation of the `ArangoRDFController`
-        Python Class.
-
-        NOTE: Users are able to access the RDF Graph of the current
-        RDF-to-ArangoDB transformation via the `self.rdf_graph`
-        instance variable, and the database instance via the
-        `self.db` instance variable.
-
-        The current identification process goes as follows:
-        1) If an RDF Resource only has one `rdf:type` statement
-            (either by explicit definition or by domain/range inference),
-            then the local name of the single RDFS Class is used as the ArangoDB
-            Document Collection name. For example,
-            <http://example.com/Bob> <rdf:type> <http://example.com/Person>
-            would place the JSON Document for <http://example.com/Bob>
-            under the ArangoDB "Person" Document Collection.
-
-        2) If an RDF Resource has multiple `rdf:type` statements
-            (either by explicit definition or by domain/range inference),
-            with some (or all) of the RDFS Classes of those statements
-            belonging in an `rdfs:subClassOf` Taxonomy, then the
-            local name of the "most specific" Class within the Taxonomy is
-            used (i.e the Class with the biggest depth). If there is a
-            tie between 2+ Classes, then the URIs are alphabetically
-            sorted & the first one is picked. Relies on **subclass_tree**.
-
-        3) If an RDF Resource has multiple `rdf:type` statements, with
-            none of the RDFS Classes of those statements belonging in an
-            `rdfs:subClassOf` Taxonomy, then the URIs are
-            alphabetically sorted & the first one is picked. The local
-            name of the selected URI will be designated as the Document
-            Collection for **rdf_resource**.
+        Class. Users are able to access the RDF Graph of the current
+        RDF-to-ArangoDB transformation via `self.rdf_graph`, and the
+        database instance via the  `self.db`.
 
         :param rdf_resource: The RDF Resource in question.
         :type rdf_resource: URIRef | BNode
@@ -73,12 +60,12 @@ def identify_best_class(
             domain/range inference.
         :type class_set: Set[str]
         :param subclass_tree: The Tree data structure representing
-            the RDFS subClassOf Taxonomy. See `ArangoRDF.__build_subclass_tree()`
-            for more info.
+            the RDFS subClassOf Taxonomy.
+            See :func:`arango_rdf.main.ArangoRDF.__build_subclass_tree` for more info.
         :type subclass_tree: arango_rdf.utils.Tree
-        :return: The most suitable RDFS Class URI among the set of RDFS Classes
-            to use as the ArangoDB Document Collection name associated to
-            **rdf_resource**.
+        :return: The string representation of the URI of the most suitable
+            RDFS Class URI among the set of RDFS Classes to use as the ArangoDB
+            Document Collection name for **rdf_resource**.
         :rtype: str
         """
         # These are accessible!
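
Reviewer note: this commit moves the three-case class-selection rule out of the `identify_best_class` docstring and the README and into the hosted docs. As a reading aid, the rule as worded in the removed text can be sketched roughly as below. This is a minimal illustration, not the library's implementation: `pick_best_class`, `local_name`, and the `depth` dictionary (standing in for the `subclass_tree` argument) are hypothetical names introduced here.

```python
import re
from typing import Dict, Set


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '/' or '#' (hypothetical helper)."""
    return re.split(r"[#/]", uri.rstrip("/#"))[-1]


def pick_best_class(class_set: Set[str], depth: Dict[str, int]) -> str:
    """Sketch of the three cases from the removed docstring.

    `depth` is an assumption: it maps each class URI that belongs to an
    rdfs:subClassOf taxonomy to its depth in that taxonomy.
    """
    if not class_set:
        raise ValueError("class_set must not be empty")

    # Case 1: a single rdf:type statement, so that class is used directly.
    if len(class_set) == 1:
        return next(iter(class_set))

    # Case 2: some (or all) classes belong to a taxonomy. Take the deepest
    # ("most specific") one; ties are broken by alphabetical URI order.
    in_taxonomy = [c for c in class_set if c in depth]
    if in_taxonomy:
        max_depth = max(depth[c] for c in in_taxonomy)
        return sorted(c for c in in_taxonomy if depth[c] == max_depth)[0]

    # Case 3: no class belongs to a taxonomy. Sort the URIs alphabetically
    # and pick the first one.
    return sorted(class_set)[0]


if __name__ == "__main__":
    depth = {
        "http://example.com/Agent": 1,
        "http://example.com/Person": 2,
        "http://example.com/Singer": 3,
    }
    best = pick_best_class(
        {"http://example.com/Person", "http://example.com/Singer"}, depth
    )
    # The local name of the chosen class would serve as the collection name.
    print(local_name(best))
```

A custom `ArangoRDFController` subclass would replace this logic by overriding `identify_best_class`, as the retained docstring describes.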