Merge remote-tracking branch 'origin/main' into expand-new-artifacts
* origin/main:
  Add tracking of runs (#83)
lazappi committed Nov 19, 2024
2 parents 3f36c6a + 73b4213 commit abba3ec
Showing 7 changed files with 152 additions and 69 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@
- Add a `from_df()` method to the `Registry` class to create new artifacts from data frames (PR #78)
- Create `TemporaryRecord` classes for new artifacts before they have been saved to the database (PR #78)
- Add a `delete()` method to the `Record` class (PR #78)
- Add `track()` and `finish()` methods to the `Instance` class (PR #83)

## MAJOR CHANGES

65 changes: 64 additions & 1 deletion R/Instance.R
@@ -191,10 +191,73 @@ Instance <- R6::R6Class( # nolint object_name_linter
},
#' @description Get the Python lamindb module
#'
#' @param check Logical, whether to perform checks
#' @param what What the python module is being requested for, used in check
#' messages
#'
#' @return Python lamindb module.
get_py_lamin = function() {
get_py_lamin = function(check = FALSE, what = "This functionality") {
if (check && isFALSE(self$is_default)) {
cli::cli_abort(c(
"{what} can only be performed by the default instance",
"i" = "Use {.code connect(slug = NULL)} to connect to the default instance"
))
}

if (check && is.null(self$get_py_lamin())) {
cli::cli_abort(c(
"{what} requires the Python lamindb package",
"i" = "Check the output of {.code connect()} for warnings"
))
}

private$.py_lamin
},
#' @description Start a run with tracked data lineage
#'
#' @details
#' Calling `track()` with `transform = NULL` will return a UID; providing
#' that UID with the same path will start a run
#'
#' @param path Path to the R script or document to track
#' @param transform UID specifying the data transformation
track = function(path, transform = NULL) {
py_lamin <- self$get_py_lamin(check = TRUE, what = "Tracking")

if (is.null(transform)) {
transform <- tryCatch(
py_lamin$track(path = path),
error = function(err) {
py_err <- reticulate::py_last_error()
if (py_err$type != "MissingContextUID") {
cli::cli_abort(c(
"Python error {.val {py_err$type}}",
"i" = "Run {.run reticulate::py_last_error()} for details"
))
}

uid <- gsub(".*\\(\"(.*?)\"\\).*", "\\1", py_err$value)
cli::cli_inform(paste(
"Got UID {.val {uid}} for path {.file {path}}.",
"Run this function with {.code transform = \"{uid}\"} to track this path."
))
}
)
} else {
if (is.character(transform) && nchar(transform) != 16) {
cli::cli_abort(
"The transform UID must be exactly 16 characters, got {nchar(transform)}"
)
}

py_lamin$track(transform = transform, path = path)
}
},
#' @description Finish a tracked run
finish = function() {
py_lamin <- self$get_py_lamin(check = TRUE, what = "Tracking")
py_lamin$finish()
},
#' @description
#' Print an `Instance`
#'
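The error handler in `track()` recovers the suggested UID from the Python error text with a regular expression, and the `else` branch rejects UIDs that are not exactly 16 characters long. The base R sketch below isolates those two steps. The helper names `extract_uid()` and `check_transform_uid()` and the sample message are hypothetical and only illustrate the pattern; the real lamindb error text may differ.

```r
# Hypothetical helper: pull the ("...")-quoted value out of an error message,
# using the same pattern as the handler in track()
extract_uid <- function(message) {
  gsub(".*\\(\"(.*?)\"\\).*", "\\1", message)
}

# Hypothetical helper: mirror the length check applied to user-supplied UIDs
check_transform_uid <- function(transform) {
  if (is.character(transform) && nchar(transform) != 16) {
    stop(
      "The transform UID must be exactly 16 characters, got ",
      nchar(transform)
    )
  }
  invisible(transform)
}

# Made-up message for illustration; not the actual lamindb wording
msg <- "to track this script, call ln.track(\"abcdEFGH12345678\") first"
uid <- extract_uid(msg)
check_transform_uid(uid)
```

If the message contains no `("...")` pattern, `gsub()` returns its input unchanged, so running the length check on the extracted value is a cheap safeguard against a changed error format.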
18 changes: 3 additions & 15 deletions R/Registry.R
@@ -154,27 +154,15 @@ Registry <- R6::R6Class( # nolint object_name_linter
#' @return A `TemporaryRecord` object containing the new record. This is not
#' saved to the database until `temp_record$save()` is called.
from_df = function(dataframe, key = NULL, description = NULL, run = NULL) {
if (isFALSE(private$.instance$is_default)) {
cli::cli_abort(c(
"Only the default instance can create records",
"i" = "Use {.code connect(slug = NULL)} to connect to the default instance"
))
}

if (is.null(private$.instance$get_py_lamin())) {
cli::cli_abort(c(
"Creating records requires the Python lamindb package",
"i" = "Check the output of {.code connect()} for warnings"
))
}

if (private$.registry_name != "artifact") {
cli::cli_abort(
"Creating records is only supported for the Artifact registry"
)
}

py_lamin <- private$.instance$get_py_lamin()
py_lamin <- private$.instance$get_py_lamin(
check = TRUE, what = "Creating records"
)

py_record <- py_lamin$Artifact$from_df(
dataframe,
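This hunk moves the default-instance and Python-module checks out of `from_df()` and into `get_py_lamin(check, what)`, so each caller only names the action it is performing. The sketch below shows that pattern with a plain list and closure instead of the R6 class. The constructor `new_mock_instance()` is hypothetical, and `stop()` stands in for `cli::cli_abort()` to keep the example free of dependencies.

```r
# Minimal mock of an instance: `is_default` and `py_lamin` stand in for the
# fields used by the real class; this is not the laminr implementation
new_mock_instance <- function(is_default, py_lamin) {
  self <- list(is_default = is_default)
  self$get_py_lamin <- function(check = FALSE, what = "This functionality") {
    if (check && isFALSE(is_default)) {
      stop(what, " can only be performed by the default instance")
    }
    if (check && is.null(py_lamin)) {
      stop(what, " requires the Python lamindb package")
    }
    py_lamin
  }
  self
}

# Callers state what they are doing and let the getter run the checks
default_db <- new_mock_instance(is_default = TRUE, py_lamin = "py-module")
default_db$get_py_lamin(check = TRUE, what = "Creating records")

# A non-default instance fails with a caller-specific message
other_db <- new_mock_instance(is_default = FALSE, py_lamin = "py-module")
result <- tryCatch(
  other_db$get_py_lamin(check = TRUE, what = "Tracking"),
  error = function(err) conditionMessage(err)
)
```

Because `check` defaults to `FALSE`, existing callers that only want the module (or `NULL`) keep their behaviour, while `from_df()`, `track()` and `finish()` opt in to the checks.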
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ LaminDB is accompanied by LaminHub which is a data collaboration hub built on La
- Planned: `.fcs`, `.h5mu`, `.zarr`.
- Create records from data frames.
- Delete records.
- Track code in R scripts and notebooks.

See the development roadmap for more details (`vignette("development", package = "laminr")`).

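As a rough illustration of how the new tracking feature might be used, the sketch below wraps a block of code between `track()` and `finish()`. The wrapper `with_tracking()` is hypothetical and not part of the package. A mock object stands in for the instance so the example runs without laminr or lamindb; the only assumption about the real object is that it exposes the `track(path, transform)` and `finish()` methods added in this commit.

```r
# Hypothetical wrapper: start a tracked run, evaluate some code, then finish.
# `db` is any object exposing track(path, transform) and finish() methods
with_tracking <- function(db, path, transform, code) {
  db$track(path, transform = transform)
  # Finish the run even if `code` raises an error (a design choice of this
  # sketch, not something the package does for you)
  on.exit(db$finish(), add = TRUE)
  # `code` is evaluated lazily, so it only runs here, after track()
  force(code)
}

# Stand-in for an Instance that records which methods were called
calls <- character()
mock_db <- list(
  track = function(path, transform = NULL) {
    calls <<- c(calls, paste("track", path, transform))
    invisible(NULL)
  },
  finish = function() {
    calls <<- c(calls, "finish")
    invisible(NULL)
  }
)

value <- with_tracking(mock_db, "analysis.R", "abcdEFGH12345678", {
  sum(1:10)
})
```

With the real package, `db` would presumably be the object returned by `connect()` for the default instance, and the UID would come from an initial `db$track(path)` call made without `transform`.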
48 changes: 47 additions & 1 deletion man/Instance.Rd

Some generated files are not rendered by default.

80 changes: 31 additions & 49 deletions vignettes/architecture.qmd
@@ -127,7 +127,9 @@ classDiagram
Instance --> RelatedRecords
InstanceAPI --> RelatedRecords
%% Use #emsp; to create indents in the rendered diagram when necessary
%% Methods must be on one line to be shown in the right diagram section
%% Use \n for newlines and #emsp; to create indents in the rendered
%% diagram when necessary
class laminr{
+connect(String slug): RichInstance
@@ -150,13 +152,16 @@ classDiagram
+api_url: String
}
class Instance{
+initialize(
#emsp;InstanceSettings Instance_settings, API api,
#emsp;Map<String, any> schema
): Instance
+initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map<String, any> schema\n): Instance
+get_modules(): Module[]
+get_module(String module_name): Module
+get_module_names(): String[]
+get_api(): InstanceAPI
+get_settings(): InstanceSettings
+get_py_lamin(Boolean check, String what): PythonModule
+track(String path, String transform): NULL
+finish(): NULL
+is_default: Boolean
}
class InstanceAPI{
+initialize(InstanceSettings Instance_settings)
@@ -166,38 +171,28 @@ classDiagram
+delete_record(...): NULL
}
class Module{
+initialize(
#emsp;Instance Instance, API api, String module_name,
#emsp;Map<String, any> module_schema
): Module
+initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map<String, any> module_schema\n): Module
+name: String
+get_registries(): Registry[]
+get_registry(String registry_name): Registry
+get_registry_names(): String[]
}
class Registry{
+initialize(
#emsp;Instance Instance, Module module, API api,
#emsp;String registry_name, Map<String, Any> registry_schema
): Registry
+initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map<String, Any> registry_schema\n): Registry
+name: String
+class_name: String
+is_link_table: Bool
+get_fields(): Field[]
+get_field(String field_name): Field
+get_field_names(): String[]
+get(String id_or_uid, Bool include_foreign_keys, List~String~ select, Bool verbose): RichRecord
+get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord
+get_record_class(): RichRecordClass
+get_temporary_record_class(): TemporaryRecordClass
+df(Integer limit, Bool verbose): DataFrame
+from_df(DataFrame dataframe, String key, String description, String run)): TemporaryRecord
+from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord
}
class Field{
+initialize(
#emsp;String type, String through, String field_name, String registry_name,
#emsp;String column_name, String module_name, Bool is_link_table, String relation_type,
#emsp;String related_field_name, String related_registry_name, String related_module_name
): Field
+initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field
+type: String
+through: Map
+field_name: String
@@ -211,15 +206,12 @@ classDiagram
+related_module_name: String
}
class Record{
+initialize(Instance Instance, Registry registry, API api, Map<String, Any> data): Record
+initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map<String, Any> data\n): Record
+get_value(String field_name): Any
+delete(): NULL
}
class RelatedRecords{
+initialize(
#emsp;Instance instance, Registry registry, Field field,
#emsp;String related_to, API api
): RelatedRecords
+initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords
+df(): DataFrame
+field: Field
}
@@ -317,13 +309,16 @@ classDiagram
+api_url: String
}
class Instance{
+initialize(
#emsp;InstanceSettings Instance_settings, API api,
#emsp;Map<String, any> schema
): Instance
+initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map<String, any> schema\n): Instance
+get_modules(): Module[]
+get_module(String module_name): Module
+get_module_names(): String[]
+get_api(): InstanceAPI
+get_settings(): InstanceSettings
+get_py_lamin(Boolean check, String what): PythonModule
+track(String path, String transform): NULL
+finish(): NULL
+is_default: Boolean
}
class InstanceAPI{
+initialize(InstanceSettings Instance_settings)
@@ -333,38 +328,28 @@ classDiagram
+delete_record(...): NULL
}
class Module{
+initialize(
#emsp;Instance Instance, API api, String module_name,
#emsp;Map<String, any> module_schema
): Module
+initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map<String, any> module_schema\n): Module
+name: String
+get_registries(): Registry[]
+get_registry(String registry_name): Registry
+get_registry_names(): String[]
}
class Registry{
+initialize(
#emsp;Instance Instance, Module module, API api,
#emsp;String registry_name, Map<String, Any> registry_schema
): Registry
+initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map<String, Any> registry_schema\n): Registry
+name: String
+class_name: String
+is_link_table: Bool
+get_fields(): Field[]
+get_field(String field_name): Field
+get_field_names(): String[]
+get(String id_or_uid, Bool include_foreign_keys, List~String~ select, Bool verbose): RichRecord
+get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord
+get_record_class(): RichRecordClass
+get_temporary_record_class(): TemporaryRecordClass
+df(Integer limit, Bool verbose): DataFrame
+from_df(DataFrame dataframe, String key, String description, String run)): TemporaryRecord
+from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord
}
class Field{
+initialize(
#emsp;String type, String through, String field_name, String registry_name,
#emsp;String column_name, String module_name, Bool is_link_table, String relation_type,
#emsp;String related_field_name, String related_registry_name, String related_module_name
): Field
+initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field
+type: String
+through: Map
+field_name: String
@@ -378,15 +363,12 @@ classDiagram
+related_module_name: String
}
class Record{
+initialize(Instance Instance, Registry registry, API api, Map<String, Any> data): Record
+initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map<String, Any> data\n): Record
+get_value(String field_name): Any
+delete(): NULL
}
class RelatedRecords{
+initialize(
#emsp;Instance instance, Registry registry, Field field,
#emsp;String related_to, API api
): RelatedRecords
+initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords
+df(): DataFrame
+field: Field
}
8 changes: 5 additions & 3 deletions vignettes/development.qmd
@@ -72,10 +72,11 @@ This document outlines the features of the **{laminr}** package and the roadmap

### Track notebooks & scripts

* [ ] **Track code execution**: Automatically track the execution of R scripts and notebooks.
* [x] **Track code execution**: Automatically track the execution of R scripts and notebooks.
* [ ] **Capture run context**: Record information about the execution environment (e.g., package versions, parameters).
* [ ] **Link code to artifacts**: Associate code execution with generated artifacts.
* [x] **Link code to artifacts**: Associate code execution with generated artifacts.
* [ ] **Visualize data lineage**: Create visualizations of data lineage and dependencies.
* [x] **Finalize tracking**: End and save a run.

### Curate datasets

@@ -126,10 +127,11 @@ A first version of the package that allows users to:
* Expand query functionality with comparators, relationships, and pagination.
* Implement basic data and metadata management features (create, save, load and delete artifacts).
* Expand support for different data formats and storage backends.
* Implement code tracking.

### Version 0.3.0

* Implement code tracking and data lineage visualization.
* Implement data lineage visualization.
* Introduce data curation features (validation, standardization, annotation).
* Enhance support for bionty registries and ontology interactions.
