Commit

flesh out agent sdk
bensonlee5 committed Oct 7, 2024
1 parent c7a5b2e commit db51c12
Showing 3 changed files with 115 additions and 31 deletions.
2 changes: 1 addition & 1 deletion docs/app/agents/QuickstartBuildAgent.mdx
Expand Up @@ -151,6 +151,6 @@ Now that you have successfully created and installed an Agent, you can explore m
- [How to modify the agent code](./Agent#watch-for-files-locally-then-run-flow) to add custom functionality, such as
- [Adding tags to files](../files/Tags.mdx) to make captured files easier to find in Ganymede
- Parsing metadata from file contents to determine how files are processed
- Delivering [multiple files into a single Node](../../sdk/markdowns/AgentSDK#classes-for-agent-triggered-flows)
- Delivering [multiple files into a single Node](../../sdk/markdowns/AgentSDK#agent-triggered-flows)
- Incorporating [Agent utility functions](../../sdk/markdowns/AgentSDK) from the Ganymede SDK and Agent SDK
- Interpreting [Agent log messages](./AgentLogs)
2 changes: 1 addition & 1 deletion docs/app/files/Tags.mdx
Expand Up @@ -43,7 +43,7 @@ The strict mode setting, if disabled, allows admins to delete or modify tags. T

### Tagging Files

Files can be tagged in user-defined code within flows and Agents, though the methods differ slightly. In flows, files are tagged by passing the file path to the `add_file_tag` function. Within Agents, files are tagged by passing the [FileParam](../../sdk/markdowns/AgentSDK#classes-for-agent-triggered-flows) object into the `add_file_tag_to_fileparam` function. The FileParam object contains the file that the Agent submits to Ganymede storage (for initiating a flow if the Agent is configured to do so).
Files can be tagged in user-defined code within flows and Agents, though the methods differ slightly. In flows, files are tagged by passing the file path to the `add_file_tag` function. Within Agents, files are tagged by passing the [FileParam](../../sdk/markdowns/AgentSDK#agent-triggered-flows) object into the `add_file_tag_to_fileparam` function. The FileParam object contains the file that the Agent submits to Ganymede storage (for initiating a flow if the Agent is configured to do so).

The full set of methods available for interacting with tags can be found on the [File Tag](../../sdk/FileTags.mdx) module in the SDK documentation.

Expand Down
142 changes: 113 additions & 29 deletions docs/sdk/markdowns/AgentSDK.mdx
Expand Up @@ -8,18 +8,18 @@ toc_max_heading_level: 4

import NodeChip from '@site/src/components/NodeChip.js'

## Classes for Agent-triggered flows
## Agent-triggered flows

Objects for triggering a Flow from an [Agent](../../app/agents/Agent) can be found in `agent_sdk` for Agents v5.0+.

### FileWatcherResult Class
### `class` FileWatcherResult

FileWatcherResult is a dictionary of FileParam objects indexed by `node name`.`param name`.

- _param_ **files**: dict[str, fileParam] - Dictionary of FileParam objects indexed by `node name`.`param name`
- _param_ **tags**: list[FileTag] | None - List of tags to be applied to all files
- _param_ **files**: dict[str, FileParam | list[FileParam]] - Dictionary of FileParam objects indexed by `node name`.`param name`
- _param_ **tags**: list[UnappliedFileTag] | None - List of tags to be applied to all files

### TriggerFlowParams Class
### `class` TriggerFlowParams

TriggerFlowParams specifies the inputs for the Flow executed when all files are observed. It includes the following parameters:

Expand All @@ -28,7 +28,7 @@ TriggerFlowParams specifies the inputs for the Flow executed when all files are
- _param_ **benchling_tag**: Tag | None - Additional parameters to be passed to flow. This parameter is used for inputs to the Input_Benchling node.
- _param_ **additional_params**: dict[str, str] | None - Additional parameters to be passed to flow. This parameter is used for inputs to the [Input_Param node](../../nodes/Tag/Input_Param.md); the key is the Node name for the input parameter, and the value is the string to pass into the Node.

### FileParam Class
### `class` FileParam

FileParam specifies files to be uploaded to Ganymede Cloud and their corresponding Flow parameters. These parameters are provided to the _execute_ function once all files are detected.

Expand All @@ -43,7 +43,7 @@ FileParam specifies files to be uploaded to Ganymede Cloud and their correspondi
- _param_ **bucket_name**: str - Bucket associated with file
- _param_ **files**: str - Alternative method for specifying file contents, where the key is the filename and the value is the file body.

### MultiFileParam Class
### `class` MultiFileParam

MultiFileParam is used for submitting multiple files to a single node. It includes the following parameters:

Expand All @@ -63,7 +63,50 @@ The MultiFileParam object contains a method for initiation from a list of FilePa
m = agent_sdk.file_params_list_to_multi([fp1, fp2])
```

## Utility functions
### `class` NoOpFileTagParams

NoOpFileTagParams is used to specify that tags should be applied to a file, but that no Flow should be triggered upon file upload to Ganymede.

- _param_ **files**: list[FileParam | list[FileParam]] - List of FileParam objects to apply tags to

### `function` fp

`fp` returns a function that performs pattern matching against a file path. Specifically, the function returns callable[[str], bool] - a function that takes a file path and returns True if the file path matches the pattern, and False otherwise.

This function can be useful as a template for creating your own pattern matching functions.

- _param_ **watch_dir**: str - Directory to watch for files
- _param_ **pattern**: str - Glob pattern to match against the file path
- _param_ **seconds_since_modification**: int | None - if set, filters for files last modified within the number of seconds specified, by default None
- _param_ **seconds_since_access**: int | None - if set, filters for files last accessed within the number of seconds specified, by default None
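
The idea of a function that returns a path-matching predicate can be sketched with the standard library alone. The block below is a minimal illustration, not the `agent_sdk` implementation: `make_path_matcher_sketch` is a hypothetical name, and its matching rules (glob applied to the path relative to the watch directory, optional modification-age filter) are assumptions made for the example.

```python
import fnmatch
import os
import time
from typing import Callable, Optional


def make_path_matcher_sketch(
    watch_dir: str,
    pattern: str,
    seconds_since_modification: Optional[int] = None,
) -> Callable[[str], bool]:
    """Hypothetical stand-in for fp: build a predicate over file paths."""

    def matcher(file_path: str) -> bool:
        # Express the path relative to the watched directory.
        rel = os.path.relpath(file_path, watch_dir)
        # Reject paths that live outside the watched directory.
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            return False
        # Apply the glob pattern to the relative path.
        if not fnmatch.fnmatch(rel, pattern):
            return False
        # Optionally require a recent modification time (assumed semantics).
        if seconds_since_modification is not None:
            age = time.time() - os.path.getmtime(file_path)
            return age <= seconds_since_modification
        return True

    return matcher


is_csv = make_path_matcher_sketch("/data/in", "*.csv")
print(is_csv("/data/in/run1.csv"))
```

A custom matcher only needs to keep the same shape: take a file path as a string and return a bool.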

### `function` file_params_list_to_multi

`file_params_list_to_multi` converts a list of FileParam objects to a MultiFileParam object.

- _param_ **file_params**: list[FileParam] - List of FileParam objects to convert to MultiFileParam

## Tag-related classes and functions

### `class` FlowTag

The FlowTag class is used to represent a tag that can be applied to a file. This class is not used for applying tags, but rather for interacting with tags already applied to files.

- _param_ **tag_id**: str - Name of the tag type applied to a file.
- _param_ **display_tag**: str - Value of the tag applied to a file
- _param_ **upload_ts**: datetime - Timestamp of when tag was applied

### `function` add_file_tag_to_fileparam

`add_file_tag_to_fileparam` adds a Tag to a FileParam object, returning a FileParam | MultiFileParam object with the tag applied.

- _param_ **file_param**: FileParam | MultiFileParam - FileParam object to add Tag to
- _param_ **tag_type_id**: str - Tag type of Tag to add
- _param_ **display_value**: str - Value of Tag to add
- _param_ **tag_id**: str | None - Optional Tag ID which can be used to reference the Tag in code
- _param_ **url**: str | None - Optional URL to associate with the Tag

## Checksum functions

Agent utility functions are provided in `agent_sdk` for validating data integrity and interacting with file systems.

Expand All @@ -75,7 +118,22 @@ The `agent_sdk` is only available for Agents v5.0+. Prior to v5.0, these functi

### Computing file checksums

Ganymede provides functions to validate file integrity; these values can be used to verify the integrity of a file uploaded to cloud storage:
Ganymede provides functions to validate file integrity; the checksums they return can be used to verify the integrity of a file uploaded to cloud storage.

### `function` calculate_crc32c

The function returns the CRC32C checksum of a file as a string encoded in utf-8.

- _param_ **file_path**: str - Path to file to generate checksum for
- _param_ **blocksize**: int | None - Block size to use for the checksum calculation. If not specified, the default block size is 2**20.

### `function` calculate_md5

The function returns the MD5 hash of a file as a string encoded in utf-8.

- _param_ **file_path**: str - Path to file to generate MD5 hash for
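
For intuition, a block-wise MD5 computation can be written with `hashlib`. This is a sketch under stated assumptions rather than the SDK's code: `md5_hex_sketch` is a hypothetical name, its `blocksize` argument is borrowed from the `calculate_crc32c` description for illustration, and it returns a hex digest, which may differ from the string format `calculate_md5` returns.

```python
import hashlib


def md5_hex_sketch(file_path: str, blocksize: int = 2**20) -> str:
    """Hypothetical helper: MD5 of a file, read in fixed-size blocks."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        # Read block by block so large files never sit fully in memory.
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    # Hex encoding is an assumption; the SDK's output format may differ.
    return digest.hexdigest()
```

Reading in blocks keeps memory use bounded regardless of file size, which matters for large instrument output files.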

### Examples

```python
# Before Agent v5.0
Expand Down Expand Up @@ -109,68 +167,94 @@ crc32c = calculate_crc32c(tmp_file_name)
os.remove(tmp_file_name)
```

### File system utilities
## File system utilities

`agent_sdk` provides a number of convenience functions, which can be helpful to use with cron Agents that involve more complex logic prior to invoking a flow. Some examples of this are when a file is written to multiple times before being processed, or if there is a variable number of files being processed, such that the trigger for invoking a flow requires more than just the presence of a file.

#### ScanResult Dataclass
### `class` ScanResult

ScanResult stores file paths for files of interest. It includes:
ScanResult is a frozen dataclass that stores file paths for files of interest. Two files are considered to be the same if they have the same relative_path and modified_time.

- _param_ **file_path**: str - Path to file
- _param_ **relative_path**: str - Path to file
- _param_ **modified_time**: datetime - Datetime of when file was last modified
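
The equality rule described for ScanResult is what a frozen dataclass with these two fields provides by default. The class below, `ScanResultSketch`, is a hypothetical stand-in written for illustration; the real `agent_sdk` class may carry additional fields or behavior.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScanResultSketch:
    """Hypothetical stand-in: immutable record of a scanned file."""

    relative_path: str
    modified_time: datetime


first = ScanResultSketch("plates/run1.csv", datetime(2024, 10, 7, 12, 0))
second = ScanResultSketch("plates/run1.csv", datetime(2024, 10, 7, 12, 0))
rewritten = ScanResultSketch("plates/run1.csv", datetime(2024, 10, 7, 12, 5))

# Same path and same modified time compare equal and hash alike,
# so a set keeps only one of them; a later write yields a new entry.
unique_results = {first, second, rewritten}
```

Because frozen dataclasses are hashable, results from successive scans can be deduplicated with a set, and a file that is written again shows up as a distinct result.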

#### Functions
### `function` list_files_recursive

`list_files_recursive` returns a list of all files in a directory and its subdirectories.
`list_files_recursive` returns the paths of all files in a directory and its subdirectories as a list[str].

- _param_ **file_path**: str - Path to directory to list files from

### `function` matches_pattern

`matches_pattern` returns True if a file path matches at least one of the specified regex patterns and False otherwise.

- _param_ **filename**: str - Name of file
- _param_ **pattern**: str | re.Pattern - Regex pattern or list of regex patterns to match against
- _param_ **pattern**: str | re.Pattern | list[re.Pattern] - Regex pattern or list of regex patterns to match against
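
Matching a filename against one pattern or a list of patterns can be sketched as follows. `matches_any_sketch` is a hypothetical name, and the use of `re.search` (match anywhere in the name) is an assumption; the SDK function may anchor its matches differently.

```python
import re
from typing import List, Union

PatternArg = Union[str, re.Pattern, List[re.Pattern]]


def matches_any_sketch(filename: str, pattern: PatternArg) -> bool:
    """Hypothetical helper: True if any supplied pattern matches filename."""
    # Normalize a single pattern into a one-element list.
    patterns = pattern if isinstance(pattern, (list, tuple)) else [pattern]
    # re.search accepts both raw strings and compiled patterns.
    return any(re.search(p, filename) is not None for p in patterns)
```

Anchoring with `^` and `$` inside the pattern itself keeps the intent explicit regardless of which matching call is used.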

### `function` is_file_ready

`is_file_ready` returns True if a file has the modified time is within the last **interval_in_seconds** seconds, or if the size of the file has changed in that same timespan.
`is_file_ready` returns True if the file's modified time is within the last **interval_in_seconds** seconds, or if the size of the file has changed in that same timespan; otherwise, it returns False.

- _param_ **file_path**: str - Path to file to watch
- _param_ **threshold_seconds**: int - Number of seconds to wait between checks, by default 0.1
- _param_ **threshold_seconds**: float - Number of seconds to wait between checks, by default 0.1

### `function` get_most_recent_modified_result

`get_most_recent_modified_result` returns a ScanResult object referencing the most recently modified file in a directory, or None if no files are found.

`get_most_recent_access_result` returns a ScanResult object referencing the most recently accessed file in a directory. Access time is updated when a file is read from or written to.
- _param_ **directory**: Path - Path to directory to watch

- _param_ **directory**: str - Path to directory to watch
### `function` filter_by_age

`filter_by_age` returns a list of files that have not been modified within the last **age_in_minutes** minutes.
`filter_by_age` returns a list[str] of file paths that have not been modified within the last **age_in_minutes** minutes.

- _param_ **scan_results**: list[ScanResult] - List of ScanResult objects
- _param_ **scan_results**: Iterable[ScanResult] - Iterable of ScanResult objects
- _param_ **age_in_minutes**: int - Minimum age in minutes

`zip_directory` creates a zip file of a directory and its contents.
### `function` zip_directory

`zip_directory` creates a zip file of a directory and its contents.

- _param_ **directory**: str - Path to directory to zip
- _param_ **zip_file**: str - Path to zip file to create
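
Zipping a directory tree can be approximated with the standard `zipfile` module. The sketch below uses a hypothetical name, `zip_directory_sketch`, and assumes archive entries are stored relative to the zipped directory; the SDK function may lay out entries differently.

```python
import os
import zipfile


def zip_directory_sketch(directory: str, zip_file: str) -> None:
    """Hypothetical helper: write every file under directory into zip_file."""
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                # Store entries relative to the zipped directory (assumption).
                archive.write(path, arcname=os.path.relpath(path, directory))
```

In this sketch the zip file should be written outside the directory being zipped; otherwise the walk can pick up the partially written archive.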

`scan_for_finished_files` scans a directory, returning paths to files with a modified date older than the specified number of minutes
### `function` scan_for_finished_files

`scan_for_finished_files` scans a directory, returning paths to files with a modified date older than the specified number of minutes as a list[str].

- _param_ **directory**: str - Path to directory to scan
- _param_ **age_in_minutes**: int - Minimum age in minutes for files to be included in the results
- _param_ **pattern**: re.Pattern | list[re.Pattern] - Regex pattern to match files against; only files that match against at least one of the specified patterns will be included in results
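
The combination of an age cutoff and a pattern filter can be sketched with `os.walk`. This is an illustration only: `scan_finished_sketch` is a hypothetical name, and matching the pattern against the bare filename with `search` is an assumption about behavior the SDK does not spell out here.

```python
import os
import re
import time
from typing import List, Union


def scan_finished_sketch(
    directory: str,
    age_in_minutes: int,
    pattern: Union[re.Pattern, List[re.Pattern]],
) -> List[str]:
    """Hypothetical helper: paths whose last modification is old enough."""
    patterns = pattern if isinstance(pattern, list) else [pattern]
    # Files modified at or before this timestamp count as finished.
    cutoff = time.time() - age_in_minutes * 60
    finished = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            matched = any(p.search(name) for p in patterns)
            if matched and os.path.getmtime(path) <= cutoff:
                finished.append(path)
    return sorted(finished)
```

Waiting for a file to age past the cutoff is a simple way to avoid picking up files that an instrument is still writing.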

#### Example Use Case
#### Example

You can use `scan_for_finished_files` to continuously scan a directory for files, uploading them to Ganymede Cloud for processing when they are older than a specified number of minutes. The Flow could query previously uploaded files using the [list_files_recursive](#function-list_files_recursive) method to avoid uploading the same file multiple times.

## Accessing Ganymede Cloud

### `function` read_sql_query

You can use `scan_for_finished_files` to continuously scan a directory for files, uploading them to Ganymede Cloud for processing when they are older than a specified number of minutes. The Flow could query previously uploaded files using the [list_files](../GanymedeClass.mdx#method-list_files) method to avoid uploading the same file multiple times.
`read_sql_query` returns a pandas DataFrame object containing the results of a SQL query run against the Ganymede DB.

## Querying Ganymede from Agent Code
- _param_ **sql_query**: str - SQL query to run

#### Example

```python
from agent_sdk.query import read_sql_query

df = read_sql_query('SELECT * FROM instrument_methods')
```

### Logging Methods
### `function` get_secret

`get_secret` returns the value of a secret stored in Ganymede.

- _param_ **secret_name**: str - Name of the secret to retrieve

## Logging Methods

Ganymede Agents (v4.9+) support user-defined logging messages in the `agent_sdk`, aligning with [logging level for Agent messages](../../app/agents/AgentLogs#logging-level). Each level corresponds with a separate method in agent_sdk.
Ganymede Agents (v5.0+) support user-defined logging messages in the `agent_sdk`, aligning with [logging level for Agent messages](../../app/agents/AgentLogs#logging-level). Each level corresponds with a separate method in agent_sdk.

```python
from agent_sdk import internal, debug, info, activity, error
Expand Down
