Merge pull request #29 from boozallen/5-migrate-documentation-tranch-4

#5 📝 Tranche 4 of documentation migration

d-ryan-ashcraft authored May 1, 2024
2 parents 8828ab4 + 8bea984 commit ac2dc3c
Showing 9 changed files with 954 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/antora.yml
@@ -1,8 +1,8 @@
 name: aissemble
-title: aiSSEMBLE
 version: 1.7.0
 display_version: 1.7.0-SNAPSHOT
 prerelease: true
+title: aiSSEMBLE™
 nav:
 - modules/ROOT/nav.adoc
 asciidoc:
136 changes: 136 additions & 0 deletions docs/modules/ROOT/pages/alerting-details.adoc
@@ -0,0 +1,136 @@
= Alerting

== Overview

The purpose of alerting is to bring attention to significant events and issues that arise during execution of a pipeline
by sending messages via email, Slack, etc. To simplify the incorporation of alerting, pre-constructed patterns have been
developed and can be included in a https://github.com/boozallen/aissemble[Solution Baseline,role=external,window=_blank]
project, so only a few steps are necessary to incorporate the generated alerting code. This page explains the generated
components that are included when alerting is enabled and identifies where to modify and customize elements to suit a
specific implementation.

== What Gets Generated
Alerting is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default]
for projects that have a pre-fab data delivery pipeline.

[WARNING]
Alerting is currently only available for Spark Data Delivery Pipelines and will be available for PySpark Data Delivery
and Machine Learning Pipelines in a future version.

=== Default Method for Sending Alerts
When alerting is enabled, a few methods (outlined below) are generated in the base class of each step. These methods are
called automatically upon step completion (whether the step succeeds or throws an exception) to send an alert. All of
these methods have default logic, but they can be customized by overriding them in the step implementation class, as
illustrated in the sketch after the method listings below.

****
.sendAlert
[source,java]
----
/**
* Send an alert with a given status and message.
* Override this method to customize how messages are sent to the alerting framework.
*/
protected void sendAlert(Alert.Status status, String message)
----
_Parameters:_

* `status` – the status of the alert
* `message` – the message to send

_Returns:_ None
****

****
.getSuccessMessage
[source,java]
----
/**
* Returns the message sent via alerting when the step completes successfully.
* Override this method to provide your own success message.
*/
protected String getSuccessMessage(Map<String, String> params)
----
_Parameters:_

* `params` – map of parameters for the success message, including the execution duration under the key `timeToComplete`

_Returns:_ Success message with the action and the time to complete
****

****
.getErrorMessage
[source,java]
----
/**
* Returns the message sent via alerting when the step throws an exception. Override this method to provide your own
* error message.
*/
protected String getErrorMessage(Exception e)
----
_Parameters:_

* `e` – the exception that caused the step to fail

_Returns:_ The detailed error message
****
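
As an illustration, the sketch below overrides two of these methods in a hypothetical step implementation class
(`IngestStep` and its generated base class `IngestStepBase` are assumed names; only the method signatures above come
from the generated code):

[source,java]
----
import java.util.Map;

public class IngestStep extends IngestStepBase { // "IngestStepBase" is an assumed generated base class name

    @Override
    protected String getSuccessMessage(Map<String, String> params) {
        // params carries the execution duration under the key "timeToComplete"
        return "Ingest finished in " + params.get("timeToComplete");
    }

    @Override
    protected void sendAlert(Alert.Status status, String message) {
        // Add custom routing or filtering here before delegating to the default behavior
        super.sendAlert(status, message);
    }
}
----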

== Configuring Your Alerting Service
The Solution Baseline provides several integration options for alerting purposes.

=== Alerting with Slack
The default alerting implementation is Slack. To use Slack Alerting, follow the steps below:

. Add the aiSSEMBLE(TM) Slack alerting dependency `extensions-alerting-slack` to the pipeline POM:
[source,xml]
----
<dependencies>
...
<dependency>
<groupId>com.boozallen.aissemble</groupId>
<artifactId>extensions-alerting-slack</artifactId>
</dependency>
...
</dependencies>
----

[start=2]
. Add the `SlackConsumer` bean to the pipeline within the `PipelinesCdiContext.java` file:

[source,java]
----
public List<Class<?>> getCdiClasses() {
    // Add any custom CDI classes here
    ...
    customBeans.add(SlackConsumer.class);
    return customBeans;
}
----

[start=3]
. Create the `slack-integration.properties` file at the following path:
`<project>-docker/<project>-spark-worker-docker/src/main/resources/krausening/base/slack-integration.properties`
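
The exact property keys are defined by the `SlackConsumer` implementation, so consult its configuration for the
authoritative list. As a loose sketch only (both property names and values below are illustrative assumptions, not
confirmed keys):

[source,properties]
----
# Illustrative placeholders only -- the real keys are defined by SlackConsumer
slack.webhook.url=https://hooks.slack.com/services/<your-webhook-path>
slack.channel=#alerts
----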

=== Messaging Integration
The default alerting implementation can be extended to publish the alerts to a Messaging topic. Adding a
`microprofile-config.properties` file with the following configurations will enable the Messaging integration for the
default Alert Producer:

.<spark-data-delivery-pipeline>/src/main/resources/META-INF/microprofile-config.properties
[source]
----
kafka.bootstrap.servers=kafka-cluster:9093 <1>
mp.messaging.outgoing.alerts.connector=smallrye-kafka
mp.messaging.outgoing.alerts.topic=kafka-alert-topic-name <2>
mp.messaging.outgoing.alerts.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.alerts.value.serializer=org.apache.kafka.common.serialization.StringSerializer
----
<1> The hostname and port of the Messaging server to connect to.
<2> The name of the Messaging topic to publish the alerts to.

Please see the https://smallrye.io/smallrye-reactive-messaging/latest/kafka/kafka[SmallRye
documentation,role=external,window=_blank] on the Kafka connector for more configuration details.
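
Once the integration is enabled, any standard Kafka consumer can read the published alerts. The sketch below is not
generated by aiSSEMBLE; it is a minimal plain-Kafka-client example of a downstream reader, with the topic and bootstrap
server values taken from the configuration above:

[source,java]
----
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AlertTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-cluster:9093");
        props.put("group.id", "alert-reader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafka-alert-topic-name"));
            while (true) {
                // Poll for new alert messages and log them
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Alert received: " + record.value());
                }
            }
        }
    }
}
----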
9 changes: 9 additions & 0 deletions docs/modules/ROOT/pages/bias-detection.adoc
@@ -0,0 +1,9 @@
[#_bias_detection]
= Bias Detection
Bias detection, also known as Ethical Artificial Intelligence (AI), is concerned with determining whether an AI model
systematically produces inaccurate results due to flawed assumptions. One contributing factor to model bias is the data
it learns from. By driving bias detection from a semantic data model, consistent bias detection policies are applied to
the related field(s) throughout the data.

Bias detection can be easily plugged into aiSSEMBLE. Please contact the team for more information.
60 changes: 60 additions & 0 deletions docs/modules/ROOT/pages/ci-cd.adoc
@@ -0,0 +1,60 @@
= Deploying the Project
:source-highlighter: rouge

AI/ML projects are generally built using scripts or notebooks, well suited for prototyping and simple implementations
but lacking Software Development Lifecycle (SDLC) best practices such as unit/integration testing, peer reviews, and a
consistent build process. aiSSEMBLE provides a structured approach for designing, developing, deploying, and monitoring
AI/ML solutions to standardize delivery and drive consistency and reliability. A key component of this approach is
automating the building, testing, and deployment of software through Continuous Integration and Continuous Delivery
(CI/CD). The following outlines the deployment and delivery approach in aiSSEMBLE.

== Deployment Artifacts
aiSSEMBLE makes your project portable, scalable, and platform-agnostic by using Docker to create “images” which are
blueprints for containers. https://docs.docker.com/build/[Docker,role=external,window=_blank] is a software platform
designed to help developers build, share, and run modern applications. Docker is used in aiSSEMBLE to create portable
software components packaged up for deployment in a containerized environment.

Container orchestration is important for automating deployments. https://kubernetes.io/docs/home/[Kubernetes,role=external,window=_blank],
also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized
applications. aiSSEMBLE generates Kubernetes artifacts to ease the management and scalability of your project.

Helm is used in aiSSEMBLE as the package management tool and template engine for Kubernetes. https://helm.sh/docs/[Helm,role=external,window=_blank]
is a tool that streamlines installing and managing Kubernetes applications. Think of it like apt/yum/homebrew for
Kubernetes. Helm packages and deploys aiSSEMBLE’s Kubernetes applications while also providing templating services that
allow for easy modifications.

== Deployment Infrastructure

=== Local Deployment
aiSSEMBLE’s framework enables rapid development and testing by ensuring local build and deployment processes are fast,
alleviating the need for ad-hoc scripts and notebooks. To achieve this, your project needs the ability to be deployed in
an environment where it can be easily stood up and torn down locally. In doing so, you ensure when you deploy your
project to a higher environment, all the pieces work together cohesively, similar to how they would in production. The
two components you need to get to this state are a local Kubernetes environment and a local deployment tool for
Kubernetes.

The aiSSEMBLE team promotes the usage of https://docs.rancherdesktop.io/[Rancher Desktop,role=external,window=_blank]
for the local Kubernetes environment and management tool. Rancher Desktop is a light-weight, user-friendly tool which
comes packaged with critical tools such as Helm, Docker and Kubernetes. By deploying to a real Kubernetes environment,
Rancher Desktop allows you to test integration points between the key components of your project.

In order to ease testing in your local Kubernetes environment, there is a need for a simple Continuous Deployment (CD)
tool that can deploy your entire project quickly. The aiSSEMBLE team encourages the usage of https://docs.tilt.dev/[Tilt,role=external,window=_blank]
as your local CD for Kubernetes. By default, aiSSEMBLE will generate Tilt deployment files to get you started. Tilt can
deploy your project (in its entirety or partially) with a single command and provides a user-friendly interface to
monitor your container activity and logs. In addition, Tilt keeps the deployment up to date with the latest code changes
with very little downtime.
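
For example, from the project root (where the generated Tilt configuration lives), the entire stack can be deployed with
a single command. A minimal sketch, assuming Tilt and a local Kubernetes cluster are already installed:

[source,shell]
----
tilt up    # deploys the project and starts the Tilt UI for monitoring container activity and logs
tilt down  # tears the local deployment back down
----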

=== Remote Deployment
Including continuous integration (CI) is a best practice for unit/integration testing and consistent builds. By default,
aiSSEMBLE will include starter Jenkins CI pipelines for building, testing, packaging, and deploying your project.
Jenkins is an open-source DevOps automation tool commonly used for CI.
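
The generated pipelines are tailored to each project, so as a shape reference only, a declarative Jenkins pipeline for a
Maven-based build looks roughly like the sketch below (stage names and commands are illustrative assumptions, not the
generated pipeline):

[source,groovy]
----
pipeline {
    agent any
    stages {
        stage('Build & Test') {
            steps {
                // Compile and run unit/integration tests
                sh 'mvn clean verify'
            }
        }
        stage('Package & Deploy') {
            steps {
                // Build and publish the project's deployment artifacts
                sh 'mvn deploy -DskipTests'
            }
        }
    }
}
----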

aiSSEMBLE enables standardized delivery and monitoring to drive consistency and reliability. ArgoCD is a CD tool which
deploys and continuously monitors running applications and compares the current, live state against the desired target
state. aiSSEMBLE promotes ArgoCD’s "app of apps" pattern in the Helm charts generated for your project.


== Related Pages

- xref:guides/guides-spark-job.adoc[]
151 changes: 151 additions & 0 deletions docs/modules/ROOT/pages/data-access-details.adoc
@@ -0,0 +1,151 @@
= Data Access

== Overview
Data access is the process of exposing data to external consumers. aiSSEMBLE(TM) supports this through generated
services and records.

== What Gets Generated
Data access is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default] for projects that include at least
one record. When enabled, aiSSEMBLE generates a https://graphql.org/learn/[GraphQL,role=external,window=_blank] query
service with endpoints for retrieving records from ingested datasets.

|===
|Generated file | Description

|`<project>/<project>-pipelines/<project>-data-access/pom.xml`
|Creates the Maven module that builds the generated query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/resources/application.properties`
|https://quarkus.io/guides/config[Quarkus,role=external,window=_blank] configuration of the query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`
|GraphQL resource that exposes the /graphql REST endpoint for data access requests.
|===

=== GraphQL API
GraphQL queries are generated based on the record metamodel(s) in `<project>/<project>-pipeline-models/src/main/resources/records/`.
By default, two queries are generated for each record metamodel: one for retrieving all the results from a table, and
one for retrieving a limited number of results from a table. The methods that implement these queries can be found in
`<project>/<project>-pipelines/<project>-data-access/src/generated/java/<user-defined-package>/DataAccessResourceBase.java`.
These methods can be overridden, or new queries can be added by modifying
`<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`.
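
For instance, a new query can be registered in `DataAccessResource.java`. The sketch below assumes the service uses
MicroProfile GraphQL annotations (as in Quarkus SmallRye GraphQL); the `getTaxPayer` accessor and the `TaxPayer` record
getters are illustrative assumptions about the generated code:

[source,java]
----
import java.util.List;
import java.util.stream.Collectors;

import org.eclipse.microprofile.graphql.GraphQLApi;
import org.eclipse.microprofile.graphql.Name;
import org.eclipse.microprofile.graphql.Query;

@GraphQLApi
public class DataAccessResource extends DataAccessResourceBase {

    /** Custom query that narrows the generated "retrieve all" results to a single id. */
    @Query("TaxPayerById")
    public List<TaxPayer> taxPayerById(@Name("table") String table, @Name("id") String id) {
        return getTaxPayer(table).stream()
                .filter(taxPayer -> id.equals(taxPayer.getId()))
                .collect(Collectors.toList());
    }
}
----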


.GraphQL query to pull records from a given table:
[source,graphql]
----
query auditList {
TaxPayer(table: delinquent_tax_payers)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|`auditList`
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is
simply of your choosing.

|`TaxPayer`
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|`delinquent_tax_payers`
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the
name you specified in your step implementation.

|`id` (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

.GraphQL query to pull records from a given table with a limit:
[source,graphql]
----
query auditList {
TaxPayerLimited(table: delinquent_tax_payers, limit: 10)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|`auditList`
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is
simply of your choosing.

|`TaxPayerLimited`
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|`delinquent_tax_payers`
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the name
you specified in your step implementation.

|`limit` (int)
|Argument
|Limit on how many records are returned from the query.

|`id` (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

GraphQL queries are invoked via a REST API call to the service, as described below.

=== POST /graphql
.Returns the records for the given GraphQL query.
[%collapsible]
====
*Parameters*
|===
|*Name* | *Description*
|`query`
|https://graphql.org/learn/queries/[GraphQL query,role=external,window=_blank] executed to retrieve the data.
|===
*Return*
[cols="1,1"]
|===
|`{record-name}` records.
|List of records. The record will be based on your record metamodel.
|===
.Sample data input:
[source,JSON]
----
{
  "query": "{ ExampleDataLimited(table: \"example_table\", limit: 10) { id } }"
}
----
.Sample data output:
[source,JSON]
----
{
  "data": {
    "ExampleDataLimited": []
  }
}
----
====
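
For reference, the query can be posted from any HTTP client. A minimal Java sketch (the service address
`http://localhost:8080/graphql` is an assumption; adjust the host and port to your deployment):

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQlQueryExample {
    public static void main(String[] args) throws Exception {
        // Same query as the sample input above, wrapped in the JSON body expected by POST /graphql
        String body = "{\"query\": \"{ ExampleDataLimited(table: \\\"example_table\\\", limit: 10) { id } }\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/graphql")) // assumed service address
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"data":{"ExampleDataLimited":[]}}
    }
}
----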

=== Deployment Artifacts
Once a data access record has been defined, aiSSEMBLE will also generate deployment artifacts like Docker images,
Kubernetes manifests, and Tilt configurations. For more information, see the
xref:containers.adoc#_containers[Containers] page.