
#5 📝 Tranche 4 of documentation migration
d-ryan-ashcraft committed May 1, 2024
1 parent 8828ab4 commit 14854ae
Showing 7 changed files with 923 additions and 0 deletions.
122 changes: 122 additions & 0 deletions docs/modules/ROOT/pages/alerting-details.adoc
@@ -0,0 +1,122 @@
= Alerting

== Overview

The purpose of alerting is to bring attention to significant events and issues that arise during execution of a pipeline
by sending messages via email, Slack, etc. To simplify the incorporation of alerting, pre-constructed patterns have been
developed and can be included in a https://github.com/boozallen/aissemble[Solution Baseline,role=external,window=_blank]
project. This means only a few steps are necessary to incorporate the generated alerting code. This page explains the
components that are generated when alerting is enabled and where to modify and customize them to suit a specific
implementation.

== What Gets Generated
Alerting is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default]
for projects that have a pre-fab data delivery pipeline.

[WARNING]
Alerting is currently only available for Spark Data Delivery Pipelines and will be available for PySpark Data Delivery
and Machine Learning Pipelines in a future version.

=== Default Method for Sending Alerts
When alerting is enabled, a few methods (outlined below) are generated in the base class of each step. These methods are
called automatically when a step completes (whether successfully or with an exception) to send an alert. All of these
methods have default logic but can be customized by overriding them in the step implementation class, as illustrated in
the sketch after the method listings below.

.sendAlert
[source]
----
protected void sendAlert(Alert.Status status, String message)
Send an alert with a given status and message.
Override this method to customize how messages are sent to the alerting framework.
Parameters:
status – the status of the alert
message – the message
----

.getSuccessMessage
[source]
----
protected String getSuccessMessage(Map<String, String> params)
Returns the message sent via alerting when the step completes successfully. Override this method to provide your own success message.
Parameters:
params – map of parameters for the success message including the execution duration under the key timeToComplete.
Returns:
Success message with the action and the time to complete.
----

.getErrorMessage
[source]
----
protected String getErrorMessage(Exception e)
Returns the message sent via alerting when the step throws an exception. Override this method to provide your own error message.
Parameters:
e – The exception that caused the step to fail.
Returns:
The detailed error message.
----
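
For example, a step implementation class can override these hooks to tailor the alert content. The following is a
minimal sketch only; the step class and its generated base class (`IngestCustomerData` / `IngestCustomerDataBase`) are
hypothetical names standing in for the classes generated from your pipeline metamodel.

[source,java]
----
import java.util.Map;

public class IngestCustomerData extends IngestCustomerDataBase {

    @Override
    protected String getSuccessMessage(Map<String, String> params) {
        // The generated base class supplies the execution duration under the key "timeToComplete"
        return "Customer ingest finished in " + params.get("timeToComplete");
    }

    @Override
    protected String getErrorMessage(Exception e) {
        // Include the failure cause in the alert sent when the step throws an exception
        return "Customer ingest failed: " + e.getMessage();
    }
}
----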

== Configuring Your Alerting Service
The Solution Baseline provides several integration options for alerting purposes.

=== Alerting with Slack
The default alerting implementation is Slack. To use Slack Alerting, follow the steps below:

. Add the aiSSEMBLE Slack alerting dependency `extensions-alerting-slack` to the pipeline POM:
[source,xml]
----
<dependencies>
...
<dependency>
<groupId>com.boozallen.aissemble</groupId>
<artifactId>extensions-alerting-slack</artifactId>
</dependency>
...
</dependencies>
----

[start=2]
. Add the `SlackConsumer` bean to the pipeline within the `PipelinesCdiContext.java` file:

[source,java]
----
public List<Class<?>> getCdiClasses() {
    // Add any custom CDI classes here
    ...
    customBeans.add(SlackConsumer.class);
    return customBeans;
}
----

[start=3]
. Create the `slack-integration.properties` file at the following path:
`<project>-docker/<project>-spark-worker-docker/src/main/resources/krausening/base/slack-integration.properties`
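
The contents of this file provide the Slack connection details (for example, a bot token and target channel). The
snippet below is an illustrative placeholder only; the authoritative property names are defined by the
`extensions-alerting-slack` module, so confirm them against that module before use.

.slack-integration.properties (illustrative placeholders only)
[source]
----
# Placeholder keys for illustration only; use the property names defined by
# the extensions-alerting-slack module.
slack.bot.token=xoxb-your-bot-token
slack.channel=#pipeline-alerts
----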

=== Kafka Integration
The default alerting implementation can be extended to publish the alerts to an Apache Kafka topic. Adding a
`microprofile-config.properties` file with the following configurations will enable the Kafka integration for the
default Alert Producer:

.<spark-data-delivery-pipeline>/src/main/resources/META-INF/microprofile-config.properties
[source]
----
kafka.bootstrap.servers=kafka-cluster:9093 <1>
mp.messaging.outgoing.alerts.connector=smallrye-kafka
mp.messaging.outgoing.alerts.topic=kafka-alert-topic-name <2>
mp.messaging.outgoing.alerts.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.alerts.value.serializer=org.apache.kafka.common.serialization.StringSerializer
----
<1> The hostname and port of the Kafka server to connect to.
<2> The name of the Kafka topic to publish the alerts to.

Please see the https://smallrye.io/smallrye-reactive-messaging/latest/kafka/kafka[SmallRye documentation,role=external,window=_blank]
on the Kafka connector for more configuration details.
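
Once alerts are being published, any standard Kafka consumer can subscribe to the topic to monitor them. The sketch
below uses the plain Apache Kafka client (`kafka-clients`) and reuses the broker address and topic name from the
configuration above; the consumer group name is an arbitrary example.

[source,java]
----
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AlertTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-cluster:9093");
        props.put("group.id", "alert-monitor");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafka-alert-topic-name"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record value is an alert message published by the pipeline
                    System.out.println(record.value());
                }
            }
        }
    }
}
----
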
10 changes: 10 additions & 0 deletions docs/modules/ROOT/pages/bias-detection.adoc
@@ -0,0 +1,10 @@
[#_bias_detection]
= Bias Detection
Bias detection, also known as Ethical Artificial Intelligence (AI), is concerned with determining if an AI model
systematically produces inaccurate results due to flawed assumptions. One contributing factor to model bias is the data
it learns from. By driving bias detection from a semantic data model, consistent bias detection policies are applied
to the related field(s) throughout your data.

To implement bias detection within your project, please contact the https://stackoverflowteams.com/c/boozallensolutioncenter/questions[aiSSEMBLE team]
for integration and implementation guidance.
60 changes: 60 additions & 0 deletions docs/modules/ROOT/pages/ci-cd.adoc
@@ -0,0 +1,60 @@
= Deploying the Project
:source-highlighter: rouge

AI/ML projects are generally built using scripts or notebooks, which are well suited for prototyping and simple
implementations but lack Software Development Lifecycle (SDLC) best practices such as unit/integration testing, peer reviews, and a
consistent build process. aiSSEMBLE provides a structured approach for designing, developing, deploying, and monitoring
AI/ML solutions to standardize delivery and drive consistency and reliability. A key component of this approach is
automating the building, testing, and deployment of software through Continuous Integration and Continuous Delivery
(CI/CD). The following outlines the deployment and delivery approach in aiSSEMBLE.

== Deployment Artifacts
aiSSEMBLE makes your project portable, scalable, and platform-agnostic by using Docker to create “images” which are
blueprints for containers. https://docs.docker.com/build/[Docker,role=external,window=_blank] is a software platform
designed to help developers build, share, and run modern applications. Docker is used in aiSSEMBLE to create portable
software components packaged up for deployment in a containerized environment.

Container orchestration is important for automating deployments. https://kubernetes.io/docs/home/[Kubernetes,role=external,window=_blank],
also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized
applications. aiSSEMBLE generates Kubernetes artifacts to ease the management and scalability of your project.

Helm is used in aiSSEMBLE as the package management tool and template engine for Kubernetes. https://helm.sh/docs/[Helm,role=external,window=_blank]
is a tool that streamlines installing and managing Kubernetes applications. Think of it like apt/yum/homebrew for
Kubernetes. Helm packages and deploys aiSSEMBLE’s Kubernetes applications while also providing templating services that
allow for easy modifications.

== Deployment Infrastructure

=== Local Deployment
aiSSEMBLE’s framework enables rapid development and testing by ensuring local build and deployment processes are fast,
alleviating the need for ad-hoc scripts and notebooks. To achieve this, your project needs the ability to be deployed in
an environment where it can be easily stood up and torn down locally. In doing so, you ensure when you deploy your
project to a higher environment, all the pieces work together cohesively, similar to how they would in production. The
two components you need to get to this state are a local Kubernetes environment and a local deployment tool for
Kubernetes.

The aiSSEMBLE team promotes the usage of https://docs.rancherdesktop.io/[Rancher Desktop,role=external,window=_blank]
for the local Kubernetes environment and management tool. Rancher Desktop is a lightweight, user-friendly tool that
comes packaged with critical tools such as Helm, Docker, and Kubernetes. By deploying to a real Kubernetes environment,
Rancher Desktop allows you to test integration points between the key components of your project.

To ease testing in your local Kubernetes environment, you need a simple tool that can deploy your entire project
quickly. The aiSSEMBLE team encourages the usage of https://docs.tilt.dev/[Tilt,role=external,window=_blank]
as your local deployment tool for Kubernetes. By default, aiSSEMBLE will generate Tilt deployment files to get you
started. Tilt can deploy your project (in its entirety or partially) with a single command and provides a user-friendly
interface to monitor your container activity and logs. In addition, Tilt keeps the deployment up to date with the latest
code changes with very little downtime.

=== Remote Deployment
Including continuous integration (CI) is a best practice for unit/integration testing and consistent builds. By default,
aiSSEMBLE will include starter Jenkins CI pipelines for building, testing, packaging, and deploying your project.
Jenkins is an open-source DevOps automation tool commonly used for CI.

aiSSEMBLE enables standardized delivery and monitoring to drive consistency and reliability. ArgoCD is a tool which
deploys and continuously monitors running applications and compares the current, live state against the desired target
state. aiSSEMBLE promotes ArgoCD’s app of apps pattern in the Helm charts generated for your project.


== Related Pages

- xref:guides/guides-spark-job.adoc[]
149 changes: 149 additions & 0 deletions docs/modules/ROOT/pages/data-access-details.adoc
@@ -0,0 +1,149 @@
= Data Access

== Overview
Data access is the process of exposing data to external consumers. aiSSEMBLE supports this through generated services
and records.

== What Gets Generated
Data access is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default] for projects that include at least
one record. When enabled, aiSSEMBLE generates a https://graphql.org/learn/[GraphQL,role=external,window=_blank] query
service with endpoints for retrieving records from ingested datasets.

|===
|Generated file | Description

|`<project>/<project>-pipelines/<project>-data-access/pom.xml`
|Creates the Maven module that builds the generated query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/resources/application.properties`
|https://quarkus.io/guides/config[Quarkus,role=external,window=_blank] configuration of the query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/java/com/test/DataAccessResource.java`
|GraphQL resource that exposes the /graphql REST endpoint for data access requests.
|===

=== GraphQL API
GraphQL queries are generated based on the record metamodel(s) in `<project>/<project>-pipeline-models/src/main/resources/records/`.
By default, two queries are generated for each record metamodel: one for retrieving all the results from a table, and
one for retrieving a limited number of results from a table. The methods that implement these queries can be found in
`<project>/<project>-pipelines/<project>-data-access/src/generated/java/<user-defined-package>/DataAccessResourceBase.java`.
These methods can be overridden, or new queries can be added by modifying `<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`.
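
If you need a query beyond the generated ones, a rough sketch of adding one to `DataAccessResource.java` is shown
below. It assumes the resource uses MicroProfile GraphQL annotations (the standard GraphQL mechanism in Quarkus); the
`TaxPayer` record, the `TaxPayerById` query name, and the `findTaxPayers` helper are hypothetical and would need to map
onto the retrieval logic available in your generated base class.

[source,java]
----
import java.util.List;

import org.eclipse.microprofile.graphql.GraphQLApi;
import org.eclipse.microprofile.graphql.Name;
import org.eclipse.microprofile.graphql.Query;

@GraphQLApi
public class DataAccessResource extends DataAccessResourceBase {

    /** Hypothetical query that narrows the results to a single taxpayer id. */
    @Query("TaxPayerById")
    public List<TaxPayer> taxPayerById(@Name("table") String table, @Name("id") String id) {
        // findTaxPayers is a stand-in for however your generated base class retrieves records
        return findTaxPayers(table).stream()
                .filter(record -> id.equals(record.getId()))
                .toList();
    }
}
----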


.GraphQL query to pull records from a given table:
[source,graphql]
----
query auditList {
TaxPayer(table: delinquent_tax_payers)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|auditList
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is simply a name of your choosing.

|TaxPayer
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|delinquent_tax_payers
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the
name you specified in your step implementation.

|id (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

.GraphQL query to pull records from a given table with a limit:
[source,graphql]
----
query auditList {
TaxPayerLimited(table: delinquent_tax_payers, limit: 10)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|auditList
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is simply a name of your choosing.

|TaxPayerLimited
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|delinquent_tax_payers
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the name
you specified in your step implementation.

|limit (int)
|Argument
|Limit on how many records to return from the query.

|id (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

To invoke a GraphQL query, send it to the service's REST endpoint via an HTTP POST; see the endpoint description and sample client request below.

=== POST /graphql
.Returns the records for the given GraphQL query.
[%collapsible]
====
// .POST/graphql
****
// Returns the records for the given GraphQL query.
*Parameters*
|===
|*Name* | *Description*
|query
|https://graphql.org/learn/queries/[GraphQL query,role=external,window=_blank] executed to retrieve the data.
|===
*Return*
[cols="1,1"]
|===
|{record-name} records.
|List of records. The record will be based on your record metamodel.
|===
.Sample data input:
[source,JSON]
----
{
"query": "{ ExampleDataLimited(table: \" example_table \", limit: 10) { id } }"
}
----
.Sample data output:
[source,JSON]
----
{
"data": {
"ExampleData": []
}
}
----
****
====
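
As a usage example, the endpoint can be called with any HTTP client. The sketch below uses the JDK's built-in
`java.net.http.HttpClient` and assumes the data access service is reachable at `localhost:8080` (the Quarkus default);
adjust the host and port for your deployment.

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataAccessClient {
    public static void main(String[] args) throws Exception {
        // The GraphQL query is wrapped in a JSON document under the "query" key
        String body = "{ \"query\": \"{ ExampleDataLimited(table: \\\"example_table\\\", limit: 10) { id } }\" }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/graphql")) // assumed host/port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
----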

=== Deployment Artifacts
Once a data access record has been defined, aiSSEMBLE will also generate deployment artifacts like Docker images,
Kubernetes manifests, and Tilt configurations. For more information, see the xref:containers.adoc#_containers[Containers] page.
Loading
