Merge pull request #29 from boozallen/5-migrate-documentation-tranch-4

#5 📝 Tranche 4 of documentation migration

d-ryan-ashcraft authored May 1, 2024
2 parents 8828ab4 + 8bea984 commit ac2dc3c
Showing 9 changed files with 954 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/antora.yml
@@ -1,8 +1,8 @@
 name: aissemble
-title: aiSSEMBLE
 version: 1.7.0
 display_version: 1.7.0-SNAPSHOT
 prerelease: true
+title: aiSSEMBLE™
 nav:
 - modules/ROOT/nav.adoc
 asciidoc:
136 changes: 136 additions & 0 deletions docs/modules/ROOT/pages/alerting-details.adoc
@@ -0,0 +1,136 @@
= Alerting

== Overview

The purpose of alerting is to bring attention to significant events and issues that arise during execution of a pipeline
by sending messages via email, Slack, etc. To simplify the incorporation of alerting, pre-constructed patterns have been
developed and can be included in a https://github.com/boozallen/aissemble[Solution Baseline,role=external,window=_blank]
project, so only a few steps are necessary to incorporate the generated alerting code. This page explains the generated
components that are included when alerting is enabled and identifies where to modify and customize elements to suit a
specific implementation.

== What Gets Generated
Alerting is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default]
for projects that have a pre-fab data delivery pipeline.

[WARNING]
Alerting is currently only available for Spark Data Delivery Pipelines and will be available for PySpark Data Delivery
and Machine Learning Pipelines in a future version.

=== Default Method for Sending Alerts
When alerting is enabled, a few methods (outlined below) are generated in the base class of each step. These methods are
called automatically upon step completion (whether the step succeeds or throws an exception) to send an alert. All of
these methods have default logic, but they can be customized by overriding them in the step implementation class, as
illustrated in the sketch after the method listings below.

****
.sendAlert
[source,java]
----
/**
* Send an alert with a given status and message.
* Override this method to customize how messages are sent to the alerting framework.
*/
protected void sendAlert(Alert.Status status, String message)
----
_Parameters:_

* `status` – the status of the alert
* `message` – the message to send

_Returns:_ None
****

****
.getSuccessMessage
[source,java]
----
/**
* Returns the message sent via alerting when the step completes successfully.
* Override this method to provide your own success message.
*/
protected String getSuccessMessage(Map<String, String> params)
----
_Parameters:_

* `params` – map of parameters for the success message, including the execution duration under the key `timeToComplete`

_Returns:_ Success message with the action and the time to complete
****

****
.getErrorMessage
[source,java]
----
/**
* Returns the message sent via alerting when the step throws an exception. Override this method to provide your own
* error message.
*/
protected String getErrorMessage(Exception e)
----
_Parameters:_

* `e` – the exception that caused the step to fail

_Returns:_ The detailed error message
****
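
As an illustration, the sketch below overrides two of these methods in a hypothetical step implementation class
(`IngestStep` and its generated base class `IngestStepBase` are assumed names; only the method signatures above come
from the generated code):

[source,java]
----
import java.util.Map;

public class IngestStep extends IngestStepBase { // "IngestStepBase" is an assumed generated base class name

    @Override
    protected String getSuccessMessage(Map<String, String> params) {
        // params carries the execution duration under the key "timeToComplete"
        return "Ingest finished in " + params.get("timeToComplete");
    }

    @Override
    protected void sendAlert(Alert.Status status, String message) {
        // Add custom routing or filtering here before delegating to the default behavior
        super.sendAlert(status, message);
    }
}
----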

== Configuring Your Alerting Service
The Solution Baseline provides several integration options for alerting purposes.

=== Alerting with Slack
The default alerting implementation is Slack. To use Slack Alerting, follow the steps below:

. Add the aiSSEMBLE(TM) Slack alerting dependency `extensions-alerting-slack` to the pipeline POM:
[source,xml]
----
<dependencies>
...
<dependency>
<groupId>com.boozallen.aissemble</groupId>
<artifactId>extensions-alerting-slack</artifactId>
</dependency>
...
</dependencies>
----

[start=2]
. Add the `SlackConsumer` bean to the pipeline within the `PipelinesCdiContext.java` file:

[source,java]
----
public List<Class<?>> getCdiClasses() {
    // Add any custom CDI classes here
    ...
    customBeans.add(SlackConsumer.class);
    return customBeans;
}
----

[start=3]
. Create the `slack-integration.properties` file at the following path:
`<project>-docker/<project>-spark-worker-docker/src/main/resources/krausening/base/slack-integration.properties`
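
The exact property keys are defined by the `SlackConsumer` implementation, so consult its configuration for the
authoritative list. As a loose sketch only (both property names and values below are illustrative assumptions, not
confirmed keys):

[source,properties]
----
# Illustrative placeholders only -- the real keys are defined by SlackConsumer
slack.webhook.url=https://hooks.slack.com/services/<your-webhook-path>
slack.channel=#alerts
----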

=== Messaging Integration
The default alerting implementation can be extended to publish the alerts to a Messaging topic. Adding a
`microprofile-config.properties` file with the following configurations will enable the Messaging integration for the
default Alert Producer:

.<spark-data-delivery-pipeline>/src/main/resources/META-INF/microprofile-config.properties
[source]
----
kafka.bootstrap.servers=kafka-cluster:9093 <1>
mp.messaging.outgoing.alerts.connector=smallrye-kafka
mp.messaging.outgoing.alerts.topic=kafka-alert-topic-name <2>
mp.messaging.outgoing.alerts.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.alerts.value.serializer=org.apache.kafka.common.serialization.StringSerializer
----
<1> The hostname and port of the Messaging server to connect to.
<2> The name of the Messaging topic to publish the alerts to.

Please see the https://smallrye.io/smallrye-reactive-messaging/latest/kafka/kafka[SmallRye
documentation,role=external,window=_blank] on the Kafka connector for more configuration details.
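
Once the integration is enabled, any standard Kafka consumer can read the published alerts. The sketch below is not
generated by aiSSEMBLE; it is a minimal plain-Kafka-client example of a downstream reader, with the topic and bootstrap
server values taken from the configuration above:

[source,java]
----
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AlertTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-cluster:9093");
        props.put("group.id", "alert-reader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafka-alert-topic-name"));
            while (true) {
                // Poll for new alert messages and log them
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Alert received: " + record.value());
                }
            }
        }
    }
}
----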
9 changes: 9 additions & 0 deletions docs/modules/ROOT/pages/bias-detection.adoc
@@ -0,0 +1,9 @@
[#_bias_detection]
= Bias Detection
Bias detection, also known as Ethical Artificial Intelligence (AI), is concerned with determining whether an AI model
systematically produces inaccurate results due to flawed assumptions. One contributing factor to model bias is the data
it learns from. By driving bias detection from a semantic data model, consistent bias detection policies are applied to
the related field(s) throughout the data.

Bias detection can be easily plugged into aiSSEMBLE. Please contact the team for more information.
60 changes: 60 additions & 0 deletions docs/modules/ROOT/pages/ci-cd.adoc
@@ -0,0 +1,60 @@
= Deploying the Project
:source-highlighter: rouge

AI/ML projects are generally built using scripts or notebooks, well suited for prototyping and simple implementations
but lacking Software Development Lifecycle (SDLC) best practices such as unit/integration testing, peer reviews, and a
consistent build process. aiSSEMBLE provides a structured approach for designing, developing, deploying, and monitoring
AI/ML solutions to standardize delivery and drive consistency and reliability. A key component of this approach is
automating the building, testing, and deployment of software through Continuous Integration and Continuous Delivery
(CI/CD). The following outlines the deployment and delivery approach in aiSSEMBLE.

== Deployment Artifacts
aiSSEMBLE makes your project portable, scalable, and platform-agnostic by using Docker to create “images” which are
blueprints for containers. https://docs.docker.com/build/[Docker,role=external,window=_blank] is a software platform
designed to help developers build, share, and run modern applications. Docker is used in aiSSEMBLE to create portable
software components packaged up for deployment in a containerized environment.

Container orchestration is important for automating deployments. https://kubernetes.io/docs/home/[Kubernetes,role=external,window=_blank],
also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized
applications. aiSSEMBLE generates Kubernetes artifacts to ease the management and scalability of your project.

Helm is used in aiSSEMBLE as the package management tool and template engine for Kubernetes. https://helm.sh/docs/[Helm,role=external,window=_blank]
is a tool that streamlines installing and managing Kubernetes applications. Think of it like apt/yum/homebrew for
Kubernetes. Helm packages and deploys aiSSEMBLE’s Kubernetes applications while also providing templating services that
allow for easy modifications.

== Deployment Infrastructure

=== Local Deployment
aiSSEMBLE’s framework enables rapid development and testing by ensuring local build and deployment processes are fast,
alleviating the need for ad-hoc scripts and notebooks. To achieve this, your project needs the ability to be deployed in
an environment where it can be easily stood up and torn down locally. In doing so, you ensure when you deploy your
project to a higher environment, all the pieces work together cohesively, similar to how they would in production. The
two components you need to get to this state are a local Kubernetes environment and a local deployment tool for
Kubernetes.

The aiSSEMBLE team promotes the usage of https://docs.rancherdesktop.io/[Rancher Desktop,role=external,window=_blank]
for the local Kubernetes environment and management tool. Rancher Desktop is a light-weight, user-friendly tool which
comes packaged with critical tools such as Helm, Docker and Kubernetes. By deploying to a real Kubernetes environment,
Rancher Desktop allows you to test integration points between the key components of your project.

In order to ease testing in your local Kubernetes environment, there is a need for a simple Continuous Deployment (CD)
tool that can deploy your entire project quickly. The aiSSEMBLE team encourages the usage of https://docs.tilt.dev/[Tilt,role=external,window=_blank]
as your local CD for Kubernetes. By default, aiSSEMBLE will generate Tilt deployment files to get you started. Tilt can
deploy your project (in its entirety or partially) with a single command and provides a user-friendly interface to
monitor your container activity and logs. In addition, Tilt keeps the deployment up to date with the latest code changes
with very little downtime.
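
For example, from the project root (where the generated Tilt configuration lives), the entire stack can be deployed with
a single command. A minimal sketch, assuming Tilt and a local Kubernetes cluster are already installed:

[source,shell]
----
tilt up    # deploys the project and starts the Tilt UI for monitoring container activity and logs
tilt down  # tears the local deployment back down
----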

=== Remote Deployment
Including continuous integration (CI) is a best practice for unit/integration testing and consistent builds. By default,
aiSSEMBLE will include starter Jenkins CI pipelines for building, testing, packaging, and deploying your project.
Jenkins is an open-source DevOps automation tool commonly used for CI.
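
The generated pipelines are tailored to each project, so as a shape reference only, a declarative Jenkins pipeline for a
Maven-based build looks roughly like the sketch below (stage names and commands are illustrative assumptions, not the
generated pipeline):

[source,groovy]
----
pipeline {
    agent any
    stages {
        stage('Build & Test') {
            steps {
                // Compile and run unit/integration tests
                sh 'mvn clean verify'
            }
        }
        stage('Package & Deploy') {
            steps {
                // Build and publish the project's deployment artifacts
                sh 'mvn deploy -DskipTests'
            }
        }
    }
}
----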

aiSSEMBLE enables standardized delivery and monitoring to drive consistency and reliability. ArgoCD is a CD tool which
deploys and continuously monitors running applications and compares the current, live state against the desired target
state. aiSSEMBLE promotes ArgoCD’s "app of apps" pattern in the Helm charts generated for your project.


== Related Pages

- xref:guides/guides-spark-job.adoc[]
151 changes: 151 additions & 0 deletions docs/modules/ROOT/pages/data-access-details.adoc
@@ -0,0 +1,151 @@
= Data Access

== Overview
Data access is the process of exposing data to external consumers. aiSSEMBLE(TM) supports this through generated
services and records.

== What Gets Generated
Data access is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default] for projects that include at least
one record. When enabled, aiSSEMBLE generates a https://graphql.org/learn/[GraphQL,role=external,window=_blank] query
service with endpoints for retrieving records from ingested datasets.

|===
|Generated file | Description

|`<project>/<project>-pipelines/<project>-data-access/pom.xml`
|Creates the Maven module that builds the generated query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/resources/application.properties`
|https://quarkus.io/guides/config[Quarkus,role=external,window=_blank] configuration of the query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`
|GraphQL resource that exposes the /graphql REST endpoint for data access requests.
|===

=== GraphQL API
GraphQL queries are generated based on the record metamodel(s) in `<project>/<project>-pipeline-models/src/main/resources/records/`.
By default, two queries are generated for each record metamodel: one for retrieving all the results from a table, and
one for retrieving a limited number of results from a table. The methods that implement these queries can be found in
`<project>/<project>-pipelines/<project>-data-access/src/generated/java/<user-defined-package>/DataAccessResourceBase.java`.
These methods can be overridden, or new queries can be added by modifying
`<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`.
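
For instance, a new query can be registered in `DataAccessResource.java`. The sketch below assumes the service uses
MicroProfile GraphQL annotations (as in Quarkus SmallRye GraphQL); the `getTaxPayer` accessor and the `TaxPayer` record
getters are illustrative assumptions about the generated code:

[source,java]
----
import java.util.List;
import java.util.stream.Collectors;

import org.eclipse.microprofile.graphql.GraphQLApi;
import org.eclipse.microprofile.graphql.Name;
import org.eclipse.microprofile.graphql.Query;

@GraphQLApi
public class DataAccessResource extends DataAccessResourceBase {

    /** Custom query that narrows the generated "retrieve all" results to a single id. */
    @Query("TaxPayerById")
    public List<TaxPayer> taxPayerById(@Name("table") String table, @Name("id") String id) {
        return getTaxPayer(table).stream()
                .filter(taxPayer -> id.equals(taxPayer.getId()))
                .collect(Collectors.toList());
    }
}
----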


.GraphQL query to pull records from a given table:
[source,graphql]
----
query auditList {
TaxPayer(table: delinquent_tax_payers)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|`auditList`
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is
simply of your choosing.

|`TaxPayer`
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|`delinquent_tax_payers`
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the
name you specified in your step implementation.

|`id` (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

.GraphQL query to pull records from a given table with a limit:
[source,graphql]
----
query auditList {
TaxPayerLimited(table: delinquent_tax_payers, limit: 10)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|`auditList`
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is
simply of your choosing.

|`TaxPayerLimited`
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|`delinquent_tax_payers`
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the name
you specified in your step implementation.

|`limit` (int)
|Argument
|Limit on how many records are returned from the query.

|`id` (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

GraphQL queries are invoked via a REST API call to the service, as described below.

=== POST /graphql
.Returns the records for the given GraphQL query.
[%collapsible]
====
*Parameters*
|===
|*Name* | *Description*
|`query`
|https://graphql.org/learn/queries/[GraphQL query,role=external,window=_blank] executed to retrieve the data.
|===
*Return*
[cols="1,1"]
|===
|`{record-name}` records.
|List of records. The record will be based on your record metamodel.
|===
.Sample data input:
[source,JSON]
----
{
  "query": "{ ExampleDataLimited(table: \"example_table\", limit: 10) { id } }"
}
----
.Sample data output:
[source,JSON]
----
{
  "data": {
    "ExampleDataLimited": []
  }
}
----
====
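
For reference, the query can be posted from any HTTP client. A minimal Java sketch (the service address
`http://localhost:8080/graphql` is an assumption; adjust the host and port to your deployment):

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQlQueryExample {
    public static void main(String[] args) throws Exception {
        // Same query as the sample input above, wrapped in the JSON body expected by POST /graphql
        String body = "{\"query\": \"{ ExampleDataLimited(table: \\\"example_table\\\", limit: 10) { id } }\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/graphql")) // assumed service address
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"data":{"ExampleDataLimited":[]}}
    }
}
----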

=== Deployment Artifacts
Once a data access record has been defined, aiSSEMBLE will also generate deployment artifacts like Docker images,
Kubernetes manifests, and Tilt configurations. For more information, see the
xref:containers.adoc#_containers[Containers] page.