
#5 📝 Tranche 4 of documentation migration
d-ryan-ashcraft committed May 1, 2024
1 parent 8828ab4 commit 14854ae
Showing 7 changed files with 923 additions and 0 deletions.
122 changes: 122 additions & 0 deletions docs/modules/ROOT/pages/alerting-details.adoc
@@ -0,0 +1,122 @@
= Alerting

== Overview

The purpose of alerting is to bring attention to significant events and issues that arise during execution of a pipeline
by sending messages via email, Slack, etc. To simplify the incorporation of alerting, pre-constructed patterns have been
developed and can be included in a https://github.com/boozallen/aissemble[Solution Baseline,role=external,window=_blank]
project. This means only a few steps are necessary to incorporate the generated alerting code. This page explains the
components that are generated when alerting is enabled and where to modify and customize them to suit a specific
implementation.

== What Gets Generated
Alerting is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default]
for projects that have a pre-fab data delivery pipeline.

[WARNING]
Alerting is currently only available for Spark Data Delivery Pipelines and will be available for PySpark Data Delivery
and Machine Learning Pipelines in a future version.

=== Default Method for Sending Alerts
When alerting is enabled, a few methods (outlined below) are generated in the base class of each step. These methods are
called automatically when a step completes (whether successfully or with an exception) to send an alert. All of these
methods have default logic but can be customized by overriding them in the step implementation class, as illustrated in
the sketch after the method listings below.

.sendAlert
[source]
----
protected void sendAlert(Alert.Status status, String message)
Send an alert with a given status and message.
Override this method to customize how messages are sent to the alerting framework.
Parameters:
status – the status of the alert
message – the message
----

.getSuccessMessage
[source]
----
protected String getSuccessMessage(Map<String, String> params)
Returns the message sent via alerting when the step completes successfully. Override this method to provide your own success message.
Parameters:
params – map of parameters for the success message including the execution duration under the key timeToComplete.
Returns:
Success message with the action and the time to complete.
----

.getErrorMessage
[source]
----
protected String getErrorMessage(Exception e)
Returns the message sent via alerting when the step throws an exception. Override this method to provide your own error message.
Parameters:
e – The exception that caused the step to fail.
Returns:
The detailed error message.
----
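
For example, a step implementation class can override these hooks to tailor the alert content. The following is a
minimal sketch only; the step class and its generated base class (`IngestCustomerData` / `IngestCustomerDataBase`) are
hypothetical names standing in for the classes generated from your pipeline metamodel.

[source,java]
----
import java.util.Map;

public class IngestCustomerData extends IngestCustomerDataBase {

    @Override
    protected String getSuccessMessage(Map<String, String> params) {
        // The generated base class supplies the execution duration under the key "timeToComplete"
        return "Customer ingest finished in " + params.get("timeToComplete");
    }

    @Override
    protected String getErrorMessage(Exception e) {
        // Include the failure cause in the alert sent when the step throws an exception
        return "Customer ingest failed: " + e.getMessage();
    }
}
----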

== Configuring Your Alerting Service
The Solution Baseline provides several integration options for alerting purposes.

=== Alerting with Slack
The default alerting implementation is Slack. To use Slack Alerting, follow the steps below:

. Add the aiSSEMBLE Slack alerting dependency `extensions-alerting-slack` to the pipeline POM:
[source,xml]
----
<dependencies>
...
<dependency>
<groupId>com.boozallen.aissemble</groupId>
<artifactId>extensions-alerting-slack</artifactId>
</dependency>
...
</dependencies>
----

[start=2]
. Add the `SlackConsumer` bean to the pipeline within the `PipelinesCdiContext.java` file:

[source,java]
----
public List<Class<?>> getCdiClasses() {
    // Add any custom CDI classes here
    ...
    customBeans.add(SlackConsumer.class);
    return customBeans;
}
----

[start=3]
. Create the `slack-integration.properties` file at the following path:
`<project>-docker/<project>-spark-worker-docker/src/main/resources/krausening/base/slack-integration.properties`
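
The contents of this file provide the Slack connection details (for example, a bot token and target channel). The
snippet below is an illustrative placeholder only; the authoritative property names are defined by the
`extensions-alerting-slack` module, so confirm them against that module before use.

.slack-integration.properties (illustrative placeholders only)
[source]
----
# Placeholder keys for illustration only; use the property names defined by
# the extensions-alerting-slack module.
slack.bot.token=xoxb-your-bot-token
slack.channel=#pipeline-alerts
----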

=== Kafka Integration
The default alerting implementation can be extended to publish the alerts to an Apache Kafka topic. Adding a
`microprofile-config.properties` file with the following configurations will enable the Kafka integration for the
default Alert Producer:

.<spark-data-delivery-pipeline>/src/main/resources/META-INF/microprofile-config.properties
[source]
----
kafka.bootstrap.servers=kafka-cluster:9093 <1>
mp.messaging.outgoing.alerts.connector=smallrye-kafka
mp.messaging.outgoing.alerts.topic=kafka-alert-topic-name <2>
mp.messaging.outgoing.alerts.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.alerts.value.serializer=org.apache.kafka.common.serialization.StringSerializer
----
<1> The hostname and port of the Kafka server to connect to.
<2> The name of the Kafka topic to publish the alerts to.

Please see the https://smallrye.io/smallrye-reactive-messaging/latest/kafka/kafka[SmallRye documentation,role=external,window=_blank]
on the Kafka connector for more configuration details.
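
Once alerts are being published, any standard Kafka consumer can subscribe to the topic to monitor them. The sketch
below uses the plain Apache Kafka client (`kafka-clients`) and reuses the broker address and topic name from the
configuration above; the consumer group name is an arbitrary example.

[source,java]
----
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AlertTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-cluster:9093");
        props.put("group.id", "alert-monitor");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafka-alert-topic-name"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record value is an alert message published by the pipeline
                    System.out.println(record.value());
                }
            }
        }
    }
}
----
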
10 changes: 10 additions & 0 deletions docs/modules/ROOT/pages/bias-detection.adoc
@@ -0,0 +1,10 @@
[#_bias_detection]
= Bias Detection
Bias detection, also known as Ethical Artificial Intelligence (AI), is concerned with determining if an AI model
systematically produces inaccurate results due to flawed assumptions. One contributing factor to model bias is the data
it learns from. By driving bias detection from a semantic data model, consistent bias detection policies are applied
to the related field(s) throughout your data.

To implement bias detection within your project, please contact the https://stackoverflowteams.com/c/boozallensolutioncenter/questions[aiSSEMBLE team]
for integration and implementation guidance.
60 changes: 60 additions & 0 deletions docs/modules/ROOT/pages/ci-cd.adoc
@@ -0,0 +1,60 @@
= Deploying the Project
:source-highlighter: rouge

AI/ML projects are generally built using scripts or notebooks, which are well suited for prototyping and simple
implementations but lack Software Development Lifecycle (SDLC) best practices such as unit/integration testing, peer reviews, and a
consistent build process. aiSSEMBLE provides a structured approach for designing, developing, deploying, and monitoring
AI/ML solutions to standardize delivery and drive consistency and reliability. A key component of this approach is
automating the building, testing, and deployment of software through Continuous Integration and Continuous Delivery
(CI/CD). The following outlines the deployment and delivery approach in aiSSEMBLE.

== Deployment Artifacts
aiSSEMBLE makes your project portable, scalable, and platform-agnostic by using Docker to create “images” which are
blueprints for containers. https://docs.docker.com/build/[Docker,role=external,window=_blank] is a software platform
designed to help developers build, share, and run modern applications. Docker is used in aiSSEMBLE to create portable
software components packaged up for deployment in a containerized environment.

Container orchestration is important for automating deployments. https://kubernetes.io/docs/home/[Kubernetes,role=external,window=_blank],
also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized
applications. aiSSEMBLE generates Kubernetes artifacts to ease the management and scalability of your project.

Helm is used in aiSSEMBLE as the package management tool and template engine for Kubernetes. https://helm.sh/docs/[Helm,role=external,window=_blank]
is a tool that streamlines installing and managing Kubernetes applications. Think of it like apt/yum/homebrew for
Kubernetes. Helm packages and deploys aiSSEMBLE’s Kubernetes applications while also providing templating services that
allow for easy modifications.

== Deployment Infrastructure

=== Local Deployment
aiSSEMBLE’s framework enables rapid development and testing by ensuring local build and deployment processes are fast,
alleviating the need for ad-hoc scripts and notebooks. To achieve this, your project needs the ability to be deployed in
an environment where it can be easily stood up and torn down locally. In doing so, you ensure when you deploy your
project to a higher environment, all the pieces work together cohesively, similar to how they would in production. The
two components you need to get to this state are a local Kubernetes environment and a local deployment tool for
Kubernetes.

The aiSSEMBLE team promotes the usage of https://docs.rancherdesktop.io/[Rancher Desktop,role=external,window=_blank]
for the local Kubernetes environment and management tool. Rancher Desktop is a lightweight, user-friendly tool that
comes packaged with critical tools such as Helm, Docker, and Kubernetes. By deploying to a real Kubernetes environment,
Rancher Desktop allows you to test integration points between the key components of your project.

To ease testing in your local Kubernetes environment, you need a simple tool that can deploy your entire project
quickly. The aiSSEMBLE team encourages the usage of https://docs.tilt.dev/[Tilt,role=external,window=_blank]
as your local deployment tool for Kubernetes. By default, aiSSEMBLE will generate Tilt deployment files to get you
started. Tilt can deploy your project (in its entirety or partially) with a single command and provides a user-friendly
interface to monitor your container activity and logs. In addition, Tilt keeps the deployment up to date with the latest
code changes with very little downtime.

=== Remote Deployment
Including continuous integration (CI) is a best practice for unit/integration testing and consistent builds. By default,
aiSSEMBLE will include starter Jenkins CI pipelines for building, testing, packaging, and deploying your project.
Jenkins is an open-source DevOps automation tool commonly used for CI.

aiSSEMBLE enables standardized delivery and monitoring to drive consistency and reliability. ArgoCD is a tool which
deploys and continuously monitors running applications and compares the current, live state against the desired target
state. aiSSEMBLE promotes ArgoCD’s app of apps pattern in the Helm charts generated for your project.


== Related Pages

- xref:guides/guides-spark-job.adoc[]
149 changes: 149 additions & 0 deletions docs/modules/ROOT/pages/data-access-details.adoc
@@ -0,0 +1,149 @@
= Data Access

== Overview
Data access is the process of exposing data to external consumers. aiSSEMBLE supports this through generated services
and records.

== What Gets Generated
Data access is xref:pipeline-metamodel.adoc#_pipeline_metamodel[enabled by default] for projects that include at least
one record. When enabled, aiSSEMBLE generates a https://graphql.org/learn/[GraphQL,role=external,window=_blank] query
service with endpoints for retrieving records from ingested datasets.

|===
|Generated file | Description

|`<project>/<project>-pipelines/<project>-data-access/pom.xml`
|Creates the Maven module that builds the generated query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/resources/application.properties`
|https://quarkus.io/guides/config[Quarkus,role=external,window=_blank] configuration of the query service.

|`<project>/<project>-pipelines/<project>-data-access/src/main/java/com/test/DataAccessResource.java`
|GraphQL resource that exposes the /graphql REST endpoint for data access requests.
|===

=== GraphQL API
GraphQL queries are generated based on the record metamodel(s) in `<project>/<project>-pipeline-models/src/main/resources/records/`.
By default, two queries are generated for each record metamodel: one for retrieving all the results from a table, and
one for retrieving a limited number of results from a table. The methods that implement these queries can be found in
`<project>/<project>-pipelines/<project>-data-access/src/generated/java/<user-defined-package>/DataAccessResourceBase.java`.
These methods can be overridden, or new queries can be added by modifying `<project>/<project>-pipelines/<project>-data-access/src/main/java/<user-defined-package>/DataAccessResource.java`.
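
If you need a query beyond the generated ones, a rough sketch of adding one to `DataAccessResource.java` is shown
below. It assumes the resource uses MicroProfile GraphQL annotations (the standard GraphQL mechanism in Quarkus); the
`TaxPayer` record, the `TaxPayerById` query name, and the `findTaxPayers` helper are hypothetical and would need to map
onto the retrieval logic available in your generated base class.

[source,java]
----
import java.util.List;

import org.eclipse.microprofile.graphql.GraphQLApi;
import org.eclipse.microprofile.graphql.Name;
import org.eclipse.microprofile.graphql.Query;

@GraphQLApi
public class DataAccessResource extends DataAccessResourceBase {

    /** Hypothetical query that narrows the results to a single taxpayer id. */
    @Query("TaxPayerById")
    public List<TaxPayer> taxPayerById(@Name("table") String table, @Name("id") String id) {
        // findTaxPayers is a stand-in for however your generated base class retrieves records
        return findTaxPayers(table).stream()
                .filter(record -> id.equals(record.getId()))
                .toList();
    }
}
----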


.GraphQL query to pull records from a given table:
[source,graphql]
----
query auditList {
TaxPayer(table: delinquent_tax_payers)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|auditList
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is simply a name of your choosing.

|TaxPayer
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|delinquent_tax_payers
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the
name you specified in your step implementation.

|id (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

.GraphQL query to pull records from a given table with a limit:
[source,graphql]
----
query auditList {
TaxPayerLimited(table: delinquent_tax_payers, limit: 10)
{
id
}
}
----

|===
|Element | Element Type | Element Description

|auditList
|Operation name
|Name of the query. The name assigned to this operation has no correlation to the pipeline or metamodel; it is simply a name of your choosing.

|TaxPayerLimited
|Query object
|The type of record that you are pulling from the data store. This name is derived from your record metamodel.

|delinquent_tax_payers
|Argument
|Name of the table being queried. In the execution of the data pipeline, your records are stored in a table with the name
you specified in your step implementation.

|limit (int)
|Argument
|Limit on how many records to return from the query.

|id (String)
|Variable
|Field from the record type being returned. The available fields correspond with the fields within your record metamodel.
|===

To invoke a GraphQL query, send it to the service's REST endpoint via an HTTP POST; see the endpoint description and sample client request below.

=== POST /graphql
.Returns the records for the given GraphQL query.
[%collapsible]
====
// .POST/graphql
****
// Returns the records for the given GraphQL query.
*Parameters*
|===
|*Name* | *Description*
|query
|https://graphql.org/learn/queries/[GraphQL query,role=external,window=_blank] executed to retrieve the data.
|===
*Return*
[cols="1,1"]
|===
|{record-name} records.
|List of records. The record will be based on your record metamodel.
|===
.Sample data input:
[source,JSON]
----
{
"query": "{ ExampleDataLimited(table: \" example_table \", limit: 10) { id } }"
}
----
.Sample data output:
[source,JSON]
----
{
"data": {
"ExampleData": []
}
}
----
****
====
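
As a usage example, the endpoint can be called with any HTTP client. The sketch below uses the JDK's built-in
`java.net.http.HttpClient` and assumes the data access service is reachable at `localhost:8080` (the Quarkus default);
adjust the host and port for your deployment.

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataAccessClient {
    public static void main(String[] args) throws Exception {
        // The GraphQL query is wrapped in a JSON document under the "query" key
        String body = "{ \"query\": \"{ ExampleDataLimited(table: \\\"example_table\\\", limit: 10) { id } }\" }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/graphql")) // assumed host/port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
----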

=== Deployment Artifacts
Once a data access record has been defined, aiSSEMBLE will also generate deployment artifacts like Docker images,
Kubernetes manifests, and Tilt configurations. For more information, see the xref:containers.adoc#_containers[Containers] page.
Loading
