[DMP 2024] Generate a project.yaml file from a list of steps #619

Open
christad92 opened this issue Mar 4, 2024 · 14 comments
Labels
DMP 2024 Submission for DMP

Comments

@christad92
christad92 commented Mar 4, 2024

Overview

We are looking to integrate an AI agent that can generate a workflow for a user based on instructions specifying the required steps and adaptors.

This feature will enable new users of OpenFn (Lightning) to get started faster after registration by entering the steps they need.

The agent will automatically build a workflow that can run without errors and is ready for editing.

Deliverables

  • A new service in the apollo repo called gen_project
  • The service should take a series of steps as an input, and return a workflow definition as an output
  • The workflow can optionally be defined as a project.yaml or workflow.json file
  • A demo script which allows the project to be imported to lightning.

To be clear, this work does NOT include any integration on the Lightning side.

Extensions to this work include:

  • Generating job expressions to implement the workflow (this likely leverages existing apollo functionality)

Sample Inputs

Below are examples of prompts and their corresponding workflow.yaml for reference and testing.
Prompt 1

  • Get Data from DHIS2
  • Filter out children under 2
  • Aggregate the data
  • Make a comment on Asana
Workflow-1:
  name: Simple workflow
  jobs:
    Get-data-from-DHIS2:
      name: Get data from DHIS2
      adaptor: '@openfn/language-dhis2@latest'
      # credential:
      # globals:
      body: |

    Filter-out-children-under-2:
      name: Filter out children under 2
      adaptor: '@openfn/language-common@latest'
      # credential:
      # globals:
      body: |

    Aggregate-data-based-on-gender:
      name: Aggregate data based on gender
      adaptor: '@openfn/language-common@latest'
      # credential:
      # globals:
      body: |

    make-a-comment-on-Asana:
      name: make a comment on Asana
      adaptor: '@openfn/language-asana@latest'
      # credential:
      # globals:
      body: |

  triggers:
    webhook:
      type: webhook
      enabled: true
  edges:
    webhook->Get-data-from-DHIS2:
      source_trigger: webhook
      target_job: Get-data-from-DHIS2
      condition_type: always
      enabled: true
    Get-data-from-DHIS2->Filter-out-children-under-2:
      source_job: Get-data-from-DHIS2
      target_job: Filter-out-children-under-2
      condition_type: on_job_success
      enabled: true
    Filter-out-children-under-2->Aggregate-data-based-on-gender:
      source_job: Filter-out-children-under-2
      target_job: Aggregate-data-based-on-gender
      condition_type: on_job_success
      enabled: true
    Aggregate-data-based-on-gender->make-a-comment-on-Asana:
      source_job: Aggregate-data-based-on-gender
      target_job: make-a-comment-on-Asana
      condition_type: on_job_success
      enabled: true
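The YAML above follows a regular pattern: each step becomes a job keyed by a hyphenated version of its name, a webhook trigger starts the first job with condition_type: always, and each later job is chained to the previous one with on_job_success. A minimal Python sketch of that mapping, assuming a strictly linear workflow. build_workflow and slugify are hypothetical helper names (not part of apollo), and every job defaults to @openfn/language-common@latest because adaptor selection is left out here.

```python
import re


def slugify(name: str) -> str:
    """Turn a step name into a job key, e.g. 'Get data from DHIS2' -> 'Get-data-from-DHIS2'."""
    # Drop punctuation other than hyphens, then join the remaining words with '-'
    cleaned = re.sub(r"[^\w\s-]", "", name)
    return "-".join(cleaned.split())


def build_workflow(name: str, steps: list[str],
                   adaptor: str = "@openfn/language-common@latest") -> dict:
    """Build a linear workflow dict shaped like the sample YAML (hypothetical helper)."""
    # One job per step; duplicate step names would overwrite each other here
    jobs = {slugify(step): {"name": step, "adaptor": adaptor, "body": ""} for step in steps}

    keys = list(jobs)
    edges = {}
    if keys:
        # The webhook trigger always starts the first job
        edges[f"webhook->{keys[0]}"] = {
            "source_trigger": "webhook",
            "target_job": keys[0],
            "condition_type": "always",
            "enabled": True,
        }
    # Each later job runs only if the previous job succeeded
    for prev, nxt in zip(keys, keys[1:]):
        edges[f"{prev}->{nxt}"] = {
            "source_job": prev,
            "target_job": nxt,
            "condition_type": "on_job_success",
            "enabled": True,
        }

    return {
        "name": name,
        "jobs": jobs,
        "triggers": {"webhook": {"type": "webhook", "enabled": True}},
        "edges": edges,
    }
```

For example, build_workflow("Simple workflow", ["Get data from DHIS2", "Filter out children under 2"]) yields the job keys Get-data-from-DHIS2 and Filter-out-children-under-2 plus two edges, matching the keys used in Prompt 1's YAML. Two steps with the same name would collide on one key, so a real generator would need to de-duplicate them.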

Prompt 2

  • Fetch submissions from KoboCollect with language-kobotoolbox@latest
  • Push the data to a PostgreSQL database with language-postgresql@latest
  • Send a text message to an admin using [email protected] with the status of the sent message
workflow-1:
    name: another simple workflow
    jobs:
      Fetch-submissions-from-KoboCollect:
        name: Fetch submissions from KoboCollect
        adaptor: '@openfn/language-kobotoolbox@latest'
        # credential:
        # globals:
        body: |
          // Get started by adding operations from the API reference
          
      Push-Data-to-PostgreSQL:
        name: Push Data to PostgreSQL
        adaptor: '@openfn/language-postgresql@latest'
        # credential:
        # globals:
        body: |
          // Get started by adding operations from the API reference
          
      Send-a-text-message-to-admin:
        name: Send a text message to admin
        adaptor: '@openfn/[email protected]'
        # credential:
        # globals:
        body: |
          // Get started by adding operations from the API reference
          
    triggers:
      webhook:
        type: webhook
        enabled: true
    edges:
      webhook->Fetch-submissions-from-KoboCollect:
        source_trigger: webhook
        target_job: Fetch-submissions-from-KoboCollect
        condition_type: always
        enabled: true
      Fetch-submissions-from-KoboCollect->Push-Data-to-PostgreSQL:
        source_job: Fetch-submissions-from-KoboCollect
        target_job: Push-Data-to-PostgreSQL
        condition_type: on_job_success
        enabled: true
      Push-Data-to-PostgreSQL->Send-a-text-message-to-admin:
        source_job: Push-Data-to-PostgreSQL
        target_job: Send-a-text-message-to-admin
        condition_type: on_job_success
        enabled: true

Demo script

To validate the work, we would like a demo script which lets us easily test various inputs and see them visualised in Lightning.

The demo should be executable from a terminal. It could be a Python or bash script. It should not be integrated with the OpenFn CLI.

The demo should do the following:

  • Read in a list of steps from a file, where each line of the file is one step
  • Call out to the apollo service (locally, by default, but can take a URL) to generate a project.yaml
  • Generate a Lightning project structure on the file system
  • Deploy the project to demo.openfn.org
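A minimal sketch of the first three demo steps, using only the Python standard library. The endpoint http://localhost:3000/services/gen_project is a hypothetical default (pass --url to override), the reply is assumed to follow the files structure described in the Suggested API section of this issue, and deploying to demo.openfn.org is deliberately left out.

```python
import argparse
import json
import pathlib
import urllib.request

# Hypothetical default endpoint: point this at wherever your local apollo server serves gen_project
DEFAULT_URL = "http://localhost:3000/services/gen_project"


def read_steps(path: pathlib.Path) -> list[str]:
    """Read one step per line, skipping blank lines and surrounding whitespace."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def request_project(url: str, steps: list[str], fmt: str = "yaml") -> dict:
    """POST {"steps": ..., "format": ...} as JSON and return the decoded JSON reply."""
    body = json.dumps({"steps": steps, "format": fmt}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def write_files(files: dict[str, str], out_dir: pathlib.Path) -> list[pathlib.Path]:
    """Write each entry of the reply's files map into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in files.items():
        # Keep only the base name so a reply cannot write outside out_dir
        target = out_dir / pathlib.Path(name).name
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a Lightning project from a list of steps")
    parser.add_argument("steps_file", type=pathlib.Path, nargs="?", help="one step per line")
    parser.add_argument("--url", default=DEFAULT_URL, help="gen_project service URL")
    parser.add_argument("--format", choices=["yaml", "json"], default="yaml")
    parser.add_argument("--out", type=pathlib.Path, default=pathlib.Path("project"))
    args = parser.parse_args()

    if args.steps_file is None:
        parser.print_help()
        return

    reply = request_project(args.url, read_steps(args.steps_file), args.format)
    for path in write_files(reply["files"], args.out):
        print(f"wrote {path}")
    # Deploying the written project to demo.openfn.org is not covered by this sketch


if __name__ == "__main__":
    main()
```

Saved as, say, demo.py (a hypothetical name), it would be run as python demo.py steps.txt --out my-project. Keeping the HTTP call, file reading and file writing in separate functions makes the last two testable without a running server.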

Suggested API

We recommend the following API for the service.

JSON input, where format is either "yaml" or "json":

{
  "steps": ["instructions for step 1", "instructions for step 2"],
  "format": "yaml"
}

JSON output:

{
  "files": {
    "project.yaml": "..."
  }
}

This structure, with the generated files returned under a files key, makes for easy integration with the apollo server.
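A minimal sketch of a service entry point honouring this contract, using only the Python standard library. The entry-point name main(data), the placeholder build_project (one common-adaptor job per step, no edges) and the rule of naming the output project.yaml or workflow.json by format are all assumptions for illustration, not apollo's actual service interface. The YAML writer is a tiny hand-rolled emitter for nested dicts of strings and booleans; a real service would more likely use a YAML library.

```python
import json


def build_project(steps: list[str]) -> dict:
    """Placeholder generator (hypothetical): one common-adaptor job per step, no edges."""
    jobs = {
        f"step-{i}": {"name": step, "adaptor": "@openfn/language-common@latest"}
        for i, step in enumerate(steps, start=1)
    }
    return {"name": "generated-project", "workflows": {"workflow-1": {"jobs": jobs}}}


def to_yaml(mapping: dict, indent: int = 0) -> str:
    """Emit nested dicts whose leaves are str or bool as block-style YAML."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        elif isinstance(value, dict):
            lines.append(f"{pad}{key}: {{}}")
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        elif isinstance(value, str):
            # For ordinary text, a JSON string literal is also a valid YAML double-quoted scalar
            lines.append(f"{pad}{key}: {json.dumps(value)}")
        else:
            raise TypeError(f"unsupported value for {key!r}: {type(value).__name__}")
    return "\n".join(lines)


def main(data: dict) -> dict:
    """Validate the request, build the project and return it under a files key."""
    steps = data.get("steps")
    fmt = data.get("format", "yaml")
    if not isinstance(steps, list) or not steps:
        raise ValueError("'steps' must be a non-empty list")
    if not all(isinstance(step, str) and step.strip() for step in steps):
        raise ValueError("every step must be a non-empty string")
    if fmt not in ("yaml", "json"):
        raise ValueError("'format' must be 'yaml' or 'json'")

    project = build_project([step.strip() for step in steps])
    if fmt == "json":
        return {"files": {"workflow.json": json.dumps(project, indent=2)}}
    return {"files": {"project.yaml": to_yaml(project) + "\n"}}
```

Validating the input before generating anything means a malformed request fails fast with a clear message instead of producing a half-built project.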

Background

OpenFn is an open source platform for data integration and workflow automation which can be used via a CLI or a web UI.

Projects on OpenFn can be encoded in a yaml file containing workflows, steps, jobs, triggers and edges.

Key terms

A workflow contains one trigger and one or more steps connected by edges.

Workflows are the basis of users' activities: the steps related to an objective are built and saved in a workflow.

A step is a task or instruction that users write, mostly in JavaScript. A step can do anything from sending a text message, transforming data or making an API call to sending data to, or fetching data from, an external system. Steps (listed under jobs in project.yaml) do this work with the help of adaptors.

Triggers start workflows in response to an event or on a schedule. Edges connect two steps and determine the order and conditions under which the steps in a workflow run.

See more details at https://docs.openfn.org/documentation/get-started/terminology
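The key terms above imply structural rules that a generated workflow can be checked against before it is handed to Lightning. A minimal sketch, assuming the workflow is a Python dict with the jobs, triggers and edges keys used in the sample YAML in this issue. check_workflow is a hypothetical helper that checks structure only (not job bodies), and the rule that every job needs an incoming edge is an assumption based on edges being what schedules steps.

```python
def check_workflow(workflow: dict) -> list[str]:
    """Return a list of structural problems; an empty list means every check passed."""
    problems = []
    jobs = workflow.get("jobs") or {}
    triggers = workflow.get("triggers") or {}
    edges = workflow.get("edges") or {}

    # A workflow has exactly one trigger and at least one step
    if len(triggers) != 1:
        problems.append(f"expected exactly one trigger, found {len(triggers)}")
    if not jobs:
        problems.append("expected at least one step (job)")

    # Every edge must start at a known job or trigger and end at a known job
    for edge_name, edge in edges.items():
        source = edge.get("source_job") or edge.get("source_trigger")
        target = edge.get("target_job")
        if source not in jobs and source not in triggers:
            problems.append(f"{edge_name}: unknown source {source!r}")
        if target not in jobs:
            problems.append(f"{edge_name}: unknown target {target!r}")

    # Assumption: a job with no incoming edge can never be reached
    targeted = {edge.get("target_job") for edge in edges.values()}
    for job in jobs:
        if job not in targeted:
            problems.append(f"job {job!r} has no incoming edge")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a generator report everything wrong with its output at once.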

Sample Project.yaml

Below is an example of a YAML file for a project, openhie-project, which has one workflow, OpenHIE-Workflow, and 4 steps: ["FHIR-standard-Data-with-change", "Send-to-OpenHIM-to-route-to-SHR", "Notify-CHW-upload-successful", "Notify-CHW-upload-failed"]. The trigger is a webhook, and 4 edges, each with a source (source_trigger or source_job), a target_job and a condition, determine how the workflow is executed.

name: openhie-project
description: Some sample
# credentials:
# globals:
workflows:
  OpenHIE-Workflow:
    name: OpenHIE Workflow
    jobs:
      FHIR-standard-Data-with-change:
        name: FHIR-standard-Data-with-change
        adaptor: '@openfn/language-http@latest'
        enabled: true
        # credential:
        # globals:
        body: |
          fn(state => {
            console.log("hello world");
            return state;
          });

      Send-to-OpenHIM-to-route-to-SHR:
        name: Send-to-OpenHIM-to-route-to-SHR
        adaptor: '@openfn/language-http@latest'
        enabled: true
        # credential:
        # globals:
        body: |
          fn(state => state);

      Notify-CHW-upload-successful:
        name: Notify-CHW-upload-successful
        adaptor: '@openfn/language-http@latest'
        enabled: true
        # credential:
        # globals:
        body: |
          fn(state => state);

      Notify-CHW-upload-failed:
        name: Notify-CHW-upload-failed
        adaptor: '@openfn/language-http@latest'
        enabled: true
        # credential:
        # globals:
        body: |
          fn(state => state);

    triggers:
      webhook:
        type: webhook
    edges:
      webhook->FHIR-standard-Data-with-change:
        source_trigger: webhook
        target_job: FHIR-standard-Data-with-change
        condition: always
      FHIR-standard-Data-with-change->Send-to-OpenHIM-to-route-to-SHR:
        source_job: FHIR-standard-Data-with-change
        target_job: Send-to-OpenHIM-to-route-to-SHR
        condition: on_job_success
      Send-to-OpenHIM-to-route-to-SHR->Notify-CHW-upload-successful:
        source_job: Send-to-OpenHIM-to-route-to-SHR
        target_job: Notify-CHW-upload-successful
        condition: on_job_success
      Send-to-OpenHIM-to-route-to-SHR->Notify-CHW-upload-failed:
        source_job: Send-to-OpenHIM-to-route-to-SHR
        target_job: Notify-CHW-upload-failed
        condition: on_job_failure
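The last two edges above are what produce a branch: after Send-to-OpenHIM-to-route-to-SHR runs, only one of the two notification jobs is selected, depending on whether it succeeded. A minimal sketch of that selection, assuming only the three condition values that appear in this issue (always, on_job_success, on_job_failure) and reading either the condition key used in this sample or the condition_type key used in the prompt examples. next_jobs is a hypothetical helper that illustrates the semantics; it is not Lightning's runtime code, and it silently ignores any other condition type.

```python
def next_jobs(edges: dict, source: str, succeeded: bool) -> list[str]:
    """Return the jobs that should run after `source` finishes with the given outcome."""
    targets = []
    for edge in edges.values():
        # Only consider edges leaving this job or trigger
        if source not in (edge.get("source_job"), edge.get("source_trigger")):
            continue
        # Skip edges that are explicitly disabled
        if edge.get("enabled", True) is False:
            continue
        # This sample uses `condition`; the prompt examples use `condition_type`
        condition = edge.get("condition_type", edge.get("condition", "always"))
        if (
            condition == "always"
            or (condition == "on_job_success" and succeeded)
            or (condition == "on_job_failure" and not succeeded)
        ):
            targets.append(edge["target_job"])
    return targets
```

With the edges from the sample loaded into a dict, next_jobs(edges, "Send-to-OpenHIM-to-route-to-SHR", succeeded=False) selects only Notify-CHW-upload-failed.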

Here's what a list of steps might look like (also pictured in the image below):

  • Fetch Referrals From Primero using the primero adaptor
  • Send a text message to case officer with telerivet adaptor
  • Add patient to DHIS2 with the dhis2 adaptor

When this feature is integrated with Lightning, here is how a user would use it to generate a workflow.

Building this UI/X is NOT part of this project but the image below is helpful for contributors to understand the mission.

Image

Documentation and relevant links

@christad92 christad92 added the DMP 2024 Submission for DMP label Mar 4, 2024
@christad92 christad92 added this to v2 Mar 4, 2024
@github-project-automation github-project-automation bot moved this to New Issues in v2 Mar 4, 2024
@christad92 christad92 moved this from New Issues to Icebox in v2 Mar 5, 2024
@josephjclark
Collaborator

This probably needs to wait until OpenFn/apollo#42 is ready before we can take contributions.

Then this project needs:

  • An AI module plugging into the AI server that can generate workflows (in python or JavaScript)
  • A CLI integration which allows the module to be called from the CLI and outputs workflow.yaml to disk
  • A UI in Lightning, which is probably a separate ticket in the lightning repo

@hiteshbandhu
Hey!

I'm interested in contributing to this project to tackle generating the workflow.yaml file.

@josephjclark I would love to talk and get more specific details on it and contribute

@josephjclark josephjclark changed the title [DMP 2024] Generate a project Workflow.yaml file from a list of steps [DMP 2024] Generate a project.yaml file from a list of steps Apr 10, 2024
@christad92
Author

Hi @hiteshbandhu thanks for the interest. Please can you send an email to ayodele[at]openfn[dot]org and we will be able to decide on possible next steps.

@Gmin2
Gmin2 commented Apr 12, 2024

Hey @christad92,
I am interested in working on this project. Should I email you regarding the next steps?

@CodesSunny
@christad92 hi,
I am learning to code. Please let me know whether I can join this project.
Regards
Vikas

@christad92
Author

Hi @utnim2, I have responded to your email with next steps. We'd love to get you started; please use the link in the email to book a 30-minute call to discuss this issue.

@AbhimanyuSamagra
Do not ask process-related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer to the instructions listed on Unstop, and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

@babitarit
@christad92 I would like to work on this issue. Can you please assign it to me so that I can start working?

@Saksham0303
Greetings @christad92,
I've sent you my proposal via email. Could you please take a moment to review it and provide any feedback or suggestions for improvement. Once finalized, I'll be ready to submit it on the website.

@Pin4sf
Pin4sf commented Jun 12, 2024

Hey @christad92 and @josephjclark
I'm Shivansh, and I've been selected to work on this issue for DMP 2024.
I've been developing a Python module to generate YAML files from workflow steps using NLP, rule-based parsing and NER.
You can check the initial AI pipeline I am following.

image

You can find my initial YAML generator implementation in this Google Colab notebook: RB+NER_Yaml.ipynb


I've also reviewed @josephjclark's comment on the issue thread and wanted to confirm if my primary task is developing the Python module for this issue. Additionally, I noticed the initial framework for Apollo service calls in the CLI and wanted to know if I should also contribute to the UI ticket on the lightning repository.

  • A CLI integration which allows the module to be called from the CLI and outputs workflow.yaml to disk

On inspecting the apollo repository, I noticed that you have set up the initial framework for making calls to Apollo services through the CLI, so I need clarification on how this work should plug into the Apollo repository and the CLI calls.

I hope this aligns with the project goals. I would greatly appreciate your feedback and guidance on any potential improvements, and clarification of my responsibilities.

Best Regards

P.S : I think this issue should be in OpenFn/apollo.

@josephjclark
Collaborator

Hi @Pin4sf!

Sorry for the late reply - and congrats on getting the ticket!

You are not expected to contribute to lightning as part of this project - the focus is on the backend, likely python.

A lot has changed since we created this issue and we're doing a little bit of planning to accommodate that. Unfortunately I'm super busy this week so we need a few days to prepare.

We're going to set up a kick-off call late next week to go through this. We'll be in touch soon to get that arranged, then we can let you loose!

@Pin4sf
Pin4sf commented Jun 27, 2024

Weekly Learnings & Updates

You can access the contribution dashboard here C4GT DMP 24

Week 1

  • Explored OpenFn documentation, focusing on workflows, adaptors, and CLI workflows.
  • Gained insights into YAML structure and the interplay between adaptors, jobs, triggers, and edges.
  • Set up basic CLI configurations.

Week 2

  • Started collecting essential data on adaptors for training the adaptor Named Entity Recognition (NER) model.
  • Delved into various natural language processing (NLP) techniques, including Part-of-Speech (PoS) tagging, rule-based parsing, and NER.
  • Began training an initial model for generating YAML definitions. Check out my progress in RB+NER_Yaml.ipynb

Week 3

  • Received updated project requirements:
      • Create a new service in the Apollo repository called gen_project.
      • This service should take a series of steps as input and return a workflow definition (either as a project.yaml or workflow.json file).
      • Develop a demo script that allows easy project import to Lightning.
  • Explored the OpenFn/apollo repository and aligned project goals accordingly.
  • Engaged in discussions with maintainers to refine project specifications.
  • Adapted the AI model pipeline to meet Apollo’s requirements. You can find the updated notebook here: AI Agent

Week 4

  • Trained the NER model using OpenFn’s list of input steps for adaptor identification.
  • Established the service structure within the Apollo repository (branch: Dmp/619).
  • Added gen_project.py to the repository.

Week 5

  • Focused on creating a demo script that allows developers to easily test various inputs and visualize them in Lightning.
  • Addressed the challenge of lengthy deployment times for the current local NER model by exploring strategies to reduce data in both the repository and the deployed Docker image.

Week 6

  • Worked on structuring and planning the Lightning import.
  • Studied the project structure of the OpenFn Lightning demo.
  • Went through the resources on Lightning project import.

Week 7

  • Updated the gen_project service to produce an optimized output.json for Lightning import.
  • Explored OpenFn Lightning at demo.openfn.org.
  • Tested the OpenFn demo and the OpenFn CLI.

Week 8

  • Created a Python script to generate project.yaml and lint output.json.
  • Debugged the gen_project service.

@hitenvidhani
hitenvidhani commented Jul 12, 2024

Weekly Goals

Week 1

  • Explored OpenFn documentation, focusing on workflows, adaptors, and CLI workflows.
  • Gained insights into YAML structure and the interplay between adaptors, jobs, triggers, and edges.
  • Set up basic CLI configurations.

Week 2

  • Started collecting essential data on adaptors for training the adaptor Named Entity Recognition (NER) model.
  • Delved into various natural language processing (NLP) techniques, including Part-of-Speech (PoS) tagging, rule-based parsing, and NER.
  • Began training an initial model for generating YAML definitions. Check out my progress in RB+NER_Yaml.ipynb

Week 3

  • Received updated project requirements:
      • Create a new service in the Apollo repository called gen_project.
      • This service should take a series of steps as input and return a workflow definition (either as a project.yaml or workflow.json file).
      • Develop a demo script that allows easy project import to Lightning.
  • Explored the OpenFn/apollo repository and aligned project goals accordingly.
  • Engaged in discussions with maintainers to refine project specifications.
  • Adapted the AI model pipeline to meet Apollo’s requirements. You can find the updated notebook here: AI Agent

Week 4

  • Trained the NER model using OpenFn’s list of input steps for adaptor identification.
  • Established the service structure within the Apollo repository (branch: Dmp/619).
  • Added gen_project.py to the repository.

Week 5

  • Focused on creating a demo script that allows developers to easily test various inputs and visualize them in Lightning.
  • Addressed the challenge of lengthy deployment times for the current local NER model by exploring strategies to reduce data in both the repository and the deployed Docker image.

Projects
Status: Icebox
Development

No branches or pull requests

10 participants