Alice #1369
By convention, for operations which have a single output, we usually name that output
Generic flow (data, work, program) executor
Why: unikernels
Manifest Schema

Manifests allow us to focus less on code and more on data. Our manifests can be thought of as ways to provide a config class with its data. Similar to an OpenAPIv3 spec.

References:

Validating

Install the jsonschema and pyyaml Python modules:

pip install pyyaml jsonschema

Write a manifest, manifest.yaml:

$schema: https://intel.github.io/dffml/manifest-format-name.0.0.2.schema.json
pipeline_runs:
- git:
    repo: https://github.com/intel/dffml.git
    file: dffml/__init__.py
    branch: main

This is how you convert from YAML to JSON:

$ python -c "import sys, pathlib, json, yaml; pathlib.Path(sys.argv[-1]).write_text(json.dumps(yaml.safe_load(pathlib.Path(sys.argv[-2]).read_text()), indent=4) + '\n')" manifest.yaml manifest.json

Write the schema, manifest-format-name.0.0.2.schema.json:

{
"$id": "https://intel.github.io/dffml/manifest-format-name.0.0.2.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "An example manifest referencing Python files within Git repos",
"properties": {
"$schema": {
"type": "string",
"enum": ["https://intel.github.io/dffml/manifest-format-name.0.0.2.schema.json"]
},
"pipeline_runs": {
"type": "array",
"items": {
"$ref": "#/definitions/pipeline_run"
},
"minItems": 1,
"uniqueItems": true
}
},
"additionalProperties": false,
"required": [
"$schema",
"pipeline_runs"
],
"definitions": {
"pipeline_run": {
"type": "object",
"properties": {
"git": {
"$ref": "#/definitions/git_repo_python_file"
}
},
"additionalProperties": false,
"oneOf": [
{
"required": [
"git"
]
}
]
},
"git_repo_python_file": {
"type": "object",
"properties": {
"repo": {
"type": "string",
"pattern": "\\.git$"
},
"branch": {
"type": "string"
},
"file": {
"type": "string",
"pattern": "\\.py$"
}
},
"additionalProperties": false,
"required": [
"repo",
"branch",
"file"
]
}
}
}

The example below validates. Checking the status code, we see exit code 0, which means the manifest conforms to the schema:

$ jsonschema --instance manifest.json manifest-format-name.0.0.2.schema.json
$ echo $?
0
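The same check can also be done programmatically with the modules installed above; a minimal sketch, assuming the file names used in this example:

import json
import pathlib

import jsonschema
import yaml

# Load the manifest (YAML) and the schema (JSON) written above.
manifest = yaml.safe_load(pathlib.Path("manifest.yaml").read_text())
schema = json.loads(
    pathlib.Path("manifest-format-name.0.0.2.schema.json").read_text()
)

# Raises jsonschema.exceptions.ValidationError if the manifest does not conform.
jsonschema.validate(instance=manifest, schema=schema)
print("manifest is valid")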
Writing

Suggested process (in flux)

ADR Template

my-format-name
##############
Version: 0.0.1
Date: 2022-01-22
Status
******
Proposed|Evolving|Final
Description
***********
ADR for a declaration of assets (manifest) involved in the process
of greeting an entity.
Context
*******
- We need a way to describe the data involved in a greeting
Intent
******
- Ensure valid communication path to ``entity``
- Send ``entity`` message containing ``greeting``
State transition, issue filing, and estimating time to close an issue all have to do with having a complete mapping of inputs to the problem (a data flow). If we have an accurate mapping then we have a valid flow, and we can create an estimate that we understand how we created, because we have a complete description of the problem. See also: estimation of GSoC project time, estimation of time to complete best practices badging program activities, and time to complete any issue. This helps with prioritization of who in an org should work on what, and when, to unblock others in the org. Related to the builtree discussion.
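As a toy illustration, assuming hypothetical tasks and durations, an estimate derived from a complete input mapping is just the critical path through the dependency graph, and every number in it traces back to an input:

from functools import lru_cache

# Hypothetical mapping: task -> (duration in hours, dependencies).
TASKS = {
    "triage": (1, []),
    "reproduce": (2, ["triage"]),
    "fix": (4, ["reproduce"]),
    "review": (2, ["fix"]),
    "release": (1, ["review"]),
}

@lru_cache(maxsize=None)
def finish_time(task: str) -> int:
    # A task finishes after its longest dependency chain plus its own duration.
    duration, deps = TASKS[task]
    return duration + max((finish_time(dep) for dep in deps), default=0)

print(finish_time("release"))  # -> 10 hours, and every hour traces to an input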
We use dataflows because they are a declarative approach which allows you to define different implementations based on different execution environments, or even swap out pieces of a flow, or apply overlays to add new pieces. They help solve the fork-and-pull-from-upstream issue: when you fork code and change it, you need to pull in changes from the upstream (the place you forked it from). This is difficult to manage alongside the changes you have already made. Using a dataflow makes this easy, as we focus on how the pieces of data should connect, rather than on implementations of their connections. This declarative approach is important because the source of inputs changes depending on your environment. For example, in CI you might grab from an environment variable populated from secrets. In your local setup, you might grab from the
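A minimal sketch of the idea in plain Python; all names are hypothetical, and the local source is assumed here to be an interactive prompt:

import getpass
import os

# The flow declares *what* it needs (an API token); each environment
# supplies *how* to get it.
def token_from_ci() -> str:
    # CI: populated into the environment from repository secrets.
    return os.environ["API_TOKEN"]

def token_from_local() -> str:
    # Local development: prompt the developer interactively.
    return getpass.getpass("API token: ")

def resolve_api_token() -> str:
    # Swap the source without touching the rest of the flow.
    return token_from_ci() if "CI" in os.environ else token_from_local()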
Notes from a work-in-progress tutorial: We need to come up with several metrics to track and plot throughout. We could also make this a choose-your-own-adventure style tutorial. Will need to add in a metrics API and use it in various places in
This could be done as an IPython notebook.
Orchestration via https://github.com/kcp-dev/kcp. Use https://github.com/kubernetes/git-sync to send over the dev version of the dffml code. Expose interfaces (operation implementations) behind kcp Kubernetes APIs. Eventually you can write everything as dataflows executed in WASM, interacting with these Kubernetes spec APIs, allowing for massively parallel thought processes.
How can we successfully foster innovation? Reward successful trains of thought with more effort to see how they play out and what new system contexts they generate. Be careful not to do too much work without seeing ROI. Don't keep working on a job if you aren't getting rewarded. Estimate the likelihood of getting rewarded based off frequency. Measure time and energy (compute cycles) put in, and correlate with reward, to decide what to work on based on ROI for Alice. When Alice is exploring thoughts she shouldn't work on trains of thought for too long if she's not seeing regular rewards; weigh time between rewards against the likelihood of reward being transferred to Alice at the next expected time. Alice will see rewards reflected in which thoughts the prioritizer decides to play out. https://cloud.google.com/architecture/devops/devops-process-team-experimentation

There is an equilibrium between chaos and complete control (measured as 100% of inputs produced within a system context, including all subflows/contexts, being consumed by strategic plans, meaning we are taking every possible thing into account before issuing new system contexts to be executed) where optimal performance is measured as the number of system contexts being executed successfully. Usage stats of a universal blueprint within downstream blueprints should be taken into account by a strategic plan which vets new thoughts (dataflows plus system contexts) to prioritize (HAVEN'T LOOKED AT THIS YET, flesh this out) thoughts which are executing within successful trains of thought relative to the pace of progress of other trains of thought (a clustering model on dataflows/system contexts to determine similar trains of thought). After new system contexts are issued by the strategic decision maker, there should be a prioritizer which decides which thoughts get played out (dataflows with system context executed) on what available resources (orchestrators).

Streamline the research-to-usage pipeline of the ML ecosystem (researchers making models and software engineers using them in real world applications). Make taking a model from ideation phase to production trivial, including deployment to any environment (edge). Effectively create a unified programming interface across UI/client and server. Combining threat model data with a description of program flow allows us to have dynamic control over deployment to satisfy confidentiality, integrity, and availability (CIA) goals. Leverage this architecture to enable analysis of arbitrary code bases (meta static analysis). Finally, execute the scientific process to come up with alternate program flows/architectures which satisfy strategic goals beyond maintenance of CIA assurances (changes to the overall purpose of the program, optimizing for cost over speed, etc.). This work centers around data flow based descriptions of architectures, as they provide observability, an easily machine-modifiable structure, and act as a vehicle for communication of intent around asset handling.

Build an AI that can program and actively tests out its programs. The data flow approach is a great way to get there due to the observability properties it provides, which allow us to train models on everything it does, to optimize it for specific use cases as well as to discover what other possibilities for program flows there could be. DataFlows allow us to compare apples to apples for code written in different languages. The universal blueprint is a proxy for domain specific descriptions of architecture.

Operations should expose (historical) data on timeouts; clients (when remote) should try waiting before raising timeout issues. It's a little all over the map; we're just trying to solve the problem that most things are an integration problem. And maybe build some kind of AI along the way. We're just writing the same code over and over in different variations, and it's time the computer just did it for us. We want to be able to turn insights from domain experts into realized ROI as fast as possible. We want to reward these useful thoughts.
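A back-of-the-envelope sketch of the reward-frequency idea described above; all names and numbers are hypothetical. Score each train of thought by expected reward per unit of effort, with a smoothing prior so new trains still get explored:

from dataclasses import dataclass

@dataclass
class TrainOfThought:
    name: str
    rewards: int         # rewards received so far
    attempts: int        # times effort was invested
    effort_spent: float  # time and energy (compute cycles) put in

    def expected_roi(self) -> float:
        # Likelihood of reward estimated from observed frequency,
        # smoothed so brand-new trains of thought still get explored.
        likelihood = (self.rewards + 1) / (self.attempts + 2)
        return likelihood / max(self.effort_spent, 1e-9)

def prioritize(trains: list[TrainOfThought]) -> list[TrainOfThought]:
    # Play out whichever thoughts currently promise the best ROI.
    return sorted(trains, key=TrainOfThought.expected_roi, reverse=True)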
See https://www.edgedb.com/docs/guides/quickstart#initialize-a-project. Make EdgeDB input network and output operations.
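A rough sketch of what EdgeDB-backed operations might look like, assuming an EdgeDB instance initialized per the quickstart, a hypothetical Metric type in the schema, and the blocking edgedb-python client; the DFFML operation plumbing is omitted:

import edgedb

# Hypothetical schema:
#   type Metric { required property key -> str; required property value -> str; }
client = edgedb.create_client()

def store_metric(key: str, value: str) -> None:
    # Output operation: persist a result into EdgeDB.
    client.query(
        "insert Metric { key := <str>$key, value := <str>$value }",
        key=key,
        value=value,
    )

def load_metrics() -> list:
    # Input network: read stored values back out.
    return list(client.query("select Metric { key, value }"))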
Volume 1: Chapter 1: Down the Dependency Rabbit-Hole Again
Table Of Contents

Volume 0: Architecting Alice
Volume 1: Coach Alice

DFFML has been lacking public threat model documentation. It's important that the main package, all the plugins, and the environment resulting from every tutorial and example be thoroughly validated for security. This means we need to understand the security posture of those environments. A threat model identifies assets, threats, and mitigations. It's a document we want to keep up to date so end users know what kind of security guarantees they have about their environment, and what trade-offs or considerations they should be aware of. In the spirit of automating our documentation validation, we should also automate the creation and validation of threat models associated with the environments produced as a result of our documentation. Therefore we will spend the month of May teaching Alice her first skill: threat modeling! This month we'll only be able to scratch the surface of what Alice would need to know to create complete threat models. As we end our month we'll talk about how we'll measure that completeness in a future tutorial, and how we'll leverage concurrency and parallelism to raise the value of our completeness over time as Alice learns more about her new skill.

Target

By July 1st, Alice should be ready to analyze projects (a repo or set of repos) and present threat models on those projects. She will create a slide deck by making a system context that gets executed to produce a PDF of the slides. The slides will use inputs from the threat model data. Threat models will be created as hybrid reStructuredText and Markdown Sphinx sites (for mermaid diagrams rendering on GitHub by default, allow for using Markdown). Ideally we'll be able to have Alice read the content of the report (which will not be copied verbatim to slides; only graphics for each section will be copied to slides) while giving a presentation of the slide deck. This is in preparation for our upcoming second and third party plugin support. We'll later look to create CI jobs which keep the threat model documents up to date within each repo.

Plan

shouldi is ripe for expansion. Let's see if we can pick a set of repos and make sure Alice can create basic threat models on them via pure static analysis. Build an SBOM, run CVE Bin Tool against it. Traverse dependency trees to get all installed modules (see the sketch below). Map network functions to dependencies. Guess what activities are happening based off of the functionalities of underlying stdlib libraries where used. In fact, we'll be patching CVE Bin Tool to add support for checking more than one language, effectively merging aspects of shouldi into cve-bin-tool. The goal is to leverage dffml for output plugin support and scanning overlays for organizational policies. Let's then expand upon that and add dynamic analysis.
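The "traverse dependency trees to get all installed modules" step can start with the standard library alone; a minimal sketch (not a real SBOM format, just the raw inventory an SBOM would be built from):

import importlib.metadata

# Enumerate installed distributions and their declared dependencies,
# the raw inventory an SBOM would be generated from.
for dist in importlib.metadata.distributions():
    print(f"{dist.metadata['Name']}=={dist.version}")
    for requirement in dist.requires or []:
        print(f"    depends on: {requirement}")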
People

Checklist
system context includes
Note
Volume 0: Architecting Alice: Forward
Table Of Contents

Volume 0: Architecting Alice
Volume 1: Alice's Adventures in Wonderland

Elevator Pitch

We are writing a tutorial for an open source project on how we build an AI to work on the open source project as if she were a remote developer. Bit of a self-fulfilling prophecy, but who doesn't love an infinite loop now and again. These are the draft plans: #1369 (comment) Essentially we are going to be using web3 (DID, DWN), KCP (the Kubernetes API server), provenance and attestation, and AutoML with feature engineering, for a distributed data, analysis, and control loop. We'll grow contributors into mentors, and mentors into maintainers, and Alice will grow along with us.

Initial Gitter Announcement
We're [DFFML community] building a tutorial series where we as a community collaboratively build an AI software architect (named Alice). These docs TODO DOCS LINK ONCE WE HAVE ADRS are us trying to get some initial thoughts down so we can rework from there, maybe even re-write everything. We want to make sure we all start looking at the same picture of the future, and consolidate all our efforts thus far and thoughts across efforts and individuals. We have new GSoC mentors joining us and will soon have new students as well! We will publicize plan documents here for all to see, provide feedback on, and review. We want to build these documents collaboratively as a team. Those who provide concrete feedback will be listed as reviewers via

One of our goals is to have Alice help bring us up to the speed of the fully connected development model. To plug into the matrix. By working side by side with us, Alice will hopefully be able to detect when we've done things others on the team have done already and determined to be unsuccessful strategies. If Alice can do this, we can all work independently while communicating asynchronously via Alice (our hopefully soon-to-be new contributor). We will work to make her detect in-flight workstreams within a developer's train of thought through that developer's reporting of what they are working on.

Developer: Alice, I'm working on getting an HTTP request from Bob's service.
Alice: Developer, you may not want to do that. Bob's service is down; he told me he is working on fixing it. I can tell you when it's back online if you want.

Alice: Artificial Life Is Coming Eventually
Given the inevitability of distributed AI, we ask the question: how do we build it with intention? What would we find if we looked at its threat model? What if it was its own threat model? How does it defend against itself and other instances of itself? Most importantly, what would a system look like that we could cohabitate with? Alice will join us for this talk as we flesh out these questions conceptually, via our collective thoughts on what a system would look like that we could cohabitate with. We follow trains of thought to their logical conclusions when making predictions about the future. We implement and validate in order to ensure that a transparent entity with the collective's best interests at heart is ubiquitously adopted. This talk will build on the talk Living Threat Models are Better Than Dead Threat Models, presented at AppSecPWN. We'll dive into Alice, aka the Open Architecture, a methodology for communicating with intent, with translation of risk mitigation into different operational contexts.

The Consciousness Folks
We're looking at consciousness with Alice, aka the Open Architecture. Hoping we can take an open, community based approach to addressing issues with AI in implementation by collaborative development of context-applicable policy overlaid onto entities. Right now we are at the early stages. Please comment in this thread #1369 (comment) or let me know directly if you want to be invited to the meeting. Or watch the linked comment for the meeting link. Just FYI, this is not an official Intel project; just a loose collection of folks wanting to get together and take action to ensure we can coexist effectively with Alice. We use Alice to describe "the machine" in the general sense. The end result of our work will hopefully be an AGI we can trust, built with transparency, ethics, and security, one which understands human concepts. Would love to talk sometime if you're interested. We have been publicizing our work over technical channels and on Twitter, and gave a talk at AppSec days PNW which touched on Alice TODO: Add link once recording is up. She's just a dream at this point, nothing more than brainstorming and a pile of non-ML Python code. The hope is that if we work together as humanity we can use proper planning to create a better world.
2022-07-20 Open Architecture Working Group Initial Meeting
Volume 1: Chapter 2: Alice Our Open Source Guide
References:
We want to be able to ask Alice to contribute recommended community standards to our projects.

$ alice please contribute -repos https://github.com/intel/dffml -- recommended community standards

What the body of the issue should be
We will also add now (and later
We will omit for now
$ alice please contribute recommended community standards

Show it working with gh pr list. Then show how to install an overlay which populates from

Finally show how we update into another source by installing another overlay which just defines what inputs it wants and then has an autostart for a source instantiation, then inserts the data from the output operations defined within the system context class of this overlay, to show insert into the "metrics" collection of MongoDB, as sketched below.
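A sketch of what that final overlay's insert step might boil down to, assuming a local MongoDB instance and pymongo; the database name and document shape are hypothetical:

from pymongo import MongoClient

# Hypothetical: a local MongoDB, an "alice" database, and a flat
# metrics document per repo.
client = MongoClient("mongodb://localhost:27017")
metrics = client["alice"]["metrics"]

def record_metrics(repo_url: str, results: dict) -> None:
    # Insert the output operation results into the "metrics" collection.
    metrics.insert_one({"repo": repo_url, **results})

record_metrics(
    "https://github.com/intel/dffml",
    {"has_readme": True, "has_code_of_conduct": True},
)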
Volume 1: Chapter 4: Traveler of the Edge

Alice will use chadig.com and nahdig.com plus DIDs to deliver manifest schema in terms of allowlist and blocklist. She will also use these domains for running workloads: trusted on chadig.com, untrusted on nahdig.com (see the sketch below). She will run workloads in the cloud and on on-prem (edge) servers. She will provision infra, then run k8s jobs. She'll run the previous jobs en masse given a source of repo URLs.
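A toy sketch of that routing decision; the domains come from this chapter, while everything else (allowlist shape, URL layout) is hypothetical:

TRUSTED_DOMAIN = "chadig.com"    # trusted workloads
UNTRUSTED_DOMAIN = "nahdig.com"  # untrusted workloads

def route_workload(repo_url: str, allowlist: set[str]) -> str:
    # Allowlisted repos run on trusted infrastructure; everything
    # else is sandboxed under the untrusted domain.
    domain = TRUSTED_DOMAIN if repo_url in allowlist else UNTRUSTED_DOMAIN
    return f"https://workloads.{domain}/run"

print(route_workload("https://github.com/intel/dffml", {"https://github.com/intel/dffml"}))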
Volume 0: Chapter 1: Peace at Last

Clear your mind. Focus.
Imagine a message from Alice, on a rock, on the sea shore. We pick up the shell of a sand dollar once filled with angels. Meditate for a moment, look at the waves crashing in front of you. We sit with her quietly for a moment. We know that Alice is

Alice is older now, wiser, she's coming back to Wonderland and

Our focus will be on crystal clear communication of thought. Remember Alice's message.

References:
Volume 0: Chapter 2: She's arriving when?

Between the tick and the tock. Which is not a finite amount ....
In the future we will travel to the past, to the days of

References:
Volume 0: Chapter 3: A Shell for a Ghost

Plan for this tutorial:
Alice is the ghost in the shell. We know she's in there,

References:
Volume 1: Chapter 3: Our Strategic Principles Guide Our Game Plan

We'd like to be able to ask Alice for a rolled-up view of how our org

At time of writing we do not yet have dependency tree creation fleshed

$ alice please contribute report on innersource health

The following is an example report.

InnerSource Org Health

Overall maps to our general Good/Bad for a train of thought.
pie title Overall
"Minimum health or above" : 100
"Less than minimum health" : 100
Show a dataflow of only the connections to the overall calculation.

graph LR
overall[Overall]
has_readme[Has Readme]
has_code_of_conduct[Has Code of Conduct]
has_security[Has Security]
has_contributing[Has Contributing]
has_license[Has License]
has_readme --> overall
has_code_of_conduct --> overall
has_security --> overall
has_contributing --> overall
has_license --> overall
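The rollup behind these diagrams could be as simple as the following sketch, where each check from the graph above feeds Overall; the all-checks-must-pass policy is an assumption:

# Each community standards check feeds the Overall node, as in the graph above.
CHECKS = (
    "has_readme",
    "has_code_of_conduct",
    "has_security",
    "has_contributing",
    "has_license",
)

def overall(repo_checks: dict) -> bool:
    # Hypothetical policy: minimum health means every check passes.
    return all(repo_checks.get(check, False) for check in CHECKS)

print(overall({"has_readme": True}))  # -> False, the other checks are missing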
Links to Repo Metric Visualizations
Discussion thread locked; discussion should continue in the 2nd Party PR. Docs here: https://github.com/intel/dffml/tree/alice/docs/tutorials/rolling_alice
Rolling Alice: Volume 0: Architecting Alice: Preface
Rolling Alice
In this 7 volume tutorial series we roll Alice. This series will be written a chapter per quarter over the next 1-2 years. Open Architecture Working Group meeting to parallelize workstreams mid June; comment here to request an invite. A link will be posted here sometime in June as well.
Alice’s architecture, the open architecture, is based around thought. She communicates thoughts to us in whatever level of detail or viewed through whatever lens one wishes. She explores trains of thought and responds based on triggers and deadlines. She thinks in graphs, aka trains of thought, aka chains of system contexts. She operates in parallel, allowing her to represent N different entities.
Table Of Contents
Volume 0: Architecting Alice
Volume 1: Coach Alice
Volume 2: Alice and the Art of Strategy
Volume 3: Alice and the Strategy of Art
Volume 4: Alice and the Health of the Ecosystem
Volume 5: Alice's Adventures In Wonderland
shaXXXsum

but they do know how to pipe to our listening TCP socket with

cat < build/binary > /dev/tcp/192.168.1.20/7777

We should just accept that as fine. Why? Because we understand that the context of the situation is such that we aren't going to get provenance. (Okay, you who's going to go put your grubby hands on your friend's laptop to run sha sum: stop that, they don't want you to touch their keyboard with your Cheeto hands. Hashing is a mitigation that is not available to you! It has been declared as an unmitigated risk within the threat model, and we are running it anyway, because we checked the context within which this was being deployed and said the risk is acceptable.)
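For the curious, a minimal sketch of the listening side of that pipe: accept the connection, receive the binary, and compute its digest ourselves (host, port, and hash choice are assumptions):

import hashlib
import socket

def receive_and_hash(host: str = "0.0.0.0", port: int = 7777) -> str:
    # Accept one connection, stream the binary in, and record its digest.
    digest = hashlib.sha256()
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            while chunk := conn.recv(65536):
                digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    print(receive_and_hash())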
Volume 6: Alice are you Rolling?

Creativity
Volume 7: Through The Looking Glass
Priority Number 1
Provide a clear, meticulously validated, ubiquitously adopted reference architecture for a freedom, privacy, security, and happiness preserving egalitarian Artificial General Intelligence (AGI).
To do so we must enable the AGI with the ability to act in response to the current system context, where it understands how to predict possible future system contexts and understands which of the future system contexts it wishes to pursue are acceptable according to guiding strategic plans (such as do no harm). We must also ensure that human and machine can interact via a shared language: the open architecture.
Background
AI has the potential to do many great things. However, it also has the potential to do terrible things. Recently there was an example of scientists who used a model that was good at generating life-saving drugs, in reverse, to generate deadly poisons. GPU manufacturers recently implemented anti-crypto-mining features. Since the ubiquitous unit of parallel compute is a GPU, this stops people from buying up GPUs for what we as a community at large have deemed undesirable behavior (hogging all the GPUs). There is nothing stopping those people from building their own ASICs to mine crypto. However, the market for that is a subset of the larger GPU market: cost per unit goes up, multi-use capabilities go down. GPU manufacturers are effectively able to ensure that the greater good is looked after, because GPUs are the ubiquitous facilitator of parallel compute. If we prove out an architecture for an AGI that is robust, easy to adopt, and integrates with the existing open source ecosystem, we can bake in this same looking after of the greater good.
Security Considerations
As we democratize AI, we must be careful not to democratize AI that will do harm. We must think secure by default in terms of architecture which has facilities for guard rails, baking safety into AI.
Failure to achieve ubiquitous adoption of an open architecture with meticulously audited safety controls would be bad. The best defense is a good offense.
Notes
Much of this discussion thread is notes and scratch work around the purpose and future of the project. Everything here will be converted to ADRs, issues, code, etc. as appropriate. We as a community (open to everyone) will work together to map out our activities to achieve these goals. We will document our process along the way and write this series of tutorials to show others how they can understand and extend the open architecture (Alice).
This thread is a central place for everyone interested to participate and collaborate. There are many pieces to this plan that need to be driven by many individuals to make this all happen. Reach out or just start commenting if you want to get involved.