Commit

Merge remote-tracking branch 'origin' into artsiom/tagtooltip
ArtsiomWB committed Dec 23, 2024
2 parents fd3f70e + 1b77d8f commit 42c1863
Showing 193 changed files with 11,082 additions and 2,358 deletions.
16 changes: 14 additions & 2 deletions .github/workflows/test.yaml
@@ -81,11 +81,16 @@ jobs:
env:
CI: 1
WANDB_ENABLE_TEST_CONTAINER: true
LOGGING_ENABLED: true
ports:
- '8080:8080'
- '8083:8083'
- '9015:9015'
options: --health-cmd "curl --fail http://localhost:8080/healthz || exit 1" --health-interval=5s --health-timeout=3s
options: >-
--health-cmd "wget -q -O /dev/null http://localhost:8080/healthz || exit 1"
--health-interval=5s
--health-timeout=3s
--health-start-period=10s
outputs:
tests_should_run: ${{ steps.test_check.outputs.tests_should_run }}
steps:
@@ -254,11 +259,16 @@ jobs:
env:
CI: 1
WANDB_ENABLE_TEST_CONTAINER: true
LOGGING_ENABLED: true
ports:
- '8080:8080'
- '8083:8083'
- '9015:9015'
options: --health-cmd "curl --fail http://localhost:8080/healthz || exit 1" --health-interval=5s --health-timeout=3s
options: >-
--health-cmd "wget -q -O /dev/null http://localhost:8080/healthz || exit 1"
--health-interval=5s
--health-timeout=3s
--health-start-period=10s
weave_clickhouse:
image: clickhouse/clickhouse-server
ports:
@@ -267,6 +277,8 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Enable debug logging
run: echo "ACTIONS_STEP_DEBUG=true" >> $GITHUB_ENV
- name: Set up Python ${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}
uses: actions/setup-python@v5
with:
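
For reference, the new health-check options (`--health-interval=5s`, `--health-timeout=3s`, `--health-start-period=10s`) roughly correspond to the polling loop below. This is only a sketch, not what Docker actually runs: `wait_until_healthy`, `http_probe`, and the injectable `probe`, `sleep`, and `clock` parameters are hypothetical names for illustration, and the retry count of 3 is assumed from Docker's default for `--health-retries`, which the workflow does not set.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float) -> bool:
    """Return True if a GET on `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as a failed probe.
        return False


def wait_until_healthy(url, interval=5.0, timeout=3.0, start_period=10.0,
                       retries=3, probe=http_probe, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll `url` every `interval` seconds.

    Failures inside the start period are ignored; after it, `retries`
    failed probes mark the service unhealthy (assumed semantics).
    """
    started = clock()
    failures = 0
    while True:
        sleep(interval)
        if probe(url, timeout):
            return True
        if clock() - started >= start_period:
            failures += 1
            if failures >= retries:
                return False
```

The start period matters for slow-booting containers: probes that fail while the server is still starting do not use up the retry budget.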
30 changes: 15 additions & 15 deletions dev_docs/BaseObjectClasses.md → dev_docs/BuiltinObjectClasses.md
@@ -1,4 +1,4 @@
# BaseObjectClasses
# BuiltinObjectClasses

## Refresher on Objects and object storage

@@ -79,11 +79,11 @@ While many Weave Objects are free-form and user-defined, there is often a need f

Here's how to define and use a validated base object:

1. **Define your schema** (in `weave/trace_server/interface/base_object_classes/your_schema.py`):
1. **Define your schema** (in `weave/trace_server/interface/builtin_object_classes/your_schema.py`):

```python
from pydantic import BaseModel
from weave.trace_server.interface.base_object_classes import base_object_def
from weave.trace_server.interface.builtin_object_classes import base_object_def

class NestedConfig(BaseModel):
setting_a: int
@@ -116,7 +116,7 @@ curl -X POST 'https://trace.wandb.ai/obj/create' \
"project_id": "user/project",
"object_id": "my_config",
"val": {...},
"set_base_object_class": "MyConfig"
"object_class": "MyConfig"
}
}'

@@ -154,38 +154,38 @@ Run `make synchronize-base-object-schemas` to ensure the frontend TypeScript typ

### Architecture Flow

1. Define your schema in a python file in the `weave/trace_server/interface/base_object_classes/test_only_example.py` directory. See `weave/trace_server/interface/base_object_classes/test_only_example.py` as an example.
2. Make sure to register your schemas in `weave/trace_server/interface/base_object_classes/base_object_registry.py` by calling `register_base_object`.
1. Define your schema in a python file in the `weave/trace_server/interface/builtin_object_classes/test_only_example.py` directory. See `weave/trace_server/interface/builtin_object_classes/test_only_example.py` as an example.
2. Make sure to register your schemas in `weave/trace_server/interface/builtin_object_classes/builtin_object_registry.py` by calling `register_base_object`.
3. Run `make synchronize-base-object-schemas` to generate the frontend types.
* The first step (`make generate_base_object_schemas`) will run `weave/scripts/generate_base_object_schemas.py` to generate a JSON schema in `weave/trace_server/interface/base_object_classes/generated/generated_base_object_class_schemas.json`.
* The second step (yarn `generate-schemas`) will read this file and use it to generate the frontend types located in `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBaseObjectClasses.zod.ts`.
* The first step (`make generate_base_object_schemas`) will run `weave/scripts/generate_base_object_schemas.py` to generate a JSON schema in `weave/trace_server/interface/builtin_object_classes/generated/generated_builtin_object_class_schemas.json`.
* The second step (yarn `generate-schemas`) will read this file and use it to generate the frontend types located in `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBuiltinObjectClasses.zod.ts`.
4. Now, each use case uses different parts:
1. `Python Writing`. Users can directly import these classes and use them as normal Pydantic models, which get published with `weave.publish`. The python client correct builds the requisite payload.
2. `Python Reading`. Users can `weave.ref().get()` and the weave python SDK will return the instance with the correct type. Note: we do some special handling such that the returned object is not a WeaveObject, but literally the exact pydantic class.
3. `HTTP Writing`. In cases where the client/user does not want to add the special type information, users can publish base objects by setting the `set_base_object_class` setting on `POST obj/create` to the name of the class. The weave server will validate the object against the schema, update the metadata fields, and store the object.
3. `HTTP Writing`. In cases where the client/user does not want to add the special type information, users can publish builtin objects (set of weave.Objects provided by Weave) by setting the `builtin_object_class` setting on `POST obj/create` to the name of the class. The weave server will validate the object against the schema, update the metadata fields, and store the object.
4. `HTTP Reading`. When querying for objects, the server will return the object with the correct type if the `base_object_class` metadata field is set.
5. `Frontend`. The frontend will read the zod schema from `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBaseObjectClasses.zod.ts` and use that to provide compile time type safety when using `useBaseObjectInstances` and runtime type safety when using `useCreateBaseObjectInstance`.
5. `Frontend`. The frontend will read the zod schema from `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBuiltinObjectClasses.zod.ts` and use that to provide compile time type safety when using `useBaseObjectInstances` and runtime type safety when using `useCreateBaseObjectInstance`.
* Note: it is critical that all techniques produce the same digest for the same data - which is tested in the tests. This way versions are not thrashed by different clients/users.
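
The `HTTP Writing` path described above can be sketched as a small payload builder. This is an illustration only: `build_obj_create_payload` is a hypothetical helper, the outer `"obj"` wrapper key is an assumption (the diff shows only the inner fields), and the `val` contents are made up. Because this diff shows both `object_class` (in the curl example and the diagram) and `builtin_object_class` (in the prose) as the request field, the field name is left as a parameter rather than asserted.

```python
import json


def build_obj_create_payload(project_id, object_id, val, class_name,
                             class_field="object_class"):
    """Build a JSON-serializable body for `POST obj/create`.

    `class_field` is the request key naming the builtin class; its
    correct value is not settled by the diff, so it is configurable.
    """
    obj = {
        "project_id": project_id,
        "object_id": object_id,
        "val": val,
        class_field: class_name,
    }
    # Assumed wrapper key; only the inner object is visible in the diff.
    return {"obj": obj}


payload = build_obj_create_payload(
    project_id="user/project",
    object_id="my_config",
    val={"setting_a": 1},  # hypothetical value for illustration
    class_name="MyConfig",
)
body = json.dumps(payload)  # string to send as the request body
```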

```mermaid
graph TD
subgraph Schema Definition
F["weave/trace_server/interface/<br>base_object_classes/your_schema.py"] --> |defines| P[Pydantic BaseObject]
P --> |register_base_object| R["base_object_registry.py"]
P --> |register_base_object| R["builtin_object_registry.py"]
end
subgraph Schema Generation
M["make synchronize-base-object-schemas"] --> G["make generate_base_object_schemas"]
G --> |runs| S["weave/scripts/<br>generate_base_object_schemas.py"]
R --> |import registered classes| S
S --> |generates| J["generated_base_object_class_schemas.json"]
M --> |yarn generate-schemas| Z["generatedBaseObjectClasses.zod.ts"]
S --> |generates| J["generated_builtin_object_class_schemas.json"]
M --> |yarn generate-schemas| Z["generatedBuiltinObjectClasses.zod.ts"]
J --> Z
end
subgraph "Trace Server"
subgraph "HTTP API"
R --> |validates using| HW["POST obj/create<br>set_base_object_class"]
R --> |validates using| HW["POST obj/create<br>object_class"]
HW --> DB[(Weave Object Store)]
HR["POST objs/query<br>base_object_classes"] --> |Filters base_object_class| DB
end
@@ -203,7 +203,7 @@ graph TD
Z --> |import| UBI["useBaseObjectInstances"]
Z --> |import| UCI["useCreateBaseObjectInstance"]
UBI --> |Filters base_object_class| HR
UCI --> |set_base_object_class| HW
UCI --> |object_class| HW
UI[React UI] --> UBI
UI --> UCI
end
5 changes: 3 additions & 2 deletions dev_docs/RELEASE.md
@@ -6,7 +6,8 @@ This document outlines how to publish a new Weave release to our public [PyPI pa

2. You should also run through this [sample notebook](https://colab.research.google.com/drive/1DmkLzhFCFC0OoN-ggBDoG1nejGw2jQZy#scrollTo=29hJrcJQA7jZ) remember to install from master. You can also just run the [quickstart](http://wandb.me/weave_colab).

3. To prepare a PATCH release, go to GitHub Actions and run the `bump-python-sdk-version` workflow on master. This will:
3. To prepare a PATCH release, go to GitHub Actions and run the [bump-python-sdk-version](https://github.com/wandb/weave/actions/workflows/bump_version.yaml) workflow on master. This will:

- Create a new patch version by dropping the pre-release (e.g., `x.y.z-dev0` -> `x.y.z`) and tag this commit with `x.y.z`
- Create a new dev version by incrementing the dev version (e.g., `x.y.z` -> `x.y.(z+1)-dev0`) and commit this to master
- Both of these commits will be pushed to master
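
The version arithmetic in the first two bullets above can be sketched as follows. This is not the workflow's actual implementation, which this diff does not show; `release_version` and `next_dev_version` are hypothetical helpers that assume versions always look like `x.y.z` or `x.y.z-dev0`, and the version numbers used are made up.

```python
import re

# Assumed version shapes: "x.y.z-dev0" for dev builds, "x.y.z" for releases.
_DEV = re.compile(r"^(\d+)\.(\d+)\.(\d+)-dev0$")
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def release_version(dev_version: str) -> str:
    """Drop the pre-release suffix: x.y.z-dev0 -> x.y.z."""
    match = _DEV.match(dev_version)
    if match is None:
        raise ValueError(f"not a dev version: {dev_version!r}")
    return ".".join(match.groups())


def next_dev_version(release: str) -> str:
    """Start the next dev cycle: x.y.z -> x.y.(z+1)-dev0."""
    match = _RELEASE.match(release)
    if match is None:
        raise ValueError(f"not a release version: {release!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch + 1}-dev0"


tagged = release_version("1.2.3-dev0")   # version that gets tagged
following = next_dev_version(tagged)     # version committed back to master
```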
@@ -16,6 +17,6 @@ This document outlines how to publish a new Weave release to our public [PyPI pa

5. Verify the new version of Weave exists in [PyPI](https://pypi.org/project/weave/) once it is complete.

6. Go to GitHub, click the release tag, and click `Draft a New Release`. Select the new tag, and click generate release notes. Publish the release.
6. Go to the [GitHub new release page](https://github.com/wandb/weave/releases/new). Select the new tag, and click "Generate release notes". Publish the release.

7. Finally, announce that the merge freeze is over.
6 changes: 3 additions & 3 deletions docs/docs/guides/evaluation/scorers.md
@@ -224,9 +224,9 @@ In Weave, Scorers are used to evaluate AI outputs and return evaluation metrics.
```

### Mapping Column Names with `columnMapping`
:::warning
:::important

In TypeScript, this feature is currently on the `Evaluation` object, not individual scorers!
In TypeScript, this feature is currently on the `Evaluation` object, not individual scorers.

:::

@@ -455,7 +455,7 @@ In Weave, Scorers are used to evaluate AI outputs and return evaluation metrics.
from weave.scorers import OpenAIModerationScorer
from openai import OpenAI

oai_client = OpenAI(api_key=...) # initialize your LLM client here
oai_client = OpenAI() # initialize your LLM client here

scorer = OpenAIModerationScorer(
client=oai_client,
1 change: 0 additions & 1 deletion docs/docs/guides/integrations/local_models.md
@@ -14,7 +14,6 @@ First and most important, is the `base_url` change during the `openai.OpenAI()`

```python
client = openai.OpenAI(
api_key='fake',
base_url="http://localhost:1234",
)
```
1 change: 0 additions & 1 deletion docs/docs/guides/integrations/notdiamond.md
@@ -68,7 +68,6 @@ preference_id = train_router(
response_column="actual",
language="en",
maximize=True,
api_key=api_key,
)
```

43 changes: 28 additions & 15 deletions docs/docs/guides/platform/index.md
@@ -1,31 +1,44 @@
# Platform & Security

Weave is available on [W&B SaaS Cloud](https://docs.wandb.ai/guides/hosting/hosting-options/saas_cloud) which is a multi-tenant, fully-managed platform deployed in W&B's Google Cloud Platform (GCP) account in a North America region.
Weave is available on the following deployment options:

:::info
It's coming soon on [W&B Dedicated Cloud](https://docs.wandb.ai/guides/hosting/hosting-options/dedicated_cloud). Reach out to your W&B team if that would be of interest in your organization.
:::
- **[W&B SaaS Cloud](https://docs.wandb.ai/guides/hosting/hosting-options/saas_cloud):** A multi-tenant, fully-managed platform deployed in W&B's Google Cloud Platform (GCP) account in a North America region.
- **[W&B Dedicated Cloud](https://docs.wandb.ai/guides/hosting/hosting-options/dedicated_cloud):** Generally available on AWS and in preview on GCP and Azure.
- **Self-managed instances:** For teams that prefer to host Weave independently, guidance is available from your W&B team to evaluate deployment options.

## Identity & Access Management
## Identity and Access Management

Use the identity and access management capabilities for secure authentication and effective authorization in your [W&B Organization](https://docs.wandb.ai/guides/hosting/iam/org_team_struct#organization). The following capabilities are available for Weave users in W&B SaaS Cloud:
Use the identity and access management capabilities for secure authentication and effective authorization in your [W&B Organization](https://docs.wandb.ai/guides/hosting/iam/org_team_struct#organization). The following capabilities are available for Weave users depending on your deployment option and [pricing plan](https://wandb.ai/site/pricing/):

* Authenticate using Single-Sign On (SSO), with available options being Google, Github, Microsoft, and [OIDC providers](https://docs.wandb.ai/guides/technical-faq/general#does-wb-support-sso-for-saas)
* [Team-based access control](https://docs.wandb.ai/guides/hosting/iam/manage-users#manage-a-team), where each team may correspond to a business unit / function, department, or a project team in your company
* Use W&B projects to organize different initiatives within a team, and configure the required [visibility scope](https://docs.wandb.ai/guides/hosting/restricted-projects) for each project
- **Authenticate using Single-Sign On (SSO):** Options include public identity providers like Google and Github, as well as enterprise providers such as Okta, Azure Active Directory, and others, [using OIDC](https://docs.wandb.ai/guides/technical-faq/general#does-wb-support-sso-for-saas).
- **[Team-based logical separation](https://docs.wandb.ai/guides/hosting/iam/manage-organization/#add-and-manage-teams):** Each team may correspond to a business unit, department, or project team within your organization.
- **Use W&B projects to organize initiatives:** Organize initiatives within teams and configure the required [visibility scope](https://docs.wandb.ai/guides/hosting/restricted-projects), including the `restricted` scope for sensitive collaborations.
- **Role-based access control:** Configure access at the [team](https://docs.wandb.ai/guides/hosting/iam/manage-organization#assign-or-update-a-team-members-role) or [project](https://docs.wandb.ai/guides/hosting/iam/restricted-projects#project-level-roles) level to ensure users access data on a need-to-know basis.
- **Scoped service accounts:** Automate Gen AI workflows using service accounts scoped to your organization or team.
- **[SCIM API and Python SDK](https://docs.wandb.ai/guides/hosting/iam/automate_iam):** Manage users and teams efficiently with SCIM API and Python SDK.

## Data Security

In the W&B SaaS Cloud, data of all Weave users is stored in a shared cloud storage and is processed using shared compute services. The shared cloud storage is encrypted using the cloud-native encryption mechanism. When reading or writing data on behalf of a user, a security context comprising of the user's W&B organization, team and project is utilized to ensure data path isolation.
- **SaaS Cloud:** Data for all Weave users is stored in a shared Clickhouse Cloud cluster, encrypted using cloud-native encryption. Shared compute services process the data, ensuring isolation through a security context comprising your W&B organization, team, and project.

- **Dedicated Cloud:** Data is stored in a unique Clickhouse Cloud cluster in the cloud and region of your choice. A unique compute environment processes the data, with the following additional protections:
- **[IP allowlisting](https://docs.wandb.ai/guides/hosting/data-security/ip-allowlisting):** Authorize access to your instance from specific IP addresses. This is an optional capability.
- **[Private connectivity](https://docs.wandb.ai/guides/hosting/data-security/private-connectivity):** Route data securely through the cloud provider's private network. This is an optional capability.
- **[Data encryption](https://docs.wandb.ai/guides/hosting/data-security/data-encryption):** W&B encrypts data at rest using a unique W&B-managed encryption key.
- **Clickhouse cluster security:** W&B connects to the unique Clickhouse Cloud cluster for your Dedicated Cloud instance over the cloud provider's private network. W&B also encrypts the cluster using a unique W&B-managed encryption key, while leveraging Clickhouse's file level encryption.

:::note
[Secure storage connector](https://docs.wandb.ai/guides/hosting/secure-storage-connector) is not applicable to Weave.
:::important
[The W&B Platform secure storage connector or BYOB](https://docs.wandb.ai/guides/hosting/data-security/secure-storage-connector) is not available for Weave.
:::

## Maintenance
## Maintenance

If you're using Weave on W&B SaaS Cloud, you do not incur the overhead and costs of provisioning and maintaining the W&B platform. It's all fully managed for you.
If you're using Weave on SaaS Cloud or Dedicated Cloud, you avoid the overhead and costs of provisioning, operating, and maintaining the W&B platform, as it is fully managed for you.

## Compliance

Security controls for W&B SaaS Cloud are periodically audited internally and externally. Refer to the [W&B Security Portal](https://security.wandb.ai/) to request the SOC2 report and other security and compliance documents.
:::tip
To request SOC 2 reports and other security and compliance documents, refer to the [W&B Security Portal](https://security.wandb.ai/) or contact your W&B team for more information.
:::

Security controls for both SaaS Cloud and Dedicated Cloud are periodically audited internally and externally. Both platforms are SOC 2 Type II compliant. Additionally, Dedicated Cloud is HIPAA-compliant for organizations managing PHI data while building Generative AI applications.