Skip to content

[DO NOT MERGE] feat: Check Nix package is working as part of the CI workflow #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

sestrella
Copy link
Member

@sestrella sestrella commented May 27, 2025

This PR was created primarily to test the new Nix job added to the CI workflow.

Copy link

github-actions bot commented May 27, 2025


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


0 out of 4 committers have signed the CLA.
@bolinfest
@fouad-openai
@rgwood-dd
@sestrella
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

…openai#1137)

This PR introduces support for `-c`/`--config` so users can override
individual config values on the command line using `--config
name=value`. Example:

```
codex --config model=o4-mini
```

Making it possible to set arbitrary config values on the command line
results in a more flexible configuration scheme and makes it easier to
provide single-line examples that can be copy-pasted from documentation.

Effectively, it means there are four levels of configuration for some
values:

- Default value (e.g., `model` currently defaults to `o4-mini`)
- Value in `config.toml` (e.g., user could override the default to be
`model = "o3"` in their `config.toml`)
- Specifying `-c` or `--config` to override `model` (e.g., user can
include `-c model=o3` in their list of args to Codex)
- If available, a config-specific flag can be used, which takes
precedence over `-c` (e.g., user can specify `--model o3` in their list
of args to Codex)

Now that it is possible to specify anything that could be configured in
`config.toml` on the command line using `-c`, we do not need to have a
custom flag for every possible config option (which can clutter the
output of `--help`). To that end, as part of this PR, we drop support
for the `--disable-response-storage` flag, as users can now specify `-c
disable_response_storage=true` to get the equivalent functionality.

Under the hood, this works by loading the `config.toml` into a
`toml::Value`. Then for each `key=value`, we create a small synthetic
TOML file with `value` so that we can run the TOML parser to get the
equivalent `toml::Value`. We then parse `key` to determine the point in
the original `toml::Value` to do the insert/replace. Once all of the
overrides from `-c` args have been applied, the `toml::Value` is
deserialized into a `ConfigToml` and then the `ConfigOverrides` are
applied, as before.
@sestrella sestrella force-pushed the add_nix_job_to_ci branch from 808b0b1 to c40aa34 Compare May 28, 2025 20:31
bolinfest added 4 commits May 28, 2025 14:03
The motivation behind this PR is to make it so a `HistoryCell` is more
like a `WidgetRef` that knows how to render itself into a `Rect` so that
it can be backed by something other than a `Vec<Line>`. Because a
`HistoryCell` is intended to appear in a scrollable list, we want to
ensure the stack of cells can be scrolled one `Line` at a time even if
the `HistoryCell` is not backed by a `Vec<Line>` itself.

To this end, we introduce the `CellWidget` trait whose key method is:

```
fn render_window(&self, first_visible_line: usize, area: Rect, buf: &mut Buffer);
```

The `first_visible_line` param is what differs from
`WidgetRef::render_ref()`, as a `CellWidget` needs to know the offset
into its "full view" at which it should start rendering.

The bookkeeping in `ConversationHistoryWidget` has been updated
accordingly to ensure each `CellWidget` in the history is rendered
appropriately.
We had `debug!()` logging statements already, but they weren't being
printed because `tracing_subscriber` was not set up.
…en talking to OpenAI (openai#1150)

As noted in the comment introduced in this PR, this is analogous to the
issue reported in
openai/openai-agents-python#449. This seems to
work now.
)

The output of an MCP server tool call can be one of several types, but
to date, we treated all outputs as text by showing the serialized JSON
as the "tool output" in Codex:


https://github.com/openai/codex/blob/25a9949c49194d5a64de54a11bcc5b4724ac9bd5/codex-rs/mcp-types/src/lib.rs#L96-L101

This PR adds support for the `ImageContent` variant so we can now
display an image output from an MCP tool call.

In making this change, we introduce a new
`ResponseInputItem::McpToolCallOutput` variant so that we can work with
the `mcp_types::CallToolResult` directly when the function call is made
to an MCP server.

Though arguably the more significant change is the introduction of
`HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that
uses `ratatui_image` to render an image into the terminal. To support
this, we introduce `ImageRenderCache`, cache a
`ratatui_image::picker::Picker`, and `ensure_image_cache()` to cache the
appropriate scaled image data and dimensions based on the current
terminal size.

To test, I created a minimal `package.json`:

```json
{
  "name": "kitty-mcp",
  "version": "1.0.0",
  "type": "module",
  "description": "MCP that returns image of kitty",
  "main": "index.js",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0"
  }
}
```

with the following `index.js` to define the MCP server:

```js
#!/usr/bin/env node

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

const IMAGE_URI = "image://Ada.png";

const server = new McpServer({
  name: "Demo",
  version: "1.0.0",
});

server.tool(
  "get-cat-image",
  "If you need a cat image, this tool will provide one.",
  async () => ({
    content: [
      { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" },
    ],
  })
);

server.resource("Ada the Cat", IMAGE_URI, async (uri) => {
  const base64Image = await getAdaPngBase64();
  return {
    contents: [
      {
        uri: uri.href,
        mimeType: "image/png",
        blob: base64Image,
      },
    ],
  };
});

async function getAdaPngBase64() {
  const __dirname = new URL(".", import.meta.url).pathname;
  // From https://github.com/benjajaja/ratatui-image/blob/9705ce2c59ec669abbce2924cbfd1f5ae22c9860/assets/Ada.png
  const filePath = join(__dirname, "Ada.png");
  const imageData = await readFile(filePath);
  const base64Image = imageData.toString("base64");
  return base64Image;
}

const transport = new StdioServerTransport();
await server.connect(transport);
```

With the local changes from this PR, I added the following to my
`config.toml`:

```toml
[mcp_servers.kitty]
command = "node"
args = ["/Users/mbolin/code/kitty-mcp/index.js"]
```

Running the TUI from source:

```
cargo run --bin codex -- --model o3 'I need a picture of a cat'
```

I get:

<img width="732" alt="image"
src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869"
/>

Now, that said, I have only tested in iTerm and there is definitely some
funny business with getting an accurate character-to-pixel ratio
(sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10
rows to render instead of 4), so there is still work to be done here.
@sestrella sestrella force-pushed the add_nix_job_to_ci branch from c40aa34 to add7c12 Compare May 29, 2025 14:05
@sestrella sestrella force-pushed the add_nix_job_to_ci branch from add7c12 to 1275055 Compare May 30, 2025 02:16
bolinfest added 14 commits May 30, 2025 09:11
The way these definitions worked before, they did not handle quoted args
with spaces properly.

For example, if you had `/tmp/test-just/printlen.py` as:

```python
#!/usr/bin/env python3

import sys

print(len(sys.argv))
```

and your `justfile` was:

```
printlen *args:
    /tmp/test-just/printlen.py {{args}}
```

Then:

```shell
$ just printlen foo bar
3
$ just printlen 'foo bar'
3
```

which is not what we want: `'foo bar'` should be treated as one
argument.

The fix is to use
[positional-arguments](https://github.com/casey/just/blob/515e806b5121a4696113ef15b5f0b12e69854570/README.md#L1131):

```
set positional-arguments

printlen *args:
    /tmp/test-just/printlen.py "$@"
```
This is a first cut at a GitHub Action that lets you define prompt
templates in `.md` files under `.github/codex/labels` that will run
Codex with the associated prompt when the label is added to a GitHub
pull request.

For example, this PR includes these files:

```
.github/codex/labels/codex-attempt.md
.github/codex/labels/codex-code-review.md
.github/codex/labels/codex-investigate-issue.md
```

And the new `.github/workflows/codex.yml` workflow declares the
following triggers:

```yaml
on:
  issues:
    types: [opened, labeled]
  pull_request:
    branches: [main]
    types: [labeled]
```

as well as the following expression to gate the action:

```
jobs:
  codex:
    if: |
      (github.event_name == 'issues' && (
        (github.event.action == 'labeled' && (github.event.label.name == 'codex-attempt' || github.event.label.name == 'codex-investigate-issue'))
      )) ||
      (github.event_name == 'pull_request' && github.event.action == 'labeled' && github.event.label.name == 'codex-code-review')
```

Note the "actor" who added the label must have write access to the repo
for the action to take effect.

After adding a label, the action will "ack" the request by replacing the
original label (e.g., `codex-review`) with an `-in-progress` suffix
(e.g., `codex-review-in-progress`). When it is finished, it will swap
the `-in-progress` label with a `-completed` one (e.g.,
`codex-review-completed`).

Users of the action are responsible for providing an `OPENAI_API_KEY`
and making it available as a secret to the action.
We should do some work to share the setup logic across `codex.yml`,
`ci.yml`, and `rust-ci.yml`.
https://github.com/openai/codex/actions/runs/15352839832/job/43205041563
appeared to fail around `postComment()`, but I don't see the output from
`fail()` in the logs. Adding a bit more info.
…ns.rs (openai#1177)

The main motivator behind this PR is that `stream_chat_completions()`
was not adding the `"tools"` entry to the payload posted to the
`/chat/completions` endpoint. This (1) refactors the existing logic to
build up the `"tools"` JSON from `client.rs` into `openai_tools.rs`, and
(2) updates the use of responses API (`client.rs`) and chat completions
API (`chat_completions.rs`) to both use it.

Note this PR alone is not sufficient to get tool calling from chat
completions working: that is done in
openai#1167.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1177).
* openai#1167
* __->__ openai#1177
…f not a TTY (openai#1178)

This attempts to make `codex exec` more flexible in how the prompt can
be passed:

* as before, it can be passed as a single string argument
* if `-` is passed as the value, the prompt is read from stdin
* if no argument is passed _and stdin is a tty_, prints a warning to
stderr that no prompt was specified an exits non-zero.
* if no argument is passed _and stdin is NOT a tty_, prints `Reading
prompt from stdin...` to stderr to let the user know that Codex will
wait until it reads EOF from stdin to proceed. (You can repro this case
by doing `yes | just exec` since stdin is not a TTY in that case but it
also never reaches EOF).
Fixes:

* Instantiate `EventProcessor` earlier in `lib.rs` so
`print_config_summary()` can be an instance method of it and leverage
its various `Style` fields to ensure it honors `with_ansi` properly.
* After printing the config summary, print out user's prompt with the
heading `User instructions:`. As noted in the comment, now that we can
read the instructions via stdin as of openai#1178, it is helpful to the user
to ensure they know what instructions were given to Codex.
* Use same colors/bold/italic settings for headers as the TUI, making
the output a bit easier to read.
This required changing `ts_println!()` to take `$self:ident`, which is a
bit more verbose, but the usability improvement seems worth it.

Also eliminated an unnecessary `.to_string()` while here.
This PR introduces a `hide_agent_reasoning` config option (that defaults
to `false`) that users can enable to make the output less verbose by
suppressing reasoning output.

To test, verified that this includes agent reasoning in the output:

```
echo hello | just exec
```

whereas this does not:

```
echo hello | just exec --config hide_agent_reasoning=false
```
The TypeScript version of the CLI shows the version when it starts up,
which is helpful when users share screenshots (and nice to know, as a
user).
@sestrella sestrella force-pushed the add_nix_job_to_ci branch from 1275055 to 698d891 Compare June 2, 2025 15:29
Update what we log to make `RUST_LOG=debug` a bit easier to work with.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1196).
* openai#1167
* __->__ openai#1196
bolinfest and others added 16 commits June 2, 2025 13:47
Prior to this PR, there were two big misses in `chat_completions.rs`:

1. The loop in `stream_chat_completions()` was only including items of
type `ResponseItem::Message` when building up the `"messages"` JSON for
the `POST` request to the `chat/completions` endpoint. This fixes things
by ensuring other variants (`FunctionCall`, `LocalShellCall`, and
`FunctionCallOutput`) are included, as well.
2. In `process_chat_sse()`, we were not recording tool calls and were
only emitting items of type
`ResponseEvent::OutputItemDone(ResponseItem::Message)` to the stream.
Now we introduce `FunctionCallState`, which is used to accumulate the
`delta`s of type `tool_calls`, so we can ultimately emit a
`ResponseItem::FunctionCall`, when appropriate.

While function calling now appears to work for chat completions with my
local testing, I believe that there are still edge cases that are not
covered and that this codepath would benefit from a battery of
integration tests. (As part of that further cleanup, we should also work
to support streaming responses in the UI.)

The other important part of this PR is some cleanup in
`core/src/codex.rs`. In particular, it was hard to reason about how
`run_task()` was building up the list of messages to include in a
request across the various cases:

- Responses API
- Chat Completions API
- Responses API used in concert with ZDR

I like to think things are a bit cleaner now where:

- `zdr_transcript` (if present) contains all messages in the history of
the conversation, which includes function call outputs that have not
been sent back to the model yet
- `pending_input` includes any messages the user has submitted while the
turn is in flight that need to be injected as part of the next `POST` to
the model
- `input_for_next_turn` includes the tool call outputs that have not
been sent back to the model yet
Previous to this PR, we always set `reasoning` when making a request
using the Responses API:


https://github.com/openai/codex/blob/d7245cbbc9d8ff5446da45e5951761103492476d/codex-rs/core/src/client.rs#L108-L111

Though if you tried to use the Rust CLI with `--model gpt-4.1`, this
would fail with:

```shell
"Unsupported parameter: 'reasoning.effort' is not supported with this model."
```

We take a cue from the TypeScript CLI, which does a check on the model
name:


https://github.com/openai/codex/blob/d7245cbbc9d8ff5446da45e5951761103492476d/codex-cli/src/utils/agent/agent-loop.ts#L786-L789

This PR does a similar check, though also adds support for the following
config options:

```
model_reasoning_effort = "low" | "medium" | "high" | "none"
model_reasoning_summary = "auto" | "concise" | "detailed" | "none"
```

This way, if you have a model whose name happens to start with `"o"` (or
`"codex"`?), you can set these to `"none"` to explicitly disable
reasoning, if necessary. (That said, it seems unlikely anyone would use
the Responses API with non-OpenAI models, but we provide an escape
hatch, anyway.)

This PR also updates both the TUI and `codex exec` to show `reasoning
effort` and `reasoning summaries` in the header.
As explained on https://crates.io/crates/regex-lite, `regex-lite` is a
lighter alternative to `regex` and seems to be sufficient for our
purposes.
As explained in detail in the doc comment for `ParseMode::Lenient`, we
have observed that GPT-4.1 does not always generate a valid invocation
of `apply_patch`. Fortunately, the error is predictable, so we introduce
some new logic to the `codex-apply-patch` crate to recover from this
error.

Because we would like to avoid this becoming a de facto standard (as it
would be incompatible if `apply_patch` were provided as an actual
executable, unless we also introduced the lenient behavior in the
executable, as well), we require passing `ParseMode::Lenient` to
`parse_patch_text()` to make it clear that the caller is opting into
supporting this special case.

Note the analogous change to the TypeScript CLI was
openai#930. In addition to changing the
accepted input to `apply_patch`, it also introduced additional
instructions for the model, which we include in this PR.

Note that `apply-patch` does not depend on either `regex` or
`regex-lite`, so some of the checks are slightly more verbose to avoid
introducing this dependency.

That said, this PR does not leverage the existing
`extract_heredoc_body_from_apply_patch_command()`, which depends on
`tree-sitter` and `tree-sitter-bash`:


https://github.com/openai/codex/blob/5a5aa899143f9b9ef606692c401b010368b15bdb/codex-rs/apply-patch/src/lib.rs#L191-L246

though perhaps it should.
…ai#1207)

This fixes a longstanding error in the Rust CLI where `codex.rs`
contained an errant `is_first_turn` check that would exclude the user
instructions for subsequent "turns" of a conversation when using the
responses API (i.e., when `previous_response_id` existed).

While here, renames `Prompt.instructions` to `Prompt.user_instructions`
since we now have quite a few levels of instructions floating around.
Also removed an unnecessary use of `clone()` in
`Prompt.get_full_instructions()`.
This PR overhauls how active tool calls and completed tool calls are
displayed:

1. More use of colour to indicate success/failure and distinguish
between components like tool name+arguments
2. Previously, the entire `CallToolResult` was serialized to JSON and
pretty-printed. Now, we extract each individual `CallToolResultContent`
and print those
1. The previous solution was wasting space by unnecessarily showing
details of the `CallToolResult` struct to users, without formatting the
actual tool call results nicely
2. We're now able to show users more information from tool results in
less space, with nicer formatting when tools return JSON results

### Before:

<img width="1251" alt="Screenshot 2025-06-03 at 11 24 26"
src="https://github.com/user-attachments/assets/5a58f222-219c-4c53-ace7-d887194e30cf"
/>

### After:

<img width="1265" alt="image"
src="https://github.com/user-attachments/assets/99fe54d0-9ebe-406a-855b-7aa529b91274"
/>

## Future Work

1. Integrate image tool result handling better. We should be able to
display images even if they're not the first `CallToolResultContent`
2. Users should have some way to view the full version of truncated tool
results
3. It would be nice to add some left padding for tool results, make it
more clear that they are results. This is doable, just a little fiddly
due to the way `first_visible_line` scrolling works
4. There's almost certainly a better way to format JSON than "all on 1
line with spaces to make Ratatui wrapping work". But I think that works
OK for now.
This does not implement the full Login with ChatGPT experience, but it
should unblock people.

**What works**

* The `codex` multitool now has a `login` subcommand, so you can run
`codex login`, which should write `CODEX_HOME/auth.json` if you complete
the flow successfully. The TUI will now read the `OPENAI_API_KEY` from
`auth.json`.
* The TUI should refresh the token if it has expired and the necessary
information is in `auth.json`.
* There is a `LoginScreen` in the TUI that tells you to run `codex
login` if both (1) your model provider expects to use `OPENAI_API_KEY`
as its env var, and (2) `OPENAI_API_KEY` is not set.

**What does not work**

* The `LoginScreen` does not support the login flow from within the TUI.
Instead, it tells you to quit, run `codex login`, and then run `codex`
again.
* `codex exec` does read from `auth.json` yet, nor does it direct the
user to go through the login flow if `OPENAI_API_KEY` is not be found.
* The `maybeRedeemCredits()` function from `get-api-key.tsx` has not
been ported from TypeScript to `login_with_chatgpt.py` yet:


https://github.com/openai/codex/blob/a67a67f3258fc21e147b6786a143fe3e15e6d5ba/codex-cli/src/utils/get-api-key.tsx#L84-L89

**Implementation**

Currently, the OAuth flow requires running a local webserver on
`127.0.0.1:1455`. It seemed wasteful to incur the additional binary cost
of a webserver dependency in the Rust CLI just to support login, so
instead we implement this logic in Python, as Python has a `http.server`
module as part of its standard library. Specifically, we bundle the
contents of a single Python file as a string in the Rust CLI and then
use it to spawn a subprocess as `python3 -c
{{SOURCE_FOR_PYTHON_SERVER}}`.

As such, the most significant files in this PR are:

```
codex-rs/login/src/login_with_chatgpt.py
codex-rs/login/src/lib.rs
```

Now that the CLI may load `OPENAI_API_KEY` from the environment _or_
`CODEX_HOME/auth.json`, we need a new abstraction for reading/writing
this variable, so we introduce:

```
codex-rs/core/src/openai_api_key.rs
```

Note that `std::env::set_var()` is [rightfully] `unsafe` in Rust 2024,
so we use a LazyLock<RwLock<Option<String>>> to store `OPENAI_API_KEY`
so it is read in a thread-safe manner.

Ultimately, it should be possible to go through the entire login flow
from the TUI. This PR introduces a placeholder `LoginScreen` UI for that
right now, though the new `codex login` subcommand introduced in this PR
should be a viable workaround until the UI is ready.

**Testing**

Because the login flow is currently implemented in a standalone Python
file, you can test it without building any Rust code as follows:

```
rm -rf /tmp/codex_home && mkdir /tmp/codex_home
CODEX_HOME=/tmp/codex_home python3 codex-rs/login/src/login_with_chatgpt.py
```

For reference:

* the original TypeScript implementation was introduced in
openai#963
* support for redeeming credits was later added in
openai#974
Users were running into issues with glibc mismatches on arm64 linux. In
the past, we did not provide a musl build for arm64 Linux because we had
trouble getting the openssl dependency to build correctly. Though today
I just tried the same trick in `Cargo.toml` that we were doing for
`x86_64-unknown-linux-musl` (using `openssl-sys` with `features =
["vendored"]`), so I'm not sure what problem we had in the past the
builds "just worked" today!

Though one tweak that did have to be made is that the integration tests
for Seccomp/Landlock empirically require longer timeouts on arm64 linux,
or at least on the `ubuntu-24.04-arm` GitHub Runner. As such, we change
the timeouts for arm64 in `codex-rs/linux-sandbox/tests/landlock.rs`.

Though in solving this problem, I decided I needed a turnkey solution
for testing the Linux build(s) from my Mac laptop, so this PR introduces
`.devcontainer/Dockerfile` and `.devcontainer/devcontainer.json` to
facilitate this. Detailed instructions are in `.devcontainer/README.md`.

We will update `dotslash-config.json` and other release-related scripts
in a follow-up PR.
…t of release artifacts (openai#1230)

This was missed in openai#1225. Once we
create a new GitHub Release with this change, we can use the URL from
the workflow that triggered the release in
openai#1228.
…gnu (openai#1228)

Now that we have published a GitHub Release that contains arm64 musl
artifacts for Linux, update the following scripts to take advantage of
them:

- `dotslash-config.json` now uses musl artifacts for the `linux-aarch64`
target
- `install_native_deps.sh` for the TypeScript CLI now includes
`codex-linux-sandbox-aarch64-unknown-linux-musl` instead of
`codex-linux-sandbox-aarch64-unknown-linux-gnu` for sandboxing
- `codex-cli/bin/codex.js` now checks for `aarch64-unknown-linux-musl`
artifacts instead of `aarch64-unknown-linux-gnu` ones
…inux (openai#1232)

Target a workflow with more recent binary artifacts.
…at (openai#1264)

I noticed that `/clear` wasn't fully clearing chat history; it would
clear the chat history widgets _in the UI_, but the LLM still had access
to information from previous messages.

This PR renames `/clear` to `/new` for clarity as per Michael's
suggestion, resetting `app_state` to a fresh `ChatWidget`.
…penai#1267)

Let users know about what the Rust CLI supports that the TypeScript CLI
doesn't!
@sestrella sestrella force-pushed the add_nix_job_to_ci branch from 698d891 to 6a16dff Compare June 13, 2025 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants