[Bug]: The 'view' tool command doesn't work on /workspace #5497

enyst · 2024-12-10T03:12:42Z

Is there an existing issue for the same bug?

I have checked the existing issues.

Describe the bug and reproduction steps

I tried to nudge the LLM to use its 'view' command from the file_editor tool, to view directory content with depth=2, when it starts exploring the filesystem. (the first action of the agent). Claude Sonnet has been trained on those as far as we know, and Anthropic's tool use examples include this.

When I tried to eval it on swe-bench, the LLM ran into trouble, it couldn't read the output when it ran it on /workspace. So it did things like:
"I see the issue - it's a symlink to /testbed. Let's explore the actual directory"
...
"{\"path\": \"/testbed/django/db/backends/base/schema.py\"

Cc: @ryanhoangt @xingyaoww

Logs, Errors, Screenshots, and Additional Context

No response

The text was updated successfully, but these errors were encountered:

ryanhoangt · 2024-12-10T05:09:38Z

Seems like the find {path} -maxdepth 2 -not -path '*/\.*' doesn't play well with the current symlink setup for swe-bench evaluation. I tried executing this command on a symlink directory and it didn't list any files. I suspect this may cause the agent to be under-performing?

Not quite sure if we can simply fix it with -L option or need a more complex modification.

ryanhoangt · 2024-12-10T05:36:23Z

For visibility, below is what the issue looks like from the swe-bench trajectory:

...
    {
            "id": 3,
            "timestamp": "2024-11-21T12:13:39.311499",
            "source": "agent",
            "message": "Running Python code interactively: print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))",
            "action": "run_ipython",
            "tool_call_metadata": {
                "function_name": "str_replace_editor",
                "tool_call_id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                "model_response": {
                    "id": "chatcmpl-AW0Ly6chyLtCqh0P6VtQbOHik9E7m",
                    "created": 1732191218,
                    "model": "gpt-4o-2024-05-13",
                    "object": "chat.completion",
                    "system_fingerprint": "fp_04751d0b65",
                    "choices": [
                        {
                            "finish_reason": "tool_calls",
                            "index": 0,
                            "message": {
                                "content": null,
                                "role": "assistant",
                                "tool_calls": [
                                    {
                                        "function": {
                                            "arguments": "{\"command\":\"view\",\"path\":\"/workspace/django__django__3.2\"}",
                                            "name": "str_replace_editor"
                                        },
                                        "id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                                        "type": "function"
                                    }
                                ],
                                "function_call": null
                            }
                        }
                    ],
                    "usage": {
                        "completion_tokens": 28,
                        "prompt_tokens": 1563,
                        "total_tokens": 1591,
                        "completion_tokens_details": null,
                        "prompt_tokens_details": null
                    },
                    "service_tier": null,
                    "prompt_filter_results": [
                        {
                            "prompt_index": 0,
                            "content_filter_results": {}
                        }
                    ]
                },
                "total_calls_in_response": 1
            },
            "args": {
                "code": "print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))",
                "thought": "",
                "include_extra": false,
                "confirmation_state": "confirmed",
                "kernel_init_code": ""
            },
            "timeout": 300
        },
        {
            "id": 4,
            "timestamp": "2024-11-21T12:13:39.339866",
            "source": "agent",
            "message": "Code executed in IPython cell.",
            "cause": 3,
            "observation": "run_ipython",
            "tool_call_metadata": {
                "function_name": "str_replace_editor",
                "tool_call_id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                "model_response": {
                    "id": "chatcmpl-AW0Ly6chyLtCqh0P6VtQbOHik9E7m",
                    "created": 1732191218,
                    "model": "gpt-4o-2024-05-13",
                    "object": "chat.completion",
                    "system_fingerprint": "fp_04751d0b65",
                    "choices": [
                        {
                            "finish_reason": "tool_calls",
                            "index": 0,
                            "message": {
                                "content": null,
                                "role": "assistant",
                                "tool_calls": [
                                    {
                                        "function": {
                                            "arguments": "{\"command\":\"view\",\"path\":\"/workspace/django__django__3.2\"}",
                                            "name": "str_replace_editor"
                                        },
                                        "id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                                        "type": "function"
                                    }
                                ],
                                "function_call": null
                            }
                        }
                    ],
                    "usage": {
                        "completion_tokens": 28,
                        "prompt_tokens": 1563,
                        "total_tokens": 1591,
                        "completion_tokens_details": null,
                        "prompt_tokens_details": null
                    },
                    "service_tier": null,
                    "prompt_filter_results": [
                        {
                            "prompt_index": 0,
                            "content_filter_results": {}
                        }
                    ]
                },
                "total_calls_in_response": 1
            },
            "content": "Here's the files and directories up to 2 levels deep in /workspace/django__django__3.2, excluding hidden items:\n/workspace/django__django__3.2",
            "extras": {
                "code": "print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))"
            }
    }
...

@enyst Can you help double check again to see if path is /workspace or /workspace/$WORKSPACE_NAME?

enyst · 2024-12-10T06:01:34Z

I saw the same you see here, it had 'django' in the path, but I'm double checking.

enyst · 2024-12-10T07:20:20Z

This was the test I made: #5506

I tested what if it uses this more, to see how it does. With the resolver I have noticed lately that it works very well when it uses this 'view' tool a lot for the initial exploration.

From the events:

"content": [
        {
          "type": "text",
          "text": "I'll help you implement the necessary changes to fix the issue with `makemigrations` incorrectly handling inner class paths. Let's follow the steps:\n\n1. First, let's explore the repository structure:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"view\", \"path\": \"/workspace/django__django__3.1\"}",
            "name": "str_replace_editor"
          },
          "id": "tooluse_GAwv0bSSRwG27F9lQldiXQ",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Here's the files and directories up to 2 levels deep in /workspace/django__django__3.1, excluding hidden items:\n/workspace/django__django__3.1\n\n"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_GAwv0bSSRwG27F9lQldiXQ",
      "name": "str_replace_editor"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Hmm, seems like the directory is empty. Let me check the parent directory:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"path\": \"/workspace\", \"command\": \"view\"}",
            "name": "str_replace_editor"
          },
          "id": "tooluse_tMvn_dM6TVOHmKUvitPWFQ",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Here's the files and directories up to 2 levels deep in /workspace, excluding hidden items:\n/workspace\n/workspace/django__django__3.1\n\n"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_tMvn_dM6TVOHmKUvitPWFQ",
      "name": "str_replace_editor"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Let me check if there's an issue with the repository:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"cd /workspace/django__django__3.1 && ls -la\"}",
            "name": "execute_bash"
          },
          "id": "tooluse_o6ka-SP5QPmif7iwsDFDIw",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "cd /workspace/django__django__3.1 && ls -la\r\ntotal 72\r\ndrwxrwxrwx 2 root root   580 Dec  9 15:03 .\r\ndrwxr-xr-x 2 root root   180 Dec  9 15:03 ..\r\n-rw-rw-r-- 1 root root   782 Dec  9 15:03 .editorconfig\r\n-rw-rw-r-- 1 root root    86 Dec  9 15:03 .eslintignore\r\n-rw-rw-r-- 1 root root  1165 Dec  9 15:03 .eslintrc\r\ndrwxrw............... 2 root root  4340 Dec  9 15:03 tests\r\n-rw-rw-r-- 1 root root  1535 Dec  9 15:03 tox.ini[Python Interpreter: /opt/miniconda3/envs/testbed/bin/python]\nroot@openhands-workspace:/workspace/django__django__3.1 # \n[Command finished with exit code 0]"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_o6ka-SP5QPmif7iwsDFDIw",
      "name": "execute_bash"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Great, now I can see the repository structure. The issue is related to model field serialization in migrations, so we need to look at the Django code that handles this. Let's first check the django/db/migrations directory:"
        }

enyst · 2024-12-10T07:34:21Z

Also the LLM was doing stuff like this 😅

"tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"cd /testbed && PYTHONPATH=/testbed python3 /workspace/reproduce.py\"}",
            "name": "execute_bash"
          },
          "id": "toolu_016Dayy583oQpUMo86B8D2Kx",
          "type": "function"
        }
      ]

ryanhoangt · 2024-12-10T07:40:59Z

Thanks, it seems to fail for both /workspace and /workspace/$WORKSPACE_NAME.

With the resolver I have noticed lately that it works very well when it uses this 'view' tool a lot for the initial exploration.

Yeah, I guess it's probably better than using ls because ls -la is a bit too little, and ls -R is a bit too much. I'm also curious if always using view as the initial step can help improve the evaluation score.

enyst · 2024-12-10T12:45:40Z

Exactly my thoughts! And once your fix is merged, we can eval the prompt change.

ls -R is a bit too much.

Definitely, and there's more:

on the django repo, it's well beyond the limit for an observation (120k tokens! 😮), so we truncate it, all the middle is out
the final obs sent to the LLM has approx 8k tokens, which is very large - the max we allow
and it also means we can't condense history at 8k tokens without affecting it at the beginning, so ... how much could it possibly help the LLM?

xingyaoww · 2024-12-10T13:13:27Z

ahhh good observation @ryanhoangt -- i ran into the issue as well but wasn't sure the cause until this point.

Maybe we can modify this line so instead of ln -s, we can directly move /testbed /workspace/XXX?

OpenHands/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry.sh

Line 36 in 39e5311

ln -s /testbed /workspace/$WORKSPACE_NAME

I think this only happnes during eval, right?

enyst · 2024-12-10T13:17:24Z

Yes, during eval.

Edited to add: good question. I saw it with eval. But it works fine with the resolver.

ryanhoangt · 2024-12-10T13:35:44Z

Maybe we can modify this line so instead of ln -s, we can directly move /testbed /workspace/XXX?

Yeah I think it'd be better to replace symlink in eval (maybe in a separate PR with some testing), in my PR I also have some issues with it: All-Hands-AI/openhands-aci#5 (comment)

Btw @enyst I think we can also update the aci version in your PR with the fix-symlink-directory-view branch and try running some eval as well, while waiting for the fix to be merged!

enyst added the bug Something isn't working label Dec 10, 2024

mamoodi added the severity:medium Affecting multiple users label Dec 10, 2024

ryanhoangt mentioned this issue Dec 10, 2024

Fix directory view to follow symlinks All-Hands-AI/openhands-aci#20

Merged

This was referenced Dec 11, 2024

Remove the use of symlink in swe-bench eval #5526

Closed

Use more specific action types for openhands-aci commands #5508

Merged

xingyaoww closed this as completed in #5508 Dec 20, 2024

enyst mentioned this issue Jan 17, 2025

feat: Add LocalRuntime and rename EventStreamRuntime to DockerRuntime #5284

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: The 'view' tool command doesn't work on /workspace #5497

[Bug]: The 'view' tool command doesn't work on /workspace #5497

enyst commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024

enyst commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024

enyst commented Dec 10, 2024

ryanhoangt commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024 •

edited

Loading

xingyaoww commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024 •

edited

Loading

[Bug]: The 'view' tool command doesn't work on /workspace #5497

[Bug]: The 'view' tool command doesn't work on /workspace #5497

Comments

enyst commented Dec 10, 2024 • edited Loading

Is there an existing issue for the same bug?

Describe the bug and reproduction steps

Logs, Errors, Screenshots, and Additional Context

ryanhoangt commented Dec 10, 2024 • edited Loading

ryanhoangt commented Dec 10, 2024

enyst commented Dec 10, 2024 • edited Loading

enyst commented Dec 10, 2024

enyst commented Dec 10, 2024

ryanhoangt commented Dec 10, 2024 • edited Loading

enyst commented Dec 10, 2024 • edited Loading

xingyaoww commented Dec 10, 2024 • edited Loading

enyst commented Dec 10, 2024 • edited Loading

ryanhoangt commented Dec 10, 2024 • edited Loading

enyst commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024 •

edited

Loading

xingyaoww commented Dec 10, 2024 •

edited

Loading

enyst commented Dec 10, 2024 •

edited

Loading

ryanhoangt commented Dec 10, 2024 •

edited

Loading