Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: The 'view' tool command doesn't work on /workspace #5497

Closed
1 task done
enyst opened this issue Dec 10, 2024 · 10 comments · Fixed by #5508
Closed
1 task done

[Bug]: The 'view' tool command doesn't work on /workspace #5497

enyst opened this issue Dec 10, 2024 · 10 comments · Fixed by #5508
Labels
bug Something isn't working severity:medium Affecting multiple users

Comments

@enyst
Copy link
Collaborator

enyst commented Dec 10, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

I tried to nudge the LLM to use its 'view' command from the file_editor tool, to view directory content with depth=2, when it starts exploring the filesystem. (the first action of the agent). Claude Sonnet has been trained on those as far as we know, and Anthropic's tool use examples include this.

When I tried to eval it on swe-bench, the LLM ran into trouble, it couldn't read the output when it ran it on /workspace. So it did things like:
"I see the issue - it's a symlink to /testbed. Let's explore the actual directory"
...
"{\"path\": \"/testbed/django/db/backends/base/schema.py\"

Cc: @ryanhoangt @xingyaoww

Logs, Errors, Screenshots, and Additional Context

No response

@enyst enyst added the bug Something isn't working label Dec 10, 2024
@mamoodi mamoodi added the severity:medium Affecting multiple users label Dec 10, 2024
@ryanhoangt
Copy link
Contributor

ryanhoangt commented Dec 10, 2024

Seems like the find {path} -maxdepth 2 -not -path '*/\.*' doesn't play well with the current symlink setup for swe-bench evaluation. I tried executing this command on a symlink directory and it didn't list any files. I suspect this may cause the agent to be under-performing?

Not quite sure if we can simply fix it with -L option or need a more complex modification.

@ryanhoangt
Copy link
Contributor

For visibility, below is what the issue looks like from the swe-bench trajectory:

...
    {
            "id": 3,
            "timestamp": "2024-11-21T12:13:39.311499",
            "source": "agent",
            "message": "Running Python code interactively: print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))",
            "action": "run_ipython",
            "tool_call_metadata": {
                "function_name": "str_replace_editor",
                "tool_call_id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                "model_response": {
                    "id": "chatcmpl-AW0Ly6chyLtCqh0P6VtQbOHik9E7m",
                    "created": 1732191218,
                    "model": "gpt-4o-2024-05-13",
                    "object": "chat.completion",
                    "system_fingerprint": "fp_04751d0b65",
                    "choices": [
                        {
                            "finish_reason": "tool_calls",
                            "index": 0,
                            "message": {
                                "content": null,
                                "role": "assistant",
                                "tool_calls": [
                                    {
                                        "function": {
                                            "arguments": "{\"command\":\"view\",\"path\":\"/workspace/django__django__3.2\"}",
                                            "name": "str_replace_editor"
                                        },
                                        "id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                                        "type": "function"
                                    }
                                ],
                                "function_call": null
                            }
                        }
                    ],
                    "usage": {
                        "completion_tokens": 28,
                        "prompt_tokens": 1563,
                        "total_tokens": 1591,
                        "completion_tokens_details": null,
                        "prompt_tokens_details": null
                    },
                    "service_tier": null,
                    "prompt_filter_results": [
                        {
                            "prompt_index": 0,
                            "content_filter_results": {}
                        }
                    ]
                },
                "total_calls_in_response": 1
            },
            "args": {
                "code": "print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))",
                "thought": "",
                "include_extra": false,
                "confirmation_state": "confirmed",
                "kernel_init_code": ""
            },
            "timeout": 300
        },
        {
            "id": 4,
            "timestamp": "2024-11-21T12:13:39.339866",
            "source": "agent",
            "message": "Code executed in IPython cell.",
            "cause": 3,
            "observation": "run_ipython",
            "tool_call_metadata": {
                "function_name": "str_replace_editor",
                "tool_call_id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                "model_response": {
                    "id": "chatcmpl-AW0Ly6chyLtCqh0P6VtQbOHik9E7m",
                    "created": 1732191218,
                    "model": "gpt-4o-2024-05-13",
                    "object": "chat.completion",
                    "system_fingerprint": "fp_04751d0b65",
                    "choices": [
                        {
                            "finish_reason": "tool_calls",
                            "index": 0,
                            "message": {
                                "content": null,
                                "role": "assistant",
                                "tool_calls": [
                                    {
                                        "function": {
                                            "arguments": "{\"command\":\"view\",\"path\":\"/workspace/django__django__3.2\"}",
                                            "name": "str_replace_editor"
                                        },
                                        "id": "call_29i0HPkZ2cmM3qIbT7MBkR0P",
                                        "type": "function"
                                    }
                                ],
                                "function_call": null
                            }
                        }
                    ],
                    "usage": {
                        "completion_tokens": 28,
                        "prompt_tokens": 1563,
                        "total_tokens": 1591,
                        "completion_tokens_details": null,
                        "prompt_tokens_details": null
                    },
                    "service_tier": null,
                    "prompt_filter_results": [
                        {
                            "prompt_index": 0,
                            "content_filter_results": {}
                        }
                    ]
                },
                "total_calls_in_response": 1
            },
            "content": "Here's the files and directories up to 2 levels deep in /workspace/django__django__3.2, excluding hidden items:\n/workspace/django__django__3.2",
            "extras": {
                "code": "print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.2'}))"
            }
    }
...

@enyst Can you help double check again to see if path is /workspace or /workspace/$WORKSPACE_NAME?

@enyst
Copy link
Collaborator Author

enyst commented Dec 10, 2024

I saw the same you see here, it had 'django' in the path, but I'm double checking.

@enyst
Copy link
Collaborator Author

enyst commented Dec 10, 2024

This was the test I made: #5506

I tested what if it uses this more, to see how it does. With the resolver I have noticed lately that it works very well when it uses this 'view' tool a lot for the initial exploration.

From the events:

"content": [
        {
          "type": "text",
          "text": "I'll help you implement the necessary changes to fix the issue with `makemigrations` incorrectly handling inner class paths. Let's follow the steps:\n\n1. First, let's explore the repository structure:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"view\", \"path\": \"/workspace/django__django__3.1\"}",
            "name": "str_replace_editor"
          },
          "id": "tooluse_GAwv0bSSRwG27F9lQldiXQ",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Here's the files and directories up to 2 levels deep in /workspace/django__django__3.1, excluding hidden items:\n/workspace/django__django__3.1\n\n"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_GAwv0bSSRwG27F9lQldiXQ",
      "name": "str_replace_editor"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Hmm, seems like the directory is empty. Let me check the parent directory:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"path\": \"/workspace\", \"command\": \"view\"}",
            "name": "str_replace_editor"
          },
          "id": "tooluse_tMvn_dM6TVOHmKUvitPWFQ",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Here's the files and directories up to 2 levels deep in /workspace, excluding hidden items:\n/workspace\n/workspace/django__django__3.1\n\n"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_tMvn_dM6TVOHmKUvitPWFQ",
      "name": "str_replace_editor"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Let me check if there's an issue with the repository:"
        }
      ],
      "role": "assistant",
      "tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"cd /workspace/django__django__3.1 && ls -la\"}",
            "name": "execute_bash"
          },
          "id": "tooluse_o6ka-SP5QPmif7iwsDFDIw",
          "type": "function"
        }
      ]
    },
    {
      "content": [
        {
          "type": "text",
          "text": "cd /workspace/django__django__3.1 && ls -la\r\ntotal 72\r\ndrwxrwxrwx 2 root root   580 Dec  9 15:03 .\r\ndrwxr-xr-x 2 root root   180 Dec  9 15:03 ..\r\n-rw-rw-r-- 1 root root   782 Dec  9 15:03 .editorconfig\r\n-rw-rw-r-- 1 root root    86 Dec  9 15:03 .eslintignore\r\n-rw-rw-r-- 1 root root  1165 Dec  9 15:03 .eslintrc\r\ndrwxrw............... 2 root root  4340 Dec  9 15:03 tests\r\n-rw-rw-r-- 1 root root  1535 Dec  9 15:03 tox.ini[Python Interpreter: /opt/miniconda3/envs/testbed/bin/python]\nroot@openhands-workspace:/workspace/django__django__3.1 # \n[Command finished with exit code 0]"
        }
      ],
      "role": "tool",
      "tool_call_id": "tooluse_o6ka-SP5QPmif7iwsDFDIw",
      "name": "execute_bash"
    },
    {
      "content": [
        {
          "type": "text",
          "text": "Great, now I can see the repository structure. The issue is related to model field serialization in migrations, so we need to look at the Django code that handles this. Let's first check the django/db/migrations directory:"
        }

@enyst
Copy link
Collaborator Author

enyst commented Dec 10, 2024

Also the LLM was doing stuff like this 😅

"tool_calls": [
        {
          "index": 1,
          "function": {
            "arguments": "{\"command\": \"cd /testbed && PYTHONPATH=/testbed python3 /workspace/reproduce.py\"}",
            "name": "execute_bash"
          },
          "id": "toolu_016Dayy583oQpUMo86B8D2Kx",
          "type": "function"
        }
      ]

@ryanhoangt
Copy link
Contributor

ryanhoangt commented Dec 10, 2024

Thanks, it seems to fail for both /workspace and /workspace/$WORKSPACE_NAME.

With the resolver I have noticed lately that it works very well when it uses this 'view' tool a lot for the initial exploration.

Yeah, I guess it's probably better than using ls because ls -la is a bit too little, and ls -R is a bit too much. I'm also curious if always using view as the initial step can help improve the evaluation score.

@enyst
Copy link
Collaborator Author

enyst commented Dec 10, 2024

Exactly my thoughts! And once your fix is merged, we can eval the prompt change.

ls -R is a bit too much.

Definitely, and there's more:

  • on the django repo, it's well beyond the limit for an observation (120k tokens! 😮), so we truncate it, all the middle is out
  • the final obs sent to the LLM has approx 8k tokens, which is very large - the max we allow
  • and it also means we can't condense history at 8k tokens without affecting it at the beginning, so ... how much could it possibly help the LLM?

@xingyaoww
Copy link
Collaborator

xingyaoww commented Dec 10, 2024

ahhh good observation @ryanhoangt -- i ran into the issue as well but wasn't sure the cause until this point.

Maybe we can modify this line so instead of ln -s, we can directly move /testbed /workspace/XXX?

ln -s /testbed /workspace/$WORKSPACE_NAME

I think this only happnes during eval, right?

@enyst
Copy link
Collaborator Author

enyst commented Dec 10, 2024

Yes, during eval.

Edited to add: good question. I saw it with eval. But it works fine with the resolver.

@ryanhoangt
Copy link
Contributor

ryanhoangt commented Dec 10, 2024

Maybe we can modify this line so instead of ln -s, we can directly move /testbed /workspace/XXX?

Yeah I think it'd be better to replace symlink in eval (maybe in a separate PR with some testing), in my PR I also have some issues with it: All-Hands-AI/openhands-aci#5 (comment)


Btw @enyst I think we can also update the aci version in your PR with the fix-symlink-directory-view branch and try running some eval as well, while waiting for the fix to be merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working severity:medium Affecting multiple users
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants