Trying to run eval in agent_rag branch #128

Open

heyjustinai opened this issue Nov 26, 2024 · 1 comment

heyjustinai commented Nov 26, 2024

System Info

```
llama_models                             0.0.55
llama_stack                              0.0.55      /Users/justinlee/local/llama-stack
llama_stack_client                       0.0.55
```

```yaml
version: "2"
built_at: "2024-11-11T21:59:52.074753"
image_name: fireworks
docker_image: null
conda_env: fireworks
apis:
  - inference
  - telemetry
  - datasetio
  - eval
  - scoring
  - inference
  - memory
  - safety
  - agents
  - telemetry
providers:
  safety: []
  scoring:
    - provider_id: basic-0
      provider_type: inline::basic
      config: {}
    - provider_id: llm-as-judge-0
      provider_type: inline::llm-as-judge
      config: {}
    - provider_id: braintrust-0
      provider_type: inline::braintrust
      config: {}
  datasetio:
    - provider_id: huggingface-0
      provider_type: remote::huggingface
      config: {}
    - provider_id: localfs-0
      provider_type: inline::localfs
      config: {}
  eval:
    - provider_id: meta-reference-0
      provider_type: inline::meta-reference
      config: {}
  inference:
    # - provider_id: fireworks-0
    #   provider_type: remote::together
    #   config:
    #     url: https://api.together.xyz/v1
    # api_key: <ENTER_YOUR_API_KEY>
    - provider_id: fireworks-0
      provider_type: remote::fireworks
      config:
        url: https://api.fireworks.ai/inference
        api_key:
  telemetry:
    - provider_id: meta-reference-0
      provider_type: inline::meta-reference
      config: {}
  memory:
    - provider_id: chromadb-0
      provider_type: remote::chromadb
      config:
        host: localhost
        port: 8000
        protocol: http
  agents:
    - provider_id: meta-reference-0
      provider_type: inline::meta-reference
      config:
        persistence_store:
          namespace: null
          type: sqlite
          db_path: /Users/xiyan/.llama/runtime/kvstore.db
metadata_store: null
models:
  - metadata: {}
    model_id: meta-llama/Llama-3.1-8B-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p1-8b-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.1-70B-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p1-70b-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.1-405B-Instruct-FP8
    provider_id: null
    provider_model_id: fireworks/llama-v3p1-405b-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.2-1B-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p2-1b-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.2-3B-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p2-3b-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.2-11B-Vision-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p2-11b-vision-instruct
  - metadata: {}
    model_id: meta-llama/Llama-3.2-90B-Vision-Instruct
    provider_id: null
    provider_model_id: fireworks/llama-v3p2-90b-vision-instruct
  - metadata: {}
    model_id: meta-llama/Llama-Guard-3-8B
    provider_id: null
    provider_model_id: fireworks/llama-guard-3-8b
  - metadata: {}
    model_id: meta-llama/Llama-Guard-3-11B-Vision
    provider_id: null
    provider_model_id: fireworks/llama-guard-3-11b-vision
datasets:
  - dataset_id: mmlu
    provider_id: huggingface-0
    url:
      uri: https://huggingface.co/datasets/llamastack/evals
    metadata:
      path: llamastack/evals
      name: evals__mmlu__details
      split: train
    dataset_schema:
      input_query:
        type: string
      expected_answer:
        type: string
      chat_completion_input:
        type: string
  - dataset_id: simpleqa
    provider_id: huggingface-0
    url:
      uri: https://huggingface.co/datasets/llamastack/evals
    metadata:
      path: llamastack/evals
      name: evals__simpleqa
      split: train
    dataset_schema:
      input_query:
        type: string
      expected_answer:
        type: string
      chat_completion_input:
        type: string
eval_tasks:
  - eval_task_id: meta-reference-mmlu
    provider_id: meta-reference-0
    dataset_id: mmlu
    scoring_functions:
      - basic::regex_parser_multiple_choice_answer
  - eval_task_id: meta-reference-simpleqa
    provider_id: meta-reference-0
    dataset_id: simpleqa
    scoring_functions:
      - llm-as-judge::405b-simpleqa
```
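For reference, the datasets and eval tasks declared in this run config should be visible from the client once the server is up. A minimal sketch (not from the issue; the port and the availability of these lister methods in this client version are assumptions):

```python
from llama_stack_client import LlamaStackClient

# Assumed address of the running fireworks distro; use whatever `llama stack run` prints at startup.
client = LlamaStackClient(base_url="http://localhost:5000")

# With the run config above, these listings should include the "mmlu"/"simpleqa"
# datasets and the two meta-reference eval tasks once registration succeeds.
print(client.datasets.list())
print(client.eval_tasks.list())
```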

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Steps to reproduce:

  1. Install llama stack from source.
  2. Build the together llama stack distro.
  3. Changed the yaml file based on discussion with @yanxi0830 (included above).
  4. Changed the agent config (a fuller client-side sketch follows after this list):

     AGENT_CONFIG = AgentConfig(
         model="meta-llama/Llama-3.2-3B-Instruct",
         instructions="You are a helpful assistant",
     )

  5. Ran generate.py in evals/rag [success].
  6. But I get a 404 when running llama-stack-client scoring_functions list, and also when running llama-stack-client eval run_scoring with:

     --dataset-path <path-to-local-dataset> \
     --output-dir ./
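For context, here is a minimal sketch of how the agent config above is typically wired up on the client side. This is not from the issue: the port, session name, and the enable_session_persistence flag are assumptions, and the import paths reflect the llama-stack-client layout from this release line.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

# Assumed server address; point this at wherever the fireworks distro is running.
client = LlamaStackClient(base_url="http://localhost:5000")

agent_config = AgentConfig(
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    enable_session_persistence=False,  # assumption; not shown in the issue
)

agent = Agent(client, agent_config)
session_id = agent.create_session("rag-eval-session")  # session name is made up
```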

Error logs

It seems like llama-stack-client scoring_functions list is not hitting the server at all.
Client-side error:

```
llama-stack-client eval run_scoring braintrust::answer-correctness \
--dataset-path ./rag/data/input_llamastack_generated.csv \
--output-dir ./rag/data/results
   0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/2  [ 0:00:16 < -:--:-- , ? it/s ]
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/bin/llama-stack-client", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/lib/cli/llama_stack_client.py", line 80, in main
    cli()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/lib/cli/eval/run_scoring.py", line 100, in run_scoring
    score_res = client.scoring.score(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/resources/scoring.py", line 78, in score
    return self._post(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/_base_client.py", line 1261, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/_base_client.py", line 953, in request
    return self._request(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/llamastack-fireworks/lib/python3.10/site-packages/llama_stack_client/_base_client.py", line 1056, in _request
    raise self._make_status_error_from_response(err.response) from None
llama_stack_client.NotFoundError: Error code: 404 - {'detail': 'Not Found'}
```

Expected behavior

Should be able to run eval on the dataset generated by generate.py.


yanxi0830 commented Nov 26, 2024

Wondering if you have the server side logs? The llama_stack_client.NotFoundError: Error code: 404 - {'detail': 'Not Found'} suggests that the endpoint is not found. What is printed out during server startup?
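One way to narrow this down from the client side (a sketch, not from the thread; the port is an assumption, use the address the server printed at startup):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# A 404 on both calls would mean the client is not reaching the stack server at all
# (wrong base_url/port); a 404 only on the second would mean the scoring API is not
# among the routes this distro build is serving.
print(client.models.list())
print(client.scoring_functions.list())
```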
