Fine grained metrics #308
Conversation
I tested this locally with some ppl suite v3 data that has subdomains, just on gpt-tiny. Everything works okay, but the perplexities are hard to assess because gpt-tiny just gets bad ppl on everything. I could compare this to some past numbers I've gotten on trained OLMo checkpoints, but it's unclear to me how I'm supposed to use tango-in-beaker until the commit has been merged into main (since tango-in-beaker retrieves code from GitHub).
I don't know how to satisfy mypy on this one:
Yes, ideally this new row writer would have the same return type, but because we're now writing out named tables, as opposed to just one table by itself, it has to have a different return type. I tried to just tell mypy to ignore it, but that doesn't seem to work.
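For context, a minimal sketch of the incompatible override mypy complains about (class names here are illustrative, not the actual ones in the PR):

```python
from typing import Any, Dict, List

class WriteOutputsAsRows:
    def run(self) -> List[List[Any]]:
        # Original step: returns a single table as a list of rows.
        return []

class WriteOutputsAsRowsMultipleMetrics(WriteOutputsAsRows):
    def run(self) -> Dict[str, List[List[Any]]]:  # mypy: incompatible with supertype
        # New step: returns named tables keyed by metric_type, so the
        # return type cannot match the base class's run().
        return {}
```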
`_write_to_gsheet` does not really need to be a method of the step class -- let's break it out into its own function, and use it in both classes. This way, the new step class does not need to be a subclass of the old one, and we also make mypy happy.
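A sketch of the suggested refactor, with illustrative (not actual) class and argument names:

```python
from typing import Any, Dict, List

def write_to_gsheet(gsheet_title: str, rows: List[List[Any]], sheet_title: str) -> None:
    """Append ``rows`` to the sheet named ``sheet_title`` in ``gsheet_title``.

    A module-level helper rather than a method, so both step classes can
    share it without one subclassing the other.
    """
    ...  # gsheets client code goes here

class WriteOutputsAsRows:
    def run(self, rows: List[List[Any]], gsheet_title: str) -> List[List[Any]]:
        write_to_gsheet(gsheet_title, rows, sheet_title="Sheet1")
        return rows

class WriteOutputsAsRowsMultipleMetrics:
    def run(
        self, tables: Dict[str, List[List[Any]]], gsheet_title: str
    ) -> Dict[str, List[List[Any]]]:
        # One sheet per metric_type; no inheritance needed, and each run()
        # keeps its own return type, so mypy is satisfied.
        for metric_type, rows in tables.items():
            write_to_gsheet(gsheet_title, rows, sheet_title=metric_type)
        return tables
```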
Great idea! I've changed it as you proposed.
@epwalsh Something strange seems to have been added in the last few commits on main. When I merged in main and went from commit […], I started getting errors. Thanks for anything you can do to help resolve this! Here is the tail of the log with errors:
Double-checked and this branch works again with #333.
This PR creates a new `create_fine_grained_pipeline` which will:

- add a `process-outputs` step on the result of `outputs` steps that computes additional metrics based on per-instance information
- add a `write-outputs-as-rows-multiple-metrics` step to write out separate sheets for each `metric_type` in the metrics dict of the output step, rather than just using the primary metric

Right now I've just included a per-subdomain perplexity as a minimal example of the kinds of metrics that could go into `process-outputs` (a sketch follows below). Additional metrics will eventually include things like metrics computed over only non-contaminated documents.

One thing that would be good to add, as there are more of these post-processing metrics, would be a way to specify which of these metrics to include from the evaluation configuration file.
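Rough sketch of what a per-subdomain perplexity computation in `process-outputs` might look like (field names such as `subdomain`, `log_likelihood`, and `num_tokens` are illustrative assumptions, not the PR's actual schema):

```python
import math
from collections import defaultdict
from typing import Any, Dict, List

def per_subdomain_perplexity(instances: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate per-instance log-likelihoods into one perplexity per subdomain.

    Perplexity over a group is exp(-(sum of log-likelihoods) / (total tokens)).
    """
    total_ll: Dict[str, float] = defaultdict(float)
    total_tokens: Dict[str, int] = defaultdict(int)
    for inst in instances:
        total_ll[inst["subdomain"]] += inst["log_likelihood"]
        total_tokens[inst["subdomain"]] += inst["num_tokens"]
    return {sub: math.exp(-total_ll[sub] / total_tokens[sub]) for sub in total_ll}

# The returned dict could then be one metric_type entry in the metrics dict,
# e.g. {"ppl_per_subdomain": per_subdomain_perplexity(instances)}, which the
# write-outputs-as-rows-multiple-metrics step turns into its own sheet.
```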