List index out of range error #46
One possibility is that everything fails validation (often because the model breaks the output format every time, if the model is not very strong, or sometimes because of API errors and other issues); in that case the chunks list ends up empty and nothing generates past that point. Strangely, it looks like you're using the default inputs, so we can probably rule that out. And Llama 3 is definitely capable of running Augmentoolkit, so that rules that out as well, unless you're running a very low quant or something's up with your server. Could you share some of the intermediate outputs? You should be able to find them in outputs/judge_paragraph_generations. That folder will contain a bunch of yaml files holding the full prompts, with the AI output at the very end. It might contain a clue about what's going on.
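The empty-chunks failure mode described above can be sketched in a few lines. The variable name mirrors the traceback later in this thread; the filtering logic itself is a stand-in for illustration, not Augmentoolkit's actual implementation:

```python
def filter_chunks(chunks, passes_validation):
    """Keep only the chunks the judge model approved (stand-in logic)."""
    return [c for c in chunks if passes_validation(c)]

def first_chunk_or_fail(filtered):
    """Defensive version of `print(filtered_worthy_for_questions[0])`."""
    if not filtered:
        raise ValueError(
            "All chunks failed validation; check the yaml files in "
            "outputs/judge_paragraph_generations for the judge's verdicts"
        )
    return filtered[0]

# If the model breaks the output format on every chunk, nothing survives
# the filter, and indexing [0] on the empty result raises IndexError:
filtered_worthy_for_questions = filter_chunks(
    ["chunk one", "chunk two"], lambda c: False
)
print(len(filtered_worthy_for_questions))  # 0
```

A guard like first_chunk_or_fail turns the opaque IndexError into a message that points at the judge outputs instead.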
Hi @e-p-armstrong
I have also observed an "index out of bounds" error with Ollama: Output written to ../outFiles/judge_paragraph_generations/99e9cbce-823a-401e-b09b-ef922e806e98.yaml
Same issue here: index out of range with Ollama and llama3.1. I'm also running into lots of FAILED TO GENERATE QUESTIONS! errors. Maybe it's formatting the data oddly?
I'm trying to run Augmentoolkit on macOS (M3) with Ollama (ollama run llama3) and the following config.yaml:
PATH:
  INPUT: "./raw_text_input"
  OUTPUT: "./output"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "53212512"
  BASE_URL: http://127.0.0.1:11434/
  LARGE_LOGICAL_MODEL: llama3
  LOGICAL_MODEL: llama3 # model used for question generation and conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.
  QUANTIZATION_SMALL: "gptq" # Only use if Aphrodite mode is on.
  QUANTIZATION_LARGE: "gptq" # Only use if Aphrodite mode is on.
SKIP:
  QUESTION_CHECK: False
  ANSWER_RELEVANCY_CHECK: False # turn on if using the negative question prompt override
  FILTER_CHUNKS: False
SYSTEM:
  CHUNK_SIZE: 1900
  USE_FILENAMES: False # give the AI context from the filenames provided to it. Useful if the filenames are meaningful, otherwise turn them off.
  DOUBLE_CHECK_COUNTER: 1 # How many times to check a question and answer pair during each validation step. Majority vote decides if it passes that step. There are three steps. So most questions are by default checked around 9 times (fewer if the first two checks for a step pass, obviously).
  SUBSET_SIZE: 10
  USE_SUBSET: False # Whether to take only the first 13 chunks from a text during the run. Useful for experimenting and iterating and seeing all the steps without costing too much money or time.
  CONCURRENCY_LIMIT: 50 # Hard limit of how many calls can be run at the same time, useful for API mode (aphrodite automatically manages this and queues things, as far as I know)
  COMPLETION_MODE: False # Change to false if you want to use chat (instruct) mode; this requires .json files in your chosen prompts directory, in the OpenAI API format. Not all APIs support completion mode.
  MODE: "api" # can be one of "api"|"aphrodite"
  STOP: True # True = Use stop tokens, False = do not use stop tokens. OpenAI's API restricts you to four stop tokens and all steps have way more than four stop tokens, so you'll need to turn this to False if you're using OAI's API. Also NOTE that if you turn this OFF while using COMPLETION MODE, EVERYTHING WILL BREAK and it will cost you money in the process. Don't do that.
  CONVERSATION_INSTRUCTIONS: For this conversation, you are generating a chat between a generalist, generic AI assistant, and a human.
  FINAL_ASSISTANT_PROMPT_NO_RAG: |
    You are a helpful AI assistant.
  FINAL_ASSISTANT_PROMPT_RAG: |
    You are a helpful AI assistant.
    Context information is below:
    {data}
PHASE:
  WORK_IN_PHASES: False
  PHASE_INDEX: 3 # index of the phase we are currently on (index 0 = filtering out chunks with no relevant context; index 1 = question generation; index 2 = question validation; index 3 = context revision and conversation generation, the final phase)
HUGGINGFACE:
  HUB_PATH: "Heralax/test-atk-dataset-do-not-use-3"
  PRIVATE: false
  PUSH_TO_HUB: false
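Since the config points at a local Ollama server, it's worth confirming the BASE_URL actually answers before kicking off a long run. This is a minimal sketch; check_endpoint is a hypothetical helper, not part of Augmentoolkit:

```python
import urllib.error
import urllib.request

def check_endpoint(url: str, timeout: float = 5.0):
    """Return the HTTP status if the server answers, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None

if __name__ == "__main__":
    # Ollama's root endpoint replies "Ollama is running" when the server is up.
    status = check_endpoint("http://127.0.0.1:11434/")
    print("Ollama reachable" if status == 200 else "Ollama not reachable")
```

If the server is down or listening on a different port, every API call fails, every chunk fails validation, and you end up at the same empty-list crash.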
I'm getting the following error:
LOADING: failed|./raw_text_input/medicine_wikipedia
100%|█████████████████████████████████████████| 85/85 [00:00<00:00, 5419.66it/s]
Converting generations to training data
entering saving mode
...Converted successfully (we think)
Traceback (most recent call last):
File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 505, in <module>
asyncio.run(main())
File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 226, in main
print(filtered_worthy_for_questions[0])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
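To share the intermediate outputs the first comment asks for, a few lines of Python will skim the tail of each judge file, where the AI's verdict lands. The directory comes from the OUTPUT setting in config.yaml; the yaml layout described in the first comment is assumed, so adjust the path and slice to taste:

```python
from pathlib import Path

# Skim the end of each judge output file; the AI's verdict is appended there.
out_dir = Path("./output/judge_paragraph_generations")
for f in sorted(out_dir.glob("*.yaml"))[:5]:  # the first few files are enough
    tail = f.read_text(errors="replace")[-300:].strip()
    print(f"--- {f.name} ---\n{tail}\n")
```

If every tail shows malformed output or an API error instead of a verdict, that confirms the all-chunks-failed-validation theory.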