Random Data in Outputs #73
Following up... I have continued to play around with this to try and fix it. Working within my hardware limits, I have tested
both of which are larger than the previous 70b I was working with. While the previous model did output some (mangled) stories, both of these attempts seem unable to find emotions. Here is an example of what I am seeing in the terminal:
I'm not sure how exactly to fix this, as the output does not look obviously broken or miswritten. There is some overlap with this issue, but thus far changing models has not fixed it, and I am unsure how to implement the context fix mentioned there.
@Coastline-3102 Thanks for creating this issue! Sorry for not seeing it until now. I believe your analysis is on point: both problems look like a model issue. However, it's very strange that a 70b is failing the RP datagen so severely -- most of the demo dataset was generated with Llama 3 70b, and even smaller models should at least get the format right (and not talk about Salesforce!). The second issue is also a model/output-format problem; emotions should begin with the emotion name in ALLCAPS followed by a colon. What's odd is that I have personally used Mistral Large to make data with RPTK successfully -- so maybe this is an issue with your inference engine or sampling parameters? Maybe the inference engine has somehow overridden those of the pipeline? You also mentioned a custom input text; that could also be the cause if it is a difficult one in some way. If you are able to share it or the config, I might be able to help diagnose the problem.
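If you are going through Ollama, one quick thing to check (a sketch assuming the standard Ollama CLI; `my-70b-model` below is a placeholder tag, not anything from your config) is what parameters the model's own Modelfile sets, since unexpected defaults there can interact badly with what the pipeline asks for:

```
# Dump the Modelfile for the local model to see what it defines
# ("my-70b-model" is a placeholder -- substitute the tag you actually run)
ollama show my-70b-model --modelfile

# Or just list the parameters it carries (temperature, num_ctx, stop tokens, etc.)
ollama show my-70b-model --parameters
```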
Thanks for getting back to me!
Good to know. I was also surprised that the 70b I used (midnight-miqu70b) failed so miserably. One of the reasons I tested that model is because I have used it in the past for similar workflows, where it has performed well and not rambled about stuff like Salesforce. If I could fix the issue such that I can get away with running something around a 70b instead of a bigger model, that would be nice. I can run larger models, but that really pushes the limit of my system and slows generation time to a crawl.
Strange. My initial thought is that maybe something is wrong with the context size of my model? Since the RP pipeline prompt is so large, I wonder if it is somehow "choking" on it, and thus returning bad results? I will admit I am somewhat new to this, so I am unsure of the best way to troubleshoot.
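If context length does turn out to be the culprit, my understanding (just a guess on my part, not something from this repo's docs) is that Ollama keeps a fairly small context window unless you raise it, for example with a custom Modelfile. Something like this, where the model tag is only a placeholder:

```
# Modelfile -- raise the context window so the large RP prompt fits
# ("my-70b-model" is a placeholder; use whichever local model the config points at)
FROM my-70b-model
PARAMETER num_ctx 16384
```

Then register it with `ollama create my-70b-model-16k -f Modelfile` and point the config at the new tag.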
Sure. I will say I have made fairly minimal changes to the base config:
I have been using Ollama as my inference engine, and have not made any major changes to Ollama itself. In most cases I just directly run the model from the command line. For example, with the above config I would just run
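Roughly, it is just the standard Ollama invocation (the tag below is a placeholder, not the exact command from my config):

```
# make sure the model is pulled, then run it from the command line
ollama pull my-70b-model
ollama run my-70b-model
```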
I doubt that is causing the issue? While it is true that I'm not using one of the provided sample texts, it is just a normal novel that I have converted into a .txt file (and verified that it is not corrupted or anything), which should not be any more difficult than the provided samples.
Hello,
I have been testing out both the role play and original pipelines, and I am seeing random undesired data in the outputs.
To use a recent rp pipeline run as an example, I deleted everything in `raw_txt_input` except for a single custom .txt file of a book, and ran the pipeline (with a 10 subset size) using a local 70b model loaded via Ollama. Looking at the `full_stories_list_complete_format.json` that results, I see passages mentioning Salesforce, Ada Clarke, and Victorian London, despite there being no references to any of them in the source I provided.
I poked around in the other files, and found that Ada and Victorian London are mentioned in files in the rp pipeline `prompts` folder (I have no idea where Salesforce is coming from...). Considering these results, I am wondering if there is an issue with augmentoolkit or the way I am running it, or if I simply need a better or bigger model that is smart enough to know not to include this content. If the latter is the case, can anyone recommend an uncensored model that can run locally? With my hardware, a quantized 70b is likely the largest I can go.
Thanks!