Hi, this is really cool work! Thanks for putting this together!
I have an LLM-based method that simply prompts an LLM zero-shot, and we show that it achieves strong forecasts (see Direct Prompt in https://arxiv.org/abs/2410.18959, method description on page 41). Specifically, we benchmarked on data beyond the LLM's cutoff date, so it is definitely not memorized.
Do you think adding this method to GIFT-Eval would be worthwhile? I'm on the fence because the test data in this benchmark comes from public sources, so there is no guarantee that LLMs haven't memorized it.
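For context, the idea is about as simple as it sounds; here is a minimal sketch of this kind of zero-shot direct prompting for forecasting. The prompt template, the parsing, and the `call_your_llm` helper are illustrative assumptions, not the exact setup from the paper:

```python
# Minimal sketch of zero-shot "direct prompt" forecasting with an LLM.
# Prompt format and parsing are illustrative, not the paper's exact template.

from typing import List

def build_forecast_prompt(history: List[float], horizon: int) -> str:
    """Serialize the observed history and ask the LLM to continue the series."""
    series = ", ".join(f"{x:.4f}" for x in history)
    return (
        "You are a time-series forecaster. Given the observed values below, "
        f"predict the next {horizon} values.\n"
        f"Observed: {series}\n"
        f"Answer with exactly {horizon} comma-separated numbers and nothing else."
    )

def parse_forecast(reply: str, horizon: int) -> List[float]:
    """Parse the model's comma-separated reply back into floats."""
    values = [float(tok) for tok in reply.replace("\n", " ").split(",") if tok.strip()]
    if len(values) != horizon:
        raise ValueError(f"expected {horizon} values, got {len(values)}")
    return values

# Usage (the LLM call itself is whatever chat API you have access to):
# prompt = build_forecast_prompt([112.0, 118.0, 132.0, 129.0], horizon=4)
# reply = call_your_llm(prompt)   # hypothetical helper wrapping any chat LLM
# forecast = parse_forecast(reply, horizon=4)
```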
Thanks for raising this. In my view, it is worthwhile to add a new LLM-based method. At the same time, I understand your concern about whether the data has been memorized by LLMs.
I am thinking that we could later add a column indicating whether the model may have been trained with data leakage (e.g., yes / no / potentially). What do you think about this?
Yes, I totally agree. So, for your model, could you please provide the all_results.csv file and the model config file, and open a PR here? I would love to help you get it on the leaderboard :)
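(For reference, the config file is just a small JSON blob describing the model; please check the repo's README for the exact required fields. The field names below are only a hypothetical illustration:)

```json
{
  "model": "DirectPrompt",
  "model_type": "LLM-based",
  "model_dtype": "float32"
}
```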