For the text task, when we have multiple datasets, the concatenation strategy could be replaced with more sophisticated logic using Hugging Face's dataset concatenation.
Further, we may want to change the evaluation loop to also report per-dataset metrics in addition to the average.
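As a rough illustration of the basic concatenation approach (the dataset names below are placeholders, not the repo's actual configuration):

```python
from datasets import load_dataset, concatenate_datasets

# Illustrative only: two text corpora, each assumed to expose a single "text"
# column, merged into one combined training dataset.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
tiny_owt = load_dataset("stas/openwebtext-10k", split="train")

# concatenate_datasets requires the datasets to share the same features
# (column names and types).
combined = concatenate_datasets([wikitext, tiny_owt])
print(len(wikitext), len(tiny_owt), len(combined))
```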
The text task looks good so far. I am curious about the choice here / what you think is the best way to handle multiple datasets. Are there speed benefits to concatenating the datasets as done here? If we had separate tasks, we would also need to count the total tokens for each task and proportionally decide how much of each batch comes from each task based on token counts, but we avoid that bookkeeping with your procedure. It seems there is an edge case where concatenation will not work if the columns are not named the same: https://huggingface.co/docs/datasets/process#concatenate
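For that column-name edge case, the schemas can be aligned before concatenating. A hedged sketch (the second dataset and its `content` column are hypothetical, only there to show the mismatch):

```python
from datasets import load_dataset, concatenate_datasets

ds_a = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")  # has a "text" column
ds_b = load_dataset("some-org/other-corpus", split="train")          # hypothetical; assume a "content" column

# Align schemas: rename the mismatched column and drop any extra columns,
# since concatenate_datasets needs identical features across datasets.
ds_b = ds_b.rename_column("content", "text")
ds_b = ds_b.remove_columns([c for c in ds_b.column_names if c != "text"])
ds_a = ds_a.remove_columns([c for c in ds_a.column_names if c != "text"])

combined = concatenate_datasets([ds_a, ds_b])
```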
One thing that may be useful: if we have multiple datasets that are concatenated, we could compute specific metrics for each separate dataset during evaluation. E.g., we may want a separate perplexity score for WikiText vs. The Pile, not just the average over both. After concatenating, we could maintain start and end indices for each dataset (e.g., The Pile spans indices 0 to 200M, the next dataset spans 200M + 1 to 400M), so we can attribute each sample to its source task and aggregate their metrics separately, as sketched below.
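A minimal sketch of that bookkeeping, assuming evaluation produces a per-sample loss and that sample order is preserved after concatenation (all names here are hypothetical, not existing code in the repo):

```python
import math
from datasets import concatenate_datasets

def concatenate_with_ranges(named_datasets):
    """Concatenate datasets and record each one's [start, end) index range,
    so evaluation samples can be attributed back to their source dataset."""
    ranges, offset = {}, 0
    for name, ds in named_datasets.items():
        ranges[name] = (offset, offset + len(ds))
        offset += len(ds)
    combined = concatenate_datasets(list(named_datasets.values()))
    return combined, ranges

def per_dataset_perplexity(sample_losses, ranges):
    """Aggregate mean loss and perplexity separately for each source dataset.
    sample_losses[i] is assumed to be the eval loss of combined sample i."""
    metrics = {}
    for name, (start, end) in ranges.items():
        losses = sample_losses[start:end]
        mean_loss = sum(losses) / max(len(losses), 1)
        metrics[name] = {"loss": mean_loss, "perplexity": math.exp(mean_loss)}
    return metrics
```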
Another strategy: during training we just track the average, but after training finishes we load the model (e.g., in an eval.py) and run it over each task separately, with text_datasets={the specific dataset you want your eval metrics over}. This may be inconvenient, though; a rough sketch follows.
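A self-contained sketch of that post-training alternative, using a generic Hugging Face causal LM as a stand-in for the trained checkpoint (the repo's actual eval.py and model loading are not shown here, and the dataset choices are placeholders):

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; a real eval.py would load the trained model instead.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# One entry per dataset we want separate metrics for (placeholder choice).
eval_sets = {"wikitext": ("wikitext", "wikitext-2-raw-v1", "test")}

results = {}
for name, (path, config, split) in eval_sets.items():
    ds = load_dataset(path, config, split=split)
    losses = []
    for record in ds.select(range(min(200, len(ds)))):  # small slice for speed
        enc = tokenizer(record["text"], return_tensors="pt",
                        truncation=True, max_length=512)
        if enc["input_ids"].shape[1] < 2:  # skip empty / single-token lines
            continue
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    mean_loss = sum(losses) / max(len(losses), 1)
    results[name] = {"loss": mean_loss, "perplexity": math.exp(mean_loss)}

print(results)
```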
@Pritam-hakingmaster this is a good issue to get started with - the basic concatenation is implemented using huggingface datasets (see line 29 in gato/tasks/text_task.py).
Individual metrics per dataset are an open challenge that no one has taken on yet - worth doing.
Originally posted by @daniellawson9999 in #1 (comment)