question related to train_test_split #5

unknown007007007 · 2024-06-17T10:03:50Z

Thanks for your work and the vivid description.
I have some questions about train_test_split.
First, as I understand it, proc_data/train_test_split.json splits the data by user_id (90% / 10%).
For knowledge/user.klg, you mention that you use only the limited data in knowledge/begin_data.json to prevent information leakage.
This leads to the following questions.

Did you use the training-set users' interactions in begin_data.json both for user_profile and for training?
Did you use the test-set users' interactions in begin_data.json both for user_profile and for training?
Or did you use them only to build user_profile and simply delete the others from both training and testing?
Why do you build user_profile for training-set users from limited data only, even though more data could make it more accurate?
I would also sincerely like to know what led you to construct the training and testing data this way. Intuitively, I would split every user's data, so that every user has some data in both training and testing. I would then use all of a user's training data (or a fixed number of interactions, such as 40) to build user_profile, and finally get the result.
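
To make the comparison concrete, here is a minimal sketch of the two schemes as I understand them. It is not taken from the repository: the function names, the 10% ratio used in the per-user split, and the assumption that each user's history is already sorted by time are all my own.

```python
import random


def split_by_user(user_ids, test_ratio=0.1, seed=0):
    """User-level split: every user lands entirely in train or in test.

    My reading of what train_test_split.json does (an assumption).
    """
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]  # (train users, test users)


def split_within_user(history, profile_len=40, test_ratio=0.1):
    """Per-user split: the scheme I would have chosen intuitively.

    `history` is one user's interaction list, assumed chronological.
    The last `test_ratio` share is held out for testing, and the profile
    is built from the first `profile_len` training interactions.
    """
    n_test = max(1, int(len(history) * test_ratio))
    train, test = history[:-n_test], history[-n_test:]
    profile = train[:profile_len]
    return profile, train, test
```

With the first function no test user is ever seen during training, while with the second every user contributes to both sets.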

I would be very grateful for any suggestions.
