question related to train_test_split #5

Open
unknown007007007 opened this issue Jun 17, 2024 · 0 comments

Thanks for your work and the vivid description.
I have some questions about train_test_split.
First, as I understand it, proc_data/train_test_split.json splits the data by user_id (90% / 10%).
And for knowledge/user.klg, you mention that you use only the limited data in knowledge/begin_data.json to prevent information leakage.
This raises a few questions:

  1. For users in the training set, did you use their interaction data (begin_data.json) both to build user_profile and for training?
  2. For users in the test set, did you use their interaction data (begin_data.json) both to build user_profile and for training?
    Or did you use it only to complete user_profile and simply delete the others from both training and testing?
  3. Why do you build user_profile for the training set from only limited data, even though using more data could make it more accurate?
  4. I would also sincerely like to know what led you to construct the training and testing data this way. Intuitively, I would split every user's data, so that every user has some data in both training and testing. I would then build user_profile from all of that user's training data (or from a fixed number of interactions, like 40), and finally get the result.
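
To make the contrast in question 4 concrete, here is a minimal sketch of the two schemes. It is only an illustration of the question, not the repository's code: the function names, the `(user_id, item_id, timestamp)` tuple layout, the time-ordered cut inside each user, and the default of 40 profile interactions are all assumptions.

```python
import random
from collections import defaultdict


def split_by_user(interactions, test_frac=0.1, seed=0):
    """User-level split (hypothetical helper): each user is entirely in train or in test."""
    users = sorted({row[0] for row in interactions})
    rng = random.Random(seed)
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_frac))
    test_users = set(users[:n_test])
    train = [row for row in interactions if row[0] not in test_users]
    test = [row for row in interactions if row[0] in test_users]
    return train, test


def split_within_user(interactions, test_frac=0.1, profile_n=40):
    """Per-user split (hypothetical helper): every user has rows in both train and test.

    The latest interactions of each user go to test; the profile is built
    from at most `profile_n` of the earliest training interactions.
    Assumes each user has at least two interactions.
    """
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test, profile = [], [], {}
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r[2])  # order by timestamp
        n_test = max(1, round(len(rows) * test_frac))
        cut = len(rows) - n_test
        train.extend(rows[:cut])
        test.extend(rows[cut:])
        profile[user] = rows[:cut][:profile_n]  # profile sees training rows only
    return train, test, profile
```

In the first scheme a test user's profile has to come from some held-out slice of that user's own history, which is where a limited file like begin_data.json would fit. In the second, the profile can be built from the user's training rows without touching the test rows.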

I would be very grateful for any suggestions.
