Bad performance when faced with a large data set #2
Comments
@geekan I apologize for replying late.
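Thanks. (Switching to Chinese.) So how can this problem be solved? My impression is that ZEN's FM converges somewhat better.
OK. Yes, ZEN converges better, but it seems to be much slower. If I remember correctly, its implementation is based on GraphX.
So the idea is to train an initial model on a small data set, and then continue iterating on the large model with a smaller step size?
That's right. I can only say whether it works once I have run enough experiments.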
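The two-stage idea discussed above (fit an initial model on a small subset, then continue on the full data with a smaller step) can be sketched as follows. This is only an illustration under stated assumptions: `TinyFM`, `WarmStartSketch`, the toy data, and the step sizes are all hypothetical, and none of it is the API of the library in this issue or of ZEN. It shows the mechanics of the warm start and makes no claim that the approach fixes the problem reported here.

```scala
import scala.util.Random

object WarmStartSketch {
  // (sparse features as (index, value) pairs, label in {0, 1})
  type Sample = (Array[(Int, Double)], Double)

  // Hypothetical minimal second-order FM with logistic loss, trained by plain SGD.
  final class TinyFM(numFeatures: Int, k: Int, seed: Long = 42L) {
    private val rng = new Random(seed)
    private var w0 = 0.0
    private val w = new Array[Double](numFeatures)
    private val v = Array.fill(numFeatures, k)(rng.nextGaussian() * 0.01)

    // s(f) = sum_i v(i)(f) * x_i, shared by the score and its gradient.
    private def factorSums(x: Array[(Int, Double)]): Array[Double] = {
      val s = new Array[Double](k)
      for ((i, xi) <- x; f <- 0 until k) s(f) += v(i)(f) * xi
      s
    }

    // Probability of label 1: sigmoid of bias + linear term + pairwise term.
    def predict(x: Array[(Int, Double)]): Double = {
      val s = factorSums(x)
      var score = w0
      for ((i, xi) <- x) {
        score += w(i) * xi
        for (f <- 0 until k) score -= 0.5 * v(i)(f) * v(i)(f) * xi * xi
      }
      for (f <- 0 until k) score += 0.5 * s(f) * s(f)
      1.0 / (1.0 + math.exp(-score))
    }

    // Continues from the current parameters, so calling it twice is a warm start.
    def sgd(data: Seq[Sample], step: Double, epochs: Int): Unit =
      for (_ <- 0 until epochs; (x, y) <- rng.shuffle(data)) {
        val g = predict(x) - y // derivative of log loss w.r.t. the raw score
        val s = factorSums(x)
        w0 -= step * g
        for ((i, xi) <- x) {
          w(i) -= step * g * xi
          for (f <- 0 until k) v(i)(f) -= step * g * xi * (s(f) - v(i)(f) * xi)
        }
      }
  }

  def logLoss(m: TinyFM, data: Seq[Sample]): Double =
    data.map { case (x, y) =>
      val p = math.min(math.max(m.predict(x), 1e-12), 1 - 1e-12)
      -(y * math.log(p) + (1 - y) * math.log(1 - p))
    }.sum / data.size

  // Toy data (assumption): 4 users x 4 topics, one-hot; label is 1 for even user ids.
  val full: Seq[Sample] = for (u <- 0 until 4; t <- 0 until 4) yield
    (Array((u, 1.0), (4 + t, 1.0)), if (u % 2 == 0) 1.0 else 0.0)
  val small: Seq[Sample] = full.filter { case (x, _) => x(1)._1 < 6 } // topics 0 and 1 only

  val model = new TinyFM(numFeatures = 8, k = 2)
  val initLoss: Double = logLoss(model, full)
  model.sgd(small, step = 0.1, epochs = 30) // stage 1: initial model on the small set
  val warmLoss: Double = logLoss(model, full)
  model.sgd(full, step = 0.02, epochs = 30) // stage 2: smaller step on the full set
  val finalLoss: Double = logLoss(model, full)
}
```

Comparing `WarmStartSketch.initLoss`, `warmLoss`, and `finalLoss` shows how the loss on the full set moves across the two stages. On real data the subset size and both step sizes would need tuning.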
Hi RuiFeng, I noticed that you said you have tried FM with AdaGrad/AdaDelta. Did you find them useful? How much gain do they give compared with SGD?
The results I get from training on just a few samples are also poor; the model cannot separate 0 from 1 at all:
import scala.collection.mutable.ArrayBuffer
class FactorizationMachineCtrModelTest extends FunSuite with SparkFunSuite {
private val getProb1 = functions.udf((v: Vector) => v.toArray(1))
test("testTrain") {
// val predict = fmModel.predict(formatSamples.map(_.features)).zip(formatSamples.map(_.label))
}
Any update here?
I have a large data set (20 million rows) whose features are (user, topic) pairs, encoded as 5000 one-hot columns.
The label is binary (0 or 1).
Logistic regression easily reaches an AUC of 0.84, but FM's AUC is only around 0.5, or perhaps 0.46.
The parameters I used (I tried both SGD and L-BFGS):
And
Could you point out how I can get better performance with FM?
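For context on the numbers quoted above: AUC is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, so 0.5 means the scores carry no ranking information and a value below 0.5 means the ranking is slightly inverted. The sketch below computes it by direct pair counting. `AucSketch` and `auc` are hypothetical names, and the quadratic loop is only suitable for small samples. On a data set of this size one would more likely use something like Spark MLlib's `BinaryClassificationMetrics`.

```scala
object AucSketch {
  // Pairwise AUC over (score, label) pairs: the fraction of (positive, negative)
  // pairs in which the positive has the higher score; ties count as half.
  def auc(scoreAndLabel: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabel.collect { case (s, y) if y > 0.5 => s }
    val neg = scoreAndLabel.collect { case (s, y) if y <= 0.5 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one positive and one negative")
    val wins = pos.iterator.flatMap { p =>
      neg.iterator.map(n => if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
    }.sum
    wins / (pos.size.toLong * neg.size)
  }
}
```

A model that outputs the same score for every row, for example `AucSketch.auc(Seq((0.5, 1.0), (0.5, 0.0)))`, yields exactly 0.5, which matches the symptom described in this issue.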