Demos' common issues
In label_semantic_roles, the original test reader pointed directly at the train reader, and the conll05 dataset has only one batch of data. Once a pass is done, the iterator feeds no more data, which leads to a "division by zero" error. The fix is to use the dataset's reader and wrap it with paddle.batch.
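The failure mode can be illustrated with a minimal sketch. It is written in Go only to match the snippet elsewhere in this issue; the actual demo is Python. It models the reader-creator idea behind paddle.batch: a reader is a function that returns a fresh iterator every time it is called. Sample, Reader, sliceReader, batch and averageBatchSize are hypothetical names for illustration, not Paddle APIs. Reusing one already-consumed iterator makes the second pass see zero batches, and averaging over zero batches is where the division by zero comes from.

```go
package main

import (
	"errors"
	"fmt"
)

// Sample is a hypothetical stand-in for one training instance.
type Sample struct {
	Word  string
	Label int
}

// Iterator returns (sample, true) until the data is exhausted, then (zero, false).
type Iterator func() (Sample, bool)

// Reader is a "reader creator": every call must return a fresh Iterator.
type Reader func() Iterator

// sliceReader builds a Reader over fixed data; each call restarts from the beginning.
func sliceReader(data []Sample) Reader {
	return func() Iterator {
		i := 0
		return func() (Sample, bool) {
			if i >= len(data) {
				return Sample{}, false
			}
			s := data[i]
			i++
			return s, true
		}
	}
}

// batch mirrors the idea behind paddle.batch (hypothetical re-implementation):
// it wraps a Reader and groups its samples into batches of batchSize.
func batch(r Reader, batchSize int) func() [][]Sample {
	return func() [][]Sample {
		var out [][]Sample
		var buf []Sample
		next := r() // asks the Reader for an iterator on every pass
		for s, ok := next(); ok; s, ok = next() {
			buf = append(buf, s)
			if len(buf) == batchSize {
				out = append(out, buf)
				buf = nil
			}
		}
		if len(buf) > 0 {
			out = append(out, buf)
		}
		return out
	}
}

var errNoBatches = errors.New("division by zero: reader produced no batches")

// averageBatchSize plays the role of a per-pass metric averaged over batches.
func averageBatchSize(batches [][]Sample) (float64, error) {
	if len(batches) == 0 {
		return 0, errNoBatches
	}
	total := 0
	for _, b := range batches {
		total += len(b)
	}
	return float64(total) / float64(len(batches)), nil
}

func main() {
	data := []Sample{{"a", 0}, {"b", 1}, {"c", 0}}

	// Fixed: the dataset's reader creator wrapped with batch works on every pass.
	fixed := batch(sliceReader(data), 8)
	for pass := 0; pass < 2; pass++ {
		avg, err := averageBatchSize(fixed())
		fmt.Printf("reader creator, pass %d: avg=%v err=%v\n", pass, avg, err)
	}

	// Broken: every call hands back the same iterator, which is empty after pass 0.
	shared := sliceReader(data)()
	broken := batch(func() Iterator { return shared }, 8)
	for pass := 0; pass < 2; pass++ {
		avg, err := averageBatchSize(broken())
		fmt.Printf("shared iterator, pass %d: avg=%v err=%v\n", pass, avg, err)
	}
}
```

The design point is that batch takes the reader creator, not an iterator, so each pass can start over.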
One of the hardest things is telling a user what our framework is doing when it is waiting and prints nothing. When a trainer's pod is deleted, the whole job waits for that trainer's task to time out, even though the others have finished. When the others request a task for the next pass from the master, they hit this check:
```go
if passID < s.state.CurPass {
	return ErrPassBefore
}
if passID > s.state.CurPass {
	// Client may get run to pass after master when one client faster than the
	// other
	return ErrPassAfter
}
```
Since an RPC function is an action, it should log its input, context, and result when it completes or returns, while it runs in a loop, and when it may be blocked for a long time. Then we can see what it is doing; otherwise one may think, "Oh no, is it hanging somewhere?"
I'll add more logs later.
Perhaps the unit tests should also set up error conditions and check that the expected error or warning logs are emitted.
This issue is caused by paddle.infer not being given the feed setting. It has been fixed in the book repo but not yet synced to the cloud repo; see this fix for more details.