Add distributed candidate evaluation support #52
Comments
@picarus: This is a known issue when using the … One way you can make the evaluation much faster is to pass the … You're right that distributed evaluation should be a supported feature; unfortunately it is non-trivial to implement. Do you have any suggestions for how to shard evaluation across all the workers given an arbitrary input_fn?
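The first two sentences of that reply are cut off in this excerpt. As a hedged guess at the speed-up being suggested, the sketch below caps the number of evaluation batches via the `steps` argument of `adanet.Evaluator`; the `eval_input_fn` and its toy data are placeholders, not taken from the thread:

```python
import adanet
import tensorflow as tf

def eval_input_fn():
    # Placeholder evaluation data; a real job would read the held-out set.
    features = {"x": tf.random_normal([32, 10])}
    labels = tf.random_normal([32, 1])
    return tf.data.Dataset.from_tensors((features, labels)).repeat(200)

# Capping `steps` bounds how many batches each candidate ensemble is scored
# on when picking the best candidate, which shortens the evaluation phase.
evaluator = adanet.Evaluator(input_fn=eval_input_fn, steps=100)
# The evaluator is then passed to adanet.Estimator(evaluator=evaluator, ...).
```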
@cweill, I lack the deep knowledge you surely have about AdaNet or even TF, but unless you are suggesting that the difficulty of implementing this lies in TF itself, I don't see additional complexity other than the fact that you are evaluating multiple networks. Is it a TF issue?
@picarus: Unfortunately nothing is very straightforward in TF. :) The challenges I see are: …
If you have any suggestions or a pull request, I'm happy to chat more.
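On the sharding question, here is a rough sketch of what per-worker sharding could look like when the input_fn happens to return a tf.data.Dataset. The TF_CONFIG parsing, the wrapper name, and the worker counting are simplifying assumptions, and the approach does not extend to an arbitrary input_fn (for example, one that returns tensors directly), which is part of why this is non-trivial:

```python
import json
import os

import tensorflow as tf


def shard_input_fn(make_dataset_fn):
    """Wraps a dataset-producing input_fn so each worker reads a disjoint shard.

    Only applicable when the wrapped function returns a tf.data.Dataset; an
    arbitrary input_fn exposes nothing to shard. Chief/evaluator tasks are
    ignored here for simplicity.
    """
    tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    task = tf_config.get("task", {})
    num_workers = len(tf_config.get("cluster", {}).get("worker", [])) or 1
    worker_index = task.get("index", 0)

    def input_fn():
        dataset = make_dataset_fn()
        # Each worker keeps every num_workers-th record, starting at its index,
        # so the workers collectively cover the dataset without overlap.
        return dataset.shard(num_workers, worker_index)

    return input_fn
```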
Hello,
I am running AdaNet 0.5.0 on GCP with runtime version 1.10.
I am using a CPU configuration with multiple nodes.
The training phase is very fast, but it gets totally slowed down by the evaluations. The evaluations don't seem to take advantage of the multiple nodes, and the logs are flooded with "Waiting for chief to finish" messages coming from the workers, generated by the AdaNet Estimator.
I think support for running the evaluation phase across the multiple nodes should be added, and that it should be a priority change: not only are the nodes unused, you also keep paying for them.
Is that feasible?
Thanks in advance
Jose
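For reference, a minimal sketch of the kind of setup being described, assuming the usual tf.estimator.train_and_evaluate entry point; the input functions are toy placeholders and my_subnetwork_generator is a hypothetical stand-in for whatever generator the job actually uses. With a multi-node configuration, AdaNet's candidate evaluation is still driven from a single node while the remaining workers log "Waiting for chief to finish":

```python
import adanet
import tensorflow as tf


def train_input_fn():
    # Toy training data; a real job would read from GCS. Placeholder only.
    features = {"x": tf.random_normal([32, 10])}
    labels = tf.random_normal([32, 1])
    return tf.data.Dataset.from_tensors((features, labels)).repeat()


def eval_input_fn():
    # Toy evaluation data; placeholder only.
    features = {"x": tf.random_normal([32, 10])}
    labels = tf.random_normal([32, 1])
    return tf.data.Dataset.from_tensors((features, labels)).repeat(100)


estimator = adanet.Estimator(
    head=tf.contrib.estimator.regression_head(label_dimension=1),
    subnetwork_generator=my_subnetwork_generator,  # hypothetical, defined elsewhere
    max_iteration_steps=5000,
    # Candidate evaluation runs on one node; the other workers wait for it.
    evaluator=adanet.Evaluator(input_fn=eval_input_fn, steps=100))

tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=50000),
    eval_spec=tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=100))
```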