TPU Distributed Computing #8

Open
huan opened this issue Mar 17, 2019 · 6 comments

@huan
Collaborator

huan commented Mar 17, 2019

The TPU chapter is planned to cover the following:

  • Cloud TPU
    • v2
    • v3
    • Pod
  • Edge TPU (Coral)

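At this point it looks like we won't have time to cover this in the first edition, so the plan is to leave the TPU content out of the first edition. (If there is still time before the book is published, we can add the most basic Google Cloud TPU setup instructions.)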

Does this sound OK to everyone? @snowkylin @dpinthinker


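  1. UPDATE (29 Aug 2019): TensorFlow 2.0/2.1 TPU support tracking issue: "TPU support is incomplete", tensorflow/tensorflow#24412 (comment)
  2. UPDATE (17 Mar 2019): After discussing it with Xihan, we expect some time to remain before the official TF 2.0 release, so we decided to go ahead and add a minimal version of 5 to 10 pages.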
@huan huan self-assigned this Mar 17, 2019
@huan
Collaborator Author

huan commented Aug 25, 2019

Will start writing this chapter this week.

@huan huan added the chapter label Aug 28, 2019
@huan huan mentioned this issue Sep 4, 2019
@huan
Collaborator Author

huan commented Sep 9, 2019

Review from @snowkylin

TPU

  • Move minor content into tip boxes
    • Move the text on confirming the environment into a tip box
  • Use an existing TF model in the example code
  • Add a benchmark comparison with other strategies: GPU, multiple GPUs, and multiple servers
  • Study Xihan's distributed training chapter and align with it
  • Add a source link to each image

@JimXiongGM

Hello, the example Colab notebook for the chapter "Training TensorFlow Models on TPU (Huan)" (https://colab.research.google.com/github/huan/tensorflow-handbook-tpu/blob/master/tensorflow-handbook-tpu-example.ipynb) fails to run and shows:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571137943.518656507","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]

Could you please help? Thanks.

@huan
Collaborator Author

huan commented Oct 15, 2019

@JimXiongGM Hi, thanks for trying TF 2.0 with Colab and TPU!

TensorFlow 2.0 does not yet fully support TPUs in Colab. I got an update from Googlers, and they said it will be fully supported in TensorFlow 2.1.

This is a known issue; you can learn more from tensorflow/tensorflow#33045 (comment) and huan/tensorflow-handbook-tpu#1

The Workaround

Until TF 2.1 is released, you can use the latest TF 1.x with eager execution enabled; its APIs are very similar to those of TF 2.0.

Once TF 2.1 is released, you can switch to it with very few code modifications.

P.S. I will update the chapter today to describe this problem in detail.

@JimXiongGM

thanks a lot ;-)

@huan
Collaborator Author

huan commented Oct 15, 2019

You are welcome. :)
