1. Can this project be done without a GPU? Azure for Students only provides $100/year, and a GPU-equipped VM on Azure seems to burn through that in less than 24 hours. I'd like to be able to complete the project on a CPU-only VM.
2. In Project 2, what kind of data structure is `input.data<scalar_t>()`? Why is `blockIdx` squared? As a concrete example: if the Linear layer has 32 input dimensions and 32 output dimensions, that's 32 cells; with a batch size of 10, M = size(0) = 10 and N = size(1) = 32. Then what is weight.size(1), and what ranges do blockIdx and threadIdx iterate over?
Some of these questions are partly answered with experiments in my repo:
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/wiki/About-the-No-GPU-Build-Hack-in-PyTorch
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/wiki/Explan-the-input-and-weight-Vector's-Shape-in-Cpp
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/wiki/About-the-input-and-weight-Vector's-Memory-Structure
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/tree/main/myLinear_CUDA_backend
I've successfully used one thread to simulate one neuron in the Linear layer, with blocks acting as an equivalent loop over the batch samples, though I still haven't fully understood the demo code 😅
If you have any questions about my repo, feel free to reply here or open an issue over there.
Thanks to the teachers and fellow students for their patient review and guidance.