What to do with arbitrary input size for GPU efficiency #18

Open
barisgecer opened this issue Mar 9, 2016 · 1 comment

Comments

@barisgecer

Hi,
Short question: I have a dataset whose inputs vary in size. How should I preprocess the samples so that they all fit in one matrix for GPU efficiency?

Long question:
Currently I am processing each sample individually, which is of course really inefficient. I should store them in a 4D matrix (where the 4th dimension indexes the images), but because their sizes vary, I can't.

Do you think I should sample same-sized inputs from the data? If so, how should I do it?
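
To make the sampling idea concrete, here is a minimal sketch in Python/NumPy rather than MATLAB, so it is an illustration and not MatConvNet code. It assumes each image is an H x W x C array and that every image is at least as large as the crop. The helper name `random_crop_batch` and the crop size are hypothetical.

```python
import numpy as np

def random_crop_batch(images, crop_h, crop_w, rng=None):
    """Cut one random crop_h x crop_w window from each H x W x C image and
    stack the crops into a single crop_h x crop_w x C x N array."""
    rng = np.random.default_rng() if rng is None else rng
    crops = []
    for img in images:
        h, w = img.shape[:2]
        if h < crop_h or w < crop_w:
            raise ValueError("image is smaller than the requested crop")
        # Upper bound is exclusive, so the window always fits inside the image.
        top = rng.integers(0, h - crop_h + 1)
        left = rng.integers(0, w - crop_w + 1)
        crops.append(img[top:top + crop_h, left:left + crop_w, :])
    # Images are indexed along the last (4th) axis.
    return np.stack(crops, axis=-1)

# Two images of different sizes end up in one 4D array.
images = [np.ones((40, 50, 3), dtype=np.float32),
          np.ones((64, 32, 3), dtype=np.float32)]
batch = random_crop_batch(images, 32, 32, rng=np.random.default_rng(0))
print(batch.shape)  # (32, 32, 3, 2)
```

Drawing a fresh crop every epoch would also act as a mild form of data augmentation, at the cost of never showing the network a whole image in one pass.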

What if I set the size of my 4D matrix to the maximum size among all input samples and fill the remaining part of the smaller samples with zeros? What happens when the network is given zeros in those regions? Does it have any influence on learning?
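
The zero-padding idea could look roughly like the following Python/NumPy sketch (again an illustration, not MatConvNet code). It assumes H x W x C images with the same channel count, and the helper name `pad_to_batch` is hypothetical. It also returns a boolean mask marking the real pixels, on the assumption that a per-pixel loss would need some way to ignore the padded area. Without that, the zeros are treated like any other input and can contribute to the objective.

```python
import numpy as np

def pad_to_batch(images):
    """Zero-pad H x W x C images to a common size and stack them into a
    max_h x max_w x C x N array, plus a max_h x max_w x N validity mask."""
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    channels = images[0].shape[2]
    n = len(images)
    batch = np.zeros((max_h, max_w, channels, n), dtype=images[0].dtype)
    mask = np.zeros((max_h, max_w, n), dtype=bool)
    for i, img in enumerate(images):
        h, w = img.shape[:2]
        batch[:h, :w, :, i] = img   # real data goes in the top-left corner
        mask[:h, :w, i] = True      # True marks real (non-padded) pixels
    return batch, mask

images = [np.ones((2, 3, 1), dtype=np.float32),
          np.ones((4, 2, 1), dtype=np.float32)]
batch, mask = pad_to_batch(images)
print(batch.shape)  # (4, 3, 1, 2)
print(mask.sum())   # 14 real pixels: 2*3 + 4*2
```

If the sizes vary a lot, padding everything to the global maximum wastes memory and compute. Grouping similarly sized images into the same batch before padding is one way to limit that.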

Thank you.

@brisker

brisker commented Apr 13, 2016

Hi @barisgecer, when I run fcnTrain.m in GPU mode it is very slow (around 1 Hz) and does not converge. What do you think the problem could be? Please help me, thank you very much!
CUDADevice with properties:

                  Name: 'GeForce GTX TITAN X'
                 Index: 1
     ComputeCapability: '5.2'
        SupportsDouble: 1
         DriverVersion: 7.5000
        ToolkitVersion: 6.5000
    MaxThreadsPerBlock: 1024
      MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
           MaxGridSize: [2.1475e+09 65535 65535]
             SIMDWidth: 32
           TotalMemory: 1.2885e+10
       AvailableMemory: 1.2609e+10
   MultiprocessorCount: 24
          ClockRateKHz: 1076000
           ComputeMode: 'Default'
  GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
      CanMapHostMemory: 1
       DeviceSupported: 1
        DeviceSelected: 1

train: epoch 01: 1/565: 0.9 Hz accuracy: 0.677 0.048 0.032 objective: 3.044
train: epoch 01: 2/565: 1.0 Hz accuracy: 0.691 0.048 0.033 objective: 3.035
train: epoch 01: 3/565: 1.0 Hz accuracy: 0.698 0.048 0.033 objective: 2.994
