There doesn't appear to be any way to transfer a result tensor into an existing CPU float array. The approach below requires a new memory allocation on every call:

    var cpuResult = gpuResult.cpu();
    float[] result = cpuResult.data<float>().ToArray();

If this is part of a loop, that is a lot of wasted memory allocation and time!

Below is how libraries normally do this, e.g. CUDA:

    float[] cpuResult ..... (pre allocated further up)
    gpuResult.CopyToHost(cpuResult);

Maybe I missed this CopyTo, because it's kind of essential for any GPU library (?!)

Is this project maintained?