Author Topic: Finding GPU memory is very slow  (Read 807 times)

kwea123

  • Newbie
  • Posts: 3
Finding GPU memory is very slow
« on: April 27, 2023, 04:15:20 PM »
Hi, I'm using Metashape on an AWS instance. The instance has NVIDIA driver 515.65.01 and CUDA 11.7 installed.
After launching a new instance, the first run always takes more than 30 seconds to query the GPU's free memory:

Code: [Select]
Found 1 GPUs in 0.040546 sec (CUDA: 0.040196 sec, OpenCL: 0.000329 sec)
Using device: NVIDIA A10G, 80 compute units, free memory: 22332/22592 MB, compute capability 8.6
  driver/runtime CUDA: 11070/10010
  max work group size 1024
  max work item sizes [1024, 1024, 64]
  got device properties in 0.001279 sec, free memory in 33.9145 sec

From the second run on, the "free memory" step is instant, which suggests the delay is related to CUDA initialization.
However, for our use case we need to launch a new instance every time, so we always lose about 30 seconds. Is there any way to reduce this startup time?
« Last Edit: April 27, 2023, 04:38:19 PM by kwea123 »

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14485
Re: Finding GPU memory is very slow
« Reply #1 on: April 27, 2023, 05:42:17 PM »
Hello kwea123,

Could you please try switching from the CUDA implementation to OpenCL via the main/gpu_enable_cuda=False tweak and check whether it fixes the delay?
Best regards,
Alexey Pasumansky,
Agisoft LLC

kwea123

Re: Finding GPU memory is very slow
« Reply #2 on: April 28, 2023, 01:32:41 PM »
Our instance cannot find OpenCL. If I set `gpu_enable_cuda=False`, everything falls back to the CPU, which is not suitable for us. Installing OpenCL alongside CUDA is something we could investigate, but it would take a long time since we would need to rebuild all our images.

Is there any other potential solution that we can quickly try?

kwea123

Re: Finding GPU memory is very slow
« Reply #3 on: April 28, 2023, 02:53:38 PM »
OK, it works if we preserve the compute cache (copying ~/.nv/ComputeCache from a local machine to the instance). Thanks for your support; the issue is now solved.
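
For anyone hitting the same delay, the workaround above could be sketched roughly as follows. This is a minimal sketch, not the exact script used in this thread: the `restore_compute_cache` helper and the example path in the usage note are hypothetical, and it assumes the cache was saved from a machine with the same GPU model and driver version (otherwise the driver will probably rebuild it anyway). It would need to run on the fresh instance before Metashape starts.

```python
import shutil
from pathlib import Path
from typing import Optional

# Default location of the NVIDIA compute cache on Linux,
# as given in the post above (~/.nv/ComputeCache).
DEFAULT_CACHE = Path.home() / ".nv" / "ComputeCache"


def restore_compute_cache(saved_cache: Path,
                          target_cache: Optional[Path] = None) -> int:
    """Copy a previously saved compute cache into place.

    Hypothetical helper: returns the number of files found in the
    saved cache, all of which are copied into the target directory.
    """
    saved = Path(saved_cache)
    target = Path(target_cache) if target_cache is not None else DEFAULT_CACHE
    if not saved.is_dir():
        raise FileNotFoundError(f"saved cache not found: {saved}")
    target.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) merges into an existing cache directory
    # instead of failing when the target already exists.
    shutil.copytree(saved, target, dirs_exist_ok=True)
    return sum(1 for p in saved.rglob("*") if p.is_file())
```

Usage would look like `restore_compute_cache(Path("/opt/saved/ComputeCache"))`, where `/opt/saved/ComputeCache` is a placeholder for wherever the cache was baked into the image or downloaded to. The CUDA driver also reads the `CUDA_CACHE_PATH` environment variable, so pointing it at a pre-populated directory may be an alternative to copying, though that was not tested in this thread.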