Forum

Topic: cudaMemGetInfo time out error (Read 22404 times)

B_Free42

  • Newbie
  • Posts: 10
cudaMemGetInfo time out error
« on: January 25, 2020, 05:29:51 AM »
When I first run Build Dense Point Cloud on the Medium or Low quality setting, I get this error after several seconds:

cudaMemGetInfo(&free_mem_size, &total_mem_size): the launch timed out and was terminated (6) at line 211

When I try running the same thing again, it errors out almost immediately and gives the same error message, except that it terminates at line 33.

The Build Dense Point Cloud process will finish if I run it on the "Lowest" quality setting.

Is this something I can fix? Do I need to change how long the computer waits before timing out? Thanks!

I'm running Windows 10 Home on a Dell G7 7790 laptop. The GPU is an NVIDIA GeForce RTX 2080 with Max-Q Design.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #1 on: January 25, 2020, 12:12:13 PM »
Hello B_Free42,

It looks like a driver failure. After the first timeout the driver has not recovered, so all subsequent attempts fail immediately.

I suggest doing a clean driver install and checking whether it helps:
https://www.nvidia.com/Download/driverResults.aspx/156282/en-us

If the problem persists, please provide the complete processing log from the Console pane.
Best regards,
Alexey Pasumansky,
Agisoft LLC

tpeachey

  • Newbie
  • Posts: 2
Re: cudaMemGetInfo time out error
« Reply #2 on: February 07, 2020, 03:26:11 AM »
When my GeForce RTX 2070 is selected under the GPU tab of the Metashape preferences, I get almost the identical issue when I run both 'Align Photos' and 'Build Mesh'. When only my Intel UHD Graphics 630 is selected, these processes run successfully, albeit very slowly. I have reinstalled both the GeForce Game Ready Driver and the Studio Driver, restarting my computer before and after each reinstall.

This started out happening rarely; now it happens every time and no process runs successfully. Here are three examples of the error messages I've been getting:
"cudaMemGetInfo(&free_mem_size,&total_mem_size): unspecified launch failure (4) at line 211"
"cudaMemGetInfo(&free_mem_size,&total_mem_size): an illegal memory access was encountered (77) at line 40"
"Kernel failed: an illegal memory access was encountered (77) at line 235"

I have attached an example console file.
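The three messages above, like the one in the opening post, share one shape: the call or stage that failed, an error description, a numeric code in parentheses, and a line number reported by the application (presumably a location inside Metashape itself). The descriptions are the standard CUDA runtime strings for cudaErrorLaunchTimeout, cudaErrorLaunchFailure and cudaErrorIllegalAddress, so when comparing runs the message and code are the informative part, while the line number only says where the error surfaced (211 versus 33 in the opening post). A minimal sketch that splits such a line into fields; `parse_gpu_error` and `GpuError` are hypothetical names, and the pattern is inferred from the messages quoted in this thread, not from any documented log format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern, inferred from the messages quoted in this thread
# (not a documented Metashape log format):
#   <call or stage>: <error string> (<code>) at line <n>
_GPU_ERROR = re.compile(
    r"(?P<call>.+?):\s*(?P<message>[^:]+?)\s*\((?P<code>-?\d+)\)\s+at line\s+(?P<line>\d+)$"
)


@dataclass(frozen=True)
class GpuError:
    call: str     # API call or stage that reported the error
    message: str  # error description (the CUDA runtime string)
    code: int     # numeric error code shown in parentheses
    line: int     # line number reported by the application


def parse_gpu_error(text: str) -> Optional[GpuError]:
    """Split one console line into its parts; return None if it does not match."""
    m = _GPU_ERROR.match(text.strip())
    if m is None:
        return None
    return GpuError(
        call=m.group("call").strip(),
        message=m.group("message").strip(),
        code=int(m.group("code")),
        line=int(m.group("line")),
    )


# The two failures from the opening post differ only in the reported line.
first = parse_gpu_error(
    "cudaMemGetInfo(&free_mem_size, &total_mem_size): "
    "the launch timed out and was terminated (6) at line 211"
)
retry = parse_gpu_error(
    "cudaMemGetInfo(&free_mem_size, &total_mem_size): "
    "the launch timed out and was terminated (6) at line 33"
)
```

According to the CUDA runtime documentation, these three errors leave the process in an inconsistent state: any further CUDA work in the same process returns an error until the process is terminated and relaunched. That is consistent with the second attempt in the opening post failing almost immediately.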

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #3 on: February 07, 2020, 02:26:39 PM »
Hello tpeachey,

Are you observing similar issues in other GPU-supported stages, for example depth maps reconstruction?

Does it help if you reinstall the NVIDIA driver using the Clean Driver Install option?
Best regards,
Alexey Pasumansky,
Agisoft LLC

tpeachey

  • Newbie
  • Posts: 2
Re: cudaMemGetInfo time out error
« Reply #4 on: February 09, 2020, 03:31:01 AM »
Thanks for your reply Alexey.

I have tried the 'clean' option when reinstalling, with no success.

Are you referring to the depth reconstruction stage of the 'Build Mesh' process? Yes, that seems to be when it occurs in that process.

As an update, I did find that a 'Build Mesh' would complete on a previously aligned 133,000-point chunk at medium quality, though the resulting model was distorted and not representative of the object or of the point cloud from the previously successful alignment.

I am including another console file which shows the above-mentioned 'successful' Build Mesh, followed by a failed attempt to run a high-quality Build Mesh on a previously aligned 200,000+ point chunk.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #5 on: February 11, 2020, 09:06:47 PM »
Hello tpeachey,

It looks like a hardware problem, as the issues are observed during different processing stages. I can send you instructions on how to switch the GPU calculations from CUDA to OpenCL, but if the problem is related to the system, you will most likely get similar errors.

Meanwhile, you can run Memtest86 checks (in both multi- and single-threaded modes), in case the problem is caused by RAM failures. If the CPU, GPU or RAM is overclocked, that may also be the cause of the problem; however, in that case it should also be observed during other intensive calculation stages.
Best regards,
Alexey Pasumansky,
Agisoft LLC

OlaH

  • Newbie
  • Posts: 2
Re: cudaMemGetInfo time out error
« Reply #6 on: October 03, 2020, 12:38:45 PM »
I had the same problem and it seemed really random, but I finally found out that it only happened when my laptop was unplugged.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #7 on: October 03, 2020, 10:15:37 PM »
Hello OlaH,

Probably the system reduces the power supply when running on battery, to save power, and this affects the performance and stability of the application.
Best regards,
Alexey Pasumansky,
Agisoft LLC

RHenriques

  • Full Member
  • Posts: 225
Re: cudaMemGetInfo time out error
« Reply #8 on: October 12, 2020, 07:34:13 PM »
CUDA errors are becoming more frequent and persistent. They happen in the Align Photos and Build Dense Cloud stages. If I switch to OpenCL, the app also crashes (I've sent a couple of crash reports to Agisoft via the built-in crash reporter). Are there any tweaks or parameters we can change to reduce these errors? Alexey once had me change "main/depth_max_gpu_multiplier" to 1, and that did improve things. However, something has changed, and CUDA now crashes a lot more often.
I'm sending console logs of recently crashed projects.
Best Regards.

PS: I've included another crash that occurred while building the dense cloud.
« Last Edit: October 12, 2020, 07:43:16 PM by RHenriques »

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #9 on: October 13, 2020, 12:51:24 AM »
Hello Renato,

I would say that these crashes seem to be related to GPU driver issues.

Which version of the NVIDIA driver do you have installed? And do you have a saved log from the run with the OpenCL tweak?
Best regards,
Alexey Pasumansky,
Agisoft LLC

RHenriques

  • Full Member
  • Posts: 225
Re: cudaMemGetInfo time out error
« Reply #10 on: October 13, 2020, 09:26:26 PM »
Hi Alexey

I have the latest drivers available for macOS. I cannot collect the OpenCL log because the app crashes. I've used the built-in feature to send the crash report to Agisoft.
Best Regards


 

c-r-o-n-o-s

  • Jr. Member
  • Posts: 91
Re: cudaMemGetInfo time out error
« Reply #11 on: November 19, 2020, 10:51:42 PM »
Same here: cudaMemGetInfo...

ThinkPad with Quadro 3000 GPU and NVIDIA 452.39 driver.
(Win10 1909)
« Last Edit: November 19, 2020, 10:55:24 PM by c-r-o-n-o-s@web.de »

c-r-o-n-o-s

  • Jr. Member
  • Posts: 91
Re: cudaMemGetInfo time out error
« Reply #12 on: November 20, 2020, 12:44:52 PM »
I have updated the drivers to 452.57.
Unfortunately the error still occurred.


Now I have set "depth_max_gpu_multiplier" to 1 and it seems to work!

I assume there is some loss of speed now, though?

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: cudaMemGetInfo time out error
« Reply #13 on: November 20, 2020, 02:28:54 PM »
Hello c-r-o-n-o-s,

You can try switching the multiplier back to 2 (the default value), but set the main/gpu_enable_cuda tweak to False, restart Metashape and check whether the GPU works properly using OpenCL instead of CUDA.
Best regards,
Alexey Pasumansky,
Agisoft LLC

c-r-o-n-o-s

  • Jr. Member
  • Posts: 91
Re: cudaMemGetInfo time out error
« Reply #14 on: November 20, 2020, 03:35:12 PM »
This works, of course, BUT now a LOT of these lines appear in the log:


- GPU rectifying failed: clEnqueueWriteBuffer(queue(), buffer, blocking_write, offset, cb, ptr, 0, NULL, NULL): CL_OUT_OF_RESOURCES (-5) at line 345
- using CPU implementation...
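Each "GPU rectifying failed ... CL_OUT_OF_RESOURCES (-5)" entry is followed by "using CPU implementation...", meaning that step falls back to the CPU: processing continues, presumably more slowly. When comparing settings (for example the multiplier at 1 versus 2, or CUDA versus OpenCL), it can help to count how often this happens in a saved console log. A minimal sketch, assuming the log is available as plain text lines; `summarize_fallbacks` is a hypothetical helper, and both patterns are taken only from the two log lines quoted above.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns, based only on the two log lines quoted in this post.
_GPU_FAILED = re.compile(r"GPU (?P<stage>\w+) failed:.*\((?P<code>-?\d+)\) at line \d+")
_CPU_FALLBACK = "using CPU implementation"


def summarize_fallbacks(lines: Iterable[str]) -> tuple[Counter, int]:
    """Count GPU failures per (stage, error code) and the number of CPU fallbacks."""
    failures: Counter = Counter()
    fallbacks = 0
    for raw in lines:
        line = raw.strip()
        match = _GPU_FAILED.search(line)
        if match:
            # The greedy ".*" makes the code group pick the last "(...)" on the line.
            failures[(match.group("stage"), int(match.group("code")))] += 1
        elif _CPU_FALLBACK in line:
            fallbacks += 1
    return failures, fallbacks
```

An open text file can be passed directly as `lines`, since iterating over it yields one log line at a time.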