Agisoft Metashape

Title: cudaMemGetInfo time out error
Post by: B_Free42 on January 25, 2020, 05:29:51 AM
When I first run Build Dense Point Cloud on Medium or Low setting I get this error after several seconds:

cudaMemGetInfo(&free_mem_size, &total_mem_size): the launch timed out and was terminated (6) at line 211

When I try running the same thing again, it errors out almost immediately and gives the same error message except that it terminated at line 33.

The Build Dense Point Cloud process will finish if I run it on "Lowest" quality setting.

Is this something I can fix? Do I need to change the amount of time the computer waits for the time out? Thanks!

I'm running Windows 10 Home on Dell G7 7790 laptop. GPU is NVidia GeForce RTX 2080 with Max-Q Design.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on January 25, 2020, 12:12:13 PM
Hello B_Free42,

This looks like a driver failure. The driver did not recover after the first timeout, so all subsequent attempts fail immediately.

I suggest doing a clean driver install and checking whether it helps:
https://www.nvidia.com/Download/driverResults.aspx/156282/en-us

If the problem persists, please provide the complete processing log from the Console pane.
Title: Re: cudaMemGetInfo time out error
Post by: tpeachey on February 07, 2020, 03:26:11 AM
When I have my GeForce RTX 2070 selected under the GPU tab of the Metashape preferences, I get an almost identical issue when I run both 'Align photos' and 'Build mesh'. When only my Intel UHD Graphics 630 is selected, these processes run successfully, albeit very slowly. I have reinstalled both the GeForce Game Ready Driver and the Studio Driver, and restarted my computer before and after each install.

This started out happening rarely, but now it happens every time and no process runs successfully. Here are three examples of the error messages I've been getting:
"cudaMemGetInfo(&free_mem_size,&total_mem_size): unspecified launch failure (4) at line 211"
"cudaMemGetInfo(&free_mem_size,&total_mem_size): an illegal memory access was encountered (77) at line 40"
"Kernel failed: an illegal memory access was encountered (77) at line 235"

I have attached an example console file.
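
For scripted runs, the same GPU selection can be expressed as a bit mask instead of ticking boxes in the preferences. The sketch below is illustrative only: `gpu_mask_for` is a hypothetical helper, and it assumes the device list has the shape returned by `Metashape.app.enumGPUDevices()` (a list of dicts with a "name" key) and that bit i of `Metashape.app.gpu_mask` enables device number i.

```python
def gpu_mask_for(devices, name_contains):
    """Return a bit mask that enables only devices whose name contains the text.

    `devices` is assumed to look like the result of
    Metashape.app.enumGPUDevices(): a list of dicts with a "name" key.
    Bit i of the result corresponds to the i-th device in that list.
    """
    needle = name_contains.lower()
    mask = 0
    for index, device in enumerate(devices):
        if needle in device["name"].lower():
            mask |= 1 << index  # switch on the bit for this device
    return mask
```

Inside Metashape the result would be assigned along the lines of `Metashape.app.gpu_mask = gpu_mask_for(Metashape.app.enumGPUDevices(), "intel")`, which makes it easy to repeat the "Intel only" versus "RTX only" comparison described above.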
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on February 07, 2020, 02:26:39 PM
Hello tpeachey,

Are you observing similar issues in other GPU-supported stages, such as depth maps reconstruction?

Does it help if you reinstall the NVIDIA driver using the Clean Driver Install option?
Title: Re: cudaMemGetInfo time out error
Post by: tpeachey on February 09, 2020, 03:31:01 AM
Thanks for your reply Alexey.

I have tried the 'clean' option when reinstalling with no success.

Are you referring to the depth reconstruction stage of the 'Build Mesh' process? Yes that seems to be when it is occurring in that process.

As an update, I did find that a 'Build Mesh' would complete on a previously aligned 133,000 point chunk at medium quality, though the resulting model was distorted and not representative of the object or the point cloud of the previously successful alignment.

I am including another console file which shows the above mentioned 'successful' Build Mesh and then following that a failed attempt to run a high quality Build Mesh on a 200,000+ point previously aligned chunk.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on February 11, 2020, 09:06:47 PM
Hello tpeachey,

It looks like a hardware problem, as the issues are observed during different processing stages. I can send you instructions on how to switch the GPU calculations from CUDA to OpenCL, but if the problem is related to the system, you will most likely get similar errors.

Meanwhile, you can try running Memtest86 checks (in both multi- and single-threaded modes), just in case the problem is caused by RAM failures. If the CPU, GPU or RAM is overclocked, that may also be the cause, although in that case the problem should also appear during other computationally intensive stages.
Title: Re: cudaMemGetInfo time out error
Post by: OlaH on October 03, 2020, 12:38:45 PM
I had the same problem and it seemed really random, but I finally found out that this appeared only when my laptop was unplugged.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on October 03, 2020, 10:15:37 PM
Hello OlaH,

The system probably reduces power delivery when running on battery in order to save power, and this affects the performance and stability of the application.
Title: Re: cudaMemGetInfo time out error
Post by: RHenriques on October 12, 2020, 07:34:13 PM
CUDA errors are becoming more frequent and persistent. They happen in the Align Photos and Build Dense Cloud stages. If I switch to OpenCL, the app also crashes (I've sent a couple of reports to Agisoft via the built-in crash reporter). Are there any tweaks or parameters that we can change to reduce these errors? Alexey once sent me the suggestion to change "main/depth_max_gpu_multiplier" to 1, and that did improve things. However, something has changed that now crashes CUDA a lot more often.
I'm sending console logs of recently crashed projects.
Best Regards.


PS: Included another crash during the dense cloud building.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on October 13, 2020, 12:51:24 AM
Hello Renato,

I would say that these crashes seem to be related to GPU driver issues.

Which version of the NVIDIA driver do you have installed? And do you have a saved log from the run with the OpenCL tweak?
Title: Re: cudaMemGetInfo time out error
Post by: RHenriques on October 13, 2020, 09:26:26 PM
Hi Alexey

I have the latest drivers available for macOS. I cannot collect the OpenCL log because the app crashes. I've used the built-in feature to send the crash report to Agisoft.
Best Regards


 
Title: Re: cudaMemGetInfo time out error
Post by: c-r-o-n-o-s on November 19, 2020, 10:51:42 PM
Same here: cudaMemGetInfo...

ThinkPad with Quadro 3000 GPU and NVIDIA 452.39 driver.
(Win10 1909)
Title: Re: cudaMemGetInfo time out error
Post by: c-r-o-n-o-s on November 20, 2020, 12:44:52 PM
I have updated the drivers to 452.57.
Unfortunately the error still occurred.


Now I have set "depth_max_gpu_multiplier" to 1 and it seems to work!

I think there are speed losses now, though?
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on November 20, 2020, 02:28:54 PM
Hello c-r-o-n-o-s,

You can try switching the multiplier back to 2 (the default value) and setting the main/gpu_enable_cuda tweak to False instead. Then restart Metashape and check whether the GPU works properly using OpenCL instead of CUDA.
Title: Re: cudaMemGetInfo time out error
Post by: c-r-o-n-o-s on November 20, 2020, 03:35:12 PM
This works of course, BUT now a LOT of these messages appear in the log:


- GPU rectifying failed: clEnqueueWriteBuffer(queue(), buffer, blocking_write, offset, cb, ptr, 0, NULL, NULL): CL_OUT_OF_RESOURCES (-5) at line 345
- using CPU implementation...
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on November 20, 2020, 04:24:57 PM
Hello c-r-o-n-o-s,

Can you please attach the processing logs related to the GPU processing (using CUDA and OpenCL with multiplier = 2)? I suggest rebooting before every run, just in case the driver cannot recover.

To me it looks like a driver issue, so I suggest doing a clean driver install. Also, please specify which Quadro model you are using. Do you mean the RTX 3000 or the P3000?
Title: Re: cudaMemGetInfo time out error
Post by: RHenriques on November 30, 2020, 10:12:08 PM
These errors are becoming more frequent, even in smaller projects. I've noticed that if "Generic Preselection" is active in Align Photos, there is a better chance of success at this stage. If it is switched off, failure is certain. The problem seems to be linked to excessive peak load on external GPUs. Is there a way to lower or fine-tune the load on each GPU?
Best Regards


Title: Re: cudaMemGetInfo time out error
Post by: c-r-o-n-o-s on December 02, 2020, 09:20:14 PM
The main/depth_max_gpu_multiplier = 1 setting works fine, but the speed impact is roughly 50%!
Title: Re: cudaMemGetInfo time out error
Post by: RHenriques on December 02, 2020, 09:55:15 PM
Same here. Probably not as drastic, but noticeable. However, it is the only way to minimize the CUDA errors.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on December 03, 2020, 01:44:25 PM
Quote from: RHenriques on November 30, 2020, 10:12:08 PM
"These errors are becoming more frequent, even in smaller projects. I've noticed that if "Generic Preselection" is active in Align Photos, there is a better chance of success at this stage. If it is switched off, failure is certain. The problem seems to be linked to excessive peak load on external GPUs. Is there a way to lower or fine-tune the load on each GPU?"

I don't think professional graphics cards like the Quadro RTX are so delicate that they cannot handle high load. That seems to be one of their main purposes: to be installed in server racks and workstations that perform regular and almost constant calculations. So to me this particular case (reported by c-r-o-n-o-s) seems to be related to hardware or drivers. If the NVIDIA drivers are up to date and a clean install doesn't change anything, no RAM issues are detected and there are no problems with power supply management, I would suggest contacting NVIDIA support about the observed problem: CUDA errors when two contexts are used on the same GPU.
In my experience, NVIDIA should provide good support to owners of professional graphics cards. Maybe they can suggest changing some settings, or run additional diagnostics on the GPU to check whether the issues are related to a factory flaw.
Title: Re: cudaMemGetInfo time out error
Post by: c-r-o-n-o-s on December 04, 2020, 01:48:16 PM
I have now changed a lot of settings in the graphics card driver.
Energy mode: Adaptive, max - optimal performance and so on.
What can I say, now it runs!

My current settings are the same as I started with (with errors) and still it runs, even under OpenCL.

There really is a "node" in the driver settings.
Title: Re: cudaMemGetInfo time out error
Post by: Alberto C on February 01, 2021, 06:30:46 PM
Hello, I had the same problem and solved it by rolling back to an earlier version of the integrated graphics driver!! It worked perfectly, and my PC has an NVIDIA RTX 2060.
Title: Re: cudaMemGetInfo time out error
Post by: flogs on January 18, 2022, 02:38:23 PM
Hi,

RTX 2070 Super, Windows 11, Metashape 1.8.0. Occasional errors during processing. Drivers reinstalled (I tried both the Game Ready and Studio drivers). Any idea whether it is a hardware or software problem, and what to do?

Kernel failed: an illegal memory access was encountered (700) at line 269
cudaMemGetInfo(&free_mem_size, &total_mem_size): an illegal memory access was encountered (700) at line 40

Thanks,
Filip
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on January 18, 2022, 03:30:33 PM
Hello Filip,

This looks like a driver failure.

Do you observe GPU-based processing problems at different stages, such as image matching, depth maps generation, or depth-maps-based mesh generation? Also, please specify the driver versions that you have tried.

Title: Re: cudaMemGetInfo time out error
Post by: flogs on January 18, 2022, 05:12:49 PM
There were no errors before; the problems started at about the same time as I migrated to Windows 11, but I cannot be sure.

I use up-to-date drivers. When I do a clean driver reinstall, I do not see errors immediately afterwards. Also, when I close and reopen Metashape, I can sometimes successfully complete the calculation that previously stopped with an error.
NVidia Studio, 511.09, 01/04/2022
NVidia Game, 511.23, 01/14/2022

Most often I see the error (Kernel failed) during the image alignment stage, but sometimes it appears later, during the following steps. Sometimes even a couple of hours of computation (build dense cloud, build mesh) completes without problems.

I wonder if I should run some software tests of my GPU/RAM to see whether everything is OK (OCCT, memtest).
GPU cards are pretty expensive and unavailable these days so I hope my card will last a little bit longer. :)
Title: Re: cudaMemGetInfo time out error
Post by: flogs on January 25, 2022, 05:02:32 PM
I can confirm that when I do a clean reinstall of the GPU driver, I do not see errors for some time afterwards, but they start again later on. So far it has always been like this. Are you going to investigate this issue further? Filip
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on January 25, 2022, 05:33:12 PM
Hello Filip,

We'll try to reproduce the problem on similar system configuration (Windows 11 + RTX 2070).

Meanwhile, you can set the following tweak via the Advanced preferences tab: main/gpu_enable_cuda. Set its value to False, restart Metashape and check whether the problem persists. The tweak switches from the CUDA implementation to OpenCL and may help if only the CUDA part of the driver is somehow affected.
Title: Re: cudaMemGetInfo time out error
Post by: flogs on January 28, 2022, 07:06:54 PM
Ok. thanks. Just to be accurate, I have RTX 2070 Super, not older RTX 2070. From MSI.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on January 28, 2022, 09:04:13 PM
Hello Filip,

We haven't yet had a chance to test the RTX 20 series on Windows 11, but if you were able to run the processing using OpenCL, please let me know whether it went fine or produced a similar error (it would likely start with the CL_ prefix).
Title: Re: cudaMemGetInfo time out error
Post by: flogs on February 16, 2022, 02:02:04 PM
Hi, thanks for the info. I have not switched to OpenCL yet either. The problem is still there, but when I reinstall the driver I am somehow able to work for some time. Regarding switching to OpenCL: how much does the processing time increase?
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on February 18, 2022, 11:12:41 AM
Hello Filip,

There shouldn't be any noticeable increase of the processing time, when switching to OpenCL implementation.
Title: Re: cudaMemGetInfo time out error
Post by: ttsesm on February 25, 2022, 05:38:54 PM
Hi Alexey, I am getting the same error with RTX2080 super on my linux machine. I would like to test the OpenCL option, is there a way to set this in my script through python?

Moreover, adding the suggested tweak on the gui seems to work for the Dense Point cloud creation, but then on the Mesh creation I am getting the following error:

Code: [Select]
ciErrNum: CL_UNKNOWN_ERROR_CODE_-9999 (-9999) at line 209
Which I guess is related to opencl.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on February 25, 2022, 07:03:13 PM
Hello ttsesm,

To switch to OpenCL implementation via Python, you need to add the following line to the beginning of your script:
Code: [Select]
Metashape.app.settings.setValue("main/gpu_enable_cuda", "0")
As for the error that you are observing, please provide the log corresponding to the failed operation. Also, please specify the Linux distribution used and the NVIDIA driver version.
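
For scripts that need to toggle more than one of the tweaks mentioned in this thread, a small wrapper may be convenient. This is only a sketch: `apply_gpu_tweaks` is a hypothetical helper, not part of the Metashape API. It relies solely on the `setValue(key, value)` call shown above, and it assumes that main/depth_max_gpu_multiplier also accepts its value as a string.

```python
def apply_gpu_tweaks(settings, disable_cuda=False, depth_multiplier=None):
    """Write the GPU tweaks discussed in this thread via settings.setValue().

    `settings` is expected to behave like Metashape.app.settings.
    Returns the list of (key, value) pairs that were written.
    """
    pairs = []
    if disable_cuda:
        # "0" switches the GPU code path from CUDA to OpenCL.
        pairs.append(("main/gpu_enable_cuda", "0"))
    if depth_multiplier is not None:
        # Assumption: this tweak takes a string value as well (default is 2).
        pairs.append(("main/depth_max_gpu_multiplier", str(int(depth_multiplier))))
    for key, value in pairs:
        settings.setValue(key, value)
    return pairs
```

At the top of a processing script this would be called as `apply_gpu_tweaks(Metashape.app.settings, disable_cuda=True)`, before any GPU-based step such as `chunk.matchPhotos()`.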
Title: Re: cudaMemGetInfo time out error
Post by: ttsesm on February 25, 2022, 07:55:07 PM
Thanks Alexey, setting the parameter worked.

However after switching to OpenCL the error that I get is the following:

Code: [Select]
...
...
Using device: NVIDIA GeForce RTX 2080 SUPER, 48 compute units, 7980 MB global memory, OpenCL 3.0
  driver version: 510.54, platform version: OpenCL 3.0 CUDA 11.6.110
  max work group size 1024
  max work item sizes [1024, 1024, 64]
  max mem alloc size 1995 MB
  warp size 32
Building OpenCL kernels for NVIDIA GeForce RTX 2080 SUPER...
Kernels compilation done in 2.83562 seconds
Building OpenCL kernels for NVIDIA GeForce RTX 2080 SUPER...
Kernels compilation done in 0.646988 seconds
Traceback (most recent call last):
  File "/home/ttsesm/Development/metashape_project/bundler_extractor.py", line 68, in <module>
    main()
  File "/home/ttsesm/Development/metashape_project/bundler_extractor.py", line 47, in main
    chunk.matchPhotos()
Exception: Kernel locatePoints: clWaitForEvents(1, &ev): CL_UNKNOWN_ERROR_CODE_-9999 (-9999) at line 638

Process finished with exit code 1

My linux distribution is Arch linux, fully updated to the latest packages. The nvidia driver version is again the latest and specifically v.510.54 as you can see above in the output.

Is there any other log that I can provide you? I am running the script through pycharm with the metashape interpreter as described here https://agisoft.freshdesk.com/support/solutions/articles/31000154762-how-to-make-python-interpreter-to-use-metashape-module

------------------------------------------------------------------------------------

Also, without the OpenCL workaround, the error I usually get at first is the following:

Code: [Select]
...
...
Found 1 GPUs in 0.000133 sec (CUDA: 7.3e-05 sec, OpenCL: 5.3e-05 sec)
Using device: NVIDIA GeForce RTX 2080 SUPER, 48 compute units, free memory: 6807/7980 MB, compute capability 7.5
  driver/runtime CUDA: 11060/10010
  max work group size 1024
  max work item sizes [1024, 1024, 64]
[GPU] photo 19: 8310 points
[GPU] photo 48: 8221 points
[GPU] photo 77: 8417 points
[GPU] photo 106: 7715 points
[GPU] photo 135: 6117 points
[GPU] photo 164: 8341 points
[GPU] photo 193: 9600 points
[GPU] photo 222: 8013 points
[GPU] photo 251: 9177 points
[GPU] photo 280: 9049 points
[GPU] photo 309: 7521 points
[GPU] photo 338: 9145 points
[GPU] photo 367: 8751 points
[GPU] photo 396: 8642 points
[GPU] photo 425: 9032 points
[GPU] photo 454: 8926 points
[GPU] photo 483: 9299 points
[GPU] photo 512: 8379 points
Warning: cudaStreamDestroy failed: an illegal memory access was encountered (700)
Traceback (most recent call last):
  File "/home/ttsesm/Development/metashape_project/bundler_extractor.py", line 68, in <module>
    main()
  File "/home/ttsesm/Development/metashape_project/bundler_extractor.py", line 47, in main
    chunk.matchPhotos()
Exception: Kernel failed: an illegal memory access was encountered (700) at line 143

Process finished with exit code 1

In case it helps somehow.
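
When comparing runs like the two above, it can help to pull only the GPU error lines out of a long console log. The following is a minimal sketch, not an official tool: `find_gpu_errors` and `summarize` are hypothetical names, the pattern is based only on the "... (code) at line N" messages quoted in this thread, and the OpenCL/CUDA split simply relies on the CL_ prefix mentioned earlier.

```python
import re
from collections import Counter

# Matches the tail of messages such as
#   "Kernel failed: an illegal memory access was encountered (700) at line 143"
#   "...: CL_OUT_OF_RESOURCES (-5) at line 345"
_GPU_ERROR = re.compile(
    r":\s+(?P<message>[^:]+?)\s+\((?P<code>-?\d+)\)\s+at line\s+(?P<line>\d+)"
)


def find_gpu_errors(log_text):
    """Return one dict per log line that looks like a GPU error report."""
    errors = []
    for raw in log_text.splitlines():
        match = _GPU_ERROR.search(raw)
        if match is None:
            continue  # ordinary progress output, or a warning without "at line"
        message = match.group("message")
        errors.append({
            # Heuristic: OpenCL errors start with "CL_"; everything else
            # in this format is assumed to come from the CUDA path.
            "api": "OpenCL" if message.startswith("CL_") else "CUDA",
            "message": message,
            "code": int(match.group("code")),
            "line": int(match.group("line")),
        })
    return errors


def summarize(errors):
    """Count how often each (api, message, code) combination occurs."""
    return Counter((e["api"], e["message"], e["code"]) for e in errors)
```

Feeding it the saved console text, for example `summarize(find_gpu_errors(open("console.txt").read()))` with a hypothetical file name, gives a quick view of whether a run failed with timeouts, illegal memory accesses, or OpenCL resource errors.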
Title: Re: cudaMemGetInfo time out error
Post by: ttsesm on February 28, 2022, 03:20:34 PM
Hi Alexey,

any update how I could resolve the issue or apply any workaround?

Thanks.
Title: Re: cudaMemGetInfo time out error
Post by: Alexey Pasumansky on February 28, 2022, 05:38:14 PM
Hello ttsesm,

To investigate the problem further, can you please check whether the same operation works or returns the error in version 1.6.5:
https://s3-eu-west-1.amazonaws.com/download.agisoft.com/metashape-pro_1_6_5_amd64.tar.gz

If the issue is still there, please save the related log.

If it is possible for you, please also check whether the issue persists in 1.8.1 and 1.6.5 with an older driver version (the latest available in the 4xx.xx series).
Title: Re: cudaMemGetInfo time out error
Post by: ttsesm on March 02, 2022, 01:21:40 PM
Hi Alexey,

Some updates. Downgrading to an older version didn't really help; I was getting the same errors. Then, having read that this might be a hardware-related issue, I plugged in an older NVIDIA card that I had available, an Nvidia GTX 1080 Ti with 12 GB of memory, and all of a sudden everything works smoothly without any errors. My current card is an Nvidia RTX 2080 Super with 7 GB of memory. So apparently the problem is somehow related to the hardware. I am not sure whether it is due to the memory difference, because my 2080 Super is faulty (although it works fine for everything else), or because the newer RTX cards handle memory differently. Unfortunately, I do not have another RTX 20xx card to test, but it would be interesting to see whether the others with similar issues are also using RTX cards.

In any case, thanks for the support and your time.