Agisoft Metashape

Agisoft Metashape => Bug Reports => Topic started by: edtriplett on February 19, 2018, 09:25:11 PM

Title: CUDA Crash during dense point cloud processing
Post by: edtriplett on February 19, 2018, 09:25:11 PM
I have a very new multi-GPU machine that has been crashing during dense point cloud processing with the error you can see below from the console. It does not happen with a smaller 200-photo project, but this set of 421 photos triggers the error. Does anyone know a good fix?

Quote
2018-02-19 13:15:46 [GPU] estimating 2339x1537x832 disparity using 1170x769x8u tiles
2018-02-19 13:15:46 timings: rectify: 0.03 disparity: 0.402 borders: 0.031 filter: 0.088 fill: 0
2018-02-19 13:15:46 [GPU] estimating 1889x1945x192 disparity using 945x973x8u tiles
2018-02-19 13:15:50 timings: rectify: 0.058 disparity: 0.377 borders: 0.018 filter: 3.42 fill: 0
2018-02-19 13:15:50 [GPU] estimating 1224x924x896 disparity using 1224x924x8u tiles
2018-02-19 13:15:50 GPU rectifying failed: the launch timed out and was terminated (6) at line 216
2018-02-19 13:15:50 using CPU implementation...
2018-02-19 13:15:50 Error: the launch timed out and was terminated (6) at line 216
2018-02-19 13:15:50 GPU processing failed, switching to CPU mode
2018-02-19 13:15:50 [CPU] estimating 1224x924x896 disparity using 1224x924x8u tiles
2018-02-19 13:15:50 Error: Kernel failed: the launch timed out and was terminated (6) at line 199
2018-02-19 13:15:50 GPU processing failed, switching to CPU mode
2018-02-19 13:15:50 [CPU] estimating 2339x1537x832 disparity using 1170x769x8u tiles
2018-02-19 13:15:50 Error: Kernel failed: the launch timed out and was terminated (6) at line 821
2018-02-19 13:15:50 GPU processing failed, switching to CPU mode
2018-02-19 13:15:50 [CPU] estimating 1889x1945x192 disparity using 945x973x8u tiles
2018-02-19 13:15:50 Error: the launch timed out and was terminated (6) at line 198
2018-02-19 13:15:50 GPU processing failed, switching to CPU mode
2018-02-19 13:15:50 [CPU] estimating 1472x1562x128 disparity using 1472x781x8u tiles
2018-02-19 13:15:50
2018-02-19 13:15:50 Depth reconstruction devices performance:
2018-02-19 13:15:50  - 35%    done by GeForce GTX 1080 Ti
2018-02-19 13:15:50  - 32%    done by GeForce GTX 1080 Ti
2018-02-19 13:15:50  - 33%    done by GeForce GTX 1080 Ti
2018-02-19 13:15:50 Total time: 202.378 seconds
2018-02-19 13:15:50
2018-02-19 13:15:50 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:50 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:50 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:50 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:50 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:53 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-19 13:15:53 Finished processing in 231.347 sec (exit code 0)
2018-02-19 13:15:53 Error: the launch timed out and was terminated (6) at line 198
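For longer console logs like the one quoted above, a short script can tally the failures instead of reading them line by line. This is only a minimal sketch and not part of PhotoScan: the function name `summarize_cuda_failures` is made up for illustration, and the patterns are taken only from the message wording in this log, so other versions may print different text.

```python
import re
from collections import Counter

# Matches e.g. "... the launch timed out and was terminated (6) at line 821".
# The wording is copied from the log above; other builds may differ.
TIMEOUT_RE = re.compile(
    r"the launch timed out and was terminated \((\d+)\) at line (\d+)"
)
FALLBACK_TEXT = "GPU processing failed, switching to CPU mode"


def summarize_cuda_failures(log_text):
    """Count launch timeouts per reported source line, plus CPU fallbacks."""
    timeouts = Counter()
    fallbacks = 0
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            # group(2) is the number after "at line" in the message.
            timeouts[int(match.group(2))] += 1
        if FALLBACK_TEXT in line:
            fallbacks += 1
    return {"timeouts_by_line": dict(timeouts), "cpu_fallbacks": fallbacks}
```

Passing the saved console text to `summarize_cuda_failures` returns a dictionary, which makes it easier to compare two runs (for example before and after a driver change).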
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on February 19, 2018, 09:37:29 PM
Hello edtriplett,

Can you please specify the NVIDIA driver version installed, OS version and PhotoScan version used?
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on February 21, 2018, 08:18:59 PM
Thanks for the fast reply. Here are the specs:

NVIDIA DRIVERS: 23.21.13.8813 on 3x GeForce 1080Ti
Windows 10 Enterprise 64bit
Photoscan 1.4.0 Build 5650
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on February 21, 2018, 08:26:43 PM
Hello edtriplett,

Can you please try a clean driver install using the package from the NVIDIA website:
http://www.nvidia.com/download/driverResults.aspx/130633/en-us
Then reboot the system and check whether the processing works.
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on February 22, 2018, 11:36:02 PM
Hi Alexey,
I was able to do a fresh driver install and test it successfully when I ran the dense point cloud stage on "Medium", but when I switched it to "High", I got the same CUDA error you see below:
Quote
2018-02-22 15:32:22 [GPU] estimating 1126x1722x288 disparity using 1126x861x8u tiles
2018-02-22 15:32:22 timings: rectify: 0.015 disparity: 0.145 borders: 0.012 filter: 0.038 fill: 0
2018-02-22 15:32:22 [GPU] estimating 1572x1238x224 disparity using 786x1238x8u tiles
2018-02-22 15:32:22 [GPU] estimating 1064x1889x384 disparity using 1064x945x8u tiles
2018-02-22 15:32:24 [GPU] estimating 1157x1962x384 disparity using 1157x981x8u tiles
2018-02-22 15:32:24 GPU borders filtering failed: the launch timed out and was terminated (6) at line 216
2018-02-22 15:32:24 using CPU implementation...
2018-02-22 15:32:24 Error: Kernel failed: the launch timed out and was terminated (6) at line 769
2018-02-22 15:32:24 GPU processing failed, switching to CPU mode
2018-02-22 15:32:24 [CPU] estimating 1064x1889x384 disparity using 1064x945x8u tiles
2018-02-22 15:32:24 Error: Kernel failed: the launch timed out and was terminated (6) at line 821
2018-02-22 15:32:24 GPU processing failed, switching to CPU mode
2018-02-22 15:32:24 [CPU] estimating 1157x1962x384 disparity using 1157x981x8u tiles
2018-02-22 15:32:24 timings: rectify: 0.021 disparity: 0.292 borders: 0.028 filter: 0.108 fill: 0
2018-02-22 15:32:24 Error: Kernel failed: the launch timed out and was terminated (6) at line 821
2018-02-22 15:32:24 GPU processing failed, switching to CPU mode
2018-02-22 15:32:24 [CPU] estimating 1381x7424x160 disparity using 1381x1061x8u tiles
2018-02-22 15:32:24 Error: Kernel failed: the launch timed out and was terminated (6) at line 170
2018-02-22 15:32:24 GPU processing failed, switching to CPU mode
2018-02-22 15:32:24 [CPU] estimating 1126x1722x288 disparity using 1126x861x8u tiles
2018-02-22 15:32:25
2018-02-22 15:32:25 Depth reconstruction devices performance:
2018-02-22 15:32:25  - 36%    done by GeForce GTX 1080 Ti
2018-02-22 15:32:25  - 32%    done by GeForce GTX 1080 Ti
2018-02-22 15:32:25  - 32%    done by GeForce GTX 1080 Ti
2018-02-22 15:32:25 Total time: 664.673 seconds
2018-02-22 15:32:25
2018-02-22 15:32:25 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:25 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:25 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:25 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:25 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:26 Warning: cudaStreamDestroy failed: all CUDA-capable devices are busy or unavailable (46)
2018-02-22 15:32:26 Finished processing in 691.457 sec (exit code 0)
2018-02-22 15:32:26 Error: the launch timed out and was terminated (6) at line 198
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on February 23, 2018, 12:07:21 AM
Hello edtriplett,

Please disable the "use CPU" option in the GPU preferences tab and try again using the same processing settings.
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on February 23, 2018, 06:40:32 PM
I had the same instinct. I tried turning off the CPU before I sent my previous reply.

Do you know of some other options?
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on February 24, 2018, 11:48:12 PM
Hello edtriplett,

We are trying to reproduce the issue on a similar configuration.

You mentioned that the processing works on a smaller dataset. Is there any difference in image resolution between the two sets, or are all the photos of the same dimensions?
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 15, 2018, 08:02:12 PM
I am sorry for the delay getting back to this problem. I have been working with our IT department to see if there is an issue with the configuration of the graphics cards in the PC.

So far I have had the problem only with 500+ photo projects. I have noticed that immediately after a reboot of the PC, I am able to process dense point clouds for projects with larger numbers of images on medium settings. After an initial failure, however, the CUDA errors continue with all settings until I reboot again.

As soon as I know more I will offer an update.
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 22, 2018, 10:08:48 PM
I am still getting these CUDA crashes on the dense point cloud step. I have tried this with different photo sets, but it still happens, even after a reboot. I also asked my IT department, and apart from updating the driver they have not been able to help.

GPUs:
3 x GeForce 1080 Ti
Driver version:
23.21.13.9101

I am attaching two full error logs from recent errors.
1. The error log named "Error1.txt" occurred when I checked all three GPUs and unchecked the CPU processing while running GPU processes.
2. The error log named "Error2.txt" occurred when I unchecked my top GPU and checked the CPU processing while running GPU processes.

I am unsure what to do next.
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on March 23, 2018, 01:53:59 PM
Hello edtriplett,

Can you please (for testing purposes) install the Professional edition of PhotoScan 1.4.1, execute the following line of code directly in the Console pane, and then restart the dense cloud generation using all the graphics cards (with CPU disabled in the Preferences):
Code: [Select]
PhotoScan.app.settings.setValue("main/depth_max_gpu_multiplier", 1)
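To confirm that the setting was actually stored before starting a long run, it can be read back after writing. The following is a minimal sketch under assumptions: it assumes the settings object exposes a `value(key)` reader alongside `setValue`, and the helper `set_and_verify` and the `DictSettings` stand-in are hypothetical names used only so the example runs outside PhotoScan.

```python
KEY = "main/depth_max_gpu_multiplier"  # key name taken from the post above


def set_and_verify(settings, key=KEY, value=1):
    """Write a setting, read it back, and return (previous, current)."""
    previous = settings.value(key)  # assumed reader method, see lead-in
    settings.setValue(key, value)
    return previous, settings.value(key)


class DictSettings:
    """Hypothetical stand-in for PhotoScan.app.settings, for dry runs only."""

    def __init__(self):
        self._data = {}

    def value(self, key):
        return self._data.get(key)

    def setValue(self, key, value):
        self._data[key] = value


if __name__ == "__main__":
    # Dry run outside PhotoScan using the stand-in object.
    print(set_and_verify(DictSettings()))
```

In the PhotoScan Pro console the call would presumably be `set_and_verify(PhotoScan.app.settings)`. The value read back may come out as a string rather than an integer, so compare it loosely.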
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 26, 2018, 07:05:23 PM
Thank you Alexey,
I did as you said and was able to get the model to process the dense cloud without any errors if I added the line of code you sent in a demo version of Agisoft Photoscan Pro. I don't have a license for Pro, so I can't save. Is there a way to duplicate this in Standard?
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on March 26, 2018, 07:33:32 PM
Hello edtriplett,

You can create a new DWORD item in the registry (regedit.exe) under the following key:
Code: [Select]
HKEY_CURRENT_USER\Software\Agisoft\PhotoScan\main\
Name it "depth_max_gpu_multiplier" and set its value to 1.
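For anyone who prefers not to edit the registry by hand, the same entry can be written as a .reg file and imported by double-clicking it. This is a sketch, not an official Agisoft procedure: the helper name `build_reg_file` is made up for illustration, while the key path and value name are the ones given above. Note that a DWORD is a 32-bit value (the "DWORD (32-bit) Value" entry in regedit), which the .reg format writes as eight hexadecimal digits.

```python
def build_reg_file(key_path, name, value):
    """Return .reg file text that sets one 32-bit DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD holds an unsigned 32-bit integer")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{name}"=dword:{value:08x}',  # eight hex digits, e.g. 00000001
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Key path and value name as given in the post above.
    print(build_reg_file(
        r"HKEY_CURRENT_USER\Software\Agisoft\PhotoScan\main",
        "depth_max_gpu_multiplier",
        1,
    ))
```

Save the printed text to a file with a .reg extension, import it, and restart PhotoScan so the new value is picked up. Exporting the key from regedit first gives a backup to restore if needed.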
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 27, 2018, 06:30:11 PM
Would you mind sending a screenshot to show what this looks like as it is written in the registry? I am new to editing these by hand.
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on March 27, 2018, 06:41:48 PM
Hello edtriplett,

See attached screenshot (although, you can actually see an example in PhotoScan Pro registry section).

It is currently set to "2", the default value. You need to change it to 1.
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 27, 2018, 07:14:37 PM
Thanks for showing me this. I am still getting the same error in Standard. I attached the console text here.
I assume the DWORD item was 64bit, correct?
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on March 30, 2018, 11:26:55 PM
Sorry to bother you again. I am posting this here because I cannot add an attachment to a personal message.
We just purchased an Agisoft Pro license to work around this issue, because this code worked the first time with the demo of Pro:

Quote
PhotoScan.app.settings.setValue("main/depth_max_gpu_multiplier", 1)

Unfortunately, I am getting the same crash again in Pro with this code pasted into the console before running the dense cloud process. I attached the console text here.

Again the error code is this:
Quote
Error: Kernel failed: the launch timed out and was terminated (6) at line 821
Title: Re: CUDA Crash during dense point cloud processing
Post by: Alexey Pasumansky on March 30, 2018, 11:44:39 PM
Hello edtriplett,

To find out whether this is a hardware-related issue, I can suggest several tests:
- physically remove all GPUs but one and check whether the processing works or a similar issue occurs; you can also try moving the GPU to another slot,
- if you have the option, after the first test put the GPUs (one by one) into another computer and check whether a similar issue is observed on the second machine.

It may also be worth running some CUDA-based stress tests on your configuration.
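Before pulling cards out of the case, the same single-GPU and pair tests can be approximated in software by enabling only some devices. The sketch below only generates the device bitmasks to try. It assumes the scripting API exposes `PhotoScan.app.gpu_mask` as an integer in which bit i enables device i (the GPU checkboxes in the Preferences do the same job). The helper name `gpu_test_masks` is made up for illustration. A software mask does not rule out slot, power, or cooling problems, so it complements the physical tests rather than replacing them.

```python
from itertools import combinations


def gpu_test_masks(gpu_count):
    """Return bitmasks for every single GPU, then every pair, and so on."""
    masks = []
    for size in range(1, gpu_count + 1):
        for indices in combinations(range(gpu_count), size):
            mask = 0
            for i in indices:
                mask |= 1 << i  # set bit i to enable device i
            masks.append(mask)
    return masks


if __name__ == "__main__":
    for mask in gpu_test_masks(3):
        # In the console one would set the assumed attribute
        # PhotoScan.app.gpu_mask = mask and rerun the dense cloud step.
        print(f"{mask:03b}")
```

For three cards this yields the masks 1, 2, 4 (one GPU each), then 3, 5, 6 (pairs), then 7 (all three). Noting which masks fail shows whether one particular card is always involved.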
Title: Re: CUDA Crash during dense point cloud processing
Post by: edtriplett on May 24, 2018, 11:03:47 PM
Hi Alexey,
I went back to the manufacturer of the PC, and they suggested turning off a few CPUs in Windows with the "Processor Affinity" option box. This did not have any effect. I am still getting the CUDA crashes.

I took your advice and detached all but one of the GPUs, and everything worked fine. Attaching two or all three causes the crash. I am attaching the reports from the CUDA-Z software showing the performance of each GPU. I hope that information is useful to you, because I can't interpret it.

Title: Re: CUDA Crash during dense point cloud processing
Post by: soulbank on October 06, 2023, 01:41:44 PM
This happens to me all the time. I am on a 4090t1 and i910k with 128 GB RAM on Windows 10. When I finish processing one asset in full (align / make mesh / texture / smooth / export model) and then start a new project, I get the same CUDA error after the initial alignment when I try to build the mesh.

Note that I am processing a lot of assets back to back, so it is a pain to reboot every time (which does fix it). After a lot of headache I found a simple solution: in the Windows Device Manager, go to Display Adapters, disable your graphics card, wait a few seconds, then re-enable it.

The screen gets messed up (windows move around), but I am able to keep working without a reboot, and it's fast.

It seems that Metashape is not cleaning up the CUDA queue properly after finishing its jobs. There is a way to reset the CUDA system at the command line, but I can't get it to work that way.