Topic: more CUDA ERROR - Error: CUDA_ERROR_UNKNOWN (999) at line 128

andyroo

« on: February 09, 2019, 01:40:50 AM »
I have been having CUDA errors on a couple of our workstations since v. 1.4-ish. I recently upgraded my most problematic one from 1.4.3 (OpenCL) to Metashape 1.5.1 (CUDA) and, sure enough, got the same error immediately on starting alignment (5069 16 MP images in two camera groups):

Error: CUDA_ERROR_UNKNOWN (999) at line 128

This is a dual Xeon E5-2643 v3 workstation with 512 GB RAM and three GPUs: two EVGA 980 Ti hybrid GPUs and an EVGA Titan X hybrid running the display.

I then updated the GPU drivers and did a repair reinstall (upgrade-in-place) of Windows 10. After that I disabled my primary video GPU (the EVGA Titan X) in Tools/Preferences/GPU and unchecked "Use CPU when performing GPU-accelerated rendering".

Interestingly, things worked ok when I started alignment after doing this.
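
For anyone who wants to script the same GPU selection instead of clicking through Tools/Preferences/GPU, here is a minimal sketch. It assumes Metashape's Python API exposes the selection as the integer bitmask `Metashape.app.gpu_mask` and the CPU option as `Metashape.app.cpu_enable`, and that bit i corresponds to the i-th device in Metashape's own enumeration order. That order is an assumption here and may not match the GPU numbering shown by nvidia-smi. The helper names `gpu_mask` and `apply_to_metashape` are made up for illustration.

```python
def gpu_mask(enabled):
    """Build an integer bitmask from a list of booleans, one per GPU.

    Assumption: bit i stands for the i-th device in the order the
    application enumerates them.
    """
    mask = 0
    for index, on in enumerate(enabled):
        if on:
            mask |= 1 << index
    return mask


def apply_to_metashape(mask, use_cpu=False):
    """Apply the mask through Metashape's Python API (run inside Metashape)."""
    import Metashape  # only importable in a Metashape Python environment

    Metashape.app.gpu_mask = mask
    Metashape.app.cpu_enable = use_cpu


# Hypothetical device order: the two 980 Ti cards first, the Titan X last.
MASK_WITHOUT_TITAN = gpu_mask([True, True, False])
print(bin(MASK_WITHOUT_TITAN))
```

Calling `apply_to_metashape(MASK_WITHOUT_TITAN)` from the Metashape console would then mirror the setup described above: Titan X disabled, CPU option off. Check the device order on your own machine before relying on the mask.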

I was optimistic that it would still work with "Use CPU..." unchecked, so I enabled the Titan X, attempted the alignment again, and got the CUDA error. Interestingly, the error appears to have come from the Titan X almost immediately (I infer this because it was thrown on the third image), which makes me think it may be caused by another process on that GPU. These are all the processes that nvidia-smi reports on the GPUs:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5108      C   ...les\Agisoft\Metashape Pro\metashape.exe N/A      |
|    1       744    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    1      1320    C+G   Insufficient Permissions                   N/A      |
|    1      5108    C+G   ...les\Agisoft\Metashape Pro\metashape.exe N/A      |
|    1      5704    C+G   ...hell.Experiences.TextInput.InputApp.exe N/A      |
|    1      6352    C+G   ...6)\Google\Chrome\Application\chrome.exe N/A      |
|    1      7212    C+G   C:\Windows\explorer.exe                    N/A      |
|    1      7540    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    2      5108      C   ...les\Agisoft\Metashape Pro\metashape.exe N/A      |
+-----------------------------------------------------------------------------+

(PID 1320 is Desktop Window Manager)
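
If you want the same per-GPU process list in a form that is easier to compare between runs, the sketch below groups nvidia-smi's CSV output by GPU. It assumes the `--query-compute-apps` fields `gpu_bus_id`, `gpu_name`, `pid` and `process_name` are available in your driver version (check `nvidia-smi --help-query-compute-apps`), and it lists only processes that hold a compute context. The sample text and its bus ids are invented for illustration and are not taken from this machine.

```python
import csv
import io
import subprocess
from collections import defaultdict

# Query fields as listed by `nvidia-smi --help-query-compute-apps`.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_bus_id,gpu_name,pid,process_name",
    "--format=csv,noheader",
]


def group_by_gpu(csv_text):
    """Group (pid, process name) pairs by (bus id, GPU name)."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        bus_id, gpu_name, pid, process_name = (field.strip() for field in row)
        groups[(bus_id, gpu_name)].append((int(pid), process_name))
    return dict(groups)


def query_processes():
    """Run nvidia-smi and group its output; needs the NVIDIA driver tools."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return group_by_gpu(result.stdout)


# Hypothetical output, made up for illustration (bus ids are not from the thread).
SAMPLE = """\
00000000:02:00.0, GeForce GTX 980 Ti, 5108, metashape.exe
00000000:03:00.0, GeForce GTX TITAN X, 5108, metashape.exe
00000000:03:00.0, GeForce GTX TITAN X, 6352, chrome.exe
"""

for (bus_id, name), procs in group_by_gpu(SAMPLE).items():
    print(bus_id, name, procs)
```

Grouping by bus id rather than by name keeps the two identical 980 Ti cards apart. On a machine with the driver installed, `query_processes()` returns the same structure from live data.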

Here's a partial log showing the successful partial run, followed by the failed run after I enabled the Titan X:

Code:
2019-02-08 13:52:46 Agisoft Metashape Professional Version: 1.5.1 build 7618 (64 bit)
2019-02-08 13:52:46 Platform: Windows
2019-02-08 13:52:46 CPU: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz (server)
2019-02-08 13:52:46 CPU family: 6 model: 63 signature: 306F2h
2019-02-08 13:52:46 RAM: 511.9 GB
2019-02-08 13:52:49 OpenGL Vendor: NVIDIA Corporation
2019-02-08 13:52:49 OpenGL Renderer: GeForce GTX TITAN X/PCIe/SSE2
2019-02-08 13:52:49 OpenGL Version: 4.6.0 NVIDIA 418.81
2019-02-08 13:52:49 Maximum Texture Size: 16384
2019-02-08 13:52:49 Quad Buffered Stereo: not enabled
2019-02-08 13:52:49 ARB_vertex_buffer_object: supported
2019-02-08 13:52:49 ARB_texture_non_power_of_two: supported

2019-02-08 13:52:54 LoadProject: path = E:/Elwha/20190206PlaneCam/SfM/20190206_Quinault.psx
2019-02-08 13:52:54 Loading project...
2019-02-08 13:52:54 loaded project in 0.548 sec
2019-02-08 13:52:54 Finished processing in 0.548 sec (exit code 1)
2019-02-08 13:54:11 AlignPhotos: accuracy = High, preselection = generic, keypoint limit = 0, tiepoint limit = 0, apply masks = 0, filter tie points = 0, adaptive fitting = 0
2019-02-08 13:54:11 Matching photos...
2019-02-08 13:54:14 saved matching data in 0.016 sec
2019-02-08 13:54:19 scheduled 100 keypoint detection groups
2019-02-08 13:54:19 saved keypoint partition in 0.016 sec
2019-02-08 13:54:19 groups: 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128395
2019-02-08 13:54:22 340435 of 5069 used (6716.02%)
2019-02-08 13:54:22 scheduled 100 keypoint matching groups
2019-02-08 13:54:22 saved matching partition in 0.062 sec
2019-02-08 13:54:23 loaded keypoint partition in 0 sec
2019-02-08 13:54:23 loaded matching data in 0 sec
2019-02-08 13:54:23 Detecting points...
2019-02-08 13:54:40 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:54:40   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:54:40   max work group size 1024
2019-02-08 13:54:40   max work item sizes [1024, 1024, 64]
2019-02-08 13:54:40 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:54:40   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:54:40   max work group size 1024
2019-02-08 13:54:40   max work item sizes [1024, 1024, 64]
2019-02-08 13:54:41 [GPU] photo 0: 99254 points
2019-02-08 13:54:41 [GPU] photo 100: 109523 points
2019-02-08 13:54:41 [GPU] photo 200: 102206 points
2019-02-08 13:54:41 [GPU] photo 300: 124713 points
2019-02-08 13:54:42 [GPU] photo 400: 117956 points
2019-02-08 13:54:42 [GPU] photo 500: 126365 points
2019-02-08 13:54:42 [GPU] photo 600: 69406 points
2019-02-08 13:54:42 [GPU] photo 700: 117144 points
2019-02-08 13:54:42 [GPU] photo 800: 114690 points
2019-02-08 13:54:42 [GPU] photo 900: 105174 points
2019-02-08 13:54:43 [GPU] photo 1000: 94910 points
2019-02-08 13:54:43 [GPU] photo 1100: 93596 points
2019-02-08 13:54:43 [GPU] photo 1200: 94911 points
2019-02-08 13:54:43 [GPU] photo 1300: 99096 points
2019-02-08 13:54:43 [GPU] photo 1400: 83928 points
2019-02-08 13:54:43 [GPU] photo 1500: 82285 points
2019-02-08 13:54:44 [GPU] photo 1600: 93443 points
2019-02-08 13:54:44 [GPU] photo 1700: 99221 points
2019-02-08 13:54:44 [GPU] photo 1800: 76264 points
2019-02-08 13:54:44 [GPU] photo 2000: 57386 points
2019-02-08 13:54:44 [GPU] photo 1900: 99425 points
2019-02-08 13:54:44 [GPU] photo 2100: 96156 points
2019-02-08 13:54:45 [GPU] photo 2200: 98751 points
2019-02-08 13:54:45 [GPU] photo 2300: 121368 points
2019-02-08 13:54:45 [GPU] photo 2400: 113657 points
2019-02-08 13:54:45 [GPU] photo 2500: 100553 points
2019-02-08 13:54:45 [GPU] photo 2600: 139357 points
2019-02-08 13:54:45 [GPU] photo 2700: 145186 points
2019-02-08 13:54:46 [GPU] photo 2800: 74595 points
2019-02-08 13:54:46 [GPU] photo 2900: 114896 points
2019-02-08 13:54:46 [GPU] photo 3000: 83435 points
2019-02-08 13:54:46 [GPU] photo 3100: 97808 points
2019-02-08 13:54:46 [GPU] photo 3200: 114143 points
2019-02-08 13:54:46 [GPU] photo 3300: 48723 points
2019-02-08 13:54:47 [GPU] photo 3400: 122766 points
2019-02-08 13:54:47 [GPU] photo 3500: 123042 points
2019-02-08 13:54:47 [GPU] photo 3600: 89272 points
2019-02-08 13:54:47 [GPU] photo 3700: 79337 points
2019-02-08 13:54:47 [GPU] photo 3800: 87967 points
2019-02-08 13:54:47 [GPU] photo 3900: 74224 points
2019-02-08 13:54:48 [GPU] photo 4000: 119007 points
2019-02-08 13:54:48 [GPU] photo 4100: 97702 points
2019-02-08 13:54:48 [GPU] photo 4200: 115408 points
2019-02-08 13:54:48 [GPU] photo 4300: 98065 points
2019-02-08 13:54:48 [GPU] photo 4400: 79883 points
2019-02-08 13:54:48 [GPU] photo 4500: 117093 points
2019-02-08 13:54:48 [GPU] photo 4600: 123136 points
2019-02-08 13:54:49 [GPU] photo 4700: 103121 points
2019-02-08 13:54:49 [GPU] photo 4800: 84256 points
2019-02-08 13:54:49 [GPU] photo 4900: 118860 points
2019-02-08 13:54:49 [GPU] photo 5000: 98751 points
2019-02-08 13:54:49 points detected in 26.06 sec
2019-02-08 13:54:50 loaded keypoint partition in 0 sec
2019-02-08 13:54:50 loaded matching data in 0 sec
2019-02-08 13:54:50 Detecting points...
2019-02-08 13:54:50 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:54:50   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:54:50   max work group size 1024
2019-02-08 13:54:50   max work item sizes [1024, 1024, 64]
2019-02-08 13:54:50 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:54:50   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:54:50   max work group size 1024
2019-02-08 13:54:50   max work item sizes [1024, 1024, 64]
2019-02-08 13:54:51 [GPU] photo 1: 131726 points
...<clipped for brevity>
2019-02-08 13:55:26 [GPU] photo 404: 102017 points
2019-02-08 13:55:26 [GPU] photo 504: 126837 points
2019-02-08 13:55:27 Finished batch processing in 75.744 sec (exit code 0)
2019-02-08 13:55:32 Error: Aborted by user
2019-02-08 13:56:00 Saving project...
2019-02-08 13:56:00 saved project in 0.079 sec
2019-02-08 13:56:00 AlignPhotos: accuracy = High, preselection = generic, keypoint limit = 0, tiepoint limit = 0, apply masks = 0, filter tie points = 0, adaptive fitting = 0
2019-02-08 13:56:00 Matching photos...
2019-02-08 13:56:03 saved matching data in 0 sec
2019-02-08 13:56:08 scheduled 100 keypoint detection groups
2019-02-08 13:56:08 saved keypoint partition in 0 sec
2019-02-08 13:56:08 groups: 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128449 128395
2019-02-08 13:56:11 340435 of 5069 used (6716.02%)
2019-02-08 13:56:11 scheduled 100 keypoint matching groups
2019-02-08 13:56:11 saved matching partition in 0.046 sec
2019-02-08 13:56:12 loaded keypoint partition in 0 sec
2019-02-08 13:56:12 loaded matching data in 0 sec
2019-02-08 13:56:12 Detecting points...
2019-02-08 13:56:12 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:56:12   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:56:12   max work group size 1024
2019-02-08 13:56:12   max work item sizes [1024, 1024, 64]
2019-02-08 13:56:12 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:56:12   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:56:12   max work group size 1024
2019-02-08 13:56:12   max work item sizes [1024, 1024, 64]
2019-02-08 13:56:12 Using device: GeForce GTX TITAN X, 24 compute units, free memory: 10093/12288 MB, compute capability 5.2
2019-02-08 13:56:12   driver version: 418.81, driver/runtime CUDA: 10010/5050
2019-02-08 13:56:12   max work group size 1024
2019-02-08 13:56:12   max work item sizes [1024, 1024, 64]
2019-02-08 13:56:13 [GPU] photo 100: 109523 points
2019-02-08 13:56:13 [GPU] photo 0: 99254 points
2019-02-08 13:56:13 Error: CUDA_ERROR_UNKNOWN (999) at line 128
2019-02-08 13:56:13 Saving project...
2019-02-08 13:56:13 saved project in 0.063 sec
2019-02-08 13:56:13 OptimizeCameras: f, cx, cy, k1-k3, p1, p2, adaptive fitting = 0
2019-02-08 13:56:13 Optimizing camera locations...
2019-02-08 13:56:13 coordinates applied in 0 sec
2019-02-08 13:56:13 Finished processing in 0.015 sec
2019-02-08 13:56:13 Saving project...
2019-02-08 13:56:13 saved project in 0.063 sec
2019-02-08 13:56:13 Finished batch processing in 13.109 sec (exit code 1)

I don't have anything more to add at this point. I have to reboot every time I get a CUDA error to make the GPU happy again, so I'll try to figure out how to kill these processes on the Titan and see if I can get it to run OK.
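
For comparing runs like the two in the log above, a small script can report which devices were announced just before each CUDA error. This is only a sketch based on the line formats visible in this log ("Detecting points...", "Using device: ...", "Error: CUDA_ERROR_..."). Other Metashape versions may word these lines differently, and the function name is made up for illustration.

```python
import re

DEVICE = re.compile(r"Using device: (?P<name>[^,]+),")
CUDA_ERROR = re.compile(r"Error: (?P<code>CUDA_ERROR_\w+) \((?P<num>\d+)\)")


def cuda_errors_with_devices(log_lines):
    """Return (timestamp, error code, devices announced for that step)."""
    devices = []
    findings = []
    for line in log_lines:
        if line.rstrip().endswith("Detecting points..."):
            devices = []  # a new detection step lists its devices again
            continue
        match = DEVICE.search(line)
        if match:
            devices.append(match.group("name"))
            continue
        match = CUDA_ERROR.search(line)
        if match:
            # The first 19 characters hold the "YYYY-MM-DD HH:MM:SS" stamp.
            findings.append((line[:19], match.group("code"), list(devices)))
    return findings


# Lines copied from the failed run in the log above (details trimmed).
EXCERPT = """\
2019-02-08 13:56:12 Detecting points...
2019-02-08 13:56:12 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:56:12 Using device: GeForce GTX 980 Ti, 22 compute units, free memory: 5088/6144 MB, compute capability 5.2
2019-02-08 13:56:12 Using device: GeForce GTX TITAN X, 24 compute units, free memory: 10093/12288 MB, compute capability 5.2
2019-02-08 13:56:13 [GPU] photo 100: 109523 points
2019-02-08 13:56:13 [GPU] photo 0: 99254 points
2019-02-08 13:56:13 Error: CUDA_ERROR_UNKNOWN (999) at line 128
""".splitlines()

for stamp, code, devices in cuda_errors_with_devices(EXCERPT):
    print(stamp, code, devices)
```

The log does not say which device raised the error, so this only narrows it down to the set of GPUs active in that step.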

andyroo

Re: more CUDA ERROR - Error: CUDA_ERROR_UNKNOWN (999) at line 128
« Reply #1 on: February 20, 2019, 10:08:25 PM »
Update: I finally got around to generating a dense cloud with only my two 980 Ti GPUs and got the same damn CUDA error:

Code:
2019-02-19 18:30:35 Error: CUDA_ERROR_UNKNOWN (999) at line 128

I would love to know how to enable OpenCL in Metashape (tweaks?)

Andy

Alexey Pasumansky

  • Agisoft Technical Support
Re: more CUDA ERROR - Error: CUDA_ERROR_UNKNOWN (999) at line 128
« Reply #2 on: February 21, 2019, 08:24:25 PM »
Hello Andy,

It's quite difficult to investigate such issues, as they seem to be related to the hardware configuration (anything from a mechanical defect to the power supply or over/under-clocking) or to drivers. So far we have not been able to reproduce anything similar on our test configurations, which include NVIDIA cards of different series.

Best regards,
Alexey Pasumansky,
AgiSoft LLC