
Author Topic: NVIDIA DGX-1  (Read 8371 times)

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
NVIDIA DGX-1
« on: February 10, 2017, 03:31:48 PM »
Hi All,
we are considering acquiring an NVIDIA DGX-1 computer for our work with PhotoScan, among other things.
http://images.nvidia.com/content/technologies/deep-learning/pdf/Datasheet-DGX1.pdf

Some Specs:
8x Tesla P100
512 GB RAM
Dual 20-core Xeon E5-2698

Am I missing some compelling reason why this is a bad idea? We would rather not deal with a multi-machine cluster. Yes, I am aware it costs €130k.

Opinions welcome!
Tom

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #1 on: March 02, 2017, 01:32:59 PM »
So here comes the test, and something is weird:
While the machine is generally (very) fast, the mesh generation step is around 100 times slower than on my HP Z840 running Windows. How come? Alexey, can you comment?
This is what the machine does with ten 20MP images:
Code: [Select]
LoadProject
loaded project in 0.009438 sec
clGetPlatformIDs failed: CL_UNKNOWN_ERROR_CODE_-1001
HERE BE GPU #:
[{'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 6, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 7, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 10, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 11, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 133, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 134, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 137, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}, {'compute_units': 56, 'clock': 1480, 'version': '', 'mem_size': 17100439552, 'pci_bus_id': 138, 'name': 'Tesla P100-SXM2-16GB', 'pci_device_id': 0, 'vendor': ''}]
MatchPhotos: accuracy = High, preselection = generic, reference, keypoint limit = 40000, tiepoint limit = 4000, constrain features by mask = 0
clGetPlatformIDs failed: CL_UNKNOWN_ERROR_CODE_-1001
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
photo 4078: 39794 points
photo 4088: 39843 points
photo 4072: 39886 points
photo 4070: 39983 points
photo 4074: 39791 points
photo 4082: 39868 points
photo 4084: 39831 points
photo 4086: 39998 points
photo 4076: 39935 points
photo 4080: 39860 points
points detected in 6.30511 sec
clGetPlatformIDs failed: CL_UNKNOWN_ERROR_CODE_-1001
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
4112 matches found in 0.230473 sec
matches combined in 0.000547 sec
matches filtered in 0.048923 sec
12 of 14 pairs selected in 1e-05 sec
clGetPlatformIDs failed: CL_UNKNOWN_ERROR_CODE_-1001
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
59225 matches found in 0.299517 sec
matches combined in 0.006723 sec
matches filtered in 0.356821 sec
finished matching in 7.35111 sec
setting point indices... 21295 done in 0.003269 sec
generated 21295 tie points, 2.44593 average projections
removed 106 multiple indices
removed 18 tracks
selected 13228 tracks out of 21277 in 0.001883 sec
AlignCameras: adaptive fitting = 1
processing block: 10 photos
pair 4076 and 4078: 2772 robust from 2820
pair 4078 and 4080: 2728 robust from 2783
adding photos 4076 and 4078 (2772 robust)
adding 2815 points, 0 far (6.08 threshold), 0 inaccurate, 5 invisible, 0 weak
adjusting: xxxxxxx 0.457606 -> 0.14951
adding 2 points, 1 far (6.08 threshold), 0 inaccurate, 1 invisible, 0 weak
optimized in 0.321948 seconds
adding camera 4080 (3 of 10), 1648 of 1654 used
adding camera 4074 (4 of 10), 1497 of 1502 used
adding 2306 points, 7 far (6.08 threshold), 0 inaccurate, 4 invisible, 0 weak
adjusting: xxxxxxx 0.211001 -> 0.124918
adding 5 points, 1 far (6.08 threshold), 0 inaccurate, 4 invisible, 0 weak
optimized in 0.33958 seconds
adding camera 4082 (5 of 10), 1257 of 1261 used
adding camera 4072 (6 of 10), 707 of 709 used
adding 2587 points, 5 far (6.08 threshold), 0 inaccurate, 15 invisible, 0 weak
adjusting: xxxxxxxxx 0.163178 -> 0.133594
adding 17 points, 1 far (6.08 threshold), 0 inaccurate, 16 invisible, 0 weak
optimized in 0.13536 seconds
adding camera 4084 (7 of 10), 777 of 778 used
adding camera 4070 (8 of 10), 399 of 399 used
adding 3023 points, 3 far (6.08 threshold), 0 inaccurate, 16 invisible, 0 weak
adjusting: xxxxxxxxx 0.157561 -> 0.138203
adding 18 points, 2 far (6.08 threshold), 0 inaccurate, 17 invisible, 0 weak
optimized in 0.06208 seconds
adding camera 4086 (9 of 10), 543 of 544 used
adding 1479 points, 2 far (6.08 threshold), 0 inaccurate, 20 invisible, 0 weak
adjusting: xxxxxxxx 0.148055 -> 0.13876
adding 23 points, 2 far (6.08 threshold), 1 inaccurate, 21 invisible, 0 weak
optimized in 0.215891 seconds
adding camera 4088 (10 of 10), 284 of 284 used
adding 1047 points, 2 far (6.08 threshold), 1 inaccurate, 20 invisible, 0 weak
adjusting: xxxxxxxxxx 0.177195 -> 0.148982
adding 23 points, 3 far (6.08 threshold), 14 inaccurate, 20 invisible, 0 weak
optimized in 0.067274 seconds
3 sigma filtering...
adjusting: xxxxx 0.148239 -> 0.148087
point variance: 0.134174 threshold: 0.402522
adding 6 points, 187 far (0.402522 threshold), 0 inaccurate, 6 invisible, 0 weak
adjusting: xxxx 0.115785 -> 0.115254
point variance: 0.104041 threshold: 0.312123
adding 43 points, 298 far (0.312123 threshold), 0 inaccurate, 6 invisible, 0 weak
adjusting: xxxxx 0.106958 -> 0.106917
point variance: 0.0960216 threshold: 0.288065
adding 182 points, 286 far (0.288065 threshold), 0 inaccurate, 6 invisible, 0 weak
adjusting: xxxxx 0.104337 -> 0.104323
point variance: 0.0935057 threshold: 0.280517
adding 253 points, 298 far (0.280517 threshold), 0 inaccurate, 6 invisible, 0 weak
adjusting: xxxxx 0.103163 -> 0.103144
point variance: 0.0923657 threshold: 0.277097
adding 284 points, 302 far (0.277097 threshold), 0 inaccurate, 6 invisible, 0 weak
optimized in 0.782872 seconds
f 1548.12, cx -8.74659, cy -18.8127, k1 -0.0209463, k2 0.000455108, k3 -0.000270836
finished sfm in 2.44364 seconds
coordinates applied in 0 sec

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #2 on: March 02, 2017, 01:34:16 PM »
And here comes the second part of the console output:
Code: [Select]
BuildDenseCloud: quality = Medium, depth filtering = Aggressive
clGetPlatformIDs failed: CL_UNKNOWN_ERROR_CODE_-1001
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using device: Tesla P100-SXM2-16GB, 56 compute units, 16308 MB global memory, CUDA 6.0
  max work group size 1024
  max work item sizes [1024, 1024, 64]
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
Using CUDA device 'Tesla P100-SXM2-16GB' in concurrent. (4 times)
sorting point cloud... done in 0.000163 sec
processing matches... done in 0.001732 sec
initializing...
selected 10 cameras from 10 in 0.053334 sec
loaded photos in 0.291136 seconds
[GPU] estimating 483x731x128 disparity using 483x731x8u tiles
[GPU] estimating 554x884x160 disparity using 554x884x8u tiles
[GPU] estimating 634x1037x96 disparity using 634x1037x8u tiles
[GPU] estimating 418x679x160 disparity using 418x679x8u tiles
[GPU] estimating 635x1037x96 disparity using 635x1037x8u tiles
[GPU] estimating 630x948x128 disparity using 630x948x8u tiles
[GPU] estimating 577x915x128 disparity using 577x915x8u tiles
[GPU] estimating 538x910x128 disparity using 538x910x8u tiles
[GPU] estimating 500x858x128 disparity using 500x858x8u tiles
[CPU] estimating 493x746x128 disparity using 493x746x8u tiles
timings: rectify: 0.167045 disparity: 0.17471 borders: 0.007665 filter: 0.013486 fill: 0
[GPU] estimating 760x1002x96 disparity using 760x1002x8u tiles
timings: rectify: 0.152975 disparity: 0.220637 borders: 0.008709 filter: 0.02217 fill: 0
[GPU] estimating 726x1024x128 disparity using 726x1024x8u tiles
timings: rectify: 0.135794 disparity: 0.250467 borders: 0.00694 filter: 0.007232 fill: 0
[GPU] estimating 529x759x160 disparity using 529x759x8u tiles
timings: rectify: 0.13577 disparity: 0.304682 borders: 0.010936 filter: 0.017883 fill: 0
timings: rectify: 0.111978 disparity: 0.337518 borders: 0.011714 filter: 0.019683 fill: 1e-06
[GPU] estimating 500x858x128 disparity using 500x858x8u tiles
[GPU] estimating 613x887x160 disparity using 613x887x8u tiles
timings: rectify: 0.148843 disparity: 0.384866 borders: 0.014197 filter: 0.034039 fill: 0
timings: rectify: 0.154598 disparity: 0.3751 borders: 0.028259 filter: 0.016216 fill: 0
timings: rectify: 0.162516 disparity: 0.397937 borders: 0.025022 filter: 0.020918 fill: 0
timings: rectify: 0.046137 disparity: 0.596174 borders: 0.025515 filter: 0.021957 fill: 0
timings: rectify: 0.108306 disparity: 0.471166 borders: 0.019896 filter: 0.02613 fill: 0
timings: rectify: 0.01666 disparity: 0.206933 borders: 0.021671 filter: 0.041772 fill: 0
[GPU] estimating 696x1020x96 disparity using 696x1020x8u tiles
[GPU] estimating 697x1020x96 disparity using 697x1020x8u tiles
[GPU] estimating 774x1017x128 disparity using 774x1017x8u tiles
[GPU] estimating 812x1169x96 disparity using 812x1169x8u tiles
[GPU] estimating 585x867x128 disparity using 585x867x8u tiles
[CPU] estimating 603x867x128 disparity using 603x867x8u tiles
timings: rectify: 0.012305 disparity: 0.291412 borders: 0.013577 filter: 0.020515 fill: 0
timings: rectify: 0.014202 disparity: 0.390282 borders: 0.023196 filter: 0.024572 fill: 0
timings: rectify: 0.009832 disparity: 0.355969 borders: 0.01082 filter: 0.019826 fill: 0
[GPU] estimating 712x989x96 disparity using 712x989x8u tiles
timings: rectify: 0.022147 disparity: 0.285326 borders: 0.026725 filter: 0.037049 fill: 0
[GPU] estimating 621x861x128 disparity using 621x861x8u tiles
[GPU] estimating 623x893x128 disparity using 623x893x8u tiles
timings: rectify: 0.066493 disparity: 0.251076 borders: 0.016061 filter: 0.022722 fill: 0
[GPU] estimating 576x915x128 disparity using 576x915x8u tiles
timings: rectify: 0.026124 disparity: 0.285987 borders: 0.021563 filter: 0.033089 fill: 0
[GPU] estimating 725x1024x128 disparity using 725x1024x8u tiles
timings: rectify: 0.066011 disparity: 0.348459 borders: 0.015128 filter: 0.024264 fill: 0
timings: rectify: 0.021747 disparity: 0.336556 borders: 0.014336 filter: 0.027326 fill: 0
timings: rectify: 0.054899 disparity: 0.393497 borders: 0.024363 filter: 0.029795 fill: 0
[GPU] estimating 760x1002x96 disparity using 760x1002x8u tiles
[GPU] estimating 813x1169x96 disparity using 813x1169x8u tiles
timings: rectify: 0.022996 disparity: 0.209382 borders: 0.013782 filter: 0.016938 fill: 0
[GPU] estimating 623x893x128 disparity using 623x893x8u tiles
timings: rectify: 0.028533 disparity: 0.285065 borders: 0.014685 filter: 0.012513 fill: 1e-06
[GPU] estimating 603x867x128 disparity using 603x867x8u tiles
timings: rectify: 0.016282 disparity: 0.246852 borders: 0.014629 filter: 0.023118 fill: 0
timings: rectify: 0.017914 disparity: 0.324341 borders: 0.026295 filter: 0.024268 fill: 0
timings: rectify: 0.015385 disparity: 0.204311 borders: 0.028789 filter: 0.030402 fill: 0
[GPU] estimating 538x910x128 disparity using 538x910x8u tiles
[GPU] estimating 554x884x160 disparity using 554x884x8u tiles
[GPU] estimating 621x861x128 disparity using 621x861x8u tiles
timings: rectify: 0.02934 disparity: 0.191801 borders: 0.012896 filter: 0.02396 fill: 0
timings: rectify: 0.031876 disparity: 0.771112 borders: 0.021433 filter: 0.02182 fill: 0
[GPU] estimating 773x1017x128 disparity using 773x1017x8u tiles
timings: rectify: 0.014296 disparity: 0.174094 borders: 0.010324 filter: 0.013271 fill: 0
[GPU] estimating 482x731x128 disparity using 482x731x8u tiles
[CPU] estimating 586x867x128 disparity using 586x867x8u tiles
timings: rectify: 0.018216 disparity: 0.364905 borders: 0.021828 filter: 0.022477 fill: 0
timings: rectify: 0.019611 disparity: 0.187715 borders: 0.021241 filter: 0.022206 fill: 0
timings: rectify: 0.022992 disparity: 0.193045 borders: 0.019086 filter: 0.022727 fill: 0
timings: rectify: 0.014267 disparity: 0.335437 borders: 0.040208 filter: 0.02907 fill: 0
[GPU] estimating 712x989x96 disparity using 712x989x8u tiles
[GPU] estimating 528x759x160 disparity using 528x759x8u tiles
timings: rectify: 0.006838 disparity: 0.16229 borders: 0.014458 filter: 0.010548 fill: 0
timings: rectify: 0.029918 disparity: 0.333465 borders: 0.011678 filter: 0.024755 fill: 0
timings: rectify: 0.01464 disparity: 0.256611 borders: 0.03076 filter: 0.027046 fill: 0
[GPU] estimating 493x746x128 disparity using 493x746x8u tiles
[GPU] estimating 629x948x128 disparity using 629x948x8u tiles
timings: rectify: 0.010937 disparity: 0.1583 borders: 0.012454 filter: 0.020326 fill: 0
timings: rectify: 0.033134 disparity: 0.152099 borders: 0.023526 filter: 0.022636 fill: 0
[GPU] estimating 614x887x160 disparity using 614x887x8u tiles
timings: rectify: 0.023905 disparity: 0.075524 borders: 0.006219 filter: 0.010075 fill: 0
timings: rectify: 0.016776 disparity: 0.112741 borders: 0.017154 filter: 0.014009 fill: 0
timings: rectify: 0.01024 disparity: 0.092939 borders: 0.008295 filter: 0.013219 fill: 0
[GPU] estimating 419x679x160 disparity using 419x679x8u tiles
timings: rectify: 0.005304 disparity: 0.047405 borders: 0.004569 filter: 0.005526 fill: 0
timings: rectify: 0.008431 disparity: 0.762599 borders: 0.006958 filter: 0.013809 fill: 0
finished depth reconstruction in 2.4669 seconds
Device 1: 3% work done with performance: 32.0177 million samples/sec (CPU), device used for 2.12988 seconds
Device 2: 33% work done with performance: 295.151 million samples/sec (Tesla P100-SXM2-16GB), device used for 3.70754 seconds
Device 6: 46% work done with performance: 410.559 million samples/sec (Tesla P100-SXM2-16GB), device used for 5.40371 seconds
Device 10: 15% work done with performance: 138.118 million samples/sec (Tesla P100-SXM2-16GB), device used for 1.05966 seconds
Device 14: 0% work done with performance: 0 million samples/sec (Tesla P100-SXM2-16GB), device used for 0 seconds
Device 18: 0% work done with performance: 0 million samples/sec (Tesla P100-SXM2-16GB), device used for 0 seconds
Device 22: 0% work done with performance: 0 million samples/sec (Tesla P100-SXM2-16GB), device used for 0 seconds
Device 26: 0% work done with performance: 0 million samples/sec (Tesla P100-SXM2-16GB), device used for 0 seconds
Device 30: 0% work done with performance: 0 million samples/sec (Tesla P100-SXM2-16GB), device used for 0 seconds
Total performance: 875.846 million samples/sec
selected 10 cameras in 0.114584 sec
working volume: 1846x1210x607
tiles: 1x1x1
selected 10 cameras
preloading data... done in 0.043698 sec
filtering depth maps... done in 0.865229 sec
preloading data... done in 0.229593 sec
accumulating data... done in 0.350426 sec
building point cloud... done in 0.202695 sec
2575138 points extracted
BuildModel: surface type = Arbitrary, source data = Dense cloud, face count = Medium, interpolation = Enabled
Grid size: 1846 x 1210 x 607
Tree depth: 11
Tree set in 10.6122s (1751289 points)
Leaves/Nodes: 12480126/14263001
Laplacian constraints set in 1.38601s
Depth[0/11]: 1
        Evaluated / Got / Solved in: 0 / 0.000301123 / 0.00117588
Depth[1/11]: 8
        Evaluated / Got / Solved in: 0 / 0.000265121 / 0.0132508
Depth[2/11]: 64
        Evaluated / Got / Solved in: 0 / 0.0225217 / 3.93458
Depth[3/11]: 512
        Evaluated / Got / Solved in: 0 / 0.0582504 / 3.14985
Depth[4/11]: 4096
        Evaluated / Got / Solved in: 0 / 0.0094316 / 0.0776699
Depth[5/11]: 32768
        Evaluated / Got / Solved in: 0 / 0.148186 / 12.7921
Depth[6/11]: 36696
        Evaluated / Got / Solved in: 0 / 0.296992 / 28.7145
Depth[7/11]: 122136
        Evaluated / Got / Solved in: 0 / 0.127057 / 1.54786
Depth[8/11]: 431088
        Evaluated / Got / Solved in: 0 / 0.52249 / 37.7161
Depth[9/11]: 1473336
        Evaluated / Got / Solved in: 0 / 1.69444 / 100.925
Depth[10/11]: 4416776
        Evaluated / Got / Solved in: 0 / 2.93468 / 129.633
Depth[11/11]: 7745520
        Evaluated / Got / Solved in: 0 / 4.71441 / 254.811
Linear system solved in 584.689s
Got Iso-value in 0.59202s
Iso-Value -0.432739
4632747 faces extracted in 107.812s
decimating mesh (4610222 -> 116752)
processing nodes...  done in 0.038203 sec
calculating colors...  done in 0.311711 sec
BuildUV: blending mode = Generic, texture count = 1
calculating mesh connectivity... done in 0.038562 sec
estimating quality... ********** done in 2.22983 sec
blending textures... ********** done in 2.66529 sec
postprocessing atlas... done in 0.13599 sec
SaveProject
saved project in 1.26884 sec
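To see where the mesh step spends its time, the per-depth "Solved in" figures from the log above can be pulled out with a short script. This is only a sketch: the excerpt is copied from the last four depth levels of the log, the helper name `solve_times` is made up for illustration, and reading the number after each `Depth[k/11]:` as a node count is an assumption.

```python
import re

# Excerpt copied from the BuildModel section of the console output.
LOG = """\
Depth[8/11]: 431088
        Evaluated / Got / Solved in: 0 / 0.52249 / 37.7161
Depth[9/11]: 1473336
        Evaluated / Got / Solved in: 0 / 1.69444 / 100.925
Depth[10/11]: 4416776
        Evaluated / Got / Solved in: 0 / 2.93468 / 129.633
Depth[11/11]: 7745520
        Evaluated / Got / Solved in: 0 / 4.71441 / 254.811
Linear system solved in 584.689s
"""

DEPTH_RE = re.compile(r"Depth\[(\d+)/\d+\]: (\d+)")
SOLVED_RE = re.compile(r"Evaluated / Got / Solved in: \S+ / \S+ / (\S+)")
TOTAL_RE = re.compile(r"Linear system solved in ([\d.]+)s")


def solve_times(log_text):
    """Return [(depth, count, solve_seconds), ...] in log order.

    'count' is the number printed after 'Depth[k/n]:' (assumed to be
    the node count at that depth).
    """
    rows = []
    current = None
    for line in log_text.splitlines():
        m = DEPTH_RE.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = SOLVED_RE.search(line)
        if m and current is not None:
            rows.append((*current, float(m.group(1))))
            current = None
    return rows


rows = solve_times(LOG)
total = float(TOTAL_RE.search(LOG).group(1))
for depth, count, secs in rows:
    share = secs / total
    print(f"depth {depth:2d}: {count:8d} entries, {secs:8.3f} s ({share:5.1%} of solve)")
```

Running the same script on the log from the Windows workstation would show whether the slowdown is spread evenly over all depths or concentrated in the deepest levels.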

maddin

  • Full Member
  • ***
  • Posts: 161
Re: NVIDIA DGX-1
« Reply #3 on: March 02, 2017, 01:39:49 PM »
Nice machine :-)

James

  • Hero Member
  • *****
  • Posts: 748
Re: NVIDIA DGX-1
« Reply #4 on: March 02, 2017, 01:49:07 PM »
Without knowing too much about the specific hardware, it seems like a test with just 10 photos isn't going to be very enlightening.

It's a bit like comparing the performance of a Ferrari LaFerrari against a Toyota Prius by seeing how long it takes to get each of them out of the garage.

Although I do agree, it is a bit weird.
« Last Edit: March 02, 2017, 01:51:25 PM by James »

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #5 on: March 02, 2017, 01:59:50 PM »
I am sparing you the tests with many images.
We have casually done a complete 7000-image project with a medium cloud in one go overnight, but there is no point running it any further if the mesh performance stalls with just 10 images.

From what I can see, the mesh interpolation is the issue, but if I switch it off it hangs during texture allocation because there are so many small fragments to texture.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 14843
Re: NVIDIA DGX-1
« Reply #6 on: March 02, 2017, 02:16:06 PM »
Hello Tom,

Can you share the project in PSZ format with the dense cloud, so that we can run the mesh generation on our test configurations?
Best regards,
Alexey Pasumansky,
Agisoft LLC

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #7 on: March 02, 2017, 02:36:59 PM »
Hello Alexey,
please find the project and the images here:
https://we.tl/eFuYrtkQhc

I have the machine until Sunday night...
Many thanks and best greetings!
Tom

maddin

  • Full Member
  • ***
  • Posts: 161
Re: NVIDIA DGX-1
« Reply #8 on: March 02, 2017, 02:58:33 PM »
From what I can see, the mesh interpolation is the issue, but if I switch it off it hangs during texture allocation because there are so many small fragments to texture.

Interesting. FWIW, I just ran into my own issue with mesh interpolation, related to memory problems.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 14843
Re: NVIDIA DGX-1
« Reply #9 on: March 02, 2017, 03:08:21 PM »
please find the project and the images here:
https://we.tl/eFuYrtkQhc

Hello Tom,

Thank you for sharing, we'll test it on our dual-Xeon configurations. Your system is running Linux, right?
Best regards,
Alexey Pasumansky,
Agisoft LLC

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #10 on: March 02, 2017, 03:11:07 PM »
Correct, the DGX-1 is running Ubuntu 14.04.4 LTS (GNU/Linux 4.4.0-45-generic x86_64)

This is how I run the machine in headless mode. Playing with the GPU and CPU settings throughout the workflow does not really change anything.

Code: [Select]
import os

import PhotoScan

# Set the working directory
os.chdir("/mydata/scripts/")

print(os.getcwd())

# Open the project
doc = PhotoScan.app.document
doc.open("densecloudtest.psz")

# Check GPU usage
foo = PhotoScan.app.enumGPUDevices()
print("List of available GPUs:")
print(foo)


# Select the GPUs to use. gpu_mask is a bitmask: bit i enables GPU i,
# so 255 (binary 11111111) enables all eight GPUs.
# http://www.exploringbinary.com/binary-converter/
PhotoScan.app.gpu_mask = 255
PhotoScan.app.cpu_enable = True # Use CPU when GPU is active

# Change path or parts of image path
chunk = doc.chunk
# chunk = PhotoScan.app.document.chunk
#for i in range(len(chunk.cameras)):
##      print(chunk.cameras[i].photo.path)
#       chunk.cameras[i].photo.path = chunk.cameras[i].photo.path.replace ("/../data/VirtualVents/AUVcam/","/tomsdata/")
##      print(chunk.cameras[i].photo.path)

# run processing
#chunk = doc.chunk
chunk.matchPhotos(accuracy=PhotoScan.HighAccuracy, reference_preselection=True)
chunk.alignCameras()
chunk.buildDenseCloud(quality=PhotoScan.MediumQuality)
# Turn the GPU settings off for mesh generation
PhotoScan.app.cpu_enable = False
PhotoScan.app.gpu_mask = 0
chunk.buildModel(surface=PhotoScan.Arbitrary, interpolation=PhotoScan.EnabledInterpolation)
# Restore the GPU settings for the texturing steps
PhotoScan.app.gpu_mask = 255
PhotoScan.app.cpu_enable = True
chunk.buildUV(mapping=PhotoScan.GenericMapping)
chunk.buildTexture(blending=PhotoScan.MosaicBlending, size=4096)
doc.save(path="mesh_noGPU.psz")
« Last Edit: March 02, 2017, 03:12:40 PM by tkwasnitschka »
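The `gpu_mask` values in the script (255 and 0) are bitmasks, with one bit per GPU. A small sketch of how such a value can be built and decoded without an online converter follows. The helper names `gpu_mask_for` and `enabled_gpus` are hypothetical and not part of PhotoScan, and the mapping of bit i to the i-th device in the enumerated list is assumed here.

```python
def gpu_mask_for(indices):
    """Return an integer with bit i set for every GPU index i in indices."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask


def enabled_gpus(mask):
    """Return the list of GPU indices whose bit is set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


all_eight = gpu_mask_for(range(8))
print(all_eight, bin(all_eight))      # mask for GPUs 0..7
print(gpu_mask_for([0, 1, 2, 3]))     # mask for the first four GPUs only
print(enabled_gpus(all_eight))        # decode a mask back to indices
```

The resulting integer is what would be assigned to `PhotoScan.app.gpu_mask` in the script above.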

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 14843
Re: NVIDIA DGX-1
« Reply #11 on: March 02, 2017, 03:13:59 PM »
Hello Tom,

Mesh is generated on CPU only, so GPU settings have no effect on the procedure.
Best regards,
Alexey Pasumansky,
Agisoft LLC

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #12 on: March 02, 2017, 03:20:36 PM »
Absolutely - I was unsure whether the allocation of cores for GPU control applies only during GPU steps or permanently, so I thought I might give it at least all the CPU cores in that step. Maybe you can clarify.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 14843
    • View Profile
Re: NVIDIA DGX-1
« Reply #13 on: March 02, 2017, 05:07:41 PM »
Hello Tom,

Can you please also specify whether you have Hyper-Threading turned on and whether there are any other processes consuming CPU performance?
Best regards,
Alexey Pasumansky,
Agisoft LLC

tkwasnitschka

  • Jr. Member
  • **
  • Posts: 66
Re: NVIDIA DGX-1
« Reply #14 on: March 02, 2017, 06:33:22 PM »
Alexey,
there are no other costly processes running and hyperthreading is activated...
Thanks,
Tom
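For anyone wanting to check the same two things on a Linux box (Hyper-Threading state and competing CPU load), here is a rough sketch using only the Python standard library. It assumes an x86 Linux system whose `/proc/cpuinfo` contains `siblings` and `cpu cores` fields; the function name `hyperthreading_enabled` is made up for this example.

```python
import os


def hyperthreading_enabled(cpuinfo_text):
    """Compare 'siblings' with 'cpu cores' from /proc/cpuinfo text.

    Returns True if a package reports more logical siblings than
    physical cores, False if they are equal, None if either field
    is missing.
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            return siblings > cores
    return None


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    try:
        # Load average over the last 1, 5 and 15 minutes (Unix only).
        print("load average:", os.getloadavg())
    except (AttributeError, OSError):
        print("load average not available on this platform")
    try:
        with open("/proc/cpuinfo") as f:
            print("hyper-threading:", hyperthreading_enabled(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo not available on this platform")
```

A load average well above zero while PhotoScan is idle would point to other processes competing for the CPU during the mesh step.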