
Author Topic: Benchmarking a GPUs  (Read 96373 times)

Matt

  • Full Member
  • Posts: 104
Re: Benchmarking a GPUs
« Reply #15 on: October 11, 2012, 05:40:05 AM »
Wow, you should be able to crack over 2 billion samples per second with that rig, but it will need a personal nuclear reactor to power it ;D
« Last Edit: October 12, 2012, 12:38:01 AM by Matt »

ReginaK

  • Newbie
  • Posts: 17
Re: Benchmarking a GPUs
« Reply #16 on: October 11, 2012, 11:49:12 PM »
My rig is 2 x GTX 580 (overclocked) + i7 @ 3.8 GHz + 24 GB RAM

Device 1 performance: 137.908 million samples/sec (CPU)
Device 2 performance: 250.863 million samples/sec (GeForce GTX 580)
Device 3 performance: 247.25 million samples/sec (GeForce GTX 580)
Total performance: 636.021 million samples/sec
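
The "Total performance" figure appears to be simply the sum of the per-device rates. A minimal Python sketch that parses lines in the format shown above and adds them up; the regular expression and the helper name `total_samples_per_sec` are assumptions based only on the log lines posted in this thread, not part of PhotoScan:

```python
import re

# Hypothetical pattern, inferred from lines such as
# "Device 2 performance: 250.863 million samples/sec (GeForce GTX 580)".
DEVICE_LINE = re.compile(r"Device (\d+) performance: ([\d.]+) million samples/sec \((.+)\)")


def total_samples_per_sec(log_text):
    """Return (list of (index, name, rate), summed rate) in million samples/sec."""
    devices = []
    for match in DEVICE_LINE.finditer(log_text):
        devices.append((int(match.group(1)), match.group(3), float(match.group(2))))
    # The "Total performance" line does not match the pattern, so it is not double-counted.
    total = sum(rate for _, _, rate in devices)
    return devices, total


log = """Device 1 performance: 137.908 million samples/sec (CPU)
Device 2 performance: 250.863 million samples/sec (GeForce GTX 580)
Device 3 performance: 247.25 million samples/sec (GeForce GTX 580)
Total performance: 636.021 million samples/sec"""

devices, total = total_samples_per_sec(log)
for index, name, rate in devices:
    print(f"Device {index} ({name}): {rate} M samples/s")
print(f"Sum: {total:.3f} M samples/s")
```

Applied to the four Tahiti lines in the next post, the same sum gives 2062.445, which the log prints as 2062.45.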


juneau3000

  • Newbie
  • Posts: 5
Re: Benchmarking a GPUs
« Reply #17 on: November 11, 2012, 08:19:07 AM »
4 x HD 7970 3GB @ 1100 MHz/1700 MHz
Intel 3770K @ 4.7 GHz, 4 cores/8 threads
2 x 4 GB DDR3-2600 9-12-12-19


Device 1 performance: 505.96 million samples/sec (Tahiti)
Device 2 performance: 546.441 million samples/sec (Tahiti)
Device 3 performance: 506.016 million samples/sec (Tahiti)
Device 4 performance: 504.028 million samples/sec (Tahiti)
Total performance: 2062.45 million samples/sec


Generating mesh...
13304348 points extracted
Grid size: 1638 x 1636 x 2563
Tree depth: 12
Tree set in 23.743 s
Tree size 1517.71 MB (24866241 leaves, 28418561 nodes)
Tree refined in 4.43s
Tree size 1517.71 MB (24866241 leaves, 28418561 nodes)
Normal Size: 249.706 MB
Laplacian weights set in 68.007s
Tree refined in 5.633s
Tree size 1517.71 MB (24866241 leaves, 28418561 nodes)
Depth 1/12, 56.25% entries (36 / 8^2)
Depth 2/12, 34.2773% entries (1404 / 64^2)
Depth 3/12, 9.89337% entries (13398 / 368^2)
Depth 4/12, 7.38831% entries (19368 / 512^2)
Depth 5/12, 2.16436% entries (89367 / 2032^2)
Depth 6/12, 0.627468% entries (351822 / 7488^2)
Depth 7/12, 0.178709% entries (1290465 / 26872^2)
Depth 8/12, 0.0432821% entries (4846308 / 105816^2)
Depth 9/12, 0.0108605% entries (19537053 / 424136^2)
Depth 10/12
   Nodes 1/8: 397328, 0.0115373% entries (18213859 / 397328^2)
   Nodes 2/8: 252936, 0.0176532% entries (11293901 / 252936^2)
   Nodes 3/8: 426136, 0.0108643% entries (19728673 / 426136^2)
   Nodes 4/8: 298456, 0.015252% entries (13585900 / 298456^2)
   Nodes 5/8: 194520, 0.0239189% entries (9050433 / 194520^2)
   Nodes 6/8: 120032, 0.0379344% entries (5465471 / 120032^2)
   Nodes 7/8: 84072, 0.0547675% entries (3871021 / 84072^2)
   Nodes 8/8: 45520, 0.0992257% entries (2056027 / 45520^2)
Depth 11/12
   Nodes 8/64: 1453512, 0.00318671% entries (67325534 / 1453512^2)
   Nodes 15/64: 904432, 0.00502197% entries (41079543 / 904432^2)
   Nodes 22/64: 1551176, 0.00299513% entries (72067181 / 1551176^2)
   Nodes 29/64: 962776, 0.0047575% entries (44099037 / 962776^2)
   Nodes 36/64: 695872, 0.00678441% entries (32852692 / 695872^2)
   Nodes 40/64: 47968, 0.0948568% entries (2182588 / 47968^2)
   Nodes 43/64: 443472, 0.0104579% entries (20567360 / 443472^2)
   Nodes 47/64: 12872, 0.351308% entries (582077 / 12872^2)
   Nodes 50/64: 301384, 0.0156763% entries (14239180 / 301384^2)
   Nodes 54/64: 12416, 0.362556% entries (558905 / 12416^2)
   Nodes 57/64: 164488, 0.0279929% entries (7573835 / 164488^2)
Depth 12/12
   Nodes 57/368: 77320, 0.0471448% entries (2818496 / 77320^2)
   Nodes 58/368: 742456, 0.00503086% entries (27732141 / 742456^2)
   Nodes 59/368: 975016, 0.00386316% entries (36725337 / 975016^2)
   Nodes 60/368: 829792, 0.00454031% entries (31262489 / 829792^2)
   Nodes 62/368: 61864, 0.0591943% entries (2265457 / 61864^2)
   Nodes 63/368: 173312, 0.0215044% entries (6459287 / 173312^2)
   Nodes 64/368: 1571056, 0.00240964% entries (59475151 / 1571056^2)
   Nodes 113/368: 753648, 0.00494485% entries (28086029 / 753648^2)
   Nodes 114/368: 7496, 0.458528% entries (257647 / 7496^2)
   Nodes 115/368: 1062808, 0.0035025% entries (39562833 / 1062808^2)
   Nodes 116/368: 69448, 0.0512892% entries (2473689 / 69448^2)
   Nodes 117/368: 63176, 0.0582321% entries (2324164 / 63176^2)
   Nodes 119/368: 1098592, 0.00340633% entries (41111112 / 1098592^2)
   Nodes 169/368: 1189888, 0.00320646% entries (45398201 / 1189888^2)
   Nodes 170/368: 590944, 0.00645144% entries (22529395 / 590944^2)
   Nodes 171/368: 59664, 0.0613152% entries (2182694 / 59664^2)
   Nodes 172/368: 1832832, 0.0021012% entries (70584962 / 1832832^2)
   Nodes 173/368: 152136, 0.0250708% entries (5802739 / 152136^2)
   Nodes 174/368: 762328, 0.00498398% entries (28964071 / 762328^2)
   Nodes 176/368: 5408, 0.605454% entries (177074 / 5408^2)
   Nodes 225/368: 1425056, 0.00267097% entries (54241692 / 1425056^2)
   Nodes 226/368: 99776, 0.0367272% entries (3656286 / 99776^2)
   Nodes 227/368: 513632, 0.00744717% entries (19646963 / 513632^2)
   Nodes 228/368: 2008, 1.65208% entries (66613 / 2008^2)
   Nodes 229/368: 717952, 0.00532773% entries (27462042 / 717952^2)
   Nodes 276/368: 1030184, 0.00374059% entries (39698068 / 1030184^2)
   Nodes 280/368: 1248952, 0.00305794% entries (47700262 / 1248952^2)
   Nodes 284/368: 133368, 0.0280868% entries (4995800 / 133368^2)
   Nodes 299/368: 907488, 0.00416611% entries (34309341 / 907488^2)
   Nodes 303/368: 573824, 0.00654664% entries (21556382 / 573824^2)
   Nodes 307/368: 34536, 0.107592% entries (1283291 / 34536^2)
   Nodes 322/368: 182816, 0.0207891% entries (6948081 / 182816^2)
   Nodes 326/368: 838416, 0.00461138% entries (32415280 / 838416^2)
   Nodes 338/368: 43624, 0.0877051% entries (1669075 / 43624^2)
   Nodes 345/368: 363432, 0.0104874% entries (13851995 / 363432^2)
   Nodes 349/368: 217280, 0.0171502% entries (8096721 / 217280^2)
Linear system solved in 71.773s (setup: 14.786s, solve: 25.177s, update: 26.002s)
Got Iso-value in 9.711s
Iso-value -60290.4
Normal Size: 71.6336 MB
4707876 vertices extracted in 39.375 sec
9415720 faces extracted in 6.429 sec
filtering mesh (9415720 -> 9413930)
Finished processing in 498.187 sec (exit code 1)

Sorry if I did it wrong. Not sure why the tree depth and grid size in my run are higher than those of the others who have run it... I'm new to this :P
« Last Edit: November 11, 2012, 08:25:52 AM by juneau3000 »
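
In the `Depth k/12` and `Nodes` lines of the meshing log, the percentage appears to be the number of entries divided by the square of the node count (for example 36 / 8^2 = 0.5625, i.e. 56.25%). That reading is an inference from the numbers themselves, not documented PhotoScan behavior. A small Python check, where the pattern and the helper `check_density` are hypothetical:

```python
import re

# Assumed format, copied from log lines such as
# "Depth 3/12, 9.89337% entries (13398 / 368^2)" and
# "   Nodes 1/8: 397328, 0.0115373% entries (18213859 / 397328^2)".
ENTRY_LINE = re.compile(r"([\d.]+)% entries \((\d+) / (\d+)\^2\)")


def check_density(line, rel_tol=1e-4):
    """Recompute 100 * entries / n^2 and compare it with the printed percentage.

    Returns (printed, computed, agrees) or None for lines without an entries field.
    """
    match = ENTRY_LINE.search(line)
    if match is None:
        return None
    printed = float(match.group(1))
    entries, n = int(match.group(2)), int(match.group(3))
    computed = 100.0 * entries / (n * n)
    return printed, computed, abs(computed - printed) <= rel_tol * printed


samples = [
    "Depth 1/12, 56.25% entries (36 / 8^2)",
    "Depth 3/12, 9.89337% entries (13398 / 368^2)",
    "   Nodes 1/8: 397328, 0.0115373% entries (18213859 / 397328^2)",
    "Depth 10/12",
]
for line in samples:
    print(check_density(line))
```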

Wishgranter

  • Hero Member
  • Posts: 1202
    • Museum of Historic Buildings
Re: Benchmarking a GPUs
« Reply #18 on: November 12, 2012, 12:44:07 AM »
As far as I know, the depth and tree depend on the GPU's internal organization: the OpenCL device hierarchy is different from NVIDIA's. And THANKS for providing results; this confirms another user's report of similarly hilarious performance from AMD. Which exact GPU card is used, and how many, etc.?

Thanx.....
----------------
www.mhb.sk

juneau3000

  • Newbie
  • Posts: 5
Re: Benchmarking a GPUs
« Reply #19 on: November 12, 2012, 03:09:53 AM »
This is 4 x HD 7970 3GB GPUs overclocked to 1150 MHz core / 1700 MHz memory, led by an Intel 3770K (8 threads) at 5 GHz.

Does performance seem low? What should I be looking for in terms of samples per second, or should I go by processing time?

Thanks in advance. I wonder if 2 x GTX 680s might be better than 2 x 7970s; I will try both.

Wishgranter

  • Hero Member
  • Posts: 1202
    • Museum of Historic Buildings
Re: Benchmarking a GPUs
« Reply #20 on: November 12, 2012, 09:54:02 AM »
No, this sort of performance is possible only with AMD/ATI cards. NVIDIA is much slower than this.

And it's nice to see overclocking on the CPU/GPU side and its impact on performance.
----------------
www.mhb.sk

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: Benchmarking a GPUs
« Reply #21 on: November 12, 2012, 01:57:20 PM »
Hello,

Quote
Not sure why the tree depth and grid size in my run are higher than those of the others who have run it... I'm new to this
Tree depth depends on the selected quality and the bounding box size. Also, the mesh generation step doesn't involve GPU processing; OpenCL devices are only used for depth map generation.
Best regards,
Alexey Pasumansky,
Agisoft LLC

juneau3000

  • Newbie
  • Posts: 5
Re: Benchmarking a GPUs
« Reply #22 on: November 12, 2012, 07:42:31 PM »
Thanks for the information, guys. I will update with 2 x 7970 vs 2 x GTX 680 results this afternoon.

Wishgranter

  • Hero Member
  • Posts: 1202
    • Museum of Historic Buildings
Re: Benchmarking a GPUs
« Reply #23 on: November 12, 2012, 07:51:10 PM »
Juneau, mixing different GPUs can cause problems. For best performance use the 7970 cards, not the NVIDIA 680s (the 680 is only about as fast as the 580!!!)

----------------
www.mhb.sk

juneau3000

  • Newbie
  • *
  • Posts: 5
    • View Profile
Re: Benchmarking a GPUs
« Reply #24 on: November 12, 2012, 09:42:12 PM »
I'm sorry, I meant at separate times. :D


Here are results with the following hardware.
Intel 3970X, 6 cores/12 threads at 4.8 GHz (running 8 threads, reserving 2 for each card because of HT)
2 x HD 7970 3GB cards at 1100/1700 MHz
4 x 4 GB @ 2133 MHz CAS 10

finished depth reconstruction in 368.598 seconds
Device 1 performance: 306.784 million samples/sec (CPU)
Device 2 performance: 517.826 million samples/sec (Tahiti)
Device 3 performance: 506.601 million samples/sec (Tahiti)
Total performance: 1331.21 million samples/sec
Generating mesh...
13299863 points extracted
Grid size: 1638 x 1637 x 2563
Tree depth: 12
Tree set in 23.821 s
Tree size 1516.9 MB (24852948 leaves, 28403369 nodes)
Tree refined in 4.368s
Tree size 1516.9 MB (24852948 leaves, 28403369 nodes)
Normal Size: 249.543 MB
Laplacian weights set in 47.768s
Tree refined in 5.491s
Tree size 1516.9 MB (24852948 leaves, 28403369 nodes)
Depth 1/12, 56.25% entries (36 / 8^2)
Depth 2/12, 34.2773% entries (1404 / 64^2)
Depth 3/12, 9.89337% entries (13398 / 368^2)
Depth 4/12, 7.38831% entries (19368 / 512^2)
Depth 5/12, 2.16436% entries (89367 / 2032^2)
Depth 6/12, 0.62886% entries (350346 / 7464^2)
Depth 7/12, 0.178867% entries (1288533 / 26840^2)
Depth 8/12, 0.043282% entries (4847026 / 105824^2)
Depth 9/12, 0.0108639% entries (19529262 / 423984^2)
Depth 10/12
   Nodes 1/8: 397016, 0.0115446% entries (18196790 / 397016^2)
   Nodes 2/8: 252632, 0.0176684% entries (11276485 / 252632^2)
   Nodes 3/8: 426064, 0.0108665% entries (19725931 / 426064^2)
   Nodes 4/8: 298560, 0.0152487% entries (13592359 / 298560^2)
   Nodes 5/8: 194280, 0.0239416% entries (9036675 / 194280^2)
   Nodes 6/8: 120120, 0.0379237% entries (5471940 / 120120^2)
   Nodes 7/8: 84024, 0.0547993% entries (3868847 / 84024^2)
   Nodes 8/8: 45280, 0.0996754% entries (2043623 / 45280^2)
Depth 11/12
   Nodes 8/64: 1453992, 0.00318538% entries (67341824 / 1453992^2)
   Nodes 15/64: 903336, 0.00502759% entries (41025937 / 903336^2)
   Nodes 22/64: 1550512, 0.00299644% entries (72037114 / 1550512^2)
   Nodes 29/64: 962392, 0.00475896% entries (44077453 / 962392^2)
   Nodes 36/64: 695272, 0.00678932% entries (32819795 / 695272^2)
   Nodes 40/64: 47864, 0.0950325% entries (2177159 / 47864^2)
   Nodes 43/64: 443136, 0.010464% entries (20548111 / 443136^2)
   Nodes 47/64: 12832, 0.35214% entries (579834 / 12832^2)
   Nodes 50/64: 301168, 0.015685% entries (14226671 / 301168^2)
   Nodes 54/64: 12416, 0.362618% entries (559001 / 12416^2)
   Nodes 57/64: 164136, 0.0280488% entries (7556515 / 164136^2)
Depth 12/12
   Nodes 57/368: 77448, 0.0470791% entries (2823893 / 77448^2)
   Nodes 58/368: 742144, 0.00503303% entries (27720833 / 742144^2)
   Nodes 59/368: 975296, 0.00386201% entries (36735488 / 975296^2)
   Nodes 60/368: 830392, 0.00453717% entries (31286101 / 830392^2)
   Nodes 62/368: 61608, 0.0593687% entries (2253368 / 61608^2)
   Nodes 63/368: 173064, 0.0215245% entries (6446823 / 173064^2)
   Nodes 64/368: 1569024, 0.00241267% entries (59396079 / 1569024^2)
   Nodes 113/368: 753528, 0.00494683% entries (28088316 / 753528^2)
   Nodes 114/368: 7472, 0.458983% entries (256254 / 7472^2)
   Nodes 115/368: 1063304, 0.00350076% entries (39580087 / 1063304^2)
   Nodes 116/368: 69040, 0.0515391% entries (2456621 / 69040^2)
   Nodes 117/368: 63080, 0.0582343% entries (2317195 / 63080^2)
   Nodes 119/368: 1097288, 0.00340952% entries (41051995 / 1097288^2)
   Nodes 169/368: 1189944, 0.00320623% entries (45399109 / 1189944^2)
   Nodes 170/368: 590800, 0.00645318% entries (22524490 / 590800^2)
   Nodes 171/368: 59768, 0.0611872% entries (2185736 / 59768^2)
   Nodes 172/368: 1833488, 0.00210064% entries (70616686 / 1833488^2)
   Nodes 173/368: 151944, 0.0250969% entries (5794127 / 151944^2)
   Nodes 174/368: 762312, 0.00498329% entries (28958848 / 762312^2)
   Nodes 176/368: 5408, 0.605598% entries (177116 / 5408^2)
   Nodes 225/368: 1424848, 0.00267148% entries (54236125 / 1424848^2)
   Nodes 226/368: 99152, 0.0369453% entries (3632141 / 99152^2)
   Nodes 227/368: 513552, 0.00744825% entries (19643679 / 513552^2)
   Nodes 228/368: 2080, 1.60984% entries (69648 / 2080^2)
   Nodes 229/368: 715952, 0.00533993% entries (27371803 / 715952^2)
   Nodes 276/368: 1028656, 0.00374592% entries (39636862 / 1028656^2)
   Nodes 280/368: 1246264, 0.00306318% entries (47576502 / 1246264^2)
   Nodes 284/368: 133136, 0.0281344% entries (4986871 / 133136^2)
   Nodes 299/368: 907384, 0.00416652% entries (34304863 / 907384^2)
   Nodes 303/368: 573976, 0.00654438% entries (21560372 / 573976^2)
   Nodes 307/368: 34368, 0.10801% entries (1275773 / 34368^2)
   Nodes 322/368: 182536, 0.0208148% entries (6935355 / 182536^2)
   Nodes 326/368: 838672, 0.00461006% entries (32425830 / 838672^2)
   Nodes 338/368: 43552, 0.0877833% entries (1665053 / 43552^2)
   Nodes 345/368: 363200, 0.0104926% entries (13841250 / 363200^2)
   Nodes 349/368: 217288, 0.0171512% entries (8097771 / 217288^2)
Linear system solved in 60.7s (setup: 10.167s, solve: 27.677s, update: 16.787s)
Got Iso-value in 6.864s
Iso-value -60218.6
Normal Size: 71.6137 MB
4706303 vertices extracted in 38.969 sec
9412562 faces extracted in 4.431 sec
filtering mesh (9412562 -> 9410710)
Finished processing in 597.231 sec (exit code 1)


I will swap in the GTX 680s next.
Should I judge by the total time it takes to complete, or by the samples-per-second rating?
Thanks for all the help.
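
Following Alexey's earlier point that only depth map generation uses the OpenCL devices, one way to compare GPUs is to separate the depth reconstruction time from the total. A rough Python sketch, assuming the wording of the two timing lines in the log above; the patterns and the helper `split_times` are hypothetical, and the first run's log as posted has no depth reconstruction line, so it returns None there:

```python
import re

# Hypothetical patterns, copied from the wording of the log above.
DEPTH_LINE = re.compile(r"finished depth reconstruction in ([\d.]+) seconds")
TOTAL_LINE = re.compile(r"Finished processing in ([\d.]+) sec")


def split_times(log_text):
    """Return (depth_seconds, total_seconds, rest_seconds), or None if a line is missing."""
    depth = DEPTH_LINE.search(log_text)
    total = TOTAL_LINE.search(log_text)
    if depth is None or total is None:
        return None
    depth_s, total_s = float(depth.group(1)), float(total.group(1))
    # The remainder is mostly the mesh generation shown in the log,
    # which (per Alexey) does not use the GPU.
    return depth_s, total_s, total_s - depth_s


log = """finished depth reconstruction in 368.598 seconds
Finished processing in 597.231 sec (exit code 1)"""

depth_s, total_s, rest_s = split_times(log)
print(f"depth maps: {depth_s:.3f} s ({100 * depth_s / total_s:.1f}% of total)")
print(f"remainder:  {rest_s:.3f} s")
```

Only the depth map figure (or the samples/sec rating, which describes the same stage) should change when the graphics cards are swapped.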

Matt

  • Full Member
  • Posts: 104
Re: Benchmarking a GPUs
« Reply #25 on: November 13, 2012, 03:59:03 AM »
I haven't had any problems mixing NVIDIA/ATI video cards at all. If anything, Juneau's issues may stem from not having enough RAM. Drop in an extra 8 GB and see how that goes. That overclocked CPU is flying; pity I can't overclock my Xeon E5s :( . My guess is around 600 million samples per second with the 2 x GTX 680s.

juneau3000

  • Newbie
  • Posts: 5
Re: Benchmarking a GPUs
« Reply #26 on: November 16, 2012, 09:16:17 PM »
With 2 x GTX 680 GPUs my total time is around 850-889 s ???
They show 255 million samples/sec each.

This seems incredibly slow vs the 7970's.

So can I say that, 7970 vs GTX 680, the 7970s win by a large margin?

I do know that in the Cinebench 11.5 OpenGL test the 7970 gets 103 FPS vs 62 FPS for the 680. Seems like a real workhorse of a card that ATI has put out.
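
The per-card gap can be put as a simple ratio. A small Python sketch using only figures posted in this thread (the two Tahiti rates from the 2 x 7970 run and the "255m samples per/s each" reported for the GTX 680s, plus the Cinebench FPS quoted above); the helper `ratio` is just an illustration:

```python
# Per-card depth-map throughput in million samples/sec, as posted in this thread.
hd7970 = [517.826, 506.601]  # 2 x HD 7970 run (Tahiti)
gtx680 = [255.0, 255.0]      # reported as roughly 255 million each


def ratio(a, b):
    """How many times faster a is than b."""
    return a / b


mean_7970 = sum(hd7970) / len(hd7970)
mean_680 = sum(gtx680) / len(gtx680)
print(f"Depth maps, per card: 7970 is {ratio(mean_7970, mean_680):.2f}x a GTX 680")
print(f"Cinebench FPS quoted: {ratio(103, 62):.2f}x")
```

On depth maps the 7970 comes out at roughly twice a GTX 680 per card, a larger gap than the Cinebench figures suggest.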

Wishgranter

  • Hero Member
  • Posts: 1202
    • Museum of Historic Buildings
Re: Benchmarking a GPUs
« Reply #27 on: November 16, 2012, 09:36:20 PM »
Juneau, a BIG thanks for the testing you have done for the community, for all of us here. 8)

So now a lot of us can decide what to buy for our workstations. And it seems that AMD/ATI will release the 8000 series in Jan/Feb 2013, so even more performance awaits us.

Matt, thanks too for the info about mixing cards. A few months ago I asked in the Khronos group (http://www.khronos.org/opencl/) and they told me that using different architectures is problematic, but now it is clear that for our use it is not.

If it were possible to port more functions of PhotoScan to the OpenCL engine, we would see a significant boost in efficiency, so I'm hoping that will be possible in the future. If we get a speedup of the whole process, photogrammetry could be very competitive with laser scanning.
----------------
www.mhb.sk

Matt

  • Full Member
  • Posts: 104
Re: Benchmarking a GPUs
« Reply #28 on: November 17, 2012, 01:29:20 PM »
The 7970 beats the 500 and 600 series in all the programming-based clBenchmark tests apart from Image Filter Global Atomic Add and Bitonic Merge Sort. I still like my GTX 590 though: 400+ million points from one card and no fiddly driver issues. To get the NVIDIA and ATI cards working together, I plugged them both into the same monitor.
« Last Edit: November 17, 2012, 02:09:10 PM by Matt »

ajg-cal

  • Newbie
  • Posts: 42
Re: Benchmarking a GPUs
« Reply #29 on: November 22, 2012, 01:15:51 PM »
Hi all, thanks for sharing these results :). I only have GTX 580 and 680 cards here, so I'm not sure how much original data I would be able to supply.

I am interested in the increase in samples per second shown by the 7970.

I've had an awful experience with AMD drivers in laptops (7970M), which of course is not wholly applicable here, I suppose.

Are many of you going to buy 7970s based on Juneau's data, do you think?