Author Topic: New Low Hash Rate (LHR) GeForce Cards (Read 3478 times)

KBL · « **on:** June 30, 2021, 06:19:45 AM »

NVIDIA has recently announced low hash rate (LHR) versions of their GeForce cards. Per NVIDIA, these are intended to be crippled for crypto mining, but operate at 100% for gaming. However, it appears a number of variables affect hash rate. Does anyone know if an LHR card would be effective for Metashape?

Alexey Pasumansky · « **Reply #1 on:** July 05, 2021, 01:15:26 PM »

Hello KBL,

According to the information on the NVIDIA forum (https://forums.developer.nvidia.com/t/will-nvidias-gimping-of-crypto-hash-rate-impact-deep-learning/168780/4), the hash rate limitation shouldn't affect CUDA performance.

Corensia · « **Reply #2 on:** September 03, 2021, 08:20:17 AM »

I was going to post a new thread but this seems relevant enough so I'll add it here. I was going to ask a similar question to the OP and ask if anyone has tested LHR graphics cards. We recently purchased a 3060ti LHR and didn't think much about it until we ran our benchmark and noticed something interesting.

Old PC : CPU - i9 9900K, GPU - RTX 2080 Super
New PC : CPU - Ryzen 5600X, GPU - 3060ti LHR

From standard benchmarks the new CPU should be way faster and the new GPU the same or a fraction faster.

Benchmarked Sept 2, 2021 on Metashape version 1.7.4

Old PC Results
Align - 6 min 4 sec
Optimize - 11 sec
Dense Cloud - 46 min 57 sec
Mesh - 13 min 49 sec
DEM - 1 min 56 sec
Ortho - 20 min 38 sec

New PC Results
Align - 4 min 12 sec
Optimize - 4 sec
Dense Cloud - 1 hr 20 min 46 sec
Mesh - 9 min 49 sec
DEM - 1 min 7 sec
Ortho - 9 min 48 sec

Clearly the CPU is showing a huge improvement but the dense cloud generation is incredibly slow. In fact, it's slower than dense cloud generations of both the GTX 1660ti and RTX 2060 (test results from Metashape 1.6). Any idea what is causing the slow performance of the 3060ti?

P.S. NVIDIA driver 471.68 was used for the test, dense cloud generation was on High with aggressive filtering, the dataset is only 320 images

Bzuco · « **Reply #3 on:** September 03, 2021, 09:29:07 AM »

@Corensia
In multicore performance the 5600X is unfortunately slower than the 9900K, both at stock frequencies (a good comparison is Cinebench Rxx). Only in single-core performance is the 5600X better.
As for the GPUs, there were changes to the CUDA cores and especially to how they are grouped within the GPU core. In the RTX 30xx series the CUDA core count was doubled, but the cores were also "regrouped in two blocks" within one SM block. There is also the question of the dispatcher, which assigns tasks to the CUDA cores, and how well it handles that assignment. You can compare the numbers (core configuration and the other fields in the table) on this page https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series
Differences, 3060ti vs 2080s: SM blocks (38 vs 48), memory bandwidth (448/496 GB/s), base and boost frequencies (1410/1650 MHz and 1665/1815 MHz), memory frequency (14000/15500)
Can you compare just the depth maps generation times alone, and check roughly what the frequencies were during that process? Or better, lock the frequencies using the C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe utility to get a more accurate result.
There is also a chance that the LHR limit can affect some kinds of processing tasks; we will see in the future. The doubled CUDA core count in the RTX 30xx generation does not automatically mean doubled performance in programs; probably some kind of optimization is needed in the kernels that are executed on the GPU. I'm not an expert in this field, but I'm a little interested in it.
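For the frequency check suggested above, here is a minimal Python sketch that polls nvidia-smi once per second and prints the SM clock and GPU load while depth maps are being built. The helper names (`build_query_cmd`, `parse_row`, `log_clocks`) are made up for illustration, and the sketch assumes `nvidia-smi` is on the PATH and that the card reports numeric values for these fields. nvidia-smi also has `--lock-gpu-clocks` and `--reset-gpu-clocks` options, but they need admin rights and are not supported on every card, so they are left out here.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["timestamp", "clocks.sm", "clocks.mem", "utilization.gpu"]


def build_query_cmd(interval_s=1, exe="nvidia-smi"):
    """Build a command that prints one CSV row every `interval_s` seconds."""
    return [
        exe,
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
    ]


def parse_row(row):
    """Turn one CSV row into a dict; the numeric fields become ints.

    Assumes every numeric field holds a number (not "[N/A]").
    """
    values = [v.strip() for v in row.split(",")]
    sample = dict(zip(FIELDS, values))
    for key in FIELDS[1:]:
        sample[key] = int(sample[key])
    return sample


def log_clocks(max_rows=60):
    """Print SM clock (MHz) and GPU load (%) for up to `max_rows` samples."""
    proc = subprocess.Popen(build_query_cmd(), stdout=subprocess.PIPE, text=True)
    try:
        for _, row in zip(range(max_rows), proc.stdout):
            s = parse_row(row)
            print(f"{s['timestamp']}  SM {s['clocks.sm']} MHz  "
                  f"load {s['utilization.gpu']} %")
    finally:
        proc.terminate()
```

Start the depth maps step in Metashape, then call `log_clocks()` in a separate console and compare the logged SM clock against the base and boost values listed above.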

Corensia · « **Reply #4 on:** September 03, 2021, 10:29:11 AM »

@Bzuco
Well, as you can see in the results, Metashape strongly prefers fast single-core speeds, as shown by the 20% faster alignment, which is a CPU-intensive phase. I meant that the 5600X is faster than the 9900K in terms of Metashape performance as tested and documented by Puget Systems, not in full multi-core benchmarks. If those multi-core benchmarks translated fully to Metashape performance, the Threadrippers would be king, but they're pretty mediocre for Metashape, especially for smaller projects.

https://www.pugetsystems.com/labs/articles/Agisoft-Metashape-1-6-5---GeForce-RTX-30-Series-vs-Radeon-RX-6000-Series-2033/
This is the testing data that led to our purchase of the 3060ti, which followed closely behind the other 30xx cards and was considerably faster than the 20xx cards and the 1660 series (even the 2080ti, although it is difficult to prove exactly since Metashape versions keep changing and they don't include old cards in new tests). But to have the 3060ti perform worse than the 1660ti and 2060 is definitely unexpected, so I was wondering if the LHR had anything to do with it, since that's the only thing I could think of that would be different from the tests Puget Systems did earlier this year.

Has anyone else tried running Metashape on an LHR card?

Bzuco · « **Reply #5 on:** September 03, 2021, 11:20:24 AM »

@Corensia
You simply cannot draw a conclusion from the total alignment time, because that process consists of 4 subtasks and each subtask utilizes the CPU and GPU differently:
- detecting points: one CPU core at 100%, just a few % GPU
- generic preselection: 20-50% CPU, 50% GPU usage
- matching points: 40% CPU, ~50-100% GPU
- estimating camera locations: 100% CPU multicore utilization, 0% GPU utilization

The main difference between those two CPUs is that the Ryzen 5600X has higher IPC than the 9900K, but unfortunately has 2 fewer cores.
Dense cloud generation also consists of two subtasks. The first (depth map generation) only makes sense to compute on the GPU, with the CPU set to not be used in preferences; the second is purely CPU, because it cannot utilize the GPU at all (Threadrippers will love this second phase).
Just measure each subtask individually and then you will know exactly whether the time differences are caused by LHR or not.
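One way to measure each subtask individually is a small stage timer like the sketch below. The timer itself is plain Python; the Metashape calls in the comments are only an assumption about how it might be wired up from the Python console (method names as I understand the 1.7 API, so check the reference for your version), and `stage`, `fmt` and `report` are names invented for this example.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def stage(name):
    """Record the wall-clock time of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def fmt(seconds):
    """Format seconds in the same style as the benchmark tables."""
    m, s = divmod(int(round(seconds)), 60)
    h, m = divmod(m, 60)
    return f"{h} hr {m} min {s} sec" if h else f"{m} min {s} sec"


def report():
    """Print one line per recorded stage."""
    for name, secs in timings.items():
        print(f"{name} - {fmt(secs)}")


# Hypothetical usage inside Metashape (assumed method names, not verified
# against every version):
#   chunk = Metashape.app.document.chunk
#   with stage("Match Photos"):  chunk.matchPhotos()
#   with stage("Align Cameras"): chunk.alignCameras()
#   with stage("Depth Maps"):    chunk.buildDepthMaps()
#   with stage("Dense Cloud"):   chunk.buildDenseCloud()
#   report()
```

Timing the GPU-heavy depth map stage separately from the CPU-only dense cloud stage is what lets you tell a GPU limit (such as LHR) apart from a CPU limit.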

Alexey Pasumansky · « **Reply #6 on:** September 03, 2021, 02:42:59 PM »

Hello Corensia,

Can you please extend the timing in the provided table and split Align Photos into the Match Photos + Align Cameras steps, and Build Dense Cloud into Build Depth Maps + Build Dense Cloud?
If you still have the projects saved after each run, you can access this information via the Chunk Info dialog or an exported PDF report.

Corensia · « **Reply #7 on:** September 06, 2021, 03:29:51 AM »

Hello Alexey,

Sure thing, here is the timing split into the various stages.
(BTW I noticed the old PC was still running 1.7.3(!) so I updated and redid the benchmark so the timing is a little different but still similar results)

Old PC (9900K, 2080 super)
Matching time - 4 min 21 sec
Alignment time - 1 min 50 sec
Optimize - 4 sec
Depth map (Medium, aggressive) - 7 min 15 sec
Depth map (High, aggressive) - 23 min 16 sec
Dense cloud - 28 min 46 sec
Model reconstruction - 6 min 21 sec
DEM - 1 min 51 sec
Orthomosaic - 15 min 20 sec

New PC (5600X, 3060ti LHR)
Matching time - 2 min 36 sec
Alignment time - 1 min 36 sec
Optimize - 4 sec
Depth map (Medium, aggressive) - 4 min 23 sec
Depth map (High, aggressive) - 17 min 53 sec
Dense cloud - 1 hr 2 min
Model reconstruction - 4 min 18 sec
DEM - 1 min 6 sec
Orthomosaic - 9 min 48 sec

@Bzuco I guess if your workload distribution is correct, then indeed the 5600X is the bottleneck during dense cloud generation, not the 3060ti.
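To put rough numbers on that, here is a small Python sketch that converts the GPU-heavy and CPU-heavy rows of the split timings above into seconds and prints the new/old ratio. The `to_seconds` helper is just something written for this post; the timings are copied from the two tables.

```python
import re


def to_seconds(text):
    """Parse strings like '1 hr 2 min' or '4 min 21 sec' into seconds."""
    units = {"hr": 3600, "min": 60, "sec": 1}
    return sum(int(n) * units[u]
               for n, u in re.findall(r"(\d+)\s*(hr|min|sec)", text))


# Timings copied from the split benchmark tables.
old_pc = {
    "Matching": "4 min 21 sec",
    "Depth map (Medium)": "7 min 15 sec",
    "Depth map (High)": "23 min 16 sec",
    "Dense cloud": "28 min 46 sec",
}
new_pc = {
    "Matching": "2 min 36 sec",
    "Depth map (Medium)": "4 min 23 sec",
    "Depth map (High)": "17 min 53 sec",
    "Dense cloud": "1 hr 2 min",
}

for step in old_pc:
    old_s, new_s = to_seconds(old_pc[step]), to_seconds(new_pc[step])
    print(f"{step:20s} old {old_s:5d} s   new {new_s:5d} s   "
          f"new/old {new_s / old_s:.2f}")
```

The High depth map step (mostly GPU) comes out about 23% shorter on the new PC, while the dense cloud step (CPU only) takes roughly 2.2 times as long.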

I would love to hear your thoughts as well Alexey!

Bzuco · « **Reply #8 on:** September 06, 2021, 10:07:06 AM »

Despite my concerns about the dispatcher in the GPU, the 3060ti is doing a great job, judging by your more detailed results.
I am an Intel user, but according to some posts on the internet, 5600X users without PBO enabled get low Cinebench multicore scores, running at just ~4.07GHz instead of ~4.6GHz. https://linustechtips.com/topic/1269138-5600x-low-cinebench-r20-scores-multi-stock/ Can you check your frequencies during the dense cloud phase?
I would not expect such a big difference in dense cloud times; it is more than double. From my perspective the only explanation would be that the 9900K was clocked pretty high and the 5600X is holding lower multicore frequencies for some reason... PBO, power limit, temperature limit?

According to the review from TechPowerUp, the 5600X handles max frequencies very well across different kinds of instruction load https://tpucdn.com/review/amd-ryzen-5-5600x/images/boost-clock-analysis.png
9900K https://tpucdn.com/review/intel-core-i9-9900k/images/clock-analysis.jpg
