
Author Topic: Strange workstations testing results.  (Read 52057 times)

Triplegangers

  • Jr. Member
  • Posts: 55
Strange workstations testing results.
« on: June 14, 2013, 12:56:12 AM »
 Hello Agisoft team. We at Infinite Realities started one sweet experiment and were baffled by the results. It would be cool if you could shed some light on it.
 So we have two powerful workstations running Agisoft PhotoScan Professional edition 0.9.1:



 As you can see, the Photo Align stage was a total disaster for the Xeon workstation, which doesn't make any sense, as there are two E5-2670s running at a max turbo frequency of 3.3 GHz against a single i7-3930K.
 On the geometry build stage we didn't see much difference in speed, although the Xeon WS showed a dramatic speed boost on depth reconstruction and was already 10% into building geometry while the i7 WS was still on the depth stage. This might only be because the GPUs kicked in, but they were busy solely during the depth reconstruction stage, which is sad because that's just 3-4% of the overall time.

The questions are:
How is it possible that 6 cores with 12 threads at 3.8 GHz overtake 16 cores with 32 threads at 3.3 GHz?
Why isn't the GPUs' potential harnessed during all processing stages?

Matt

  • Full Member
  • Posts: 104
Re: Strange workstations testing results.
« Reply #1 on: June 14, 2013, 03:24:23 AM »
I think you will find the faster processor speed of the i7 (3.8 GHz) gives that machine the edge. Regardless of the number of cores, the i7 is simply faster than the Xeons. For this reason many people overclock i7 processors to maximise the efficiency of alignment etc. The two machines should perform pretty much the same during depth processing, as the GPUs are the same.

meshmaster

  • Jr. Member
  • Posts: 78
Re: Strange workstations testing results.
« Reply #2 on: June 14, 2013, 03:42:45 AM »
I've got a few multi-processor Xeon boxes as well as a couple of single-processor i7 Extreme boxes. Honestly, I've always found the i7 machines leave my Xeons in the dust.

:-/


Wishgranter

  • Hero Member
  • Posts: 1202
Re: Strange workstations testing results.
« Reply #3 on: June 14, 2013, 10:11:39 AM »
Try reading a little about the efficiency of multiprocessor systems: every core you add lowers the overall efficiency, so a 4-core system runs at around 96% efficiency, but with 16+ cores it comes down to 70% in some examples...
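
A minimal sketch of that trend using Amdahl's law (the parallel fraction p = 0.97 here is an assumption, picked to roughly match those numbers; real workloads add cache and memory-bandwidth overheads on top):

    def speedup(p, n):
        # Amdahl's law: p = parallel fraction of the work, n = core count
        return 1.0 / ((1.0 - p) + p / n)

    p = 0.97  # assumed parallel fraction, not a measured PhotoScan value
    for n in (1, 4, 8, 16, 32):
        s = speedup(p, n)
        print(f"{n:2d} cores: speedup {s:5.2f}x, efficiency {s / n:6.1%}")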

----------------
www.mhb.sk

RalfH

  • Sr. Member
  • Posts: 344
Re: Strange workstations testing results.
« Reply #4 on: June 14, 2013, 10:50:04 AM »
Thanks, Wishgranter. This is something I was wondering about a little while ago. I had a large project that ran for several days (CPU only, 8-core Xeon, 3.2 GHz). During the day I gave PhotoScan only 2 of the 8 processor cores (so I could still do other work on the machine), but overnight PhotoScan was allowed to use 7 of the 8 cores. Out of curiosity I took notes on how many ultra-quality depth maps were created per hour. What I found was that using 7 cores instead of 2 only produced 2 times as many depth maps per hour (instead of 3.5 times as many). Multi-core efficiency appears to be a much bigger issue than I had expected. Is this something that could be improved in the software, or is it a hardware issue?
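
A rough back-of-the-envelope check, assuming Amdahl's law holds and that depth maps per hour scale with speedup (this ignores whatever else the machine was doing during the day):

    def speedup(s, n):
        # Amdahl's law with s = serial fraction, n = cores in use
        return 1.0 / (s + (1.0 - s) / n)

    # Observed: 7 cores gave ~2x the depth maps/hour of 2 cores.
    # Solving speedup(s, 7) / speedup(s, 2) = 2 by hand gives s = 1.5 / 8.5.
    s = 1.5 / 8.5  # roughly 18% of the work behaves as if it were serial
    print(speedup(s, 7) / speedup(s, 2))  # ~2.0, matching the observation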

Wishgranter

  • Hero Member
  • Posts: 1202
Re: Strange workstations testing results.
« Reply #5 on: June 14, 2013, 12:59:53 PM »
It's mostly a HW issue. What OS do you use? Win8 is a little better in thread efficiency (5-15%)...

OK, I will do a few more tests on the dataset that's used for the GPU benchmarks, so we have comparable results and can see how it performs...
----------------
www.mhb.sk

RalfH

  • Sr. Member
  • Posts: 344
Re: Strange workstations testing results.
« Reply #6 on: June 14, 2013, 01:01:20 PM »
I am using Windows Vista 64 bit. What about using Linux?
« Last Edit: June 14, 2013, 01:39:56 PM by RalfH »

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14813
Re: Strange workstations testing results.
« Reply #7 on: June 14, 2013, 01:18:05 PM »
Hello Alexander,

Please note that the real frequency during heavy processing steps is equal to the nominal one (3.20 GHz for the i7 and 2.60 GHz for the Xeons), while Turbo can be applied only for short periods of time when the core is quite cool.


As for the GPUs, they are only utilized during depth maps reconstruction.
Best regards,
Alexey Pasumansky,
Agisoft LLC

Matt

  • Full Member
  • Posts: 104
Re: Strange workstations testing results.
« Reply #8 on: June 14, 2013, 04:58:11 PM »
I find the Xeons great, you have just got to get the data-centre-specific chips: fewer cores with more GHz. Multi-CPU boards are currently the only way to get enough RAM to process really large projects in a single chunk.

Triplegangers

  • Jr. Member
  • Posts: 55
Re: Strange workstations testing results.
« Reply #9 on: June 15, 2013, 04:06:57 PM »
 Thank you all for replying, there were some good ideas!

 So I spent the last two days testing these two systems to understand how they deal with load. It turns out that classical testing, i.e. starting two machines simultaneously on the same task and seeing which one finishes first, doesn't really do them justice these days, especially when you bring dual-CPU systems into the equation.

 Digging for hidden potential in both systems, I decided to run 2 and 3 parallel PhotoScan windows working at the same time on a 6-photo set, which started to show some very interesting results. You can see the time performance of each test in the bottom table.



 This graph shows how each system deals with the load during parallel PhotoScan tasking. Obviously, exponential peaking is a bad thing.



 While working on the same single task during conventional testing, the systems didn't show much difference in speed, only in cost ;D. However, when you multitask, you start to reach the Xeon station's true potential, as it deals with the load more efficiently, judging from that graph.

 Here are some results of the Xeon working on a 90-photo set where each photo is 18 MP!



 Having that data, it's pretty clear that sequential chunk processing is not the most efficient way to work for those of us who need to process multiple sets of photos. Hence I would like to request a parallel chunk processing feature in PhotoScan, which everyone would benefit from.

 Here's an example.

 Sequential processing of 5 sets, 90 photos each:
 107 + 107 + 107 + 107 + 107 = 535 min

 Parallel processing of 5 sets, 90 photos each:
 128 + 128 + 107 = 363 min

And that's 172 minutes of saved time, which we can spend processing two more sets and walking a dog :)
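
The same arithmetic as a tiny sketch (107 and 128 are the measured minutes from the tables above; the helper names are just for illustration):

    def sequential(n_sets, t_solo):
        # one chunk at a time
        return n_sets * t_solo

    def two_wide(n_sets, t_solo, t_pair):
        # two instances in parallel; each chunk slows from t_solo to t_pair
        pairs, leftover = divmod(n_sets, 2)
        return pairs * t_pair + leftover * t_solo

    print(sequential(5, 107))                          # 535 min
    print(two_wide(5, 107, 128))                       # 128 + 128 + 107 = 363 min
    print(sequential(5, 107) - two_wide(5, 107, 128))  # 172 min saved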

Also, depending on system potential, it would be cool to be able to set the number of parallel processes PhotoScan runs. If the sets are not heavy, say 20-35 photos, it could be set to 4 or maybe even 5. If they're heavy, 100 photos or more, it could be set to 2.

I would like to hear what you all think; maybe you have something to add or can see where I'm wrong.

Some screenshots of nice smooth synchronized parallel processing here:


« Last Edit: June 15, 2013, 04:22:22 PM by AlexanderT »

James

  • Hero Member
  • Posts: 748
Re: Strange workstations testing results.
« Reply #10 on: June 15, 2013, 07:09:21 PM »
I don't suppose it makes a massive difference, but did you try parallel processing identical chunks with the same images, or completely different chunks with different images? I just wondered if there might be any caching of anything anywhere in that case that could make the results look better than they really are. I doubt it really, but it might have an effect.

Brilliant work though :)

Triplegangers

  • Jr. Member
  • Posts: 55
Re: Strange workstations testing results.
« Reply #11 on: June 15, 2013, 08:28:25 PM »
Hey James, from what I know, caching is not something that happens for no reason; you have to really sweat a little on the code side to make caching possible.

Also, I ran the test on totally different sets of photos as well as on identical ones, and did not notice anything of that sort.

Infinite

  • Sr. Member
  • Posts: 366
Re: Strange workstations testing results.
« Reply #12 on: June 16, 2013, 01:41:22 AM »
Thanks for sharing these results, Alexander. It's good to see people talking about this topic and sharing stats.
_______________________________________________
I N F I N I T E
www.ir-ltd.net

Wishgranter

  • Hero Member
  • Posts: 1202
Re: Strange workstations testing results.
« Reply #13 on: June 16, 2013, 01:49:20 PM »
Hi all, today I will try to put together a few things so that you understand the problems around multicore systems and how to improve a few things...

Does anybody here have a 4-socket system (AMD or Intel) so we can test how it performs as we add even more cores to the system?
« Last Edit: June 16, 2013, 02:00:19 PM by Wishgranter »
----------------
www.mhb.sk

Wishgranter

  • Hero Member
  • Posts: 1202
Re: Strange workstations testing results.
« Reply #14 on: June 16, 2013, 04:34:03 PM »
OK, this is not easily understandable stuff; it needs a somewhat better understanding of how CPUs work, how the OS works, and how well the software is written... I will try to give as much explanation in text here as possible, some shorter, some longer... If you're interested, I can post many more explanatory links.

In short, programming for multicore is not an easy task, and by adding more cores to the system you get LOWER efficiency, so when you roughly double the number of cores, say going from an i7 to dual Xeons, you do NOT get a linear speedup of the process. Why?

Think of yourself as the director of a small company. You have 4 employees, and dividing work between them is easy: 4 people, the work is clearly split, so everyone gets 25% of, say, baking a cake. One gets the ingredients, another mixes them together, the third does the baking, and the fourth puts it all together. Easy to command; you just tell each one what to do...

When you go to 16 employees, it needs much deeper thinking about who will do what. You need to divide the same work carefully among more people, making sure everyone gets just enough so that they all do the same amount of work. Just imagine giving 16 people their individual instructions...
And as the job grows, this process of proper planning and delegation gets more complicated. You start using some of the people just to synchronize the others, say one commander for every 8 people. That means when 16 people are "working", 2 of them do nothing but organize the rest, so 16 - 2 = 14 people do the actual work... With 32 people (cores), 4 of them just organize; with 64, it's not just 1 per 8 people, you also need 2 extra to command the 8 supervisors, so you end up with 64 people and only 54 working (8 supervisors plus 2 managing them). So overall efficiency starts to decrease... (A toy tally of these headcounts follows below.)
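
Here is that tally (the numbers are the ones from the analogy above, not a general law):

    # people who only coordinate, per team size, taken straight from the analogy
    organizers = {4: 0, 16: 2, 32: 4, 64: 10}
    for n, o in organizers.items():
        print(f"{n:2d} people: {n - o} productive ({(n - o) / n:.0%} efficiency)")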


So, back to our problem, with hard numbers :-)

Alexander and I tested "identical systems" (same CPUs + Windows Server 2012). My system was set with TurboCore DISABLED (2.7 GHz); Alex's was with TurboCore ENABLED (2.7 GHz, up to 3.4 GHz max). So we have a 32-thread system.
We tested on this dataset for easier comparison: http://downloads.agisoft.ru/photoscan/sample01.zip
Settings for the align stage: MEDIUM, no masks, no pairs, 40k points.



As you can see in test 1, having TurboCore enabled on otherwise identical CPUs makes a very small difference. Why? Because TC is meant for single- to quad-thread processes, so TC is only engaged for a few moments...

Result 2:
After going to TASK MANAGER > DETAILS, selecting photoscan.exe, right-clicking on it and setting the PRIORITY to REALTIME (more on this later), we set photoscan.exe to run at a higher priority level, so the OS treats this process as the most important one and other running software gets a smaller share of CPU time. The system then becomes a little unresponsive because of the realtime level... but as you can see we got only an approx. 2% speedup!! The results are in the 2nd line...
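
The same change can be scripted; a hedged sketch with the third-party psutil package (my assumption, the test above only used the Task Manager UI; REALTIME can starve the whole desktop, so HIGH is the safer class to experiment with):

    import psutil  # third-party: pip install psutil

    for proc in psutil.process_iter(['name']):
        if proc.info['name'] == 'photoscan.exe':
            # Windows-only priority-class constant; psutil.REALTIME_PRIORITY_CLASS also exists
            proc.nice(psutil.HIGH_PRIORITY_CLASS)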

Result 3:
For the 3rd test we disabled Hyper-Threading in the BIOS, so we have only real cores (2 x 8 cores = 16, not 32), and ran the test again...
So voila, as you can see in the 3rd result, we got an approx. 68% speedup just by disabling HT!!
So just by DISABLING Hyper-Threading we get a very decent speedup. But WHY do we get BETTER results when we lower the number of threads???


The short explanation is that the VIRTUAL (hyper-threaded) cores are not real cores, so a virtual core MUST share the L1 cache and a few other resources... Even Intel says you can expect only a 10-15% speedup from HT. So why do we see such a big difference in the PhotoScan results? From my perspective, as I see it with my knowledge: the Agisoft team has written VERY efficient code (you can see this when the REALTIME setting is ON: even when all CPU resources are handed to PhotoScan, it gains just 2%). But its subroutines treat every core as a real core, without accounting for hyper-threading (an HT core gets resources only while the real core is waiting on data). Here is a better explanation:

All threads are not created equal. Two hardware threads might be on separate chips, on the same chip, or even on the same core. The most important configuration for game programmers to be aware of is two hardware threads on one core—Simultaneous Multi-Threading (SMT) or Hyper-Threading Technology (HT Technology).
SMT or HT Technology threads share the resources of the CPU core. Because they share the execution units, the maximum speedup from running two threads instead of one is typically 10 to 20 percent, instead of the 100 percent that is possible from two independent hardware threads.
More significantly, SMT or HT Technology threads share the L1 instruction and data caches. If their memory access patterns are incompatible, they can end up fighting over the cache and causing many cache misses. In the worst case, the total performance for the CPU core can actually decrease when a second thread is run.


so Agisoft

For those interested in deeper knowledge, read these links:
 
1. http://scalibq.wordpress.com/2012/06/01/multi-core-and-multi-threading/
2. http://en.wikipedia.org/wiki/Simultaneous_multithreading
3. http://en.wikipedia.org/wiki/Amdahl%27s_law !!!

So for now, try disabling Hyper-Threading on your i5, i7, or any CPU with HT, and post your results...
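
Before rebooting into the BIOS you can at least check from software whether HT is active; a small sketch, again assuming the psutil package is available:

    import psutil

    physical = psutil.cpu_count(logical=False)  # real cores (may be None on exotic setups)
    logical = psutil.cpu_count(logical=True)    # hardware threads
    print(f"{physical} physical cores, {logical} hardware threads")
    if physical and logical > physical:
        print("Hyper-Threading / SMT appears to be enabled")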

----------------
www.mhb.sk