OK, this is not easy stuff to understand. It needs a bit of background on how CPUs work, how the OS works, and how well-written software behaves. I will try to give as much plain-text explanation here as I can, some of it shorter, some longer. If anyone is interested, I can post many more links with explanations.
In short, programming for multicore is not an easy task, and as you add more cores to a system, the efficiency per core gets LOWER. So when you, say, double the number of cores, for example going from an i7 to a dual Xeon, the speedup you get is NOT linear. Why?
Think of yourself as the director of a small company with 4 employees. Dividing the work among them is easy: 4 people, the work is clearly split, so everyone gets 25% of, say, baking a cake. One gets the ingredients, another mixes them, the third does the baking, and the 4th puts it all together. Easy to manage: you just tell them one by one what to do.
When you go to 16 employees, it takes much more thought to decide who does what. You have to split the same work carefully among more people and make sure everyone gets about the same amount to do. Just imagine giving 16 people their instructions...
And when the job gets even bigger, planning and handing out the work gets more complicated still. You start using some of the people just to synchronize the others. Say there is one person in charge of every 8 people: then with 16 people "working", 2 of them do no actual work, they just organize the others, so 16 - 2 = 14 people do the real work. With 32 people (cores), 4 of them just organize. With 64, the 1-in-8 rule is no longer enough: you need 2 extra people to manage the 8 supervisors, so you end up with 64 people and only 54 working (64 - 8 supervisors - 2 managers). Overall efficiency starts to drop.
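To put the headcount above into numbers, here is a small Python sketch. It assumes exactly the rule from the analogy (one supervisor per 8 people, plus 2 managers once there are 8 supervisors). The function name is made up for this post, and this is only the cake-company analogy in code, not a model of how a real CPU or scheduler works.

```python
def working_people(total):
    """Return how many people do real work under the analogy's rule.

    Assumed rule (from the analogy above, not from any real scheduler):
    one supervisor per 8 people, plus 2 managers once there are 8 supervisors.
    """
    supervisors = total // 8
    managers = 2 if supervisors >= 8 else 0
    return total - supervisors - managers


for total in (4, 16, 32, 64):
    workers = working_people(total)
    print(f"{total:3d} people: {workers:3d} working, "
          f"efficiency {workers / total:.1%}")
```

It gives 14 workers out of 16, 28 out of 32 and 54 out of 64, so the share of people doing real work goes from 100% at 4 people down to 87.5% and then to about 84%.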
So, back to our problem, with hard numbers :-)
Alexander and I tested "identical systems" (same CPUs + Windows Server 2012). My system had TurboCore DISABLED (2.7 GHz); Alex's had TurboCore ENABLED (2.7 GHz, up to 3.4 GHz max). TurboCore here means what Intel calls Turbo Boost. Both are 32-thread systems.
We tested on this dataset for easier comparison:
http://downloads.agisoft.ru/photoscan/sample01.zip
Settings: align stage MEDIUM, no mask, no pairs, 40k points
As you can see in the 1st test, enabling TurboCore on otherwise identical CPUs makes very little difference. Why? Because TC helps with single- to quad-threaded workloads, so here it kicks in only for a few moments.
result 2nd
Here we went to TASK MANAGER > DETAILS, selected photoscan.exe, right-clicked it and set the PRIORITY to REALTIME (more on this later). This makes photoscan.exe run at a higher priority level: the OS treats the process as the most important one, so other running software gets a smaller share of CPU time. The system becomes a little unresponsive because of the realtime level... but as you can see, we get only approx. 2% speedup!! The results are in the 2nd line.
result 3rd
For the 3rd test we disabled Hyperthreading in the BIOS, so we have only real cores (2x8 cores = 16 threads, not 32), and ran the test again.
So voila, as you can see in the 3rd result, we get approx. 68% speedup just by disabling HT!!
So just by DISABLING Hyperthreading we get a very decent speedup. But WHY do we get BETTER results when we lower the number of threads?
The short explanation is that the VIRTUAL (HYPERTHREAD) cores are not real cores, so a virtual core MUST share the L1 cache and a few other resources with its real core. Even Intel says we can expect only 10-15% speedup from HT. But why do we see such a big difference in the PhotoScan results? From my perspective, as far as my knowledge goes: the Agisoft team has written VERY efficient code (you can see this from the REALTIME test: even when all CPU resources are given to PhotoScan, it gains only 2%). But it looks like they wrote the PhotoScan routines to treat every core as a real core, without taking Hyperthreading into account (an HT core gets resources only while the real core is waiting for data). Here is a better explanation:
All threads are not created equal. Two hardware threads might be on separate chips, on the same chip, or even on the same core. The most important configuration for game programmers to be aware of is two hardware threads on one core—Simultaneous Multi-Threading (SMT) or Hyper-Threading Technology (HT Technology).
SMT or HT Technology threads share the resources of the CPU core. Because they share the execution units, the maximum speedup from running two threads instead of one is typically 10 to 20 percent, instead of the 100 percent that is possible from two independent hardware threads.
More significantly, SMT or HT Technology threads share the L1 instruction and data caches. If their memory access patterns are incompatible, they can end up fighting over the cache and causing many cache misses. In the worst case, the total performance for the CPU core can actually decrease when a second thread is run.
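To make the quote a bit more concrete, here is a rough Python sketch of what 32 threads are "worth" on 16 real cores. The 10% and 20% gains come from the quote; the negative value is a number I made up to stand for the worst case where the two threads fight over the cache. The function name is also just for illustration, and this toy model is not meant to explain the exact result we measured.

```python
def core_equivalents(physical_cores, smt_gain):
    """Throughput in 'real core' units when every core runs two SMT threads.

    smt_gain is the per-core gain from the second thread: 0.10 to 0.20
    according to the quote above; a negative value means the two threads
    fight over the shared cache and the core gets slower overall.
    """
    return physical_cores * (1.0 + smt_gain)


CORES = 16  # 2 x 8 real cores, as in the test machines

# -0.10 is a made-up worst-case value, not a measurement.
for gain in (0.20, 0.10, -0.10):
    print(f"SMT gain {gain:+.0%}: 32 threads ~ "
          f"{core_equivalents(CORES, gain):.1f} real cores")
```

So even in the good case, 32 HT threads behave like roughly 17.6 to 19.2 real cores, nowhere near 32, and in the bad case like fewer than the 16 you really have.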
For anyone interested in digging deeper, read these links:
1. http://scalibq.wordpress.com/2012/06/01/multi-core-and-multi-threading/
2. http://en.wikipedia.org/wiki/Simultaneous_multithreading
3. http://en.wikipedia.org/wiki/Amdahl%27s_law !!!
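For those who do not want to click through: Amdahl's law boils down to one formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. Here is a minimal Python sketch. The value p = 0.95 is only an assumption for illustration, NOT a measured number for PhotoScan.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup on `cores` cores according to Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical value: 95% of the work can run in parallel.
# This is an assumption for the example, not a PhotoScan measurement.
P = 0.95

for cores in (4, 8, 16, 32):
    s = amdahl_speedup(P, cores)
    print(f"{cores:2d} cores: speedup {s:5.2f}x, "
          f"efficiency {s / cores:.0%}")
```

With this assumed 95%, going from 16 to 32 cores takes you from about 9.1x to about 12.5x, so doubling the cores is far from doubling the speed.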
So for now, try disabling Hyperthreading on your i5, i7, or any other CPU with HT, and post your results.