I agree with Wishgranter, it's not a PCI bandwidth issue.
If you look at the GPU load graph during processing, you can see that the work is done in batches. A GPU gets a batch of work, runs at high load for a minute or two, and then goes idle for a short while until it gets a new batch. At the end of the Dense Cloud reconstruction, one GPU is often idle while the other finishes the last batch.
So with a single GPU you often have 100% utilization, while with two GPUs part of the time one card sits waiting, so it's just less efficient and you don't get double the speed.
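To put rough numbers on that, here is a toy simulation of the batch effect. It is only a sketch under my own assumptions, not how PhotoScan actually schedules work: the `simulate` function, the batch durations, and the "next batch goes to whichever GPU is free first" rule are all made up for illustration, and it ignores the short idle gap between batches.

```python
import heapq

def simulate(batch_minutes, n_gpus):
    """Toy model: hand each batch to whichever GPU becomes free first.

    Returns (total wall time, average GPU utilization).
    The scheduling rule is an assumption, not PhotoScan's real behaviour.
    """
    # free_at holds the time at which each GPU finishes its current batch
    free_at = [0.0] * n_gpus
    heapq.heapify(free_at)
    for duration in batch_minutes:
        start = heapq.heappop(free_at)           # earliest available GPU
        heapq.heappush(free_at, start + duration)
    wall_time = max(free_at)                     # the last GPU to finish sets the total
    utilization = sum(batch_minutes) / (n_gpus * wall_time)
    return wall_time, utilization

# Hypothetical workload: five batches of two minutes each
batches = [2.0] * 5
for n in (1, 2):
    wall_time, utilization = simulate(batches, n)
    print(f"{n} GPU(s): {wall_time:.1f} min, utilization {utilization:.0%}")
```

With these made-up numbers the single GPU is busy the whole 10 minutes, while two GPUs need 6 minutes instead of the ideal 5, because one card is idle during the last batch. That works out to about 83% utilization and a speedup of roughly 1.67x rather than 2x. The fewer and longer the batches, the bigger this tail effect gets.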