Hi, Martin!
1.1) Depth Pro takes 0.3 seconds on a Tesla V100 with 32 GB of VRAM, and that is for an image of only 2.25 megapixels.
1.2) Metashape supports many video cards from different vendors (including cards without CUDA support and cards with quite little VRAM) and is designed to process even large 100+ megapixel images as quickly as possible, etc.
2.1) Such methods are well suited for VFX (bokeh effects, etc.) but poorly suited for photogrammetric tasks; that is why the article contains no accuracy measurements in real units (centimeters/millimeters). Besides, such neural networks cannot solve the problem in unfamiliar conditions (for example, when shooting inside a well, if no such frames were present in the training dataset).
2.2) Metashape relies on the laws of optics and geometry, which are universal and reliable in any environment (even underwater) and have predictable accuracy, which is what allows photogrammetry to be used for construction/measurement tasks (the small sketch below illustrates this predictability).
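To show what "predictable accuracy" means in practice, here is a minimal sketch (not Metashape code) of the standard stereo depth-precision estimate. The function name and the example numbers (focal length, baseline, matching accuracy) are illustrative assumptions, not values from the discussion above.

```python
import numpy as np  # only for potential array inputs; plain floats work too

# Depth from parallax: Z = f * B / d, so the expected depth error is roughly
# sigma_Z ~= Z**2 * sigma_d / (f * B)  -- it grows with distance squared and
# shrinks with a longer baseline or higher resolution (larger focal in pixels).

def expected_depth_error(depth_m, focal_px, baseline_m, matching_error_px=0.5):
    """Rough depth uncertainty for a stereo pair, in meters.

    depth_m           -- distance to the object, meters
    focal_px          -- focal length expressed in pixels
    baseline_m        -- distance between the two camera positions, meters
    matching_error_px -- assumed disparity matching accuracy, pixels
    """
    return depth_m ** 2 * matching_error_px / (focal_px * baseline_m)

# Hypothetical example: object 10 m away, 8000 px focal length, 1 m baseline,
# half-pixel matching -> about 6 mm of expected depth error.
print(expected_depth_error(10.0, 8000.0, 1.0, 0.5))
```

This kind of closed-form estimate is exactly what single-image neural depth cannot provide, because its error does not follow from camera geometry.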
Indeed, ideas from such methods can be used, for example, to try to improve a depth map built by photogrammetric methods by patching holes in it. But it takes a lot of work to bring an academic idea to a reliable industrial application; it would also slow down processing considerably and cause other problems. In particular, there is a risk that such a method would often spoil the depth map by patching holes that are actually holes. As a mental experiment, imagine a lattice in a window with a perfectly clear sky visible through one of its openings: is that an opening in the lattice, or a white sheet of paper stuck in it?
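For illustration only, a minimal sketch of the hole-patching idea described above (this is a hypothetical example, not Metashape's implementation; the function and variable names are mine). It fills invalid pixels of a photogrammetric depth map with values from a monocular prediction after a simple scale/shift alignment:

```python
import numpy as np

def patch_depth_holes(photogrammetric_depth, monocular_depth):
    """Fill zero/NaN pixels of `photogrammetric_depth` with aligned values
    from `monocular_depth`. Both are 2D float arrays of the same shape."""
    valid = np.isfinite(photogrammetric_depth) & (photogrammetric_depth > 0)

    # Monocular depth is in general only relative, so align it to the valid
    # photogrammetric depths first: photogrammetric ~= a * monocular + b
    A = np.stack([monocular_depth[valid], np.ones(valid.sum())], axis=1)
    a, b = np.linalg.lstsq(A, photogrammetric_depth[valid], rcond=None)[0]

    patched = photogrammetric_depth.copy()
    patched[~valid] = a * monocular_depth[~valid] + b
    return patched

# Caveat from the text above: the network cannot tell a genuine hole (sky seen
# through a window lattice) from a missing measurement, so the "patched" pixels
# may introduce surfaces that do not actually exist.
```

Even this naive version shows where the engineering effort goes: deciding which pixels are really missing data, and which are empty space that must stay empty.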
If you are talking about building depth maps from a single frame (i.e. without taking parallax into account), then that is a task better solved in VFX programs for creating video effects, while Metashape is focused on photogrammetric tasks.