Airmap3d, that still happens in my case, but mostly only in areas where the number of projections is low (fewer than 3 photos). Things improved a lot for geometrically defined structures, such as homes.
From now on and forever, in computer graphics terms, I will always correct researchers and graphics artists when they use crazy terms like z-depth. "NO! NO! NO! I shall have you know it is called Y-depth! No, X-depth!" Oh, I give up.
From a GIS point of view, z is depth, so z-depth is correct.
To describe the surface of the earth, Y is used for latitude, X for longitude (the geoidal surface of the earth after unfolding or projection) and Z for altitude (or depth, hehehehe), precisely as PS defines the axis orientation when real-world coordinates are used. In most 3D applications, such as Maya or Cinema 4D, you get the same axis orientation if you use the Front View. So, indeed, if you have a human figure standing up in front view, X runs from hand to hand, Y from feet to head, and Z is the depth. The confusion is that, in mapping and GIS applications, this is a top view and not a front view. It's indeed confusing, and that's why it could be useful to allow axis shifting according to convenience.
This problem goes far beyond this, because it is one of the main issues when you try to import terrain models generated from real-world data into most 3D applications. When we import a DEM, for instance, it's a mess. The terrain does not align with the default planar grid unless you apply an X -90 degree rotation. Real-world coordinates are all messed up. This matters if you need to use most of the dynamics and atmosphere effects in 3D apps. GIS and mapping applications do not mix well with 3D modeling and animation applications unless you tweak your data, as I always do.
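For anyone who wants to do that tweak in a script rather than in the viewport, here is a minimal sketch of the X -90 degree rotation applied to individual points. It assumes the source data is Z-up (X = east, Y = north, Z = altitude) and the target is a right-handed, Y-up scene, which is Maya's default. A left-handed package would also need one axis negated. The function names are made up for illustration and are not part of any GIS or 3D application's API.

```python
import math

def rotate_about_x(point, degrees):
    """Rotate an (x, y, z) point about the X axis by the given angle."""
    x, y, z = point
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return (x, y * c - z * s, y * s + z * c)

def gis_to_y_up(point):
    """Map a Z-up point (east, north, altitude) to a right-handed Y-up frame.

    This is the exact form of a -90 degree rotation about X:
    altitude becomes Y (up) and north becomes -Z (away from the viewer).
    """
    x, y, z = point
    return (x, z, -y)

# Hypothetical sample: 10 m east, 20 m north, 5 m altitude.
sample = (10.0, 20.0, 5.0)
converted = gis_to_y_up(sample)
```

With this mapping the terrain lies on the default ground plane and altitude points up. Large real-world coordinates (UTM eastings and northings, for example) usually still need to be offset toward the origin before import, which this sketch does not do.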