I know your work, and I sincerely admire it, but I still think that your case is not that common. I might have misunderstood something, but these are its specifics as I understand them:
1. You have more than one camera (how could the more common single-camera user pull off the trick of taking every photo both with and without the subject?).
2. Your subject is not only separated from the background, but also removable.
3. You would probably like to mask thousands of photos that share the same camera attitude and the same background to subtract. It is indeed a reasonable desire, because doing it by hand sounds boring... but still rather specific imho.
I wish I could say I have the same needs as you, because that would mean I have your flock of cameras and, hopefully, your magic.
Let me point out that your setup has interesting invariants that could be exploited if PhotoScan allowed it, and here (again imho) it would make a little more sense to ask for such a feature. As long as your setup is fixed in every aspect but the subject, why calculate the external parameters for each set of simultaneous photos? And why not fix the bounding box too? In that case you should be allowed to do it just once and go directly to the dsm phase.
For automating masking, let me think aloud about Tezen's approach.
Suppose you have the sparse cloud. Then you can resize and orient the bounding box to fit your subject, back-project the 3D points inside the box onto all the images, and consider the resulting planar (2D) cloud on each image. How could we convert that cloud into a mask?
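To make the back-projection step concrete, here is a minimal sketch in plain Python. It assumes an ideal pinhole camera (rotation `R`, translation `t`, focal length `f` in pixels, principal point `cx`, `cy`) with no lens distortion, and an axis-aligned box given by `box_min` and `box_max`. All of these names are hypothetical and none of them come from the PhotoScan API.

```python
def inside_box(p, box_min, box_max):
    """True if the 3D point p lies inside the axis-aligned box (assumed box model)."""
    return all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))


def project(p, R, t, f, cx, cy):
    """Project a 3D point with an ideal pinhole camera (assumed, no distortion).

    Camera coordinates are R * p + t; points at or behind the camera give None.
    """
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)


def backproject_box_points(points, box_min, box_max, R, t, f, cx, cy, width, height):
    """Return the pixel positions of the sparse-cloud points that fall inside the box."""
    pixels = []
    for p in points:
        if not inside_box(p, box_min, box_max):
            continue
        uv = project(p, R, t, f, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:  # keep only points that land on the image
            pixels.append((u, v))
    return pixels


# Toy example: identity rotation, camera 5 units in front of the origin.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0)]  # the last point is outside the box
pixels = backproject_box_points(cloud, (-2, -2, -2), (2, 2, 2),
                                R, (0, 0, 5), 100, 50, 50, 100, 100)
```

An oriented box could be handled the same way by first transforming the points into the box's own frame.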
Tezen suggests a trick that works as if those pixels were binarized and expanded until they fill a certain region, but that would probably exceed the subject's boundaries (which is bad) and leave holes where few feature points were found... we need a better hint.
The convex hull is my first bid.
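As a rough sketch of the convex-hull idea, the plain-Python code below takes the 2D positions of the back-projected points, builds their convex hull with Andrew's monotone chain, and marks every pixel inside the hull. The function names and the list-of-lists mask layout are assumptions made for illustration, not anything PhotoScan provides.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive when o, a, b turn counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_mask(points, width, height):
    """Binary mask (rows of 0/1) that is 1 for pixels inside or on the convex hull."""
    hull = convex_hull(points)
    mask = [[0] * width for _ in range(height)]
    n = len(hull)
    if n < 3:  # degenerate cloud: nothing sensible to fill
        return mask
    for y in range(height):
        for x in range(width):
            # A pixel is inside a convex polygon if it is left of (or on) every edge.
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                mask[y][x] = 1
    return mask


# Toy example: four corner points plus one interior point on an 8x8 image.
cloud_2d = [(2, 2), (6, 2), (6, 6), (2, 6), (4, 4)]
hull = convex_hull(cloud_2d)
mask = hull_mask(cloud_2d, 8, 8)
```

The hull has no holes, which addresses the sparse-feature problem, but it will still cover background wherever the subject's silhouette is concave, so it is only a first approximation.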
Here starts the game: share your thoughts.
My kindest regards
1. If a user with only one camera uses a white, black or coloured background, the same thing could be useful: a way to automate the masking procedure. Although this could also be done with a Python script in Agisoft.
All this could be automated, to a point, outside Agisoft, but it is not trivial for a lot of images and it is damn hard work: not just boring but physically and mentally demanding. For example, I have just finished processing 270 expressions of 3 actors for a client. The task is taxing, to say the least.
I've been contacted by 5 companies in the last few weeks who are talking about their own multi-camera systems with Agisoft and how they can improve their workflow. I think you will find multi-camera setups will start to become the norm for most people and companies who are serious about capture. Even if it's just a stereo pair, it's still useful: 1 shot with subject, 1 shot without. Even if they chroma key, this idea would still work in that scenario. Rotate and repeat. Automask.
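The "1 shot with subject, 1 shot without" idea can be sketched as a simple per-pixel difference. This is only an illustration in plain Python: it assumes both shots come from the same fixed camera, that images are nested lists of RGB tuples, and that a fixed `threshold` is good enough. The function name and the threshold value are hypothetical, and this is not how Agisoft implements anything.

```python
def difference_mask(with_subject, background, threshold=30):
    """Return rows of 0/1: 1 where the two shots differ by more than threshold.

    Both images are lists of rows, each row a list of (r, g, b) tuples, taken
    from the same camera position (assumed). The largest per-channel
    difference decides whether a pixel belongs to the subject.
    """
    mask = []
    for row_s, row_b in zip(with_subject, background):
        mask_row = []
        for pix_s, pix_b in zip(row_s, row_b):
            diff = max(abs(a - b) for a, b in zip(pix_s, pix_b))
            mask_row.append(1 if diff > threshold else 0)
        mask.append(mask_row)
    return mask


# Toy example: a 2x3 grey background, one pixel replaced by the subject
# and one pixel changed only slightly (noise).
empty_shot = [[(10, 10, 10)] * 3 for _ in range(2)]
subject_shot = [[(10, 10, 10), (200, 10, 10), (10, 10, 10)],
                [(15, 12, 10), (10, 10, 10), (10, 10, 10)]]
auto_mask = difference_mask(subject_shot, empty_shot)
```

Real photos would need some tolerance for noise, shadows and lighting changes between the two shots, so the threshold would have to be tuned per setup.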
I can understand that users with one camera may not need it, but for licence payers who have shelled out for the PRO version, like myself, it would be ideal. Thanks to Alexey's super hard work and innovation, Agisoft has already improved greatly over the last few upgrades. This feature, along with 4D processing, would place Agisoft in another class of production-proven software.
The system here is not always fixed. Automasking would shave off hours, possibly days, of manual labour. Per pose!