Just a few suggestions:
See the attached image below identifying a sample of three of your images, which I assume were all taken standing in the same spot.
In this case, IMG_7151.jpg is almost completely redundant as it only contains information already in IMG_7150 and IMG_7152. It may be useful in the image alignment stage, but for depth map/dense cloud and subsequent mesh generation you may as well disable all these 'middle' images and save some processing time and memory. If you consistently shoot 3 or 4 images with the same angles and overlap at each position, then it is very simple to disable every 2nd, 3rd or 4th image in one go by setting the width of the photos pane to get the required number of columns, then selecting a whole column by click/dragging and right click->disable.
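If you would rather work out the list of 'middle' images before touching the photos pane, the selection is just "every Nth file in shooting order". Here is a minimal Python sketch, assuming the file names sort into shooting order and that every position has the same number of shots. The function name `images_to_disable` and the extra file names are made up for illustration, and the code only lists names; it does not talk to PhotoScan.

```python
def images_to_disable(filenames, shots_per_position, keep=(0, -1)):
    """Return the files that are neither first nor last at each position.

    filenames          -- image names, assumed to sort into shooting order
    shots_per_position -- how many images were taken at each standing spot
    keep               -- indices within each group to keep (default: the
                          first and last shot of every group)
    """
    if shots_per_position < 1:
        raise ValueError("shots_per_position must be at least 1")
    ordered = sorted(filenames)
    # Normalise negative indices such as -1 to a position inside the group.
    keep_positions = {k % shots_per_position for k in keep}
    return [
        name
        for i, name in enumerate(ordered)
        if i % shots_per_position not in keep_positions
    ]


if __name__ == "__main__":
    photos = [f"IMG_{n}.jpg" for n in range(7150, 7156)]
    print(images_to_disable(photos, shots_per_position=3))
```

With three shots per position this picks out the second image of every group, which is the same set you would get by dragging down the middle column of a three-column photos pane.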
The other thing about these 3 images is that they are only really good for reconstructing that end wall, as it is the only part of the scene that is close to parallel to the image plane. Unfortunately it's a very long way away! Also, there is so much variation in depth in the image, from the floor, seats and ceiling close to where you are standing all the way down to the end wall, that there is little chance of all of it being within your camera's depth of field, and therefore in sharp focus.
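To put rough numbers on the depth of field point, here is a small Python sketch using the standard thin-lens hyperfocal formulas. The 0.02 mm circle of confusion is a commonly quoted value for APS-C sensors and is my assumption, as are the focal length, aperture and focus distance in the example; none of them come from your shots, so plug in your own settings.

```python
def depth_of_field(focal_mm, f_number, focus_m, coc_mm=0.02):
    """Return (near_m, far_m) limits of acceptable sharpness.

    Uses H = f^2 / (N * c) + f, then
      near = s * (H - f) / (H + s - 2f)
      far  = s * (H - f) / (H - s), or infinity when s >= H.
    coc_mm is an assumed circle of confusion, not a measured value.
    """
    f = float(focal_mm)
    s = focus_m * 1000.0  # work in millimetres
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2.0 * f)
    if s >= hyperfocal:
        far = float("inf")
    else:
        far = s * (hyperfocal - f) / (hyperfocal - s)
    return near / 1000.0, far / 1000.0


if __name__ == "__main__":
    near, far = depth_of_field(focal_mm=35, f_number=5.6, focus_m=3.0)
    print(f"sharp from about {near:.2f} m to {far:.2f} m")
```

Treat the result as optimistic: the usual circle of confusion is based on viewing a print, whereas photogrammetry works at pixel level, so the usable range is probably narrower than the numbers suggest.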
I think it is better to treat a room as 4 individual walls, a floor and a ceiling. Then shoot each of these 6 elements as if you are doing an aerial survey, in as close to a grid configuration as possible. When you are stuck at ground level you can't do that, but at least take all images in a nice row along the surface you are shooting, with the camera pointing perpendicular to it, and at a close enough range to get the detail you need. Shoot up and down to get the top and bottom of walls. Ceilings are great as you can almost always do a nice grid pattern, even if you get a sore neck doing it... Floors are a bit tricky as you always get your feet in the shot, unless you can elevate the camera with a pole of some sort.
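For planning the row or grid, a simple pinhole approximation gives the width of wall covered by one frame and how far to step sideways between shots. This is only a sketch: the 23.2 mm sensor width is what I understand the D3200 sensor to be, the 70% overlap is an assumed target rather than a PhotoScan requirement, and the function name `station_spacing` is made up.

```python
def station_spacing(distance_m, focal_mm, sensor_width_mm=23.2, overlap=0.7):
    """Return (footprint_m, step_m) for shooting square-on to a flat surface.

    footprint_m -- width of surface covered by one frame (pinhole model)
    step_m      -- sideways move between shots to get the requested overlap
    sensor_width_mm and overlap are assumptions; adjust for your camera.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in the range [0, 1)")
    footprint_m = distance_m * sensor_width_mm / focal_mm
    step_m = footprint_m * (1.0 - overlap)
    return footprint_m, step_m


if __name__ == "__main__":
    footprint, step = station_spacing(distance_m=2.0, focal_mm=18)
    print(f"each frame covers about {footprint:.2f} m, step about {step:.2f} m")
```

The same function works for the vertical direction if you pass the sensor height instead of the width.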
Once you have acquired the 6 main elements (walls, floor, ceiling) you need to think about how you join these together, probably with a whole load of photos pointing into the corners at various angles from various positions. Personally I would at least begin by processing walls, ceiling and floor separately in chunks, then the joining 'corners' separately in chunks, and then merge them all when I was happy, although that doesn't always work out... (If you include the 'corner' photos with their 'related' walls, so that the same 'corner' photos appear in multiple chunks, then you can merge chunks based on these common photos. I think it's called merge by cameras. It is computationally very quick and cheap, although it does mean you have more photos in each chunk to process to start with, and then duplicates to remove once you have merged the chunks.)
Not much of that applies if you use a fisheye lens, which will capture everything in every shot and with great depth of field. Unfortunately I don't think PhotoScan Standard supports fisheye lenses, but it may still be worth a go, even just a quick test with 5 images taken at home!
Finally, I haven't tried using my smartphone since I got access to DSLRs, but I am confident you will get way better results with your D3200. Processing in chunks means you can use more images. Also, every time you reduce the 'align images' or 'build dense cloud' quality level by 1, it is equivalent to reducing the megapixels of your input photos by a factor of 4, so 24MP quickly becomes 6MP. Even so, the improved dynamic range, optical quality etc. should outweigh any reduction in the number of pixels.
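The factor-of-4 arithmetic is easy to tabulate. A tiny Python sketch, assuming each quality step halves both image dimensions (which is where the factor of 4 comes from); the function name is just for illustration:

```python
def effective_megapixels(sensor_mp, steps_down):
    """Megapixels effectively used after lowering quality by steps_down levels.

    Assumes each level halves width and height, i.e. divides pixels by 4.
    """
    if steps_down < 0:
        raise ValueError("steps_down must not be negative")
    return sensor_mp / (4 ** steps_down)


if __name__ == "__main__":
    for steps in range(4):
        print(steps, "level(s) down:", effective_megapixels(24, steps), "MP")
```

So a 24MP image is worked on as 6MP one level down and 1.5MP two levels down.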