Hello jkova96,
Judging by the images in the dataset, I would say there are several reasons for the alignment issues:
1. Quality of the images
1.1. Shallow depth of field leads to blurred images. If you can mount the camera on a tripod, you can increase the F-stop number up to 11.
1.2. In addition to the shallow depth of field, in most cases the focus point does not seem to be on the object of interest itself, so some part of the background appears sharp while the object does not.
1.3. A high ISO value introduces additional noise into the images. Using a tripod would allow you to reduce the ISO to a minimal value, for example 100.
2. Image acquisition scenario
2.1. The object is mostly untextured, so to capture as much detail as possible it is better to avoid any reflections. You may therefore need a controlled environment: a white box, diffuse uniform light, a polarizing filter, etc.
2.2. More effective use of the image frame is recommended. Currently the object occupies less than 10% of the image area, i.e. only a couple of MPix. I am not sure whether this is possible with a smartphone given the size of the object, but some kind of "macro" mode could probably be used.
2.3. You can add an intermediate "circle" of images around the object, as the images taken from the top are actually quite similar. Shots taken at 45 degrees may give additional information and would help to align the images taken from the top with those taken from the sides.
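To plan the intermediate circle from 2.3, a small sketch like the one below may help. It only computes evenly spaced camera positions on a ring around the object at a given elevation angle. The function name, the 0.3 m distance and the 24 shots per ring are my own illustrative assumptions, not values taken from your dataset.

```python
import math

def ring_positions(radius, elevation_deg, shots):
    """Return (x, y, z) camera positions evenly spaced on a ring around the origin.

    radius is the camera-to-object distance and elevation_deg is the
    angle of the camera above the horizontal plane through the object.
    """
    elevation = math.radians(elevation_deg)
    height = radius * math.sin(elevation)      # camera height above the object
    horizontal = radius * math.cos(elevation)  # radius of the ring on the ground plane
    positions = []
    for i in range(shots):
        azimuth = 2 * math.pi * i / shots
        positions.append((horizontal * math.cos(azimuth),
                          horizontal * math.sin(azimuth),
                          height))
    return positions

# Hypothetical setup: 0.3 m from the object, 45 degree elevation, 24 shots
for x, y, z in ring_positions(0.3, 45, 24)[:3]:
    print(f"{x:.3f} {y:.3f} {z:.3f}")
```

Calling it with different elevation angles (for example a low ring, the 45 degree ring and a near-top ring) gives a rough shooting plan in which neighbouring rings overlap.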
In general, though, I would recommend working out a suitable approach on a better-textured object of the same size and then developing it further for poorly textured objects. For example, as a test you can add a non-uniform texture pattern to the same figurine (using washable paint or powder) and scan it using a tripod, taking my comments into account. I think you would then be able to get a proper mesh for the model.
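As a rough illustration of why the tripod matters for the F-stop and ISO suggestions in 1.1 and 1.3: image brightness is proportional to shutter time × ISO / (F-number)², so a smaller aperture and a lower ISO have to be compensated by a longer shutter time. The starting values below (1/100 s, F2.8, ISO 800) are assumed hand-held settings for the sake of the example, not values read from your images.

```python
def equivalent_shutter_time(t, f_old, iso_old, f_new, iso_new):
    """Shutter time (seconds) that keeps the same image brightness
    after changing the F-number and the ISO.

    Uses the proportionality: brightness ~ t * ISO / N**2 for F-number N.
    """
    return t * (f_new / f_old) ** 2 * (iso_old / iso_new)

# Assumed hand-held settings: 1/100 s at F2.8, ISO 800
# Target settings from the comments above: F11, ISO 100
t_new = equivalent_shutter_time(1 / 100, 2.8, 800, 11, 100)
print(f"{t_new:.2f} s")
```

With these assumed numbers the required shutter time comes out at more than a second, which is not practical to hold by hand.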