Flexible 3D - Reconstruction techniques with applications in archeology 

by M. Pollefeys, M. Proesmans, R. Koch, M. Vergauwen and L. Van Gool

In this paper two different 3D acquisition techniques for 3D acquisition which we have applied to reconstructions of objects, monuments and buildings of the archeological site of Sagalassos (in Turkey), are discussed.  An overview reconstruction of the complete archeological site was also generated.  Both techniques heavily rely on state-of-the-art computer vision and image processing algorithms.  The many advantages of these techniques make them very useful for archaeology and allow the emergence of a variety of new applications. 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 
 
 
 
 
 
 
 
 


 

Introduction 

On many archaeological sites an enormous amount of time is spend to obtain metric measurements.   Sometimes this is done with simple rule or tape measure; sometimes far more complex techniques are used.  All existing techniques suffer severe disadvantages.  They are too expensive, too complex or require too much effort. 
On the other hand 3D measurements are critical on archaeological sites.  During the diggings much evidence is destroyed.  A correct and precise annotation of all findings, stratigraphic structure, etc is therefore required.  Due to practical limitations of the existing techniques the archeologist often has to consider tradeoffs: only a 2D slice of the stratigraphic structure is recorded, some photos are taken in stead of measuring up the whole scene in 3D, etc. 
The techniques that are presented in this paper form the state-of-the-art of what is available in easy, low cost 3D acquisition.  They do not only allow efficient recording of 3D data, but also generate 3D-models that are perfectly suited for visualization.  Therefore Virtual Reality models are obtained at no extra cost.

 
 

NEXT 
 
 
 
 
 
 
 
 
 
 
 
 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 

High-quality surface reconstruction with single-shot range sensor 

The technique presented in this section is an active technique (Proesmans et al., 1996; Proesmans and Van Gool, 1998).  It is very flexible and only requires a minor calibration step.  A pattern is projected with a standard slide projector onto an object.  A single image of the illuminated object is then sufficient to obtain a very accurate textured 3D reconstruction of that object.  This technique was used to generate models of statues and masks found at Sagalassos. 
The advantage of this technique is the fast delivery of accurate 3D models of objects at a fraction of the price of a laser range scanner.  This is due to the use of off-the-shelf components - slide projector and camera - complemented with software. 

Overview of the method 
Starting from an image that contains the projected grid the first step consists of extracting horizontal and vertical lines.  From this an initial grid is constructed.  In general this grid still contains some inconsistencies.  These are detected and corrected.  Then the grid is refined to obtain subpixel accuracy.  From the deformation of this final grid the shape is computed.  At the same time the lines are filtered away to obtain the texture. In figure 1 an overview fo the different steps of the method is given.

 

 
System Calibration 
Given the restriction of pseudo-orthographic projection, one can verify (Proesmans et al. 1996) that the shape can be recovered up to a scale, if the angle between the directions of projection and viewing is known. In order to find this angle, the system is initialized or 'calibrated'. For calibration it suffices to show two planar patches and to give their subtended angle.  In practice, showing a box or corner of a room and specifying the angle is all it takes to perform this calibration.  The system will extract the grid as explained below, looks for the two largest planar regions within that grid and using the knowledge about their orientation (typically 90 degrees) gives the angle between directions of viewing and projection. 

Extraction of the grid 
Once the system is calibrated, the object to be modeled in 3D can be shown. The system first extracts the grid lines in the image, through a set of horizontally and vertically aligned line detectors. The different edge segments are then linked to two sets of crossing contours. The crossings are used to generate a complete grid representation of the pattern. Each grid point or 'node' has at most 4 neighbors which are labeled N, E, S and W.  Note that nodes that are vertices of quadrangles are of special interest, since they indicate how the square pattern is deformed under projection.  In order to determine the lines with sub-pixel precision, a snake-like process is introduced.  The problem is stated as the minimization of the energy functional that relates both the intensity along the grid connections as well as its smoothness.  This makes sure that the grid will move towards the darker parts of the image and let it more closely follow the pattern lines. 

Correction of the extracted grid 
Ideally, we would now have a complete set of all the line intersections or nodes, together with their connections. Of course, the extracted grid will not be perfect yet.  Nodes or their connections may have gone undetected, and similarly spurious nodes and connections may have been found.  Furthermore, in order to arrive at a 3D reconstruction the connectivity of the lines has to be known.  The important difference with classical techniques lies in the fact that no absolute correspondences between the lines in the grid and the lines in the image are (nor can be) derived. Finding the correct connectivity between lines is the crux of the matter (see Proesmans et al. 1996). To that end, iterative procedures mend the initially extracted grid. Two principles underlie this mending.  First, the grid should look square when viewed from the direction of the projector.  Secondly, the mending should lead to a consistent numbering for the two sets of grid: whatever path is taken between two nodes, the grid based city block distance should be the same. Based on these principles, connections are broken and created.  These procedures allow the system to work its way around local discontinuities in the grid, e.g. due to sudden changes in depth.  A limitation of the system as described is that the grid patches should be connected somewhere in the image to be reconstructed as a single surface. 

Extraction of 3D shape 
As soon as the grid has been extracted and the connectivity has been established, the extraction of the 3D shape is quite straightforward.  The grid in the image plane can be modeled as the cross-section of two sets of planes with the object's surface.  To each node a depth value can be assigned (see Proesmans et al. 1996).  Note that the reconstruction is up to a scale factor.  The knowledge of a single distance between two points would fix the scale. 

As mentioned, the ability to deal with discontinuities is restricted to connected sets of data. If two or more separated objects are present in the scene, they will be reconstructed independently in the sense that their relative positions are not extracted. 
For many applications in visualization, this is not necessarily a problem, because objects are either shown separately or the scene is synthesized starting from individual object models.  Discontinuities by self-occlusion need not be a problem, however.  The system recovers from self-occlusions. Of course, there are gaps in the reconstructions for those parts not visible to the camera. 
It is noteworthy that the system yields a genuine surface description, not a mere cloud of points. The connections found between the nodes are used in the description, that consists of small bilinear patches (the little squares of the pattern). Thus, problems with retrieving the correct topology are largely eliminated. 

Extraction of surface texture 
With the given setup, shape and texture extraction could in principle be very simple by just alternating between images with and without the grid. The former would yield shape and the latter texture. Alignment would be perfect as the same, fixed camera takes both images. However, the need for switching the pattern on and off would complicate system operation. For one thing, it would require additional provisions to block the pattern. Moreover, if the alternations should follow each other quickly and automatically, this would require the synchronization of this blocking with the image acquisition by the camera. All this much to the detriment of the simplicity gained by using nothing but a simple camera and slide projector. 
Therefore, a method was developed (and patented as the rest of the system) that extracts surface texture with the pattern on. It does away with the need to alternate between shape and texture frames altogether. The underlying principle literally is to read between the lines.  This might seem a weird approach at first, but actually the human visual system solves a similar problem. The blood vessels on our retina are positioned in front of the photoreceptors.  As with the blind spot, we are hardly aware of these obstacles because they are filtered out.

 
 

NEXT 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 

3D reconstruction from uncalibrated image sequences 

The second technique (Pollefeys et al. 1998; Koch et al. 1998) starts from an uncalibrated image sequence of a scene.  Using nothing else than the images a textured 3D reconstruction of the recorded scene is obtained in an automated way.  The sequence can be taken with a simple hand-held video- or photo camera.  The camera need not be calibrated and zoom and focus can be used freely.  The motion is unconstrained and we do not make use of any reference points.  In addition the method is just as easy to use for small objects as for complete sites.  Our method therefore offers a lot of flexibility and is easy to use.  We have for example obtained a global reconstruction of the whole site of Sagalassos (Turkey), but also of separate monuments. 
These are shown on this site. 
This technique could therefore be an alternative for a wide variety of measurement techniques used nowadays by archeologists, which range from drawings and measurements done manually, over traditional photogrammetry to the computation of a DTM (digital terrain map) from aerial photographs. 

 Overview of the method 
In this paragraph a brief overview of 3D acquisition from multiple views is given.  Starting from the images the knowledge about the scene and the cameras is gradually increased.  At first a partial camera calibration is obtained together with a sparse reconstruction of the scene.  This is all done using robust algorithms to insure that the algorithms can work on real images.  The next step consists of upgrading the calibration of the cameras.  Imposing some constraints that are in general valid for standard cameras does this.  The next step consists of determining the complete surface (i.e. not only a few feature points).  This is done through dense correspondence algorithms.  Finally a 3D-model is build which contains the geometric (i.e. the shape) and photometric (i.e. the color and the texture) information of the scene.  These models can then be used for visualization or measurement purposes. 
In Figure 2 an overview of the method is
shown

 
 
 
The acquisition of the image sequence 
The first step is to take the images from which the 3D model of the scene will be generated.  It is important to take some restrictions of our method into consideration at this point.  Since the images are to be matched automatically consecutive images of the sequence shouldn't be too different.  Consecutive viewpoints should be targeted at more or less the same point and shouldn't be separated by more than 5 to 10 degrees.  Illumination conditions are also important.  If shadows are present at the time of the image acquisition, these will be found back in the final model.  Additionally the reconstruction of parts that are in the shadow will be more complicate due to the restricted range of the intensity values in these parts of the images.  Therefore the ideal circumstance for image acquisition is an overcast sky.  Quite good results can also be obtained in other circumstances. 
The images can be taken with different types of cameras.  We have already used video cameras, digital still cameras and standard photo cameras.  In this last case the images can either be scanned in or put on photo-CD.  Typically images of 700x500 were used.  When a higher resolution was available it was used to get a better texture map for the final model. 

Relating the images 
A first step in the reconstruction procedure is to relate the different images to each other.  Therefore it is necessary to put points from different images into correspondence (i.e. finding the points which originate from the same 3D point).  When the images are not too different the neighborhood of these points will have the same appearance and they can therefore be matched through normalized intensity correlation. 
To avoid a combinatorial explosion of the problem only a restricted amount of so-called interest points will be considered at this stage.  These are extracted through a special filter which selects corner-like features from the images (Harris, 1988).  In fact these features are often not real corners, but this filter tends to extract the same points in consecutive images (i.e. points corresponding to the same 3D points). 
An important concept to simplify the matching process between two images is the epipolar geometry.  This expresses the fact that a point which is situated on a specific plane through the two camera centers is bound to be projected on the intersection lines between that plane and the respective image planes.  These lines are therefore in so-called epipolar correspondence (i.e. the corresponding point of every point on an epipolar line in one image should be found back on the corresponding epipolar line in the other image).  All these epipolar lines pass through the epipole.  This point is the intersection of the image plane with the line through the two camera positions. 
If this geometry is known the matching complexity is reduced from a 2D (whole image) to a 1D search (epipolar line).  In uncalibrated circumstances the only way to obtain this epipolar geometry is by computing it from a number of matches.  We are therefore confronted with a chicken and egg problem.  The approach that is followed first obtains a set of initial matches, then computes some candidate epipolar geometry from a minimal subset of matches.  If this epipolar geometry receives enough support from the rest of the matches, it is accepted, else another subset is selected and the procedure is repeated.  Once some satisfactory epipolar geometry has been obtained it can be used to generate more matches who in turn allow refining the epipolar geometry (see Torr (1995)). 

A first reconstruction 
Once the epipolar geometry and a set of corresponding points are known it is possible to generate a first reconstruction.  This reconstruction is only determined up to a projective ambiguity.  This means that besides not knowing the absolute position, orientation and scale of the scene some additional parameters are missing.  In fact parallelism and orthogonality can not be verified at this stage and metric properties like relative distances and angles can not yet be measured. 
This initial reconstruction is only based on the two first images of a sequence and is therefore far from complete.  For additional images the epipolar geometry (with the previous image) is determined as described earlier.  Then the relative position and orientation of the camera in the frame determined from the first two views is determined in a similar way through the already reconstructed points.  At this point the reconstruction is refined, extended and corrected.   A technical description of this approach can be found in Beardsley et al. (1996). 

Obtaining metric information 
The projective reconstruction obtained in the previous paragraph is clearly not satisfactory for our purposes since both visualization and measurement applications require at least a metric reconstruction.  Luckily algorithms have been developed which can upgrade projective reconstruction to metric reconstructions.  The approach presented here is based on the work described in Pollefeys et al. (1998). 
It is always possible to factorize the cameras in intrinsic (e.g. focal length) and extrinsic parameters (i.e. position and orientation of the camera).  In the case of a projective reconstruction the obtained parameters do not have a physical interpretation.  In the metric case however they correspond to actual physical entities that are subject to some constraints.   Self-calibration therefore consists of finding the projective transformation that transforms the initial reconstruction into a reconstruction for which all these constraints are satisfied. 
The most successful methods to achieve this are based on some abstract geometric concepts.  When a camera is moved through space, its relative position to the plane at infinity and the absolute conic does not change.  This property allows identifying both entities in projective space.  These entities encode the metric properties of space and are therefore equivalent to a metric calibration. 
Once the metric calibration is retrieved, the initial reconstruction can be transformed to a metric reconstruction.  This reconstruction is now a true scale model of the initial scene that was recorded.  The absolute scale can be determined through a single measurement in the scene. 

Dense correspondences 
At first, only the correspondences for specific feature points were extracted.  In this paragraph it will be discussed how the correspondences can be determined for almost all the points of the images.  In this way a full surface model of the recorded scene can be reconstructed.  This is not only important for visualization, but also for measurements since not all the interesting points are always retrieved as feature points in our sparse reconstruction. 
The epipolar geometry greatly simplifies the matching problem.  As described previously corresponding points have to be found on each other's epipolar lines.  The matching problem can therefore be solved line by line. 
To obtain efficient algorithms the images are first warped so that corresponding image rows are also corresponding epipolar lines.  This procedure is called rectification. 
The matching proceeds row by row over both images.  Not only the correlation score is taken into account, but a cost is also included for discontinuities in the surface.  Some other constraints are also taken into account (e.g. uniqueness, ordering).  An optimal solution can then be obtained through dynamic programming. 
The result is a disparity map.  This is an image were the value associated with each pixel indicates the position of the corresponding pixel in the other.  Since at this point the cameras are calibrated this can immediately be translated to a dept map. This is an image that contains the distance from the camera center to the scene along direction corresponding every pixel.  This is often called a 2.5D representation. 
To obtain a more accurate and more complete depth-information more than two images should be used.  We have implemented an approach, which is described in detail in Koch et al. (1998).  At first the depth maps are computed pair-wise for the whole sequence.  Then, to compute accurate depths for a specific image, the disparities are used to create a correspondence chain that goes over multiple images.  The initial depth is refined by taking into account the other correspondences of the chain using an extended Kalman filter.  The chain is interrupted once the back-projected ray falls outside the 95% probability bounds. 
An important advantage of this approach is that one can go both backward and forward in the sequence.  This allows avoiding most of the occlusion problems one typically encounters with stereo.  In addition the inaccuracy of small baseline and the difficulty of matching for wide baseline stereo are both solved through this technique. 
Additionally this multi-viewpoint stereo also allows improving the quality of the texture.  One 3D point is observed in many images and the texture can therefor robustly be estimated.  Highlights and other artifacts can easily be removed. 

Building the model 
The 3D-model surface is constructed of triangular surface patches with the vertices storing the surface geometry and the faces holding the projected image color in texture maps.  The texture maps add very much to the visual appearance of the models and augment missing surface detail. 
The model building process is at present restricted to partial models computed from single viewpoint and work remains to be done to fuse different viewpoints.  Since all the views are registered into one metric framework it is possible to fuse the depth estimate into one consistent model surface.

 
 
 

NEXT 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 
 
 
 
 
 
 
 


 

Applications 

These techniques will not only greatly facilitate the acquisition of 3D data for existing applications, but will also allow the appearance of new applications.  In this paragraph a few possible applications of the presented techniques are described. 

3D Measurements 
Once a 3D surface model of an object is obtained, the model can be used to obtain 3D measurements.  Our models are up to scale and can therefore immediately be used to obtain relative length measurements.  Carrying out a single length measurement in the real scene allows getting absolute lengths from the model.  Measuring lengths on the 3D model is much simpler than in the real scene.  It is sufficient to click on two points on the model surface and the computer immediately computes the distance between the two points.  When the model coordinates are aligned with the world coordinates a simple click on the model surface results in the absolute coordinates of the indicated point.  It is clear that these procedures result in an important gain of efficiency for generating plans of the archeological sites. 

Virtual Exhibitions 
The models delivered by the different techniques presented in this paper are well suited for virtual exhibitions.  The models have a high visual quality since they are immediately generated from real images.  Complex shapes like statues, masks or pottery can accurately be modeled.  An important advantage of virtual exhibitions is that it allows placing the object back in their natural environment.  In this respect the method based on multiple views offers the advantage that the environment can also be 'virtualized' without too much effort. 

Virtual Reconstruction 
The presented techniques do not only allow acquiring the 3D-model of the remains of a building, but also of the broken down parts which are retrieved on the site.  It becomes therefore possible to carry out a 'virtual reconstruction'.  The different parts of a monument can be reassembled on the screen of a computer. 
This offers a flexible method to test reconstruction hypothesis.  Additional tools could even be developed to simplify reconstruction even more.  Broken parts of pillars could for example automatically be matched using standard registration software (see e.g. Chen and Medioni, 1991). 

Annotations 
Since much evidence is destroyed during diggings it is very important for archaeologists to annotate all the findings.  Traditionally a lot of pictures are taken, many plans and notes are made and the stratigraphic structure of slices of the terrain is drawn. 
An alternative would be to do a 3D recording of a location after the removal of every stratigraphic layer.  In this 3D reconstruction specific models of pieces of interest could be inserted at the exact location were they were discovered. The whole could then be assembled to a complete annotation model were a slider would allow to travel through time by adding or removing stratigraphic layers at will.

 
 
 

NEXT 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 
 
 
 
 
 


 

Conclusion 

In this paper two techniques for 3D acquisition were presented.  They both heavily rely on state-of-the-art computer vision and image processing algorithms.  Compared to traditional acquisition techniques the complexity of the acquisition has been moved from hardware to software.  Additionally, geometric insights have been used to avoid complicate calibration procedures and provide a maximal flexibility in acquisition.  These different factors make the presented techniques especially suited for use in archaeology where often non-technical people have to carry out 3D acquisition in difficult circumstances. 
Additionally the presented techniques offer the possibility to develop new applications or to improve existing approaches.  A few examples of these applications were given in this paper.
 
 
 

NEXT 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


 


 

Acknowledgement + References 

We would like to thank Prof. Dr. Marc Waelkens and his team for making it possible for us to do experiments on the archeological site of Sagalassos (Turkey).  We would also like to thank Dr. Andrew Zisserman and his team in Oxford for providing us with robust projective reconstruction algorithms.  The financial support of the European ACTS project VANGUARD and of the Belgian IUAP project IMechS are gratefully acknowledged.  Marc Pollefeys and Marc Proesmans both thank the IWT for financial support.
 

1. Beardsley, P., Torr., P. and Zisserman, A. 1996, ? from Extended Image Sequences,  Proceedings ECCV'96 (European Conference on Computer Vision). 
2. Chen, Y. and Medioni, G. 1991. Object Modeling by Registration of Multiple Range Images, Proceedings International Conference on Robotics and Automation. 
3. Harris, C. and Stephens, M. 1988.  A combined corner and edge detector, proceedings Fourth Alvey Vision Conference, pp.147-151. 
4. Koch, R., Pollefeys, M. and Van Gool, L. 1998.  Multi Viewpoint Stereo from Uncalibrated Video Sequences.  Proceedings European Conference on computer Vision, Lecture Notes in Computer Science, Springer-Verlag. 
5. Koch, R., Pollefeys, M. and Van Gool, L. 1998.  Automatic 3D Model Acquisition from Uncalibrated Image Sequences, proceedings CGI'98 (Computer Graphics International) 
6. Pollefeys, M., Koch, R., Vergauwen, M. and Van Gool, L. 1998, Metric 3D Surface Reconstruction from Uncalibrated Image Sequences, In proceedings SMILE Workshop, Freiburg, Germany, Lecture Notes in Computer Science, Springer-Verlag. 
7. Pollefeys, M., Koch, R. and Van Gool, L. 1998.  Self-Calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters, Proceedings ICCV'98 (International Conference on Computer Vision), Bombay, 1998. 
8. Proesmans, M., Van Gool, L. and Oosterlinck, A. 1996. One-shot active range acquisition, Proceedings ICPR'96 (International Conference on Pattern Recognition), Vienna, Vol. C, pp.336-340. 
9. Proesmans, M., and Van Gool, L. 1998. Reading between the lines - a method for extracting dynamic 3D with texture, Proceedings ICCV'98 (International Conference on Computer Vision), Bombay, 1998. 
10. Torr, P. 1995. Motion Segmentation and Outlier Detection, Ph.D. Thesis, Dept. of Engineering Science, University of Oxford.
 
 
 

BACK TO THE START OF THE ARTICLE