Image-Based Localization
Thesis posted on 2018-11-27, authored by Mahdi Salarian
The proposed research was motivated by the need to develop assistive technology that enables people who are blind or visually impaired to navigate outdoors, unassisted, as sighted people can. This goal was set by a problem posed to us by members of the National Federation of the Blind. Numerous navigation methods are available to aid the blind, and they are primarily based on GPS technology. However, they are often ineffective in dense urban areas due to GPS signal degradation. The research in this dissertation seeks to devise a key ingredient supporting the objective of a system that eases outdoor navigation, especially crossing streets. While the overall navigation problem involves many issues, such as traffic and pedestrian signal recognition, vehicle recognition, and estimation of vehicle velocity and distance, this dissertation focuses largely on research to enhance accurate global position estimation, which is essential for navigation. While most recently proposed methods for outdoor position estimation rely on GPS, some consider additional sensors such as a compass and camera to achieve better results. One thrust of this dissertation research is to develop robust methods for accurate localization using mobile devices. The main goal of this effort is to develop algorithms and efficient implementations that compensate for GPS ineffectiveness in dense urban areas, especially for accurately localizing a pedestrian walking on a sidewalk, where GPS accuracy suffers due to proximity to walls and buildings. We seek to overcome the shortcomings of current methods by adopting approaches that venture beyond available pure retrieval methods. We augment our knowledge of the approximate position of a query image by incorporating additional sensor information to adaptively select the search area.
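The adaptive search-area idea above can be illustrated with a minimal sketch: instead of matching a query against the whole geo-tagged dataset, keep only images within a radius that grows with the reported GPS error. The function names, the `margin` scaling factor, and the dictionary layout of the dataset are assumptions for illustration, not the thesis implementation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    R = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def select_candidates(query_fix, epe_m, dataset, margin=1.5):
    """Keep only geo-tagged images within a search radius that scales with
    the reported positioning error (EPE), rather than a fixed radius."""
    radius = epe_m * margin  # hypothetical EPE-to-radius scaling
    qlat, qlon = query_fix
    return [img for img in dataset
            if haversine_m(qlat, qlon, img["lat"], img["lon"]) <= radius]
```

A small EPE thus yields a small candidate set (fast, fewer confusing matches), while a large EPE automatically widens the search instead of missing the true location.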
We use the hitherto unutilized Error Positioning Estimation (EPE) information as an aid to address issues of complexity and accuracy in large-scale geo-tagged datasets: it narrows the search space for a query and thereby improves efficiency by reducing the number of images that must be searched. To evaluate our proposed method based on adaptive selection of the search radius, we created queries for the Chicago dataset that include GPS uncertainty, so that each query image carries its associated EPE. To test the system in a real-world situation, a client-server application was developed, in which the client is an Android application responsible for acquiring all necessary information (query image, position data, EPE, pitch, yaw, and roll) and sending it to a server over TCP/IP. In our second contribution, we exploit the availability of multiple images of the scene captured from different perspectives to estimate the location coordinates. A new approach based on Structure From Motion (SFM) is proposed that shows better performance in terms of accuracy. Since even a successful retrieval returns only the position of the vehicle from which the dataset image was captured, not the actual position of the query, we propose a method to refine the query position relative to a set of retrieved images. While the query image in a successful retrieval may be similar to stored dataset images, the positions from which those images were captured are not necessarily close; the error has been reported to be as large as 30 meters, which would not be useful for navigation, especially for the blind. For selecting multiple images with similar content, we examine different strategies and propose a method to optimally select images that achieve a higher convergence rate in the SFM process.
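The image-selection step for SFM can be sketched as a greedy procedure under the criterion stated below: each picked image should be similar both to the query and to everything already picked. The scoring rule, the `k` parameter, and the precomputed pairwise-similarity lookup `sim` (e.g. from matched local features) are assumptions for illustration, not the thesis algorithm.

```python
def select_for_sfm(query_id, sim, candidates, k=3):
    """Greedily pick up to k images whose combined similarity to the query
    and to the already-selected set is highest. `sim` is a symmetric
    pairwise-similarity lookup, assumed precomputed."""
    chosen = []
    pool = list(candidates)
    while pool and len(chosen) < k:
        # score = similarity to query + total similarity to current selection
        best = max(pool, key=lambda c: sim[(query_id, c)] +
                   sum(sim[(c, s)] for s in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Favoring mutual similarity, rather than query similarity alone, tends to yield image sets with large pairwise overlap, which is what helps SFM converge.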
The criterion for selecting images in our proposed method is that each selected image should be similar not only to the query image but also to all other images in the final selected set. To evaluate and compare our proposed algorithm with other results, the San Francisco dataset is used. The final localization error is below five meters in most cases, which is significantly better than other reported results and suitable for navigation. In our third major contribution, to improve the performance of the system, the image retrieval engine is modified to incorporate more spatial information into the vectors representing images. In this context, instead of the frequency of visual words, we consider the scale of feature descriptors in the vector representing an image. The new vector, called Bag Of Scale-Indexed Features (BOSIF), does not impose higher memory usage than that required by Soft Assignment while significantly improving the recall rate. We also propose a hybrid method that combines our approach with the well-known Adaptive Assignment algorithm and show that the hybrid provides better recall than either method alone while maintaining the same level of memory usage.
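The scale-indexed representation can be sketched as follows: where a plain bag-of-words vector counts how often each visual word occurs, a scale-indexed vector gives each word several slots and a feature votes into the slot matching its detection scale. The slot layout, the logarithmic binning, and all names here are illustrative assumptions, not the BOSIF construction from the thesis.

```python
import numpy as np

def scale_indexed_vector(features, n_words, n_scale_bins=4, max_log_scale=5.0):
    """Illustrative scale-indexed bag-of-words: `features` is a list of
    (visual_word_id, detection_scale) pairs. Each word gets n_scale_bins
    slots; a feature votes into the slot for its (log-binned) scale."""
    vec = np.zeros(n_words * n_scale_bins)
    for word_id, scale in features:
        b = min(int(np.log1p(scale) / max_log_scale * n_scale_bins),
                n_scale_bins - 1)
        vec[word_id * n_scale_bins + b] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec
```

Two images that share the same words but at very different feature scales now produce different vectors, which is the extra spatial discrimination a frequency-only histogram discards.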