ISSN : 1975-6291(Print)
ISSN : 2287-3961(Online)
Journal of Korea Robotics Society Vol.13 No.1 pp.26-30
DOI : https://doi.org/10.7746/jkros.2018.13.1.026

Planetary Long-Range Deep 2D Global Localization Using Generative Adversarial Network

Ahmed M. Naguib1, Tuan Anh Nguyen2, Naeem Ul Islam3, Jaewoong Kim4, Sukhan Lee†

※ The original ideas presented in this paper were suggested by Prof. Sukhan Lee.

Corresponding author : Intelligent Systems Research Institute, School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea (lsh1@skku.edu)
Received 2018-01-15; Revised 2018-02-11; Accepted 2018-02-13

Abstract

Planetary global localization is necessary for long-range rover missions in which communication with the command center is throttled by the long distance. A number of studies address this problem by matching the rover's surroundings against global digital elevation maps (DEM). Matching with conventional methods, however, is challenging due to artifacts in the DEM-rendered images and/or the rover's 2D images caused by low DEM resolution, illumination variations in the rover images, and small terrain features. In this work, we train a CNN discriminator to match a rover 2D image with DEM-rendered images using the conditional Generative Adversarial Network (cGAN) architecture. We then use this discriminator to search an uncertainty bound given by the visual odometry (VO) error bound and estimate the rover's optimal location and orientation. Using the Devon Island dataset, we demonstrate the network's ability to learn to translate a rover image into a DEM-simulated image and to match the two. The experimental results show that our proposed approach achieves ~74% mean average precision.


1,2,3,4Intelligent Systems Research Institute, School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea

This work was supported by the Ministry of Science, ICT and Future Planning (2015-10060160) and the Ministry of Education (2017R1A6A3A11036554).

1. Introduction

Autonomous navigation is a necessity for a planetary rover to traverse long-range distances. Although numerous localization techniques have been developed to assist rover autonomous navigation, such as Visual Odometry (VO) and wheel odometry, they often suffer from growing error due to the lack of an absolute reference. Filtering algorithms and bundle adjustment can effectively reduce the growing error, but they cannot totally eliminate it. Thus, a long-range mission needs a global localization algorithm in which the location and orientation of the rover are estimated with respect to a global absolute reference such as the planetary inertial frame, the Universal Transverse Mercator (UTM) frame, or a topocentric frame. Many research approaches address the problem of planetary global localization. Multi-frame Odometry-compensated Global Alignment (MOGA) [1] uses LIDAR data and matches it to a Digital Elevation Map (DEM). LIDAR, however, is often too heavy and power-demanding to deploy on a rover. 3D stereo reconstruction, on the other hand, typically generates a reliable point cloud only up to 40 m from the rover. With a DEM resolution of about 2 m per pixel, the accuracy of global localization by matching 3D features from a single frame is limited. To accurately localize the rover from a single frame, 2D image based matching is needed, since mountains, craters, skylines, and other 2D features are mostly not limited by distance.

Stein et al. [2] match the skyline in the rover's 2D image with one rendered from the DEM to estimate the rover's global pose. Similarly, the Visual Position Estimator for Rovers (VIPER) [3,4] matches the skyline in the rover's 2D image with one rendered from the DEM using local geometrical shape features such as mountain peaks and shapes. Li Wei et al. [5] solve the same problem as VIPER; they use a more robust skyline detection based on the major line detection method [6], and the strong skyline is matched with a more robust Bayesian network. Both approaches, however, suffer from low DEM resolution as well as artifacts in the DEM-rendered images and/or the rover's 2D images due to illumination variations and small terrain features, as shown in [fig. 1]. Yicong Tian et al. [7] show that a deep CNN outperforms traditional approaches in matching local and global images under variations. They use Faster R-CNN to match buildings in street-level images and geo-tagged tilted aerial photos for geolocalization in urban environments. Similarly, Lin et al. [8] use a CNN pretrained on ImageNet [11] and Places [12] to match Google Street View with geo-tagged tilted aerial photos. Workman et al. [9], on the other hand, use a CNN pretrained on Places [12] to match street-level images with ortho-maps for estimating the optimal geo-location.

In this paper, we propose a novel planetary long-range deep 2D global localization method using a generative adversarial network. Our objective is to search a space defined by the VO error bound to find the rover's location. We divide the 6DOF space into 6D grid cells, each corresponding to a specific pose of the rover camera. For each cell, we simulate a virtual camera at the cell's pose and render a virtual 2D image. While the simulated image does not reflect local features that are too small to be captured by the DEM, the general underlying terrain and skyline are mostly preserved. We match the simulated image from each cell with the true image captured by the rover camera, using initial matching by 2D major lines [6] extracted from both images. We obtain a matching score for each cell and choose the optimal cell as the rover's 6D pose.
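The cell discretization described above can be sketched as follows (a minimal illustration with hypothetical bounds and grid sizes, shown for three of the six DOF; the actual grid used in the experiments appears in Section 3):

```python
import itertools

def pose_grid(center, half_range, steps):
    """Enumerate candidate poses in a regular grid around `center`.

    center:     tuple of pose components (e.g. Tx, Ty, Rz)
    half_range: half-width of the VO error bound per component
    steps:      number of grid cells per component
    """
    axes = []
    for c, h, n in zip(center, half_range, steps):
        if n == 1:
            axes.append([c])
        else:
            step = 2.0 * h / (n - 1)
            axes.append([c - h + i * step for i in range(n)])
    # Cartesian product of the per-axis samples gives one pose per cell
    return list(itertools.product(*axes))

# Hypothetical 3-DOF example: +/-50 m in Tx, Ty and +/-90 deg in Rz
cells = pose_grid(center=(0.0, 0.0, 0.0),
                  half_range=(50.0, 50.0, 90.0),
                  steps=(5, 5, 9))
print(len(cells))  # 5 * 5 * 9 = 225 candidate poses
```

Each returned tuple would then drive one virtual-camera rendering and one matching-score evaluation.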

2. Long-Range Deep 2D Global Localization Network

Uncertainties in captured rover images, due to local geometrical shapes that are too small to appear in the DEM and to illumination variations, can make matching images with conventional approaches difficult. To overcome this challenge, we propose Long-Range Deep 2D Global Localization. We use a conditional generative adversarial network (cGAN) to generate a “fake” DEM-simulated image corresponding to a captured rover image. We then use this “fake” pair against the “true” pair of the rover image and the “true” DEM-simulated image to train a CNN discriminator to distinguish between fake and true pairs, effectively matching the true correspondences.

2.1. Generator

Similar to image-to-image translation [10], the generator (G) of the cGAN uses a “U-Net” to generate new images: a 15-layer encoder-decoder with skip connections between mirrored layers in the encoder and decoder stacks. The input image is scaled to 256x256x3 and the output image is 256x256. Each convolutional layer of the encoder uses a stride of 2 to cut the spatial resolution in half. The resolution and number of filters per layer are as follows: {128/64, 64/128, 32/256, 16/512, 8/512, 4/512, 2/512, 1/512}. The decoder is a mirrored structure of the encoder, with the skip connections from each corresponding encoder layer concatenated to its input. We used leaky ReLU activation for all layers and batch normalization for all but the first layer. The structure of the generator is shown in [fig. 2].
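As a quick check of the schedule above, the stride-2 halving reproduces the listed resolution/filter pairs (a sketch of shape bookkeeping only, not of the network itself):

```python
def encoder_shapes(input_res=256, filters=(64, 128, 256, 512, 512, 512, 512, 512)):
    """Resolution/filter pairs for a stride-2 encoder: each layer halves the spatial size."""
    shapes = []
    res = input_res
    for f in filters:
        res //= 2          # stride-2 convolution halves the resolution
        shapes.append((res, f))
    return shapes

print(encoder_shapes())
# [(128, 64), (64, 128), (32, 256), (16, 512), (8, 512), (4, 512), (2, 512), (1, 512)]
```

The decoder mirrors this list in reverse, doubling the resolution at each step back up to 256.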

2.2. Discriminator

The discriminator (D) uses a 7-layer CNN to discriminate real and fake image pairs. The input pair is concatenated, and each layer cuts the resolution in half. The resolution and number of filters per layer are as follows: {128/64, 64/128, 32/256, 16/512, 8/512, 4/512, 1/1}. Finally, a single sigmoid unit estimates the matching probability between the paired input images. As in the generator, we used leaky ReLU activation for all layers and batch normalization for all but the first layer. The structure of the discriminator is shown in [fig. 2].
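A minimal numeric sketch of the discriminator head: the resolution/filter schedule is copied from the text, and the final 1x1 unit is squashed by a sigmoid into a matching probability (the logit values fed in here are hypothetical):

```python
import math

# Resolution/filter schedule from the text; the last layer reduces 4x4 down to a single unit
DISC_SHAPES = [(128, 64), (64, 128), (32, 256), (16, 512), (8, 512), (4, 512), (1, 1)]

def matching_probability(logit):
    """Sigmoid over the final 1x1 unit: probability that the input pair matches."""
    return 1.0 / (1.0 + math.exp(-logit))

# An uninformative logit of 0 gives a 50/50 match score; large positive
# logits approach 1 (confident match), large negative approach 0.
print(matching_probability(0.0))
print(matching_probability(5.0))
```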

2.3. Training Procedure

Assume we have a dataset of paired samples $\{x_i, y_i\}_{i=1}^{N}$, $x_i \in A$, $y_i \in B$, where $A$ and $B$ are the rover and DEM-simulated images respectively. For every mini-batch of paired samples, we use back-propagation to minimize the generator's and maximize the discriminator's log-likelihood functions:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}\left[ \lVert y - G(x, z) \rVert_1 \right]$$

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y \sim p_{data}(x, y)}\left[ \log D(x, y) \right] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}\left[ \log\left( 1 - D(x, G(x, z)) \right) \right]$$

where $\mathcal{L}_{L1}(G)$ and $\mathcal{L}_{cGAN}(G, D)$ are the log-likelihood functions of the generator and discriminator respectively, $G(x, z)$ is the “fake” DEM image produced by the generator, and $D(x, y)$ is the discriminator's output probability of matching $x$ to $y$.
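On a toy mini-batch, both objectives can be evaluated directly (a sketch with made-up pixel values and discriminator outputs; in the paper $x$, $y$, and $G(x, z)$ are full 2D images):

```python
import math

def l1_loss(y, g):
    """L1 term: mean absolute difference between true image y and generated image g."""
    return sum(abs(a - b) for a, b in zip(y, g)) / len(y)

def cgan_loss(d_real, d_fake):
    """Adversarial term: log D(x, y) + log(1 - D(x, G(x, z))), averaged over the batch."""
    return sum(math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake)) / len(d_real)

# Toy values: flattened "images" and discriminator probabilities for two samples
y, g = [0.2, 0.8, 0.5], [0.1, 0.9, 0.5]
print(l1_loss(y, g))                      # mean |y - G(x, z)|
print(cgan_loss([0.9, 0.8], [0.2, 0.1]))  # larger when D separates real from fake well
```

The discriminator ascends `cgan_loss` while the generator descends it (plus the L1 term), which is the adversarial pull that drives training.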

3. Experimental Results

We used bundle 1 of the Devon Island dataset [13], which includes 2056 images with ground-truth locations. For each image, we used an OpenGL-based renderer to generate the DEM-simulated image at the ground-truth location. We divided the set into 80%/20% training and testing samples and trained the cGAN for 80 epochs on the training samples.

The evolution of the log-likelihood functions can be seen in [fig. 3], and examples of the generator output can be seen in [fig. 4].

We then assumed a VO error bound of 100 m around each ground-truth location and divided it into a 100x100x180 grid of (Tx, Ty, Rz) cells. For each cell, we rendered the DEM-simulated image and evaluated it with the discriminator; the best match is selected as the optimal estimated global location of the rover. [Table 1] shows the accuracy of the proposed algorithm as well as the L1 loss between the estimated location and the ground truth.
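The selection step reduces to an argmax over discriminator scores (a sketch in which the `score` function is a hypothetical stand-in for the trained discriminator and cells carry only 2D coordinates):

```python
def best_cell(cells, score):
    """Pick the cell whose rendered image the discriminator scores highest."""
    return max(cells, key=score)

def l1_error(est, truth):
    """L1 distance between estimated and ground-truth location components."""
    return sum(abs(a - b) for a, b in zip(est, truth))

# Hypothetical stand-in: score peaks at the cell nearest an assumed true pose (10, -5)
cells = [(0, 0), (10, -5), (20, 5)]
score = lambda c: -l1_error(c, (10, -5))
est = best_cell(cells, score)
print(est, l1_error(est, (0, 0)))
```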

4. Conclusion

In this paper, we proposed a novel algorithm for global localization of a planetary rover. We used a cGAN to train a CNN discriminator to extract deep representations of the rover's 2D image and DEM-rendered images and use them to estimate a matching probability. To overcome challenging artifacts in the rover image and/or the DEM-rendered images, we used a generator to generate “fake” DEM-simulated images that help train the discriminator further. This approach alleviates the requirement for a massive training dataset, which may not be attainable for interplanetary environments. We divide the pose search space defined by the VO error bound into a grid and query the rover's optimal pose by evaluating the discriminator at each cell using the rover image and the DEM-rendered image from that cell's perspective. Experimental results demonstrate the training performance, generated samples, and accuracy of our proposed approach.

    Figure

    JKROS-13-26_F1.gif

Examples of artifacts that pose a challenge in matching rover images with DEM-rendered images. Top left and bottom right: examples of rocks in the rover image that are too small to appear in the DEM. Bottom left and top right: examples of illumination variations such as lens flare and shallow dynamic range

    JKROS-13-26_F2.gif

    Architecture of Long-Range Deep 2D Global Localization network. Top: Training phase. Bottom: Classification phase

    JKROS-13-26_F3.gif

    Loss function (inverse of log likelihood function) evolution with training iterations. Left: Discriminator loss function. Right: Generator loss function

    JKROS-13-26_F4.gif

    Examples of input rover images (left), generator’s “fake” DEM simulated images (middle), and actual “true” DEM simulated images (right)

    Table

    Performance of proposed algorithm on bundle 1 of Devon Island Dataset

    Reference

1. Carle, P.J.F., Furgale, P.T., Barfoot, T.D. (2010) Long-range rover localization by matching LIDAR scans to orbital elevation maps. J. Field Robot., Vol. 27(3), pp. 344-370
2. Stein, F., Medioni, G. (1995) Map-based localization using the panoramic horizon. IEEE Trans. Robot. Autom., Vol. 11(6), pp. 892-896
3. Cozman, F., Krotkov, E. (1997) Automatic mountain detection and pose estimation for teleoperation of lunar rovers. International Conference on Robotics and Automation, pp. 2452-2457
4. Cozman, F., Krotkov, E., Guestrin, C. (2000) Outdoor visual position estimation for planetary rovers. Auton. Robots, Vol. 9(2), pp. 135-150
5. Wei, L., Lee, S. (2016) 3D peak based long range rover localization. 2016 7th International Conference on Mechanical and Aerospace Engineering
6. Kim, J., Lee, S. (2015) Extracting major lines by recruiting zero-threshold canny edge links along sobel highlights. IEEE Signal Process. Lett., Vol. 22(10), pp. 1689-1692
7. Tian, Y., Chen, C., Shah, M. (2017) Cross-view image matching for geo-localization in urban environments
8. Lin, T-Y., Cui, Y., Belongie, S., Hays, J. (2015) Learning deep representations for ground-to-aerial geolocalization. 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5007-5015
9. Workman, S., Souvenir, R., Jacobs, N. (2015) Wide-area image geolocalization with aerial reference imagery. Proceedings of the IEEE International Conference on Computer Vision
10. Isola, P., Zhu, J-Y., Zhou, T., Efros, A.A. (2017) Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967-5976
11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L. (2015) ImageNet large scale visual recognition challenge. Int. J. Comput. Vis., Vol. 115(3), pp. 211-252
12. Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A. (2014) Learning deep features for scene recognition using places database. Adv. Neural Inf. Process. Syst., Vol. 27, pp. 487-495
13. Furgale, P., Carle, P., Enright, J., Barfoot, T.D. (2012) The Devon Island rover navigation dataset. Int. J. Robot. Res., Vol. 31(6), pp. 707-713

About the Authors

• Ahmed M. Naguib
• 2005 Electronics & Electrical Communication Engineering, Cairo University, Egypt (BSc.)
• 2014 Information and Communication Engineering, Sungkyunkwan University (MSc.)
• Area of Interest: Visual Recognition and Machine Learning

• Tuan Anh Nguyen
• 2016 Information Technology, PTIT, Vietnam (BSc.)
• Area of Interest: Computer Vision, Deep Learning, Artificial Intelligence

• Naeem Ul Islam
• 2008 Electrical and Electronics Engineering, UET, Pakistan (BSc.)
• From 2013 Information and Communication Engineering, Sungkyunkwan University (PhD. Candidate)
• Area of Interest: Machine Learning and Artificial Intelligence

• Jaewoong Kim
• 2002 Information and Communication Engineering, Sungkyunkwan University (BSc.)
• 2007 Information and Communication Engineering, Sungkyunkwan University (MSc.)
• 2015 Information and Communication Engineering, Sungkyunkwan University (PhD. in Computer Science Engineering)
• Area of Interest: Visual Recognition, Pattern Recognition, Image Processing, and Machine Learning

• Sukhan Lee
• 1972 Electrical Engineering, Seoul National University (BSc.)
• 1974 Electrical Engineering, Seoul National University (MSc.)
• 1982 Electrical Engineering, Purdue University (Ph.D.)
• Area of Interest: Robotics, Computer Vision, Pattern Recognition, Machine Learning, Field Robotics