In this post I review the Scan Context paper, which is used for lidar-based loop closure detection, and the Intensity Scan Context paper, which is not strictly a follow-up but borrows its core idea from Scan Context.

Scan Context

paper: Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map

According to the paper, lidar-based place recognition has faced the following issues.

The descriptor must be rotation invariant.

The resolution of a point cloud varies with distance and the normals are noisy, so noise has to be handled.

Existing histogram-based descriptors carry only stochastic information and no detailed structural information, so they easily produce false positives.

In response, the Scan Context paper claims the following contributions.

An efficient bin encoding function makes the descriptor invariant to point density and normals.

The internal structure of the point cloud is preserved: each element of the scan context matrix is determined by the points that fall inside the corresponding bin. The authors report that this also makes loop detection possible on reverse visits.

A two-phase matching algorithm runs an efficient nearest-neighbor (NN) search first and pairwise similarity scoring afterwards.

Method

The overall Scan Context system works as follows. A scan context is built from the point cloud through bin encoding. Its ring key is used for an NN search over the stored scan contexts of earlier frames, and the returned candidates are then matched pairwise against the query to obtain a matching score and a yaw value.

Scan Context
Scan Context takes its idea from Shape Context, a descriptor that encodes geometric shape by binning the neighboring points inside a spherical support around a point. Shape Context uses the number of points as the bin value, whereas Scan Context uses the maximum height of the points in each bin. The authors argue that maximum height represents "egocentric visibility" and therefore captures the context of urban environments well.
A scan context descriptor is built as follows.
First, the 3D scan is divided into equally spaced azimuthal and radial bins. $N_r$ and $N_s$ are the numbers of rings and sectors; the paper uses $N_r=20, N_s=60$. With $L_{max}$ the maximum sensing range, the bin sizes are $radial \ gap=\frac{L_{max}}{N_r}, \ angle \ of \ sector=\frac{2\pi}{N_s}$. This partitions all points: $P=\bigcup_{i\in[N_r],\, j\in[N_s]}P_{ij}$. With this binning, bins near the center are small in area but dense, while distant bins are large but sparse, so the descriptor becomes density invariant wherever a bin lies, and nearby dynamic objects can be treated as sparse noise.
After partitioning, a single real value is assigned to each bin using the maximum height: $\phi:P_{ij}\rightarrow \mathbb{R}, \ \ \phi(P_{ij})=\max_{\mathbf{p}\in P_{ij}}z(\mathbf{p})$. That is, each bin takes the maximum height of the points inside it, and an empty bin is assigned 0.
The bins and their values give the scan context $I$, an $N_r\times N_s$ matrix.
$I=(a_{ij})\in \mathbb{R}^{N_r\times N_s},\ a_{ij}=\phi(P_{ij})$. By construction a scan context depends strongly on the center location, so for recognition that is robust to translation the authors also augment scan contexts by "root shifting". The paper uses $N_{trans}=8$. This makes the descriptor tolerant to small motion perturbations.
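The binning and max-height encoding described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `make_scan_context`, the default `max_range=80.0` (standing in for $L_{max}$), and the assumption that heights are non-negative (so that 0 can stand for an empty bin) are my own choices for illustration.

```python
import math


def make_scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Build an N_r x N_s scan context from an iterable of (x, y, z) points.

    Assumptions (not from the paper text above): max_range is a placeholder
    for L_max, and z is non-negative so that 0.0 can mean "empty bin".
    """
    ring_gap = max_range / num_rings             # radial gap L_max / N_r
    sector_angle = 2.0 * math.pi / num_sectors   # sector angle 2*pi / N_s
    sc = [[0.0] * num_sectors for _ in range(num_rings)]
    for x, y, z in points:
        r = math.hypot(x, y)
        if r >= max_range:
            continue  # point lies outside the range covered by the rings
        theta = math.atan2(y, x) % (2.0 * math.pi)  # azimuth in [0, 2*pi)
        # Clamp indices to guard against floating-point edge cases.
        i = min(int(r / ring_gap), num_rings - 1)
        j = min(int(theta / sector_angle), num_sectors - 1)
        sc[i][j] = max(sc[i][j], z)  # bin value = max height inside the bin
    return sc


points = [(1.0, 0.0, 2.0), (1.0, 0.1, 3.0), (-10.0, -1.0, 1.5), (100.0, 0.0, 9.0)]
sc = make_scan_context(points)
```

The first two points fall into the same bin and only the higher one survives, while the last point is dropped because it is beyond `max_range`.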

Similarity Score between Scan Contexts
Place recognition requires a similarity measure for a pair of scan contexts. Let $I^q$ and $I^c$ be the scan contexts of the query point cloud and of a candidate point cloud. The two are compared column-wise: the cosine distance is computed between each pair of corresponding columns and averaged over all columns, which gives the expression below.
$d(I^q,I^c)=\frac{1}{N_s}\sum_{j=1}^{N_s}\left(1-\frac{c^q_j\cdot c^c_j}{\|c^q_j\|\|c^c_j\|}\right)$
Here $c^q_j$ and $c^c_j$ are the $j$-th columns of $I^q$ and $I^c$. Because this value changes with the viewpoint (especially under rotation), the columns (sectors) of the candidate are shifted one at a time, every shift is evaluated, and the smallest distance and the shift that achieves it are kept.
$D(I^q,I^c)=\min\limits_{n\in[N_s]}d(I^q,I^c_n), \quad n^{*}=\argmin\limits_{n\in[N_s]}d(I^q,I^c_n).$
This shift is used as the initial value for fine registration.
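The column-wise distance and the search over column shifts can be sketched as follows. This is an illustrative sketch under my own assumptions: the function names are hypothetical, and columns that are empty in either descriptor are skipped (the formula is undefined for a zero-norm column, and how to handle that case is not stated above).

```python
import math


def column(sc, j):
    """Return the j-th column (one sector) of a scan context."""
    return [row[j] for row in sc]


def sc_distance(sc_q, sc_c):
    """Mean column-wise cosine distance d(I^q, I^c).

    Assumption: columns with zero norm in either descriptor are skipped.
    """
    total, count = 0.0, 0
    for j in range(len(sc_q[0])):
        a, b = column(sc_q, j), column(sc_c, j)
        na = math.sqrt(sum(v * v for v in a))
        nb = math.sqrt(sum(v * v for v in b))
        if na == 0.0 or nb == 0.0:
            continue
        dot = sum(x * y for x, y in zip(a, b))
        total += 1.0 - dot / (na * nb)
        count += 1
    return total / count if count else 1.0


def shift_columns(sc, n):
    """Circularly shift every row to the right by n columns."""
    return [row[-n:] + row[:-n] for row in sc]


def best_shift_distance(sc_q, sc_c):
    """Return (D, n*) by trying every column shift of the candidate."""
    num_sectors = len(sc_c[0])
    distances = [sc_distance(sc_q, shift_columns(sc_c, n))
                 for n in range(num_sectors)]
    best_shift = min(range(num_sectors), key=distances.__getitem__)
    return distances[best_shift], best_shift


# Toy example: the candidate is the query rotated by one sector.
query = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 1.0, 2.0]]
candidate = [[2.0, 3.0, 4.0, 1.0], [3.0, 1.0, 2.0, 4.0]]
dist, shift = best_shift_distance(query, candidate)
```

In the toy example the distance drops to (numerically) zero at a shift of one sector, which is the yaw estimate in units of the sector angle.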

Two-phase Search Algorithm
Place recognition has three main streams: pairwise similarity scoring, nearest neighbor search, and sparse optimization. This paper combines pairwise similarity scoring and NN search hierarchically to reduce search time.
The distance calculation defined above is fairly expensive, so the authors introduce the ring key and build a two-phase hierarchical search algorithm on it. The ring key $\mathbf{k}$ is a rotation-invariant descriptor that converts each row $r$ of the scan context $I$ into a single real value.
$\mathbf{k}=(\psi(r_1), \cdots,\psi(r_{N_r})), \ \text{where} \ \psi:r_i\rightarrow \mathbb{R}, \quad \psi(r_i)=\frac{\|r_i\|_0}{N_s}$
In other words, each ring is encoded with the $L_0$ norm as its number of non-empty bins (normalized by $N_s$), the values for $r_1,\cdots,r_{N_r}$ are ordered from the center outward to form the vector $\mathbf{k}$, and this vector is used for KD-tree search.
Similarity therefore does not have to be computed against every frame, only against the candidates returned by the KD-tree, which reduces search time.
$c^*=\argmin\limits_{c_k\in\mathcal{C}}D(I^q,I^{c_k}), \ \text{s.t.} \ D<\tau$
Here $\mathcal{C}$ is the set of candidates returned by the KD-tree. Among these candidates, the scene that satisfies the threshold and is most similar, together with its shift, is the place recognition result. The paper uses $\tau=1.5$.
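The two-phase search can be sketched roughly as below. To keep the example dependency-free I swap the KD-tree for a brute-force nearest-neighbor scan over ring keys; that substitution, the function names, and the `distance_fn` callback (standing in for the shift-minimized distance $D$) are all my assumptions rather than anything taken from the paper.

```python
import math


def ring_key(sc):
    """psi(r_i) = ||r_i||_0 / N_s: fraction of non-empty bins in each ring."""
    num_sectors = len(sc[0])
    return [sum(1 for v in row if v != 0) / num_sectors for row in sc]


def nearest_candidates(query_key, database_keys, num_candidates):
    """Indices of the closest ring keys (Euclidean distance).

    Brute-force stand-in for the KD-tree query described in the text.
    """
    order = sorted(range(len(database_keys)),
                   key=lambda k: math.dist(query_key, database_keys[k]))
    return order[:num_candidates]


def two_phase_search(query_sc, database, distance_fn, num_candidates, tau):
    """Phase 1: ring-key NN search. Phase 2: full distance on the candidates.

    distance_fn(query_sc, candidate_sc) is a hypothetical callback that
    returns the pairwise distance D. Returns (index, distance) or None.
    """
    keys = [ring_key(sc) for sc in database]
    best = None
    for k in nearest_candidates(ring_key(query_sc), keys, num_candidates):
        d = distance_fn(query_sc, database[k])
        if d < tau and (best is None or d < best[1]):
            best = (k, d)
    return best


database = [
    [[1, 0, 2, 0], [0, 0, 0, 3]],
    [[1, 1, 1, 1], [1, 1, 1, 1]],
    [[0, 0, 0, 0], [0, 0, 0, 1]],
]
# The query is database[0] rotated by one sector, so its ring key is identical.
query = [[0, 1, 0, 2], [3, 0, 0, 0]]
```

Because the ring key only counts occupied bins per ring, rotating the scan leaves it unchanged, which is what makes it usable as a rotation-invariant search key.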

Intensity Scan Context

paper: Intensity Scan Context: Coding Intensity and Geometry Relations for Loop Closure Detection

Intensity Scan Context borrows the idea of Scan Context and adds intensity information to it. The authors also report speeding up the two-stage hierarchical search of Scan Context with a fast geometry relation retrieval based on binary operations. The claimed main contributions are as follows.

A new global descriptor that integrates geometry and intensity characteristics.

An efficient loop closure detection method that takes only 1.2 ms per query on average.

Method

Intensity Calibration and Pre-processing
Intensity captures the reflectance of the surrounding surfaces. Because intensity values differ by material, they provide information that helps distinguish scenes. However, intensity is also affected by geometry such as distance and varies between devices, so it is noisy and has to be calibrated: $\eta_{cal}=\varphi(\eta_r,d)$. Here $\varphi$ is a mapping function of the input intensity $\eta_r$ and the distance $d$, obtained by measuring the mapping values experimentally one by one and normalizing them.
Because lidar measurements get noisier with distance, points are filtered with a distance threshold $L_{max}$. Ground removal is also performed with a column-wise evaluation method.

Intensity Scan Context
An intensity scan context $\Omega$ is built in much the same way as a scan context. The difference is that the bin value is the maximum intensity instead of the maximum height.
$\eta_{ij}=\kappa(S_{ij})=\max\limits_{p_k\in S_{ij}}\eta_k, \quad \Omega(i,j)=\eta_{ij}$

Place Re-identification
Place recognition has to compare the current data against a database of past data, and that database keeps growing over time, so the computation cost of place recognition grows with it. To reduce this cost, a two-stage hierarchical intensity scan context retrieval strategy is used.
- Fast Geometry Re-identification
  Most histogram-based methods multiply floating-point numbers during matching, which makes the computation slow. The authors report a speedup from using binary (logical) operations instead. Given an intensity scan context $\Omega$, it can be converted into a binary matrix $I$.
  $I(x,y)=\begin{cases} false, & \text{if} \ \Omega(x,y)=0 \cr true, & \text{otherwise}\end{cases}$
  The query and candidate intensity scan contexts $\Omega^q,\Omega^c$ are converted into $I^q,I^c$, and the geometry similarity is computed with the following operation, where $|I^q|$ is the number of elements in $I^q$.
  $\varphi_g(I^q,I^c)=\frac{XOR(I^q,I^c)}{|I^q|}$
  As in Scan Context, the columns are shifted and the largest value is taken.
  $\Phi_g(I^q,I^c)=\max\limits_{i\in[1,N_s]}\varphi_g(I^q_i,I^c)$
  This operation reportedly takes only 0.5 ms.
- Intensity Structure Matching
  The second stage measures the intensity similarity between two intensity scan contexts. Let $v^q_i,v^c_i$ be the $i$-th columns of $\Omega^q,\Omega^c$; the similarity is the column-wise cosine similarity averaged over all columns.
  $\varphi_i(\Omega^q,\Omega^c)=\frac{1}{N_s}\sum_{i=1}^{N_s}\left(\frac{v_i^q\cdot v_i^c}{\|v_i^q\|\cdot\|v_i^c\|}\right)$
  Here too, a column shift is applied to obtain the largest value and the corresponding shift $k$.
  $\Phi_i(\Omega^q,\Omega^c)=\varphi_i(\Omega^q_k,\Omega^c)$
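The binary geometry stage can be sketched as below. This is a hedged illustration, not the paper's code: the function names are hypothetical, and I score two binary matrices by the fraction of cells whose occupancy agrees (an XNOR-style count), an assumption I make so that a larger score means a better match and the maximum over shifts behaves as the text describes.

```python
def to_binary(isc):
    """Binarize an intensity scan context: True where the bin is non-empty."""
    return [[v != 0 for v in row] for row in isc]


def geometry_similarity(bin_q, bin_c):
    """Fraction of cells whose occupancy agrees.

    Assumption: agreement (XNOR-style) is counted so that 1.0 means identical
    occupancy; this is an illustrative choice, not the paper's exact operator.
    """
    total = len(bin_q) * len(bin_q[0])
    agree = sum(a == b
                for row_q, row_c in zip(bin_q, bin_c)
                for a, b in zip(row_q, row_c))
    return agree / total


def best_geometry_match(isc_q, isc_c):
    """Try every column shift of the query; return (best score, best shift)."""
    bin_q, bin_c = to_binary(isc_q), to_binary(isc_c)
    best_score, best_shift = -1.0, 0
    for n in range(len(bin_q[0])):
        shifted = [row[-n:] + row[:-n] for row in bin_q]  # circular right shift
        score = geometry_similarity(shifted, bin_c)
        if score > best_score:
            best_score, best_shift = score, n
    return best_score, best_shift


candidate = [[0.5, 0.0, 0.0, 0.2], [0.0, 0.9, 0.0, 0.0]]
query = [[0.0, 0.0, 0.2, 0.5], [0.9, 0.0, 0.0, 0.0]]  # candidate rotated by one sector
score, shift = best_geometry_match(query, candidate)
```

Since the comparison only touches booleans, a real implementation could pack each row into machine words and use bitwise operations with a population count, which is where the claimed speed would come from.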

Consistency Verification
A global descriptor simplifies the original point cloud heavily, so some features may be lost and this can lead to false positives. The authors therefore state that a consistency check is needed before a loop closure is accepted.
- Temporal Consistency Check
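  When a loop closure is found, the consecutive neighboring scans usually show high similarity as well. A loop closure is therefore judged with the following temporal consistency score, where $N$ is the number of neighboring frames considered.
  $P(P_m,P_n)=\frac{1}{N}\sum_{k=1}^{N}\left(\Phi_g(I_{m-k},I_{n-k})+\Phi_i(\Omega_{m-k},\Omega_{n-k})\right)$
  For a reverse visit, $m+k$ is used instead of $m-k$.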
- Geometrical Consistency
  Geometrical consistency represents raw scan-to-scan similarity. Starting from the initial estimate given by the shift, an algorithm such as ICP is run, and the loop closure is judged by the correspondence distance error.
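The temporal consistency score can be sketched as a short function. This is a minimal sketch under stated assumptions: `geometry_score` and `intensity_score` are hypothetical callbacks that take two frame indices and return $\Phi_g$ and $\Phi_i$ for that pair, and the `reverse` flag implements the $m+k$ rule for reverse visits.

```python
def temporal_consistency(m, n, num_frames, geometry_score, intensity_score,
                         reverse=False):
    """Average Phi_g + Phi_i over num_frames neighboring frame pairs.

    geometry_score(a, b) and intensity_score(a, b) are hypothetical callbacks
    that return the similarity of frames a and b. For a reverse visit the
    query-side index runs forward (m + k) instead of backward (m - k).
    """
    total = 0.0
    for k in range(1, num_frames + 1):
        a = m + k if reverse else m - k
        b = n - k
        total += geometry_score(a, b) + intensity_score(a, b)
    return total / num_frames


# Toy scores: frames match geometrically only when their indices differ by 10.
toy_geometry = lambda a, b: 1.0 if a - b == 10 else 0.0
toy_intensity = lambda a, b: 0.5
forward = temporal_consistency(20, 10, 3, toy_geometry, toy_intensity)
backward = temporal_consistency(20, 10, 3, toy_geometry, toy_intensity, reverse=True)
```

A candidate pair would then be accepted only if this averaged score exceeds some threshold, which is a tuning parameter not specified above.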
