# 智能网络与优化实验室 – Intelligent Network and Optimization Laboratory, Renmin University of China.pdf

50 Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map XUEHAN YE, Renmin University of China YONGCAI WANG† , Renmin University of China YUHE GUO, Renmin University of China WEI HU, Princeton University DEYING LI, Renmin University of China An efficient way to overcome the calibration challenge and RSS dynamics in radio-map-based indoor localization is to collect radio signal strength (RSS) along indoor paths and conduct localization by sequence matching. But such sequence-based indoor localization suffers problems including indoor path combinational explosion, random RSS miss-of-detection during user movement, and user moving speed disparity in online and offline phases. To address these problems, this paper proposes an undirected graph model, called WarpMap to efficiently calibrate and store the sequence-type radio-map. It reduces RSS sequence signature storage complexity from O (2N ) to O (N ) where N is the number of path crosses. An efficient on-line candidate path extraction algorithm is developed in it to find a set of the most possible candidate paths for matching with the on-line collected RSS sequence. Then, to determine the user’s exact location, a sub-sequence dynamic time warping (SDTW) algorithm is proposed, which matches the online collected RSS sequence with the sequential RSS signatures of the candidate paths. We show the SDTW algorithm is highly efficient and adaptive, which localizes user without backtracking of warping path. Extensive experiments in office environments verified the efficiency and accuracy of WarpMap, which can be calibrated within thirty minutes by one person for 1100m 2 area and provides overall nearly 20% accuracy improvements than the state-of-the-art of radio-map method. Additional Key Words and Phrases: Sequence-type radio-map, Dynamic Time Warping, Indoor Location ACM Reference format: Xuehan Ye, Yongcai Wang† , Yuhe Guo, Wei Hu, and Deying Li. 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 1, Article 50 (March 2018), 22 pages. https://doi.org/10.1145/3191782 1 INTRODUCTION Radio-map based indoor locating, which characterizes each location-of-interest by the radio signal strength (RSS) signature at that location, has attracted great attentions [2][5][38][12]. Compared with other locating techniques, it provides key advantages including: 1) purely software-implementable on mobile phones without requiring † Corresponding author: Yongcai Wang, e-mail: ycw@ruc.edu.cn. This work was supported in part by the National Natural Science Foundation of China Grant No. 11671400, 61672524; the Fundamental Research Funds for the Central University, and the Research Funds of Renmin University of China, 2015030273. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2018 Association for Computing Machinery. 2474-9567/2018/3-ART50 $15.00 https://doi.org/10.1145/3191782 Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:2 • X. Ye et al. additional hardware infrastructure; 2) privacy protection for working in navigation mode; and 3) reasonable accuracy with errors around 2 - 3 meters . Despite of the advantages, several practical issues must be considered for practical applications: 1) it is very laborious to calibrate a fine-grained radio-map, especially for large application areas; 2) the point-type matching at discrete locations is sensitive to random online RSS noises and environment dynamics. To overcome these problems, previous works have devoted deliberated efforts to exploit different kinds of information to reduce radio-map calibration cost and to improve the positioning accuracy and robustness. A major approach to reduce calibration cost is to exploit environment signature [31], including magnetic fluctuation, illumination intensity at specific spots as internal landmarks [29], and used dead reckoning by mobile phones [6] to track the inertial landmarks to conduct locating, so as to avoid the manually radio-map calibration. Another major approach exploited automatic labeling [35], which leveraged dead reckoning by motion sensors to construct a radio-map firstly in the radio space, then stretch and associate the radio-space geometry to physical space by geometrical matching using the path information from the floor-plan. Other approaches also exploited Expectation-Maximisation (EM) and Manifold methods to learn radio-map parameters by training parametric radio-map models [22][36]. These methods, however, generally require additional sensors or depend heavily on the accuracy of dead reckoning, which is known inaccurate by using commodity mobile phones [6][21]. The inaccuracy of radio-map model may also degrade the performance of locating. Compared with these approaches, a more straightforward way to tailor RSS noises and to reduce calibration cost is to collect RSS sequence along indoor paths to build sequence-type radio-map. This sequence-matching idea was firstly exploited in [26], which proposed geomagnetic sequence collected on a path as the path’s digital signature. They showed that sequence matching outperformed point matching in both accuracy and reliability. However, as stated in the paper, scalability is a key challenge for sequence-based localization, and the paper considered only a small number of paths. When indoor navigation on all paths is considered, the number of indoor paths grow exponentially with the number of path crosses, i.e., in the order of O (2N ) where N is the number of path crosses (See Section II). This requires efficient model for calibrating and storing the sequential signatures for all paths. Secondly, the moving speeds and moving patterns of users in offline sequence collection phase and online navigation phase are generally different, leading to sequence misalignment during matching. Adaptive and efficient sequence matching algorithm to tolerate the moving pattern differences is required. Thirdly, WiFi RSS is widely available and more descriptive signature for characterizing the indoor paths. How to exploit RSS sequence signature for scalable and accurate indoor localization remains a promising and challenging problem. This paper proposes to model the RSS sequence signatures of indoor path by an undirected trace-graph. Each vertex in the graph models the RSS sequence of a path segment, and the edges model the spatial adjacency of the vertexes. This trace-graph can be trained by a user traversing the indoor paths once and overcomes the path combinatorial explosion problem. The sequential RSS signatures of a map with at most O (2N ) paths can be stored by a graph of O (N ) nodes. Based on the trace-graph, an efficient candidate path extraction algorithm is proposed, which extracts O (1) number of candidate paths based on the on-line collected RSS sequence. Finally, to determine the exact location of the user, a subsequence dynamic time warping (SDTW) algorithm is proposed to conduct on-line localization. The SDTW algorithm can adaptively localize the user to an end point of a path segment in the candidate path set, without the need of traditional DTW’s backtracking. Further, the random RSS missing problem is addressed by collaborative filter (CF), to smooth the measured RSS values, and to fill the missing RSS values.More specifically, our contributions include: (1) A WarpMap model represented by undirected graph G = (V, E), where V represents RSS signatures on path segments and E represents adjacency of path segments. (2) A fast calibration process to build G by the user traversing indoor routes for once, and Collaborative Filter method to smooth the noisy RSS data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map y1,1 y1,2 y2,1 y2,2 X t ,1 X t w1, 2 X t ,2 ... ... ... ... y1,N y2,N ... ... ... ... l1 li* X t w1, N X t ,N lK (b) Point radio-map & vector matching (c) Online measuring RSS matrix X AP1 AP2 Y1,s1 Y2s,1 50:3 Y1,s2 Y2 , 2 s ... ... ... ... ... ... APn xt,N (a) Online measuring RSS vector xt X t w1,1 ... ... xt,2 AP1 AP2 ... ... xt,1 • APN Y1, N Y2 , N s s* s s b* (d) Sequence radio-map & Matrix matching Fig. 1. Comparison between point-type radio-map and sequence-type radio-map. (3) In online phase, a potential path extraction algorithm to extract from G the potential paths that the target maybe undergoing, based on the real-time detected access points (APs), which is called candidate sequence set (CSS). (4) A subsequence dynamic time warping (SDTW) algorithm to find a subsequence in CSS which has the least warping distance to the online measured RSS sequence within a short time window. (5) Investigations of different warping distance functions and different CSS selection methods including Principle Component Analysis (PCA) etc. (6) System implementation and extensive experiments in different environments using different brands of cell phones, which verified the efficiency and accuracy improvement than the state-of-the-art (point-type radio-map + K-nearest neighbor/particle filter) locating method. The rest of this paper is organized as follows. Background and problem model are introduced in Section 3. The trace-graph construction is introduced in Section 4. Online locating by SDTW is introduced in Section 5. Performance evaluation by a prototype system is presented in Section 7. The paper is concluded in Section 8. 2 RELATED WORK Radio-map based locating is essentially a pattern-matching based approach. The seminal work is RADAR [2], proposed in 2001, to use radio frequency identification (RFID) signature for indoor locating. After that, various efforts have been devoted into this area. The major related works fall into three research categories: 1) reducing the radio-map calibration efforts; 2) improving the location accuracy; 3) improve radio-map adaptivity. 2.1 Reduce the radio map calibration efforts One key problem in indoor locating is how to reduce the radio-map calibration cost, because it is very laborious to train the radio-map, especially for the large environment. A major approach to reduce calibration cost is unsupervised indoor locating method [31], which exploited environment signature. Scholl et al. [24] proposed fast indoor radio-map building for RSS based indoor locating by using hand-held laser mapping device for building floor plan and radio-map simultaneously. Geng et al. [4] proposed hybrid radio-map for indoor locating, which reduce the radio-map training efforts by the aid of sparsely deployed ultrasound ranging system. MolinaGarcÃŋa et al. [17] proposed to enhance in-building fingerprint by femtocell networks. Tian et al. [29] exploited illumination intensity at specific spots as internal landmarks. Harle et al. [6] used dead reckoning by mobile phones to track the inertial landmarks to conduct locating, so as to avoid the manually radio-map calibration. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:4 • X. Ye et al. Another major approach exploited automatic labeling Yang et al. [35] leveraged dead reckoning by motion sensors to construct a radio-map firstly in the radio space, then stretch and associate the radio-space geometry to physical space by geometrical matching using the path information from the floor-plan. Other approaches also exploited Expectation-Maximisation (EM) and Manifold methods to learn radio-map parameters by training parametric radio-map models [22][36]. 2.2 Improve the locating accuracy From RADAR [2], a series of works focused on improving the locating accuracy. Some approaches used enhanced online locating algorithms to tolerate random RSS noises in the locating process. Haque et.al. [5] proposed LEMON, which enhances K-nearest neighbor (KNN) approach be mining the oversampled neighborhoods. Wu et.al. [32] exploited Support Vector Machine (SVM) in radio-map. Horus [38] proposed a probabilistic radio-map model, in which the probability density of the RSS signatures are collected and stored as radio map. Compass [12] is also an probabilistic radio-map model, which also leverages the object orientation to improve location accuracy. Other major approaches used information fusion techniques, which exploited the motion continuity information, floor map information to design Bayesian filter [5], particle filter [8], and Markov Random Field models [33] to narrow down the search space to improve the locating accuracy against noises. Zampella et al. [39] proposed the use of a particle filter to fuse foot mounted inertial measurements with radio-map. A recent work by Herrera et al. [1] proposed the fusion of radio-map and IndoorOSM floor plan for accurate indoor locating. Geng et al. [4] proposed hybrid radio-map method to reduce calibration cost and to improve location accuracy. Shu et al. [26] first applied sequence-matching idea in radio-map to improve locating accuracy. But scalability is a key challenge for sequence-based localization. A comparative study of radio-map based location accuracy performance can be referred to [9]. 2.3 Improve the radio map adaptivity Even if the radio-map was offline calibrated, it maybe outdated due to the environment change. How to design adaptive radio-map to tolerate the environment impacts is an important problem. Ji et al. [10] investigated the impact of building environment on the performance of dynamic indoor location. Ni et al. [19] proposed to use landmark RFIDs to measure RSS signatures online to make the radio-map be adaptive to environments. Yin et al. [36, 37] proposed adaptive temporal radio-map model by learning algorithms. Pan et al. [22] proposed adaptive localization in dynamic environment using multi-view learning. Lo et al. [15] proposed adaptive radio maps for pattern-matching based localization via inter-beacon co-calibration. Yang et al. [34] proposed AdaMap, which use linear regression model to represent the radio-map and online adapts the model coefficients by online learning. However, note all existing work mainly use point-type radio-map to represent the RSS signatures of specific locations. The spatial dependency among the RSS signatures are rarely utilized. In this paper, we explore the trace-type fingerprint to further improve the locating accuracy and robustness. 3 BACKGROUND AND PROBLEM MODEL 3.1 Point-type radio-map In traditional point-type radio-map, let L = {l1 , l2 , · · · , lK } denote K points of interest. The RSS signatures of these points are offline trained and stored as Mp = {y1 , y2 , · · · , yK }, where yi is the captured RSS vector at location li ; In online locating, the online collected RSS signature is xt at time t. Point-type matching is to find a location whose RSS signature in Mp matches best with xt . li∗ = arg min Dist (xt , yi ) . (1) (i ):yi ∈Mp Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map • 50:5 where Dist(xt , yi ) measures the similarity between xt and yi . Fig. 1(a)(b) illustrate the locating process of pointmatching, which is indeed carried out by matching of RSS vectors. For clarity, the notations used in this paper are listed in Table 1. 3.2 Sequence-type radio-map In sequence-type radio-map, the RSS signatures along indoor paths are offline calibrated. Let Γ = {Γ 1 , · · · , Γ K } be the set of calibrated paths; Mq = {Y1 , · · · , YK } are the captured RSS sequences for these paths. The row length for Ys (i.e, signatures for path Γs ) is the number of detectable APs on the path, and its column length ms is the number of sample points on path Γs . In online locating, a target measures RSS sequences within a moving time window {t −w +1, · · · , t }. This forms a online measured RSS matrix X = (Xt −w +1 , · · · , Xt ), whose column length ∗ is w, and t is the current time. Sequence matching is to find the best match between X and a subsequence Ys[a ∗,b ∗ ] : (s ∗ , a ∗ , b ∗ ) = arg min Dist X, Ys[a,b] . (2) (s,a,b ):Ys ∈Mq,1≤a ≤b ≤m s where Ys[a,b] is a subsequence of Ys ∈ Mq . a and b are the start point and the end point of the subsequence. X is matched to a subsequence of Ys because the online moving window is generally shorter than the offline ∗ trained sequence. The real-time location of the target is given by Γbs∗ , which is the end point of the best matched sub-path. Fig. 1(c)(d) illustrate the locating principle of localization by sequence radio-map. It can be seen that Table 1. List of notations and explanation Notations Explanation Γs a calibrated path i Γbs the bth sample point of Γs s Y the RSS sequence collected on path Γi s Yi the ith sample point in sequence Ys Ys[a,b] subsequence of Ys from Ysa to Ybs s Yi, j the RSS value of jth AP of ith sample point in Ys X RSS sequence in collected online moving window Xt the tth sample point in X ms length of RSS matrix Ys w length of moving window Bx the efficient AP union set in the moving window Bs the efficient AP union set of Ys G the trace-graph V the vertex set in G E the edge set in G Bi the efficient AP union set of vertex i N the number of path crosses in G NV the size of V NE the size of E M the size of candidate vertex set (CSV) extracted from V P candidate sequence set (CSS) generated based on CVS L the size of CSS p the size of extracted principal components from Bx ∩ Bs Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:6 • X. Ye et al. Offline calibration RSS sequence collection Collaborative filter Trace-graph generation Trace-graph Online locating Online RSS moving window Candidate sequence set generation SDTW Fig. 2. System structure of WarpMap. Locating result (a) A combined path extracted from original map. (b) Corresponding path in sequencetype radio-map. Fig. 3. Combinational explosion of sequence-type radio-map. sequence matching is essentially to find the best match for the RSS matrix X. But note that the detectable APs in different locations maybe different, therefore, online locating needs to firstly search candidates paths by the list of detectable APs, and then to determine the target location by sequence matching. These will be detailed in Section 5. It can be understood intuitively that sequence-type radio-map implicitly embeds the motion continuity and the path information into the radio-map, which is promising to provide better location accuracy and robustness. But key challenges prevent the wide adoption of sequence type radio-map. (1) Combinational explosion of indoor paths causes the sequence-type radio-map hard to enumerate and calibrate, which is illustrated in Fig. 3. The red line shows a combined path extracted from sequence-type radio-map. We prove in Section 4 that the number of combinatorial paths extracted from a indoor map G is in the order of O (2N ), where N is the number of crossing points. (2) RSS miss-of-detection and RSS noises are remarkable during user movement, leading highly “dirty" radio-map by mobile calibration. One example is shown in Fig. 4. Fig. 4(a) shows the point-type radio-map captured for a room. Fig. 4(b) shows the sequence-type radio-map captured by a user walking recursively in the room. Compared with the point-type radio-map, RSS values are randomly miss-of-detection and are highly noisy in the sequence-type radio-map. (3) The users’ moving speeds, moving directions may be different in training phase and online phase, causing RSS sequences hard to be aligned in sequence matching. 3.3 Working Flow of Warp-map We propose WarpMap to address these difficulty, which includes efficient model and algorithms to address these difficulties. The working flow of WarpMap is shown in Fig. 2. It contains an offline phase and an online phase. • In offline phase, the RSS signatures along a set of indoor paths are calibrated to construct a sequence-type radio-map. The RSS sequence is firstly smoothed by an proposed collaborative filter for noise removing and empty item filling. Then a trace-graph model is proposed to store the RSS signatures of the paths. • In online phase, RSS signatures are collected into a moving window when a user is walking along indoor paths. A candidate sequence set (CSS) is extracted from the stored radio-map based on AP-list matching, which reduces the searching space of sequence matching. Finally, the location of the user is calculated by a proposed subsequence dynamic warping algorithm. We assume the indoor map can be obtained in advance, e.g. by Google Map [16] or Baidu Map [3], which is increasingly available for public spots, such as shopping mall. The storage complexity in offline phase and the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map • 50:7 computation complexity in online matching are key problems for scalability, while the location accuracy and robustness are improved by subsequence dynamic time warping. 4 TRAINING: TRACE-GRAPH An undirected graph model, i.e., trace-graph is proposed for calibrating sequence-type radio-map using little human efforts, while avoiding the path combinatorial explosion problem. We firstly introduce the main idea of trace-graph construction. 4.1 Trace-Graph Construction For the innumerable indoor path combinations, the idea of sequence radio-map calibration is firstly to divide the indoor paths into path segments, i.e., segments of path between crossing points. Each path segment will be treated as a vertex in the trace-graph model. The RSS signatures of each path segment are calibrated to form vertex of the trace-graph. The advantages of trace-graph includes: • The RSS sequences of any desired path can be generated from the trace-graph, even if the path is not calibrated. • Trace-graph efficiently reduces the storage space, which does not repeatedly store the RSS signature of the same path segment. More specifically, the trace-graph is constructed by the following three steps. (1) RSS sequence collection is carried out by a user walking along indoor paths while taking a mobile phone running a calibration APP. The user clicks the start point and the end point for each direct path, and the calibration APP records and interprets the RSS signatures of this path. Section 6.1 gives the details of RSS sequence collection. (2) Then the cross points of these paths are found according to the map in the APP. These crossing points divide the collected RSS sequences into a set of path segments. (3) Suppose there are N crossing points. We treat each path segment as a vertex, denoted by Vi , then an undirected trace-graph is generated as G = (V, E) where V = {Vi , i = 1, 2, · · · , NV } is the segment set; E = {Ei, j } represents the connections of these segments. Ei, j = 1 if segment Vi and Vj share a common crossing point. Let NV and N E denote the number of vertexes and edges. The graph is stored as sequence radio-map. An example of trace-graph construction is shown in Fig. 3. Fig. 3(a) shows four direct paths with four cross points. The cross points divide the paths into 12 path segments. The RSS signatures of these path segments can be calibrated easily by a user just walking along the four directed path once. However, these path segments can combinatorially form a large set of possible indoor paths. Fig. 3(b) shows the formed trace-graph, which is an undirected graph. Each vertex is a path segment, and each edge indicates the adjacency of two path segments. The RSS signature of any uncalibrated path can be generated from this trace-graph. For example, the RSS signatures of path Γ 5 , which is an uncalibrated path can be generated as Y5 by connecting four calibrated path segments. Mathematically, the trace-graph in Fig. 3(b) can be modeled by an adjacency matrix A. Then the number of length-n paths between node i and j in the graph can be calculated by the (i, j) entry of the nth power of A, i.e., Ani, j . So the total number of length-n paths in the graph can be calculated by: P (n) = N X N X i=1 j=i+1 Ani, j (3) In this particular example, it can be calculated that the number of paths containing {1, 2, 3, 4, 5} path segments are {12, 24, 84, 413, 1764} respectively. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:8 • X. Ye et al. (a) Point radio-map (b) Sequence radio-map with RSS miss-of-detection and RSS noises (c) Sequence radio-map cleaned by Collaborative Filter Fig. 4. Comparison between point-type radio-map, sequence-type radio-map and sequence-type radio-map filtered by CF. Theorem 4.1 (Storage Scalability). The trace-graph is efficient to store. Suppose there are N cross points in the indoor paths, then NV (the number of vertexes) of the trace-graph is in the order of O (N ); and N E (the number of edges) is also O (N ) in G. Proof. In indoor paths, the cross points have low degrees, which are generally less than 4. Let a constant C be the maximum degree of a cross point, then the number of vertexes is at most NV = CN = O (N ). Each cross point will generate at most C (C − 1)/2 edges, so N E ≤ C (C − 1)/2 · N = O (N ). □ Theorem 4.2 (Number of Combined Paths). The number of combined paths extracted from G is at most O (2N ), where N is the number of crossing points. ! where NV is the number Proof. The number of length-n paths in a complete graph is in the order of O (NNV V−n)! PN NV ! V of vertexes and n is the path length. So the number of combined paths is O n=1 (NV −n)! = O (2NV ) = O (2N ). □ Note that the trace-graph radio-map model is also suitable when there is an open area without clear path. The RSS signatures can be calibrated just by walking in directed paths to calibrate a trace graph like that shown in Fig. 3(b). In online locating, the moving window of online collected RSS sequence is actually short (the RSS sequence collected in the past one minute can be used as the online RSS sequential signature), so only a limited number of short paths need to be generated from the trace-graph for online sequence matching. Using AP list, we can also narrows down the searching space of graph vertexes. These avoid the complexity to generate many long paths in online phase, which is also a key for scalability in sequence-based radio-map. This problem will be detailed in Section IV.B. 4.2 RSS Sequence Cleaning Although RSS sequence collection by walking along indoor paths can collect the RSS sequences quickly, the collected RSS values are highly noisy and are frequently missed during detection. So another key problem in constructing the trace-graph is to clean the collected RSS sequence. As the detectable AP-list at different locations is different due to the limited coverage range of the APs and the random miss-of-detection of RSS values, the row length of RSS sequence Ys should be regularized to be the same. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map • 50:9 At first, we remove the fake APs by filtering their SSID and MAC addresses, e.g. some mobile hotspots are easily identified by their SSIDs. Then let Bs indicate the union set of effective APs detected on Γs , we regularize the row length of Ys to the size of Bs . Algorithm 1 Collaborative filter Require: Ys : raw RSS matrix on path Γs Ensure: Ys : cleaned RSS matrix on path Γs for round = 1: T do Update polynomial model of each row; for each empty value Ysi, j do Ŷsi, j,1 ← poly-fit by row i; Ŷsi, j,2 ← w eighted interpolation by column j; if | Ŷsi, j,1 − Ŷsi, j,2 | < Threshold then Ysi, j = (Ŷsi, j,1 + Ŷsi, j,2 )/2 end if end for end for Algorithm 2 Weighted interpolation based on the APs similarity Require: Ys : raw RSS matrix on path Γs , i: the row index of empty RSS value, j: the column index of empty RSS value, hs : row length of Ys Ensure: Ŷsi, j,2 : one estimated value for Ysi, j for k = 1 : hs and k , i do S (Ysi , Ysk ) = (Ysi · Ysk )/(∥Ysi ∥ ∗ ∥Ysk ∥) end for Select top-K most similar rows of Ysi . Let K denotes the set of their row indexes. P P Ŷsi, j,2 = ( k ∈K S (Ysi , Ysk ) ∗ Ysk, j )/( k ∈K S (Ysi , Ysk )) Regularized Ys then need to be cleaned. There are many traditional data clean methods to fill the missing data in Ys , e.g., polynomial fitting [30], discrete fourier transform [7] and discrete wavelet transform [25]. However, there are practical challenges to apply these methods: 1) In some rows of Ys , only a small number of RSS values can be detected. 2) RSS values in the two ends of some rows are usually missed. So collaborative filter (CF) [28] is exploited which takes time and APs correlation factors into account to filter the RSS sequence data. The routine of CF is given in Algorithm 1. For a missing value, one prediction value is calculated in row by polynomial fitting. Polynomial fitting can filter the noisy RSS data for user’s movement on irregular paths. The path segments in offline training phase are generally direct path segments which can be designed by the trainer. So quadratic function is generally good enough for fitting the RSS signatures of a path segment without using higher degree functions. Usually when RSS sampling rate is higher than 0.5 Hz, CF works well. The second step in RSS data filling is to fill data in column by a weighted interpolation based on the APs’ RSS sequence similarity, which is briefly introduced in Algorithm 2. If the two predicted values in column and row have difference smaller than a threshold, the average value of these two predictions will be filled into the empty value. This process repeats a fixed number of times. One example of CF results is shown in Fig. 4(c). We Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:10 • X. Ye et al. can see CF fill as many empty values as possible reasonably. In Section. VII-C, we investigate the effectiveness of different data clean methods. It can be proved that RSS sequence clearning can improve the locating accuracy and CF provides effective data cleaning. 5 ONLINE LOCATING After trace-graph construction, the online locating is carried out by three steps: 1) online RSS sequence collection in a moving window; 2) candidate RSS sequence extraction from the trace-graph; 3) subsequence dynamic time warping for location determination. 5.1 Online RSS Sequence Collection using a Moving Window As the target is moving, we capture RSS periodically and stores the measured RSS vectors into a moving window. Let X = (Xt −w +1 , · · · , Xt ) denote the RSS sequence collected in the moving window and w be the window size. Xt is the latest collected RSS sample. Let Bx be the set of AP list in the moving window.The row length of X should be regularized to the length of |Bx | and CF is then applied to filter the online RSS matrix. 5.2 Candidate RSS sequence extraction from the trace-graph 5.2.1 Candidate Vertex Set Extraction. The first step is to extract all vertices who has similar AP list with X. The preselection of vertices can greatly reduce the scale of CSS. For every vertex in G, let Bi denote the AP list of the vertex i, Jaccard similarity score [13] is calculated between i B and Bx : |Bi ∩ Bx | J (Bi , Bx ) = (4) |Bx | Vertex i is added to the candidate vertex set (CVS) if J (Bi , Bx ) > Threshold, which means the AP list in Bi cover most of the APs in Bx . In implementation, we set the Threshold to 0.7. 5.2.2 Candidate Sequence Set Generation. The CVS set selects a small subset of vertexes from trace-graph for reducing online matching complexity. Since the vertexes with similar AP list are generally neighboring to each other, the vertexes in CVS set are in a subgraph of the trace-graph. Then, the candidate sequence set (CSS) is generated based on the vertexes in CVS, which online builds the possible paths that the target may be moving on. Since the moving window is short, it is unnecessary to generate long candidate paths. This heuristic can limit the size of CSS set. In implementation, we generate possible sequences containing at most c vertices, where c = 3 or 4. A direct way for CSS generation is to let each vertex in CVS to conduct Breadth-First-Search c steps to find paths originated at that vertex with length at most c, and then merge the repeated paths returned by different CVS. The remained distinct paths will form the CSS for online locating. Fig. 5 illustrates the process of CVS extraction and CSS extraction. We denote the returned CSS by {Ys : s = 1, · · · , L}, where L is the number of candidate sequences. Theorem 5.1 (Computational Complexity). The time complexity of Algorithm 3 to generate candidate sequence set is at most O (M c ), where M is the size of CVS and c is a small user-defined constant. Proof. Let Pk denote all combined paths of length k extracted from CVS. We always have |Pk | ≤ M k throughout the algorithm. Note that it takes O (1) time to add a new path the database since c is a constant. to Pc−1 k M 2 (M c −1 −1) Therefore the time used by Algorithm 3 is at most O ( k =1 M · M ) = O = O (M c ). M −1 □ Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map 1. Online collected RSS sequence 2. Search trace-graph by AP list 4. Generate CSS by CVS 3. Generate CVS 1 • 50:11 Dis=0.6 5. SDTW matching with candidate paths 1 Estimated location Matched subsequence 2 Orange vertexs having AP list similar with that in the moving window are added to CVS Dis=2 2 Candidate Sequence Set Matching result of SDTW Fig. 5. The working flow of online matching. After RSS sequence is online collected into a moving window, the trace-graph is searched by the AP-list of the collected RSS. The trace-graph vertexes with similar AP list will be added into a CVS set. Then the CSS is generated from the CVS set. After that, for each candidate path in CSS, SDTW is applied to find the subsequence matched with the online collected RSS sequence in the moving window. The subsequence with the minimum warping distance will be calculated as the matched path (in the figure, the matched subsequences are in green, with warping distance 0.6 and 2 respectively, so the subsequence in CSS1 is the matched path), and the end point of the subsequence is the current location of the target (the dark point). Algorithm 3 CSS generation Require: G = (V, E), c, CVS set Ensure: CSS containing at most c Vertices for each vertex Vi in CVS do Breadth-First-Search c steps to find paths originated at vertex Vi with length no larger than c. end for Merge the returned paths of different CVS and delete the repeated paths. Return the RSS sequences on the remained distinct paths. Theorem 5.2 (Matching Scalability). The number of generated candidate paths, i.e., |P |, is at most O (M c ), where M is the size of CVS. c −1) P Proof. Since |Pk | ≤ M k , the number of generated candidate path set is at most ck =1 M k = M (M = O (M c ). M −1 □ Theorem 5.2 indicates the scalability of online locating by sequence matching. The online collected RSS sequence needs only to be compared with a limited number of RSS sequences generated in the CSS set. 5.3 Subsequence Dynamic Time Warping Recall X is the online collected RSS sequence in a moving window; {Ys : s = 1, · · · , L} is the candidate RSS ∗ sequence for comparison. Since X is short, the goal of matching is to find a particular sequence Ys in CSS, whose ∗ ∗ subsequence Ys[a ∗,b ∗ ] satisfies the objective equation (2). The end point of the subsequence Γbs∗ is the location of the target. This is carried out by a proposed subsequence dynamic time warping (SDTW) algorithm. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:12 • X. Ye et al. 5.3.1 Dynamic Time Warping (DTW). SDTW is an extension of dynamic time warping (DTW) [18][23], an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. Consider two sequences X = (x 1 , · · · , x n ) and Y = (y1 , · · · , ym ). DTW calculates a warping path P = (p1 , · · · , pl ) where pk = (i k , jk ); i k ∈ [1 : n]; jk ∈ [1 : m]; and k ∈ [1 : l] that satisfies the following alignment conditions: ( p1 = (1, 1), pl = (n, m) (5) pk +1 − pk ∈ {(0, 1), (1, 0), (1, 1)}, ∀k ∈ [1 : l − 1] The first condition enforces the first elements of X and Y as well as the last elements of X and Y are aligned to each other. The second condition reflects no elements in X and Y can be omitted and there are no replications in alignment. An alignment satisfies above conditions is called a warping path. The warping distance between X and Y is defined as the summation of local distances along the warping path. DTW(X , Y ) = l X d (x i k , y jk ). (6) k=1 The optimal warping path, i.e., P ∗ that minimizes DTW(X , Y ) can be calculated by dynamic programming[14]. 5.3.2 Subsequence Dynamic Time Warping (SDTW). SDTW relaxes the boundary condition in (5). It allows X match a subsequence Y[a,b] = (ya , ya+1 , · · · , yb ) (1 ≤ a ≤ b ≤ m) of Y [14], such that (a ∗ , b ∗ ) = arg min (a,b ):1≤a ≤b ≤m DTW(X , Y[a,b] ). (7) But the locating problem has some differences from the traditional SDTW. (1) It needs only to determine the end point of the matched subsequence, i.e., b ∗ without the need to determine the start point a ∗ . This feature is utilized to design an efficient subsequence matching algorithm which doesn’t need the cost of back-trace by dynamic programming, as detailed in Algorithm 4. (2) It needs to search over multiple candidate sequences {Ys : s = 1, · · · , L} to find the subsequence with the overall minimum warping distance. (3) We need to consider the situation that the user may walk in the reversed direction of the training direction of the candidate sequences. So matching is also conducted reversely, as detailed in Algorithm 3. 5.3.3 SDTW for sequence-based locating. The sequence-based locating method based on SDTW is summarized in Algorithm 4. Reducing the height of X and Ys by Principal component analysis (PCA) [11] is the first step to reduce computation complexity of sequence matching. Then the optimal warping distance D (n, bs∗ ) and the ending point are calculated by (10) between X and each Ys . We also need to reverse the direction of X and recalculate the optimal warping distance. Finally, the target location is determined as the end point in the matched subsequence having the least warping distance. Note that since the ending point and the optimal warping distance D(n, bs∗ ) have already been provided by (10), it is not necessary to calculate as∗ by back-tracing. This saves the cost of dynamic programming than the traditional SDTW [14]. Fig. 6 further illustrates the warping distance matrix. The values represented by “+" is given as the boundary values. They produce the other entries of distance matrix by (9). After getting the matrix, the entry with the smallest distance in the nth row can be found, i.e., the star. Its column index is bs∗ , and its value is D(n, bs∗ ). In locating problem, we don’t need to back-trace to find as∗ , and the warping path (represented by the dashed line), although they have already been determined by the distance matrix. This saves computation cost than traditional SDTW [14]. It is easy to verify that the computation cost is O (mn) in this step. The overall computation cost is O (Lmn), which is efficient. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map • 50:13 Algorithm 4 SDTW for sequence-based locating Require: X: online RSS sequence, {Ys ,s = 1, · · ·, L}: CSS Ensure: s ∗ : the location of target STEP1: AP Set Reduction · Extract principal components from AP union set of X and Ys by PCA and replace X and Ys by the extracted represented sequences. STEP2: SDTW between X and {Ys ,s = 1, · · ·, L} · Set the boundary conditions of warping distance matrix: D s (i, 1) = i X d (Xk , Ys1 ), i ∈ [1 : n] (8) k =1 D s (1, j) = d (X1 , Ysj ), j ∈ [1 : ms ] · Then the warping distance matrix is produced as: D s (i, j) = min{D s (i −1, j −1), D s (i, j −1), D s (i −1, j)}+d (x i , y j,s ) · The matched end point between X and Ys is determined by D s : bs∗ = arg min D s (n, bs ), (9) (10) bs ∈[1:m s ] · The optimal warping distance with Ys is given by: D(X, Ys ) = D(n, bs∗ ) STEP3: Reverse sequence X to match with {Ys ,s = 1, · · ·, L} ← ← · Reverse the direction of Xt and recalculate (8) - (11) to obtain bs∗ and D (11) ← n, bs∗ ! for s = 1, · · ·, L. STEP4: Determine the target location and the minimum warping distance · For all the candidate sequences, the overall best matched sequence is the one with the overall least warping distance: ( !) ← ← s ∗ = arg min min D(n, bs∗ ), D n, bs∗ (12) s ∈[1:L] · The location of the target is determined as the end matching point in the matched subsequence that has the overall least warping distance. 5.3.4 Using Different Distance Functions. We also investigate the impacts of using different distance functions, i.e., d (x i k , y jk ) to check their impacts to the locating accuracy. Four different functions were investigated. qP n 1) Euclidean distance: d xy = (x k − yk ) 2 Pn k =1 2) Cityblock distance: d xy = k =1 |x k − yk | 3) Chebyshev distance: d xy = maxk |x k − yk | dot (x,y ) 4) Cosine distance: d xy = ∥x ∥∗ ∥y ∥ In Section. 7.4, we will concretely analyze the effect of basic warping distance functions on locating accuracy for SDTW. 5.3.5 AP Set Reduction in SDTW. As SDTW distance calculation between X or Ys needs high computation complexity when the RSS sequence has high dimension in the number of APs. So reducing the height of X or Ys Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. 50:14 • X. Ye et al. a* ! s n X ! t 1 b* !s + w w w w w w w w w w ★ w w w + w w w w w w w w w w w w w w + w w w w w w w w w w w w w w + w w w w w w w w w w w w w w + + + + + as* ! 1 + + + + + + + + s !Y + + ms + Initialized boundary distances w Produced by equation (11) ★ The entry has the smallest value in the nth row Fig. 6. Illustration of the warping distance matrix in SDTW. is also important for efficiency of sequence matching. We investigate PCA reduce the dimension of RSS sequence for matching efficiency. Principal component analysis (PCA) [11] is an efficient method to extract a set of linearly uncorrelated variables from a set of correlated variables. Such uncorrelated variables represent main properties of original variable set. We use PCA to extract principal components from AP union set of X and Ys and use extracted corresponding RSS sequences for SDTW distance calculation. Let |Bx ∪ Bs | denote the size of AP union of X and Ys and p denote the size of extract principal components. We know 1 ≤ p ≤ |Bx ∪ Bs |. The height of X and Ys should be regularized to p for subsequent SDTW distance calculation. In Section. VII-E, we investigate the effect of principal component size p on locating accuracy. It can be proved that reducing the height of X and Ys can meet the requirements of high computational efficiency and high locating accuracy. 6 IMPLEMENTATION We have developed an APP on Android platform to implement sequence-based indoor localization. The android APP carries out path calibration, RSS path collection, and data transmission to server and real-time position display. The location determination is carried out by a location server. The server constructs the trace-graph generation, CSS extraction based on real-time detected AP list, and SDTW-based online locating. 6.1 RSS sequence collection RSS sequence collection can be carried out in many ways using interactive smart phones or indoor navigation systems by inertial sensors [20]. We use an interactive way by an APP developed on the mobile phone. The APP renders the floor-plan and gets the scaling factor of the floor-plan, so that each pixel on the screen maps to a physical location.In order to calibrate a path, a user clicks on the map to firstly mark the path. Several clicks on the screen will characterize a path. Then the user walks along the specified path to collect RSS sequence. The WiFi-scanner calls the RSS sampling library of Android to take RSS samples in approximately equal intervals, so the positions where RSS signal is measured on the path can be calculated by assuming the user is moving at constant speed. But in practice, we found that RSS sampling rate of different phones differ greatly, which can be referred to Table. 2. The difference is mainly due to the power optimization policy of the phone. The locating performance of Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 2, No. 1, Article 50. Publication date: March 2018. Accurate and Efficient Indoor Location by Dynamic Warping in Sequence-Type Radio-map • 50:15 FigX0 Fig. 7. Snapshots of RSS sequence calibration. Fig. 8. Map of NEC Laboratories China. Before constructing the final training set, collected sequence information has to be normalized to the prescriptive form. Because the number and order of detected APs at different locations are rather different, the preprocess process can be divided into 3 steps. First, we define uniform data form, that every RSSI vectorphones at a certain location the should be projected to the fixed SDTWthemay deteriorate greatly when collect RSS sequences in different frequency during the off-line length(we choose 50 here). In other words, we assume 50 values for each RSSI vector. Second, we training and on-line locating process. For example, the Sumsang phones can collect RSS values in 38.03 Hz, while should sort the signal strengths in every RSSI vector to comply with the predefined MAC-address Meizu order. phones only collect RSS values in 0.52 Hz. There are duplicate values in the highly frequent RSS scans. So Third, some RSSI values that cannot be detected at certain locations should be replaced by in practice, we regularize RSS sampling in order SDTW’s robustness in different one constant. The selectionphones of the constants has a littlerate effecttoon0.5 theHz system, which to willimprove be phonesdiscussed and to remove the duplicated concretely in the experiment section. RSS scans for energy efficiency. In case the RSS values are missed in a Finally, have to construct theand simplest We try to guaranteeon a path will be a series of