Mathematical and Computer Modelling 52 (2010) 1674–1681
Contents lists available at ScienceDirect: Mathematical and Computer Modelling
journal homepage: www.elsevier.com/locate/mcm

Constructive approximate interpolation by neural networks in the metric space

Feilong Cao a,*, Shaobo Lin b, Zongben Xu b

a Institute of Metrology and Computational Science, China Jiliang University, Hangzhou 310018, Zhejiang Province, PR China
b Institute for Information and System Sciences, Xi'an Jiaotong University, Xi'an 710049, Shaanxi Province, PR China

Article history: received 27 March 2010; received in revised form 22 June 2010; accepted 22 June 2010.
Keywords: neural networks; approximate interpolation; exact interpolation; metric space.

Abstract. In this paper, we construct two types of feed-forward neural networks (FNNs) that can approximately interpolate, with arbitrary precision, any set of distinct data in a metric space. First, for an analytic activation function, an approximate interpolation FNN is constructed in the metric space, and its approximation error is deduced by using the Taylor formula. Second, for a bounded sigmoidal activation function, exact and approximate interpolation FNNs are constructed in the metric space, and the error between the exact and approximate interpolation FNNs is given. Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.

The research was supported by the National Natural Science Foundation of China (No. 60873206) and the Natural Science Foundation of Zhejiang Province of China (No. Y7080235).
* Corresponding author. E-mail address: feilongcao@gmail.com (F. Cao). doi:10.1016/j.mcm.2010.06.035

1. Introduction

Let (X, d) be a metric space with distance d, and let the interpolation nodes S = {x_1, x_2, ..., x_n} be n distinct points in X. For {y_i : i = 1, 2, ..., n} ⊂ R, we call the set

\[
\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \tag{1}
\]

a set of interpolation samples. If there exists a feed-forward neural network (FNN) N_e(x) satisfying N_e(x_j) = y_j for j = 1, 2, ..., n, then N_e(x) is called an exact interpolation FNN of the sample set (1). If for any fixed ε > 0 there is an FNN N_a(x) such that |N_a(x_j) − y_j| < ε for j = 1, 2, ..., n, then N_a(x) is called an ε-approximate interpolation FNN of the sample set (1).

In applications, FNNs are usually trained on finitely many input samples. It is known that n arbitrary distinct samples (x_i, f_i) (i = 1, 2, ..., n) can be learned precisely by FNNs with n hidden neurons, and several proofs of the existence of exact interpolation FNNs have been given in [1–5]. However, it is difficult to fix all the parameters of an exact interpolation FNN, so researchers have turned to approximate interpolation FNNs, which were first used in [5] as a tool for studying exact interpolation FNNs. Since then a number of results on approximate interpolation FNNs, and on the relationship between exact and approximate interpolation FNNs, have appeared; we refer the reader to [6–11].

All of the results mentioned above concern the Euclidean space R^d. In many applications, however, the interpolation problem arises in a general metric space. In this paper, we focus on the approximation and construction of exact and approximate interpolation FNNs in the metric space. We will construct two types of approximate interpolation FNNs in the metric space, one with an analytic activation function and the other with a sigmoidal activation function.

The rest of this paper is organized as follows. In the next section, we construct an approximate interpolation FNN with an analytic, non-polynomial activation function.
Our construction is based on the Lagrange interpolant on the metric space. In Section 3, we give a rigorous proof of the existence of an exact interpolation FNN with a sigmoidal activation function in the metric space. In Section 4, we consider the approximate interpolation FNN with a bounded sigmoidal activation function, and give an error estimate between the exact and approximate interpolation FNNs.

2. Approximate interpolation FNN with analytic activation function

Before giving the main result of this section, we first construct the Lagrange interpolant and the Newton interpolant in the metric space. The following Proposition 1 is the main tool of our construction.

Proposition 1. Suppose that (X, d) is a metric space with distance d, and that for the interpolation nodes {x_1, x_2, ..., x_n} ⊂ X there is a ξ ∈ X satisfying d(ξ, x_i) ≠ d(ξ, x_j) for 1 ≤ i < j ≤ n. Then for the interpolation sample set (1) there exists

\[
P(x) := \sum_{k=0}^{n-1} C_k \bigl(d(\xi, x)\bigr)^k
\]

such that

\[
P(x_i) = y_i, \qquad i = 1, \dots, n. \tag{2}
\]

Proof. In order to prove (2), it is sufficient to prove that the system of equations

\[
\sum_{k=0}^{n-1} C_k \bigl(d(\xi, x_i)\bigr)^k = y_i, \qquad 1 \le i \le n, \tag{3}
\]

is solvable, that is, that the linear system

\[
\begin{pmatrix}
1 & d(\xi,x_1) & (d(\xi,x_1))^2 & \cdots & (d(\xi,x_1))^{n-1}\\
1 & d(\xi,x_2) & (d(\xi,x_2))^2 & \cdots & (d(\xi,x_2))^{n-1}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & d(\xi,x_n) & (d(\xi,x_n))^2 & \cdots & (d(\xi,x_n))^{n-1}
\end{pmatrix}
\begin{pmatrix} C_0\\ C_1\\ \vdots\\ C_{n-1} \end{pmatrix}
=
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}
\]

is solvable. Since the coefficient matrix of the system (3) is a Vandermonde matrix and d(ξ, x_i) ≠ d(ξ, x_j) for i ≠ j, the system (3) is solvable. This completes the proof of Proposition 1. □

Based on Proposition 1, we can construct the Lagrange interpolant in the metric space as follows.

Proposition 2. Under the conditions of Proposition 1, there exists

\[
L(x) := \sum_{i=1}^{n} y_i\, l_i(x)
\]

satisfying L(x_i) = y_i, i = 1, 2, ..., n, for the sample set (1), where

\[
l_i(x) := \prod_{j=1,\, j\ne i}^{n} \frac{d(\xi,x) - d(\xi,x_j)}{d(\xi,x_i) - d(\xi,x_j)}. \tag{4}
\]

The proof of Proposition 2 is obvious.
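As a concrete check of Proposition 2, here is a small numerical sketch (not part of the paper): the metric space is taken to be R² with the Euclidean distance, and the reference point ξ and the sample data are illustrative choices satisfying the distinctness condition d(ξ, x_i) ≠ d(ξ, x_j).

```python
import math

def dist(a, b):
    # Euclidean metric on R^2, standing in for a general metric d
    return math.hypot(a[0] - b[0], a[1] - b[1])

def lagrange_interpolant(xi, nodes, values):
    """Build L(x) = sum_i y_i l_i(x) of Proposition 2 from the distances d(xi, x_i)."""
    t = [dist(xi, p) for p in nodes]          # t_i = d(xi, x_i), assumed pairwise distinct
    def L(x):
        s = dist(xi, x)                       # L sees x only through d(xi, x)
        out = 0.0
        for i, y in enumerate(values):
            li = 1.0
            for j in range(len(nodes)):
                if j != i:
                    li *= (s - t[j]) / (t[i] - t[j])
            out += y * li
        return out
    return L

# hypothetical sample set (1): three distinct nodes with distinct distances to xi
xi = (0.0, 0.0)
nodes = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)]   # d(xi, .) = 1, 2, sqrt(10)
values = [1.0, -2.0, 0.5]

L = lagrange_interpolant(xi, nodes, values)
errors = [abs(L(p) - y) for p, y in zip(nodes, values)]  # all ~ 0: exact interpolation
```

Note that L depends on x only through the single scalar d(ξ, x); this one-dimensional reduction is exactly the structure that the network constructions in this paper exploit.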
For brevity, we omit the details of the proof.

In order to construct the Newton interpolant in the metric space, we need to introduce the divided differences with respect to ξ, defined recursively by

\[
y_\xi[x_i] = y_i,\quad 1 \le i \le n,\qquad
y_\xi[x_j,\dots,x_k] = \frac{y_\xi[x_j,\dots,x_{k-1}] - y_\xi[x_{j+1},\dots,x_k]}{d(\xi,x_j) - d(\xi,x_k)}.
\]

Then the Newton interpolant in the metric space can be constructed as follows.

Proposition 3. Under the conditions of Proposition 1, there exists

\[
T(x) := y_\xi[x_1] + \sum_{i=1}^{n-1} y_\xi[x_1,\dots,x_{i+1}]\,\bigl(d(\xi,x)-d(\xi,x_1)\bigr)\bigl(d(\xi,x)-d(\xi,x_2)\bigr)\cdots\bigl(d(\xi,x)-d(\xi,x_i)\bigr) \tag{5}
\]

such that T(x_i) = y_i, 1 ≤ i ≤ n, for the interpolation sample set (1).

Proof. Let t_i = d(ξ, x_i), and define

\[
y[t_i] = y_i,\quad 1 \le i \le n,\qquad
y[t_j,\dots,t_k] = \frac{y[t_j,\dots,t_{k-1}] - y[t_{j+1},\dots,t_k]}{t_j - t_k}.
\]

Then, from the Newton interpolation formula for univariate polynomials, there exists

\[
g(t) := y[t_1] + \sum_{i=1}^{n-1} y[t_1,\dots,t_{i+1}]\,(t-t_1)(t-t_2)\cdots(t-t_i)
\]

satisfying g(t_i) = y_i, 1 ≤ i ≤ n. Moreover, since y_ξ[x_j, ..., x_k] = y[t_j, ..., t_k] for 1 ≤ j ≤ k ≤ n, we obtain (5) immediately. □

Now we are in a position to give the main result of this section. Suppose that

\[
\sup_{x\in X} d(\xi,x) \le b < \infty, \tag{6}
\]

and that σ : [0, b] → R is analytic and not a polynomial. Then there exist [c, d] ⊂ [0, b] and β ∈ [c, d] such that σ^(k)(β) ≠ 0 for every k ≥ 0 (see [12]). Thus, by the Taylor formula, for pairwise distinct θ_0, θ_1, ..., θ_{n−1} ∈ [0, b] and any h > 0,

\[
\sigma(\theta_j h t + \beta) = \sigma(\beta) + \sigma'(\beta)\,\theta_j h t + \cdots + \frac{\sigma^{(n-1)}(\beta)}{(n-1)!}\,(\theta_j h t)^{n-1} + \frac{1}{(n-1)!}\int_{\beta}^{\theta_j h t+\beta} \sigma^{(n)}(s)\,(\theta_j h t + \beta - s)^{n-1}\,ds.
\]

Therefore

\[
t^j = \sum_{k=0}^{n-1} \frac{c_{jk}\, j!}{h^j \sigma^{(j)}(\beta)}\,\sigma(\theta_k h t + \beta) - \sum_{k=0}^{n-1} \frac{c_{jk}\, j!}{h^j (n-1)!\,\sigma^{(j)}(\beta)} \int_{\beta}^{\theta_k h t+\beta} \sigma^{(n)}(s)\,(\theta_k h t + \beta - s)^{n-1}\,ds,
\]

where (c_{j0}, ..., c_{j,n−1}) denotes the (j + 1)-th row of W^{−1}, the inverse of the Vandermonde matrix

\[
W := \begin{pmatrix}
1 & 1 & \cdots & 1\\
\theta_0 & \theta_1 & \cdots & \theta_{n-1}\\
\vdots & \vdots & & \vdots\\
\theta_0^{n-1} & \theta_1^{n-1} & \cdots & \theta_{n-1}^{n-1}
\end{pmatrix}.
\]

If we write

\[
M := \max_{t\in[-1,1]} \bigl|\sigma^{(n)}(t)\bigr| \cdot \max_{0\le j\le n-1} \frac{j!}{\bigl|\sigma^{(j)}(\beta)\bigr|\,(n-1)!} \sum_{k=0}^{n-1} |c_{jk}|\,|b\theta_k|^{n},
\]

then from [6, Lemma 3.2] we know that for every 0 ≤ j ≤ n − 1 there holds

\[
\left| t^j - \sum_{k=0}^{n-1} \frac{c_{jk}\, j!}{h^j \sigma^{(j)}(\beta)}\,\sigma(\theta_k h t + \beta) \right| < M h^{n-j}.
\]

Based on this, we can construct the approximate interpolation FNN in the metric space as

\[
N_a(x) := \sum_{k=0}^{n-1} d_k\, \sigma\bigl(\theta_k h\, d(\xi,x) + \beta\bigr) := \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \frac{C_j\, c_{jk}\, j!}{h^j \sigma^{(j)}(\beta)}\,\sigma\bigl(\theta_k h\, d(\xi,x) + \beta\bigr), \tag{7}
\]

where {C_0, C_1, ..., C_{n−1}} is the solution to the system of equations (3). Furthermore, N_a(x) can be interpreted as a model of FNN with four layers:

• The first layer is the input layer, with input x (x ∈ X).
• The second is the pre-handling layer, which transforms an input x into the distance d(ξ, x) between x and ξ.
• The third is the handling layer, with n neurons.
• The last is the output layer.

From our construction, we obtain the following Theorem 1, which describes the error between N_a(x) and the interpolation samples.

Theorem 1. Suppose that (X, d) is a metric space with distance d, and that for the interpolation nodes {x_1, x_2, ..., x_n} ⊂ X there is a ξ ∈ X satisfying d(ξ, x_i) ≠ d(ξ, x_j) for 1 ≤ i < j ≤ n. Suppose further that (6) holds and that σ : [0, b] → R is analytic and not a polynomial. Then for any 0 < h ≤ 1 and the sample set (1) there exists an FNN, N_a(x), such that

\[
|N_a(x_i) - y_i| \le CMh, \tag{8}
\]

where C = \sum_{j=0}^{n-1} |C_j|, and M and the C_j are as given above.

Proof. From Proposition 1, there exists P(x) = \sum_{j=0}^{n-1} C_j (d(\xi,x))^j such that P(x_i) = y_i, i = 1, 2, ..., n. Then |N_a(x_i) − y_i| = |N_a(x_i) − P(x_i)|. Noting (7), we obtain

\[
|N_a(x) - P(x)| = \left| \sum_{j=0}^{n-1}\sum_{k=0}^{n-1} \frac{C_j\, c_{jk}\, j!}{h^j \sigma^{(j)}(\beta)}\,\sigma\bigl(\theta_k h\, d(\xi,x) + \beta\bigr) - \sum_{j=0}^{n-1} C_j \bigl(d(\xi,x)\bigr)^j \right|
\le \sum_{j=0}^{n-1} |C_j| \left| \sum_{k=0}^{n-1} \frac{c_{jk}\, j!}{h^j \sigma^{(j)}(\beta)}\,\sigma\bigl(\theta_k h\, d(\xi,x) + \beta\bigr) - \bigl(d(\xi,x)\bigr)^j \right|
\le \sum_{j=0}^{n-1} |C_j|\, M h^{n-j} \le CMh,
\]

since h ≤ 1. Therefore |N_a(x_i) − y_i| = |N_a(x_i) − P(x_i)| ≤ CMh. This finishes the proof of Theorem 1. □

3. The existence of exact interpolation FNN with sigmoidal activation function

If σ : R → R satisfies

\[
\lim_{t\to-\infty} \sigma(t) = 0, \qquad \lim_{t\to+\infty} \sigma(t) = 1,
\]

then we call it a sigmoidal function. Assume that σ is a bounded sigmoidal function, let A > 0, and set

\[
\delta_\sigma(A) := \sup_{t \ge A} \max\bigl\{|1-\sigma(t)|,\ |\sigma(-t)|\bigr\};
\]

then δ_σ(A) is non-increasing in A and satisfies lim_{A→+∞} δ_σ(A) = 0.

Based on these facts, we now construct the exact interpolation FNN for the interpolation sample set (1). Let x'_1 = x_1 and let {x'_1, x'_2, ..., x'_n} satisfy d(x'_1, x'_i) ≤ d(x'_1, x'_j) for i ≤ j; that is, we rearrange the elements of the interpolation sample set (1) as (x'_1, y'_1), (x'_2, y'_2), ..., (x'_n, y'_n). For convenience, we again denote the reordered samples (x'_i, y'_i) (i = 1, 2, ..., n) by {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. The exact interpolation FNN can then be constructed as

\[
N_e^A(x) := \sum_{j=1}^{n-1} c_j\, \sigma\!\left(-2A\,\frac{d(x_1,x)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A\right) + c_n\, \sigma\!\left(-2A\,\frac{d(x_1,x)-d(x_1,x_n)}{d(x_1,x_n)-d(x_1,x_{n-1})} + A\right).
\]

From the definition we know that, when i < j,

\[
-2A\,\frac{d(x_1,x_i)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A \ge A;
\]

when i = j the left-hand side equals A; when i = j + 1 it equals −A; and when i > j + 1 it is at most −A. Therefore, by the definition of δ_σ(A), we obtain

\[
\left| 1 - \sigma\!\left(-2A\,\frac{d(x_1,x_i)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A\right) \right| \le \delta_\sigma(A), \qquad 1 \le i \le j, \tag{9}
\]
and there holds

\[
\left| \sigma\!\left(-2A\,\frac{d(x_1,x_i)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A\right) \right| \le \delta_\sigma(A), \qquad j+1 \le i \le n. \tag{10}
\]

The following Theorem 2 is the main result of this section; it gives a rigorous proof of the existence of the exact interpolation FNN in the metric space.

Theorem 2. If σ is a bounded sigmoidal function and

\[
\delta_\sigma(A) < \frac{1}{4n}, \tag{11}
\]

then for the sample set (1) there exist {c_j}_{j=1}^{n} ⊂ R such that N_e^A(x) is an exact interpolation FNN.

Proof. It is sufficient to prove that the system of equations in the variables {c_i}_{i=1}^{n},

\[
N_e^A(x_i) = y_i \qquad (i = 1, \dots, n), \tag{12}
\]

is solvable under condition (11). Denote, for i = 1, ..., n,

\[
e_{i,j}(A) := \sigma\!\left(-2A\,\frac{d(x_1,x_i)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A\right),\quad j = 1,\dots,n-1,\qquad
e_{i,n}(A) := \sigma\!\left(-2A\,\frac{d(x_1,x_i)-d(x_1,x_n)}{d(x_1,x_n)-d(x_1,x_{n-1})} + A\right);
\]

in particular, e_{n,n}(A) = σ(A). The coefficient matrix of the system (12) can then be written as

\[
D_n(A) := \begin{pmatrix}
e_{11}(A) & e_{12}(A) & \cdots & e_{1n}(A)\\
e_{21}(A) & e_{22}(A) & \cdots & e_{2n}(A)\\
\vdots & \vdots & & \vdots\\
e_{n1}(A) & e_{n2}(A) & \cdots & e_{nn}(A)
\end{pmatrix}.
\]

Let d_{ij}(A) = e_{ij}(A) − e_{i+1,j}(A) for i = 1, ..., n − 1 and d_{nj}(A) = e_{nj}(A), for j = 1, ..., n. Subtracting row i + 1 from row i (i = 1, ..., n − 1) leaves the determinant unchanged, so

\[
\det D_n(A) = \det\bigl(d_{ij}(A)\bigr)_{i,j=1}^{n}.
\]

Moreover, from (11) and the definition of δ_σ(A), if t ≥ A then

\[
|\sigma(-t)| < \frac{1}{4n}, \qquad |1-\sigma(t)| < \frac{1}{4n}.
\]

Therefore, for 1 ≤ i ≤ n − 1,

\[
d_{ii}(A) = \sigma(A) - \sigma(-A) = 1 - \bigl(1-\sigma(A)\bigr) - \sigma(-A) \ge 1 - \frac{1}{4n} - \frac{1}{4n} = 1 - \frac{1}{2n},
\]

and

\[
d_{nn}(A) = \sigma(A) = 1 - \bigl(1-\sigma(A)\bigr) \ge 1 - \frac{1}{4n} \ge 1 - \frac{1}{2n}.
\]

Noting (9) and (10), for j ≠ i each |d_{ij}(A)| = |e_{ij}(A) − e_{i+1,j}(A)| ≤ 2δ_σ(A) < 1/(2n), so for each i = 1, ..., n,

\[
\sum_{j=1,\, j\ne i}^{n} |d_{ij}(A)| < n \cdot \frac{1}{2n} = \frac{1}{2}.
\]

Hence

\[
d_{ii}(A) > \sum_{j=1,\, j\ne i}^{n} |d_{ij}(A)| \qquad (i = 1, \dots, n).
\]
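This diagonal dominance, and the resulting solvability of (12), can be observed numerically. The sketch below is not from the paper: the logistic sigmoid σ(t) = 1/(1 + e^{−t}), the planar nodes, and the choice A = 10 (which gives δ_σ(A) = σ(−A) ≈ 4.5 × 10⁻⁵ < 1/(4n)) are all illustrative assumptions.

```python
import math

def sigma(t):
    # logistic sigmoid: a bounded sigmoidal function in the sense of Section 3
    return 1.0 / (1.0 + math.exp(-t))

def dist(a, b):
    # Euclidean metric on R^2, standing in for a general metric d
    return math.hypot(a[0] - b[0], a[1] - b[1])

# hypothetical nodes, already ordered so that d(x1, xi) is increasing
nodes = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
ys = [2.0, -1.0, 0.5]
n = len(nodes)
A = 10.0                                   # delta_sigma(10) = sigma(-10) < 1/(4n)
t = [dist(nodes[0], p) for p in nodes]     # t_i = d(x1, x_i) = 0, 1, 3

def basis(j, x):
    # the j-th neuron of N_e^A (0-based j; the last neuron reuses the gap t[n-1]-t[n-2])
    s = dist(nodes[0], x)
    if j < n - 1:
        return sigma(-2 * A * (s - t[j]) / (t[j + 1] - t[j]) + A)
    return sigma(-2 * A * (s - t[n - 1]) / (t[n - 1] - t[n - 2]) + A)

E = [[basis(j, nodes[i]) for j in range(n)] for i in range(n)]

# diagonal dominance of the row-differenced matrix (d_ij), as in the proof of Theorem 2
D = [[E[i][j] - E[i + 1][j] for j in range(n)] for i in range(n - 1)] + [E[n - 1]]
dominant = all(D[i][i] > sum(abs(D[i][j]) for j in range(n) if j != i)
               for i in range(n))

def solve(M, b):
    # Gaussian elimination with partial pivoting (plain-Python stand-in for a solver)
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    m = len(M)
    for col in range(m):
        p = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

c = solve(E, ys)                           # coefficients c_j of the exact FNN

def NeA(x):
    return sum(c[j] * basis(j, x) for j in range(n))

residuals = [abs(NeA(p) - y) for p, y in zip(nodes, ys)]   # ~ 0: exact interpolation
```

With A this large the matrix (e_{ij}(A)) is numerically close to an upper triangular matrix of ones, which is why plain Gaussian elimination has no trouble here.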
Then, since strictly diagonally dominant matrices are invertible (see [13]), det D_n(A) ≠ 0, which means that the system of equations (12) is solvable. This completes the proof of Theorem 2. □

4. Approximate interpolation FNN with sigmoidal activation function

The inner weights and thresholds of the exact interpolation FNN N_e^A(x) of Theorem 2 depend only on the interpolation nodes, while its coefficients form a solution of the system of equations (12). We have proved that (12) is solvable when A is sufficiently large, but the solution is not easy to work out. We therefore turn to the approximate interpolation FNN N_a^A(x):

\[
N_a^A(x) := \sum_{j=1}^{n-1} (y_j - y_{j+1})\, \sigma\!\left(-2A\,\frac{d(x_1,x)-d(x_1,x_j)}{d(x_1,x_{j+1})-d(x_1,x_j)} + A\right) + y_n\, \sigma\!\left(-2A\,\frac{d(x_1,x)-d(x_1,x_n)}{d(x_1,x_n)-d(x_1,x_{n-1})} + A\right).
\]

Clearly, the exact interpolation FNN N_e^A(x) and the approximate interpolation FNN N_a^A(x) differ only in their coefficients. The following Theorem 3 bounds the error between N_e^A(x) and N_a^A(x).

Theorem 3. If σ is a bounded sigmoidal function and (11) holds, then

\[
\bigl|N_e^A(x) - N_a^A(x)\bigr| \le \frac{(2n+1)\,\delta_\sigma(A)\,\|\sigma\|}{1-(2n+1)\,\delta_\sigma(A)} \left( \sum_{j=1}^{n-1} |y_j - y_{j+1}| + |y_n| \right),
\]

where ‖σ‖ := sup_{t∈R} |σ(t)|.

Proof. Since (11) holds, Theorem 2 shows that (12) is solvable. Denote its solution by V_c := (c_1, ..., c_n) and the coefficient matrix of (12) by M. With V_y := (y_1, ..., y_n), the system (12) can be rewritten as

\[
M V_c^{T} = V_y^{T}, \tag{13}
\]

where V^T denotes the transpose of the vector V. Define the upper triangular matrix

\[
U := \begin{pmatrix}
1 & 1 & \cdots & 1 & 1\\
0 & 1 & \cdots & 1 & 1\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & 1\\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.
\]

A direct computation yields

\[
U^{-1} = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0\\
0 & 1 & -1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & 1 & -1\\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.
\]

Write M − U = (α_{ij})_{i,j=1}^{n}; then |α_{ij}| ≤ δ_σ(A).
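As a numerical aside (not part of the proof), the conclusion of Theorem 3 can be seen on a small example. The logistic sigmoid, the planar nodes, and A = 10 are illustrative assumptions, and for the logistic sigmoid ‖σ‖ = 1; since N_e^A interpolates exactly, the theorem forces |N_a^A(x_i) − y_i| below the stated bound.

```python
import math

def sigma(t):
    # logistic sigmoid; a bounded sigmoidal function with ||sigma|| = 1
    return 1.0 / (1.0 + math.exp(-t))

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# hypothetical sample set, ordered so that d(x1, xi) is increasing
nodes = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
ys = [2.0, -1.0, 0.5]
n, A = len(nodes), 10.0
t = [dist(nodes[0], p) for p in nodes]          # t_i = d(x1, x_i) = 0, 1, 3

def NaA(x):
    # N_a^A: same neurons as N_e^A, but explicit coefficients y_j - y_{j+1} and y_n,
    # so no linear system has to be solved
    s = dist(nodes[0], x)
    out = sum((ys[j] - ys[j + 1]) * sigma(-2 * A * (s - t[j]) / (t[j + 1] - t[j]) + A)
              for j in range(n - 1))
    return out + ys[-1] * sigma(-2 * A * (s - t[-1]) / (t[-1] - t[-2]) + A)

delta = max(1.0 - sigma(A), sigma(-A))          # delta_sigma(A); here ~4.5e-5 < 1/(4n)
bound = ((2 * n + 1) * delta / (1 - (2 * n + 1) * delta)
         * (sum(abs(ys[j] - ys[j + 1]) for j in range(n - 1)) + abs(ys[-1])))
errors = [abs(NaA(p) - y) for p, y in zip(nodes, ys)]
# every error stays below `bound` (~1.6e-3 for this data), as Theorem 3 predicts
```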
If we further denote U^{−1}(M − U) = (β_{ij})_{i,j=1}^{n}, then

\[
|\beta_{ij}| \le 2\delta_\sigma(A)\ \ (i = 1,\dots,n-1),\qquad |\beta_{nj}| \le \delta_\sigma(A), \qquad j = 1,\dots,n. \tag{14}
\]

Let V_Y := (y_1 − y_2, ..., y_{n−1} − y_n, y_n) and ΔV_c := V_c − V_Y; then U V_Y^T = V_y^T. Noting (13), we obtain

\[
\bigl(U + (M-U)\bigr)\bigl(V_Y^{T} + \Delta V_c^{T}\bigr) = V_y^{T},
\]

that is,

\[
U \Delta V_c^{T} = -(M-U)\,\Delta V_c^{T} - (M-U)\,V_Y^{T},
\]

and hence

\[
\Delta V_c^{T} = -U^{-1}(M-U)\,\Delta V_c^{T} - U^{-1}(M-U)\,V_Y^{T}.
\]

The last equation together with (14) yields

\[
\sum_{i=1}^{n} |\Delta V_{c,i}| \le (2n+1)\,\delta_\sigma(A) \sum_{i=1}^{n} |\Delta V_{c,i}| + (2n+1)\,\delta_\sigma(A) \left( \sum_{i=1}^{n-1} |y_i - y_{i+1}| + |y_n| \right).
\]

Furthermore, from (11) we know that (2n + 1)δ_σ(A) < 1, so

\[
\sum_{i=1}^{n} |\Delta V_{c,i}| \le \frac{(2n+1)\,\delta_\sigma(A)}{1-(2n+1)\,\delta_\sigma(A)} \left( \sum_{j=1}^{n-1} |y_j - y_{j+1}| + |y_n| \right).
\]

Since

\[
\bigl|N_e^A(x) - N_a^A(x)\bigr| \le \sum_{i=1}^{n} |\Delta V_{c,i}|\, \|\sigma\|,
\]

we have

\[
\bigl|N_e^A(x) - N_a^A(x)\bigr| \le \frac{(2n+1)\,\delta_\sigma(A)\,\|\sigma\|}{1-(2n+1)\,\delta_\sigma(A)} \left( \sum_{j=1}^{n-1} |y_j - y_{j+1}| + |y_n| \right).
\]

This finishes the proof of Theorem 3. □

References

[1] Y. Ito, K. Saito, Superposition of linearly independent functions and finite mappings by neural networks, Math. Sci. 21 (1996) 27–33.
[2] Y. Ito, Independence of unscaled basis functions and finite mappings by neural networks, Math. Sci. 26 (2001) 117–126.
[3] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer. (1999) 143–195.
[4] Y. Shrivatava, S. Dasgupta, Neural networks for exact matching of functions on a discrete domain, in: Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, 1990, pp. 1719–1724.
[5] E.D. Sontag, Feedforward nets for interpolation and classification, J. Comput. System Sci. 45 (1) (1992) 20–48.
[6] X. Li, Interpolation by ridge polynomials and its application in neural networks, J. Comput. Appl. Math. 144 (2002) 197–209.
[7] H.X. Li, E.S. Lee, Interpolation functions of feedforward neural networks, Comput. Math. Appl. 46 (2003) 1861–1874.
[8] H.X. Li, L.X. Li, J.Y. Wang, Interpolation representation of feedforward neural networks, Math. Comput. Model. 37 (2003) 829–847.
[9] B. Llanas, F.J. Sainz, Constructive approximate interpolation by neural networks, J. Comput. Appl. Math. 188 (2006) 283–308.
[10] B. Llanas, S. Lantarón, Hermite interpolation by neural networks, Appl. Math. Comput. 191 (2007) 429–439.
[11] F.L. Cao, Y.Q. Zhang, Z.R. He, Interpolation and rates of convergence for a class of neural networks, Appl. Math. Modelling 33 (2009) 1441–1456.
[12] T.F. Xie, F.L. Cao, The ridge function representation of polynomials and an application to neural networks, Acta Math. Sin. Engl. Ser. (2010) (in press).
[13] P. Lascaux, T. Theodor, Analyse Numérique Matricielle Appliquée à l'Art de l'Ingénieur, Masson, Paris, 1986.
