Document: Digital Signal Processing Handbook P32 (pptx)

A. C. Surendran. “Inverse Problems in Microphone Arrays.” 2000 CRC Press LLC. <http://www.engnetbase.com>.

Inverse Problems in Microphone Arrays

A. C. Surendran
Bell Laboratories, Lucent Technologies

32.1 Introduction: Dereverberation Using Microphone Arrays
32.2 Simple Delay-and-Sum Beamformers
     A Brief Look at Adaptive Arrays • Constrained Adaptive Beamforming Formulated as an Inverse Problem • Multiple Beamforming
32.3 Matched Filtering
32.4 Diophantine Inverse Filtering Using the Multiple Input-Output (MINT) Model
32.5 Results
     Speaker Identification
32.6 Summary
References

32.1 Introduction: Dereverberation Using Microphone Arrays

An acoustic enclosure usually reduces the intelligibility of the speech transmitted through it because the transmission path is not ideal. Apart from the direct signal from the source, the sound is also reflected off one or more surfaces (usually walls) before reaching the receiver. The resulting signal can be viewed as the output of a convolution in the time domain of the speech signal and the room impulse response. This phenomenon affects the quality of the transmitted sound in important applications such as teleconferencing, cellular telephony, and automatic voice activated systems (speaker and speech recognizers). Room reverberation can be perceptually separated into two broad classes. Early room echoes are manifested as irregularities or “ripples” in the amplitude spectrum. This effect dominates in small rooms, typically offices. Long-term reverberation is typically exhibited as an echo “tail” following the direct sound [1].

If the transfer function G(z) of the system is known, it might be possible to remove the deleterious multi-path effects by inverse filtering the output using a filter H(z) where

$$H(z) = \frac{1}{G(z)} . \tag{32.1}$$

Typically G(z) is the transform of the impulse response of the room g(n). In general, the transfer function of a reverberant environment is a non-minimum phase function, i.e., all the zeros of the function do not necessarily lie inside |z| = 1. A minimum phase function has a stable causal inverse, while the inverse of a non-minimum phase function is acausal and, in general, infinite in length.
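The difference between the two cases can be seen on a two-tap example. The sketch below is only an illustration, not part of the chapter: the function name `causal_inverse` and the toy responses are assumptions. It expands 1/G(z) as a causal power series by solving g(n) ∗ h(n) = δ(n) tap by tap. For a zero inside the unit circle the taps decay; for a zero outside it the causal expansion grows without bound, which is why the stable inverse in that case has to be acausal.

```python
def causal_inverse(g, n_taps):
    """First n_taps of the causal series h with (g * h)(n) = delta(n).

    Hypothetical helper for illustration; g is a list of FIR taps with g[0] != 0.
    """
    h = []
    for n in range(n_taps):
        acc = 1.0 if n == 0 else 0.0
        # Remove the contribution of the taps of h that are already known.
        for k in range(1, min(n, len(g) - 1) + 1):
            acc -= g[k] * h[n - k]
        h.append(acc / g[0])
    return h


# Toy room responses (assumed values): one zero at z = -0.5 and one at z = -2.
h_min = causal_inverse([1.0, 0.5], 12)  # minimum phase: taps decay as (-0.5)^n
h_max = causal_inverse([1.0, 2.0], 12)  # non-minimum phase: taps grow as (-2)^n

print([round(v, 4) for v in h_min[:5]])
print(h_max[:5])
```

Truncating `h_min` after a few taps already gives a usable inverse filter, whereas no truncation of `h_max` is usable.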
© 1999 by CRC Press LLC

In general, G(z) can be expressed as a product of a minimum-phase function and a non-minimum phase function:

$$G(z) = G_{\min}(z) \cdot G_{\max}(z) . \tag{32.2}$$

Many approaches have been proposed for dereverberating signals. The aim of all the compensation schemes is to bring the impulse response of the system after dereverberation as close as possible to an impulse function. Homomorphic filtering techniques were used to estimate the minimum phase part of G(z) [2, 3]. In [2], the minimum phase component was estimated by zeroing out the cepstrum for negative frequencies. Then the output signal was filtered by the inverse of the minimum phase transfer function. But this technique still did not remove the reverberation contributed by the maximum-phase part of the room response. In [3], the inverse of the maximum-phase part was also estimated from the delayed and truncated version of the acausal inverse. But the delay can be inordinate, and care must be taken to avoid temporal aliasing.

An alternate approach to dereverberation is to calculate, in some form, the least squares estimate of the inverse of the transmission path, i.e., calculate the least squares solution of the equation

$$h(n) * g(n) = d(n) , \tag{32.3}$$

where d(n) is the impulse function and ∗ denotes convolution. Assuming that the system can be modeled by an FIR filter, Eq. (32.3) can be expressed in matrix form as:

$$\begin{bmatrix} g(0) & 0 & \cdots & 0 \\ g(1) & g(0) & \cdots & 0 \\ \vdots & g(1) & \ddots & \vdots \\ g(m) & \vdots & \ddots & g(0) \\ 0 & g(m) & \cdots & g(1) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g(m) \end{bmatrix} \begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(i) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} , \tag{32.4}$$

or,

$$GH = D , \tag{32.5}$$

where D is the unit vector (a one followed by zeros) and G, H and D are matrices of appropriate dimensions as shown in Eq. (32.4). The least squares method finds an approximate solution given by

$$\hat{H} = (G^T G)^{-1} G^T D . \tag{32.6}$$

Thus, the error vector can be written as

$$\epsilon = [D - G\hat{H}] = [I - G (G^T G)^{-1} G^T] D = ED ,$$

where $E = [I - G (G^T G)^{-1} G^T]$.
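As a rough illustration of Eqs. (32.4) to (32.6), the sketch below builds the convolution matrix G for a short response, solves the normal equations (G^T G) h = G^T D, and measures the energy of the residual D − GĤ. It is a minimal sketch under assumptions: the helper names (`conv_matrix`, `solve`, `ls_inverse`, `convolve`) and the toy response `g` are hypothetical, and plain Gaussian elimination stands in for whatever solver a real system would use.

```python
def conv_matrix(g, n_taps):
    """Convolution matrix G of Eq. (32.4): len(g) + n_taps - 1 rows, n_taps columns."""
    rows = len(g) + n_taps - 1
    return [[g[r - c] if 0 <= r - c < len(g) else 0.0 for c in range(n_taps)]
            for r in range(rows)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ls_inverse(g, n_taps):
    """Least-squares inverse filter: solves (G^T G) h = G^T d with d = [1, 0, ..., 0]."""
    G = conv_matrix(g, n_taps)
    gtg = [[sum(row[i] * row[j] for row in G) for j in range(n_taps)]
           for i in range(n_taps)]
    gtd = [G[0][i] for i in range(n_taps)]  # G^T d is the first row of G
    return solve(gtg, gtd)


def convolve(a, b):
    """Full linear convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


g = [1.0, 0.5]                      # toy room response (assumed values)
h = ls_inverse(g, 8)                # 8-tap least-squares inverse
equalized = convolve(g, h)          # should be close to a unit impulse
error_energy = (equalized[0] - 1.0) ** 2 + sum(v * v for v in equalized[1:])
print(error_energy)
```

For a response with zeros outside the unit circle, the same code still returns a solution, but the residual energy for a given filter length is larger.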
The mean square error or the energy in the error vector is

$$\|\epsilon\|^2 = \|ED\|^2 \le |E| \, \|D\|^2 \le \frac{\lambda_{\max}}{\lambda_{\min}} \|D\|^2 , \tag{32.7}$$

where |E| is the norm of E and $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of E. The ratio between the maximum and minimum eigenvalues is called the condition number of a matrix and it specifies the noise amplification of the inversion process [4].

FIGURE 32.1: Modeling a room with a microphone array as a multiple output FIR system.

Typically, the operation is done on the full-band signal. Sub-band approaches have been proposed in [5, 7, 8]. All these approaches use a single microphone. The amplitude spectrum of the room response has “ripples” which produce pronounced notches in the signal output spectrum. As the location of the microphone in the room changes, the room response for the same source changes and, as a result, the position of the notches in the amplitude spectrum varies. This property was used to advantage in [1]. In this method, multiple microphones were located in the room. Then, the output of each microphone was divided into multiple bands of equal bandwidth. For each band, by choosing the microphone whose output has the maximum energy, the ripples were reduced. In [9], the signals from all the microphones in each band were first co-phased, and then weighted by a gain calculated from a normalized cross-correlation function calculated based on the outputs of different microphones. Since the reverberation tails are uncorrelated, the cross-correlation-based gain turned off the tail of the signal. These techniques have had modest success in combating reverberation.

In recent years, great progress has been made in the quality, availability, and cost of high performance microphones. Fast digital signal processors that permit complex algorithms to operate in real time have been developed.
These advances have enabled the use of large microphone arrays that deploy more sophisticated algorithms for dereverberation. Figure 32.1 shows a generic microphone array system which can “invert” the room acoustics. Different choices of $H_i(z)$ lead to different algorithms, each with their own advantages and disadvantages. In this report, we shall discuss single and multiple beamforming, matched filtering, and Diophantine inverse filtering through multiple input-output (MINT) modeling. In all cases we assume that the source location and the room configuration or, alternatively, the $G_i(z)$s, are known.

32.2 Simple Delay-and-Sum Beamformers

Arrays that form a single beam directed towards the source of the sound have been designed and built [11]. In these simple delay-and-sum beamformers, the processing filter has the impulse response

$$h_i(n) = \delta(n - n_i) , \tag{32.8}$$

where $n_i = d_i / c$, $d_i$ is the distance of the ith microphone from the source and c is the speed of sound in air. Sound propagation in the room can be modeled by a set of successive reflections off the surfaces (typically the walls) [10]. Figure 32.2 illustrates the impulse response of a single beamformer. The delay at the output of each microphone coheres the sound that arrives at the microphone directly from the source. It can be seen from Fig. 32.2 that in the resulting response, the strength of the coherent pulse is N and there are N(K − 1) distributed pulses. So, ideally, the signal-to-reverberant noise ratio (measured as the ratio of undistorted signal power to reverberant noise power) is $N^2 / (N(K - 1))$ [13]. In a highly reverberant room, as the number of images K increases towards infinity, the SNR improvement, $N / (K - 1)$, falls to zero.

FIGURE 32.2: A single beamformer. (Source: Flanagan, J.L., Surendran, A.C., and Jan, E.-E., Spatially selective sound capture for speech and audio processing, Speech Commun., 13: 207–222, 1993.
With kind permission of Elsevier Science - NL, Sara Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands).

The single-beamforming system reported in [11] can automatically determine the direction of the source and rapidly steer the array. But, as the beam is steered away from the broadside, the system exhibits a reduction in spatial discrimination because the beam pattern broadens [12]. Further, beamwidth varies with frequency, so an array has an approximate “useful bandwidth” given by the upper and lower frequencies [12]:

$$f_{\text{upper}} = \frac{c}{d \, |\cos\phi - \cos\phi'|_{\max}} , \tag{32.9}$$

and

$$f_{\text{lower}} = \frac{f_{\text{upper}}}{N} , \tag{32.10}$$

where c is the speed of sound in air, N is the number of sensors in the array, d is the sensor spacing, $\phi'$ is the steering angle measured with respect to the axis of the array, and $\phi$ is the direction of the source.

For example, consider an array with seven microphones and a sensor spacing of 6.5 cm. Further, suppose the desired range of steering is ±30° from broadside. Then, $|\cos\phi - \cos\phi'|_{\max} = 1.5$ and hence $f_{\text{upper}} \approx 3500$ Hz and $f_{\text{lower}} \approx 500$ Hz. So, to cover the bandwidth of speech, say from 250 Hz to 7 kHz, three harmonically nested arrays of spacing 3.25, 6.5, and 13 cm can be used.

Further, the beamwidth also depends on the frequency of the signal as well as the steering direction. If the beam is steered to an angle $\phi'$, then the direction of the source for which the beam response falls to half its power is [12]

$$\phi_{3\text{dB}} = \cos^{-1}\left[ \cos\phi' \pm \frac{2.8}{N \omega d} \right] , \tag{32.11}$$

where $\omega = 2\pi f$ and f is the frequency of the signal. Equation 32.11 shows that the smaller the array, the wider the beam. Since most of the energy of a typical room interfering noise lies at lower frequencies, it would be advantageous to build arrays that have higher directivity (smaller beamwidth) at lower frequencies. This, combined with the fact that the array spacing is larger for lower frequency bands, gives yet another reason to harmonically nest arrays (see Fig.
32.3).

FIGURE 32.3: Harmonically nested array that covers three frequency ranges.

Just as linear one-dimensional arrays display significant fattening of the beams when steered towards the axis of the array, two-dimensional arrays exhibit widening of the beams when steered at angles acute to the plane of the array. Three-dimensional microphone arrays can be constructed [13] that have essentially a constant beamwidth over 4π steradians. Multiple beamforming using three-dimensional arrays of sensors not only provides selectivity in azimuth and elevation but also selectivity in the direction of the beam, i.e., it provides range selectivity.

The performance of single beamformers can degrade severely in the presence of other interfering noise sources, especially if they fall in the direction of the sidelobes. This problem can be mitigated using adaptive arrays. Adaptive arrays are briefly discussed in the next section.

32.2.1 A Brief Look at Adaptive Arrays

Adaptive signal processing techniques can be used to form a beam at the desired source while simultaneously forming a null in the direction of the interfering noise source. Such arrays are called “adaptive arrays”. Though adaptive arrays are not effective under conditions of severe reverberation, they are included here because problems in adaptive arrays can be formulated as inverse problems. Hence, we shall discuss adaptive arrays briefly without providing a quantitative analysis of them.

FIGURE 32.4: General form of an adaptive filter.

Broadband arrays have been analyzed in [14, 15, 16, 17, 18, 19]. In all these methods, the direction of arrival of the signal is assumed to be known. Let the array have N sensors and M delay taps per sensor. If $X(k) = [x_1(k) \ldots x_i(k) \ldots x_{NM}(k)]^T$ (see Fig. 32.4) is the set of signals observed at the tap points, then X(k) = S(k) + N(k), where
S(k) is the contribution of the desired signal at the tap points and N(k) is the contribution of the unknown interfering noise. The inputs to the sensors, $x_{(jM+1)}(k)$, $j = 0, \ldots, (N - 1)$, are the noisy versions of g(k), the actual signal at the source. Now, the filter output is $y(k) = W^T X(k)$, where $W^T = [w_{11}, \ldots, w_{1M}, w_{21}, \ldots, w_{2M}, \ldots, w_{N1}, \ldots, w_{NM}]$ is the set of weights at the tap points. The goal of the system is to make the output y(k) as close as possible to the source g(k). One way of doing this is to minimize the error $E\{(g(k) - y(k))^2\}$. The weight $W^*$ that achieves this least mean square (LMS) error is also called the Wiener filter, and is given by

$$W^* = R_{XX}^{-1} C_{gX} , \tag{32.12}$$

where $R_{XX}$ is the autocorrelation of X(k) and $C_{gX}$ is the set of cross-correlations between g(k) and each element of X(k). If g(k) and N(k) are uncorrelated, then

$$C_{gX} = E\{g(k) X(k)\} = E\{g(k) S(k)\} + E\{g(k) N(k)\} = E\{g(k) S(k)\}$$

and

$$R_{XX} = E\{X(k) X^T(k)\} = E\{(S(k) + N(k))(S(k) + N(k))^T\} = R_{SS} + R_{NN} ,$$

where $R_{SS}$ and $R_{NN}$ are the autocorrelation matrices for the signal and noise.

Usually $R_{NN}$ is not known. In such cases, the exact inverse cannot be calculated and an iterative approach to update the weights is needed. In Widrow’s approach [15], a known pilot-signal g(k) is injected into the array. Then, the weights are updated using the Widrow-Hoff algorithm that increments the weight vector in the direction of the negative gradient of the error:

$$W_{k+1} = W_k + \mu [g(k) - y(k)] X(k) ,$$

where $W_{k+1}$ is the weight vector after the kth update and µ is the step size. Griffiths’ method also uses the LMS approach, but minimizes the mean square error based on the autocorrelation and the cross-correlation values between the input and the output, rather than the signals themselves.
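The Widrow-Hoff update above is short enough to sketch directly. The example below is an illustrative toy under stated assumptions, not the array processor of [15]: the tap vector X(k) is drawn at random, the pilot signal g(k) is assumed to be an exact linear combination of the taps (weights `w_true`), and the names `lms_step`, `w_true`, and the step size 0.1 are all hypothetical. With a noise-free pilot, the weights converge to `w_true`.

```python
import random


def lms_step(w, x, desired, mu):
    """One Widrow-Hoff update: W <- W + mu * (g(k) - y(k)) * X(k)."""
    y = sum(wi * xi for wi, xi in zip(w, x))   # filter output y(k) = W^T X(k)
    err = desired - y                          # g(k) - y(k)
    return [wi + mu * err * xi for wi, xi in zip(w, x)], err


random.seed(0)
w_true = [0.5, -0.3, 0.2]   # assumed "ideal" weights that generate the pilot
w = [0.0, 0.0, 0.0]         # initial weight vector
for _ in range(2000):
    x = [random.uniform(-1.0, 1.0) for _ in w_true]     # tap signals X(k)
    pilot = sum(a * b for a, b in zip(w_true, x))       # pilot signal g(k)
    w, err = lms_step(w, x, pilot, mu=0.1)

print([round(v, 6) for v in w])
```

The step size has to stay small relative to the tap-signal power for the iteration to remain stable; 0.1 is comfortable here because each tap is bounded by 1.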
Since the mean square error can be written as

$$E\{(g(k) - y(k))^2\} = R_{gg} - 2 C_{gS}^T W + W^T R_{XX} W ,$$

where $R_{gg}$ is the autocorrelation of g(k) and $C_{gS}$ is the set of cross-correlations between g(k) and each element of S(k), the weight update can also be done by

$$W_{k+1} = W_k + \mu [C_{gS} - R_{XX} W_k] \tag{32.13}$$

$$\phantom{W_{k+1}} = W_k + \mu [C_{gS} - X(k) X^T(k) W_k] \tag{32.14}$$

$$\phantom{W_{k+1}} = W_k + \mu [C_{gS} - y(k) X(k)] . \tag{32.15}$$

In the above methods, significant distortion is observed in the primary beam due to null-steering. Constrained LMS techniques which place constraints on the performance of the main lobe can be used to reduce distortion [18, 19]. By specifying the broad-band response and the array beam characteristics as constraints, more robust beams can be formed. The problem now can be formulated as an optimization problem that minimizes the output power of the system. Given that the output power is

$$E\{y^2(k)\} = E\{W^T X(k) X^T(k) W\} = W^T R_{XX} W = W^T R_{SS} W + W^T R_{NN} W ,$$

if W can be chosen such that $W^T R_{NN} W = 0$, the noise can be eliminated. It was proposed [18] that once the array is steered towards the source with appropriate delays, minimizing the output power is equivalent to removing directional interference, since in-phase signals add coherently. In an accurately steered array, the wavefronts arriving from the direction of steering generate identical signals at each sensor. Hence, the array may be collapsed to a single sensor implementation which is equivalent to an FIR filter [18], i.e., the columns of the broadband array sum to an FIR filter. Additional constraints can be placed on this FIR filter. If the weights of the filters can be written as a matrix:

$$\hat{W} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NM} \end{bmatrix} ,$$

then it can be specified that $\sum_{i=1}^{N} w_{ij} = f_j$, $j = 1, \ldots, M$, where $f_j$, $j = 1, \ldots, M$ are the taps of an FIR filter that provides the desired filter response.
Hence, using this method, directional interference can be suppressed by minimizing the output power and spectral interference can be suppressed by constraining the columns of the weight coefficients. Thus, the problem can be formulated as

$$\text{Minimize: } W^T R_{XX} W \tag{32.16}$$

$$\text{subject to: } C^T W = F , \tag{32.17}$$

where F is the desired FIR filter and

$$C = \begin{bmatrix} 1\;0\;0 \cdots 0 & 1\;0\;0 \cdots 0 & \cdots & 1\;0\;0 \cdots 0 \\ 0\;1\;0 \cdots 0 & 0\;1\;0 \cdots 0 & \cdots & 0\;1\;0 \cdots 0 \\ \vdots & \vdots & & \vdots \\ 0\;0\;0 \cdots 1 & 0\;0\;0 \cdots 1 & \cdots & 0\;0\;0 \cdots 1 \end{bmatrix} . \tag{32.18}$$

C has M rows with NM entries on each row. The first row of C in Eq. 32.18 has ones in positions 1, (M + 1), ..., (N − 1)M + 1; the second row has ones in positions 2, (M + 2), ..., (N − 1)M + 2, etc. Equation 32.17 can be solved using Lagrange multipliers [18]. This optimization problem can alternatively be posed as an inverse problem.

32.2.2 Constrained Adaptive Beamforming Formulated as an Inverse Problem

Using a similar cost function and the same constraint, the system can be formulated as an inverse problem [19]. The function to be optimized, $W^T R_{XX} W = 0$, can be approximated by $X^T W = 0$. This, combined with the constraint in Eq. 32.17 is written as:

$$\begin{bmatrix} x_1 & \cdots & x_M & \cdots & x_{(N-1)M+1} & \cdots & x_{NM} \\ 1 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & \cdots & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} w_{11} \\ \vdots \\ w_{1M} \\ \vdots \\ w_{N1} \\ \vdots \\ w_{NM} \end{bmatrix} = \begin{bmatrix} 0 \\ f_1 \\ \vdots \\ f_M \end{bmatrix} , \tag{32.19}$$

$$AW = F \tag{32.20}$$

This equation can be solved with any technique that can invert a matrix. There are several problems in solving Eq. 32.20. In general, the equation can be inconsistent. In addition, the system is rank deficient. Further, traditional methods used to solve Eq. 32.20 are not robust to errors such as round-off errors in digital computers, measurement inaccuracies, and noise corruption. In the least squares solution (Eq. 32.6), the noise amplification is dictated by the condition number of the error matrix, i.e., the ratio of the highest and the lowest eigenvalues of E.
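The structure of the constraint rows in Eq. (32.18) and of the stacked system in Eq. (32.19) may be easier to see on a small case. The sketch below is illustrative only: the function names `constraint_rows` and `build_system`, the sizes N = 3 and M = 2, and all numeric values are assumptions. It stores the M × NM array exactly as displayed in Eq. (32.18), so applying it to the stacked weight vector returns the column sums of the weight matrix, which is what the constraint requires to equal the FIR taps.

```python
def constraint_rows(n_sensors, n_taps):
    """M rows of length N*M; row j has ones at 0-based positions j, M + j, ..., (N-1)*M + j."""
    return [[1.0 if col % n_taps == j else 0.0
             for col in range(n_sensors * n_taps)]
            for j in range(n_taps)]


def build_system(x, fir_taps, n_sensors):
    """Stack the snapshot row X^T on the constraint rows, as in Eq. (32.19).

    Returns (A, F) with F = [0, f_1, ..., f_M].
    """
    a = [list(x)] + constraint_rows(n_sensors, len(fir_taps))
    rhs = [0.0] + list(fir_taps)
    return a, rhs


N, M = 3, 2                                # assumed toy sizes
C = constraint_rows(N, M)
W = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]         # w11, w12, w21, w22, w31, w32
# Applying the constraint rows to W sums the weights tap by tap across sensors.
column_sums = [sum(c * w for c, w in zip(row, W)) for row in C]
A, F = build_system([1.0, -1.0, 0.5, 0.5, 2.0, 0.0], column_sums, N)

for row in A:
    print(row)
print(F)
```

Each new snapshot X(k) contributes one more row of the first kind, which is one reason the system is usually inconsistent and has to be solved in a least squares or projection sense.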
In the extreme case when $\lambda_{\min} = 0$, the system is rank-deficient. In such cases, the pseudo-inverse solution can be used.

Any matrix A can be written using the singular value decomposition as $A = U D V^T$, where

$$D = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N \end{bmatrix} ,$$

then, $A^{-1} = V D^{-1} U^T$, where

$$D^{-1} = \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_N} \end{bmatrix} .$$

$\sigma_i^2$, $i = 1, \ldots, N$ are the eigenvalues of $A A^T$. The matrices U and V are made up of the eigenvectors of $A A^T$ and $A^T A$, respectively. Extending this definition to rank-deficient matrices, the pseudo-inverse can be written as $A^\dagger = V D^\dagger U^T$, where

$$D^\dagger = \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_r} & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} ,$$

where r is the rank of the matrix A. The rank-deficient system has an infinite number of solutions. The pseudo-inverse solution can be shown to be the least squares solution with minimum energy. It can also be viewed as the projection of the least squares solution in the range space of A. An iterative technique called the Row Action Projection (RAP) algorithm [4, 19] can be used to solve Eq. 32.20.

Row Action Projection

An effective way to find a solution for Eq. 32.20 is to use the RAP method [4], which has been shown to be effective in providing a fast and stable solution to a system of simultaneous equations. Traditional least squares methods need a block of data to calculate the estimate. Most of these methods demand a lot of memory and processing power. RAP operates on only one row at a time, which makes it a useful sample-by-sample method in adaptive signal processing. Further, the matrix A in Eq. 32.20 is a sparse matrix. RAP has been shown to be effective in solving systems with sparse matrices [4]. For a given system of equations,

$$\begin{aligned} a_{01} w_1 + a_{02} w_2 + \cdots + a_{0,NM} w_{NM} &= f_0 \\ a_{11} w_1 + a_{12} w_2 + \cdots + a_{1,NM} w_{NM} &= f_1 \\ \vdots \quad &= \; \vdots \\ a_{M1} w_1 + a_{M2} w_2 + \cdots + a_{M,NM} w_{NM} &= f_M , \end{aligned}$$

[...]...
... sound capture for speech and audio processing, Speech Commun., 13: 207–222, 1993. With kind permission of Elsevier Science - NL, Sara Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands).

FIGURE 32.7: Principle of a matched filter. (Source: Flanagan, J.L., Surendran, A.C., and Jan, E.-E., Spatially selective sound capture for speech and audio processing, Speech Commun., 13: 207–222, ...

[...]

... at this location would be poor (the effective response does not look like an ideal impulse). But the signal is effectively amplified. On the other hand, the matched filter maximizes the output power for a source located at the focus and minimizes the output power for all other sources, thus providing lower signal-to-noise ratio improvement, but higher levels of spatial discrimination.

[...]

The SNR of the output was calculated using the formula:

$$\text{SNR(dB)} = 10 \log_{10} \frac{\sum_n s(n)^2}{\sum_n (y(n) - s(n))^2} \tag{32.35}$$

where s(n) is the input speech signal and y(n) is the output speech signal. The two signals are sufficiently staggered to account for the delay in the processing. The signal-to-noise ratios were calculated as follows:

No. of mics    SNR
2              15 dB
3              27 dB
4              37 dB

FIGURE 32.8: Rate of ...

[...]

... model can be found. Though this method provides a higher signal-to-noise ratio for a source at the focus, it does not provide immunity from noise interference that the matched filtering can offer. Speaker identification results are provided that substantiate the performance analysis of these systems.

References

[1] Flanagan, J.L. and Lummis, R.C., Signal processing to reduce multipath distortions in small rooms, ...
... the inversion of mixed phase signals, Proc. IEEE Conf. Acoustics, Speech, Signal Process. ’82, 1858–1861, 1982.
[4] Mammone, R.J., Computational Methods of Signal Recognition and Recovery, John Wiley & Sons, New York, 1992.
[5] Mourjopoulos, J. and Hammond, J.K., Modeling and enhancement of reverberant speech using an envelope convolution method, Proc. IEEE Conf. Acoustics, Speech, Signal Process., 1144–1147, ...
... Cannon, T.M. and Ingebresten, B.R., Blind deconvolution through digital signal processing, Proc. IEEE, 63(4), 678–692, 1975.
[7] Langhans, T. and Strube, H.W., Speech enhancement by nonlinear multiband envelope filtering, Proc. IEEE Conf. Acoustics, Speech, Signal Process., 156–159, 1982.
[8] Wang, H. and Itakura, F., Dereverberation of speech signals based on sub-band envelope estimation, ICIE Trans., E 74(11), 3576–3583, Nov. 1991.
[9] Allen, J.B., Berkeley, D.A. and Blauert, J., Multimicrophone signal processing technique to remove room reverberation from speech signals, J. Acoustical Soc. Am., 62, 912–915, Oct. 1977.
[10] Allen, J.B. and Berkeley, D.A., Image method for efficiently simulating small-room acoustics, J. Acoustical Soc. ...
... for speech and audio processing, Speech Comm., 13, 207–222, 1993.
[14] Widrow, B. and Stearns, S.T., Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[15] Widrow, B., Mantey, P.E., Griffiths, L.J., and Goode, B.B., Adaptive antenna systems, Proc. IEEE, 55, 2143–2159, Dec. 1967.
[16] Griffiths, L.J., A simple adaptive algorithm for real-time processing in antenna ...
... Propagation, AP-30(1), 27–34, Jan. 1982.
[18] Frost III, O.L., An algorithm for linearly constrained adaptive array processing, Proc. IEEE, 60(8), 926–935, 1972.
[19] Farrell, K., Mammone, R.J. and Flanagan, J.L., Beamforming microphone arrays for speech enhancement, Proc. IEEE Conf. Acoustics, Speech, Signal Process. ’92, 1, 285–288, 1992.
[20] IEEE Trans. Antennas Propagation: Special Issues on Adaptive Arrays, ...
... Trans. Antennas Propagation, AP-24(5), 585–599, Sept. 1976.
[22] Jan, E.-E. and Flanagan, J.L., Microphone arrays for speech processing, Intl. Symp. Signals, Syst. Electron., San Francisco, CA, 1995.
[23] Miyoshi, M. and Kaneda, Y., Inverse filtering of room acoustics, IEEE Trans. Acoustics, Speech, Signal Process., 36(2), 145–152, Feb. 1988.
[24] Sondhi, M.M., Personal communication.
[25] Surendran, A.C. and Flanagan, ...

Date posted: 22/12/2013, 21:17
