List of Figures

Chapter 2

2.1. Classification of the spectrum based on wavelength

2.2. Theoretical magnitude response vs. phase difference φ in the transmitted and reflected waves for a homodyne system. The same change in position results in two different values for the change in magnitude depending on the phase difference between the transmitted and reflected waves (or distance to the reflecting interface).

2.3. Theoretical sensitivity envelope for the radar as a function of distance away from the antennae assuming n2 > n1 reflection and a 15 cm wavelength.

2.4. Overview of the GEMS’s physical configuration.

2.5. Plots of the GEMS pulse and frequency spectrum.

2.6. Measured antenna patterns for the GEMS antennae at 30.5 cm in Volts.

2.7. Example demonstrating how a low frequency signal can cause voltage amplitude resolution for a high frequency signal to degrade.

2.8. Measured GEMS filter response (red) and model response (black)

2.9. Magnitude and phase response for the GEMS, its stable inverse, and its noncausal inverse.

2.10. Magnitude and phase response for a 64 tap FIR noncausal differentiator.

2.11. Plot of GEMS signal (blue), inverse filtered GEMS (position, red), and the derivative of the position (velocity, black) for a normal chest phonation.

2.12. Shaker experimental setup.

2.13. Relative amplitude (GEMS/accel) and phase lead of the GEMS vs. distance from the GEMS to the shaker block. The arrows denote points where the GEMS changed phase with respect to the inverted accelerometer signal.

2.14. Sensitivity envelope of GEMS. A positive sensitivity denotes a positive signal for a reflecting surface moving toward the GEMS. The calculated positions of the null points for the anterior and posterior wall are shown in blue. They are discussed in Section 3.5.1.

2.15. Posterior and right lateral view of laryngeal structures (Copyright 1997. Novartis. Reprinted with permission from the Atlas of Human Anatomy, illustrated by Frank H. Netter, M.D. All rights reserved)

2.16. Superior and lateral dissection view of laryngeal structures (Copyright 1997. Novartis. Reprinted with permission from the Atlas of Human Anatomy, illustrated by Frank H. Netter, M.D. All rights reserved)

2.17. Median section of neck. (Copyright 1997. Novartis. Reprinted with permission from the Atlas of Human Anatomy, illustrated by Frank H. Netter, M.D. All rights reserved)

2.18. Cross-section of trachea. (Copyright 1997. Novartis. Reprinted with permission from the Atlas of Human Anatomy, illustrated by Frank H. Netter, M.D. All rights reserved)

2.19. Standing pressure waves for the lowest resonances in an open-ended tube (top) and close-ended tube (bottom)

2.20. Cylindrical-tube approximation of the vocal tract for a simulated /u/ vowel (from Titze, Principles of Voice Production, 1994. All rights reserved. Reprinted by permission of Allyn & Bacon).

2.21. Vowel chart showing regions of F1 and F2 for 10 English vowels (from Titze, Principles of Voice Production, 1994. All rights reserved. Reprinted by permission of Allyn & Bacon).

2.22. Sagittal (front to back) cross section of a vocal fold.

2.23. A one-mass model of the vocal folds, including airflow through the glottis, pressure against the tissue wall, and a supraglottal air column (from Titze, Principles of Voice Production, 1994. All rights reserved. Reprinted by permission of Allyn & Bacon).

2.24. Glottal resistance for moist, warm, viscous air vs. glottal width (assuming glottis is rectangular)

2.25. Normalized audio traces for /a/ "ah" (top) and /i/ "ee" (bottom). The duration of both is 22 msec. Note how the "ee", although longer in period, has more amplitude than the "ah", which loses energy more quickly.

2.26. Equivalent circuit for plane acoustic wave propagation in an incremental yielding tube (from Ishizaka, French, and Flanagan (1975)).

2.27. Schematic representation of an LTI system.

2.28. Comparison of a synthetic transfer function with 4 poles and two zeros (top) to three models: 4 pole/2 zero ARMA, 4 pole LPC, and 16 coefficient cepstral.

Chapter 3

3.1. Use of the GEMS and other EM sensors to detect human vocal articulator movement.

3.2. Simple planar calculation of the neck tissue layers’ reflectivity, neglecting geometrical factors, multiple reflections, and conductivity.

3.3. Demonstration of the geometrical effects on EM wave scattering from the folds. As viewed from the anterior side, the folds have little scattering cross-section. Most of the energy is simply diffracted around the folds. Where scattering does occur, it is not reflected back to the transmitting antenna but rather to the sides.

3.4. Frame from tracheal reflectivity simulation with 2.3 GHz wave. The frame is slightly stretched in the x direction due to machine graphics incompatibilities.

3.5. Energy vs. time for the calibration experiment. Positive values are energy moving to the right (incident); negative values are energy moving to the left (reflected). The two peaks are the positive and negative field peaks shown in Figure 3.4. In this example, R = 57.7%, very close to the theoretical value of 57.2%.

3.6. Energy vs. time for the trachea experiment. R = 15.3%.

3.7. Energy vs. time for the fully open folds experiment. R = 0.8%.

3.8. One cycle of a GEMS signal with some of the corresponding video frames. The vertical bars superimposed on the GEMS signal denote the exposure time. The GEMS signal has not been inverse filtered. The horizontal bars on the video frames are caused by a camera defect.

3.9. Plot of inverse filtered GEMS (blue), first derivative of GEMS (green), and integral of GEMS (black) along with the frame markers (red) for the abnormal physiology subject. The width of the frame markers denotes the exposure time of the frame. Observations for the dataset are included. The time scale is in samples, at 10000 samples/second.

3.10. Approximate locations on the GEMS return for the frames analysis.

3.11. Example of "fully closed" folds in falsetto mode

3.12. Summary of observations for the falsetto portion of the normal physiology. The time scale is in samples, at 10000 samples/second.

3.13. Audio, GEMS, and inverted EGG (IEGG) for /a/.

3.14. Plot of audio, GEMS, and IEGG for breathy cessation of speech. Note the total lack of EGG signal as contact is lost, and also the similarity of the audio and GEMS near the end of the speech.

3.15. Data from position experiments when GEMS is moved from the center of the trachea (the laryngeal prominence) to 5 cm to the left of the prominence. Note the phase change at 4 cm.

3.16. Data from position experiments when GEMS is moved from the center of the trachea (the laryngeal prominence) to 5 cm to the right of the prominence. Note the phase change at 3 cm and again at 5 cm.

3.17. Slice 1250 of the visible human (available at http://www.npac.syr.edu/projects/vishuman/VisibleHuman.html)

3.18. The visible human slice and the GEMS, moved 1 cm at a time to the left.

3.19. Data from position experiments when the GEMS is moved from 2 cm above the laryngeal prominence to 2 cm below it. Note the phase change from the positions above to the center.

3.20. Slice from a series of CT scans performed on the author at the UC Davis Medical Center on October 21, 1998. Note the scale in centimeters to the right and below.

3.21. Expanded view of the region of interest from 3.20.

Chapter 4

4.1. Side view of the trachea.

4.2. Electrical circuit model of the tracheal wall.

4.3. Magnitude and phase lead of the impedance Zw of the lumped-element circuit model of the vocal tract.

4.4. Plot of modeled tracheal wall frequency response vs. L.

4.5. Plot of modeled tracheal wall frequency response vs. C.

4.6. Tracheal wall impedance modeled digitally.

4.7. GEMS, position, velocity, and pressure for subject GB.

4.8. GEMS and inverted derived pressure from subject GB.

4.9. The breathy audio and the GEMS-derived pressure from Figure 3.14.

4.10. Calculated transfer function for /a/, subject GB. The first two formant locations are at 673 and 1162 Hz; normal locations are 600-1300 and 1000-1500 Hz.

4.11. Calculated transfer function for /i/, subject GB. The first two formant locations are at 332 and 2227 Hz; normal locations are 200-400 and 2000-4000 Hz.

4.12. Calculated transfer function for /u/, subject GB. The first two formant locations are at 390 and 1338 Hz; normal locations are 400-600 and 900-1400 Hz.

4.13. Formant location trend for /a/ for subjects GB and TG. Note the relative differences between formant locations, which are individualistic.

4.14. Formant location trend for /i/ for subjects GB and TG.

4.15. Formant location trend for /u/ for subjects GB and TG.

Appendix A

A.1. Normalized frequency response for the example highpass filter.

A.2. An unstable filter in the z plane, and the method used to calculate the phase (θp and θz) and magnitude (z and p) contribution from each pole and zero.

A.3. An unstable filter (black) with a stabilizing AP filter (blue).

A.4. The inverted, stable filter Hs(z). The magnitude is perfectly inverted but the phase is not.

A.5. Plot of second allpass filter (blue) and the triangles used to calculate B, the position of the allpass zero.

A.6. Phase response for causal allpass filter designed to have a phase shift of 75 degrees at 100 Hz.

A.7. The frequency response for the original filter H(z), the stable inverted filter Hs(z), and the noncausal filter Hc(z^-1).

Appendix B

B.1. The NMOS imaging sensor is divided into 12 blocks of pixels, each of which can be read separately to increase frame rate.

B.2. Comparison between frames taken at 1000 fps. The unintensified (left) frame used an exposure time of 1 msec while the intensified (right) frames used an exposure time of 0.1 msec.

B.3. Experimental setup for using the intensified EktaPro simultaneously with other data sources.

B.4. Digitized frame from an intensified EktaPro using 4 out of the 12 blocks at 3000 fps. The image has been cropped slightly by the digitizing program to reduce file size.

B.5. GEMS signal overlaid with "ext sync" data. The exposure time is much smaller than the frame rate period and this results in sharp images.

B.6. How to determine where the exposure occurs when the intensifier is not available and the "frame marker" is used. In this example, 1/3 of the screen is used and an exposure time of 1/3000 of a second is selected. The top plot depicts operation at 3000 fps and the bottom at 1000 fps.

B.7. An example of the error involved when undersampling a fast signal at 40 kHz. The error in locating where the signal occurred depends on the width of the pulse and the sampling rate.

Appendix C

C.1. GEMS placement for pitch measurements. Normally light skin contact is made but is not necessary.

C.2. Audio and GEMS signals from a 29-year-old male native English speaker, voicing /a/ ("ah").

C.3. GEMS signal overlaid with the corresponding high-speed vocal fold video frames. Each bar is 30 microseconds wide and represents the exposure time of the frame.

C.4. Block diagram of the GEMS zero-crossing algorithm.

C.5. Normalized signals from a tuning fork. The audio (upper) is offset in the y direction to facilitate comparison to the GEMS signal (lower).

C.6. Relative error vs. actual pitch for each pitch algorithm. A three second long synthetic signal with multiple harmonics was used. Cepstral (-x), autocorrelation (-o), and GEMS (x).

C.7. Noisy (includes a second male speaker) audio signal (/i/) with pitch contours for GEMS, cepstral, and autocorrelation methods. The GEMS signal is unaffected by the noise.

C.8. Noisy (includes a second male speaker) audio signal ("When all else fails, use force") with pitch contours for GEMS, cepstral, and autocorrelation methods. Again the GEMS signal is unaffected.
