Voiceprint Identification

Voiceprint identification can be defined as a combination of both aural (listening) and spectrographic (instrumental) comparison of one or more known voices with an unknown voice for the purpose of identification or elimination. Developed by Bell Laboratories in the late 1940s for military intelligence purposes, the technique did not see modern forensic use until the late 1960s, following its adoption by the Michigan State Police. From 1967 until the present, more than 5,000 law enforcement-related voice identification cases have been processed by certified voiceprint examiners.

Voice identification has been used in a variety of criminal cases, including murder, rape, extortion, drug smuggling, wagering-gambling investigations, political corruption, money-laundering, tax evasion, burglary, bomb threats, terrorist activities and organized crime activities. It is part of a larger forensic role known as acoustic analysis, which involves tape filtering and enhancement, tape authentication, gunshot acoustics, reconstruction of conversations and the analysis of any other questioned acoustic event.

Theory

The fundamental theory for voice identification rests on the premise that every voice is individually characteristic enough to distinguish it from others through voiceprint analysis. There are two general factors involved in the process of human speech. The first factor in determining voice uniqueness lies in the sizes of the vocal cavities, such as the throat, nasal and oral cavities, and the shape, length and tension of the individual's vocal cords located in the larynx. The vocal cavities are resonators, much like organ pipes, which reinforce some of the overtones produced by the vocal cords; these reinforced overtones appear as formants, or voiceprint bars. The likelihood that two people would have all their vocal cavities the same size and configuration and coupled identically appears very remote.

The second factor in determining voice uniqueness lies in the manner in which the articulators or muscles of speech are manipulated during speech. The articulators include the lips, teeth, tongue, soft palate and jaw muscles whose controlled interplay produces intelligible speech. Intelligible speech is developed by the random learning process of imitating others who are communicating. The likelihood that two people could develop identical use patterns of their articulators also appears very remote.

Therefore, the chance that two speakers would have identical vocal cavity dimensions and configurations coupled with identical articulator use patterns appears extremely remote. While there have been claims that several voices have been found to be indistinguishable, no evidence to support such allegations has been published, offered for examination or demonstrated to the authors.

Several studies have been published evidencing the ability to reliably identify voices under certain conditions, and a Federal Bureau of Investigation survey of its own performance in the examination of 2,000 forensic cases revealed an error rate of 0.31 percent for false identifications and 0.53 percent for false eliminations. (See Koenig, B.E., 1986, Spectrographic Voice Identification: a forensic survey, Journal of the Acoustical Society of America, 79:2088-2090.)

While there is disagreement in the so-called "scientific community" on the degree of accuracy with which examiners can identify speakers under all conditions, there is agreement that voices can, in fact, be identified.

To facilitate the visual comparison of voices, a sound spectrograph is used to convert the complex speech waveform into a pictorial display referred to as a spectrogram. The spectrogram displays the speech signal with time along the horizontal axis, frequency on the vertical axis, and relative amplitude indicated by the degree of gray shading on the display. The resonance of the speaker's voice is displayed in the form of vertical signal impressions or markings for consonant sounds, and horizontal bars or formants for vowel sounds. The visible configurations displayed are characteristic of the articulation involved for the speaker producing the words and phrases. The spectrograms serve as a permanent record of the words spoken and facilitate the visual comparison of similar words spoken by an unknown and a known speaker.
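The time, frequency and amplitude layout described above can be illustrated with a short-time Fourier transform. The following is a minimal Python sketch, not the procedure of any sound spectrograph or laboratory; the function name `spectrogram`, the frame length, the hop size and the synthetic 1,000 Hz test tone are all assumptions made for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of time frames; each frame is a list of magnitudes,
    one per frequency bin from 0 Hz up to half the sample rate."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window reduces leakage at the frame edges.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Naive discrete Fourier transform for bin k.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a steady 1,000 Hz tone sampled at 8,000 Hz.
SAMPLE_RATE = 8000
FRAME_LEN = 64
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(signal, frame_len=FRAME_LEN, hop=32)

# The strongest bin in the first frame, converted back to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each entry of `spec` corresponds to one position along the horizontal (time) axis, each index within it to a position on the vertical (frequency) axis, and each magnitude to the gray shading. A steady tone therefore shows up as a single horizontal bar, which is the simplest analogue of the formant bars seen for vowel sounds.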

Procedural Guidelines

The acoustic environment in many cases can be controlled at the receiving end of the speech signal. Shutting off the radio, television or other noise-generating devices will reduce or eliminate unwanted background speech or noise. While it is not always possible, the investigator should attempt to select a reasonably quiet environment for controlled activities such as drug buys or other illegal operations being investigated. Many times these types of activities are carried out in bars, restaurants, car washes, billiard rooms and the like, and the investigator cannot always dictate the location.

An investigation may require the recording of telephone conversations or face-to-face encounters under a variety of acoustic conditions in which someone is wearing a body recorder or transmitting the conversation via radio frequency to a remote location. Unfortunately, in many cases the investigators cannot control the acoustic environment. In situations involving an adverse environment, investigators should use high-technology stereo equipment to optimize recording capability.

The attempt to produce samples as parallel to the unknown as possible actually assists the examiner because speaker variables are reduced to a minimum. Numerous studies indicate that very reliable decisions can be made by trained professional examiners when samples are obtained in the manner described.

The notion proposed by some opponents that duplicating the unknown as closely as possible may cause error is not supported by any available evidence. Research studies have produced strong evidence that even very good mimics cannot duplicate another's speech patterns.

In an attempt to obtain proper speech samples, investigators should not hesitate to ask suspects for the samples they need. Surprisingly, many suspects will voluntarily give a sample of their voice for comparison purposes.

In the event you are dealing with some type of vocal disguise, attempt to obtain a similarly produced known exemplar in addition to the suspect's normal voice. It should be noted that vocal disguises can be very difficult for the examiner to deal with, and the probability of reaching a determination is lower than with normal voice samples.

If a suspect refuses to cooperate with the investigator, a court order may be acquired compelling the suspect to produce voice recordings for the purpose of comparison. Courts have repeatedly held that requiring the accused to submit voice exemplars for the purpose of comparison for identification or elimination does not violate the suspect's Fifth Amendment rights. In United States v. Wade, 388 U.S. 218 (1967), the Court held that the privilege against self-incrimination offers no protection from compulsion to submit to speaking for the purpose of voice identification, or to writing, photographing, fingerprinting and measurements.

Several problems have been encountered in obtaining known voice exemplars even with the use of a court order. If the court order is vague, the suspect may utter only a few words of the text involved, speak too softly, too fast, or too slowly, or otherwise disguise the sample and claim compliance with the order.

To prevent such problems, the investigator is wise to request that the court order specify in detail that the suspect give a sample of his or her voice, repeating the phrases of the questioned call in a natural conversational voice (or in a similar disguise, if that is the case), and that such sample shall be given at least three times and to the reasonable satisfaction of the investigator. Voice exemplars obtained with such specific instructions are usually very satisfactory for comparison purposes.

Before terminating the recording session, check the recording to determine whether or not a satisfactory exemplar was obtained. Remember that once a suspect is released, a second known sample may be very difficult to obtain.

Whatever the recording circumstance, background noise and the distance between the talker and the receiving device should be minimized for optimal recording. Good-quality tape recording equipment should be used, as well as good-quality magnetic recording tape. As a rule of thumb, recording tape with standard 120 equalization, normal bias and no more than a 5 dB drop at 6 kHz should be used.
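To put the 5 dB figure from the rule of thumb in linear terms, the standard decibel conversions can be applied. This is only an illustrative sketch; the function names are hypothetical and nothing here comes from a tape specification.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in decibels to a linear power ratio."""
    return 10 ** (db / 10)


# A 5 dB drop, as in the rule of thumb for the response at 6 kHz.
amplitude_ratio = db_to_amplitude_ratio(-5)
power_ratio = db_to_power_ratio(-5)
```

Under these conversions, a 5 dB drop leaves roughly 56 percent of the signal amplitude, or about 32 percent of the signal power.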

After the development of a suspect, the next task is to properly obtain known voice samples for comparison purposes. Do not hesitate to ask a suspect for a speech sample. If the suspect refuses, a court order may be obtained requiring compliance with the request. See Schmerber v. California, 384 U.S. 757 (1966), and Gilbert v. California, 388 U.S. 263 (1967). Both are landmark cases. There are also many additional decisions at both state and federal court levels that may be cited to support such a request. Court orders should clearly spell out the minimum number of samples to be obtained, the manner of speech, and the method to be employed.

The next task for the investigator is to obtain proper speech samples for comparison purposes. Probably the best guide here is attempting to duplicate the recording of the questioned call. Known samples should be obtained via the telephone and recorded in the same manner as the questioned call. If possible, the same recorder and telephone pickup should be used. In some cases, even the same telephone has been employed. If there is room on the questioned tape, the known sample may be placed on it. If there is not, another tape of the same type and brand should be used if at all possible.

Speech samples obtained should contain exactly the same words and phrases as those in the questioned sample because only like speech sounds are used for comparison. Because the voice, like handwriting, is dynamic and variable, several samples of each spoken phrase are desired for analysis. Unless the questioned call sounds like a read statement, the suspect should not be allowed to read the phrases from a transcript but should repeat each phrase after it is spoken by someone else. To avoid an unnatural verbal response, the suspect should repeat the first phrase and proceed in the same manner with each successive phrase.

When all phrases have been recorded, the same procedure should be repeated at least two more times beginning with the first word or phrase. The suspect may be asked to read the phrases if a very poor job of repeating is done. Some people do a better job of reading than repeating the phrases.

It is important that the known sample be spoken in the same manner as the questioned sample; therefore, the investigator should be familiar with the voice, manner of speech and the text. If the caller's voice was disguised, the suspect should give a normal sample and a disguised one as in the questioned call.

Recorded evidence should be wrapped in tinfoil to protect it from possible contact with a magnetic field if it is submitted by mail. The evidence should be shipped in a secure container that will prevent the evidence from tearing through the packaging material. Do not submit a copy of your investigative report with the evidence. The examiner does not want to know the details of the case. It is important, however, to provide the examiner with information regarding the recording method, the number of calls and suspects involved, and any other information that may assist the examiner in the examination of the evidence.

Upon receipt of the evidence by the laboratory, it is properly marked and a case number is assigned. The analysis and comparison of known and questioned voice samples may take several hours or days to complete, depending on the number of samples involved and the complexity of the examination. Both an aural (listening) and a visual (spectrographic) examination and comparison are conducted. If the voices are in fact the same, the aural and spectrographic cues examined should complement one another.

As with the identification of fingerprints, there is presently no universal standard for the number of words required for identification. The minimum does, however, vary by agency: 10 for some and 20 for others. The Internal Revenue Service has chosen to use 20 or more like speech sounds between an unknown and a known sample, with the degree of certainty based on the quality and excellence of the evidence examined. Obtaining a second, independent decision is standard practice in this field, as in other forensic sciences.
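The idea of a minimum count of like words can be sketched as a simple transcript check made before any comparison begins. The code below is a hypothetical illustration only: the function names, the word-matching rule and the sample phrases are assumptions, and it does not represent any agency's actual procedure, which rests on examiner judgment of like speech sounds rather than on text matching.

```python
import re
from collections import Counter


def count_like_words(questioned, known):
    """Count words that occur in both transcripts, matching each
    occurrence in one transcript with at most one in the other."""
    q_words = Counter(re.findall(r"[a-z']+", questioned.lower()))
    k_words = Counter(re.findall(r"[a-z']+", known.lower()))
    # Counter intersection keeps the smaller count for each shared word.
    return sum((q_words & k_words).values())


def meets_minimum(questioned, known, minimum=20):
    """Return True if the transcripts share at least `minimum` like words."""
    return count_like_words(questioned, known) >= minimum


# Hypothetical transcripts, invented for this example.
questioned_text = "Put the money in the bag and leave it by the door"
known_text = "Put the money in the bag"

shared = count_like_words(questioned_text, known_text)
enough_for_ten = meets_minimum(questioned_text, known_text, minimum=10)
```

Here the known exemplar shares only six words with the questioned call, so it falls short of either the 10-word or the 20-word minimum and more exemplar material would be needed.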

Visual comparison of spectrograms involves, in general, the examination of spectrographic features of like sounds as portrayed in spectrograms in terms of time, frequency and amplitude. Specific features, the result of producing consonants, vowels and semi-vowels in isolation or in combination (co-articulation), include, but are certainly not limited to, the following cues: pitch, bandwidth, mean frequency, trajectory of vowel formants, distribution of formant energy, nasal resonance, stops, plosives, fricatives, pauses, interformant features and other idiosyncratic and pathological features.

Special aural comparison tapes are prepared to facilitate comparison of psycholinguistic features via short-term memory. Aural cues compared include resonance quality, pitch, temporal factors, inflection, dialect, articulation, syllable grouping, breath pattern, disguise, pathologies and other peculiar speech characteristics.

Some agencies offer court testimony; others do not. The IRS laboratory is the only federal agency that presently offers testimony. All other certified examiners, whether in state agencies or in private practice, also offer court testimony.


Court Admissibility

Court testimony involving aural-spectrographic voice comparison essentially started having an impact on the courts after the Tosi Study in December 1970. Since then there have been between 150 and 200 trials in local, state or federal courts. Because of differences in evidentiary philosophy, some courts have admitted aural-spectrographic voice evidence and others have not.

There are two general "rules" or "standards" by which scientific evidence is accepted in courts of law in the United States. The first, commonly referred to as the Frye "rule" or "test," is based on a 1923 District of Columbia case and basically requires "general acceptance in the particular field in which it belongs." See Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (1923). The second is based on the argument of McCormick (See "McCormick on Evidence," 3rd Ed., 203 at 608.) McCormick states: "General scientific acceptance is a proper condition for taking judicial notice of scientific facts, but it is not a suitable criterion for the admissibility of scientific evidence. Any relevant conclusion supported by a qualified expert witness should be received unless there are distinct reasons for exclusion." See Rule 702 of the Federal Rules of Evidence.

Many state and federal courts have abandoned Frye and adopted the argument of McCormick. The supreme courts of Minnesota, Maine, Ohio and Rhode Island have admitted aural-spectrographic voice evidence following McCormick. Intermediate appellate courts in California, Maryland and Michigan admitted such evidence following Frye but were reversed by their respective supreme courts, which held that the Frye test had not been met. The Massachusetts Supreme Court held aural-spectrographic voice evidence admissible applying the Frye test, while those of Arizona, Indiana and Pennsylvania did not.

In the federal court system, we are aware of 30 trials in which the question of aural-spectrographic voice evidence was addressed. All but three admitted the evidence based on Frye or McCormick. On appeal, the Second, Fourth and Sixth Circuits held the evidence admissible, applying McCormick, while the District of Columbia Circuit did not, applying Frye. See United States v. Williams, 583 F.2d 1194 (2d Cir.), cert. denied 439 U.S. 1117 (1978); United States v. Baller, 519 F.2d 463 (4th Cir.), cert. denied 423 U.S. 1019 (1975); United States v. Franks, 511 F.2d 25 (6th Cir.), cert. denied 422 U.S. 1042 (1975); and United States v. McDaniel, 538 F.2d 408 (D.C. Cir. 1976).

In United States v. Williams, supra at 1198, the court said: "The 'Frye' test is usually construed as necessitating a survey and categorization of the subjective views of a number of scientists, assuring thereby a reserve of experts available to testify. Difficulty in applying the 'Frye' test has led a number of courts to its implicit modification." Also see United States v. Baller, supra at n.6.

Since 1970, aural-spectrographic voice identification has been reliably applied in the investigation of several thousand cases. While there is disagreement on the reliability of the method under all conditions, there is agreement that voices can be identified and eliminated when the proper conditions exist and the analysis is carefully conducted by qualified examiners.

Several state appellate and supreme courts have admitted the evidence, as have three of four federal appellate courts. The United States Supreme Court has refused to review and decide the three cases brought before it. While the admission of aural-spectrographic voice evidence continues to be decided in various courts, the method continues to be a very important tool in the arsenal against crime.

Other areas of acoustic analysis include, in part, gunshot analysis, tape enhancement and tape authentication. While these are not discussed in this article, it should be noted that analysis related to these problems is available in some laboratories.

 

By:  Steve Cain  Email: info@tapeexpert.com