pattern has to be recognized or classified into a category that represents a meaning to a human. Every acoustic signal can be broken into smaller, more basic sub-signals. As the complex sound signal is broken into smaller sub-sounds, a hierarchy of levels is created: at the top level are complex sounds, which are composed of simpler sounds at the level below, and each successively lower level contains shorter and more basic sounds. At the lowest level, where the sounds are most fundamental, a machine checks simple, probabilistic rules for what each sound should represent. Once these sounds are assembled into more complex sounds at the level above, a new set of more deterministic rules predicts what each new complex sound should represent. The topmost level of deterministic rules works out the meaning of complex expressions. To expand our knowledge of speech recognition, we need to take neural networks into consideration. There are four steps of neural network approaches:
(GMM-HMM) technology based on generative models of speech trained discriminatively. A number of key difficulties had been methodologically analyzed in the 1990s, including gradient diminishing and weak temporal correlation structure in the neural predictive models. All these difficulties were in addition to the lack of big training data and big computing power in these early days. Most speech recognition researchers who understood such barriers subsequently moved away from neural nets to pursue generative modeling approaches until the recent resurgence of deep learning, starting around 2009–2010, that overcame all these difficulties. Hinton et al. and Deng et al. reviewed part of this recent history about how their collaboration with each other, and then with colleagues across four groups (University of Toronto, Microsoft, Google, and IBM), ignited a renaissance of applications of deep feedforward neural networks for speech recognition.
(Publisher: Springer) written by Microsoft researchers D. Yu and L. Deng and published near the end of 2014, with highly mathematically oriented technical detail on how deep learning methods are derived and implemented in modern speech recognition systems based on DNNs and related deep learning methods. A related book, published earlier in 2014, "Deep Learning: Methods and Applications" by L. Deng and D. Yu provides a less technical but more methodology-focused overview of DNN-based speech recognition during 2009–2014, placed within the more general context of deep learning applications including not only speech recognition but also image recognition, natural language processing, information retrieval, multimodal processing, and multitask learning.
assumptions and can learn all the components of a speech recognizer, including the pronunciation, acoustic and language models, directly. This means that, during deployment, there is no need to carry around a language model, making the approach very practical for applications with limited memory. By the end of 2016, attention-based models had seen considerable success, including outperforming CTC models (with or without an external language model). Various extensions have been proposed since the original LAS model. Latent
Sequence Decompositions (LSD) was proposed by
The report also concluded that adaptation greatly improved the results in all cases and that the introduction of models for breathing was shown to improve recognition scores significantly. Contrary to what might have been expected, no effects of the speakers' broken English were found. It was evident that spontaneous speech caused problems for the recognizer, as might have been expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.
Individuals with learning disabilities who have problems with thought-to-paper communication (essentially, they think of an idea, but it is processed incorrectly and ends up differently on paper) can possibly benefit from the software, but the technology is not bug-proof. Moreover, the whole idea of speech-to-text can be hard for intellectually disabled persons, because it is rare that anyone takes the time to learn the technology in order to teach it to the person with the disability.
keyboard and mouse: voice-based navigation provides only modest ergonomic benefits. By contrast, many highly customized systems for radiology or pathology dictation implement voice "macros", where the use of certain phrases – e.g., "normal report", will automatically fill in a large number of default values and/or generate boilerplate, which will vary with the type of the exam – e.g., a chest X-ray vs. a gastrointestinal contrast series for a radiology system.
in their 2012 review paper). A Microsoft research executive called this innovation "the most dramatic change in accuracy since 1979". In contrast to the steady incremental improvements of the past few decades, the application of deep learning decreased word error rate by 30%. This innovation was quickly adopted across the field. Researchers have begun to use deep learning techniques for language modeling as well.
Transformers, a type of neural network based solely on "attention", have been widely adopted in computer vision and language modeling, sparking interest in adapting such models to new domains, including speech recognition. Some recent papers have reported superior performance levels using transformer models for speech recognition, but these models usually require large-scale training datasets to reach high performance levels.
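As a rough illustration of the "attention" mechanism these models are built on, here is a minimal scaled dot-product attention for a single query; the vectors are invented toy values, not any particular system's parameters:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    weight each value row by softmax(query . key / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # output is the weight-averaged value vector
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# a query aligned with the first key attends almost entirely to the first value
out = attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In a full transformer, queries, keys and values are learned linear projections of the input sequence and many such attention heads run in parallel; this sketch shows only the core weighting step.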
, employs a speaker-dependent system, requiring each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot
(or an approximation thereof): instead of taking the source sentence with maximal probability, we try to take the sentence that minimizes the expectation of a given loss function with respect to all possible transcriptions (i.e., we take the sentence that minimizes the average distance to other possible sentences, weighted by their estimated probability). The loss function is usually the
(how often it vibrates per second). Accuracy can be computed with the help of the word error rate (WER), which is calculated by aligning the recognized word sequence with the reference word sequence using dynamic string alignment. A complication in computing the word error rate is that the recognized and reference word sequences can differ in length.
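The alignment just described is a standard Levenshtein (edit-distance) computation over words; a minimal sketch, with a made-up function name and example sentences:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference
    length, found by dynamic string alignment of the two word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# one deleted word out of six reference words -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

The dynamic-programming table handles the length mismatch mentioned above, since insertions and deletions are scored explicitly.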
dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames, e.g. 10 ms segments, and processing each frame as a single unit. Although DTW would be superseded by later algorithms, the technique carried on. Achieving speaker independence remained unsolved in this period.
speech recognition task should be possible. In practice, this is rarely the case. The FAA document 7110.65 details the phrases that should be used by air traffic controllers. While this document gives fewer than 150 examples of such phrases, the number of phrases supported by one simulation vendor's speech recognition system is in excess of 500,000.
/other injuries to the upper extremities can be relieved from having to worry about handwriting, typing, or working with a scribe on school assignments by using speech-to-text programs. They can also use speech recognition technology to search the Internet or use a computer at home without having to physically operate a mouse and keyboard.
independently discovered the application of HMMs to speech.) This was controversial with linguists since HMMs are too simplistic to account for many common features of human languages. However, the HMM proved to be a highly useful way for modeling speech and replaced dynamic time warping to become the dominant speech recognition algorithm in the 1980s.
devices are also accessible to visitors to the building, or even those outside the building if they can be heard inside. Attackers may be able to gain access to personal information, like calendar, address book contents, private messages, and documents. They may also be able to impersonate the user to send messages or make online purchases.
, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems), each
(HMM) for speech recognition. James Baker had learned about HMMs from a summer job at the Institute of Defense Analysis during his undergraduate education. The use of HMMs allowed researchers to combine different sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model.
sector, speech recognition can be implemented in front-end or back-end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing
One approach to this limitation was to use neural networks as a pre-processing, feature transformation, or dimensionality reduction step prior to HMM-based recognition. However, more recently, LSTM and related recurrent neural networks (RNNs), time delay neural networks (TDNNs), and transformers have
system at CMU. The Sphinx-II system was the first to do speaker-independent, large vocabulary, continuous speech recognition and it had the best performance in DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone in the history of speech recognition. Huang
Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob; Houlsby, Neil (3 June 2021). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
Speech recognition can become a means of attack, theft, or accidental operation. For example, activation words like "Alexa" spoken in an audio or video broadcast can cause devices in homes and offices to start listening for input inappropriately, or possibly take an unwanted action. Voice-controlled
Read vs. Spontaneous Speech – When a person reads, it is usually in a context that has been previously prepared; but when a person uses spontaneous speech, it is difficult to recognize the speech because of disfluencies (like "uh" and "um", false starts, incomplete sentences, stuttering, coughing,
Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog that the controller would
Typically a manual control input, for example by means of a finger control on the steering-wheel, enables the speech recognition system and this is signaled to the driver by an audio prompt. Following the audio prompt, the system has a "listening window" during which it may accept a speech input for
Two attacks have been demonstrated that use artificial sounds. One transmits ultrasound and attempts to send commands without nearby people noticing. The other adds small, inaudible distortions to other speech or music that are specially crafted to confuse the specific speech recognition system into
This type of technology can help those with dyslexia, but its usefulness for other disabilities is still in question. Inconsistent accuracy is what hinders the product's effectiveness: although a child may be able to say a word, depending on how clearly they say it, the technology may think they are
The USAF, USMC, US Army, US Navy, and FAA as well as a number of international ATC training organizations such as the Royal
Australian Air Force and Civil Aviation Authorities in Italy, Brazil, and Canada are currently using ATC simulators with speech recognition from a number of different vendors.
with multiple hidden layers of units between the input and output layers. Similar to shallow neural networks, DNNs can model complex non-linear relationships. DNN architectures generate compositional models, where extra layers enable composition of features from lower layers, giving a huge learning
In 2017, Microsoft researchers reached a historic milestone of human parity in transcribing conversational telephony speech on the widely benchmarked
Switchboard task. Multiple deep learning models were used to optimize speech recognition accuracy. The speech recognition word error rate was reported
and his students at the
University of Toronto and by Li Deng and colleagues at Microsoft Research, initially in the collaborative work between Microsoft and the University of Toronto which was subsequently expanded to include IBM and Google (hence "The shared views of four research groups" subtitle
pronunciation researchers are primarily interested in improving L2 learners' intelligibility and comprehensibility, but they have not yet collected sufficient amounts of representative and reliable data (speech recordings with corresponding annotations and judgments) indicating which errors affect
techniques offer the potential to eliminate the need for a person to act as a pseudo-pilot, thus reducing training and support personnel. In theory, air traffic controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the
As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition
aircraft, and other programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and
or EHR). The use of speech recognition is more naturally suited to the generation of narrative text, as part of a radiology/pathology interpretation, progress note or discharge summary: the ergonomic gains of using speech recognition to enter structured discrete data (e.g., numeric values or codes
A success of DNNs in large vocabulary speech recognition occurred in 2010 by industrial researchers, in collaboration with academic researchers, where large output layers of the DNN based on context dependent HMM states constructed by decision trees were adopted. See comprehensive reviews of this
Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another he or she were walking more quickly, or even if
recognition, and speaker independence was considered a major breakthrough. Until then, systems required a "training" period. A 1987 ad for a doll had carried the tagline "Finally, the doll that understands you." – despite the fact that it was described as "which children could train to respond to
team created a voice-activated typewriter called
Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach put less emphasis on emulating the way the human brain processes and understands speech in favor of using statistical modeling techniques like HMMs. (Jelinek's group
Speech recognition by machine is a very complex problem, however. Vocalizations vary in terms of accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed. Speech is distorted by background noise, echoes, and electrical characteristics. Accuracy of speech recognition may
People with disabilities can benefit from speech recognition programs. For individuals who are Deaf or Hard of
Hearing, speech recognition software is used to automatically generate closed captioning of conversations, such as discussions in conference rooms, classroom lectures, and/or religious
and to use raw features. This principle was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features, showing its superiority over the Mel-Cepstral features which contain a few stages of fixed transformation from spectrograms. The
since at least 2006. This technology allows analysts to search through large volumes of recorded conversations and isolate mentions of keywords. Recordings can be indexed and analysts can run queries over the database to find conversations of interest. Some government research programs focused on
A more significant issue is that most EHRs have not been expressly tailored to take advantage of voice-recognition capabilities. A large part of the clinician's interaction with the EHR involves navigation through the user interface using menus, and tab/button clicks, and is heavily dependent on
make fewer explicit assumptions about feature statistical properties than HMMs and have several qualities making them more attractive recognition models for speech recognition. When used to estimate the probabilities of a speech feature segment, neural networks allow discriminative training in a
Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system
Modern general-purpose speech recognition systems are based on hidden Markov models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary
into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems. Systems that use training are called "speaker dependent".
This hierarchy of constraints is exploited. By combining decisions probabilistically at all lower levels, and making more deterministic decisions only at the highest level, speech recognition by a machine is a process broken into several phases. Computationally, it is a problem in which a sound
in 2016. The model named "Listen, Attend and Spell" (LAS), literally "listens" to the acoustic signal, pays "attention" to different parts of the signal and "spells" out the transcript one character at a time. Unlike CTC-based models, attention-based models do not have conditional-independence
Simple voice commands may be used to initiate phone calls, select radio stations or play music from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition capabilities vary between car make and model. Some of the most recent car models offer natural-language speech
natural and efficient manner. However, in spite of their effectiveness in classifying short-time units such as individual phonemes and isolated words, early neural networks were rarely successful for continuous recognition tasks because of their limited ability to model temporal dependencies.
also uses the same features, most of the same front-end processing, and the same classification techniques as speech recognition. A comprehensive textbook, "Fundamentals of
Speaker Recognition", is an in-depth source for up-to-date details on the theory and practice. A good insight into the
Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s. Since then, neural networks have been used in many aspects of speech recognition such as phoneme classification, phoneme classification through multi-objective evolutionary algorithms, isolated word
and can learn "Very Deep
Learning" tasks that require memories of events that happened thousands of discrete time steps ago, which is important for speech. Around 2007, LSTM trained by Connectionist Temporal Classification (CTC) started to outperform traditional speech recognition in certain
e.g. the 26 letters of the English alphabet are difficult to discriminate because they are confusable words (most notoriously the E-set: "B, C, D, E, G, P, T, V", plus "Z" when it is pronounced "zee" rather than "zed", depending on the English region); an 8% error rate is considered good for this
assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, the main application of this technology is
A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions. That is, the sequences are "warped"
appliance control, search key words (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), determining speaker characteristics, speech-to-text processing (e.g.,
In the long history of speech recognition, both shallow and deep forms (e.g. recurrent nets) of artificial neural networks were explored for many years during the 1980s, 1990s and a few years into the 2000s. But these methods never won out over the non-uniform internal-handcrafting
system, the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the editor, where the draft is edited and report finalized. Deferred speech recognition is widely used in the industry currently.
Speech recognition is also very useful for people who have difficulty using their hands, ranging from mild repetitive stress injuries to involved disabilities that preclude using conventional computer input devices. In fact, people who used the keyboard a lot and developed
Speech recognition can allow students with learning disabilities to become better writers. By saying the words aloud, they can increase the fluidity of their writing, and be alleviated of concerns regarding spelling, punctuation, and other mechanics of writing. Also, see
, the first end-to-end sentence-level lipreading model, using spatiotemporal convolutions coupled with an RNN-CTC architecture, surpassing human-level performance in a restricted grammar dataset. A large-scale CNN-RNN-CTC architecture was presented in 2018 by
Analysis of the four-step neural network approaches can be explained by further information. Sound is produced by vibration of air (or some other medium), which humans register with their ears and machines with receivers. A basic sound creates a wave which has two descriptions:
The use of voice recognition software, in conjunction with a digital audio recorder and a personal computer running word-processing software has proven to be positive for restoring damaged short-term memory capacity, in stroke and craniotomy individuals.
, or MLLT). Many systems use so-called discriminative training techniques that dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum
assumptions similar to a HMM. Consequently, CTC models can directly learn to map speech acoustics to English characters, but the models make many common spelling mistakes and must rely on a separate language model to clean up the transcripts. Later,
bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech to text systems, based on
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., ... & Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing
development and of the state of the art as of October 2014 in the recent Springer book from Microsoft Research. See also the related background of automatic speech recognition and the impact of various machine learning paradigms, notably including
environment as well as in the jet fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot, in general, does not wear a
are important strategies for reusing and extending the capabilities of deep learning models, particularly because of the high cost of training models from scratch and the small size of the available corpora in many languages and/or specific domains.
The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems.
) provides for substantial financial benefits to physicians who utilize an EMR according to "Meaningful Use" standards. These standards require that a substantial amount of data be maintained by the EMR (now more commonly referred to as an
Caridakis, George; Castellano, Ginevra; Kessous, Loic; Raouzaiou, Amaryllis; Malatesta, Lori; Asteriadis, Stelios; Karpouzis, Kostas (19 September 2007). "Multimodal emotion recognition from expressive faces, body gestures and speech".
are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling is also used in many other natural language processing applications such as
) or have very low vision can benefit from using the technology to convey words and then hear the computer recite them, as well as use a computer by commanding with their voice, instead of having to look at the screen and keyboard.
there were accelerations and deceleration during the course of one observation. DTW has been applied to video, audio, and graphics – indeed, any data that can be turned into a linear representation can be analyzed with DTW.
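The warping idea above can be sketched with a toy dynamic-programming implementation, assuming 1-D sequences and absolute difference as the local cost (real recognizers compare frames of acoustic feature vectors instead):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences that may
    vary in time or speed. Each step may repeat ("warp") an element
    of either sequence, so a slow and a fast rendition can align."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: stretch a, stretch b, or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

slow = [0, 0, 1, 1, 2, 2, 3, 3]   # same shape at half speed
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))    # 0.0: DTW aligns them despite the speed change
```

An early DTW recognizer would compute this distance between the incoming utterance and a stored template for each vocabulary word, then pick the closest template.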
The earliest applications of speech recognition software were dictation ... Four months ago, IBM introduced a 'continual dictation product' designed to ... debuted at the National Business Travel Association trade show in
to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model, which includes both the acoustic and language model information and combining it statically beforehand (the
is required for all HMM-based systems, and a typical n-gram language model often takes several gigabytes in memory making them impractical to deploy on mobile devices. Consequently, modern commercial ASR systems from
recognition in place of a fixed set of commands, allowing the driver to use full sentences and common phrases. With such systems there is, therefore, no need for the user to memorize a set of fixed command words.
, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
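A minimal sketch of the per-state emission likelihood described above, i.e. a mixture of diagonal-covariance Gaussians scoring one observed feature vector; all parameter values here are invented toy numbers, not trained model parameters:

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x:
    with a diagonal covariance, dimensions factor into independent 1-D terms."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_state_loglik(x, weights, means, variances):
    """Log-likelihood that one HMM state (a weighted mixture of diagonal
    Gaussians) emitted the observed feature vector x."""
    return math.log(sum(w * math.exp(diag_gauss_logpdf(x, m, v))
                        for w, m, v in zip(weights, means, variances)))

# toy 2-component mixture over 2-dimensional features
weights = [0.6, 0.4]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_state_loglik([0.1, -0.2], weights, means, variances))
```

In a full recognizer, each state of each concatenated phoneme HMM carries its own mixture parameters, and these per-state likelihoods feed the Viterbi search over state sequences. (Production systems sum in the log domain to avoid underflow; the direct `exp`/`log` here is only for clarity.)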
Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of
, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. The L&H speech technology was used in the
Shillingford, Brendan; Assael, Yannis; Hoffman, Matthew W.; Paine, Thomas; Hughes, Cían; Prabhu, Utsav; Liao, Hank; Sak, Hasim; Rao, Kanishka (13 July 2018). "Large-Scale Visual Speech Recognition".
A good and accessible introduction to speech recognition technology and its history is provided by the general audience book "The Voice in the Machine. Building Computers That Understand Speech" by
, though it can be different distances for specific tasks; the set of possible transcriptions is, of course, pruned to maintain tractability. Efficient algorithms have been devised to rescore
operating system. L&H was an industry leader until an accounting scandal brought an end to the company in 2001. The speech technology from L&H was bought by ScanSoft which became
Much of the progress in the field is owed to the rapidly increasing capabilities of computers. At the end of the DARPA program in 1976, the best computer available to researchers was the
to normalize for a different speaker and recording conditions; for further speaker normalization, it might use vocal tract length normalization (VTLN) for male-female normalization and
When you speak to someone, they don't just recognize what you say: they recognize who you are. WhisperID will let computers do that, too, figuring out who you are by the way you sound.
systems. Despite the high level of integration with word processing in general personal computing, in the field of document production, ASR has not seen the expected increases in use.
2013 IEEE International Conference on Acoustics, Speech and Signal Processing: New types of deep neural network learning for speech recognition and related applications: An overview
, second edition published in 2004, and "Speech Processing: A Dynamic and Optimization-Oriented Approach" published in 2003 by Li Deng and Doug O'Shaughnessey. The updated textbook
End-to-end models jointly learn all the components of the speech recognizer. This is valuable since it simplifies the training process and deployment process. For example, a
, Connectionist Speech Recognition: A Hybrid Approach, The Kluwer International Series in Engineering and Computer Science; v. 247, Boston: Kluwer Academic Publishers, 1994.
e.g. the 10 digits "zero" to "nine" can be recognized essentially perfectly, but vocabulary sizes of 200, 5000 or 100000 may have error rates of 3%, 7%, or 45% respectively.
Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the
– Three Bell Labs researchers, Stephen Balashek, R. Biddulph, and K. H. Davis built a system called "Audrey" for single-speaker digit recognition. Their system located the
Lohrenz, Timo; Li, Zhengyang; Fingscheidt, Tim (14 July 2021). "Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition".
With continuous speech, naturally spoken sentences are used; the speech is therefore harder to recognize than either isolated or discontinuous speech.
From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in
applications. In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which is now available through
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (24 May 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach.
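The HMM-based decoding that displaced DTW is typically carried out with the Viterbi algorithm. A minimal sketch, assuming a toy two-state model whose states, transition probabilities, and emission probabilities are all invented here for illustration (real acoustic models have thousands of context-dependent states):

```python
import math

# Toy HMM: all numbers below are illustrative, not from any real model.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

def viterbi(obs):
    """Return the most likely hidden-state sequence for an observation list."""
    # V[t][s] = log-probability of the best path ending in state s at time t
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans[p][s])) for p in states),
                key=lambda x: x[1])
            V[t][s] = score + math.log(emit[s][obs[t]])
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In a recognizer, the same dynamic program runs over HMM states of phoneme models rather than this two-symbol toy alphabet.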
Kriman, Samuel; Beliaev, Stanislav; Ginsburg, Boris; Huang, Jocelyn; Kuchaiev, Oleksii; Lavrukhin, Vitaly; Leary, Ryan; Li, Jason; Zhang, Yang (22 October 2019),
Wu, Haiping; Xiao, Bin; Codella, Noel; Liu, Mengchen; Dai, Xiyang; Yuan, Lu; Zhang, Lei (29 March 2021). "CvT: Introducing Convolutions to Vision Transformers".
Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the
With discontinuous speech, full sentences separated by silence are used; as with isolated speech, this makes the speech easier to recognize.
deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without the use of a human operator. The technology was developed by
(the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).
can be useful to acquire basic knowledge but may not be fully up to date (1993). Another good source can be "Statistical Methods for Speech Recognition" by
By this point, the vocabulary of the typical commercial speech recognition system was larger than the average human vocabulary. Raj Reddy's former student,
Deng, L.; Hassanein, K.; Elmasry, M. (1994). "Analysis of the correlation structure for a neural predictive model with application to speech recognition".
When Mozilla redirected funding away from the project in 2020, it was forked by its original developers as Coqui STT using the same open-source license.
/ACM Transactions on Audio, Speech and Language Processing—after merging with an ACM publication), Computer Speech and Language, and Speech Communication.
to map audio signals directly into words, produce word and phrase confidence scores very closely correlated with genuine listener intelligibility. In the
and a CTC layer. Jointly, the RNN-CTC model learns the pronunciation and acoustic model together; however, it is incapable of learning the language due to
) to rate these good candidates so that we may pick the best one according to this refined score. The set of candidates can be kept either as a list (the
NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).
A possible improvement to decoding is to keep a set of good candidates instead of just keeping the best candidate, and to use a better scoring function (
Dahl, George E.; Yu, Dong; Deng, Li; Acero, Alex (2012). "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition".
to be as low as 4 professional human transcribers working together on the same benchmark, which was funded by IBM Watson speech team on the same task.
Deng, L.; Hinton, G.; Kingsbury, B. (2013). "New types of deep neural network learning for speech recognition and related applications: An overview".
Joshi, Raviraj; Singh, Anupam (May 2022). Malmasi, Shervin; Rokhlenko, Oleg; Ueffing, Nicola; Guy, Ido; Agichtein, Eugene; Kallumadi, Surya (eds.).
An alternative approach to CTC-based models is attention-based models. Attention-based ASR models were introduced simultaneously by Chan et al. of
Morgan, Bourlard, Renals, Cohen, Franco (1993) "Hybrid neural network/hidden Markov model systems for continuous speech recognition. ICASSP/IJPRAI"
Ristea, Nicolae-Catalin; Ionescu, Radu Tudor; Khan, Fahad Shahbaz (20 June 2022). "SepTr: Separable Transformer for Audio Spectrogram Processing".
Further research needs to be conducted to determine cognitive benefits for individuals whose AVMs have been treated using radiologic techniques.
Hinton, Geoffrey; Deng, Li; Yu, Dong; Dahl, George; Mohamed, Abdel-Rahman; Jaitly, Navdeep; Senior, Andrew; Vanhoucke, Vincent; Nguyen, Patrick;
, a telephone-based directory service. The recordings from GOOG-411 produced valuable data that helped Google improve their recognition systems.
toolkit is one place to start to both learn about speech recognition and to start experimenting. Another resource (free but copyrighted) is the
, and even allows the pilot to assign targets to his aircraft with two simple voice commands or to any of his wingmen with only five commands.
) but instead, knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and ideally their
Assael, Yannis; Shillingford, Brendan; Whiteson, Shimon; de Freitas, Nando (5 November 2016). "LipNet: End-to-End Sentence-level Lipreading".
Sarangi, Susanta; Sahidullah, Md; Saha, Goutam (September 2020). "Optimization of data-driven filterbank for automatic speaker verification".
saying another word and entering the wrong one, giving the user more work and more time spent fixing the incorrectly recognized word.
Chorowski, Jan; Jaitly, Navdeep (8 December 2016). "Towards better decoding and language model integration in sequence to sequence models".
techniques used in the best modern systems can be gained by paying attention to government sponsored evaluations such as those organised by
Li, Jason; Lavrukhin, Vitaly; Ginsburg, Boris; Leary, Ryan; Kuchaiev, Oleksii; Cohen, Jonathan M.; Nguyen, Huyen; Gadde, Ravi Teja (2019).
launched two CNN-CTC ASR models, Jasper and QuartzNet, with an overall performance WER of 3%. Similar to other deep learning applications,
; Kingsbury, Brian (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The shared views of four research groups".
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017).
Wu, J.; Chan, C. (1993). "Isolated Word Recognition by Neural Network Models with Cross-Correlation Coefficients for Speech Dynamics".
(CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels.
The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications (Human Factors and Ergonomics)
expanded on the work with extremely large datasets and demonstrated some commercial success in Chinese Mandarin and English. In 2016,
true "raw" features of speech, waveforms, have more recently been shown to produce excellent larger-scale speech recognition results.
wrote an open letter that was critical of and defunded speech recognition research. This defunding lasted until Pierce retired and
these speech dimensions and which do not. These data are essential to train ASR algorithms to assess L2 learners' intelligibility.
The performance of speech recognition systems is usually evaluated in terms of accuracy and speed. Accuracy is usually rated with
Adverse conditions – environmental noise (e.g. noise in a car or a factory) and acoustical distortions (e.g. echoes, room acoustics)
Some speech recognition systems require "training" (also called "enrollment") where an individual speaker reads text or isolated
Waibel, A.; Hanazawa, T.; Hinton, G.; Shikano, K.; Lang, K. J. (1989). "Phoneme recognition using time-delay neural networks".
in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing
recognizing music as speech, or to make what sounds like one command to a human sound like a different command to the system.
In the 2000s DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and
Chung, Joon Son; Senior, Andrew; Vinyals, Oriol; Zisserman, Andrew (16 November 2016). "Lip Reading Sentences in the Wild".
e.g. known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at a lower level;
Common European framework of reference for languages learning, teaching, assessment: Companion volume with new descriptors
Compare "four" given as "F AO R" with the vowel AO as in "caught," to "row" given as "R OW" with the vowel OW as in "oat."
Forgrave, Karen E. "Assistive Technology: Empowering Students with Disabilities." Clearing House 75.3 (2002): 122–6. Web.
Hair, Adam; et al. (19 June 2018). "Apraxia world: A speech therapy game for children with speech sound disorders".
only 16% of the variability in word-level intelligibility can be explained by the presence of obvious mispronunciations.
Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). "Speech recognition with deep recurrent neural networks".
Achievements and Challenges of Deep Learning: From Speech Analysis and Recognition To Language and Multimodal Processing
"A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol"
"Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction"
; Morgan, N.; O'Shaughnessy, D. (2009). "Developments and Directions in Speech Recognition and Understanding, Part 1".
non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
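The non-linear stretching can be sketched with the classic dynamic time warping recurrence. A minimal illustration on 1-D sequences with absolute difference as the local cost (a real recognizer would compare frame-level feature vectors instead of scalars):

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences: each step may
    advance in a, in b, or in both, so one sequence can be stretched
    non-linearly to match the other."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])        # local frame distance
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # matched step
    return D[n][m]

# A slowed-down copy of the same contour aligns with zero cost even
# though the sequences have different lengths:
fast = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
```

Because the warp absorbs differences in speaking rate, the same word spoken quickly or slowly maps to nearly the same template distance.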
(so that phonemes with different left and right context would have different realizations as HMM states); it would use
As mentioned earlier in this article, the accuracy of speech recognition may vary depending on the following factors:
to gather a big database of voices that would help build the free speech recognition project DeepSpeech (available free at
and simulation. In telephony systems, ASR is now being predominantly used in contact centers by integrating it with
's first effort at speech recognition came in 2007 after hiring some researchers from Nuance. The first product was
joined Apple where, in 1992, he helped develop a speech interface prototype for the Apple computer known as Casper.
Results have been encouraging, and voice applications have included: control of communication radios, setting of
extended LAS to "Watch, Listen, Attend and Spell" (WLAS) to handle lip reading surpassing human-level performance.
1025:
6340:
6254:
9172:
8136:
7789:
5688:
5159:
4425:
2347:
2337:
1172:
732:
7155:
6884:
Gerbino, E.; Baggia, P.; Ciaramella, A.; Rullent, C. (1993). "Test and evaluation of a spoken dialogue system".
6384:
770:
being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of
[Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing
and signing off on the document. Back-end or deferred speech recognition is where the provider dictates into a
. IFIP the International Federation for Information Processing. Vol. 247. Springer US. pp. 375–388.
Tang, K. W.; Kamoua, Ridha; Sutan, Victor (2004). "Speech Recognition Technology for Disabilities Education".
Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in
Since 2014, there has been much research interest in "end-to-end" ASR. Traditional phonetic-based (i.e., all
"Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype"
The problems of achieving high recognition accuracy under stress and noise are particularly relevant in the
originally licensed software from Nuance to provide speech recognition capability to its digital assistant
"Using Speech Recognition Software to Increase Writing Fluency for Individuals with Physical Disabilities"
. Speech is used mostly as a part of a user interface, for creating predefined or custom speech commands.
(as of 2017) are deployed on the cloud and require a network connection as opposed to the device locally.
Chan, William; Zhang, Yu; Le, Quoc; Jaitly, Navdeep (10 October 2016). "Latent Sequence Decompositions".
Yu, D.; Deng, L. (2014). "Automatic Speech Recognition: A Deep Learning Approach (Publisher: Springer)".
and from Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat
-based model) approaches required separate components and training for the pronunciation, acoustic, and
"Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition"
""There's No Data Like More Data": Automatic Speech Recognition and the Making of Algorithmic Culture"
Maners said IBM has worked on advancing speech recognition ... or on the floor of a noisy trade show.
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets
Popular speech recognition conferences held each year or two include SpeechTEK and SpeechTEK Europe,
– The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts.
such as voice dialing (e.g. "call home"), call routing (e.g. "I would like to make a collect call"),
Karat, Clare-Marie; Vergo, John; Nahamoo, David (2007). "Conversational Interface Technologies". In
. Cambridge Studies in Natural Language Processing. Vol. XII–XIII. Cambridge University Press.
Vowel Classification for Computer based Visual Feedback for Speech Training for the Hearing Impaired
all participated in the program. This revived speech recognition research after John Pierce's letter.
"Reading Coach in Immersive Reader plus new features coming to Reading Progress in Microsoft Teams"
"Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition"
"Optimisation of phonetic aware speech recognition through multi-objective evolutionary algorithms"
containing 260 hours of recorded conversations from over 500 speakers. The GALE program focused on
Ciaramella, Alberto. "A prototype performance evaluation report." Sundial workpackage 8000 (1993).
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays and Johan Schalkwyk (September 2015): "
; Uszkoreit, Hans; Varile, Giovanni Battista; Zaenen, Annie; Zampolli; Zue, Victor, eds. (1997).
With isolated speech, single words are used; this makes the speech easier to recognize.
Speech recognition in the JAS 39 Gripen aircraft: Adaptation to speech at different G-loads
"Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models"
, which since then has been a major venue for the publication of research on speech recognition.
used HMM to recognize languages (both in software and in hardware specialized processors, e.g.
Amodei, Dario (2016). "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin".
Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (
) are relatively minimal for people who are sighted and who can operate a keyboard and mouse.
Medeiros, Eduardo; Corado, Leonel; Rato, Luís; Quaresma, Paulo; Salgueiro, Pedro (May 2023).
"ISCA Medalist: For leadership and extensive contributions to speech and language processing"
In the early 2000s, speech recognition was still dominated by traditional approaches such as
, speech recognition research seeking a minimum vocabulary size of 1,000 words. They thought
"Letter Names Can Cause Confusion and Other Things to Know About Letter–Sound Relationships"
Bahdanau, Dzmitry (2016). "End-to-End Attention-based Large Vocabulary Speech Recognition".
Gong, Yuan; Chung, Yu-An; Glass, James (8 July 2021). "AST: Audio Spectrogram Transformer".
lead-in fighter trainer. These systems have produced word accuracy scores in excess of 98%.
One of the major issues relating to the use of speech recognition in healthcare is that the
"Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning"
, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber.
Billi, Roberto; Canavesio, Franco; Ciaramella, Alberto; Nebbia, Luciano (1 November 1995).
The key areas of growth were: vocabulary size, speaker independence, and processing speed.
Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).
book (and the accompanying HTK toolkit). For more recent and state-of-the-art techniques,
"Is it possible to control Amazon Alexa, Google Now using inaudible commands? Absolutely"
"Speaker Independent Connected Speech Recognition- Fifth Generation Computer Corporation"
Assessing authentic listener intelligibility is essential for avoiding inaccuracies from
"SpeeG2: A Speech- and Gesture-based Interface for Efficient Controller-free Text Entry"
. SpringerBriefs in Electrical and Computer Engineering. Singapore: Springer Singapore.
"A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data"
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
signal. In a short time scale (e.g., 10 milliseconds), speech can be approximated as a
was the first person to take on continuous speech recognition as a graduate student at
"Sequence labelling in structured domains with hierarchical recurrent neural networks"
"Robust text-independent speaker identification using Gaussian mixture speaker models"
"A TensorFlow implementation of Baidu's DeepSpeech architecture: mozilla/DeepSpeech"
became an urgent early market for speech recognition. Speech recognition is used in
The improvement of mobile processor speeds has made speech recognition practical in
with 4 MB of RAM. It could take up to 100 minutes to decode just 30 seconds of speech.
in systems that have been trained on a specific person's voice or it can be used to
in order to consistently achieve performance improvements in operational settings.
Transactions on Audio, Speech and Language Processing and since Sept 2014 renamed
demonstrated its 16-word "Shoebox" machine's speech recognition capability at the
"Computer says no: Irish vet fails oral English test needed to stay in Australia"
Maas, Andrew L.; Le, Quoc V.; O'Neil, Tyler M.; Vinyals, Oriol; Nguyen, Patrick;
A speaker-independent system is intended for use by any speaker (more difficult).
intelligence applications of speech recognition, e.g. DARPA's EARS's program and
"Directions for the future of technology in pronunciation research and teaching"
"The Power of Voice: A Conversation With The Head Of Google's Speech Technology"
"The Acoustics, Speech, and Signal Processing Society. A Historical Perspective"
Speaker-independent systems are also being developed and are under test for the
to directly emit sub-word units which are more natural than English characters;
Artificial Neural Networks and their Application to Speech/Sequence Recognition
Today, however, many aspects of speech recognition have been taken over by a
An application of recurrent neural networks to discriminative keyword spotting
"First-Hand: The Hidden Markov Model – Engineering and Technology History Wiki"
Gripen cockpit, Englund (2004) found recognition deteriorated with increasing
(MLLR) for more general speaker adaptation. The features would have so-called
. 15th International Conference on Multimodal Interaction. Sydney, Australia.
"Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR"
capacity and thus the potential of modeling complex patterns of speech data.
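As a minimal illustration of how such a deep feedforward network is applied framewise in a recognizer — every size, weight, and the helper names below are arbitrary choices for the sketch, not from any trained model — a plain-Python forward pass mapping one acoustic feature vector to per-phone posteriors:

```python
import math
import random

random.seed(0)

def layer(n_in, n_out):
    """Random weight matrix standing in for a trained layer."""
    return [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(x, weights):
    """DNN forward pass: ReLU hidden layers, softmax output, so the
    result can be read as a posterior distribution over phone classes."""
    for k, W in enumerate(weights):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
        if k < len(weights) - 1:
            x = [max(0.0, v) for v in x]        # hidden-layer non-linearity
    m = max(x)                                  # numerically stable softmax
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

# Toy sizes: a 13-dimensional feature frame mapped to 10 phone classes
# through two hidden layers (real systems use far larger networks).
weights = [layer(13, 32), layer(32, 32), layer(32, 10)]
posterior = forward([0.1] * 13, weights)
```

In a DNN-HMM hybrid, these per-frame posteriors replace the GMM likelihoods inside the HMM decoder.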
Automatic Speech Recognition – A Brief History of the Technology Development
Klatt, Dennis H. (1977). "Review of the ARPA speech understanding project".
"Automatic speech recognition–a brief history of the technology development"
have to conduct with pilots in a real ATC situation. Speech recognition and
Of particular note have been the US program in speech recognition for the
are also under investigation. A deep feedforward neural network (DNN) is an
"Modular Construction of Time-Delay Neural Networks for Speech Recognition"
Compute features of spectral-domain of the speech (with Fourier transform);
recognition, also called voice recognition, was clearly differentiated from
"Australian ex-news reader with English degree fails robot's English test"
Bird, Jordan J.; Wanner, Elizabeth; Ekárt, Anikó; Faria, Diego R. (2020).
Automatic speech recognition–a brief history of the technology development
The commercial cloud-based speech recognition APIs are broadly available.
While computing, the word recognition rate (WRR) is used. The formula is:
(MMI), minimum classification error (MCE), and minimum phone error (MPE).
Robustness in Automatic Speech Recognition: Fundamentals and Applications
Artificial Intelligence and Innovations 2007: From Theory to Applications
Proceedings of the 17th ACM Conference on Interaction Design and Children
Ellis Booker (14 March 1994). "Voice recognition enters the mainstream".
"A real-time recurrent error propagation network word recognition system"
P. Nguyen (2010). "Automatic classification of speaker characteristics".
"Edit-Distance of Weighted Automata: General Definitions and Algorithms"
e.g. a querying application may dismiss the hypothesis "The apple is red."
of a short time window of speech and decorrelating the spectrum using a
IEEE International Conference on Acoustics Speech and Signal Processing
. Dublin, Ireland: Association for Computational Linguistics: 244–249.
L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton (2010)
(2012). "Recurrent Neural Networks for Noise Reduction in Robust ASR".
computed every 10 ms, with one 10 ms section called a frame;
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
"Towards End-to-End Speech Recognition with Recurrent Neural Networks"
"Schools Are Using Voice Technology to Teach Reading. Is It Helping?"
(HLDA); or might skip the delta and delta-delta coefficients and use
refers to identifying the speaker, rather than what they are saying.
For telephone speech the sampling rate is 8000 samples per second;
Pronunciation assessment does not determine unknown speech (as in
The Voice in the Machine. Building Computers That Understand Speech
"Speech Recognition Through the Decades: How We Ended Up With Siri"
A speaker-dependent system is intended for use by a single speaker.
Speech to text (transcription of speech into text, real time video
) in the UK. Work in France has included speech recognition in the
or verify the identity of a speaker as part of a security process.
"Eurofighter Typhoon – The world's most advanced fighter aircraft"
, Conference on Empirical Methods in Natural Language Processing,
Tüske, Zoltán; Golik, Pavel; Schlüter, Ralf; Ney, Hermann (2014).
Acoustical signals are structured into a hierarchy of units, e.g.
achieving 6 times better performance than human experts. In 2019,
"Pronunciation accuracy and intelligibility of non-native speech"
Chan, William; Jaitly, Navdeep; Le, Quoc; Vinyals, Oriol (2016).
Proceedings of the Fifth Workshop on E-Commerce and NLP (ECNLP 5)
Recent Advances in Deep Learning for Speech Research at Microsoft
"Voice Recognition To Ease Travel Bookings: Business Travel News"
. IEEE Transactions on Acoustics, Speech, and Signal Processing."
and Martin presents the basics and the state of the art for ASR.
e.g. constraints may be semantic; rejecting "The apple is angry."
Vocabulary is hard to recognize if it contains confusing letters:
Prolonged use of speech recognition software in conjunction with
with up to 4096 words support, of which only 64 could be held in
"Machine Learning Paradigms for Speech Recognition: An Overview"
NIPS Workshop on Deep Learning and Unsupervised Feature Learning
Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007).
Speech recognition is a multi-leveled pattern recognition task.
computer-aided pronunciation teaching (CAPT) when combined with
of spoken language into text by computers. It is also known as
systems, and control of an automated target handover system.
/Eurospeech, and the IEEE ASRU. Conferences in the field of
Security, including usage with other biometric scanners for
weapons release parameters, and controlling flight display.
allowed language models to use multiple length n-grams, and
Transactions on Speech and Audio Processing (later renamed
WRR = 1 - WER = (n - s - d - i)/n = (h - i)/n
has shown benefits to short-term-memory restrengthening in
is used in education such as for spoken language learning.
, audiovisual speaker recognition and speaker adaptation.
. EARS funded the collection of the Switchboard telephone
dried up for several years when, in 1969, the influential
to listeners, sometimes along with often inconsequential
The use of deep feedforward (non-recurrent) networks for
). Rescoring is usually done by trying to minimize the
Automatic Speech Recognition: A Deep Learning Approach
to capture speech dynamics and in addition, might use
– Dragon Dictate, a consumer product released in 1990
, EMNLP, and HLT, are beginning to include papers on
The formula to compute the word error rate (WER) is:
(GALE). Four teams participated in the EARS program:
Common European Framework of Reference for Languages
Books like "Fundamentals of Speech Recognition" by
Dynamic time warping (DTW)-based speech recognition
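The dynamic-programming alignment at the heart of DTW-based recognition can be sketched in a few lines. This is a minimal illustrative implementation, not code from any cited system; the function name and the scalar distance are assumptions:

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: minimum cumulative alignment cost between sequences a and b,
    allowing each element to stretch (match several elements of the other)."""
    INF = float("inf")
    m, n = len(a), len(b)
    # D[i][j] = best cost aligning a[:i] with b[:j]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = dist(a[i - 1], b[j - 1])
            # step choices: insertion, deletion, or diagonal match
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]

# A repeated sample aligns at zero cost; identical rates differ by per-step cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

In a real recognizer the sequence elements would be acoustic feature vectors rather than scalars, with a vector distance in place of `abs`.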
Error rates increase as the vocabulary size grows:
has made use of a type of speech recognition for
– a recognizer from Kurzweil Applied Intelligence
Around this time Soviet researchers invented the
). It incorporates knowledge and research in the
and technologies that enable the recognition and
Automatic conversion of spoken language into text
and is becoming more widespread in the field of
demonstrated improved performance in this area.
was introduced during the later part of 2009 by
Constraints are often represented by grammar.
. A decade later, at CMU, Raj Reddy's students
applications. It can be activated through the
The most recent book on speech recognition is
Advanced Fighter Technology Integration (AFTI)
American Recovery and Reinvestment Act of 2009
Deep feedforward and recurrent neural networks
is the number of correctly recognized words:
Digitize the speech that we want to recognize
e.g. Syntactic; rejecting "Red is apple the."
The first attempt at end-to-end ASR was with
coefficients, which are obtained by taking a
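The fragment above refers to cepstral coefficients obtained by taking a Fourier transform of a short window of speech. A minimal sketch of that idea follows; it uses a plain DFT and real cepstrum, with no mel filterbank, so it is purely illustrative of the transform chain, not a production front-end:

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def cepstral_coefficients(frame, num_coeffs=13):
    """Real cepstrum of one short frame: DFT -> log magnitude -> inverse DFT,
    keeping only the first few (lowest-quefrency) coefficients."""
    spec = dft(frame)
    log_mag = [math.log(abs(c) + 1e-12) for c in spec]  # floor avoids log(0)
    N = len(frame)
    # inverse DFT of a real, symmetric log spectrum reduces to a cosine sum
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * n / N)
                for k in range(N)) / N
            for n in range(num_coeffs)]

frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
coeffs = cepstral_coefficients(frame)
```

A typical HMM front-end computes such coefficients every 10 ms or so and stacks them with delta features.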
Isolated, discontinuous or continuous speech
Isolated, discontinuous or continuous speech
Students who are physically disabled have a
heteroscedastic linear discriminant analysis
and "Spoken Language Processing (2001)" by
Each level provides additional constraints;
, which would reduce acoustic noise in the
. Pronunciation assessment is also used in
The 1980s also saw the introduction of the
(NTT), while working on speech recognition.
(WER), whereas speed is measured with the
. There has also been much useful work in
Working with Swedish pilots flying in the
approach) or as a subset of the models (a
would be key to making progress in speech
Timeline of speech and voice recognition
launched the open source project called
In terms of freely available resources,
used speech recognition technology from
source-filter model of speech production
in the power spectrum of each utterance.
Speech recognition applications include
), using Google's open source platform
ASR is now commonplace in the field of
End-to-end automatic speech recognition
-dimensional real-valued vectors (with
is now supported in over 30 languages.
Global Autonomous Language Exploitation
– Dragon Systems, founded by James and
Speaker dependence versus independence
-based projection followed perhaps by
For the human linguistic concept, see
Speech recognition software for Linux
WER = (s + d + i)/n
and laughter) and limited vocabulary.
. Other measures of accuracy include
Connectionist Temporal Classification
speech recognition group at Microsoft
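The word error rate fragments above (substitutions s, deletions d, insertions i over n reference words, with h correct words and WRR = 1 - WER) can be made concrete with a short sketch. Function names are illustrative, not from any cited scoring toolkit:

```python
def edit_ops(ref, hyp):
    """Count substitutions, deletions, insertions aligning hyp to ref
    via standard Levenshtein dynamic programming plus a backtrace."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete everything
    for j in range(n + 1):
        d[0][j] = j          # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match/substitute
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    subs = dels = ins = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            subs += 1; i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1; i -= 1
        else:
            ins += 1; j -= 1
    return subs, dels, ins

def wer(ref, hyp):
    s, d_, i_ = edit_ops(ref, hyp)
    return (s + d_ + i_) / len(ref)

# one substitution out of three reference words
print(wer("the cat sat".split(), "the hat sat".split()))  # 0.333...
```

With n reference words, h = n - (s + d) correct words, and WRR = 1 - WER follows directly from the counts returned here.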
Speaker dependence vs. independence:
in the Mars Microphone on the Lander
patients who have been treated with
maximum likelihood linear regression
List of speech recognition software
List of speech recognition software
supports speech recognition on all
, currently in service with the UK
computer-assisted language learning
Deep neural networks and denoising
maximum likelihood linear transform
, was one of IBM's few competitors.
Outline of artificial intelligence
is the number of word references.
, for example in products such as
(CTC)-based systems introduced by
linear discriminant analysis or a
For more software resources, see
. Important journals include the
is the number of insertions, and
Vocabulary size and confusability
(e.g. vehicle navigation systems)
listing in audiovisual production
High-performance fighter aircraft
end-to-end reinforcement learning
Multimedia information retrieval
is the number of substitutions,
, Words, Phrases, and Sentences;
Training air traffic controllers
in 2014. The model consisted of
is to do away with hand-crafted
. Speech can be thought of as a
, but this later proved untrue.
Audio-visual speech recognition
Articulatory speech recognition
, in recent overview articles.
verifying certain assumptions.
statistical machine translation
Models, methods, and algorithms
fields. The reverse process is
Speech and Language Processing
Task and language constraints
Read versus spontaneous speech
: Speech recognition computer
audiovisual speech recognition
for many stochastic purposes.
Institute for Defense Analysis
Nippon Telegraph and Telephone
method, was first proposed by
List of emerging technologies
Automatic Language Translator
toolkit can be used. In 2017
Task and language constraints
, such as voicemail to text,
Alenia Aermacchi M-346 Master
), the program in France for
One fundamental principle of
in 1997. LSTM RNNs avoid the
in 1993. Raj Reddy's student
Two practical products were:
developed the mathematics of
Speech Understanding Research
etc., "Computer Speech", by
is the number of deletions,
Students who are blind (see
represented themselves as a
global semi-tied covariance
Practical speech recognition
developed and published the
automatic speech recognition
natural language processing
multi-factor authentication
Telephony and other domains
and Bahdanau et al. of the
Stanford Research Institute
computer speech recognition
Windows Speech Recognition
Speech interface guideline
Phonetic search technology
Carnegie Mellon University
h = n - (s + d).
Interactive voice response
computer-aided instruction
Carnegie Mellon University
Carnegie Mellon University
and Navdeep Jaitly of the
vanishing gradient problem
artificial neural networks
combined with feedforward
In the United States, the
vary with the following:
recurrent neural networks
artificial neural network
Artificial neural network
transform (also known as
to all smartphone users.
, and a team composed of
can simplify the task of
computational linguistics
Fluency Voice Technology
Dragon NaturallySpeaking
Conferences and journals
(how strong is it), and
(digital speech-to-text)
, including mobile email
Repetitive strain injury
People with disabilities
Electronic Health Record
Pronunciation assessment
conditional independence
finite state transducers
represented as weighted
delta-delta coefficients
recurrent neural network
National Security Agency
University of Washington
and others at Bell Labs.
Linear predictive coding
pronunciation assessment
with speech recognition
Blindness and education
automatic transcription
finite state transducer
finite state transducer
document classification
broadcast news speech.
Recognizing the speaker
Single Word Error Rate
Multimodal interaction
University of Montreal
cepstral normalization
Gaussian mixture model
Long short-term memory
During the late 1960s
funded five years for
speaker identification
Subtitle (captioning)
Automatic translation
controlled vocabulary
Medical documentation
University of Toronto
n-gram language model
, or FST, approach).
Lernout & Hauspie
went on to found the
voice user interfaces
Cache language model
Automotive head unit
Manfred R. Schroeder
Command Success Rate
Hands-free computing
Further applications
University of Oxford
University of Oxford
Dynamic time warping
Levenshtein distance
Hidden Markov models
hidden Markov models
Cambridge University
dynamic time warping
computer engineering
Speech verification
Speaker recognition
Speaker diarisation
Google Voice Search
Speaker recognition
Further information
as working examples
Tom Clancy's EndWar
, Court reporting )
emotion recognition
captioned telephone
Learning disability
Eurofighter Typhoon
feature engineering
Hidden Markov model
hidden Markov model
Google Voice Search
Univ. of Pittsburgh
– was released the
hidden Markov model
Stanford University
and Shuzo Saito of
Roberto Pieraccini
Adverse conditions
mutual information
context dependency
stationary process
Jürgen Schmidhuber
translating speech
direct voice input
Speech recognition
Frederick Jelinek
speech processing
Security concerns
Virtual assistant
(Legal discovery)
Mars Polar Lander
space exploration
speech technology
F-35 Lightning II
from a list or a
digital dictation
domain adaptation
transfer learning
Viterbi algorithm
Fourier transform
language modeling
acoustic modeling
acoustic modeling
James L. Flanagan
Nagoya University
1962 World's Fair
voice recognition
interdisciplinary
Speech perception
4207:
4198:
4197:
4157:
4146:
4143:
4137:
4136:
4134:
4132:
4123:. Li Deng Site.
4117:
4111:
4110:
4108:
4096:
4087:
4086:
4084:
4072:
4063:
4062:
4060:
4048:
4039:
4038:
4036:
4024:
4018:
4017:
4015:
4013:
3989:
3983:
3982:
3980:
3968:
3962:
3961:
3959:
3946:
3940:
3938:
3936:
3934:
3925:. Archived from
3918:
3909:
3902:
3896:
3878:
3872:
3871:
3845:
3822:
3816:
3815:
3790:(8): 1735–1780.
3772:
3763:
3756:
3750:
3749:
3747:
3745:
3725:
3719:
3718:
3716:
3714:
3694:
3688:
3687:
3685:
3683:
3668:
3662:
3661:
3659:
3657:
3642:
3636:
3635:
3633:
3631:
3625:
3618:
3609:
3603:
3602:
3600:
3598:
3583:
3577:
3576:
3574:
3572:
3552:
3546:
3545:
3543:
3541:
3522:
3516:
3515:
3513:
3511:
3499:
3493:
3492:
3490:
3488:
3472:
3463:
3462:
3438:
3432:
3431:
3429:
3427:
3418:. Archived from
3408:
3402:
3401:
3361:
3355:
3354:
3352:
3350:
3344:
3337:
3328:
3322:
3321:
3316:. Archived from
3277:
3271:
3270:
3268:
3266:
3251:
3245:
3244:
3242:
3240:
3225:
3216:
3215:
3213:
3211:
3192:
3186:
3185:
3183:
3181:
3175:
3168:
3162:Rabiner (1984).
3159:
3153:
3152:
3149:10.1121/1.381666
3135:(6): 1345–1366.
3124:
3118:
3117:
3115:
3113:
3093:
3087:
3086:
3084:
3082:
3066:
3060:
3059:
3041:
3035:
3034:
3003:
2997:
2996:
2994:
2992:
2986:
2971:
2953:
2944:
2938:
2937:
2935:
2933:
2913:
2904:
2903:
2901:
2899:
2893:
2886:
2877:
2871:
2870:
2868:
2866:
2851:
2845:
2844:
2842:
2840:
2831:. 22 July 2012.
2821:
2815:
2814:
2809:
2807:
2788:
2782:
2781:
2779:
2777:
2771:
2724:
2715:
2709:
2708:
2682:
2662:
2656:
2655:
2653:
2651:
2636:
2630:
2629:
2627:
2625:
2610:
2604:
2603:
2601:
2599:
2584:
2578:
2577:
2543:
2537:
2536:
2534:
2532:
2523:. Fifthgen.com.
2517:
2448:Speech synthesis
2433:Speech analytics
2413:Origin of speech
2393:Keyword spotting
2199:Lawrence Rabiner
2126:
2124:
2123:
2118:
2069:
2067:
2066:
2061:
2059:
2054:
2043:
2038:
2033:
2004:
1941:
1939:
1938:
1933:
1931:
1926:
1903:
1693:real time factor
1607:Mobile telephony
1313:fighter aircraft
1209:speech disorders
1201:reading tutoring
1161:accent reduction
780:cosine transform
610:keyword spotting
549:, a team led by
498:, developed the
488:Lawrence Rabiner
464:Apricot Portable
425:language model.
383:began using the
240:Fumitada Itakura
128:(usually termed
94:speech synthesis
82:computer science
46:computer science
21:
9193:
9192:
9188:
9187:
9186:
9184:
9183:
9182:
9138:
9137:
9136:
9131:
9083:
8997:
8963:Google DeepMind
8941:
8907:Geoffrey Hinton
8866:
8803:
8729:Project Debater
8675:
8573:Implementations
8568:
8522:
8486:
8429:
8371:Backpropagation
8305:
8291:Tensor calculus
8245:
8242:
8212:
8207:
8176:
8156:Syntax guessing
8138:
8131:
8117:Predictive text
8112:Grammar checker
8093:
8086:
8058:
8025:
8014:
7980:Bank of English
7963:
7891:
7882:
7873:
7804:
7761:
7729:
7681:
7583:Distant reading
7558:Argument mining
7544:
7540:Text processing
7486:
7481:
7440:
7430:
7398:
7379:
7360:
7337:
7318:
7304:Mariani, Joseph
7298:
7296:Further reading
7293:
7292:
7282:
7280:
7271:
7270:
7266:
7256:
7254:
7245:
7244:
7240:
7230:
7228:
7205:
7201:
7191:
7189:
7176:
7175:
7171:
7161:
7159:
7150:
7149:
7145:
7135:
7133:
7124:
7123:
7119:
7109:
7107:
7094:
7093:
7089:
7083:
7079:
7068:
7052:
7048:
7038:
7036:
7023:
7022:
7018:
6997:
6993:
6974:
6973:
6969:
6959:
6957:
6944:
6943:
6939:
6933:Wayback Machine
6923:
6919:
6904:
6882:
6878:
6873:
6869:
6859:
6857:
6850:
6826:
6822:
6812:
6810:
6797:
6796:
6792:
6785:
6762:
6758:
6745:
6744:
6740:
6716:10.1.1.631.3736
6699:
6695:
6690:
6686:
6676:
6674:
6639:
6635:
6628:
6624:
6611:
6610:
6606:
6596:
6594:
6579:
6575:
6565:
6563:
6554:
6553:
6546:
6536:
6534:
6525:
6524:
6520:
6505:
6501:
6491:
6489:
6476:
6475:
6471:
6454:
6453:
6449:
6441:
6430:
6422:
6418:
6408:
6406:
6383:
6382:
6378:
6368:
6366:
6362:
6343:
6337:
6333:
6322:
6320:
6307:
6301:
6297:
6287:
6285:
6274:
6270:
6260:
6258:
6243:
6239:
6229:
6227:
6212:
6208:
6198:
6196:
6183:
6182:
6178:
6168:
6166:
6162:
6148:
6129:
6121:
6117:
6107:
6105:
6090:
6086:
6076:
6074:
6059:
6055:
6045:
6043:
6024:
6020:
5961:
5957:
5944:
5942:
5938:
5923:
5917:
5913:
5868:
5864:
5836:
5832:
5817:
5785:
5781:
5764:
5760:
5743:
5739:
5722:
5718:
5708:
5706:
5702:
5691:
5685:
5681:
5674:
5638:
5634:
5611:
5607:
5578:Future Internet
5570:
5566:
5557:
5555:
5537:
5533:
5500:
5496:
5479:
5475:
5458:
5454:
5444:
5442:
5429:
5428:
5424:
5407:
5403:
5393:
5391:
5387:
5376:
5370:
5366:
5355:
5351:
5343:
5332:
5326:
5322:
5313:
5309:
5276:"Deep Learning"
5269:
5265:
5255:
5253:
5249:
5218:
5212:
5208:
5196:
5194:
5185:
5184:
5180:
5173:
5169:. ICASSP, 2013.
5167:Wayback Machine
5157:
5153:
5122:
5118:
5106:
5100:
5096:
5088:
5073:10.1.1.691.3679
5055:
5049:
5042:
5027:
5023:
5015:
4984:
4978:
4974:
4956:
4952:
4944:
4933:
4923:
4919:
4911:
4900:
4894:
4890:
4881:
4877:
4854:
4850:
4840:
4838:
4834:
4795:
4789:
4785:
4744:
4740:
4730:
4728:
4724:
4701:
4695:
4691:
4681:
4679:
4642:
4638:
4628:
4626:
4611:
4610:
4606:
4590:
4586:
4572:
4570:
4557:
4556:
4552:
4539:
4537:
4524:
4523:
4519:
4509:Wayback Machine
4500:Keynote talk: "
4499:
4492:
4487:
4483:
4464:Neural Networks
4460:
4456:
4443:
4439:
4433:Wayback Machine
4422:Sepp Hochreiter
4420:
4416:
4363:
4359:
4353:Wayback Machine
4340:
4336:
4321:
4292:
4288:
4283:
4279:
4269:
4267:
4252:
4245:
4230:
4208:
4201:
4158:
4149:
4144:
4140:
4130:
4128:
4119:
4118:
4114:
4097:
4090:
4073:
4066:
4049:
4042:
4025:
4021:
4011:
4009:
3990:
3986:
3969:
3965:
3947:
3943:
3932:
3930:
3929:on 9 March 2016
3921:
3919:
3912:
3903:
3899:
3893:Wayback Machine
3879:
3875:
3830:Neural Networks
3823:
3819:
3776:Sepp Hochreiter
3773:
3766:
3757:
3753:
3743:
3741:
3726:
3722:
3712:
3710:
3695:
3691:
3681:
3679:
3670:
3669:
3665:
3655:
3653:
3644:
3643:
3639:
3629:
3627:
3623:
3616:
3610:
3606:
3596:
3594:
3585:
3584:
3580:
3570:
3568:
3553:
3549:
3539:
3537:
3524:
3523:
3519:
3509:
3507:
3500:
3496:
3486:
3484:
3473:
3466:
3439:
3435:
3425:
3423:
3410:
3409:
3405:
3362:
3358:
3348:
3346:
3342:
3335:
3329:
3325:
3298:10.1145/2500887
3278:
3274:
3264:
3262:
3253:
3252:
3248:
3238:
3236:
3227:
3226:
3219:
3209:
3207:
3194:
3193:
3189:
3179:
3177:
3173:
3166:
3160:
3156:
3125:
3121:
3111:
3109:
3094:
3090:
3080:
3078:
3067:
3063:
3056:
3042:
3038:
3004:
3000:
2990:
2988:
2984:
2951:
2945:
2941:
2931:
2929:
2914:
2907:
2897:
2895:
2891:
2884:
2878:
2874:
2864:
2862:
2853:
2852:
2848:
2838:
2836:
2829:The Star-Ledger
2823:
2822:
2818:
2805:
2803:
2790:
2789:
2785:
2775:
2773:
2769:
2722:
2716:
2712:
2663:
2659:
2649:
2647:
2638:
2637:
2633:
2623:
2621:
2612:
2611:
2607:
2597:
2595:
2586:
2585:
2581:
2566:
2544:
2540:
2530:
2528:
2519:
2518:
2514:
2509:
2504:
2318:
2250:
2195:
2151:
2146:
2133:
2085:
2082:
2081:
2044:
2042:
2005:
2003:
1971:
1968:
1967:
1904:
1902:
1888:
1885:
1884:
1799:
1733:
1689:word error rate
1685:
1597:Home automation
1545:, etc.) NASA's
1531:
1469:
1450:computer gaming
1442:
1424:
1414:and in overall
1400:Puma helicopter
1375:
1309:
1304:
1288:word processors
1284:
1282:Therapeutic use
1242:
1237:
1205:Microsoft Teams
1173:intelligibility
1155:(CALL), speech
1140:
1134:
1121:
1116:
1108:Google DeepMind
1059:Google DeepMind
1030:Google DeepMind
994:
957:
951:
938:Neural networks
927:
921:
919:Neural networks
905:
899:
828:heteroscedastic
747:
741:
716:
697:By early 2010s
695:
674:Geoffrey Hinton
646:Sepp Hochreiter
539:
419:
331:Carnegie Mellon
296:
186:
178:
118:word processors
35:
28:
23:
22:
15:
12:
11:
5:
9191:
9181:
9180:
9175:
9170:
9165:
9160:
9155:
9150:
9133:
9132:
9130:
9129:
9128:
9127:
9122:
9109:
9108:
9107:
9102:
9088:
9085:
9084:
9082:
9081:
9076:
9071:
9066:
9061:
9056:
9051:
9046:
9041:
9036:
9031:
9026:
9021:
9016:
9011:
9005:
9003:
8999:
8998:
8996:
8995:
8990:
8985:
8980:
8975:
8970:
8965:
8960:
8955:
8949:
8947:
8943:
8942:
8940:
8939:
8937:Ilya Sutskever
8934:
8929:
8924:
8919:
8914:
8909:
8904:
8902:Demis Hassabis
8899:
8894:
8892:Ian Goodfellow
8889:
8884:
8878:
8876:
8872:
8871:
8868:
8867:
8865:
8864:
8859:
8858:
8857:
8847:
8842:
8837:
8832:
8827:
8822:
8817:
8811:
8809:
8805:
8804:
8802:
8801:
8796:
8791:
8786:
8781:
8776:
8771:
8766:
8761:
8756:
8751:
8746:
8741:
8736:
8731:
8726:
8721:
8720:
8719:
8709:
8704:
8699:
8694:
8689:
8683:
8681:
8677:
8676:
8674:
8673:
8668:
8667:
8666:
8661:
8651:
8650:
8649:
8644:
8639:
8629:
8624:
8619:
8614:
8609:
8604:
8599:
8594:
8589:
8583:
8581:
8574:
8570:
8569:
8567:
8566:
8561:
8556:
8551:
8546:
8541:
8536:
8530:
8528:
8524:
8523:
8521:
8520:
8515:
8510:
8505:
8500:
8494:
8492:
8488:
8487:
8485:
8484:
8483:
8482:
8475:Language model
8472:
8467:
8462:
8461:
8460:
8450:
8449:
8448:
8437:
8435:
8431:
8430:
8428:
8427:
8425:Autoregression
8422:
8417:
8416:
8415:
8405:
8403:Regularization
8400:
8399:
8398:
8393:
8388:
8378:
8373:
8368:
8366:Loss functions
8363:
8358:
8353:
8348:
8343:
8342:
8341:
8331:
8326:
8325:
8324:
8313:
8311:
8307:
8306:
8304:
8303:
8301:Inductive bias
8298:
8293:
8288:
8283:
8278:
8273:
8268:
8263:
8255:
8253:
8247:
8246:
8241:
8240:
8233:
8226:
8218:
8209:
8208:
8206:
8205:
8200:
8195:
8190:
8184:
8182:
8178:
8177:
8175:
8174:
8169:
8164:
8159:
8149:
8143:
8141:
8139:user interface
8133:
8132:
8130:
8129:
8124:
8119:
8114:
8109:
8104:
8098:
8096:
8088:
8087:
8085:
8084:
8079:
8074:
8068:
8066:
8060:
8059:
8057:
8056:
8051:
8046:
8041:
8036:
8030:
8028:
8020:
8019:
8016:
8015:
8013:
8012:
8007:
8002:
7997:
7992:
7987:
7982:
7977:
7971:
7969:
7965:
7964:
7962:
7961:
7956:
7951:
7946:
7941:
7936:
7931:
7926:
7921:
7916:
7911:
7906:
7901:
7895:
7893:
7884:
7875:
7874:
7872:
7871:
7866:
7864:Word embedding
7861:
7856:
7851:
7844:Language model
7841:
7836:
7831:
7826:
7821:
7815:
7813:
7806:
7805:
7803:
7802:
7797:
7795:Transfer-based
7792:
7787:
7782:
7777:
7771:
7769:
7763:
7762:
7760:
7759:
7754:
7749:
7743:
7741:
7735:
7734:
7731:
7730:
7728:
7727:
7722:
7717:
7712:
7707:
7702:
7697:
7691:
7689:
7680:
7679:
7674:
7669:
7664:
7659:
7654:
7648:
7647:
7642:
7637:
7632:
7627:
7622:
7617:
7616:
7615:
7610:
7600:
7595:
7590:
7585:
7580:
7575:
7570:
7568:Concept mining
7565:
7560:
7554:
7552:
7546:
7545:
7543:
7542:
7537:
7532:
7527:
7522:
7521:
7520:
7515:
7505:
7500:
7494:
7492:
7488:
7487:
7480:
7479:
7472:
7465:
7457:
7451:
7450:
7439:
7438:External links
7436:
7435:
7434:
7429:978-0470517048
7428:
7415:
7402:
7396:
7383:
7378:978-0262016858
7377:
7364:
7358:
7341:
7335:
7322:
7316:
7302:Cole, Ronald;
7297:
7294:
7291:
7290:
7264:
7238:
7199:
7169:
7143:
7117:
7087:
7077:
7066:
7046:
7016:
6991:
6967:
6937:
6917:
6902:
6876:
6867:
6848:
6820:
6790:
6783:
6756:
6738:
6693:
6684:
6633:
6622:
6604:
6573:
6544:
6518:
6499:
6469:
6447:
6416:
6376:
6331:
6295:
6268:
6237:
6206:
6176:
6146:
6115:
6084:
6053:
6018:
5975:(2): 182–207.
5955:
5911:
5882:(3): 347–366.
5862:
5830:
5815:
5779:
5758:
5737:
5716:
5679:
5672:
5632:
5605:
5564:
5531:
5494:
5473:
5452:
5422:
5401:
5364:
5349:
5320:
5318:. Interspeech.
5307:
5263:
5206:
5197:|journal=
5171:
5151:
5116:
5094:
5040:
5021:
4972:
4950:
4917:
4888:
4875:
4848:
4783:
4754:(3): 328–339.
4738:
4712:(6): 957–982.
4689:
4656:(2): 115–135.
4636:
4604:
4598:. p. 45.
4584:
4550:
4517:
4490:
4481:
4470:(2): 331–339.
4454:
4437:
4414:
4357:
4334:
4319:
4286:
4277:
4260:New York Times
4243:
4228:
4199:
4147:
4138:
4112:
4088:
4064:
4040:
4019:
3984:
3963:
3941:
3910:
3897:
3873:
3817:
3780:J. Schmidhuber
3764:
3751:
3720:
3689:
3663:
3637:
3604:
3578:
3547:
3530:actapricot.org
3517:
3494:
3464:
3453:(3): 263–271.
3433:
3403:
3382:10.1086/725132
3356:
3323:
3272:
3246:
3217:
3187:
3154:
3119:
3102:The New Yorker
3088:
3069:John Makhoul.
3061:
3055:978-3540491255
3054:
3036:
3007:John R. Pierce
2998:
2962:(4): 203–303.
2939:
2905:
2872:
2846:
2816:
2783:
2710:
2657:
2631:
2605:
2579:
2564:
2538:
2511:
2510:
2508:
2505:
2503:
2502:
2497:
2492:
2487:
2481:
2480:
2476:
2475:
2470:
2465:
2460:
2455:
2450:
2445:
2440:
2435:
2430:
2425:
2420:
2415:
2410:
2405:
2400:
2395:
2390:
2385:
2380:
2375:
2370:
2365:
2360:
2355:
2350:
2345:
2340:
2335:
2330:
2325:
2319:
2317:
2314:
2249:
2246:
2194:
2191:
2150:
2147:
2145:
2142:
2132:
2129:
2128:
2127:
2116:
2113:
2110:
2107:
2104:
2101:
2098:
2095:
2092:
2089:
2071:
2070:
2057:
2053:
2050:
2047:
2041:
2036:
2032:
2029:
2026:
2023:
2020:
2017:
2014:
2011:
2008:
2002:
1999:
1996:
1993:
1990:
1987:
1984:
1981:
1978:
1975:
1929:
1925:
1922:
1919:
1916:
1913:
1910:
1907:
1901:
1898:
1895:
1892:
1864:
1863:
1856:
1855:
1852:
1844:
1843:
1840:
1829:
1828:
1825:
1817:
1816:
1815:
1814:
1811:
1808:
1796:
1795:
1794:
1793:
1787:
1786:
1782:
1781:
1780:
1779:
1776:
1770:
1769:
1765:
1764:
1763:
1762:
1755:
1754:
1750:
1749:
1748:
1747:
1741:
1740:
1732:
1729:
1728:
1727:
1724:
1721:
1718:
1715:
1712:
1684:
1681:
1680:
1679:
1669:
1651:
1645:
1639:
1632:
1626:
1621:
1615:
1610:
1604:
1599:
1594:
1592:user interface
1585:
1579:
1574:
1567:
1561:
1554:
1530:
1527:
1515:relay services
1468:
1465:
1441:
1438:
1423:
1420:
1374:
1371:
1365:(JSF) and the
1308:
1305:
1303:
1300:
1283:
1280:
1241:
1238:
1236:
1233:
1136:Main article:
1133:
1130:
1124:recognition.
1120:
1119:In-car systems
1117:
1115:
1112:
1002:language model
993:
990:
953:Main article:
950:
947:
923:Main article:
920:
917:
901:Main article:
898:
895:
887:edit distances
743:Main article:
740:
737:
715:
712:
706:their voice".
694:
691:
636:method called
538:
535:
492:
491:
477:
471:
446:
445:
434:back-off model
418:
415:
414:
413:
410:Janet M. Baker
402:
401:
397:Fred Jelinek's
381:Janet M. Baker
361:
360:
346:
339:
338:
295:
292:
270:
269:
251:
225:
211:
197:
185:
182:
177:
174:
74:speech-to-text
52:that develops
26:
9:
6:
4:
3:
2:
9190:
9179:
9176:
9174:
9171:
9169:
9166:
9164:
9161:
9159:
9156:
9154:
9151:
9149:
9146:
9145:
9143:
9126:
9123:
9121:
9118:
9117:
9110:
9106:
9103:
9101:
9098:
9097:
9094:
9090:
9089:
9086:
9080:
9077:
9075:
9072:
9070:
9067:
9065:
9062:
9060:
9057:
9055:
9052:
9050:
9047:
9045:
9042:
9040:
9037:
9035:
9032:
9030:
9027:
9025:
9022:
9020:
9017:
9015:
9012:
9010:
9007:
9006:
9004:
9002:Architectures
9000:
8994:
8991:
8989:
8986:
8984:
8981:
8979:
8976:
8974:
8971:
8969:
8966:
8964:
8961:
8959:
8956:
8954:
8951:
8950:
8948:
8946:Organizations
8944:
8938:
8935:
8933:
8930:
8928:
8925:
8923:
8920:
8918:
8915:
8913:
8910:
8908:
8905:
8903:
8900:
8898:
8895:
8893:
8890:
8888:
8885:
8883:
8882:Yoshua Bengio
8880:
8879:
8877:
8873:
8863:
8862:Robot control
8860:
8856:
8853:
8852:
8851:
8848:
8846:
8843:
8841:
8838:
8836:
8833:
8831:
8828:
8826:
8823:
8821:
8818:
8816:
8813:
8812:
8810:
8806:
8800:
8797:
8795:
8792:
8790:
8787:
8785:
8782:
8780:
8779:Chinchilla AI
8777:
8775:
8772:
8770:
8767:
8765:
8762:
8760:
8757:
8755:
8752:
8750:
8747:
8745:
8742:
8740:
8737:
8735:
8732:
8730:
8727:
8725:
8722:
8718:
8715:
8714:
8713:
8710:
8708:
8705:
8703:
8700:
8698:
8695:
8693:
8690:
8688:
8685:
8684:
8682:
8678:
8672:
8669:
8665:
8662:
8660:
8657:
8656:
8655:
8652:
8648:
8645:
8643:
8640:
8638:
8635:
8634:
8633:
8630:
8628:
8625:
8623:
8620:
8618:
8615:
8613:
8610:
8608:
8605:
8603:
8600:
8598:
8595:
8593:
8590:
8588:
8585:
8584:
8582:
8578:
8575:
8571:
8565:
8562:
8560:
8557:
8555:
8552:
8550:
8547:
8545:
8542:
8540:
8537:
8535:
8532:
8531:
8529:
8525:
8519:
8516:
8514:
8511:
8509:
8506:
8504:
8501:
8499:
8496:
8495:
8493:
8489:
8481:
8478:
8477:
8476:
8473:
8471:
8468:
8466:
8463:
8459:
8458:Deep learning
8456:
8455:
8454:
8451:
8447:
8444:
8443:
8442:
8439:
8438:
8436:
8432:
8426:
8423:
8421:
8418:
8414:
8411:
8410:
8409:
8406:
8404:
8401:
8397:
8394:
8392:
8389:
8387:
8384:
8383:
8382:
8379:
8377:
8374:
8372:
8369:
8367:
8364:
8362:
8359:
8357:
8354:
8352:
8349:
8347:
8346:Hallucination
8344:
8340:
8337:
8336:
8335:
8332:
8330:
8327:
8323:
8320:
8319:
8318:
8315:
8314:
8312:
8308:
8302:
8299:
8297:
8294:
8292:
8289:
8287:
8284:
8282:
8279:
8277:
8274:
8272:
8269:
8267:
8264:
8262:
8261:
8257:
8256:
8254:
8252:
8248:
8239:
8234:
8232:
8227:
8225:
8220:
8219:
8216:
8204:
8201:
8199:
8196:
8194:
8193:Hallucination
8191:
8189:
8186:
8185:
8183:
8179:
8173:
8170:
8168:
8165:
8163:
8160:
8157:
8153:
8150:
8148:
8145:
8144:
8142:
8140:
8134:
8128:
8127:Spell checker
8125:
8123:
8120:
8118:
8115:
8113:
8110:
8108:
8105:
8103:
8100:
8099:
8097:
8095:
8089:
8083:
8080:
8078:
8075:
8073:
8070:
8069:
8067:
8065:
8061:
8055:
8052:
8050:
8047:
8045:
8042:
8040:
8037:
8035:
8032:
8031:
8029:
8027:
8021:
8011:
8008:
8006:
8003:
8001:
7998:
7996:
7993:
7991:
7988:
7986:
7983:
7981:
7978:
7976:
7973:
7972:
7970:
7966:
7960:
7957:
7955:
7952:
7950:
7947:
7945:
7942:
7940:
7939:Speech corpus
7937:
7935:
7932:
7930:
7927:
7925:
7922:
7920:
7919:Parallel text
7917:
7915:
7912:
7910:
7907:
7905:
7902:
7900:
7897:
7896:
7894:
7888:
7885:
7880:
7876:
7870:
7867:
7865:
7862:
7860:
7857:
7855:
7852:
7849:
7845:
7842:
7840:
7837:
7835:
7832:
7830:
7827:
7825:
7822:
7820:
7817:
7816:
7814:
7811:
7807:
7801:
7798:
7796:
7793:
7791:
7788:
7786:
7783:
7781:
7780:Example-based
7778:
7776:
7773:
7772:
7770:
7768:
7764:
7758:
7755:
7753:
7750:
7748:
7745:
7744:
7742:
7740:
7736:
7726:
7723:
7721:
7718:
7716:
7713:
7711:
7710:Text chunking
7708:
7706:
7703:
7701:
7700:Lemmatisation
7698:
7696:
7693:
7692:
7690:
7688:
7684:
7678:
7675:
7673:
7670:
7668:
7665:
7663:
7660:
7658:
7655:
7653:
7650:
7649:
7646:
7643:
7641:
7638:
7636:
7633:
7631:
7628:
7626:
7623:
7621:
7618:
7614:
7611:
7609:
7606:
7605:
7604:
7601:
7599:
7596:
7594:
7591:
7589:
7586:
7584:
7581:
7579:
7576:
7574:
7571:
7569:
7566:
7564:
7561:
7559:
7556:
7555:
7553:
7551:
7550:Text analysis
7547:
7541:
7538:
7536:
7533:
7531:
7528:
7526:
7523:
7519:
7516:
7514:
7511:
7510:
7509:
7506:
7504:
7501:
7499:
7496:
7495:
7493:
7491:General terms
7489:
7485:
7478:
7473:
7471:
7466:
7464:
7459:
7458:
7455:
7449:
7445:
7442:
7441:
7431:
7425:
7421:
7416:
7412:
7408:
7403:
7399:
7393:
7389:
7384:
7380:
7374:
7370:
7365:
7361:
7355:
7351:
7347:
7346:Sears, Andrew
7342:
7338:
7332:
7328:
7323:
7319:
7313:
7309:
7305:
7300:
7299:
7278:
7274:
7268:
7253:. 7 July 2021
7252:
7248:
7242:
7226:
7222:
7218:
7214:
7210:
7203:
7187:
7183:
7179:
7173:
7157:
7153:
7147:
7131:
7127:
7121:
7105:
7101:
7097:
7091:
7081:
7073:
7069:
7063:
7059:
7058:
7050:
7034:
7030:
7026:
7020:
7012:
7008:
7007:
7002:
6995:
6987:
6983:
6982:
6977:
6971:
6955:
6951:
6947:
6941:
6934:
6930:
6927:
6921:
6913:
6909:
6905:
6903:0-7803-0946-4
6899:
6895:
6891:
6887:
6880:
6871:
6855:
6851:
6845:
6841:
6837:
6833:
6832:
6824:
6808:
6804:
6800:
6794:
6786:
6780:
6776:
6772:
6768:
6760:
6752:
6748:
6742:
6734:
6730:
6726:
6722:
6717:
6712:
6709:(2): 173–84.
6708:
6704:
6697:
6688:
6672:
6668:
6664:
6660:
6656:
6652:
6648:
6644:
6637:
6631:
6626:
6618:
6614:
6608:
6592:
6588:
6584:
6577:
6561:
6557:
6551:
6549:
6532:
6528:
6522:
6514:
6510:
6503:
6487:
6483:
6479:
6473:
6465:
6461:
6457:
6456:"The Cockpit"
6451:
6440:
6436:
6429:
6428:
6420:
6404:
6400:
6396:
6392:
6388:
6387:
6380:
6361:
6357:
6353:
6349:
6342:
6335:
6318:
6314:
6310:
6305:
6299:
6283:
6279:
6272:
6256:
6252:
6248:
6241:
6225:
6221:
6217:
6210:
6194:
6190:
6186:
6180:
6161:
6157:
6153:
6149:
6147:9781450351522
6143:
6139:
6135:
6128:
6127:
6119:
6103:
6099:
6095:
6088:
6072:
6069:. Microsoft.
6068:
6064:
6057:
6041:
6037:
6033:
6029:
6022:
6015:
6010:
6006:
6002:
5998:
5993:
5988:
5983:
5978:
5974:
5970:
5966:
5959:
5952:
5937:
5933:
5929:
5922:
5915:
5907:
5903:
5899:
5895:
5890:
5885:
5881:
5877:
5873:
5866:
5859:
5855:
5850:
5845:
5841:
5834:
5826:
5822:
5818:
5812:
5808:
5804:
5799:
5794:
5790:
5783:
5774:
5769:
5762:
5753:
5748:
5741:
5732:
5727:
5720:
5701:
5697:
5690:
5683:
5675:
5669:
5665:
5661:
5656:
5651:
5647:
5643:
5636:
5628:
5624:
5620:
5616:
5609:
5601:
5597:
5592:
5587:
5583:
5579:
5575:
5568:
5553:
5548:
5544:
5543:
5535:
5527:
5523:
5518:
5513:
5509:
5505:
5498:
5489:
5484:
5477:
5468:
5463:
5456:
5440:
5436:
5432:
5426:
5417:
5412:
5405:
5386:
5382:
5375:
5368:
5360:
5353:
5342:
5338:
5331:
5324:
5317:
5311:
5302:
5297:
5293:
5289:
5286:(11): 32832.
5285:
5281:
5277:
5273:
5267:
5248:
5244:
5240:
5236:
5232:
5228:
5224:
5217:
5210:
5202:
5189:
5178:
5176:
5168:
5164:
5161:
5155:
5147:
5143:
5139:
5135:
5131:
5127:
5120:
5112:
5105:
5098:
5087:
5083:
5079:
5074:
5069:
5065:
5061:
5054:
5047:
5045:
5036:
5032:
5031:Ng, Andrew Y.
5025:
5014:
5010:
5006:
5002:
4998:
4994:
4990:
4983:
4976:
4966:
4961:
4954:
4943:
4939:
4932:
4928:
4921:
4910:
4906:
4899:
4892:
4885:
4879:
4871:
4867:
4863:
4859:
4852:
4833:
4829:
4825:
4821:
4817:
4813:
4809:
4805:
4801:
4794:
4787:
4779:
4775:
4770:
4765:
4761:
4757:
4753:
4749:
4742:
4723:
4719:
4715:
4711:
4707:
4700:
4693:
4677:
4673:
4669:
4664:
4659:
4655:
4651:
4647:
4640:
4624:
4620:
4619:
4614:
4608:
4601:
4597:
4596:
4595:Computerworld
4588:
4581:
4568:
4564:
4560:
4554:
4547:
4535:
4531:
4527:
4521:
4514:
4510:
4506:
4503:
4497:
4495:
4485:
4477:
4473:
4469:
4465:
4458:
4450:
4449:
4441:
4434:
4430:
4427:
4423:
4418:
4410:
4406:
4401:
4396:
4392:
4388:
4384:
4380:
4376:
4372:
4368:
4361:
4354:
4350:
4347:
4343:
4338:
4330:
4326:
4322:
4320:0-7803-0532-9
4316:
4312:
4308:
4304:
4300:
4296:
4290:
4281:
4265:
4261:
4257:
4250:
4248:
4239:
4235:
4231:
4225:
4221:
4217:
4213:
4206:
4204:
4195:
4191:
4187:
4183:
4179:
4175:
4171:
4167:
4163:
4162:Sainath, Tara
4156:
4154:
4152:
4142:
4126:
4122:
4116:
4107:
4102:
4095:
4093:
4083:
4078:
4071:
4069:
4059:
4054:
4047:
4045:
4035:
4030:
4023:
4007:
4003:
3999:
3995:
3988:
3979:
3974:
3967:
3958:
3953:
3945:
3928:
3924:
3917:
3915:
3907:
3901:
3894:
3890:
3887:
3883:
3877:
3869:
3865:
3861:
3857:
3853:
3849:
3844:
3839:
3835:
3831:
3827:
3821:
3813:
3809:
3805:
3801:
3797:
3793:
3789:
3785:
3781:
3777:
3771:
3769:
3761:
3760:Nelson Morgan
3755:
3739:
3735:
3734:The Intercept
3731:
3724:
3708:
3704:
3700:
3693:
3677:
3673:
3667:
3651:
3647:
3641:
3622:
3615:
3608:
3592:
3588:
3582:
3566:
3562:
3558:
3551:
3535:
3531:
3527:
3521:
3505:
3498:
3482:
3478:
3471:
3469:
3460:
3456:
3452:
3448:
3444:
3437:
3421:
3417:
3413:
3407:
3399:
3395:
3391:
3387:
3383:
3379:
3375:
3371:
3367:
3360:
3341:
3334:
3327:
3319:
3315:
3311:
3307:
3303:
3299:
3295:
3292:(1): 94–103.
3291:
3287:
3283:
3276:
3260:
3256:
3250:
3234:
3230:
3224:
3222:
3205:
3201:
3197:
3191:
3172:
3165:
3158:
3150:
3146:
3142:
3138:
3134:
3130:
3123:
3107:
3103:
3099:
3092:
3076:
3072:
3065:
3057:
3051:
3047:
3040:
3032:
3028:
3024:
3020:
3016:
3012:
3008:
3002:
2983:
2979:
2975:
2970:
2965:
2961:
2957:
2950:
2943:
2927:
2923:
2919:
2912:
2910:
2890:
2887:. p. 6.
2883:
2876:
2860:
2856:
2850:
2834:
2830:
2826:
2820:
2813:
2801:
2798:. Microsoft.
2797:
2793:
2787:
2768:
2764:
2760:
2756:
2752:
2748:
2744:
2740:
2736:
2732:
2728:
2721:
2714:
2706:
2702:
2698:
2694:
2690:
2686:
2681:
2676:
2672:
2668:
2661:
2645:
2641:
2635:
2619:
2615:
2609:
2593:
2589:
2583:
2575:
2571:
2567:
2561:
2557:
2553:
2549:
2542:
2526:
2522:
2516:
2512:
2501:
2498:
2496:
2493:
2491:
2488:
2486:
2483:
2482:
2478:
2477:
2474:
2471:
2469:
2466:
2464:
2461:
2459:
2456:
2454:
2451:
2449:
2446:
2444:
2441:
2439:
2436:
2434:
2431:
2429:
2426:
2424:
2421:
2419:
2416:
2414:
2411:
2409:
2406:
2404:
2401:
2399:
2396:
2394:
2391:
2389:
2386:
2384:
2381:
2379:
2376:
2374:
2371:
2369:
2366:
2364:
2361:
2359:
2356:
2354:
2351:
2349:
2346:
2344:
2341:
2339:
2336:
2334:
2331:
2329:
2326:
2324:
2321:
2320:
2313:
2311:
2306:
2303:
2301:
2298:
2294:
2290:
2285:
2283:
2279:
2275:
2271:
2267:
2263:
2259:
2255:
2245:
2243:
2238:
2236:
2231:
2229:
2224:
2220:
2216:
2212:
2208:
2207:Xuedong Huang
2204:
2200:
2190:
2188:
2184:
2180:
2176:
2172:
2168:
2164:
2160:
2156:
2141:
2137:
2114:
2108:
2105:
2102:
2096:
2093:
2090:
2087:
2080:
2079:
2078:
2076:
2055:
2051:
2048:
2045:
2039:
2034:
2027:
2024:
2021:
2018:
2015:
2012:
2009:
2000:
1997:
1994:
1991:
1988:
1985:
1982:
1979:
1976:
1973:
1966:
1965:
1964:
1961:
1959:
1955:
1951:
1947:
1942:
1927:
1920:
1917:
1914:
1911:
1908:
1899:
1896:
1893:
1890:
1882:
1879:
1877:
1873:
1867:
1861:
1860:
1859:
1853:
1849:
1848:
1847:
1841:
1838:
1834:
1833:
1832:
1826:
1822:
1821:
1820:
1812:
1809:
1806:
1805:
1803:
1802:
1801:
1791:
1790:
1789:
1788:
1784:
1783:
1777:
1774:
1773:
1772:
1771:
1767:
1766:
1759:
1758:
1757:
1756:
1752:
1751:
1745:
1744:
1743:
1742:
1738:
1737:
1736:
1725:
1722:
1719:
1716:
1713:
1710:
1709:
1708:
1704:
1702:
1698:
1694:
1690:
1677:
1673:
1670:
1667:
1666:
1661:
1660:
1655:
1652:
1649:
1648:Transcription
1646:
1643:
1640:
1637:
1633:
1631:
1627:
1625:
1622:
1620:
1616:
1614:
1611:
1608:
1605:
1603:
1600:
1598:
1595:
1593:
1589:
1586:
1583:
1580:
1578:
1575:
1572:
1568:
1566:
1562:
1559:
1555:
1552:
1551:Sensory, Inc.
1548:
1544:
1540:
1536:
1533:
1532:
1526:
1522:
1520:
1516:
1512:
1509:
1505:
1499:
1495:
1493:
1487:
1485:
1480:
1478:
1473:
1464:
1462:
1457:
1455:
1451:
1447:
1437:
1433:
1430:
1419:
1417:
1411:
1409:
1405:
1401:
1397:
1393:
1389:
1385:
1380:
1370:
1368:
1364:
1359:
1357:
1353:
1349:
1344:
1342:
1338:
1333:
1330:
1326:
1322:
1318:
1314:
1299:
1297:
1293:
1289:
1279:
1275:
1273:
1268:
1264:
1260:
1255:
1252:
1247:
1232:
1230:
1226:
1221:
1216:
1214:
1210:
1206:
1202:
1198:
1194:
1190:
1186:
1182:
1178:
1174:
1170:
1166:
1162:
1158:
1154:
1150:
1145:
1144:pronunciation
1139:
1129:
1125:
1111:
1109:
1105:
1101:
1097:
1093:
1088:
1084:
1080:
1075:
1072:
1068:
1064:
1060:
1056:
1052:
1048:
1043:
1039:
1035:
1031:
1027:
1023:
1018:
1016:
1012:
1007:
1003:
999:
989:
986:
982:
981:deep learning
977:
975:
974:deep learning
969:
966:
962:
956:
955:Deep learning
946:
942:
939:
935:
933:
930:recognition,
926:
916:
912:
908:
904:
894:
892:
888:
884:
880:
876:
872:
868:
864:
860:
855:
853:
848:
843:
841:
837:
833:
829:
825:
821:
817:
813:
809:
805:
801:
797:
793:
787:
785:
781:
777:
773:
769:
765:
759:
757:
753:
746:
736:
734:
730:
725:
721:
711:
707:
704:
700:
690:
688:
684:
678:
675:
671:
666:
664:
660:
655:
651:
647:
644:published by
643:
639:
635:
634:deep learning
631:
627:
622:
620:
619:Babel program
616:
611:
607:
602:
600:
596:
592:
588:
584:
580:
579:speech corpus
576:
572:
568:
564:
560:
556:
552:
548:
544:
534:
532:
528:
524:
520:
516:
512:
510:
506:
501:
497:
496:Xuedong Huang
489:
485:
481:
478:
475:
472:
469:
465:
461:
458:
457:
456:
453:
451:
443:
439:
435:
431:
428:
427:
426:
424:
411:
407:
404:
403:
398:
394:
390:
389:
388:
386:
382:
378:
374:
370:
369:Markov chains
366:
358:
354:
350:
347:
344:
341:
340:
336:
332:
328:
324:
320:
316:
315:
314:understanding
309:
305:
301:
298:
297:
291:
289:
284:
282:
278:
274:
267:
263:
259:
256:– Funding at
255:
252:
249:
245:
241:
237:
236:speech coding
233:
229:
226:
223:
219:
215:
212:
209:
205:
201:
198:
195:
191:
188:
187:
181:
173:
171:
167:
166:deep learning
162:
160:
156:
152:
148:
147:
142:
137:
135:
132:). Automatic
131:
127:
123:
119:
114:
110:
105:
102:
97:
95:
91:
87:
83:
79:
75:
71:
67:
63:
59:
55:
54:methodologies
51:
47:
43:
39:
33:
19:
8968:Hugging Face
8932:David Silver
8616:
8580:Audio–visual
8434:Applications
8413:Augmentation
8258:
8107:Concordancer
8033:
7503:Bag-of-words
7419:
7410:
7387:
7368:
7349:
7326:
7307:
7281:. Retrieved
7267:
7255:. Retrieved
7250:
7241:
7229:. Retrieved
7212:
7202:
7190:. Retrieved
7181:
7172:
7160:. Retrieved
7146:
7134:. Retrieved
7120:
7108:. Retrieved
7104:the original
7099:
7090:
7080:
7056:
7049:
7037:. Retrieved
7028:
7019:
7006:The Register
7004:
6994:
6979:
6970:
6958:. Retrieved
6949:
6940:
6920:
6885:
6879:
6870:
6858:. Retrieved
6830:
6823:
6811:. Retrieved
6802:
6793:
6766:
6759:
6751:the original
6741:
6706:
6702:
6696:
6687:
6675:. Retrieved
6653:(1): 25–41.
6650:
6646:
6636:
6625:
6617:the original
6607:
6595:. Retrieved
6586:
6576:
6564:. Retrieved
6535:. Retrieved
6521:
6513:the original
6502:
6490:. Retrieved
6481:
6472:
6459:
6450:
6426:
6419:
6407:. Retrieved
6385:
6379:
6367:. Retrieved
6347:
6334:
6321:. Retrieved
6312:
6298:
6286:. Retrieved
6271:
6259:. Retrieved
6250:
6240:
6228:. Retrieved
6219:
6209:
6197:. Retrieved
6189:The Guardian
6188:
6179:
6167:. Retrieved
6125:
6118:
6106:. Retrieved
6098:EdSurge News
6097:
6087:
6075:. Retrieved
6066:
6056:
6044:. Retrieved
6038:(2): 62–76.
6035:
6031:
6021:
6012:
5972:
5968:
5958:
5950:
5943:, retrieved
5927:
[Recovered from the Accuracy section: the word error rate and word recognition rate formulas.]

WER = (s + d + i) / n

where s is the number of substitutions, d is the number of deletions, i is the number of insertions, and n is the number of words in the reference. The word recognition rate is then

WRR = 1 − WER = (n − s − d − i) / n = (h − i) / n

where h = n − (s + d) is the number of correctly recognized words.
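As a minimal sketch, the word error rate can be computed from a word-level Levenshtein alignment between the reference and the hypothesis; the function name and whitespace tokenization below are illustrative choices, not part of the original text:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / n,
    computed via word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions to turn ref[:i] into an empty hypothesis
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions to build hyp[:j] from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why the complementary WRR is sometimes preferred for reporting.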
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.