Mehryar Mohri Phones & Addresses

  • 33 Greenwich St, New York, NY 10014 (212) 229-0258 (212) 741-4530
  • 2 Washington Square Vlg, New York, NY 10012

Work

Company: Google, Aug 2004 to May 2018
Position: Visiting faculty and research consultant

Education

Degree: Doctorates, Doctor of Philosophy
School: Université Paris Diderot, 1988 to 1993
Specialities: Computer Science

Skills

Machine Learning • Mathematics • Computer Science

Industries

Higher Education

Resumes

Head, Learning Theory Team, Google Research

Location:
New York, NY
Industry:
Higher Education
Work:
Google Aug 2004 - May 2018
Visiting Faculty and Research Consultant

Google Aug 2004 - May 2018
Head, Learning Theory Team, Google Research

Courant Institute of Mathematical Sciences Aug 2004 - May 2018
Professor

AT&T Labs Research/AT&T Bell Labs Jan 1994 - Aug 2004
Department Head and Technology Leader and Member of the Technical Staff

École Polytechnique 1992 - 1994
Assistant Professor
Education:
Université Paris Diderot 1988 - 1993
Doctorates, Doctor of Philosophy, Computer Science
École Normale Supérieure 1987 - 1988
Masters, Applied Mathematics
École Polytechnique 1985 - 1987
Skills:
Machine Learning
Mathematics
Computer Science

Publications

Wikipedia

Mehryar Mohri

Mehryar Mohri is a professor of computer science at the Courant Institute of Mathematical Sciences at New York University known for his work in machine ...

Wikipedia References

Mehryar Mohri

Work:
Area of science:

Computer scientist

Company:

New York University faculty

Position:

Author

Education:
Studied at:

University of Paris

Area of science:

Computational linguistics • Automata theory

Academic degree:

Professor

Skills & Activities:
Skill:

Computer science • Algorithms

US Patents

Systems And Methods For Determinizing And Minimizing A Finite State Transducer For Pattern Recognition

US Patent:
6456971, Sep 24, 2002
Filed:
Oct 27, 2000
Appl. No.:
09/697139
Inventors:
Mehryar Mohri - New York NY
Fernando Carlos Neves Pereira - Westfield NJ
Michael Dennis Riley - New York NY
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 1514
US Classification:
704256, 704 10, 704255, 704257
Abstract:
A pattern recognition system and method for optimal reduction of redundancy and size of a weighted and labeled graph includes receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph, where the graph is labeled with word sequences and weighted with probabilities, and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown in a display device so that recognition of speech signals can be demonstrated.

Fully Expanded Context-Dependent Networks For Speech Recognition

US Patent:
6574597, Jun 3, 2003
Filed:
Feb 11, 2000
Appl. No.:
09/502501
Inventors:
Mehryar Mohri - New York NY
Michael Dennis Riley - New York NY
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 1500
US Classification:
704251, 704255, 704256, 704257
Abstract:
A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other coder based on sequences of labels provided by feature analysis of input speech.

System And Methods For Optimizing Networks Of Weighted Unweighted Directed Graphs

US Patent:
6587844, Jul 1, 2003
Filed:
Feb 1, 2000
Appl. No.:
09/495174
Inventors:
Mehryar Mohri - New York NY
Assignee:
AT&T Corp. - New York NY
International Classification:
G06F 1518
US Classification:
706 20, 706 12, 704256, 704257
Abstract:
Unweighted finite state automata may be used in speech recognition systems, but considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task is time consuming and expensive, if it is even possible. Additionally, it is unlikely that a training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state. The directed graph optimizing systems and methods determine the shortest distances between source and end nodes of a weighted directed graph. These various directed graph optimizing systems and methods also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient. Various directed graph optimizing systems and methods also arbitrarily weight an unweighted directed graph so that the shortest distance and reweighting systems and methods can be used.
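The reweighting idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes non-negative arc weights that add along a path (a shortest-path, or tropical, setting), and the function names `shortest_distance_to_final` and `push_weights` are hypothetical. Each arc weight is shifted by the difference of the shortest distances at its two ends, so weight moves toward the front of every path while total path weights stay the same.

```python
import heapq
from collections import defaultdict


def shortest_distance_to_final(arcs, finals):
    """Shortest distance from every state to the nearest final state.

    arcs: iterable of (src, dst, weight) with weight >= 0 (assumption).
    Runs Dijkstra's algorithm on the reversed graph, seeded at the finals.
    """
    reverse = defaultdict(list)
    for src, dst, weight in arcs:
        reverse[dst].append((src, weight))

    dist = {f: 0.0 for f in finals}
    heap = [(0.0, f) for f in finals]
    heapq.heapify(heap)
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist.get(q, float("inf")):
            continue  # stale heap entry
        for p, weight in reverse[q]:
            candidate = d + weight
            if candidate < dist.get(p, float("inf")):
                dist[p] = candidate
                heapq.heappush(heap, (candidate, p))
    return dist


def push_weights(arcs, start, finals):
    """Reweight arcs so that weight is moved toward the start state.

    Returns (initial_weight, new_arcs). Arcs touching states that cannot
    reach a final state are dropped (a simplification of this sketch).
    """
    dist = shortest_distance_to_final(arcs, finals)
    new_arcs = [
        (p, q, w + dist[q] - dist[p])
        for p, q, w in arcs
        if p in dist and q in dist
    ]
    return dist[start], new_arcs


# Toy graph: two paths from state 0 to final state 3.
example_arcs = [(0, 1, 1.0), (0, 2, 4.0), (1, 3, 5.0), (2, 3, 1.0)]
initial_weight, pushed_arcs = push_weights(example_arcs, start=0, finals={3})
```

After reweighting, every arc on a shortest path has weight zero, so a search that ranks partial paths by accumulated weight sees the cost of each choice as early as possible.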

Method And Apparatus For Rapid Acoustic Unit Selection From A Large Speech Corpus

US Patent:
6697780, Feb 24, 2004
Filed:
Apr 25, 2000
Appl. No.:
09/557146
Inventors:
Mark Charles Beutnagel - Mendham NJ
Mehryar Mohri - New York NY
Michael Dennis Riley - New York NY
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 1304
US Classification:
704258, 704266
Abstract:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
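The caching strategy described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_concat_cache`, `lookup_cost`, and `toy_cost` are hypothetical names, the toy cost function stands in for a real acoustic mismatch measure, and falling back to direct computation on a cache miss is an assumption of this sketch rather than something the abstract specifies.

```python
def build_concat_cache(synthesized_sequences, concat_cost):
    """Precompute concatenation costs for unit pairs seen in synthesized speech.

    synthesized_sequences: iterable of lists of acoustic unit identifiers.
    concat_cost: function (left_unit, right_unit) -> cost.
    Only pairs that actually occur are stored, which keeps the cache small.
    """
    cache = {}
    for units in synthesized_sequences:
        for left, right in zip(units, units[1:]):
            if (left, right) not in cache:
                cache[(left, right)] = concat_cost(left, right)
    return cache


def lookup_cost(cache, left, right, concat_cost):
    """Return the cached cost, computing it directly on a miss (assumption)."""
    try:
        return cache[(left, right)]
    except KeyError:
        return concat_cost(left, right)


def toy_cost(left, right):
    """Placeholder mismatch measure used only to make the sketch runnable."""
    return abs(len(left) - len(right))


# Two synthesized "sentences" expressed as sequences of unit identifiers.
corpus = [["a", "bb", "a"], ["a", "bb", "ccc"]]
cost_cache = build_concat_cache(corpus, toy_cost)
cached = lookup_cost(cost_cache, "a", "bb", toy_cost)    # served from the cache
missed = lookup_cost(cost_cache, "ccc", "a", toy_cost)   # computed on demand
```

Because only observed pairs are stored, the cache grows with the number of pairs that occur in practice rather than with the square of the unit inventory.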

Methods And Apparatus For Rapid Acoustic Unit Selection From A Large Speech Corpus

US Patent:
6701295, Mar 2, 2004
Filed:
Feb 6, 2003
Appl. No.:
10/359171
Inventors:
Mark Charles Beutnagel - Mendham NJ
Mehryar Mohri - New York NY
Michael Dennis Riley - New York NY
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 1306
US Classification:
704258, 704266
Abstract:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.

System And Method Of ε-Removal Of Weighted Automata And Transducers

US Patent:
7027988, Apr 11, 2006
Filed:
Jul 20, 2001
Appl. No.:
09/910090
Inventors:
Mehryar Mohri - New York NY, US
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 15/18
G10L 15/28
G06F 17/21
US Classification:
704257, 704255, 704 10
Abstract:
An improved ε-removal method is disclosed that computes for any input weighted automaton A with ε-transitions an equivalent weighted automaton B with no ε-transitions. The method comprises two main steps. The first step comprises computing for each state “p” of the automaton A its ε-closure. The second step in the method comprises modifying the outgoing transitions of each state “p” by removing those labeled with ε. The method next comprises adding to the set of transitions leaving the state “p” non-ε-transitions leaving each state “q” in the set of states reachable from “p” via a path labeled with ε, with their weights pre-multiplied by the ε-distance from state “p” to state “q” in the automaton A. State “p” is a final state if some state “q” within the set of states reachable from “p” via a path labeled with ε is final and the final weight.
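The two steps in this abstract can be sketched in Python for one concrete case. The sketch assumes a shortest-path (tropical) setting in which weights are non-negative, "multiplication" along a path is addition, and alternative paths are combined with `min`. The data layout, the `EPS` marker, and the function names are hypothetical choices for illustration, not the patent's own notation.

```python
import heapq
from collections import defaultdict

EPS = None  # label used in this sketch to mark ε-transitions


def epsilon_closure(state, eps_arcs):
    """Step 1: ε-distances from `state` to every state reachable by ε-paths.

    eps_arcs maps a state to a list of (weight, next_state) ε-transitions.
    Uses Dijkstra's algorithm, so weights must be non-negative (assumption).
    """
    dist = {state: 0.0}
    heap = [(0.0, state)]
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue  # stale heap entry
        for weight, r in eps_arcs[q]:
            if d + weight < dist.get(r, float("inf")):
                dist[r] = d + weight
                heapq.heappush(heap, (d + weight, r))
    return dist


def remove_epsilons(transitions, finals):
    """Step 2: drop ε-transitions and add the transitions they made reachable.

    transitions: list of (src, label, weight, dst); label is EPS for ε.
    finals: dict mapping final states to their final weights.
    Returns (new_transitions, new_finals) with no ε-transitions.
    """
    eps_arcs = defaultdict(list)
    real_arcs = defaultdict(list)
    states = set(finals)
    for src, label, weight, dst in transitions:
        states.update((src, dst))
        if label is EPS:
            eps_arcs[src].append((weight, dst))
        else:
            real_arcs[src].append((label, weight, dst))

    new_arcs = {}
    new_finals = {}
    for p in states:
        for q, d in epsilon_closure(p, eps_arcs).items():
            # Non-ε transitions leaving q, with weights shifted by the ε-distance.
            for label, weight, dst in real_arcs[q]:
                key = (p, label, dst)
                new_arcs[key] = min(new_arcs.get(key, float("inf")), d + weight)
            # p becomes final if some state in its closure is final.
            if q in finals:
                new_finals[p] = min(new_finals.get(p, float("inf")), d + finals[q])

    new_transitions = [(p, a, w, r) for (p, a, r), w in new_arcs.items()]
    return new_transitions, new_finals


# Toy automaton: state 0 reaches state 1 by an ε-transition of weight 1.
example = [(0, EPS, 1.0, 1), (1, "a", 2.0, 2), (0, "b", 3.0, 2)]
no_eps_transitions, no_eps_finals = remove_epsilons(example, {1: 0.5, 2: 0.0})
```

The sketch leaves states that become unreachable in place; a separate trimming pass would be needed to drop them.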

Methods And Apparatus For Rapid Acoustic Unit Selection From A Large Speech Corpus

US Patent:
7082396, Jul 25, 2006
Filed:
Dec 19, 2003
Appl. No.:
10/742274
Inventors:
Mark C. Beutnagel - Mendham NJ, US
Mehryar Mohri - New York NY, US
Michael D. Riley - New York NY, US
Assignee:
AT&T Corp - New York NY
International Classification:
G10L 13/06
US Classification:
704258, 704266
Abstract:
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.

Systems And Methods For Generating Weighted Finite-State Automata Representing Grammars

US Patent:
7181386, Feb 20, 2007
Filed:
Jul 18, 2002
Appl. No.:
10/199220
Inventors:
Mehryar Mohri - New York NY, US
Mark-Jan Nederhof - Groningen, NL
Assignee:
AT&T Corp. - New York NY
International Classification:
G05D 1/00
G05D 3/00
G06F 7/00
G06F 17/00
US Classification:
704 1
Abstract:
A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.
Mehryar F Mohri from New York, NY, age ~61