George E Heidorn from 52 Missionary Rd #219, Cromwell, CT 06416, age 86

Method And Apparatus For Identifying Erroneous Characters In Text

US Patent:

6360197, Mar 19, 2002

Filed:

Oct 19, 1999

Appl. No.:

09/420661

Inventors:

Andi Wu - Bellevue WA
George E. Heidorn - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27

US Classification:

704/9, 707/533

Abstract:

A method and apparatus are provided that identify confused characters in a text written in a language having a large number of distinct characters. To identify the confused characters, a set of characters from the text are segmented into individual characters. A confusable character for at least one of the segmented characters is then retrieved. Lexical information is identified for both the segmented characters and the retrieved confusable characters and is used to parse the segmented characters and the confusable characters. Based on the parse, a segmented character is identified that has been confused with a confusable character.

Method And System For Compiling A Lexical Knowledge Base

US Patent:

7383169, Jun 3, 2008

Filed:

Apr 13, 1994

Appl. No.:

08/227247

Inventors:

Lucretia H. Vanderwende - Redmond WA, US
Stephen D. Richardson - Redmond WA, US
Karen Jensen - Bellevue WA, US
George E. Heidorn - Bellevue WA, US
William B. Dolan - Redmond WA, US

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27

US Classification:

704/9

Abstract:

A lexical knowledge base is compiled automatically from a machine-readable source (such as an on-line dictionary or unstructured text). The preferred embodiment of the invention makes use of “backward linking,” by which inverse semantic relations are discerned from the text and used to augment the knowledge base. By this arrangement, on-line dictionaries and other texts can provide formidable sources of “common sense” knowledge about the world.
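
The "backward linking" idea can be illustrated with a minimal sketch. It assumes the relations have already been extracted as (headword, relation, value) triples; the relation names and the `INVERSE` table are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical inverse names for a few relation types (an assumption for
# illustration; the patent abstract does not list relation names).
INVERSE = {"Hypernym": "Hyponym", "PartOf": "HasPart"}


def build_knowledge_base(triples):
    """Store each (headword, relation, value) triple plus its inverse link."""
    kb = defaultdict(set)
    for head, relation, value in triples:
        # Forward link, as read from the headword's own entry.
        kb[head].add((relation, value))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            # Backward link: the value's entry now points back at the headword.
            kb[value].add((inverse, head))
    return kb


# Toy triples standing in for relations discerned from dictionary definitions.
triples = [
    ("car", "Hypernym", "vehicle"),
    ("wheel", "PartOf", "car"),
]
kb = build_knowledge_base(triples)
```

In this sketch the entry for "vehicle" gains a link to "car" even though the triple came from the definition of "car", which is the augmentation the abstract describes.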

Information Retrieval Utilizing Semantic Representation Of Text By Identifying Hypernyms And Indexing Multiple Tokenized Semantic Structures To A Same Passage Of Text

US Patent:

6161084, Dec 12, 2000

Filed:

Aug 3, 1999

Appl. No.:

09/366499

Inventors:

John J. Messerly - Bainbridge Island WA
George E. Heidorn - Bellevue WA
Stephen D. Richardson - Redmond WA
William B. Dolan - Redmond WA
Karen Jensen - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27
G06F 17/30

US Classification:

704/9

Abstract:

The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an "is a" relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
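
The token generation described above can be sketched roughly as follows. The sketch assumes a logical form is reduced to a (subject, verb, object) tuple and that hypernyms come from a small hand-written table; `HYPERNYMS`, `expand_logical_form`, `tokens`, `build_index`, and `search` are all hypothetical names, not part of the patent.

```python
from collections import defaultdict
from itertools import product

# Toy hypernym table (assumed data): each word maps to words it "is a" kind of.
HYPERNYMS = {
    "man": ["person"],
    "kiss": ["touch"],
    "pig": ["swine", "animal"],
}


def expand_logical_form(primary):
    """Return the primary form followed by every alternative form obtained by
    replacing one or more words with one of their hypernyms."""
    options = [[word] + HYPERNYMS.get(word, []) for word in primary]
    return [tuple(combo) for combo in product(*options)]


def tokens(primary):
    """Turn the primary and alternative logical forms into string tokens."""
    return {"|".join(form) for form in expand_logical_form(primary)}


def build_index(documents):
    """Index every token of every document passage under its document id."""
    index = defaultdict(set)
    for doc_id, primary in documents.items():
        for token in tokens(primary):
            index[token].add(doc_id)
    return index


def search(index, query_form):
    """Generate tokens for the query the same way and look them up."""
    hits = set()
    for token in tokens(query_form):
        hits |= index.get(token, set())
    return hits


index = build_index({"d1": ("man", "kiss", "pig")})
matches = search(index, ("person", "touch", "animal"))
```

Because the passage is indexed under its hypernym-substituted forms as well as its primary form, the more general query still reaches it, while a query with an unrelated verb would not.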

Method And System For Bootstrapping Statistical Processing Into A Rule-Based Natural Language Parser

US Patent:

5963894, Oct 5, 1999

Filed:

May 20, 1997

Appl. No.:

08/858959

Inventors:

Stephen Darrow Richardson - Redmond WA
George E. Heidorn - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27
G06F 17/28

US Classification:

704/9

Abstract:

A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood of success statistics have been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in the descending order of the likelihood of their success as indicated by the statistics compiled in the statistics compilation mode.
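
The two modes can be sketched in a few lines. This is only an illustration under stated assumptions: rules are identified by plain strings, "success" is recorded by the caller, and the likelihood is a simple success ratio; the `RuleStats` class and the rule names are hypothetical, and the abstract does not specify how the statistics are computed.

```python
from collections import Counter


class RuleStats:
    """Counts attempts and successes per rule (statistics compilation mode)
    and orders rules by estimated likelihood (efficient parsing mode)."""

    def __init__(self):
        self.attempts = Counter()
        self.successes = Counter()

    def record(self, rule, succeeded):
        # Called once for every rule application tried on a sample input.
        self.attempts[rule] += 1
        if succeeded:
            self.successes[rule] += 1

    def likelihood(self, rule):
        # Assumed estimate: fraction of attempts that succeeded.
        tried = self.attempts[rule]
        return self.successes[rule] / tried if tried else 0.0

    def ordered(self, rules):
        # Most likely rules first, so the parser tries them before the rest.
        return sorted(rules, key=self.likelihood, reverse=True)


stats = RuleStats()
# Simulated observations from the compilation pass over sample strings.
for rule, ok in [
    ("NP -> Det N", True), ("NP -> Det N", True), ("NP -> Det N", True),
    ("VP -> V NP", True), ("VP -> V NP", False),
    ("NP -> N N", False), ("NP -> N N", False),
]:
    stats.record(rule, ok)

order = stats.ordered(["NP -> N N", "VP -> V NP", "NP -> Det N"])
```

The same ordering step would apply to lexicon entries, which the abstract treats alongside rules.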

Method And System For Computing Semantic Logical Forms From Syntax Trees

US Patent:

5966686, Oct 12, 1999

Filed:

Jun 28, 1996

Appl. No.:

08/674610

Inventors:

George Heidorn - Bellevue WA
Karen Jensen - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27

US Classification:

704/9

Abstract:

Methods and computer systems for semantically analyzing natural language sentences. The natural language processing subsystems for morphological and syntactic analysis transform an input sentence into a syntax parse tree. Semantic analysis applies three sets of semantic rules to create a skeletal logical form graph from a syntax parse tree. Semantic analysis then applies two additional sets of semantic rules to provide semantically meaningful labels for the links of the logical form graph, to create additional logical form graph nodes for missing elements, and to unify redundant elements. The final logical form graph represents the complete semantic analysis of an input sentence.

Information Retrieval Utilizing Semantic Representation Of Text And Based On Constrained Expansion Of Query Words

US Patent:

6246977, Jun 12, 2001

Filed:

Aug 3, 1999

Appl. No.:

09/368071

Inventors:

John J. Messerly - Bainbridge Island WA
George E. Heidorn - Bellevue WA
Stephen D. Richardson - Redmond WA
William B. Dolan - Redmond WA
Karen Jensen - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27
G06F 17/30

US Classification:

704/9

Abstract:

The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an "is a" relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.

Method And System For Bootstrapping Statistical Processing Into A Rule-Based Natural Language Parser

US Patent:

5752052, May 12, 1998

Filed:

Jun 24, 1994

Appl. No.:

08/265845

Inventors:

Stephen Darrow Richardson - Redmond WA
George E. Heidorn - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/27

US Classification:

395/759

Abstract:

A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood of success statistics have been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in the descending order of the likelihood of their success as indicated by the statistics compiled in the statistics compilation mode.

Method And System For Identifying And Resolving Commonly Confused Words In A Natural Language Parser

US Patent:

5999896, Dec 7, 1999

Filed:

Jun 25, 1996

Appl. No.:

08/671203

Inventors:

Stephen Darrow Richardson - Redmond WA
George E. Heidorn - Bellevue WA

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/38

US Classification:

704/9

Abstract:

A method and system for identifying and resolving commonly confused words in a natural language parser is provided. In a preferred embodiment, a computer system parses input text made up of two or more words using a relation that maps from potentially confused words, including one word among the words of the input text, to possibly intended words. The computer system first identifies the possible parts of speech for each word of the input text including the potentially confused word. The computer system then identifies the possible parts of speech for the possibly intended word to which the relation maps the potentially confused word. Finally, the computer system applies syntactic grammar rules to the identified parts of speech such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced. According to a further embodiment of the invention, the computer system provides feedback on the input text by outputting an indication that a sentence in the input text is syntactically incorrect and outputting a further indication that the sentence in the input text would be syntactically correct if the potentially confused word in the input text was replaced with the possibly intended word.
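
A toy version of this check might look like the following. It is a sketch under heavy simplifying assumptions: the "grammar" is just a set of accepted part-of-speech sequences standing in for real syntactic rules, and `LEXICON`, `CONFUSABLE`, `SENTENCE_PATTERNS`, `parses`, and `suggest` are hypothetical names and data.

```python
from itertools import product

# Toy lexicon: possible parts of speech for each word (assumed data).
LEXICON = {
    "i": {"PRON"}, "went": {"VERB"}, "barked": {"VERB"},
    "dog": {"NOUN"}, "their": {"DET"}, "there": {"ADV"},
}

# Relation mapping a potentially confused word to a possibly intended word.
CONFUSABLE = {"their": "there", "there": "their"}

# Stand-in for grammar rules: tag sequences that form a complete sentence.
SENTENCE_PATTERNS = {("PRON", "VERB", "ADV"), ("DET", "NOUN", "VERB")}


def parses(words):
    """True if some assignment of parts of speech yields a complete parse."""
    tag_options = [LEXICON.get(word, set()) for word in words]
    return any(tags in SENTENCE_PATTERNS for tags in product(*tag_options))


def suggest(words):
    """Return (confused, intended) when the sentence fails to parse as written
    but parses once a confusable word is swapped in; otherwise None."""
    if parses(words):
        return None
    for i, word in enumerate(words):
        intended = CONFUSABLE.get(word)
        if intended is not None:
            candidate = words[:i] + [intended] + words[i + 1:]
            if parses(candidate):
                return (word, intended)
    return None


feedback = suggest(["i", "went", "their"])
```

A sentence that already parses, such as "their dog barked", produces no suggestion, matching the condition that no complete syntax tree exists for the word as written.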

ISBN (Books And Publications)

Natural Language Processing: The PLNLP Approach
