Philip Cuadra Phones & Addresses

  • 716 Duboce Ave, San Francisco, CA 94117
  • Mountain View, CA
  • Santa Clara, CA
  • Saint Albans, WV

Publications

US Patents

Instruction Level Execution Preemption

US Patent:
20130124838, May 16, 2013
Filed:
Nov 10, 2011
Appl. No.:
13/294045
Inventors:
Lacky V. SHAH - Los Altos Hills CA, US
Gregory Scott Palmer - Cedar Park TX, US
Gernot Schaufler - Mountain View CA, US
Samuel H. Duncan - Arlington MA, US
Philip Browning Johnson - Campbell CA, US
Shirish Gadre - Fremont CA, US
Robert Ohannessian - Austin TX, US
Nicholas Wang - Saratoga CA, US
Christopher Lamb - San Jose CA, US
Philip Alexander Cuadra - Mountain View CA, US
Timothy John Purcell - Provo UT, US
International Classification:
G06F 9/38
US Classification:
712234, 712E09062
Abstract:
One embodiment of the present invention sets forth a technique for instruction-level and compute-thread-array-granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
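
The dynamic fallback the abstract describes — preempt at a compute-thread-array (CTA) boundary when in-flight work drains quickly, otherwise fall back to instruction-level preemption — can be sketched roughly as follows. This is an illustrative model only, not NVIDIA's implementation; the names `drain_estimate_cycles` and `THRESHOLD_CYCLES` are assumptions.

```python
# Illustrative sketch of choosing a preemption granularity, assuming a
# per-pipeline estimate of how long in-flight instructions need to drain.
THRESHOLD_CYCLES = 10_000  # hypothetical drain-time budget

def choose_preemption_level(drain_estimate_cycles: int) -> str:
    """Prefer CTA-boundary preemption (less context state to save);
    fall back to instruction-level preemption when draining the
    in-flight instructions would take too long."""
    if drain_estimate_cycles <= THRESHOLD_CYCLES:
        return "cta_boundary"      # wait for execution units to go idle
    return "instruction_level"     # stop issuing and unload context now
```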

Signaling, Ordering, And Execution Of Dynamically Generated Tasks In A Processing System

US Patent:
20130160021, Jun 20, 2013
Filed:
Dec 16, 2011
Appl. No.:
13/329169
Inventors:
Timothy John PURCELL - Provo UT, US
Lacky V. Shah - Los Altos Hills CA, US
Sean J. Treichler - Sunnyvale CA, US
Karim M. Abdalla - Menlo Park CA, US
Philip Alexander Cuadra - San Francisco CA, US
Brian Pharris - Cary NC, US
International Classification:
G06F 9/46
US Classification:
718104, 718102
Abstract:
One embodiment of the present invention sets forth a technique for inserting generated tasks into the scheduling pipeline of a multiple processor system, allowing a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
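
The "reflected notification" idea — a write of dynamic-task data into a queue also signals the scheduling unit, with no CPU round trip — can be modeled in a few lines. The class and method names here are illustrative, not the patent's:

```python
# Sketch of a reflected notification: the write request itself notifies
# the scheduling unit that a dynamically generated task is ready.
class SchedulingUnit:
    def __init__(self):
        self.notifications = []

    def notify(self, event, payload=None):
        self.notifications.append((event, payload))

class TaskQueue:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.entries = []

    def write(self, task_data):
        # Writing the dynamic task's data generates the reflected signal;
        # no CPU intervention is involved in this model.
        self.entries.append(task_data)
        self.scheduler.notify("task_written", task_data)
```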

System And Method For Long Running Compute Using Buffers As Timeslices

US Patent:
20130162661, Jun 27, 2013
Filed:
Dec 21, 2011
Appl. No.:
13/333920
Inventors:
Jeffrey A. Bolz - Austin TX, US
Jeff Smith - Santa Clara CA, US
Jesse Hall - Santa Clara CA, US
David Sodman - Fremont CA, US
Philip Cuadra - San Francisco CA, US
Naveen Leekha - Fremont CA, US
Assignee:
NVIDIA CORPORATION - Santa Clara CA
International Classification:
G06T 1/00
US Classification:
345522
Abstract:
A system and method for using command buffers as timeslices or periods of execution for a long running compute task on a graphics processor. Embodiments of the present invention allow execution of long running compute applications with operating systems that manage and schedule graphics processing unit (GPU) resources and that may have a predetermined execution time limit for each command buffer. The method includes receiving a request from an application and determining a plurality of command buffers required to execute the request. Each of the plurality of command buffers may correspond to some portion of execution time or timeslice. The method further includes sending the plurality of command buffers to an operating system operable for scheduling the plurality of command buffers for execution on a graphics processor. The command buffers from a different request are time multiplexed within the execution of the plurality of command buffers on the graphics processor.
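
The core step — splitting one long-running request into command buffers that each fit within the operating system's per-buffer execution time limit — can be sketched as below. The function name and the work-unit abstraction are assumptions for illustration:

```python
def split_into_command_buffers(total_work_units, units_per_buffer):
    """Split a long-running compute request into command buffers,
    each covering one timeslice-sized range of the work."""
    buffers = []
    start = 0
    while start < total_work_units:
        end = min(start + units_per_buffer, total_work_units)
        buffers.append((start, end))  # half-open range of work units
        start = end
    return buffers
```

The OS can then schedule these buffers like any others, interleaving buffers from different requests between them.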

Low Latency Concurrent Computation

US Patent:
20130187935, Jul 25, 2013
Filed:
Jan 24, 2012
Appl. No.:
13/357569
Inventors:
Daniel Elliot Wexler - Soda Springs CA, US
Jeffrey A. Bolz - Austin TX, US
Jesse David Hall - Santa Clara CA, US
Philip Alexander Cuadra - San Francisco CA, US
Naveen Leekha - Durham NC, US
Ignacio Llamas - Sunnyvale CA, US
International Classification:
G06F 15/80
US Classification:
345505
Abstract:
One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
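
The scheduling behavior described — two command buffers targeting the same underlying processor resource, with low-latency tasks serviced before pending generic tasks — is essentially a two-level priority queue. A minimal sketch, with hypothetical names:

```python
import collections

class DualQueueScheduler:
    """Model of a low-latency and a generic command buffer feeding one
    processor resource; low-latency tasks run before pending generic ones."""
    def __init__(self):
        self.low_latency = collections.deque()
        self.generic = collections.deque()

    def submit(self, task, low_latency=False):
        (self.low_latency if low_latency else self.generic).append(task)

    def next_task(self):
        # Drain the low-latency buffer first to reduce completion latency.
        if self.low_latency:
            return self.low_latency.popleft()
        if self.generic:
            return self.generic.popleft()
        return None
```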

Automatic Dependent Task Launch

US Patent:
20130198760, Aug 1, 2013
Filed:
Jan 27, 2012
Appl. No.:
13/360581
Inventors:
Philip Alexander CUADRA - San Francisco CA, US
Lacky V. Shah - Los Altos Hills CA, US
Timothy John Purcell - Provo UT, US
Gerald F. Luiz - Los Gatos CA, US
International Classification:
G06F 9/46
US Classification:
718106
Abstract:
One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes, a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.
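
The metadata layout and completion path described above can be sketched as follows — an enable flag plus a pointer to the dependent, with the launch happening directly at completion (no semaphore handshake in this model). All names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskMetadata:
    name: str
    dependent_enabled: bool = False        # enable flag from the metadata
    dependent: Optional["TaskMetadata"] = None  # pointer to dependent task
    complete: bool = False

def on_task_complete(task: TaskMetadata, launch) -> None:
    """When a task finishes, auto-launch its dependent (if enabled)
    without any semaphore release/acquire, then mark the finished task
    complete so its metadata slot can be reused."""
    if task.dependent_enabled and task.dependent is not None:
        launch(task.dependent)
    task.complete = True
```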

Methods And Apparatus For Auto-Throttling Encapsulated Compute Tasks

US Patent:
20130268942, Oct 10, 2013
Filed:
Apr 9, 2012
Appl. No.:
13/442730
Inventors:
Jesse David Hall - Santa Clara CA, US
Philip Alexander Cuadra - San Francisco CA, US
Karim M. Abdalla - Menlo Park CA, US
International Classification:
G06F 9/46
US Classification:
718104
Abstract:
Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. In a non-throttled mode, the device driver allocates memory to a plurality of different processing units; in each of the throttled modes, it allocates memory to only a subset of those processing units. Data structures defined for each task include a flag that indicates whether the task may be executed in the non-throttled mode or in a throttled mode. A work distribution unit monitors the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
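
The mode decision the work distribution unit makes reduces to inspecting the per-task flag across the scheduled tasks. A toy sketch (the flag name and dict representation are assumptions):

```python
def select_processor_mode(scheduled_tasks):
    """Model of the work distribution unit's decision: if any scheduled
    task carries the throttled flag, configure the processor for the
    throttled mode (memory concentrated in a subset of processing
    units); otherwise stay in the non-throttled mode."""
    if any(task.get("throttled", False) for task in scheduled_tasks):
        return "throttled"
    return "non_throttled"
```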

Technique For Computational Nested Parallelism

US Patent:
20130298133, Nov 7, 2013
Filed:
May 2, 2012
Appl. No.:
13/462649
Inventors:
Stephen JONES - San Francisco CA, US
Philip Alexander Cuadra - San Francisco CA, US
Daniel Elliot Wexler - Soda Springs CA, US
Ignacio Llamas - Sunnyvale CA, US
Lacky V. Shah - Los Altos Hills CA, US
Christopher Lamb - San Jose CA, US
International Classification:
G06F 9/50
US Classification:
718104
Abstract:
One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.
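
The parent-launches-child-grid-then-synchronizes pattern is the shape of what CUDA later exposed as dynamic parallelism (a kernel launching a kernel and synchronizing on it device-side). As a rough host-side model only — real GPU grids are not OS threads — the control flow looks like this:

```python
import threading

results = []

def child_kernel(n):
    # Stand-in for the nested child grid's work.
    results.extend(i * i for i in range(n))

def launch_child_grid(kernel, args):
    """Model a parent thread launching a nested child grid; the
    returned handle's join() plays the role of the thread
    synchronization barrier on the child grid."""
    t = threading.Thread(target=kernel, args=args)
    t.start()
    return t

def parent_thread():
    grid = launch_child_grid(child_kernel, (4,))
    grid.join()  # barrier: parent waits for the child grid to finish
    return sum(results)  # parent may now safely consume child output
```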

Compute Work Distribution Reference Counters

US Patent:
20130117758, May 9, 2013
Filed:
Nov 8, 2011
Appl. No.:
13/291369
Inventors:
Philip Alexander Cuadra - Mountain View CA, US
Karim M. Abdalla - Menlo Park CA, US
Luke Durant - Santa Clara CA, US
Gerald F. Luiz - Los Gatos CA, US
Timothy John Purcell - Provo UT, US
Lacky V. Shah - Los Altos Hills CA, US
International Classification:
G06F 9/46
US Classification:
718104
Abstract:
One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
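
The mechanism above — a shared counter initialized to a resource limit, per-array acquire/release, and an exit-time override that retains resources for a child task — can be sketched as follows. Class and function names are illustrative, not the patent's:

```python
class ReferenceCounter:
    """Programmable reference counter limiting how much of a resource
    the tasks that share this counter may hold at once."""
    def __init__(self, limit):
        self.available = limit

    def acquire(self, amount):
        if amount > self.available:
            return False  # launch must wait until resources are released
        self.available -= amount
        return True

    def release(self, amount):
        self.available += amount

def on_thread_array_exit(counter, amount, retain_for_child=False):
    """Exit instruction: normally releases the array's resources, but
    may be configured to override the release so the resources stay
    allocated for a child task generated during execution."""
    if not retain_for_child:
        counter.release(amount)
```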