Method and apparatus for producing, combining, and customizing virtual sound environments. A binaural sound system includes a transceiver configured for receiving a signal containing at least a first type of information and a second type of information. The first type of information includes enunciated data, which specifies certain information intended to be audibly enunciated to a user. The second type of information comprises a first type of metadata and a second type of metadata. The first type of metadata includes information that identifies a characteristic of the enunciated data exclusive of spatial position information. The second type of metadata identifies spatial position information associated with the enunciated data.
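
The signal layout described above can be pictured as a small data structure. The following is only an illustrative sketch: the abstract does not specify any encoding, so every class and field name here (`DescriptiveMetadata`, `SpatialMetadata`, `ReceivedSignal`, `source_id`, `category`, `position`) is a hypothetical assumption, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DescriptiveMetadata:
    """First type of metadata: a characteristic of the enunciated data,
    exclusive of spatial position. Field names are hypothetical."""
    source_id: str
    category: str


@dataclass(frozen=True)
class SpatialMetadata:
    """Second type of metadata: spatial position associated with the
    enunciated data. A Cartesian (x, y, z) triple is an assumption."""
    position: Tuple[float, float, float]


@dataclass(frozen=True)
class ReceivedSignal:
    """A received signal carrying both types of information."""
    enunciated_data: bytes            # first type of information
    descriptive: DescriptiveMetadata  # second type, first metadata
    spatial: SpatialMetadata          # second type, second metadata


def split_signal(signal: ReceivedSignal):
    """Separate the enunciated data from its two kinds of metadata."""
    return signal.enunciated_data, signal.descriptive, signal.spatial


# Hypothetical example instance.
example = ReceivedSignal(
    enunciated_data=b"turn left",
    descriptive=DescriptiveMetadata(source_id="unit-1", category="navigation"),
    spatial=SpatialMetadata(position=(1.0, 0.0, 0.0)),
)
```

Keeping the two metadata types in separate records mirrors the distinction the abstract draws: a receiver could filter or group messages on the descriptive metadata without touching the spatial position, and vice versa.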