Conversational Commerce and Speaker Recognition, Part II


In part I of this post I discussed the applicability of speaker recognition in conversational commerce. It focused on the use of speaker recognition techniques to enable voice authentication in a conversational commerce prototype. This follow-up post discusses how we can leverage the same speaker recognition techniques to provide continuous authentication.

Continuous Authentication

Continuous authentication is the process of continually verifying the identity of a system user. We have developed a small JavaScript module that enables continuous authentication in speech-enabled applications. It leverages the same speaker recognition system that was used to enable voice authentication in the conversational commerce prototype. The continuous authentication feature guarantees that the speaker, at any point in time, is the same user who started the current conversation. It uses a text-independent approach and longer audio samples than the base voice authentication approach discussed in part I. Combining information from the initial authentication process with ongoing speaker analysis yields better accuracy and, thus, better security. The improved security is important because continuous authentication can be used to gate privileged operations in a conversational commerce system (e.g. purchases, access to sensitive data, etc.).
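To make the idea concrete, here is a minimal sketch of the kind of state tracking such a module might perform. The class name, method names, and thresholds (`ContinuousAuth`, the two-minute enrollment window, the 0.8 score cutoff) are illustrative assumptions, not the actual module's API.

```javascript
// Illustrative sketch of a continuous-authentication state tracker.
// Names and thresholds are assumptions, not the real module's interface.
class ContinuousAuth {
  constructor({ enrollmentSeconds = 120, scoreThreshold = 0.8 } = {}) {
    this.enrollmentSeconds = enrollmentSeconds; // audio needed to build a model (~2 min)
    this.scoreThreshold = scoreThreshold;       // minimum speaker identification score
    this.collectedSeconds = 0;
    this.enrolled = false;
    this.authenticated = false;
  }

  // Accumulate recorded audio (in seconds) during the enrollment phase.
  // Returns true once enough audio has been collected to build a model.
  addEnrollmentAudio(seconds) {
    this.collectedSeconds += seconds;
    if (this.collectedSeconds >= this.enrollmentSeconds) {
      this.enrolled = true;
    }
    return this.enrolled;
  }

  // Apply a score returned by the speaker identification service for the
  // latest audio increment; updates the continuously-authenticated state.
  applyScore(score) {
    this.authenticated = this.enrolled && score >= this.scoreThreshold;
    return this.authenticated;
  }

  // Privileged operations are only allowed in an authenticated state.
  canExecutePrivileged() {
    return this.authenticated;
  }
}
```

The design choice here is that authentication is a transient property of the session: each new score from the identification service can revoke or restore it, which is what lets the system block privileged operations mid-conversation.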

Continuous authentication can work as follows:

1. The system continuously records voice interactions.
The recordings are used to generate audio samples that are used to build models for continuous authentication.

2. Approximately two minutes of audio data is captured for a registered user.
The first two minutes of voice interactions are recorded and used to create a classification model for continuous authentication. The two-minute audio sample is used to generate a text-independent model for speaker identification. The speaker identification service monitors the size of the audio sample to estimate when the system has accumulated two minutes of audio data. Once the two minutes of audio data have been collected, the speaker identification service uses the data to create a model. The model is associated with the registered user and stored for future use by the speaker identification service.

3. Continuous authentication is performed on all subsequent interactions with the system.
All subsequent conversations with the system are recorded. The recorded audio is continuously streamed to the speaker identification service. The speaker identification service scores each incremental voice sample using the continuous authentication models of all registered users and confirms the legitimacy of the interaction. Privileged operations are not allowed unless the speaker is in a continuously authenticated state.
For example, if a user requests a pizza delivery while the speaker is in an unauthenticated state, the request is not processed until the speaker is authenticated again (through additional voice samples, a voice login, or a secondary text login). If the speaker is in a continuously authenticated state, the request is processed.
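The gating described in step 3 can be sketched as a simple check on the latest score from the speaker identification service. The function name, the session shape, and the score fields below are hypothetical, chosen only to illustrate the flow:

```javascript
// Illustrative gate for privileged operations (e.g. a pizza order).
// gatePrivilegedRequest, session, and scoreResult are hypothetical shapes.
function gatePrivilegedRequest(request, session, scoreResult) {
  // The incremental audio must match the user who started the conversation,
  // and the identification score must clear the configured threshold.
  const authenticated =
    scoreResult.userId === session.userId &&
    scoreResult.score >= session.scoreThreshold;

  if (!authenticated) {
    // Defer the request until the speaker re-authenticates
    // (more voice samples, a voice login, or a secondary text login).
    return { status: 'deferred', reason: 'reauthentication required' };
  }
  return { status: 'accepted', action: request.action };
}
```

A usage example: a request such as `{ action: 'order-pizza' }` is accepted only when the latest score names the session's user and exceeds the threshold; any other combination defers the request rather than rejecting it outright, so the conversation can continue after re-authentication.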


Figure 1: Registration Flow (requires ~2 mins of audio)


Figure 2: Continuous Authentication Flow


In summary, IBM Emerging Technologies has developed a prototype as part of an exploration into conversational commerce. We have collaborated with the Audio Analytics team in IBM Research to integrate their speaker recognition system into our prototype. This effort has allowed us to validate the applicability and feasibility of speaker recognition techniques to enable features such as voice authentication and continuous authentication in conversational commerce.

