Conversational Commerce and Speaker Recognition, Part II
In part I of this post I discussed the applicability of speaker recognition in conversational commerce, focusing on the use of speaker recognition techniques to enable voice authentication in a conversational commerce prototype. This follow-up post discusses how we can leverage the same speaker recognition techniques to provide continuous authentication.
Continuous authentication can work as follows:
1. The system continuously records voice interactions.
The recordings provide the audio samples from which the continuous authentication models are built.
2. Approximately two minutes of audio data is captured for a registered user.
The first two minutes of voice interactions are recorded and used to create a classification model for continuous authentication: a text-independent model for speaker identification. The speaker identification service monitors the size of the accumulated audio sample to estimate when two minutes of audio data is available. Once the two minutes have been collected, the service uses the data to create the model. The model is associated with the registered user and stored for future use by the speaker identification service.
3. Continuous authentication is performed on all subsequent interactions with the system.
All subsequent conversations with the system are recorded. The recorded audio is continuously streamed to the speaker identification service. The speaker identification service scores each incremental voice sample using the continuous authentication models of all registered users and confirms the legitimacy of the interaction. Privileged operations are not allowed unless the speaker is in a continuously authenticated state.
For example, if a user requests pizza delivery while the speaker is in an unauthenticated state, the request will not be processed until the speaker is authenticated again (by providing more voice input, a voice login, or a second text login). If the user is in a continuously authenticated state, the request is processed.
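The gating logic in step 3 can be sketched roughly as follows. This is a minimal illustration, not the prototype's implementation: the `Session` class, the `handle_request` function, the score scale (higher means a better match) and the `ACCEPT_THRESHOLD` value are all assumptions, since the post does not describe the speaker identification service's interface.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical threshold; the post does not state a score scale or cutoff.
ACCEPT_THRESHOLD = 0.8


@dataclass
class Session:
    """Tracks whether the current speaker is continuously authenticated."""
    user_id: str
    authenticated: bool = False

    def update(self, scores: Mapping[str, float]) -> bool:
        """Update the state from one incremental voice sample.

        `scores` maps each registered user id to the score that user's
        model gave the sample (assumed: higher is a better match).
        """
        best_user = max(scores, key=scores.get) if scores else None
        # Authenticated only if the session's own user is the best match
        # and that match is strong enough.
        self.authenticated = (
            best_user == self.user_id
            and scores[best_user] >= ACCEPT_THRESHOLD
        )
        return self.authenticated


def handle_request(session: Session, request: str, privileged: bool) -> str:
    """Refuse privileged requests unless the speaker is authenticated."""
    if privileged and not session.authenticated:
        return "reauthenticate"
    return "processed"
```

In this sketch a pizza order would be passed with `privileged=True`, so it is held back with `"reauthenticate"` until a later voice sample scores well enough against the session user's model.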
Figure 1: Registration Flow (requires ~2 mins of audio)
Figure 2: Continuous Authentication Flow
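The registration flow relies on estimating the accumulated audio duration from the size of the sample. A rough sketch of that estimate is shown here, assuming uncompressed 16 kHz, 16-bit, mono PCM audio; the post does not specify the audio format, and the `EnrollmentBuffer` class and its constants are hypothetical names used only for illustration.

```python
# Assumed audio format (not stated in the post): 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
ENROLLMENT_SECONDS = 120  # roughly two minutes of audio


def pcm_duration_seconds(num_bytes: int) -> float:
    """Estimate the duration of raw PCM audio from its size in bytes."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


class EnrollmentBuffer:
    """Accumulates a registered user's audio until a model can be built."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self._chunks = []
        self._num_bytes = 0

    @property
    def ready(self) -> bool:
        """True once about two minutes of audio has been collected."""
        return pcm_duration_seconds(self._num_bytes) >= ENROLLMENT_SECONDS

    def add(self, chunk: bytes) -> bool:
        """Store one recorded chunk; return whether enrollment can start."""
        self._chunks.append(chunk)
        self._num_bytes += len(chunk)
        return self.ready

    def audio(self) -> bytes:
        """Return all collected audio as one sample for model creation."""
        return b"".join(self._chunks)
```

With these assumed parameters one second of audio is 32,000 bytes, so the buffer reports `ready` after 3,840,000 bytes. A compressed format would need a different estimate, for example one based on bitrate.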
In summary, IBM Emerging Technologies has developed a prototype as part of an exploration into conversational commerce. We have collaborated with the Audio Analytics team in IBM Research to integrate their speaker recognition system into our prototype. This effort has allowed us to validate the applicability and feasibility of speaker recognition techniques to enable features such as voice authentication and continuous authentication in conversational commerce.