People, already dependent on their various devices, are growing ever more reliant on voice-activated services like Alexa, Google Assistant and Siri. Tapping into the long-promised convenience and flexibility of hands-free voice computing, these leading voice assistants (and similar technologies and solutions) only stand to expand in intelligence and popularity as AI does exactly what it's supposed to do: learn. Juniper Research recently predicted that the number of digital voice assistants in use will jump from 2.5 billion at the end of 2018 to 8 billion by 2023. Many of these assistants today live on mobile phones and tablets, but the breadth of devices is set to explode: smart TVs, wearables, smart speakers and other in-home/smart-home devices will become more significant.
And here we're getting into interesting territory in terms of how voice technology will be used. Rather than simply providing basic help, the functionality will grow more sophisticated, leading to more complex computing and security needs, for example as e-commerce and banking/financial transactions become more routine.
Voice is just another form of content: Caching is key
With the inevitable expansion of the scope of voice-based technology, we start to realize that, once again, the fundamentals of web performance are essential considerations. End users expect the speed and availability that power a high-quality experience, and this should be completely invisible and seamless regardless of how the user accesses the content.
Voice content delivery isn't that different from any other type of content delivery. The user makes a request via their voice-activated device, whether to Alexa or Google Assistant; the audio is streamed to the vendor's cloud, where speech is converted to text. That text request is forwarded to the backend, which handles it like any other request. The backend replies with a text response, which travels back through the cloud, is converted to speech and is streamed to the user.
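To make that flow concrete, here is a minimal sketch of the backend's side of the exchange, using Flask. The endpoint path, request fields and response shape are illustrative assumptions rather than Alexa's or Google's actual webhook formats; the point is simply that by the time the request reaches the origin, it is an ordinary text request.

```python
# A minimal sketch of the backend side of the flow described above.
# The /voice path and the "query"/"responseText" JSON fields are
# assumptions for illustration, not any vendor's actual webhook format.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def handle_voice_request():
    # The vendor's cloud has already converted speech to text;
    # we receive plain JSON and return plain JSON.
    body = request.get_json(silent=True) or {}
    query = body.get("query", "")

    # Handle it like any other request: look up an answer, render text.
    answer = lookup_answer(query)

    # The text we return is converted back to speech in the cloud
    # before being streamed to the user's device.
    return jsonify({"responseText": answer})

def lookup_answer(query: str) -> str:
    # Hypothetical placeholder for origin-side work (database, APIs, etc.).
    return f"You asked about: {query}"
```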
As you can imagine, all of these transformations in the cloud add latency, and they are entirely in Amazon's or Google's hands. On our side of the equation, caching content wherever possible is not merely advisable but a must. It's really back to basics: cache everything you can to reduce traffic to the origin server and to cut the creation and delivery time for each piece of content requested.
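As a hedged illustration, here is one way the handler above could cache its answers. The in-process TTL cache and the 300-second lifetime are assumptions chosen for the sketch; in production the same idea would typically extend outward to a CDN or shared cache so that repeated requests never reach the origin at all.

```python
# A sketch of "cache everything you can", layered on the handler above.
# The TTL value and the cache shape are illustrative assumptions.
import time

_cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300  # tune per content type; news expires faster than static facts

def cached_lookup(query: str) -> str:
    now = time.monotonic()
    hit = _cache.get(query)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no origin work, no added latency
    answer = lookup_answer(query)  # the hypothetical origin-side lookup from earlier
    _cache[query] = (now, answer)
    return answer
```

Swapping lookup_answer for cached_lookup in the handler means a question asked by many users is computed once and served from memory thereafter, which is exactly the traffic reduction the back-to-basics advice is after.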