As WebRTC continues to proliferate, developers and businesses will need plenty of supplemental tools and services if they are to keep building increasingly complex applications and do so more rapidly. I have a list of at least six value-added services WebRTC providers should have on tap if they want to increase revenue and bring in more developers. It goes without saying that all the services listed below should be easily accessible through APIs and just as easy to use as a WebRTC API call.
At the top of my list is archiving, specifically the ability to take a media stream and store it for future playback. Both audio and video WebRTC apps need the ability to record streams. For audio, recording and playback enable quick voice-message "bits" that can be used asynchronously for chatting and document comments, as well as more traditional voicemail, conference calling and call center usage. Video recording is necessary for archiving both one-to-one and one-to-many conversations, such as those in Google Hangouts.
Speech-to-text is my next necessity. Being able to provide a transcript that can later be indexed and searched should be core to anyone building an archive of recorded sessions, either for personal use or for a larger project in education or business. The Hypervoice people have been keen on this ability for a while, with some integrating speech-to-text (perhaps we should call it speech-to-index or speech-to-search) into their respective product offerings.
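To make "speech-to-index" concrete, here is a minimal sketch of a searchable word index over transcript segments. The TranscriptSegment shape, the function names and the ASCII-only tokenizer are all assumptions made for illustration; a real speech-to-text service will return its own format, and a production index would need to handle other languages and phrase queries.

```typescript
// Hypothetical shape of one transcribed segment (not from any real service).
interface TranscriptSegment {
  sessionId: string;
  startSeconds: number; // offset into the archived recording
  text: string;
}

// Where a word was spoken: which session, and how far into it.
interface Hit {
  sessionId: string;
  startSeconds: number;
}

// Lowercase the text and split on anything that is not a letter, digit or apostrophe.
// Assumption: English/ASCII transcripts only.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((word) => word.length > 0);
}

// Build a map from each word to the segments in which it was spoken.
function buildIndex(segments: TranscriptSegment[]): Map<string, Hit[]> {
  const index = new Map<string, Hit[]>();
  for (const segment of segments) {
    // De-duplicate so a word repeated within one segment yields a single hit.
    const uniqueWords = Array.from(new Set(tokenize(segment.text)));
    for (const word of uniqueWords) {
      const hits = index.get(word) ?? [];
      hits.push({ sessionId: segment.sessionId, startSeconds: segment.startSeconds });
      index.set(word, hits);
    }
  }
  return index;
}

// Look up a single word, case-insensitively. Returns an empty list if it was never spoken.
function search(index: Map<string, Hit[]>, word: string): Hit[] {
  return index.get(word.toLowerCase()) ?? [];
}
```

Each hit carries a session and a time offset, so a search result can jump straight to the right moment in an archived recording rather than just naming the file.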
Having speech-to-text in real time makes it possible to trigger actions based on key words and phrases during a conversation. For instance, if a customer is relating a problem to a call center agent via a WebRTC session and mentions that his cable box is not working, the call center agent could automatically get a pop-up list of steps for the customer to try to fix the problem, plus an automatically triggered "bot" process to test the cable box from the head end to the subscriber to further isolate the problem.
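The cable box scenario can be sketched as a small rule matcher fed by a live transcript. Everything here is hypothetical: the TriggerRule shape, the KeywordTrigger class and the idea that the transcription service delivers text in finalized chunks are assumptions for illustration, and the plain substring matching is deliberately naive.

```typescript
// A rule fires once, when every one of its phrases has been heard in the conversation so far.
interface TriggerRule {
  name: string;
  phrases: string[];
  action: () => void; // e.g. show a pop-up or start a line test
}

class KeywordTrigger {
  private rules: TriggerRule[];
  private heard = ""; // running lowercase transcript of the conversation
  private fired = new Set<string>(); // names of rules that have already run

  constructor(rules: TriggerRule[]) {
    this.rules = rules;
  }

  // Feed each chunk of transcript text as it arrives.
  // Returns the names of the rules that fired on this chunk.
  onTranscript(chunk: string): string[] {
    this.heard += " " + chunk.toLowerCase();
    const firedNow: string[] = [];
    for (const rule of this.rules) {
      if (this.fired.has(rule.name)) {
        continue; // never run the same rule twice in one conversation
      }
      const allHeard = rule.phrases.every(
        (phrase) => this.heard.indexOf(phrase.toLowerCase()) !== -1
      );
      if (allHeard) {
        this.fired.add(rule.name);
        rule.action();
        firedNow.push(rule.name);
      }
    }
    return firedNow;
  }
}
```

A rule such as { name: "cable-box-help", phrases: ["cable box", "not working"], action: showTroubleshootingSteps } would fire the first time both phrases have come up, even if they arrive in separate chunks. Here showTroubleshootingSteps stands in for whatever the call center application actually does.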
Real-time translation is a tough problem, but one both Microsoft and Google are working on. Skype Translator now provides real-time translation in English, Spanish, French, German, Italian and Mandarin, plus 50 more languages in instant messaging. Given the multicultural and international flavor of the Internet, translation services are going to be a must-have for many businesses as they look for new customers and support existing ones.
Biometric recognition services will be in hot demand, both for security and for other value-added services. Voice recognition can provide one "factor" to authenticate a caller, be it a customer for a call center or any business that needs to verify identity before providing access to additional resources. Facial recognition, just coming into vogue with Microsoft's latest technology rollout, can provide another factor for authentication. A combination of voice and facial recognition can also yield more detailed and accurate transcriptions, giving speech-to-text processes more information and enabling them to differentiate between speakers on a conference call or in a question-and-answer session.
Emotional analysis, the ability to analyze tone and cadence to gauge the responsiveness and stress of a person, is already a vital tool for some call centers. Being able to spot an agitated customer and have a third party intervene can provide quicker call resolution and better customer satisfaction. On a larger scale, emotional analysis will enable companies to gauge the mood and temperature of customers and voters. For example, a spike in angry video rants on a particular topic could indicate discontent with public policy.
Image recognition will be the heaviest lift, but it has numerous applications. Amazon's "Flow" application is able to visually recognize a product, such as a CD, DVD, book or video game cover, as well as logos, artwork and other unique attributes, and then provide pricing and ratings from e-commerce sites. The movie and television industry could use it to hunt down bootleg recordings, while research firms could use it to catalog and measure the effectiveness of brand placement, just to name a few examples.