Baidu’s research division announced initial results from its speech recognition system named “Deep Speech”.
Deep Speech’s main goal is to improve accuracy in noisy environments such as restaurants or public transportation. According to released test results, Deep Speech outperforms previously published results on the Switchboard Hub5’00 benchmark, achieving a 16.5% word error rate on the full test set. In noisy environments it also outperforms public web APIs like Google Web Speech, as well as commercial systems like Bing Speech Services and Apple Dictation, by over 10%.
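For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below shows that standard calculation. It is not Baidu's scoring code: the function name and the sample sentences are made up for illustration, and real benchmark scoring also normalizes text (case, punctuation, and so on) before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace
    tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one dropped word and one wrong word out of five.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Under this definition a 16.5% word error rate corresponds to roughly one word-level error for every six reference words.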
While the state-of-the-art algorithm may not yet be available publicly, Baidu does provide a voice API service to developers through Baidu Yuyin. The API currently drives Baidu’s own voice search and voice recognition in other Baidu products, as well as Qunar (a popular travel vertical search app) and Momo (a location-based dating app much like Tinder).
The use of voice recognition is growing in popularity in many areas, including search. According to the Mobile Voice Study released by Google last October, which surveyed 1,400 Americans across all age groups, 55% of US teens and 41% of US adults use voice search more than once a day. With voice technology like Siri, Google Now and Microsoft Cortana built into all major smartphone operating systems, voice search usage will likely continue to grow as smartphone penetration increases and wearables become more mainstream. Improvements in recognition accuracy are undoubtedly welcome.