This week, Nvidia announced a slew of AI-focused hardware and software innovations during its March GTC 2022 conference. The company unveiled the Grace CPU Superchip, a data center processor designed to serve high-performance compute and AI applications. And it detailed the H100, the first in a new line of GPU hardware aimed at accelerating AI workloads including training large natural language models.
But one announcement that slipped under the radar was the general availability of Nvidia’s Riva 2.0 SDK, as well as the company’s Riva Enterprise managed offering. Both can be deployed for building speech AI applications and point to the growing market for speech recognition in particular. The speech and voice recognition market is expected to grow from $8.3 billion in 2021 to $22.0 billion by 2026, according to Markets and Markets, driven by enterprise applications.
In 2018, a Pindrop survey of 500 IT and business decision-makers found that 28% were using voice technology with customers. Gartner, meanwhile, predicted in 2019 that 25% of digital workers would use virtual employee assistants daily by 2021. And a recent Opus survey found that 73% of executives see value in AI voice technologies for “operational efficiency.”
“As speech AI is expanding to new applications, data scientists at enterprises are looking to develop, customize and deploy speech applications,” an Nvidia spokesperson told VentureBeat via email. “Riva 2.0 includes strong integration with TAO, a low code solution for data scientists, to customize and deploy speech applications. This is an active area of focus and we plan to make the workflow even more accessible for customers in the future. We have also introduced Riva on embedded platforms for early access, and will have more to share at a later date.”
Nvidia says that Snap, the company behind Snapchat, has integrated Riva’s automatic speech recognition and text-to-speech technologies into its developer platform. RingCentral, another customer, is using Riva’s automatic speech recognition for live captioning in video conferencing.
Speech technologies span voice generation tools, too, including “voice cloning” tools that use AI to mimic the pitch and prosody of a person’s speech. Last fall, Nvidia unveiled Riva Custom Voice, a new toolkit that the company claims can enable customers to create custom, “human-like” voices with only 30 minutes of speech recording data.
Brand voices like Progressive’s Flo are often tasked with recording phone trees and elearning scripts in corporate training video series. For companies, the costs can add up — one source pegs the average hourly rate for voice actors at $39.63, plus additional fees for interactive voice response (IVR) prompts. Synthetization could boost actors’ productivity by cutting down on the need for additional recordings, potentially freeing the actors up to pursue more creative work — and saving businesses money in the process.
According to Markets and Markets, the global voice cloning market could grow from $456 million in value in 2018 to $1.739 billion by 2023.
As far as what lies on the horizon, Nvidia sees new voice applications going into production across augmented reality, videoconferencing, and conversational AI. Customers’ expectations and focus are on high accuracy as well as ways to customize voice experiences, the company says.
“Low-code solutions for speech AI [will continue to grow] as non-software developers are looking to build, fine-tune, and deploy speech solutions,” the spokesperson continued, referencing low-code development platforms that require little to no coding in order to build voice apps. “New research is bringing emotional text-to-speech, transforming how humans will interact with machines.”
Exciting as these technologies are, they will introduce, and already have introduced, new ethical challenges. For example, fraudsters have used cloning to imitate a CEO’s voice well enough to initiate a wire transfer. And some speech recognition and text-to-speech algorithms have been shown to recognize the voices of minority users less accurately than those with more common inflections.
It’s incumbent on companies like Nvidia to make efforts to address these challenges before deploying their technologies into production. To its credit, the company has taken steps in the right direction, for example prohibiting the use of Riva for the creation of “fraudulent, false, misleading, or deceptive” content as well as content that “promote[s] discrimination, bigotry, racism, hatred, harassment, or harm against any individual or group.” Hopefully, there’s more along this vein to come.
As an addendum to this week’s newsletter, it’s with sadness that I announce I’m leaving VentureBeat to pursue professional opportunities elsewhere. This edition of AI Weekly will be my last — a bittersweet realization, indeed, as I try to find the words to put to paper.
When I joined VentureBeat as an AI staff writer four years ago, I had only the vaguest notion of the difficult journey that lay ahead. I wasn’t exceptionally well-versed in AI (my background was in consumer tech), and the industry’s jargon was overwhelming to me, not to mention contradictory. But as I came to learn, particularly from those on the academic side of data science, an open mind and, frankly, a willingness to admit ignorance are perhaps the most important ingredients in making sense of AI.
I haven’t always been successful in this. But as a reporter, I’ve tried not to lose sight of the fact that my domain knowledge pales in comparison to that of titans of industry and academia. Whether tackling stories about biases in computer vision models or the environmental impact of training language systems, it’s my policy to lean on others for their expert perspectives and present these perspectives, lightly edited, to readers. As I see it, my job is to contextualize and relay, not to pontificate. There’s a place for pontification, but it’s on opinion pages, not in news articles.
I’ve learned that a healthy dose of skepticism goes a long way, too, in reporting on AI. It’s not only the snake oil salesmen one must be wary of, but also the corporations with well-oiled PR operations, lobbyists, and paid consultants claiming to prevent harms but in fact doing the opposite. I’ve lost track of the number of ethics boards that have been dissolved or have proven to be toothless; the number of damaging algorithms that have been sold through to customers; and the number of companies that have attempted to silence or push back against whistleblowers.
The silver lining is regulators’ growing realization of the industry’s deception. But, as elsewhere in Silicon Valley, techno-optimism has revealed itself to be little more than a publicity instrument.
It’s easy to get swept up in the novelty of new technology. I once did — and still do. The challenge is recognizing the danger in this novelty. I’m reminded of the novel When We Cease to Understand the World by the Chilean writer Benjamín Labatut, which examines great scientific discoveries that led to prosperity and untold suffering in equal parts. For example, German chemist Fritz Haber developed the Haber-Bosch process, which synthesizes ammonia from nitrogen and hydrogen gases and almost certainly prevented famine by enabling the mass manufacture of fertilizer. At the same time, the Haber-Bosch process simplified and made cheaper the production of explosives, contributing to millions of deaths suffered by soldiers during World War I.
AI, like the Haber-Bosch process, has the potential for enormous good, and good actors are trying desperately to bring this to fruition. But any technology can be misused, and it’s the job of reporters to uncover and spotlight those misuses, ideally to effect change. It’s my hope that I, along with my distinguished colleagues at VentureBeat, have accomplished this in some small part. Here’s to a future of strong AI reporting.
Thanks for reading,
Senior AI Staff Writer