T-Mobile’s network suffered an outage across the US yesterday, and the Federal Communications Commission is investigating.
FCC Chairman Ajit Pai, who takes an extremely hands-off approach to regulating telecom companies, used his Twitter account to say, “The T-Mobile network outage is unacceptable” and that “the FCC is launching an investigation. We’re demanding answers—and so are American consumers.”
No matter what the investigation finds, Pai is unlikely to punish T-Mobile or impose any enforceable commitments. For example, an FCC investigation last year into mobile carriers’ response to Hurricane Michael in Florida found that carriers failed to follow their own previous voluntary roaming commitments, unnecessarily prolonging outages. Pai himself called the carriers’ response to the hurricane “completely unacceptable,” just as he did with yesterday’s T-Mobile outage. But Pai’s FCC imposed no punishment related to the poor hurricane response and continued to rely on voluntary measures to prevent recurrences.
T-Mobile CEO Mike Sievert confirmed the outage in a blog post. “Starting just after 12pm ET and continuing throughout the day, T-Mobile has been experiencing a voice and text issue that has intermittently impacted customers in markets across the US,” Sievert wrote. Just after 1am ET, about 13 hours after the outage began, Sievert reported that the “issues are now resolved.”
T-Mobile mistake may have caused outage
The outage may have been self-inflicted, triggered while T-Mobile was making network configuration changes. Cloudflare CEO Matthew Prince last night tweeted that T-Mobile was “making some changes to their network configurations today. Unfortunately, it went badly. The result has been for around the last 6 hours a series of cascading failures for their users, impacting both their voice and data networks.” The T-Mobile problem was “almost certainly entirely of their own team’s making,” he also wrote.
Sievert attributed the outage to “an IP traffic related issue that has created significant capacity issues in the network core throughout the day,” but he did not say what caused the traffic problem or whether it was due to a T-Mobile mistake. We asked T-Mobile to explain the outage cause and will update this article if we get a response.
T-Mobile President of Technology Neville Ray described the problem as “a voice and data issue that has been affecting customers around the country” and said T-Mobile engineers were working to fix it.
(Update: Ray provided some detail on the outage cause about eight hours after this article was published, saying that “the trigger event is known to be a leased fiber circuit failure from a third party provider in the Southeast. This is something that happens on every mobile network, so we’ve worked with our vendors to build redundancy and resiliency to make sure that these types of circuit failures don’t affect customers. This redundancy failed us and resulted in an overload situation that was then compounded by other factors.” The overload then caused an “IP traffic storm that spread from the Southeast to create significant capacity issues across the IMS (IP Multimedia Subsystem) core network that supports VoLTE calls.” To prevent recurrences, T-Mobile said it has “worked with our IMS and IP vendors to add permanent additional safeguards to prevent this from happening again and we’re continuing to work on determining the cause of the initial overload failure.”)
The T-Mobile outage was so large that it apparently caused some people to think other carriers and websites were down, too. Business Insider wrote that “Downdetector and customers on social media reported that AT&T and Verizon service was down,” but both AT&T and Verizon said their networks were doing fine. “A Verizon spokesperson also told Business Insider the carrier was ‘operating at normal service levels’ and said that, given that ‘another national carrier’ was having issues, calls to and from that carrier might get an error message, resulting in reports of issues,” the article said.
Prince wrote that the phone-service outage “caused a lot of T-Mobile users to complain on Twitter and other forums that they weren’t able to reach popular services.” Downdetector, an outage-monitoring website, “scrapes Twitter” for such reports and consequently “report[ed] those services as being offline” even though they weren’t, he wrote. This contributed to the spread of rumors about a “massive DDoS attack” that also did not happen, he wrote.
Don’t expect much FCC action
Mobile voice services like T-Mobile’s are still classified as common-carrier services under Title II of the Communications Act, but the FCC under Pai deregulated the home and mobile broadband industry and has taken a hands-off approach to ensuring resiliency in phone networks.
“This is, once again, where pretending that broadband is not an essential telecommunications service completely undermines the FCC’s ability to act,” longtime telecom attorney and consumer advocate Harold Feld, the senior VP of advocacy group Public Knowledge, told Ars today. “We’re not talking about an assumption that T-Mobile necessarily did anything wrong. But when we have something this critical to the economy, and where it is literally life and death for people to have the service work reliably, it’s not about ‘trusting the market’ or expecting companies to be on their best behavior. We as a country need to know what is the reality of our broadband networks, the reality of their resilience and reliability, and the reality of what happens when things go wrong. That takes a regulator with real authority to go in, ask hard questions, seize documents if necessary, and compel testimony under oath.”
Several provisions of Title II common-carrier rules that Pai has fought against “give the FCC authority to make sure the network is resilient and reliable,” Feld said. The FCC gutting its own authority “influences how the FCC conducts its investigations,” he said. “[FCC] staff and the carriers know very well that if push comes to shove, companies can simply refuse to give the FCC information that might be too embarrassing. So the FCC is stuck now playing this game where they know they can’t push too hard or they get their bluff called. Carriers have incentive to play along enough to keep the FCC or Congress from re-regulating, but at the end of the day it’s the carriers—not the FCC—that gets to decide how much information to turn over.”