Speech-to-Text
Turn speech into text using Google AI
Convert audio into text transcriptions and integrate speech recognition into applications with easy-to-use APIs.
New customers also get up to $300 in free credits to try Speech-to-Text and other Google Cloud products.
Features
Advanced speech AI
Speech-to-Text can use Chirp 3, Google Cloud’s foundation model for speech trained on millions of hours of audio data and billions of text sentences. This contrasts with traditional speech recognition techniques that focus on large amounts of language-specific supervised data. Chirp 3's approach gives users improved recognition and transcription for more spoken languages and accents.
Support for 85+ languages and variants
Build for a global user base with extensive language support. Transcribe short, long, and even streaming audio data. With Chirp 3, the next generation of universal speech models, Speech-to-Text also offers more accurate transcription for deployments that span the globe.
Chirp 3: Transcription was built using self-supervised training on millions of hours of audio and 28 billion sentences of text spanning 100+ languages.
Streaming speech recognition
Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
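As a minimal sketch of the client side of streaming, the snippet below splits raw audio into small fixed-duration chunks, which is the kind of input a streaming recognizer consumes. The function name `iter_audio_chunks`, the 100 ms chunk duration, and the 16 kHz, 16-bit mono format are illustrative assumptions rather than values taken from the Speech-to-Text documentation, and sending the chunks to the API is deliberately left out.

```python
from typing import Iterator


def iter_audio_chunks(
    pcm: bytes,
    sample_rate_hz: int = 16000,  # assumed sample rate
    bytes_per_sample: int = 2,    # assumed 16-bit mono PCM
    chunk_ms: int = 100,          # assumed chunk duration
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM audio, each about chunk_ms long."""
    chunk_size = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


# One second of silence at 16 kHz, 16-bit mono is 32,000 bytes.
one_second = bytes(32000)
chunks = list(iter_audio_chunks(one_second))
print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

In a real application each chunk would be wrapped in a streaming request as it is captured, so that interim results can arrive while the speaker is still talking.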
AI-powered speech recognition and transcription
Speech-to-Text uses model adaptation to improve the accuracy of frequently used words, expand the vocabulary available for transcription, and improve transcription from noisy audio. Model adaptation lets users customize Speech-to-Text to recognize specific words or phrases more frequently than other options that might otherwise be suggested. For example, you could bias Speech-to-Text towards transcribing "weather" over "whether."
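The "weather" versus "whether" example can be sketched as a request body. This is a hedged illustration that assumes the JSON shape of the v1 REST `speech:recognize` method, where phrase hints go in `speechContexts`. The helper name `build_adapted_request`, the bucket URI, and the boost value are hypothetical placeholders, and no request is actually sent.

```python
import json


def build_adapted_request(gcs_uri: str, phrases: list[str], boost: float) -> dict:
    """Build a recognize request body that biases recognition toward phrases."""
    return {
        "config": {
            "languageCode": "en-US",
            # Phrase hints: terms the recognizer should favor over
            # similar-sounding alternatives.
            "speechContexts": [{"phrases": phrases, "boost": boost}],
        },
        "audio": {"uri": gcs_uri},
    }


# Hypothetical bucket and file; the boost value is an arbitrary example.
body = build_adapted_request("gs://example-bucket/forecast.wav", ["weather"], 10.0)
print(json.dumps(body, indent=2))
```

Keeping the phrase list short and specific is usually the safer choice, since aggressive biasing can raise false positives for the boosted terms.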
Out-of-the-box regulatory and security compliance
Speech-to-Text API v2 helps enterprise and business customers meet security and regulatory requirements out of the box. Data residency lets you invoke transcription models through a fully regionalized service that runs in Google Cloud regions like Singapore and Belgium. Logs for resource generation and transcription are readily available in the Google Cloud console. Speech-to-Text API v2 also offers enterprise-grade encryption with customer-managed encryption keys for all resources as well as batch transcription.
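To make data residency concrete, here is a small sketch of how an application might pin its requests to one location. It assumes the v2 REST URL pattern with a location-prefixed host and the `_` implicit recognizer, so check both against the API reference before relying on them. The helper name and project ID are hypothetical. `asia-southeast1` and `europe-west1` are the Google Cloud region IDs for Singapore and Belgium, but model availability in a given region is not confirmed here.

```python
def regional_recognize_url(project_id: str, location: str, recognizer: str = "_") -> str:
    """Return a v2 recognize URL pinned to a single location (assumed pattern)."""
    # Assumption: regional traffic uses a location-prefixed host, while the
    # "global" location uses the plain host name.
    if location == "global":
        host = "speech.googleapis.com"
    else:
        host = f"{location}-speech.googleapis.com"
    return (
        f"https://{host}/v2/projects/{project_id}"
        f"/locations/{location}/recognizers/{recognizer}:recognize"
    )


# Hypothetical project ID; region IDs for Belgium and Singapore.
for region in ("europe-west1", "asia-southeast1"):
    print(regional_recognize_url("my-project", region))
```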
Speech-to-Text On-Prem
Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
Multichannel recognition
Speech-to-Text can recognize distinct channels in multichannel audio (for example, a video conference) and annotate the transcripts to preserve the order.
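A rough sketch of working with multichannel output follows. It assumes the v1 REST field names `audioChannelCount`, `enableSeparateRecognitionPerChannel`, and `channelTag`. The response is a hand-written sample rather than real API output, and `transcripts_by_channel` is a hypothetical helper.

```python
from collections import defaultdict

# Config fragment requesting separate recognition for a two-channel recording.
multichannel_config = {
    "languageCode": "en-US",
    "audioChannelCount": 2,
    "enableSeparateRecognitionPerChannel": True,
}


def transcripts_by_channel(response: dict) -> dict[int, list[str]]:
    """Group top transcripts by channel, keeping the order they were returned in."""
    by_channel = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            channel = result.get("channelTag", 0)
            by_channel[channel].append(alternatives[0]["transcript"])
    return dict(by_channel)


# Hand-written sample response, for illustration only.
sample_response = {
    "results": [
        {"channelTag": 1, "alternatives": [{"transcript": "hello, can you hear me"}]},
        {"channelTag": 2, "alternatives": [{"transcript": "yes, loud and clear"}]},
        {"channelTag": 1, "alternatives": [{"transcript": "great, let's start"}]},
    ]
}
grouped = transcripts_by_channel(sample_response)
print(grouped)
```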
Noise robustness
Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
Domain-specific models
Choose from a selection of trained models for voice control, phone call transcription, and video transcription, each optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio that originates from telephony, such as phone calls recorded at an 8 kHz sampling rate.
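As an illustration, an application could map its audio source to a model when building the recognition config. The sketch assumes the v1 model identifiers `phone_call`, `video`, and `command_and_search` and the `useEnhanced` flag. The source labels and the helper `build_config` are hypothetical.

```python
# Assumed mapping from an application-level source label to a v1 model name.
MODEL_BY_SOURCE = {
    "phone": "phone_call",
    "video": "video",
    "voice_command": "command_and_search",
}


def build_config(source: str, sample_rate_hz: int, enhanced: bool = False) -> dict:
    """Build a recognition config that selects a domain-specific model."""
    if source not in MODEL_BY_SOURCE:
        raise ValueError(f"unknown audio source: {source!r}")
    config = {
        "languageCode": "en-US",
        "sampleRateHertz": sample_rate_hz,
        "model": MODEL_BY_SOURCE[source],
    }
    if enhanced:
        # Request the enhanced variant of the model, where one exists.
        config["useEnhanced"] = True
    return config


# Telephony audio recorded at 8 kHz, using the enhanced phone call model.
phone_config = build_config("phone", 8000, enhanced=True)
print(phone_config)
```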
Content filtering
The profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.
Transcription evaluation
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.
Automatic punctuation (beta)
Speech-to-Text accurately punctuates transcriptions, such as by providing commas, question marks, and periods.
Speaker diarization
Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
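One way to turn diarization output into a readable "who said what" transcript is to merge consecutive words that share a speaker tag. The sketch below assumes v1-style word entries with `word` and `speakerTag` fields and a `diarizationConfig` block in the request. The word list is a hand-written sample and `speaker_turns` is a hypothetical helper.

```python
# Config fragment enabling diarization for a two-person conversation.
diarization_config = {
    "diarizationConfig": {
        "enableSpeakerDiarization": True,
        "minSpeakerCount": 2,
        "maxSpeakerCount": 2,
    }
}


def speaker_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Merge consecutive words with the same speaker tag into turns."""
    turns: list[tuple[int, str]] = []
    for entry in words:
        tag, word = entry["speakerTag"], entry["word"]
        if turns and turns[-1][0] == tag:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (tag, turns[-1][1] + " " + word)
        else:
            turns.append((tag, word))
    return turns


# Hand-written sample words, for illustration only.
sample_words = [
    {"word": "hi", "speakerTag": 1},
    {"word": "there", "speakerTag": 1},
    {"word": "hello", "speakerTag": 2},
    {"word": "thanks", "speakerTag": 1},
]
for tag, text in speaker_turns(sample_words):
    print(f"Speaker {tag}: {text}")
```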
Compare Speech-to-Text Chirp model in API and Agent Studio
Chirp 3: Transcription in Agent Platform
A simple-to-use, no-code, web-based graphical user interface.
Rapidly test audio files, quickly prototype, create audio transcriptions, and upload audio or recordings directly in a web browser.
- Enhanced multilingual language detection and transcription
- Supports transcription in 85+ languages and variants
- Supports speaker diarization and model adaptation
- Automatic speech recognition, transcribing audio into text
- Multilingual language detection and transcription
Chirp 3: Transcription on Speech-to-Text V2 API
An API for the next generation of Google's universal speech model, which unifies data from multiple languages.
Build scalable, enterprise-grade applications.
Integrate transcription easily into existing software.
- Enhanced multilingual language detection and transcription
- Supports transcription in 85+ languages and variants
- Supports speaker diarization and model adaptation
- Automatic speech recognition, transcribing audio into text
- Multilingual language detection and transcription
How It Works
Speech-to-Text has three main methods to perform speech recognition: synchronous, asynchronous, and streaming. Each method returns text results depending on whether transcription is needed in post-processing, periodically, or in real time. Simply put, you input audio data and receive a text-based response.
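The choice between the three methods can be sketched as a simple decision rule. The 60-second cutoff for synchronous requests is an assumption used for illustration, so confirm the actual quotas for your API version. `choose_method` is a hypothetical helper, not part of any client library.

```python
# Assumed upper bound for audio sent in a single synchronous request.
SYNC_MAX_SECONDS = 60


def choose_method(duration_seconds: float, needs_live_results: bool) -> str:
    """Pick a recognition method from audio length and latency needs."""
    if needs_live_results:
        # Results are needed while audio is still arriving.
        return "streaming"
    if duration_seconds <= SYNC_MAX_SECONDS:
        # Short clip: send it and wait for the full response.
        return "synchronous"
    # Long recording: start a long-running operation and poll for the result.
    return "asynchronous"


print(choose_method(30, needs_live_results=False))
print(choose_method(3600, needs_live_results=False))
print(choose_method(30, needs_live_results=True))
```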
Demo
Test out the Speech-to-Text API
Quickly create an audio transcription from a file upload or by speaking directly into a microphone.
Transcribe audio
Create an audio transcription
Caption videos using AI
Create subtitles for videos using AI
Transcribe your audio and video to include captions. Add subtitles to existing content or, in real time, to streaming content. Chirp 3: Transcription is ideal for indexing or subtitling video and multi-speaker content, and it uses machine learning technology similar to what YouTube uses for video captioning.
This tutorial shows you how to use the Google Cloud AI services Speech-to-Text API and Translation API to add subtitles to videos and to provide localized subtitles in other languages.
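Once a transcript has timing information, producing a subtitle file is mostly a formatting step. The sketch below writes cues in the SubRip (SRT) format. It is not taken from the tutorial: the cue timings and text are made up, and `srt_timestamp` and `to_srt` are hypothetical helpers that would sit downstream of a transcription call with time offsets enabled.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Made-up cues, for illustration only.
cues = [(0.0, 1.5, "Welcome to the demo."), (1.5, 4.0, "Let's get started.")]
print(to_srt(cues))
```

For localized subtitles, the text of each cue could be passed through a translation step while the timestamps stay unchanged.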
Add Speech-to-Text to apps
How to add Speech-to-Text to apps
Learn how to enable Speech-to-Text for your application with Google Cloud. This video covers how to add AI to your application without extensive machine learning model experience. Using the pretrained Speech-to-Text API, you'll quickly and easily enable AI for your application.
Translate audio into text
Language, speech, text, and translation with Google Cloud APIs
In this course, you'll use the Speech-to-Text API to transcribe an audio file into a text file, translate with the Google Cloud Translation API, and create synthetic speech with Natural Language AI.
Pricing
View pricing details for Speech-to-Text.
How Speech-to-Text pricing works
Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.
Speech-to-Text V2 API
V2 offers data residency for multi-region and single-region deployments of Chirp 3. V2 also includes audit logging and support for customer-managed encryption keys.
$0.016
per min
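For a back-of-the-envelope estimate, the per-minute rate above can be multiplied by the expected audio volume. This sketch uses only the $0.016 per minute figure quoted here and deliberately ignores billing increments, volume tiers, free credits, and other service costs such as storage, so treat the output as a rough guide. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per minute of audio, taken from the V2 figure quoted above.
V2_PRICE_PER_MINUTE = Decimal("0.016")


def estimate_cost(audio_minutes: float) -> Decimal:
    """Estimate transcription cost in USD, rounded to the nearest cent."""
    raw = Decimal(str(audio_minutes)) * V2_PRICE_PER_MINUTE
    return raw.quantize(Decimal("0.01"))


# 1,000 minutes of audio at the quoted rate.
print(estimate_cost(1000))
```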