This extension supports snippets, hover-over, and syntax highlighting of Speech Markdown. Speech Markdown is a text-to-speech formatting language for content authors, designers, and developers. It converts SpeechMarkdown to SSML while handling inconsistencies across a variety of voice assistants and SSML-to-voice engines.
- Speech Markdown
- Multi-Provider TTS Support:
- Amazon Polly, ElevenLabs (including audiotags e.g.
[sarcastically], see examples/elevenlabs_v3-model-audiotags.smd), OpenAI, Azure, SherpaOnnx, Google, PlayHT, IBM Watson, WitAI, SAPI (Windows), eSpeak NG, eSpeak NG WASM
- Amazon Polly, ElevenLabs (including audiotags e.g.
- Voice Selection:
- List and select available voices for the chosen provider. Voice selection is saved per provider.
- Provider Selection:
- Easily switch between TTS providers via a quick pick menu or status bar button.
- Status Bar Integration:
- Quick access buttons for speaking text and selecting TTS provider.
- Quick access button for listing and selecting available voices.
- Output Directory:
- Configure where generated audio files are saved.
- Keyboard Accessibility:
- All major TTS features are accessible via keyboard shortcuts.
See the examples folder for many examples of what you can do.
Speech Markdown outputs platform-compatible Speech Synthesis Markup Language (SSML). Selecting Speech Markdown in an editor, right-clicking and selecting the "Speech Markdown to SSML" menu option provides SSML output for all supported platforms. At the time of this release (v0.0.8) this includes:
- Amazon Alexa
- Amazon Polly
- Amazon Polly Neural
- Microsoft Azure
- Samsung Bixby
- Google Assistant
- Plain Text
The resulting SSML is displayed in the Speech Markdown output channel.
By default, the starting and ending speak tags are included in the output. This can be disabled in Settings -> Extensions -> SpeechMarkdown -> Include Speak Tags. Here are the current configuration options:
Play SSML generated from Speech Markdown. Highlight and select Speech Markdown in the editor, right-click for a context menu and select either:
- Speak Selected SSML (Amazon Polly)
- Speak Selected SSML (Amazon Neural)
New:
- Use the status bar button or
Ctrl+Alt+Pto select a TTS provider. - Use the status bar button or
Ctrl+Alt+Lto list and select voices for the current provider. - Use
Ctrl+Alt+Sto speak selected text or the entire document with the chosen provider and voice. - Output audio files are saved to a configurable directory (see Configuration section).
Watch TTS Provider & Voice Selection + Speak Demo
This will invoke the Amazon Polly API and play the generated MP3 file from your system's default MP3 player. In order to authenticate, you need an AWS account and your credentials:
As of version 0.0.6, IntelliSense is supported in strings. By default, Visual Studio Code does not support IntelliSense in strings. In order to enable it, please see section Enable Intellisense in TypeScript and JavaScript.
Typing any of the following and using ctrl+space will trigger suggestions:
- #[
- (sometext)[
- [
- ; (when used within the brackets of the samples above)
Syntax highlighting is supported in JSON, JavaScript, and TypeScript. Any Speech Markdown tags will be highligted within string literals.
Languages that support Speech Markdown syntax highlighting are:
- JavaScript
- TypeScript
- JSON
- YAML
Hover over the mark up text for additional information.
All Speech Markdown snippets start with "smd." Here's a list of available snippets along with sample Speech Markdown it produces.
- smd address - Speaks the selected text as a street address.
I'm at (150th CT NE, Redmond, WA)[address].
- smd audio - Plays short, pre-recorded audio. Value in the tag should be a fully qualified URL to a publically accessible audio file.
!["https://intro.mp3"] Welcome back.
- smd bleep - 'Bleep' out the content.
You can't say (word)[bleep] on TV.
- smd break - A pause in speech. Valid values are none, x-weak, weak, medium, strong, x-strong.
A pause [break:"250ms"] then continue.
- smd break short - A pause in speech. Value is set in milliseconds or seconds.
A pause [250ms] then continue. A longer pause [1s] then continue.
- smd characters - Speaks a number or text as individual characters.
Countdown: (321)[characters]
The word is spelled: (park)[characters]
- smd date - Speak the text as a date. Valid date format values are mdy, dmy, ymd (not universally supported), ydm, md, dm, ym, my, y, m, d.
The date is (10-11-12)[date:"mdy"]. // October 11th, 2012
The date is (10-11-12)[date:"dmy"]. // November 10th, 2012
The date is (10-11-12)[date:"ymd"]. // Nov 12th, 2010
The date is (10-11-12)[date:"ydm"]. // December 11th, 2010
The date is (10-11)[date:"md"]. // October 11th
The date is (10-11)[date:"dm"]. // November 10th
The date is (10-11)[date:"ym"]. // November 2010
The date is (10-11)[date:"my"]. // October 2011
The date is (10)[date:"y"]. // 2010
The date is (10)[date:"m"]. // October
The date is (10)[date:"d"]. // 10th
- smd disappointed - Sets the spoken text to varying levels of disappointment. Valid disappointed modifiers are low, medium (default), high.
We can switch (from disappointed)[disappointed] to (really disappointed)[disappointed:"high"].
- smd emphasis - Add or remove emphasis from a word or phrase. Valid values are strong, moderate (default), reduced, none.
A (strong)[emphasis:"strong"] level
- smd excited - Sets the spoken text to varying levels of excitement. Valid values are strong, moderate (default), reduced, none.
We can switch (from excited)[excited] to (really excited)[excited:"high"].
- smd expletive - 'Bleep' out the content.
You said (word)[expletive] at school.
- smd fraction - Speaks the value as a fraction.
Add (2/3)[fraction] cup of milk.
Add (1+1/2)[fraction] cups of flour.
- smd interjection - Speaks the text in a more expressive voice.
(Wow)[interjection], I didn't see that coming.
- smd ipa - Provides a phonemic/phonetic pronunciation for the contained text using the International Phonetic Alphabet (IPA).
You say, (pecan)[ipa:"pɪˈkɑːn"].
- smd lang - Add a lang modifier. Valid values are en-US, en-AU, en-GB, en-IN, de-DE, es-ES, it-IT,j a-JP, fr-FR.
In Paris, they pronounce it (Paris)[lang:"fr-FR"].
- smd number - Speaks a number as a cardinal: one, twenty, twelve thousand three hundred forty five, etc. (same as cardinal)
One, two, (3)[number].
- smd ordinal - Speaks a number as an ordinal: first, second, third, etc.
The others came in 2nd and (3)[ordinal].
- smd phone - Speak the number/value as a 7-digit or 10-digit telephone number.
My number is NOT (8675309)[phone:"1"].
- smd pitch - Raise or lower the tone (pitch) of the speech. Valid values are x-low, low, medium (default), high, x-high.
I can speak with my normal pitch, (but also with a much higher pitch)[pitch:"x-high"].
- smd rate - Modify the rate of the speech. Valid values are x-slow, slow, medium (default), fast, x-fast.
When I wake up, (I speak quite slowly)[rate:"x-slow"].
- smd sub - Substitute one word or phrase with a different word or phrase. Often used to expand/clarify abbreviations.
My favorite chemical element is (Al)[sub:"aluminum"],
but Al prefers (Mg)["magnesium"].
- smd time - Add a time modifier. Valid values are hms12, hms24.
The time is (2:30pm)[time:"hms12"].
The time is (2:30pm)[time:"hms24"].
- smd unit - Speaks the value as a unit. Can be a number and unit or just a unit. (e.g. 10 foot, 10 ft, 10 mi, foot, ft, 6'3")
I would walk (500 mi)[unit]
- smd volume - Modify the volume of the speech. Valid volume modifiers are silent, x-soft, soft, medium, loud, x-loud. Default to medium if not specified.
Normal volume for the first sentence. (Louder volume for the second sentence)[volume:"x-loud"].
- smd voice - Apply voice modifier and use any Alexa voice. Valid values are Ivy,Joanna, Joey, Justin,Kendra, Kimberly, Matthew, Salli, Nicole, Russell, Amy, Brian, Emma, Aditi, Raveena, Hans, Marlene, Vicki, Conchita, Enrique, Carla, Giorgio, Mizuki, Takumi, Celine, Lea, and Mathieu.
Why do you keep switching voices (from one)[voice:"Brian"] (to the other)[voice:"Kendra"]?
- smd voice default - Sets the spoken text back to the normal voice for the plaform. Useful when switching to different voices.
#[defaults] Now back to normal speech.
- smd voice device - Apply voice modifier and switch back to the default device voice.
#[voice:'device'] Now back to normal speech.
- smd voice dj - Sets the spoken text similar to how a music/media announcer would speak them.
#[dj] Welcome back to the Morning Zoo!
- smd voice de-DE - Apply voice modifier and limit to de-DE Alexa voices. Valid values are Hans, Marlene, and Vicki.
(Wie geht's?)[voice:'Vicki';lang:'de-DE']
- smd voice en-AU - Apply voice modifier and limit to en-AU Alexa voices. Valid values are Nicole and Russell.
(Bob's gone walkabout)[voice:'Nicole';lang:'en-AU']
- smd voice en-ES - Apply voice modifier and limit to es-ES Alexa voices. Valid values are Conchita and Enrique.
(Ser pan comido)[voice:'Conchita';lang:'en-ES']
- smd voice en-GB - Apply voice modifier and limit to en-GB Alexa voices. Valid values are Amy, Brian, and Emma.
(Look on the bright side of life)[voice:'Brian';lang:'en-GB']
- smd voice en-IN - Apply voice modifier and limit to en-IN Alexa voices. Valid values are Aditi and Raveena.
(How are you?)[voice:'Aditi';lang:'en-IN']
- smd voice en-US - Apply voice modifier and limit to en-US Alexa voices. Valid values are Ivy, Joanna, Joey, Justin, Kendra, Kimberly, Matthew, and Salli.
(I don't sound like Alexa.)[voice:'Salli';lang:'en-US']
- smd voice fr-FR - Apply voice modifier and limit to fr-FR Alexa voices. Valid values are Celine, Lea, and Mathieu.
(Ça marche!)[voice:'Mathieu';lang:'fr-FR']
- smd voice it-IT - Apply voice modifier and limit to it-IT Alexa voices. Valid values are Carla and Giorgio.
(In bocca al lupo)[voice:'Carla';lang:'it-IT']
- smd voice ja-JP - Apply voice modifier and limit to ja-JP Alexa voices. Valid values are Mizuki and Takumi.
(海千山千)[voice:'Mizuki';lang:'ja-JP']
- smd voice newscaster - Sets the spoken text similar to how a news announcer would speak them.
#[newscaster] And now for today's top stories.
- smd whisper - Speak text in a whispered voice.
I want to tell you a secret. (I am not a real human.)[whisper]
There are two approaches to applying snippets.
- Highlight the text.
- Select F1
- Locate the Insert Snippets command
- Locate the Speech Markdown snippet
- Position the cursor in the string literal where you want to insert a snippet.
- Type "smd" and use ctrl+space bar
- Select the snippet
By default Visual Studio Code does not provide IntelliSense handling in strings. For more information, please see:
NOTE: IntelliSense in strings is not available in JSON or YAML. If you wish to use Intellisense in JSON files you can temporarily change the file extension to js.
If you wish to enabled IntelliSense in strings apply the following settings to your project folder.
- Create a .vscode folder
- Add the following settings to a new settings.json file:
"editor.quickSuggestions": {
"other": true,
"comments": false,
"strings": true
}Once configured, IntelliSense works in JavaScript and TypeScript:
| Command | Description | Default Hotkey |
|---|---|---|
speechmarkdown.speakText |
Speak selected text or entire document | Ctrl+Shift+S |
speechmarkdown.listVoices |
List and select available voices | Ctrl+Shift+L |
speechmarkdown.selectTTSProvider |
Select TTS provider | Ctrl+Alt+P |
extension.speechmarkdownpreview |
Convert Speech Markdown to SSML (selection) | (Command Palette) |
extension.speechmarkdownspeakpolly |
Speak selected SSML (Amazon Polly) | (Command Palette) |
extension.speechmarkdownspeakpollyneural |
Speak selected SSML (Amazon Neural) | (Command Palette) |
Tip: All commands are available from the Command Palette (
Ctrl+Shift+P).
Set these in your VS Code settings (Settings UI or settings.json):
| Setting (UI Name) | Description / Example Value |
|---|---|
| TTS Provider | Select the default TTS provider (e.g., Amazon Polly, ElevenLabs, OpenAI, etc.) |
| Amazon Polly Access Key ID | Your AWS account access key ID |
| Amazon Polly Secret Access Key | Your AWS account secret access key |
| Amazon Polly Region | AWS region for Polly (e.g., us-east-1) |
| Amazon Polly Voice | Default voice for Polly (e.g., Joanna, Matthew) |
| ElevenLabs API Key | Your ElevenLabs API key |
| ElevenLabs Voice ID | Voice ID from your ElevenLabs account |
| ElevenLabs Model | Select the model to use for synthesis, e.g. 'eleven_v3' |
| OpenAI API Key | Your OpenAI API key for TTS |
| OpenAI Voice | Voice for OpenAI TTS (e.g., alloy, echo, nova) |
| OpenAI Model | TTS model for OpenAI (e.g., gpt-4o-mini-tts) |
| Azure Subscription Key | Your Azure subscription key for TTS |
| Azure Region | Azure region (e.g., eastus, westeurope) |
| Azure Voice | Voice for Azure TTS (e.g., en-US-AriaNeural) |
| Google Key File Path | Path to Google Cloud service account JSON key |
| Google Voice | Voice for Google TTS |
| PlayHT API Key | Your PlayHT API key |
| PlayHT User ID | Your PlayHT user ID |
| PlayHT Voice | Voice for PlayHT TTS |
| IBM Watson API Key | Your IBM Watson API key |
| IBM Watson Region | IBM Watson region |
| IBM Watson Instance ID | IBM Watson instance ID |
| IBM Watson Voice | Voice for IBM Watson TTS |
| WitAI Token | Your WitAI token |
| WitAI Voice | Voice for WitAI TTS |
| Windows SAPI Voice | Voice for Windows SAPI TTS (Windows only) |
| eSpeak NG Voice | Voice for eSpeak NG TTS |
| eSpeak NG WASM Voice | Voice for eSpeak NG WASM TTS |
| SherpaOnnx Model Path | Path to SherpaOnnx model |
| SherpaOnnx Token | SherpaOnnx token |
| SherpaOnnx Voice | Voice for SherpaOnnx TTS |
You can access the SpeechMarkdown extension settings in several ways:
Method 1: Using the Settings Icon
- Click the gear icon (⚙️) in the bottom left corner of VS Code.
- Select Settings from the menu.
- In the search bar at the top, type
SpeechMarkdown. - Adjust the settings for your chosen TTS provider (e.g., API keys, region, voice) and set the Output Directory if desired.
Method 2: Using the Command Palette
- Press
Ctrl+Shift+P(orCmd+Shift+Pon Mac) to open the Command Palette. - Type and select
Preferences: Open Settings (UI). - Search for
SpeechMarkdownin the settings search bar. - Configure the necessary fields as above.
-
Install dependencies:
npm install
-
Build the extension:
- Use the VS Code Run/Debug panel and click the green Launch Ext button (or press
F5). - Or, run the following in your terminal:
npm run webpack
- Use the VS Code Run/Debug panel and click the green Launch Ext button (or press
The folder test/src/ contains several tests with different TTS providers using the same example text extracted from selected text files of the examples folder.
- Configure your TTS providers (API keys, voiceId to be used,...) in the file
.envof the root folder. Copy.env.exampleand modify it. Don't add the file to the git repository to not expose any API keys!! - Install the dependencies:
npm install - Run a single test:
node test/src/node test/src/test-simple-smd.js
- All major commands are accessible via keyboard shortcuts.
- Status bar buttons are screen reader friendly.
- Quick pick menus for provider and voice selection.













