Building Your Own Voice Assistant
Home Assistant now offers its own dedicated Voice PE hardware, which makes getting started very easy. But building your own voice assistant has some big advantages: you can do it cheaper (especially if you intend to build several), learn how the technology actually works, and fully customize it to your needs. Instead of being locked to one device, you can put together your own assistants for different rooms around the house.
In this article, we'll walk through how to assemble a small, low-cost voice assistant using off-the-shelf parts. It won't take much more than some jumper cables and a bit of patience to get everything running with Home Assistant.
What You'll Need
- ESP32-S3 DevKitC-1 (N16R8) board – a compact microcontroller with Wi-Fi, 16 MB of flash, and 8 MB of PSRAM, which leaves plenty of room for on-device wake word detection.
- INMP441 microphone – a digital microphone that can clearly capture your voice commands.
- MAX98357A amplifier – allows you to connect a small speaker so the assistant can talk back to you.
- Speaker – any small 4Ω or 8Ω speaker will work with the amplifier for audio output.
- Jumper cables – for connecting the microphone and amplifier to the ESP32 board.
- Optional: Breadboard – keeps the wiring neat and lets you rearrange connections more easily, but you can also just use jumper cables directly.
With this hardware, you'll be able to capture voice, process it through Home Assistant, and get spoken responses – essentially replicating the functionality of commercial smart speakers, but entirely under your control.
Wiring the Hardware
The ESP32-S3 DevKitC-1 has multiple GPIO (General Purpose Input/Output) pins that we'll use to connect the microphone and amplifier. Before wiring, check two solder jumpers on your board:
- RGB – make sure this jumper is soldered (closed) if you want to use the onboard WS2812 RGB LED. This guide assumes it is enabled, so ESPHome can control the built-in LED for status feedback.
- In-Out – solder this jumper closed if you want the USB's 5 V to appear on the 5Vin pin. This is required to power the MAX98357A amplifier from the ESP32 board itself. Leave it open only if you plan to supply 5 V from an external source instead.
INMP441 Microphone → ESP32-S3
- L/R → GND (forces the mic to output the left channel)
- WS (Word Select / LRCL) → GPIO21
- SCK (Bit Clock) → GPIO13
- SD (Data Out) → GPIO18
- VDD → 3.3V on ESP32
- GND → any GND pin on ESP32
The INMP441 outputs digital audio using the I²S protocol. SCK and WS keep the data in sync, while SD carries the microphone signal. The L/R pin isn't an audio signal - it just selects whether the mic acts as the left or right channel. In this build it's tied to GND, so the ESP32 reads the left channel.
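To make the link between the wiring and the firmware concrete, here is a short sketch of how the three signal pins and the L/R strap show up in ESPHome. It uses the same pins and option names as the full configuration later in this article; the comment about tying L/R to VDD is an addition, based on the INMP441 outputting the right channel when that pin is pulled high.

```yaml
# Sketch only: mirrors the microphone part of the full configuration.
i2s_audio:
  - id: i2s_in
    i2s_bclk_pin: GPIO13     # SCK (bit clock)
    i2s_lrclk_pin: GPIO21    # WS (word select)

microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO18      # SD (data out of the mic, into the ESP32)
    adc_type: external
    pdm: false
    channel: left            # L/R tied to GND; "right" would match L/R tied to VDD
```

If the mic stays silent after flashing, a mismatch between the L/R strap and channel: is one of the first things worth checking.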
MAX98357A Amplifier → ESP32-S3
- LRC (Word Select / LRCL) → GPIO16
- BCLK (Bit Clock) → GPIO15
- DIN (Data In) → GPIO17
- GAIN → leave unconnected (defaults to ~9 dB gain)
- SD (Shutdown) → leave unconnected (amp stays on by default)
- GND → any GND pin on ESP32
- VIN → 5Vin pin on ESP32 (requires In-Out jumper soldered if powered from USB)
The MAX98357A takes digital audio from the ESP32 over I²S and drives your speaker. GAIN and SD are optional pins - if you don't connect them, the amp defaults to roughly 9 dB of gain and stays switched on.
Speaker → MAX98357A
- If your speaker has red and black wires: connect red to + and black to -
- If your speaker just has two terminals: look for small + and - markings and connect accordingly
Use a small 4 Ω or 8 Ω speaker. With only one speaker, reversing the polarity won't damage anything - it will still play sound. But for best practice (and to avoid problems if you ever add a second speaker), follow the polarity markings or wire colors consistently.
LED Feedback
The ESP32-S3 DevKitC-1 includes a built-in WS2812 RGB LED on GPIO48. With the RGB jumper soldered, this LED is available to ESPHome and will show different effects for listening, thinking, and replying. No extra wiring is needed.
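One thing to be aware of: the full configuration later in this article declares num_leds: 16 on GPIO48, which looks like it was written for an external 16-LED ring on the same data pin. If you only have the single onboard LED, a smaller value is probably more appropriate. A minimal sketch, assuming nothing else about the light changes:

```yaml
# Sketch only: the light entry adjusted for the single onboard LED.
light:
  - platform: esp32_rmt_led_strip
    id: led_ring             # same id as the full configuration, so its scripts still resolve
    name: "Status LED"       # illustrative name
    pin: GPIO48
    num_leds: 1              # assumption: only the onboard WS2812 is connected
    rgb_order: GRB
    chipset: ws2812
    # keep the effects: list from the full configuration here,
    # since the status script refers to the effects by name
```

With one LED the wipe effects collapse to a simple colour change, which is still enough to tell listening from thinking.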
Power & Ground
Make sure that all modules share a common ground with the ESP32. If you use a breadboard, you can connect all GND pins together along one rail for tidiness.
USB Ports and Flashing Firmware
The ESP32-S3 DevKitC-1 (N16R8) has two USB-C ports on the board. They look similar but serve different purposes:
- USB (sometimes labeled "USB OTG") – this port is wired directly to the ESP32-S3 chip. It supports native USB features and is the main port you'll use for flashing firmware with ESPHome.
- COM (sometimes labeled "UART" or "Debug") – this port goes through a USB-to-serial bridge. It's useful for debugging logs and can also be used for flashing in some cases, but most ESPHome users stick with the USB/OTG port.
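Which of the two ports carries the serial logs depends on ESPHome's logger settings. As a rough sketch, assuming the COM port's USB-to-serial bridge is wired to UART0 (common on this board, but worth verifying on yours), the logger could be pointed at it like this:

```yaml
# Sketch only: choose where serial log output goes.
logger:
  level: DEBUG
  hardware_uart: UART0       # assumption: the COM/UART port's bridge is on UART0
  # hardware_uart: USB_SERIAL_JTAG   # alternative: log over the native USB port
```

This only affects wired logging; logs viewed wirelessly through the ESPHome dashboard are unaffected.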
Two Ways to Flash ESP32 Firmware
There are two common approaches for flashing firmware to your ESP32 board:
- Connect directly to your PC – plug the ESP32 into your computer via USB/OTG, then use the ESPHome web flasher or esptool to upload firmware.
- Connect to the computer running Home Assistant – plug the ESP32 into the machine where Home Assistant is installed (for example, a Raspberry Pi or server). The ESPHome add-on can then flash the device directly from the HA dashboard.
This is the method I personally use, since it keeps everything managed inside Home Assistant.
Entering Flashing Mode
ESPHome may be able to flash the board automatically. But if you run into errors, you may need to force the chip into "flashing mode" using the buttons on the board:
- BOOT – puts the chip into flashing mode if held during reset.
- EN (Reset) – restarts the board when pressed.
To manually enter flashing mode:
- Connect the board to your PC or HA host via the USB/OTG port.
- Hold down the BOOT button.
- While still holding BOOT, briefly press and release EN (reset).
- Release the BOOT button – the board is now in download mode.
After flashing is complete, press EN again to reboot into normal operation.
First-Time Setup in ESPHome
Once your hardware is wired up, it's time to flash firmware onto the ESP32 so it can work as a voice assistant. Plug the board into your computer or your Home Assistant machine using the USB/OTG port.
Adding a New Device
Open the ESPHome dashboard in Home Assistant (Settings → Add-ons → ESPHome → Open Web UI). Click "+ New Device" and follow the wizard:
- Enter a name for your device (e.g. Hallway-Voice).
- Select ESP32 as the device type. Since we're using the ESP32-S3 DevKitC-1, choose ESP32-S3 from the dropdown.
- Click Next. ESPHome will generate a starter configuration for you.
Encryption Key
ESPHome will show you a secure encryption key. Copy this somewhere safe - you may not need it immediately, but it's used if you want to connect securely to the ESP32 over the ESPHome API later.
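In the generated starter configuration, the key lives under the api: block. The full configuration later in this article has a bare api: line, so if you want to keep the connection encrypted you would need to carry the key over yourself. A minimal sketch, where api_encryption_key is a hypothetical secret name:

```yaml
# Sketch only: keeping the API encryption key in the device config.
api:
  encryption:
    key: !secret api_encryption_key   # hypothetical secret name holding the key from the wizard
```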
Choosing How to Install
At this point ESPHome will ask how you want to install the firmware. You'll see options like:
- Plug into this computer (requires browser support and direct USB connection)
- Connect to computer running ESPHome Device Builder (flashes directly via your Home Assistant machine)
For this article, we'll choose Connect to computer running ESPHome Device Builder, since it keeps everything managed inside Home Assistant without needing special drivers on your PC.
Flashing the Device
Click Install, then pick your device from the list. ESPHome will compile the firmware and flash it to your ESP32. For now we'll just use the default firmware - later we'll update the configuration to include the microphone, speaker, wake word, and LEDs.
Once the flash is complete, the board will reboot and appear in your ESPHome dashboard.
Installing the Custom Configuration
After the first flash, your device is now paired with ESPHome and appears in the dashboard. The encryption key and API settings that were generated during setup will stay the same - you don't need to re-enter these each time. From now on, all we'll be changing are the configuration details that define how the microphone, amplifier, wake word, and LED should behave.
To install the custom code:
- In the ESPHome dashboard, click Edit on your new device.
- Replace the starter YAML with the code below:
Before copying the code, note that most of it should stay exactly as-is. The only parts you'll want to change for your own setup are:
- Device name (name: under esphome:) – must be unique on your network, e.g. kitchen-assistant.
- Friendly name (friendly_name:) – how it appears in Home Assistant's dashboard.
- Wi-Fi details (wifi.ssid and wifi.password) – use your own network credentials, or keep !secret if you're storing them in a secrets.yaml file.
Everything else (pin numbers, board type, audio settings, etc.) is already correct for this hardware build.
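If you go the !secret route, the two names used in the wifi: block need matching entries in secrets.yaml (reachable via the Secrets button in the ESPHome dashboard). A sketch with placeholder values:

```yaml
# secrets.yaml - sketch with placeholder values, replace with your own
wifi_ssid: "MyHomeNetwork"
wifi_password: "replace-with-your-wifi-password"
```

Most ESPHome installs that went through the first-time setup wizard already have these two entries, so check before adding duplicates.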
substitutions:
  # AUDIO I2S pinout for INMP441 mic and MAX98357A speaker
  lrclk_in: GPIO21
  bclk_in: GPIO13
  lrclk_out: GPIO16
  bclk_out: GPIO15
  din: GPIO17
  sd: GPIO18
  di: GPIO48 # WS2812 ring data in
  l_r: "left"
  # Voice Assistant Phases
  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'
  voice_assist_muted_phase_id: '12'
esphome:
  name: hallway-voice
  friendly_name: Hallway-Voice
  min_version: 2025.6.2
  on_boot:
    priority: 600
    then:
      - wait_until: api.connected
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  variant: esp32s3
  framework:
    type: esp-idf
    version: 5.4.2 # ← update to at least 5.4.
    sdkconfig_options:
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_I2S_BUFSIZE: "8192"
      CONFIG_SPIRAM_SUPPORT: "y"
      CONFIG_SPIRAM_BOOT_INIT: "y"
      CONFIG_SPIRAM_USE_MALLOC: "y"
      CONFIG_SPIRAM_CACHE_WORKAROUND: "y"
psram:
  mode: octal
  speed: 80MHz
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none
  ap:
    ssid: "ESPHome_Fallback"
    password: "FallbackPassword"
logger:
  level: debug
  logs:
    sensor: WARN
api:
i2c:
  - id: bus_a
    sda: GPIO5
    scl: GPIO6
    frequency: 400kHz
globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  - id: is_timer_active
    type: bool
    restore_value: no
    initial_value: 'false'
# --- LED RING (16x WS2812) ---
light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "LED Ring"
    pin: ${di}
    num_leds: 16
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true
# --- I2S AUDIO ---
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: ${lrclk_out}
    i2s_bclk_pin: ${bclk_out}
    # i2s_mclk_pin: GPIO14 # Not required for MAX98357A; leave commented unless wired
  - id: i2s_in
    i2s_bclk_pin: ${bclk_in}
    i2s_lrclk_pin: ${lrclk_in}
microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: ${sd}
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    channel: left
    adc_type: external
speaker:
  - platform: i2s_audio
    id: i2s_speaker
    sample_rate: 48000
    i2s_dout_pin: ${din}
    bits_per_sample: 32bit
    i2s_audio_id: i2s_out
    dac_type: external
    channel: ${l_r} # "left" or "right" (from substitutions)
# --- MICRO WAKE WORD ---
micro_wake_word:
  id: mww
  microphone:
    microphone: i2s_mic
    channels: 0
    gain_factor: 4
  stop_after_detection: false
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
      id: okay_nabu
    - model: hey_jarvis
      id: hey_jarvis
    - model: hey_mycroft
      id: hey_mycroft
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
  vad:
    probability_cutoff: 0.05
  on_wake_word_detected:
    - if:
        condition:
          lambda: return id(voice_assistant_phase) != ${voice_assist_muted_phase_id};
        then:
          - light.turn_on:
              id: led_ring
              effect: "Wakeword"
          - delay: 300ms
          - voice_assistant.start:
              wake_word: !lambda return wake_word;
# --- VOICE ASSISTANT ---
voice_assistant:
  id: va
  microphone:
    microphone: i2s_mic
    channels: 0
  speaker: i2s_speaker
  micro_wake_word: mww
  # use_wake_word: true # keep disabled when using micro_wake_word
  on_client_connected:
    - lambda: |-
        id(init_in_progress) = false;
    - script.execute: update_leds
  on_client_disconnected:
    - script.execute: update_leds
  on_error:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_error_phase_id};
    - script.execute: update_leds
  on_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds
  on_listening:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds
  on_stt_vad_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds
  on_stt_vad_end:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: update_leds
  on_tts_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: update_leds
  on_intent_progress:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: update_leds
  on_end:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: update_leds
  on_idle:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: update_leds
# --- SCRIPTS FOR LED STATUS ---
script:
  - id: update_leds
    mode: restart
    then:
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
          then:
            - light.turn_off: led_ring
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_listening_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 100%
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_thinking_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Working
                brightness: 100%
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_replying_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 100%
      - if:
          condition:
            lambda: |
              return id(voice_assistant_phase) == ${voice_assist_error_phase_id}
                  || id(voice_assistant_phase) == ${voice_assist_not_ready_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Connecting
                brightness: 100%
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 20%
- Click Save and then Install.
- Select Wirelessly (OTA – Over The Air) to update the device without plugging it in again. The only time you need USB is for the very first flash.
ESPHome will recompile and push the updated firmware. When the device restarts, it will now run with your full voice assistant setup.
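Wireless updates rely on ESPHome's ota component being compiled into the firmware. The listing above, as reproduced here, does not contain an ota: block, so if a later wireless install fails to find the device, that is the first thing to check. A minimal sketch of what the block usually looks like in current ESPHome releases (the commented-out password line and its secret name are illustrative only):

```yaml
# Sketch only: enable over-the-air updates.
ota:
  - platform: esphome
    # password: !secret ota_password   # optional; hypothetical secret name
```

The starter configuration generated by the wizard normally includes its own ota: entry, so copying that one across is the safest option.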
Modifying the Configuration
Once your device is up and running, you can fine-tune how it hears you and how it responds. You don't need to change the wiring - just edit a few options in the YAML and reinstall over Wi-Fi (OTA).
Changing Audio Sensitivity & Noise Handling
Two places control how easily your assistant wakes and how clean the captured audio is:
- Wake word sensitivity – in micro_wake_word: you'll see gain_factor: 4 and a VAD setting. Higher gain_factor makes the mic "hotter" (picks up quieter voices but also more room noise). The VAD (vad: → probability_cutoff) tells the system how confident it must be that "this is speech". Lowering the cutoff (e.g. from 0.05 to 0.03) makes it more permissive; raising it (e.g. to 0.07) makes it stricter.
- Assist pipeline processing – in voice_assistant: you can enable built-in processing to reduce noise and set automatic gain. These run in the audio pipeline and are easy to tweak.
Where to change it (example):
# Wake word front-end sensitivity
micro_wake_word:
  id: mww
  microphone:
    microphone: i2s_mic
    channels: 0
    gain_factor: 4 # Try 3–6; higher = more sensitive (and more noise)
  vad:
    probability_cutoff: 0.05 # 0.03 = permissive, 0.07 = strict
# Assist pipeline processing (runs during listening)
voice_assistant:
  id: va
  microphone:
    microphone: i2s_mic
    channels: 0
  speaker: i2s_speaker
  micro_wake_word: mww
  # Optional audio processing:
  noise_suppression_level: 2 # 0–4 (0=off). Start at 1–2; 3–4 is strongest but can dull voices.
  auto_gain: 6dBFS # 0–31 dBFS (0=off). Lifts quiet inputs; too high can amplify noise.
  volume_multiplier: 1.0 # >0. Multiplies TTS/response volume (e.g. 1.2 is +20%).
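Since the configuration already uses substitutions for the pin numbers, the same trick can keep the tuning knobs in one place while you experiment. This is only a sketch: the substitution names mic_gain, vad_cutoff, and ns_level are made up for illustration, and the fragments show only the affected keys, which would be merged into the existing blocks rather than pasted as-is.

```yaml
# Sketch only: hypothetical substitution names for the tuning values.
substitutions:
  mic_gain: "4"
  vad_cutoff: "0.05"
  ns_level: "2"

micro_wake_word:
  microphone:
    microphone: i2s_mic
    channels: 0
    gain_factor: ${mic_gain}
  vad:
    probability_cutoff: ${vad_cutoff}

voice_assistant:
  noise_suppression_level: ${ns_level}
```

Changing one value at a time and reinstalling over Wi-Fi makes it easier to tell which setting actually helped.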
How to choose values:
- Start conservative: noise_suppression_level: 1 or 2, auto_gain: 4dBFS–8dBFS, and leave volume_multiplier: 1.0.
- If background hum/fan noise is an issue, try noise_suppression_level: 3; if voices sound "underwater," drop it back down.
- If the assistant misses quiet speech, bump auto_gain by a few dBFS; if it triggers on noise, reduce it.
Adding the Device to Home Assistant
After flashing and rebooting, your ESP32 voice assistant will normally be auto-discovered by Home Assistant. You'll see a banner at the bottom of the screen asking to set up the new device. If that banner doesn't appear, you can add it manually through the ESPHome integration.
What You'll See in Home Assistant
Once added, the device appears as an Assist satellite. Its device page offers settings that let you fine-tune the experience right from Home Assistant's UI, without touching YAML again. For example, you might lower sensitivity if it's waking too often, or switch to a different wake word model.
Wrapping Up
You've now built and configured your very own voice assistant with Home Assistant. From here, you can start using it to issue commands, get spoken responses, and make it part of your smart home setup. For troubleshooting tips, or to learn more ways of improving and expanding your voice assistant, take a look at the other articles in the Voice Assistants category on this site.