Build Your Own ESP32-Based Voice Assistant

Building Your Own Voice Assistant
Wiring the Components
USB Ports and Flashing Firmware
First-Time Setup in ESPHome
Installing the Custom Configuration
Modifying the Configuration
Adding the Device to Home Assistant

Building Your Own Voice Assistant

Home Assistant now offers its own dedicated Voice PE hardware, which makes getting started very easy. But building your own voice assistant has some big advantages: you can do it cheaper (especially if you intend to build several), learn how the technology actually works, and fully customize it to your needs. Instead of being locked to one device, you can put together your own assistants for different rooms around the house.

In this article, we'll walk through how to assemble a small, low-cost voice assistant using off-the-shelf parts. It won't take much more than some jumper cables and a bit of patience to get everything running with Home Assistant.

What You'll Need

ESP32-S3 DevKitC-1 (N16R8) board – a compact Wi-Fi microcontroller with 16 MB of flash and 8 MB of PSRAM, well suited to running a voice assistant.
INMP441 microphone – a digital microphone that can clearly capture your voice commands.
MAX98357A amplifier – allows you to connect a small speaker so the assistant can talk back to you.
Speaker – any small 4Ω or 8Ω speaker will work with the amplifier for audio output.
Jumper cables – for connecting the microphone and amplifier to the ESP32 board.
Optional: Breadboard – keeps the wiring neat and lets you rearrange connections more easily, but you can also just use jumper cables directly.

With this hardware, you'll be able to capture voice, process it through Home Assistant, and get spoken responses – essentially replicating the functionality of commercial smart speakers, but entirely under your control.

Wiring the Components

The ESP32-S3 DevKitC-1 has multiple GPIO (General Purpose Input/Output) pins that we'll use to connect the microphone and amplifier. Before wiring, check two solder jumpers on your board:

RGB – make sure this jumper is soldered (closed) if you want to use the onboard WS2812 RGB LED. This guide assumes it is enabled, so ESPHome can control the built-in LED for status feedback.
In-Out – solder this jumper closed if you want the USB's 5 V to appear on the 5Vin pin. This is required to power the MAX98357A amplifier from the ESP32 board itself. Leave it open only if you plan to supply 5 V from an external source instead.

INMP441 Microphone → ESP32-S3

L/R → GND (forces the mic to output the left channel)
WS (Word Select / LRCL) → GPIO21
SCK (Bit Clock) → GPIO13
SD (Data Out) → GPIO18
VDD → 3.3V on ESP32
GND → any GND pin on ESP32

The INMP441 outputs digital audio using the I²S protocol. SCK and WS keep the data in sync, while SD carries the microphone signal. The L/R pin isn't an audio signal - it just selects whether the mic acts as the left or right channel. In this build it's tied to GND, so the ESP32 reads the left channel.
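For reference, here is how that wiring maps onto ESPHome's I²S options. This is a minimal excerpt that mirrors the microphone part of the full configuration later in this article (which uses substitutions instead of literal pin numbers), so don't paste it alongside that config or the IDs will clash. Note that the mic's SD pin is its data output, so it goes on i2s_din_pin.

```yaml
# INMP441 -> ESP32-S3, expressed as ESPHome config (pins from the list above)
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO21   # WS (word select)
    i2s_bclk_pin: GPIO13    # SCK (bit clock)

microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO18     # SD: audio data out of the mic, into the ESP32
    adc_type: external
    pdm: false              # the INMP441 is a standard I²S mic, not PDM
    channel: left           # matches L/R tied to GND
    sample_rate: 16000
    bits_per_sample: 32bit
```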

MAX98357A Amplifier → ESP32-S3

LRC (Word Select / LRCL) → GPIO16
BCLK (Bit Clock) → GPIO15
DIN (Data In) → GPIO17
GAIN → leave unconnected (defaults to ~9 dB gain)
SD (Shutdown) → leave unconnected (amp stays on by default)
GND → any GND pin on ESP32
VIN → 5Vin pin on ESP32 (requires In-Out jumper soldered if powered from USB)

The MAX98357A takes digital audio from the ESP32 over I²S and drives your speaker. GAIN and SD are optional pins - if you don't connect them, the amp will default to about 9 dB of gain and stay switched on.
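On the ESPHome side only the three I²S lines appear; GAIN and SD have no configuration because they are simply left unwired. A minimal excerpt matching the speaker part of the full configuration later in this article (again, for reading, not for pasting in addition to it):

```yaml
# MAX98357A <- ESP32-S3, expressed as ESPHome config (pins from the list above)
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO16   # LRC (word select)
    i2s_bclk_pin: GPIO15    # BCLK (bit clock)

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17    # DIN on the amplifier
    dac_type: external
    sample_rate: 48000
    bits_per_sample: 32bit
    channel: left
    # GAIN and SD stay unconnected, so nothing here controls gain or shutdown
```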

Speaker → MAX98357A

If your speaker has red and black wires: connect red to + and black to -
If your speaker just has two terminals: look for small + and - markings and connect accordingly

Use a small 4 Ω or 8 Ω speaker. With only one speaker, reversing the polarity won't damage anything - it will still play sound. But for best practice (and to avoid problems if you ever add a second speaker), follow the polarity markings or wire colors consistently.

LED Feedback

The ESP32-S3 DevKitC-1 includes a built-in WS2812 RGB LED on GPIO48. With the RGB jumper soldered, this LED is available to ESPHome and will show different effects for listening, thinking, and replying. No extra wiring is needed.
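If you want to test the LED on its own, a minimal light definition could look like the sketch below. It assumes your board's LED really is on GPIO48 (Espressif's v1.1 DevKitC-1 moved it to GPIO38, so check your board) and sets num_leds: 1 because there is only one onboard LED. The full configuration later declares 16, which still works since only the first one physically exists.

```yaml
# Onboard WS2812 RGB LED only (assumes GPIO48)
light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "LED Ring"
    pin: GPIO48
    num_leds: 1        # the board carries a single addressable LED
    rgb_order: GRB
    chipset: ws2812
```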

Power & Ground

Make sure that all modules share a common ground with the ESP32. If you use a breadboard, you can connect all GND pins together along one rail for tidiness.

USB Ports and Flashing Firmware

The ESP32-S3 DevKitC-1 (N16R8) has two USB-C ports on the board. They look similar but serve different purposes:

USB (sometimes labeled "USB OTG") – this port is wired directly to the ESP32-S3 chip. It supports native USB features and is the main port you'll use for flashing firmware with ESPHome.
COM (sometimes labeled "UART" or "Debug") – this port goes through a USB-to-serial bridge. It's useful for debugging logs and can also be used for flashing in some cases, but most ESPHome users stick with the USB/OTG port.

Two Ways to Flash ESP32 Firmware

There are two common approaches for flashing firmware to your ESP32 board:

Connect directly to your PC – plug the ESP32 into your computer via USB/OTG, then use the ESPHome web flasher or esptool to upload firmware.
Connect to the computer running Home Assistant – plug the ESP32 into the machine where Home Assistant is installed (for example, a Raspberry Pi or server). The ESPHome add-on can then flash the device directly from the HA dashboard.
This is the method I personally use, since it keeps everything managed inside Home Assistant.

Entering Flashing Mode

ESPHome may be able to flash the board automatically. But if you run into errors, you may need to force the chip into "flashing mode" using the buttons on the board:

BOOT – puts the chip into flashing mode if held during reset.
EN (Reset) – restarts the board when pressed.

To manually enter flashing mode:

Connect the board to your PC or HA host via the USB/OTG port.
Hold down the BOOT button.
While still holding BOOT, briefly press and release EN (reset).
Release the BOOT button – the board is now in download mode.

After flashing is complete, press EN again to reboot into normal operation.

First-Time Setup in ESPHome

Once your hardware is wired up, it's time to flash firmware onto the ESP32 so it can work as a voice assistant. Plug the board into your computer or your Home Assistant machine using the USB/OTG port.

Adding a New Device

Open the ESPHome dashboard in Home Assistant (Settings → Add-ons → ESPHome → Open Web UI). Click "+ New Device" and follow the wizard:

Enter a name for your device (e.g. Hallway-Voice).
Select ESP32-S3 as the device type, since we're using the ESP32-S3 DevKitC-1.
Click Next. ESPHome will generate a starter configuration for you.

Encryption Key

ESPHome will show you a secure encryption key. Copy this somewhere safe - it secures the connection between Home Assistant and the ESP32 over the ESPHome API, and you may need to paste it in when adding the device to Home Assistant.
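In the starter YAML the wizard generates, the key sits under api:, next to the OTA settings that make wireless updates possible. Roughly, with the generated values swapped for placeholders here:

```yaml
# Excerpt from the wizard-generated starter YAML (placeholder values)
api:
  encryption:
    key: "your-generated-key"          # the key ESPHome showed you

ota:
  - platform: esphome                  # enables wireless (OTA) installs
    password: "your-generated-password"
```

Keep both blocks when you later replace the starter YAML with the full voice assistant configuration.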

Choosing How to Install

At this point ESPHome will ask how you want to install the firmware. You'll see options like:

Plug into this computer (requires browser support and direct USB connection)
Connect to computer running ESPHome Device Builder (flashes directly via your Home Assistant machine)

For this article, we'll choose Connect to computer running ESPHome Device Builder, since it keeps everything managed inside Home Assistant without needing special drivers on your PC.

Flashing the Device

Click Install, then pick your device from the list. ESPHome will compile the firmware and flash it to your ESP32. For now we'll just use the default firmware - later we'll update the configuration to include the microphone, speaker, wake word, and LEDs.

Once the flash is complete, the board will reboot and appear in your ESPHome dashboard.

Installing the Custom Configuration

After the first flash, your device is paired with ESPHome and appears in the dashboard. The encryption key and OTA settings generated during setup live in the device's starter YAML, so carry them over when you replace that YAML rather than generating new ones. Beyond that, all we'll be changing are the configuration details that define how the microphone, amplifier, wake word, and LED should behave.

To install the custom code:

In the ESPHome dashboard, click Edit on your new device.
Replace the starter YAML with the code below:

Before copying the code, note that most of it should stay exactly as-is. The only parts you'll want to change for your own setup are:

Device name (name: under esphome:) – must be unique on your network, e.g. kitchen-assistant.
Friendly name (friendly_name:) – how it appears in Home Assistant's dashboard.
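Wi-Fi details (wifi.ssid and wifi.password) – use your own network credentials, or keep !secret if you're storing them in a secrets.yaml file.
API encryption key and OTA (the encryption: block under api: and the ota: block) – copy these from your starter YAML so Home Assistant keeps its connection and wireless updates keep working.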

Everything else (pin numbers, board type, audio settings, etc.) is already correct for this hardware build.
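If you keep the !secret references, ESPHome looks the values up in a secrets.yaml file, which you can open from the Secrets button in the ESPHome dashboard. A minimal version with placeholder values:

```yaml
# secrets.yaml (placeholder values, replace with your own)
wifi_ssid: "YourNetworkName"
wifi_password: "YourWiFiPassword"
```

The key names must match what follows !secret in the device YAML (wifi_ssid and wifi_password here).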


substitutions:
  # AUDIO I2S pinout for INMP441 mic and MAX98357A speaker
  lrclk_in: GPIO21
  bclk_in: GPIO13
  lrclk_out: GPIO16
  bclk_out: GPIO15
  din: GPIO17
  sd: GPIO18
  di: GPIO48         # onboard WS2812 RGB LED data pin
  l_r: "left"

  # Voice Assistant Phases
  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'
  voice_assist_muted_phase_id: '12'

esphome:
  name: hallway-voice
  friendly_name: Hallway-Voice
  min_version: 2025.6.2
  on_boot:
    priority: 600
    then:
      - wait_until: api.connected
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  variant: esp32s3
  framework:
    type: esp-idf
    version: 5.4.2     # ← update to at least 5.4.
    sdkconfig_options:
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_I2S_BUFSIZE: "8192"
      CONFIG_SPIRAM_SUPPORT: "y"
      CONFIG_SPIRAM_BOOT_INIT: "y"
      CONFIG_SPIRAM_USE_MALLOC: "y"
      CONFIG_SPIRAM_CACHE_WORKAROUND: "y"

psram:
  mode: octal
  speed: 80MHz

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none
  ap:
    ssid: "ESPHome_Fallback"
    password: "FallbackPassword"

logger:
  level: debug
  logs:
    sensor: WARN

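api:
  encryption:
    key: "your-generated-key"   # keep the key from your starter YAML

ota:
  - platform: esphome           # needed for wireless (OTA) updates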

i2c:
  - id: bus_a
    sda: GPIO5
    scl: GPIO6
    frequency: 400kHz

globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  - id: is_timer_active
    type: bool
    restore_value: no
    initial_value: 'false'

# --- LED FEEDBACK (onboard WS2812 on GPIO48; only the first of the 16 LEDs exists on the board) ---
light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "LED Ring"
    pin: ${di}
    num_leds: 16
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

# --- I2S AUDIO ---
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: ${lrclk_out}
    i2s_bclk_pin:  ${bclk_out}
    # i2s_mclk_pin:  GPIO14   # Not required for MAX98357A; leave commented unless wired
  - id: i2s_in
    i2s_bclk_pin: ${bclk_in}
    i2s_lrclk_pin: ${lrclk_in}

microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: ${sd}
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    channel: left
    adc_type: external

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    sample_rate: 48000
    i2s_dout_pin: ${din}
    bits_per_sample: 32bit
    i2s_audio_id: i2s_out
    dac_type: external
    channel: ${l_r}     # "left" or "right" (from substitutions)

# --- MICRO WAKE WORD ---
micro_wake_word:
  id: mww
  microphone:
    microphone: i2s_mic
    channels: 0
    gain_factor: 4
  stop_after_detection: false
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
      id: okay_nabu
    - model: hey_jarvis
      id: hey_jarvis
    - model: hey_mycroft
      id: hey_mycroft
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
  vad:
    probability_cutoff: 0.05
  on_wake_word_detected:
    - if:
        condition:
          lambda: return id(voice_assistant_phase) != ${voice_assist_muted_phase_id};
        then:
          - light.turn_on:
              id: led_ring
              effect: "Wakeword"
          - delay: 300ms
          - voice_assistant.start:
              wake_word: !lambda return wake_word;

# --- VOICE ASSISTANT ---
voice_assistant:
  id: va
  microphone:
    microphone: i2s_mic
    channels: 0
  speaker: i2s_speaker
  micro_wake_word: mww
  # use_wake_word: true   # keep disabled when using micro_wake_word

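  on_client_connected:
    - lambda: |-
        id(init_in_progress) = false;
        id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - micro_wake_word.start:   # start listening for the wake word once HA is connected
    - script.execute: update_leds

  on_client_disconnected:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop:
    - micro_wake_word.stop:
    - script.execute: update_leds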

  on_error:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_error_phase_id};
    - script.execute: update_leds

  on_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds

  on_listening:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds

  on_stt_vad_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: update_leds

  on_stt_vad_end:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: update_leds

  on_tts_start:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: update_leds

  on_intent_progress:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: update_leds

  on_end:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: update_leds

  on_idle:
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: update_leds

# --- SCRIPTS FOR LED STATUS ---
script:
  - id: update_leds
    mode: restart
    then:
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
          then:
            - light.turn_off: led_ring

      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_listening_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 100%

      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_thinking_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Working
                brightness: 100%

      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_replying_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 100%

      - if:
          condition:
            lambda: |
              return id(voice_assistant_phase) == ${voice_assist_error_phase_id}
                     || id(voice_assistant_phase) == ${voice_assist_not_ready_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Connecting
                brightness: 100%

      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id};
          then:
            - light.turn_on:
                id: led_ring
                effect: Pulse
                brightness: 20%

Click Save and then Install.
Select Wirelessly (OTA – Over The Air) to update the device without plugging it in again. The only time you need USB is for the very first flash.

ESPHome will recompile and push the updated firmware. When the device restarts, it will now run with your full voice assistant setup.

Modifying the Configuration

Once your device is up and running, you can fine-tune how it hears you and how it responds. You don't need to change the wiring - just edit a few options in the YAML and reinstall over Wi-Fi (OTA).

Changing Audio Sensitivity & Noise Handling

Two places control how easily your assistant wakes and how clean the captured audio is:

Wake word sensitivity – in micro_wake_word: you'll see gain_factor: 4 and a VAD setting. Higher gain_factor makes the mic "hotter" (picks up quieter voices but also more room noise). The VAD (vad: → probability_cutoff) tells the system how confident it must be that "this is speech". Lowering the cutoff (e.g. from 0.05 to 0.03) makes it more permissive; raising it (e.g. to 0.07) makes it stricter.
Assist pipeline processing – in voice_assistant: you can enable built-in processing to reduce noise and set automatic gain. These run in the audio pipeline and are easy to tweak.

Where to change it (example):

# Wake word front-end sensitivity
micro_wake_word:
  id: mww
  microphone:
    microphone: i2s_mic
    channels: 0
    gain_factor: 4          # Try 3–6; higher = more sensitive (and more noise)
  vad:
    probability_cutoff: 0.05  # 0.03 = permissive, 0.07 = strict

# Assist pipeline processing (runs during listening)
voice_assistant:
  id: va
  microphone:
    microphone: i2s_mic
    channels: 0
  speaker: i2s_speaker
  micro_wake_word: mww

  # Optional audio processing:
  noise_suppression_level: 2   # 0–4 (0=off). Start at 1–2; 3–4 is strongest but can dull voices.
  auto_gain: 6dBFS             # 0–31 dBFS (0=off). Lifts quiet inputs; too high can amplify noise.
  volume_multiplier: 1.0       # >0. Multiplies TTS/response volume (e.g. 1.2 is +20%).

How to choose values:
Start conservative: noise_suppression_level: 1 or 2, auto_gain: 4dBFS–8dBFS, and leave volume_multiplier: 1.0. If background hum or fan noise is an issue, try noise_suppression_level: 3; if voices sound "underwater," drop it back down. If the assistant misses quiet speech, bump auto_gain by a few dBFS; if it triggers on noise, reduce it.

Adding the Device to Home Assistant

After flashing and rebooting, your ESP32 voice assistant will normally be auto-discovered by Home Assistant. You'll see a banner at the bottom of the screen asking to set up the new device.

If that banner doesn't appear, you can add it manually:

Go to Settings → Devices & Services.
Click Add Integration and choose ESPHome.
Enter the device's hostname or IP address.
Paste in the API encryption key you saved earlier.

What You'll See in Home Assistant

Once added, the device appears as an Assist satellite. The exact entities depend on what your YAML defines, but on the device page you can find:

Media player – this entity represents the speaker output on your device. Home Assistant uses it for TTS playback, and you can also use it in your own automations or scripts to make the assistant say something or play sounds.
Assist – shows the status of the assistant (Idle, Listening, Replying, etc.).
Configuration options – a set of dropdowns and toggles you can adjust without editing YAML:

Assistant – pick which Assist pipeline this satellite uses.
Finished speaking detection – controls how relaxed or aggressive Home Assistant is when deciding whether you have finished speaking.
LED Ring – toggle the onboard LED feedback on/off.
Mute – disable the microphone without reflashing.
Wake sound – play a short tone when the assistant wakes.
Wake word sensitivity – adjust how easily it responds to the wake word (less or more sensitive).
Wake word – choose which wake word model to use (e.g. "Hey Jarvis").

These settings let you fine-tune the experience right from Home Assistant's UI, without touching YAML again. For example, you might lower sensitivity if it's waking too often, or switch to a different wake word model.
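The configuration in this article sends replies straight to i2s_speaker and defines no media_player: component, so a media player entity may not appear for your device. If you want one, the sketch below uses ESPHome's speaker media player platform; treat the option names as something to confirm against the current ESPHome documentation. The IDs va_media_player and the name "Media Player" are illustrative, and voice_assistant: then points at the media player instead of the speaker.

```yaml
# Sketch: expose the speaker to Home Assistant as a media player
media_player:
  - platform: speaker
    id: va_media_player
    name: "Media Player"
    announcement_pipeline:
      speaker: i2s_speaker
      format: FLAC          # compressed, but cheap to decode
      num_channels: 1       # mono is enough for voice replies
      sample_rate: 48000    # matches the i2s_speaker sample rate

voice_assistant:
  # ...keep the other options and callbacks as they are...
  media_player: va_media_player   # replaces "speaker: i2s_speaker"
```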

Wrapping Up

You've now built and configured your very own voice assistant with Home Assistant. From here, you can start using it to issue commands, get spoken responses, and make it part of your smart home setup.

For troubleshooting tips, or to learn more ways of improving and expanding your voice assistant, take a look at the other articles in the Voice Assistants category on this site.

Home Assistant Guide
