Introduction to the Scrape Integration
The Scrape integration in Home Assistant lets you create sensors that pull data from any public web page, even if the site doesn't offer a formal API. It works by loading the HTML of the page and "scraping" values from elements you specify - like a headline, weather value, or latest version number. This is perfect for simple, static pages where you want to monitor or display a value that isn't already available through an existing integration.
What Can Scrape Do?
- Extract any visible text or attribute from a web page using CSS selectors
- Support GET or POST requests, with optional payloads, headers, and authentication
- Create multiple sensors from the same page or API endpoint
- Use templates to clean, format, or process scraped values
- Choose how often to update (scan interval)
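As a quick illustration, these capabilities can combine in a single YAML entry. The URL, header, and selector below are placeholders, not a real endpoint:

```yaml
scrape:
  - resource: https://example.com/status   # placeholder page
    headers:
      User-Agent: "Home Assistant"         # optional custom header
    scan_interval: 1800                    # refresh every 30 minutes
    sensor:
      - name: "Service Status"
        select: "#status"                  # placeholder CSS selector
        value_template: "{{ value | trim | upper }}"
```

Each of the options shown here is explained in more detail in the sections below.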
Limitations & Things to Know
- If the site changes its layout or class names, your sensor may stop working
- Scrape only reads the page's initial HTML - it cannot capture data loaded later by JavaScript
- It's best for small, static values; not suitable for scraping long articles or multiple values from one page (use multiscrape for that)
- Each sensor's value (state) is limited to 255 characters in Home Assistant
Step-by-Step: How to Set Up a Scrape Sensor
- Pick the web page you want to scrape. Example: https://www.home-assistant.io/
- Find the CSS selector for the element you want. Use your browser's developer tools (right-click → Inspect) and copy the selector for the headline, value, or tag you need.
- Add to your configuration.yaml (YAML method):

  scrape:
    - resource: https://www.home-assistant.io/
      sensor:
        - name: "HA Latest Release"
          select: ".release-date"
          value_template: "{{ value | trim }}"
      scan_interval: 3600  # check every hour

- Or use the UI method: In Home Assistant, go to Settings → Devices & Services → Add Integration and search for Scrape. Enter the page URL under Resource, choose GET (or POST if the site requires it), and add your sensor details such as Name, Selector, and Value template.
- Restart Home Assistant (if using YAML) and check your new sensor under Developer Tools → States or add it to your dashboard.
Selectors: What They Are and How to Use Them
A selector tells Scrape exactly which part of the web page to extract. It uses standard CSS selectors, the same language that web pages use for styling.
- div - selects every <div> element on the page
- .price - selects any element with class "price" (e.g. <span class="price">)
- #total - selects the element with id "total"
- div.item > span.value - selects a <span> with class "value" that is a direct child of a <div class="item">
- ul.list li:nth-child(2) - selects the second list item inside a <ul class="list">
How to Find the Right Selector
- Open the target page in your browser.
- Right-click on the value you want → Inspect.
- In the Developer Tools panel, the element is highlighted. Right-click → Copy → Copy selector.
- Simplify it if possible - short, stable selectors (like a class or id) are less likely to break.
- If multiple elements match your selector, use the Index field to pick one (0 = first, 1 = second, etc.).
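For instance, a hypothetical page with several prices could be narrowed down with Index (the URL and selector are made up for illustration):

```yaml
scrape:
  - resource: https://example.com/prices   # placeholder page
    sensor:
      - name: "Second Price"
        select: ".price"   # matches several elements on the page
        index: 1           # 0 = first match, 1 = second, etc.
```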
What If You Want the Entire Page?
If the site simply returns a plain text value or very small HTML response, you can scrape the whole page content:
- Use Select = body (or html) - this grabs all visible text from the page.
- If the response isn't HTML (just raw text), you can still scrape it with select: body.
- Alternatively, if you want the entire response untouched, you can use the RESTful Sensor instead of Scrape.
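For the RESTful Sensor route mentioned above, a minimal sketch for a plain-text response might look like this (the URL is a placeholder):

```yaml
rest:
  - resource: https://example.com/version.txt  # placeholder plain-text endpoint
    sensor:
      - name: "Raw Version"
        # the whole response body is available as `value`
        value_template: "{{ value | trim }}"
```

Note that the sensor's state is still capped at 255 characters either way; for longer responses see the workarounds below.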
Common Configuration Options
- resource: The page you want to scrape (required)
- select: CSS selector for the element you want (required)
- index: If the selector matches multiple elements, which one to use (0 = first)
- attribute: Get an HTML attribute (e.g. href or src) instead of inner text
- value_template: Format or process the scraped text (Jinja2 template, optional)
- scan_interval: How often to refresh (in seconds, default 600 = 10 min)
- method: HTTP method (GET or POST)
- payload: JSON or form data sent with a POST request (optional)
- headers: Add custom request headers (optional)
- verify_ssl: Disable this only if the page uses a self-signed certificate - you likely do not need to change this
- unit_of_measurement, icon, device_class: Optional display options
255-Character Limit in Sensor States (and how to work around it)
Home Assistant limits the state (the main value) of any entity to 255 characters. If you scrape long text, the state will be cut off and anything that reads the state later will only see the truncated value. There's no way to raise this limit for states.
What does work for long text
The key is to avoid putting the long text into a state in the first place. Instead, store it in attributes (which aren't limited to 255 chars) or outside an entity entirely. Here are proven patterns:
1) If the site has JSON → use REST sensor with JSON attributes
The REST sensor can pull JSON and place long fields straight into attributes, while keeping a short state.
rest:
  - resource: https://example.com/article.json
    sensor:
      - name: "News Article"
        value_template: "{{ value_json.id }}"  # short state (e.g., id)
        json_attributes:
          - title
          - body  # long text lives here
Then use state_attr('sensor.news_article', 'body') in a Markdown card or automation. This bypasses the 255-char limit because the long text never became a state.
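For example, a dashboard Markdown card could render the attribute, using the sensor defined above:

```yaml
type: markdown
content: >-
  {{ state_attr('sensor.news_article', 'title') }}

  {{ state_attr('sensor.news_article', 'body') }}
```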
2) For HTML pages (no JSON) → use MultiScrape (HACS)
The community MultiScrape integration can extract multiple fields and place them directly into attributes from an HTML page, avoiding the state limit. Keep the state short (e.g., a length or timestamp) and put the big blob into an attribute. (Install via HACS → Integrations → "multiscrape".)
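As a rough sketch only - the exact keys can differ between MultiScrape versions, so check its README; the URL and selectors here are placeholders:

```yaml
multiscrape:
  - resource: https://example.com/article   # placeholder page
    scan_interval: 3600
    sensor:
      - unique_id: article_scraper
        name: "Article"
        select: "h1"                        # short state: the headline
        attributes:
          - name: body
            select: "div.article-body"      # long text stored as an attribute
```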
What does not work
- Copying from a Scrape sensor's state into another sensor or attribute later. The value is already truncated by then.
- Using input_text for long blobs - it's also limited to 255 characters.
Notes for Scrape users
- Scrape itself can't write arbitrary attributes. To keep long text, prefer REST (with json_attributes) or MultiScrape.
- If you must stick with Scrape, consider scraping a short marker (e.g., a content hash or title) as the state and fetch the full content by another method (REST/MQTT/file) for display or processing.
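A sketch of the short-marker approach - the URL and selector are hypothetical; the point is that only a short fingerprint ever becomes the state:

```yaml
scrape:
  - resource: https://example.com/article   # placeholder page
    sensor:
      - name: "Article Marker"
        select: "h1"
        # keep the state well under 255 chars: a truncated title
        value_template: "{{ value | trim | truncate(100) }}"
```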
Examples
- Scrape the latest Home Assistant version:
  scrape:
    - resource: https://www.home-assistant.io/
      sensor:
        - name: "Release"
          select: ".release-date"

- Extract a number from a tag and process it:

  scrape:
    - resource: https://example.com/
      sensor:
        - name: "UV Index"
          select: "p"
          index: 19
          unit_of_measurement: "UV Index"

- Get a link from an attribute:

  scrape:
    - resource: https://example.com/news
      sensor:
        - name: "Top Story Link"
          select: ".headline a"
          attribute: "href"

- POST request with payload:

  scrape:
    - resource: https://example.com/api
      method: POST
      payload: '{"region":"EU"}'
      headers:
        Content-Type: application/json
      sensor:
        - name: "API Value"
          select: ".result"
          value_template: "{{ value | float(0) }}"
Troubleshooting & Tips for Success
- If your sensor shows unknown, verify that the selector matches something on the page.
- Check that the site doesn't rely on JavaScript to display the data - Scrape can't read dynamically loaded content.
- If your selector matches multiple elements, try adjusting the index number.
- Test your value_template in Developer Tools → Template to ensure it returns valid output.
- Don't scrape too often - keep scan_interval reasonable (e.g. every 10–30 minutes) to avoid being blocked.
- If you need more than one or two values from the same page, check out the multiscrape custom integration on HACS.
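A quick way to test a value_template in Developer Tools → Template is to stub the scraped text yourself (the sample value here is made up):

```jinja2
{% set value = "  2025.1.0  " %}
{{ value | trim }}
```

This renders 2025.1.0, confirming that the trim filter behaves as expected before you put the template into your sensor.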
Summary
The Scrape integration is a powerful yet simple way to bring web data into Home Assistant - perfect for grabbing a version number, price, score, or other small text snippet from a public web page. Find the right CSS selector, use templates to tidy up your data, and remember the 255-character limit when scraping longer text. For advanced cases, multiscrape and rest sensors can extend what's possible.