Introduction to the Scrape Integration
The Scrape integration in Home Assistant lets you create sensors that pull data from any public web page, even if the site doesn't offer a formal API. It works by loading the HTML of the page and "scraping" values from elements you specify - like a headline, weather value, or latest version number. This is perfect for simple, static pages where you want to monitor or display a value that isn't already available through an existing integration.
What Can Scrape Do?
- Extract any visible text or attribute from a web page using CSS selectors
- Support GET or POST requests, with optional payloads, headers, and authentication
- Create multiple sensors from the same page or API endpoint
- Use templates to clean, format, or process scraped values
- Choose how often to update (scan interval)
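As a quick illustration, these capabilities can combine in a single YAML entry. The URL, header, and selector below are placeholders, not a real endpoint:

```yaml
scrape:
  - resource: https://example.com/status   # placeholder page
    headers:
      User-Agent: "Home Assistant"         # optional custom header
    scan_interval: 1800                    # refresh every 30 minutes
    sensor:
      - name: "Service Status"
        select: "#status"                  # placeholder CSS selector
        value_template: "{{ value | trim | upper }}"
```

Each of the options shown here is explained in more detail in the sections below.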
Limitations & Things to Know
- If the site changes its layout or class names, your sensor may stop working
- Scrape only reads the page's initial HTML - it cannot capture data loaded later by JavaScript
- It's best for small, static values; not suitable for scraping long articles or multiple values from one page (use multiscrape for that)
- Each sensor's value (state) is limited to 255 characters in Home Assistant
Step-by-Step: How to Set Up a Scrape Sensor
- Pick the web page you want to scrape. Example: https://www.home-assistant.io/
- Find the CSS selector for the element you want. Use your browser's developer tools (right-click → Inspect) and copy the selector for the headline, value, or tag you need.
- Add to your configuration.yaml (YAML method):

  scrape:
    - resource: https://www.home-assistant.io/
      sensor:
        - name: "HA Latest Release"
          select: ".release-date"
          value_template: "{{ value | trim }}"
      scan_interval: 3600  # check every hour

- Or use the UI method: In Home Assistant, go to Settings → Devices & Services → Add Integration and search for Scrape. Enter the page URL under Resource, choose GET (or POST if the site requires it), and add your sensor details such as Name, Selector, and Value template.
- Restart Home Assistant (if using YAML) and check your new sensor under Developer Tools → States or add it to your dashboard.
Selectors: What They Are and How to Use Them
A selector tells Scrape exactly which part of the web page to extract. It uses standard CSS selectors, the same language that web pages use for styling.
- div - selects every <div> element on the page
- .price - selects any element with class "price" (e.g. <span class="price">)
- #total - selects the element with id "total"
- div.item > span.value - selects a <span> with class "value" that is a direct child of a <div class="item">
- ul.list li:nth-child(2) - selects the second list item inside a <ul class="list">
How to Find the Right Selector
- Open the target page in your browser.
- Right-click on the value you want → Inspect.
- In the Developer Tools panel, the element is highlighted. Right-click → Copy → Copy selector.
- Simplify it if possible - short, stable selectors (like a class or id) are less likely to break.
- If multiple elements match your selector, use the Index field to pick one (0 = first, 1 = second, etc.).
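For instance, a hypothetical page with several prices could be narrowed down with Index (the URL and selector are made up for illustration):

```yaml
scrape:
  - resource: https://example.com/prices   # placeholder page
    sensor:
      - name: "Second Price"
        select: ".price"   # matches several elements on the page
        index: 1           # 0 = first match, 1 = second, etc.
```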
What If You Want the Entire Page?
If the site simply returns a plain text value or very small HTML response, you can scrape the whole page content:
- Use Select = body (or html) - this grabs all visible text from the page.
- If the response isn't HTML (just raw text), you can still scrape it with select: body.
- Alternatively, if you want the entire response untouched, you can use the RESTful Sensor instead of Scrape.
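For the RESTful Sensor route mentioned above, a minimal sketch for a plain-text response might look like this (the URL is a placeholder):

```yaml
rest:
  - resource: https://example.com/version.txt  # placeholder plain-text endpoint
    sensor:
      - name: "Raw Version"
        # the whole response body is available as `value`
        value_template: "{{ value | trim }}"
```

Note that the sensor's state is still capped at 255 characters either way; for longer responses see the workarounds below.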
Common Configuration Options
- resource: The page you want to scrape (required)
- select: CSS selector for the element you want (required)
- index: If the selector matches multiple elements, which one to use (0 = first)
- attribute: Get an HTML attribute (e.g. href or src) instead of inner text
- value_template: Format or process the scraped text (Jinja2 template, optional)
- scan_interval: How often to refresh (in seconds, default 600 = 10 min)
- method: HTTP method (GET or POST)
- payload: JSON or form data sent with a POST request (optional)
- headers: Add custom request headers (optional)
- verify_ssl: Disable this only if the page uses a self-signed certificate - you likely do not need to change this
- unit_of_measurement, icon, device_class: Optional display options
255-Character Limit in Sensor States (and how to work around it)
Home Assistant limits the state (the main value) of any entity to 255 characters. If you scrape long text, the state will be cut off and anything that reads the state later will only see the truncated value. There's no way to raise this limit for states.
What does work for long text
The key is to avoid putting the long text into a state in the first place. Instead, store it in attributes (which aren't limited to 255 chars) or outside an entity entirely. Here are proven patterns:
1) If the site has JSON → use REST sensor with JSON attributes
The REST sensor can pull JSON and place long fields straight into attributes, while keeping a short state.
rest:
  - resource: https://example.com/article.json
    sensor:
      - name: "News Article"
        value_template: "{{ value_json.id }}"  # short state (e.g., id)
        json_attributes:
          - title
          - body  # long text lives here
Then use state_attr('sensor.news_article', 'body') in a Markdown card or automation. This bypasses the 255-char limit because the long text never became a state.
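For example, a dashboard Markdown card could render the attribute, using the sensor defined above:

```yaml
type: markdown
content: >-
  {{ state_attr('sensor.news_article', 'title') }}

  {{ state_attr('sensor.news_article', 'body') }}
```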
2) For HTML pages (no JSON) → use MultiScrape (HACS)
The community MultiScrape integration can extract multiple fields and place them directly into attributes from an HTML page, avoiding the state limit. Keep the state short (e.g., a length or timestamp) and put the big blob into an attribute. (Install via HACS → Integrations → "multiscrape".)
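As a rough sketch only - the exact keys can differ between MultiScrape versions, so check its README; the URL and selectors here are placeholders:

```yaml
multiscrape:
  - resource: https://example.com/article   # placeholder page
    scan_interval: 3600
    sensor:
      - unique_id: article_scraper
        name: "Article"
        select: "h1"                        # short state: the headline
        attributes:
          - name: body
            select: "div.article-body"      # long text stored as an attribute
```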
What does not work
- Copying from a Scrape sensor's state into another sensor or attribute later. The value is already truncated by then.
- Using input_text for long blobs - it's also limited to 255 characters.
Notes for Scrape users
- Scrape itself can't write arbitrary attributes. To keep long text, prefer REST (with json_attributes) or MultiScrape.
- If you must stick with Scrape, consider scraping a short marker (e.g., a content hash or title) as the state and fetch the full content by another method (REST/MQTT/file) for display or processing.
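A sketch of the short-marker approach - the URL and selector are hypothetical; the point is that only a short fingerprint ever becomes the state:

```yaml
scrape:
  - resource: https://example.com/article   # placeholder page
    sensor:
      - name: "Article Marker"
        select: "h1"
        # keep the state well under 255 chars: a truncated title
        value_template: "{{ value | trim | truncate(100) }}"
```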
Examples
- Scrape the latest Home Assistant version:
  scrape:
    - resource: https://www.home-assistant.io/
      sensor:
        - name: "Release"
          select: ".release-date"

- Extract a number from a tag and process it:

  scrape:
    - resource: https://example.com/
      sensor:
        - name: "UV Index"
          select: "p"
          index: 19
          unit_of_measurement: "UV Index"

- Get a link from an attribute:

  scrape:
    - resource: https://example.com/news
      sensor:
        - name: "Top Story Link"
          select: ".headline a"
          attribute: "href"

- POST request with payload:

  scrape:
    - resource: https://example.com/api
      method: POST
      payload: '{"region":"EU"}'
      headers:
        Content-Type: application/json
      sensor:
        - name: "API Value"
          select: ".result"
          value_template: "{{ value | float(0) }}"
Troubleshooting & Tips for Success
- If your sensor shows unknown, verify that the selector matches something on the page.
- Check that the site doesn't rely on JavaScript to display the data - Scrape can't read dynamically loaded content.
- If your selector matches multiple elements, try adjusting the index number.
- Test your value_template in Developer Tools → Template to ensure it returns valid output.
- Don't scrape too often - keep scan_interval reasonable (e.g. every 10–30 minutes) to avoid being blocked.
- If you need more than one or two values from the same page, check out the multiscrape custom integration on HACS.
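A quick way to test a value_template in Developer Tools → Template is to stub the scraped text yourself (the sample value here is made up):

```jinja2
{% set value = "  2025.1.0  " %}
{{ value | trim }}
```

This renders 2025.1.0, confirming that the trim filter behaves as expected before you put the template into your sensor.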
Summary
The Scrape integration is a powerful yet simple way to bring web data into Home Assistant - perfect for grabbing a version number, price, score, or other small text snippet from a public web page. Find the right CSS selector, use templates to tidy up your data, and remember the 255-character limit when scraping longer text. For advanced cases, multiscrape and rest sensors can extend what's possible.