# Data attributes

Data attributes let you embed BeyondWords configuration directly in your HTML. Use them to set content metadata (title, author, publish date), override voices and languages for specific paragraphs, mark images for video, add pauses, and improve [segment detection](/docs-and-guides/distribution/player/developer-guides/segment-detection).

They complement [content filters](/docs-and-guides/integrations/content-extraction#filters)—filters remove whole HTML elements; data attributes configure how remaining content is interpreted and synthesized.

<Tip>
  If you send content via the [API](/docs-and-guides/integrations/api-overview), you can set many metadata fields (`title`, `author`, `publish_date`, etc.) directly on the request instead of using global data attributes. Global attributes are most useful when BeyondWords fetches a live page, as it does for [Magic Embed](/docs-and-guides/integrations/magic-embed) and for [RSS Feed Importer](/docs-and-guides/integrations/rss-feed-importer) page extraction.
</Tip>

## How it works

BeyondWords reads `data-beyondwords-*` attributes from your HTML at different stages of processing. Each attribute belongs to one of three **scopes**—global, segment, or document—which determines what it affects and where you should place it.

Attributes must be on valid HTML elements. Plaintext content without HTML tags cannot carry data attributes.
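A minimal sketch of that constraint, using a placeholder voice ID: the same sentence can only carry an attribute once it is wrapped in an element.

```html
<!-- Bare text has no element to hold an attribute -->
The policy is designed to reduce emissions.

<!-- Wrapped in a <p>, the text can carry data-beyondwords-* attributes -->
<p data-beyondwords-voice-id="784">
  The policy is designed to reduce emissions.
</p>
```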

## Attribute scopes

Think of the three scopes as three layers:

|                       | [Global](#global-metadata-attributes)                                                                                                  | [Segment](#segment-attributes)                                                                                         | [Document](#advanced)                                                                              |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **What it affects**   | The **content item** as a whole—metadata fields and defaults that apply across the article                                             | Individual **segments**—paragraphs, headings, and images as they are split from your HTML                              | How the **entire HTML document** is processed before segmentation                                  |
| **When it is read**   | During metadata extraction (when BeyondWords fetches a live page)                                                                      | During HTML → segment splitting (`auto_segment`)                                                                       | Before [content filters](/docs-and-guides/integrations/content-extraction#filters) run on the HTML |
| **How many values**   | One per content item—BeyondWords uses the **first** matching element in the document for each global attribute                         | Many—each element can have its own value; voice and language **inherit** from ancestor elements                        | One flag on the root `<html>` element                                                              |
| **Typical placement** | `<body>`, `<article>`, or a page wrapper                                                                                               | `<p>`, `<h1>`, `<div>`, `<img>`, inline `<span>` / `<time>`                                                            | `<html>` only                                                                                      |
| **API alternative**   | Yes—set `title`, `author`, `publish_date`, and similar fields on the [API request](/docs-and-guides/integrations/api-overview) instead | No direct API field—configure in HTML, or use the [Editor](/docs-and-guides/tools/editor) for `manual_segment` content | No dashboard equivalent—must be in the HTML                                                        |

**Global** attributes answer: *what is this article?* They map to content-item fields—title, author, publish date, whether the player should load, default voices for title/body/summary sections, and the single feature image for videos and share pages.

**Segment** attributes answer: *how should this part of the article be synthesized or played back?* They attach to specific HTML elements and flow into individual audio/video segments—a different voice for one paragraph, a pause mid-sentence, a marker for click-to-play, or an in-article image for video.

**Document** attributes answer: *how should BeyondWords process this HTML file?* The only document-scoped attribute today skips dashboard content filters for that HTML—useful when you need the raw markup to pass through unchanged.
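As a rough illustration, the skeleton below places one attribute from each scope where the table suggests. The title, author, and voice ID are placeholder values, and most pages would not need all three scopes at once.

```html
<!-- Document scope: a single flag on the root element -->
<html data-beyondwords-skip-split-clean-filters="true">
  <!-- Global scope: metadata for the content item as a whole -->
  <body
    data-beyondwords-title="My article title"
    data-beyondwords-author="Jane Doe"
  >
    <article>
      <p>This paragraph uses the default body voice.</p>

      <!-- Segment scope: applies to this element and its descendants -->
      <p data-beyondwords-voice-id="784">
        This paragraph uses voice 784.
      </p>
    </article>
  </body>
</html>
```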

### Global vs segment: a concrete example

Both scopes can set voices, but they work at different levels:

```html theme={null}
<article
  data-beyondwords-body-voice-id="100"
  data-beyondwords-title-voice-id="200"
>
  <h1>Article title</h1>
  <!-- title segments use voice 200 (global default for title sections) -->

  <p>First paragraph uses voice 100 (global body default).</p>

  <p data-beyondwords-voice-id="300">
    This paragraph uses voice 300 (segment override).
  </p>
</article>
```

Similarly, `data-beyondwords-feature-image` (global) picks **one** hero image for the content item, while `data-beyondwords-image` (segment) marks **individual** images inside the article for [video](/docs-and-guides/content/video) segments.

## When to use data attributes

| Integration                                                                                                                                                                                                                     | Global metadata attributes                                                        | Segment attributes                              |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | ----------------------------------------------- |
| [Magic Embed](/docs-and-guides/integrations/magic-embed) (live page fetch)                                                                                                                                                      | Yes—extracted from the page                                                       | Yes—from editorial HTML on the page             |
| [RSS Feed Importer](/docs-and-guides/integrations/rss-feed-importer) (page fetch enabled)                                                                                                                                       | Yes—extracted from fetched article HTML                                           | Yes                                             |
| [API](/docs-and-guides/integrations/api-overview) / [WordPress](/docs-and-guides/integrations/publishing-platforms/wordpress) / [Ghost](/docs-and-guides/integrations/publishing-platforms/ghost) (`body` HTML, `auto_segment`) | Prefer API/plugin fields for metadata; attributes in HTML still work for segments | Yes—in submitted HTML                           |
| Dashboard [Editor](/docs-and-guides/tools/editor) (`manual_segment`)                                                                                                                                                            | No—edit metadata and segments in the Editor                                       | N/A—set voices and pauses in the Editor instead |

After adding or changing attributes in your HTML template, regenerate or re-publish the content so that BeyondWords processes the updated HTML.

## Attribute reference

Each attribute belongs to a [scope](#attribute-scopes)—global, segment, or document.

| Attribute                                                            | Scope                                 | Purpose                                                                                                    |
| -------------------------------------------------------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| [`data-beyondwords-title`](#title)                                   | [Global](#global-metadata-attributes) | Content title                                                                                              |
| [`data-beyondwords-author`](#author)                                 | [Global](#global-metadata-attributes) | Author name                                                                                                |
| [`data-beyondwords-publish-date`](#publish-date)                     | [Global](#global-metadata-attributes) | Publish date (ISO 8601)                                                                                    |
| [`data-beyondwords-published`](#published)                           | [Global](#global-metadata-attributes) | Whether content is publicly available                                                                      |
| [`data-beyondwords-ads-enabled`](#ads-enabled)                       | [Global](#global-metadata-attributes) | Whether ads are enabled                                                                                    |
| [`data-beyondwords-title-voice-id`](#title-body-and-summary-voice)   | [Global](#global-metadata-attributes) | Voice for title sections                                                                                   |
| [`data-beyondwords-body-voice-id`](#title-body-and-summary-voice)    | [Global](#global-metadata-attributes) | Voice for body sections                                                                                    |
| [`data-beyondwords-summary-voice-id`](#title-body-and-summary-voice) | [Global](#global-metadata-attributes) | Voice for summary/script sections                                                                          |
| [`data-beyondwords-article-language`](#article-language)             | [Global](#global-metadata-attributes) | Default language for synthesis                                                                             |
| [`data-beyondwords-feature-image`](#feature-image)                   | [Global](#global-metadata-attributes) | Content-level feature image (`true` on an `<img>`)                                                         |
| [`data-beyondwords-voice-id`](#voice-override)                       | [Segment](#segment-attributes)        | Voice override for an element and its descendants                                                          |
| [`data-beyondwords-language`](#language-override)                    | [Segment](#segment-attributes)        | Language override for an element and its descendants                                                       |
| [`data-beyondwords-marker`](#segment-markers)                        | [Segment](#segment-attributes)        | Stable ID for [segment detection](/docs-and-guides/distribution/player/developer-guides/segment-detection) |
| [`data-beyondwords-pause`](#pauses)                                  | [Segment](#segment-attributes)        | Pause duration in seconds (max 3)                                                                          |
| [`data-beyondwords-image`](#image-markers-video)                     | [Segment](#segment-attributes)        | Mark an image for [video](/docs-and-guides/content/video) generation                                       |
| [`data-beyondwords-skip-split-clean-filters`](#skip-content-filters) | [Document](#advanced)                 | Skip [content filters](/docs-and-guides/integrations/content-extraction#filters) for this HTML             |

## Global metadata attributes

Global attributes set **content-item-level** metadata and defaults. See [Attribute scopes](#attribute-scopes) for how they differ from segment and document attributes.

Add them to any element in your HTML—commonly on `<body>`, `<article>`, or a wrapper `<div>`. BeyondWords uses the **first** matching element in the document for each attribute.
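The sketch below shows what the first-match rule implies when the same global attribute appears twice. It reads "first" as document order, which is an assumption, and the names are placeholders.

```html
<body data-beyondwords-author="Jane Doe">
  <!-- The author here is ignored: <body> already matched data-beyondwords-author -->
  <!-- The title is used: this is the first element that sets data-beyondwords-title -->
  <article
    data-beyondwords-author="John Smith"
    data-beyondwords-title="My article title"
  >
    ...
  </article>
</body>
```

Keeping each global attribute on a single element avoids relying on this ordering.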

Global attributes are extracted automatically when BeyondWords fetches a live URL ([Magic Embed](/docs-and-guides/integrations/magic-embed), [RSS Feed Importer](/docs-and-guides/integrations/rss-feed-importer) page extraction). When sending HTML via the API, prefer the API's `title`, `author`, and other metadata fields; use global attributes when page fetch is your ingestion path or you need to override extracted values.

### Title

```html theme={null}
<article data-beyondwords-title="My article title">
  ...
</article>
```

### Author

```html theme={null}
<article data-beyondwords-author="Jane Doe">
  ...
</article>
```

### Publish date

ISO 8601 datetime. If the date is in the future, the player will not load until that time. Include a timezone suffix (`Z` or `+01:00`); if omitted, UTC is assumed.

```html theme={null}
<article data-beyondwords-publish-date="2023-01-01T12:00:00Z">
  ...
</article>
```
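To illustrate the timezone rule, the two alternative snippets below differ only in their suffix. The dates are placeholders, and the UTC equivalents in the comments follow from the stated offsets. They are alternatives, not meant to appear on the same page.

```html
<!-- Explicit offset: 09:00 at UTC+01:00, which is 08:00 UTC -->
<article data-beyondwords-publish-date="2025-06-01T09:00:00+01:00">
  ...
</article>

<!-- No suffix: UTC is assumed, so this is 09:00 UTC -->
<article data-beyondwords-publish-date="2025-06-01T09:00:00">
  ...
</article>
```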

### Published

Boolean (`"true"` or `"false"`). If `false`, the player will not load regardless of publish date. Content is still generated and visible in the dashboard.

```html theme={null}
<article data-beyondwords-published="false">
  ...
</article>
```
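Combining the two attributes shows the precedence described above. In this sketch the publish date is already in the past, yet the player still does not load because `published` is `false`.

```html
<article
  data-beyondwords-publish-date="2023-01-01T12:00:00Z"
  data-beyondwords-published="false"
>
  <!-- Audio is still generated and visible in the dashboard -->
  ...
</article>
```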

### Ads enabled

Boolean (`"true"` or `"false"`).

```html theme={null}
<article data-beyondwords-ads-enabled="false">
  ...
</article>
```

### Title, body, and summary voice

Set default voices by section using voice IDs from **Content → Preferences → Voices** in your project dashboard. See [voices](/docs-and-guides/voices/overview).

```html theme={null}
<article
  data-beyondwords-title-voice-id="784"
  data-beyondwords-body-voice-id="2194"
  data-beyondwords-summary-voice-id="2194"
>
  ...
</article>
```

If not specified, project default voices are used.
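You do not have to set all three. In the sketch below (the voice ID is a placeholder) only the title voice is overridden, so body and summary sections fall back to the project defaults.

```html
<article data-beyondwords-title-voice-id="784">
  <h1>Article title</h1>
  <!-- Title uses voice 784 -->

  <p>Body text uses the project's default body voice.</p>
</article>
```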

### Article language

Default synthesis language as a locale code (e.g. `en_GB`, `en_US`). If not specified, the project default language is used.

```html theme={null}
<article data-beyondwords-article-language="en_GB">
  ...
</article>
```

### Feature image

Marks the content-level feature image—used in videos and on shareable play pages. Set `data-beyondwords-feature-image="true"` on the chosen `<img>`. BeyondWords uses the first matching image's `src` (resolved to an absolute URL when possible).

```html theme={null}
<img
  data-beyondwords-feature-image="true"
  src="https://example.com/hero.jpeg"
  alt="Article hero image"
/>
```

This is different from `data-beyondwords-image` (see [Image markers (video)](#image-markers-video)), which marks images within the article body for video segments.

## Segment attributes

Segment attributes control **per-segment** behavior—how individual paragraphs, headings, and images are synthesized and identified in the player. See [Attribute scopes](#attribute-scopes) for how they differ from global and document attributes.

Set them on specific HTML elements. Nested elements inherit the nearest ancestor's value for voice and language.

### Voice override

Override the voice for a section using a voice ID. Child elements inherit unless they set their own override.

```html theme={null}
<p data-beyondwords-voice-id="784">
  This paragraph uses voice 784.
</p>

<div data-beyondwords-voice-id="2194">
  <p>This paragraph uses voice 2194.</p>
  <p>So does this one.</p>
</div>
```
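Inheritance follows the nearest ancestor that sets a value, so an override can be nested inside another override. A small sketch with placeholder voice IDs:

```html
<div data-beyondwords-voice-id="2194">
  <p>This paragraph inherits voice 2194 from the div.</p>

  <blockquote data-beyondwords-voice-id="784">
    <!-- Nearest ancestor with a value is the blockquote -->
    <p>This quoted paragraph uses voice 784.</p>
  </blockquote>

  <p>This paragraph is back to voice 2194.</p>
</div>
```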

### Language override

Override the language for a section using a locale code.

```html theme={null}
<p data-beyondwords-language="en_GB">
  This paragraph is synthesized in British English.
</p>

<p data-beyondwords-language="fr_FR">
  Ce paragraphe est synthétisé en français.
</p>
```
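Language overrides inherit in the same way as voice overrides, so a wrapper can set the language for a group of paragraphs while one child opts out. A sketch:

```html
<div data-beyondwords-language="fr_FR">
  <p>Ce paragraphe hérite du français.</p>

  <!-- The child's own value takes precedence over the wrapper's -->
  <p data-beyondwords-language="en_GB">
    This paragraph switches back to British English.
  </p>
</div>
```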

### Segment markers

Markers identify segments on your page for player features such as paragraph highlighting and click-to-play. BeyondWords extracts markers from your HTML during processing; you can also add them manually.

Use stable, unique values—we recommend UUIDs. See [segment detection](/docs-and-guides/distribution/player/developer-guides/segment-detection) for full guidance.

```html theme={null}
<h1 data-beyondwords-marker="1af51b2a-72df-4b86-bb7c-87d057231ca0">
  Article title
</h1>

<p data-beyondwords-marker="5d2c6eba-f612-45c7-b987-00fde473d867">
  First paragraph.
</p>
```

### Pauses

Insert a pause at a specific point in a paragraph. The value is a number of seconds, up to a maximum of 3, with at most one decimal place. An optional `s` suffix is accepted, so `1.0` and `1.2s` are both valid.

```html theme={null}
<p>
  The policy is designed to reduce emissions.
  <span data-beyondwords-pause="1.0"></span>
  In practice, it may do the opposite.
</p>
```

You can also use a `<time>` element instead of `<span>`.
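For example, the same kind of pause written with `<time>` and the optional `s` suffix might look like this:

```html
<p>
  The policy is designed to reduce emissions.
  <time data-beyondwords-pause="1.2s"></time>
  In practice, it may do the opposite.
</p>
```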

### Image markers (video)

Mark images within the article body for [video](/docs-and-guides/content/video) generation. Unlike `data-beyondwords-feature-image`, this applies per image segment in the article—not the content-level hero image.

```html theme={null}
<img src="https://example.com/chart.png" data-beyondwords-image="true" alt="Sales chart" />
```

Set `data-beyondwords-image="false"` to exclude an image that would otherwise be picked up automatically.
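The sketch below puts the image attributes side by side on one hypothetical page: a single feature image, one in-article image marked for video, and one image explicitly excluded. The URLs are placeholders.

```html
<article>
  <!-- Global: the single content-level feature image -->
  <img
    data-beyondwords-feature-image="true"
    src="https://example.com/hero.jpeg"
    alt="Article hero image"
  />

  <p>Opening paragraph.</p>

  <!-- Segment: marked as an in-article image for video -->
  <img
    data-beyondwords-image="true"
    src="https://example.com/chart.png"
    alt="Sales chart"
  />

  <!-- Segment: explicitly excluded from video -->
  <img
    data-beyondwords-image="false"
    src="https://example.com/sponsor-banner.png"
    alt="Sponsor banner"
  />
</article>
```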

## Advanced

Document-scoped attributes affect **processing of the whole HTML file**, not metadata or individual segments. See [Attribute scopes](#attribute-scopes).

### Skip content filters

Set this attribute on the root `<html>` element to bypass [content filters](/docs-and-guides/integrations/content-extraction#filters) for that HTML document. BeyondWords still removes `<script>` and `<style>` elements and HTML comments.

```html theme={null}
<html data-beyondwords-skip-split-clean-filters="true">
  ...
</html>
```

Use sparingly—only when you need the raw HTML to pass through unchanged by dashboard filters (for example, highly controlled CMS output).

You can also target elements with `data-*` attributes using a [Data content filter](/docs-and-guides/integrations/content-extraction#data-element_data)—for example, `exclude` matches elements with a `data-exclude` attribute.

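## Example: Magic Embed page

This page combines global attributes on `<body>` with segment attributes on individual elements. The `<aside>` is not handled by a BeyondWords attribute; it is removed by a Data content filter that matches `data-exclude`.
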
```html theme={null}
<body
  data-beyondwords-author="Jane Doe"
  data-beyondwords-publish-date="2025-06-01T09:00:00Z"
  data-beyondwords-body-voice-id="2194"
  data-beyondwords-article-language="en_GB"
>
  <article>
    <h1 data-beyondwords-marker="uuid-for-title">My article</h1>

    <img
      data-beyondwords-feature-image="true"
      src="https://example.com/hero.jpg"
      alt="Hero"
    />

    <p data-beyondwords-marker="uuid-for-p1">
      Opening paragraph text.
    </p>

    <p data-beyondwords-marker="uuid-for-p2">
      Second paragraph with a pause.
      <span data-beyondwords-pause="0.5"></span>
      And more text after the pause.
    </p>

    <aside data-exclude="true">
      Newsletter sign-up — excluded via content filter, not read aloud.
    </aside>
  </article>
</body>
```

## FAQs

<AccordionGroup>
  <Accordion title="What is the difference between global and segment attributes?">
    **Global** attributes apply to the whole content item: one title, one author, one feature image, and default voices per section type. **Segment** attributes apply to individual parts of the HTML as they are split into audio/video segments, such as a voice override on one paragraph, a pause mid-sentence, or a marker for click-to-play. For each global attribute, BeyondWords uses the first matching element in the document; segment voice and language values are inherited from the nearest ancestor element that sets them. See [Attribute scopes](#attribute-scopes).
  </Accordion>

  <Accordion title="Should I use data attributes or the API for metadata?">
    If you control a backend integration, set `title`, `author`, `publish_date`, and similar fields on the [API request](/docs-and-guides/integrations/api-overview) directly—these are **global** metadata fields. Use global data attributes when BeyondWords **fetches** your page (Magic Embed, RSS page extraction) and you need to override what automatic extraction would infer. **Segment** attributes (voice overrides, pauses, markers) have no API equivalent—they belong in your HTML.
  </Accordion>

  <Accordion title="What is the difference between feature-image and image attributes?">
    `data-beyondwords-feature-image="true"` marks the single **content-level** hero image (videos, share pages). `data-beyondwords-image="true"` marks individual **in-article** images as segments for video generation. A page can have one feature image and multiple image markers.
  </Accordion>

  <Accordion title="Do I need to add segment markers manually?">
    Usually not. BeyondWords extracts markers from your HTML during processing and the player uses them for highlighting and click-to-play. Add markers manually if [segment detection](/docs-and-guides/distribution/player/developer-guides/segment-detection) is not working on your site—use stable UUIDs and keep them consistent across page updates.
  </Accordion>

  <Accordion title="Can I use data attributes with content filters?">
    Yes—they solve different problems. [Content filters](/docs-and-guides/integrations/content-extraction#filters) remove whole HTML elements before extraction. Data attributes configure metadata and per-segment behavior on the elements that remain. Use both together for best results.
  </Accordion>

  <Accordion title="Where do I find voice IDs?">
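    Go to **Content → Preferences → Voices** in your project dashboard. Each voice has a numeric ID you can copy into the segment-level `data-beyondwords-voice-id` attribute or the global `data-beyondwords-title-voice-id`, `data-beyondwords-body-voice-id`, and `data-beyondwords-summary-voice-id` attributes.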
  </Accordion>
</AccordionGroup>

## Getting help

If you encounter issues or have questions, [contact support](/docs-and-guides/support/get-support). Include a sample of your HTML and which attributes you have configured.
