Data attributes let you embed BeyondWords configuration directly in your HTML. Use them to set content metadata (title, author, publish date), override voices and languages for specific paragraphs, mark images for video, add pauses, and improve segment detection. They complement content filters—filters remove whole HTML elements; data attributes configure how remaining content is interpreted and synthesized.
If you send content via the API, you can set many metadata fields (title, author, publish_date, etc.) directly on the request instead of using global data attributes. Global attributes are most useful when BeyondWords fetches a live page (Magic Embed, RSS Feed Importer page extraction).

How it works

BeyondWords reads data-beyondwords-* attributes from your HTML at different stages of processing. Each attribute belongs to one of three scopes—global, segment, or document—which determines what it affects and where you should place it. Attributes must be on valid HTML elements. Plaintext content without HTML tags cannot carry data attributes.

Attribute scopes

Think of the three scopes as three layers:
| | Global | Segment | Document |
|---|---|---|---|
| What it affects | The content item as a whole: metadata fields and defaults that apply across the article | Individual segments: paragraphs, headings, and images as they are split from your HTML | How the entire HTML document is processed before segmentation |
| When it is read | During metadata extraction (when BeyondWords fetches a live page) | During HTML → segment splitting (auto_segment) | Before content filters run on the HTML |
| How many values | One per content item. BeyondWords uses the first matching element in the document for each global attribute | Many. Each element can have its own value; voice and language inherit from ancestor elements | One flag on the root <html> element |
| Typical placement | <body>, <article>, or a page wrapper | <p>, <h1>, <div>, <img>, inline <span> / <time> | <html> only |
| API alternative | Yes. Set title, author, publish_date, and similar fields on the API request instead | No direct API field. Configure in HTML, or use the Editor for manual_segment content | No dashboard equivalent. Must be in the HTML |
Global attributes answer: what is this article? They map to content-item fields: title, author, publish date, whether the player should load, default voices for title/body/summary sections, and the single feature image for videos and share pages.
Segment attributes answer: how should this part of the article be synthesized or played back? They attach to specific HTML elements and flow into individual audio/video segments: a different voice for one paragraph, a pause mid-sentence, a marker for click-to-play, or an in-article image for video.
Document attributes answer: how should BeyondWords process this HTML file? The only document-scoped attribute today skips dashboard content filters for that HTML, which is useful when you need the raw markup to pass through unchanged.

Global vs segment: a concrete example

Both scopes can set voices, but they work at different levels:
<article
  data-beyondwords-body-voice-id="100"
  data-beyondwords-title-voice-id="200"
>
  <h1>Article title</h1>
  <!-- title segments use voice 200 (global default for title sections) -->

  <p>First paragraph uses voice 100 (global body default).</p>

  <p data-beyondwords-voice-id="300">
    This paragraph uses voice 300 (segment override).
  </p>
</article>
Similarly, data-beyondwords-feature-image (global) picks one hero image for the content item, while data-beyondwords-image (segment) marks individual images inside the article for video segments.

When to use data attributes

| Integration | Global metadata attributes | Segment attributes |
|---|---|---|
| Magic Embed (live page fetch) | Yes, extracted from the page | Yes, from editorial HTML on the page |
| RSS Feed Importer (page fetch enabled) | Yes, extracted from fetched article HTML | Yes |
| API / WordPress / Ghost (body HTML, auto_segment) | Prefer API/plugin fields for metadata; attributes in HTML still work for segments | Yes, in submitted HTML |
| Dashboard Editor (manual_segment) | No. Edit metadata and segments in the Editor | N/A. Set voices and pauses in the Editor instead |
Regenerate or re-publish content after adding or changing attributes in your HTML template.

Attribute reference

Each attribute belongs to a scope—global, segment, or document.
| Attribute | Scope | Purpose |
|---|---|---|
| data-beyondwords-title | Global | Content title |
| data-beyondwords-author | Global | Author name |
| data-beyondwords-publish-date | Global | Publish date (ISO 8601) |
| data-beyondwords-published | Global | Whether content is publicly available |
| data-beyondwords-ads-enabled | Global | Whether ads are enabled |
| data-beyondwords-title-voice-id | Global | Voice for title sections |
| data-beyondwords-body-voice-id | Global | Voice for body sections |
| data-beyondwords-summary-voice-id | Global | Voice for summary/script sections |
| data-beyondwords-article-language | Global | Default language for synthesis |
| data-beyondwords-feature-image | Global | Content-level feature image (true on an <img>) |
| data-beyondwords-voice-id | Segment | Voice override for an element and its descendants |
| data-beyondwords-language | Segment | Language override for an element and its descendants |
| data-beyondwords-marker | Segment | Stable ID for segment detection |
| data-beyondwords-pause | Segment | Pause duration in seconds (max 3) |
| data-beyondwords-image | Segment | Mark an image for video generation |
| data-beyondwords-skip-split-clean-filters | Document | Skip content filters for this HTML |

Global metadata attributes

Global attributes set content-item-level metadata and defaults. See Attribute scopes for how they differ from segment and document attributes. Add them to any element in your HTML—commonly on <body>, <article>, or a wrapper <div>. BeyondWords uses the first matching element in the document for each attribute. These are extracted automatically when BeyondWords fetches a live URL (Magic Embed, RSS page extraction). When sending HTML via the API, prefer the API’s title, author, and other metadata fields; use global attributes when page fetch is your ingestion path or you need to override extracted values.
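Because only the first matching element counts for each global attribute, it can help to check a template for accidental duplicates before publishing. The following is a minimal sketch of such a check using Python's standard html.parser. It only approximates "first match in document order" and is not BeyondWords' own parser; the names FirstMatchCollector and first_global_values are hypothetical, and the attribute list is a subset chosen for illustration.

```python
from html.parser import HTMLParser

# Subset of the global attributes, chosen for illustration only.
GLOBAL_ATTRS = (
    "data-beyondwords-title",
    "data-beyondwords-author",
    "data-beyondwords-publish-date",
)


class FirstMatchCollector(HTMLParser):
    """Records the first value seen for each global attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Later occurrences of the same attribute are ignored.
            if name in GLOBAL_ATTRS and name not in self.found:
                self.found[name] = value


def first_global_values(html):
    collector = FirstMatchCollector()
    collector.feed(html)
    collector.close()
    return collector.found


page = (
    '<body data-beyondwords-author="Jane Doe">'
    '<article data-beyondwords-title="First" data-beyondwords-author="Other"></article>'
    '<div data-beyondwords-title="Second"></div>'
    "</body>"
)
print(first_global_values(page))
```

In this sample, the author on <body> and the title on <article> are reported; the later duplicates are skipped.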

Title

<article data-beyondwords-title="My article title">
  ...
</article>

Author

<article data-beyondwords-author="Jane Doe">
  ...
</article>

Publish date

ISO 8601 datetime. If the date is in the future, the player will not load until that time. Include a timezone suffix (Z or +01:00); if omitted, UTC is assumed.
<article data-beyondwords-publish-date="2023-01-01T12:00:00Z">
  ...
</article>
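If your template builds this value from a server-side timestamp, the sketch below shows one way to always emit an explicit timezone suffix. The helper name publish_date_attr is hypothetical, and treating naive datetimes as UTC is an assumption made here to mirror the documented default.

```python
from datetime import datetime, timedelta, timezone


def publish_date_attr(dt: datetime) -> str:
    """Format a datetime as ISO 8601 with an explicit offset (hypothetical helper)."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    text = dt.isoformat(timespec="seconds")
    if text.endswith("+00:00"):
        # Use the shorter "Z" suffix for UTC.
        return text[:-6] + "Z"
    return text


print(publish_date_attr(datetime(2023, 1, 1, 12, 0, 0)))
# 2023-01-01T12:00:00Z
print(publish_date_attr(datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=1)))))
# 2023-01-01T12:00:00+01:00
```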

Published

Boolean ("true" or "false"). If false, the player will not load regardless of publish date. Content is still generated and visible in the dashboard.
<article data-beyondwords-published="false">
  ...
</article>

Ads enabled

Boolean ("true" or "false").
<article data-beyondwords-ads-enabled="false">
  ...
</article>

Title, body, and summary voice

Set default voices by section using voice IDs from Content → Preferences → Voices in your project dashboard. See voices.
<article
  data-beyondwords-title-voice-id="784"
  data-beyondwords-body-voice-id="2194"
  data-beyondwords-summary-voice-id="2194"
>
  ...
</article>
If not specified, project default voices are used.

Article language

Default synthesis language as a locale code (e.g. en_GB, en_US). If not specified, the project default language is used.
<article data-beyondwords-article-language="en_GB">
  ...
</article>

Feature image

Marks the content-level feature image—used in videos and on shareable play pages. Set data-beyondwords-feature-image="true" on the chosen <img>. BeyondWords uses the first matching image’s src (resolved to an absolute URL when possible).
<img
  data-beyondwords-feature-image="true"
  src="https://example.com/hero.jpeg"
  alt="Article hero image"
/>
This is different from data-beyondwords-image (see below), which marks images within the article body for video segments.

Segment attributes

Segment attributes control per-segment behavior—how individual paragraphs, headings, and images are synthesized and identified in the player. See Attribute scopes for how they differ from global and document attributes. Set them on specific HTML elements. Nested elements inherit the nearest ancestor’s value for voice and language.

Voice override

Override the voice for a section using a voice ID. Child elements inherit unless they set their own override.
<p data-beyondwords-voice-id="784">
  This paragraph uses voice 784.
</p>

<div data-beyondwords-voice-id="2194">
  <p>This paragraph uses voice 2194.</p>
  <p>So does this one.</p>
</div>
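To preview how inheritance plays out on a nested template, the sketch below walks an HTML fragment and reports the nearest data-beyondwords-voice-id for each piece of text. It is an illustration of the "nearest ancestor wins" rule described above, not BeyondWords' implementation; VoiceResolver and resolve_voices are hypothetical names, and a result of None simply means no segment override applies.

```python
from html.parser import HTMLParser

VOICE_ATTR = "data-beyondwords-voice-id"
# Void elements never contain text, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VoiceResolver(HTMLParser):
    """Tracks the effective voice override for each open element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, effective voice) for each open element
        self.resolved = []  # (text, effective voice) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self.stack[-1][1] if self.stack else None
        own = dict(attrs).get(VOICE_ATTR)
        # An element's own value wins; otherwise it inherits from its parent.
        self.stack.append((tag, own or inherited))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything nested inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            voice = self.stack[-1][1] if self.stack else None
            self.resolved.append((text, voice))


def resolve_voices(html):
    parser = VoiceResolver()
    parser.feed(html)
    parser.close()
    return parser.resolved


sample = (
    '<div data-beyondwords-voice-id="2194">'
    "<p>One</p>"
    '<p data-beyondwords-voice-id="784">Two</p>'
    "</div>"
    "<p>Three</p>"
)
print(resolve_voices(sample))
# [('One', '2194'), ('Two', '784'), ('Three', None)]
```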

Language override

Override the language for a section using a locale code.
<p data-beyondwords-language="en_GB">
  This paragraph is synthesized in British English.
</p>

<p data-beyondwords-language="fr_FR">
  Ce paragraphe est synthétisé en français.
</p>

Segment markers

Markers identify segments on your page for player features such as paragraph highlighting and click-to-play. BeyondWords extracts markers from your HTML during processing; you can also add them manually. Use stable, unique values—we recommend UUIDs. See segment detection for full guidance.
<h1 data-beyondwords-marker="1af51b2a-72df-4b86-bb7c-87d057231ca0">
  Article title
</h1>

<p data-beyondwords-marker="5d2c6eba-f612-45c7-b987-00fde473d867">
  First paragraph.
</p>
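One possible way to keep marker values stable across page updates is to derive them deterministically from identifiers your CMS already stores, rather than generating random UUIDs on every render. The sketch below uses name-based UUIDs (version 5) from Python's uuid module. The namespace URL, the marker_for helper, and the article and block IDs are all hypothetical; this is an approach that fits the guidance above, not one prescribed by BeyondWords.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works as long as it never changes.
MARKER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/articles")


def marker_for(article_id: str, block_id: str) -> str:
    """Return the same UUID every time for the same article and block IDs."""
    return str(uuid.uuid5(MARKER_NAMESPACE, f"{article_id}/{block_id}"))


# The same inputs always produce the same marker, so re-rendering the page keeps it stable.
title_marker = marker_for("article-42", "title")
print(f'<h1 data-beyondwords-marker="{title_marker}">Article title</h1>')
```

Deriving the marker from a stored block ID rather than from the paragraph text means that editing the wording does not change the marker.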

Pauses

Insert a verbal pause at a specific point in a paragraph. Value is a number in seconds (maximum 3), with up to one decimal place. An optional s suffix is accepted (1.0, 1.2s).
<p>
  The policy is designed to reduce emissions.
  <span data-beyondwords-pause="1.0"></span>
  In practice, it may do the opposite.
</p>
You can also use a <time> element instead of <span>.
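If pause values come from editors or a CMS field, a small check before rendering can catch values outside the documented format. The sketch below encodes the rules stated above (a number of seconds, at most one decimal place, optional s suffix, maximum 3). The function name is_valid_pause is hypothetical, and how BeyondWords treats invalid or zero values is not covered here.

```python
import re

# Digits, an optional single decimal place, and an optional "s" suffix.
_PAUSE_RE = re.compile(r"\d+(?:\.\d)?s?")


def is_valid_pause(value: str, max_seconds: float = 3.0) -> bool:
    """Check a data-beyondwords-pause value against the documented format."""
    candidate = value.strip()
    if _PAUSE_RE.fullmatch(candidate) is None:
        return False
    seconds = float(candidate.rstrip("s"))
    return seconds <= max_seconds


for value in ("1.0", "1.2s", "3", "3.1", "1.25", "abc"):
    print(value, is_valid_pause(value))
```

The first three values pass; "3.1" exceeds the maximum, "1.25" has two decimal places, and "abc" is not a number.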

Image markers (video)

Mark images within the article body for video generation. Unlike data-beyondwords-feature-image, this applies per image segment in the article—not the content-level hero image.
<img src="https://example.com/chart.png" data-beyondwords-image="true" alt="Sales chart" />
Set data-beyondwords-image="false" to exclude an image that would otherwise be picked up automatically.

Advanced

Document-scoped attributes affect processing of the whole HTML file, not metadata or individual segments. See Attribute scopes.

Skip content filters

Set on the root <html> element to bypass content filters for that HTML document. BeyondWords still removes script, style, and HTML comments.
<html data-beyondwords-skip-split-clean-filters="true">
  ...
</html>
Use sparingly, and only when you need the raw HTML to pass through unchanged by dashboard filters (for example, highly controlled CMS output).
You can also target elements with data-* attributes using a Data content filter. For example, an exclude filter can match elements that have a data-exclude attribute.

Example: Magic Embed page

<body
  data-beyondwords-author="Jane Doe"
  data-beyondwords-publish-date="2025-06-01T09:00:00Z"
  data-beyondwords-body-voice-id="2194"
  data-beyondwords-article-language="en_GB"
>
  <article>
    <h1 data-beyondwords-marker="uuid-for-title">My article</h1>

    <img
      data-beyondwords-feature-image="true"
      src="https://example.com/hero.jpg"
      alt="Hero"
    />

    <p data-beyondwords-marker="uuid-for-p1">
      Opening paragraph text.
    </p>

    <p data-beyondwords-marker="uuid-for-p2">
      Second paragraph with a pause.
      <span data-beyondwords-pause="0.5"></span>
      And more text after the pause.
    </p>

    <aside data-exclude="true">
      Newsletter sign-up — excluded via content filter, not read aloud.
    </aside>
  </article>
</body>
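When a page like the one above is rendered from CMS data, the attribute values need HTML escaping so that quotes in a title or author name do not break the markup. The following is a minimal sketch of a rendering helper. The function name beyondwords_attrs and the convention of mapping snake_case keys to attribute names are assumptions made for illustration, not part of any BeyondWords library.

```python
from html import escape


def beyondwords_attrs(metadata: dict) -> str:
    """Render a dict as data-beyondwords-* attributes (hypothetical helper)."""
    parts = []
    for key, value in metadata.items():
        if value is None:
            # Skip unset fields so project defaults or extraction can apply.
            continue
        if isinstance(value, bool):
            # Boolean attributes are written as the strings "true" or "false".
            value = "true" if value else "false"
        name = "data-beyondwords-" + key.replace("_", "-")
        parts.append(f'{name}="{escape(str(value), quote=True)}"')
    return " ".join(parts)


attrs = beyondwords_attrs({
    "author": 'Jane "JD" Doe',
    "published": False,
    "body_voice_id": 2194,
    "title": None,
})
print(f"<body {attrs}>")
```

This prints a <body> tag with the author's quotes escaped as &quot;, published set to "false", the body voice ID, and no title attribute.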

FAQs

Global attributes apply to the whole content item—one title, one author, one feature image, default voices per section type. Segment attributes apply to individual parts of the HTML as they are split into audio/video segments—a voice override on one paragraph, a pause mid-sentence, a marker for click-to-play. Global attributes use the first matching element in the document; segment attributes inherit from ancestor elements. See Attribute scopes.
If you control a backend integration, set title, author, publish_date, and similar fields on the API request directly—these are global metadata fields. Use global data attributes when BeyondWords fetches your page (Magic Embed, RSS page extraction) and you need to override what automatic extraction would infer. Segment attributes (voice overrides, pauses, markers) have no API equivalent—they belong in your HTML.
data-beyondwords-feature-image="true" marks the single content-level hero image (videos, share pages). data-beyondwords-image="true" marks individual in-article images as segments for video generation. A page can have one feature image and multiple image markers.
You usually do not need to add markers manually. BeyondWords extracts markers from your HTML during processing, and the player uses them for highlighting and click-to-play. Add markers manually only if segment detection is not working on your site; use stable UUIDs and keep them consistent across page updates.
You can use content filters and data attributes together because they solve different problems. Content filters remove whole HTML elements before extraction. Data attributes configure metadata and per-segment behavior on the elements that remain. Use both together for best results.
To find voice IDs, go to Content → Preferences → Voices in your project dashboard. Each voice has a numeric ID you can copy into data-beyondwords-voice-id and data-beyondwords-*-voice-id attributes.

Getting help

If you encounter issues or have questions, contact support. Include a sample of your HTML and which attributes you have configured.