Overview
Content extraction is the process BeyondWords uses to pull article text into our platform. This is required for features like Magic Embed, extraction-enabled RSS feeds, and the URL importer.
Extraction mode
Automatic
In this mode, article content is automatically identified and structured using AI. The model recognises key elements on your webpage and outputs a clean, well-formatted article for audio generation. Any content filters you’ve configured are also applied during this process.
Recommended for all new projects setting up content extraction.
Manual
In this mode, article content is extracted using only the content filters you configure. This gives you full control over exactly which parts of the article are ingested.
Legacy
Content is extracted using a combination of content filters and rule-based heuristics. Unlike Automatic extraction, it uses predefined conditions to locate content. This approach works well if the structure of your site is consistent, but it is less flexible than Automatic mode.
This mode is recommended only for customers with existing projects that are already set up and working with this method.
Request headers
For paywalled or protected content, you may need to provide authentication headers to grant our servers access to your content.
- Add a Header Name and Header Value.
- Click + to add multiple headers if needed.
- Ensure the headers grant full access to your content.
Requests will be made with User-Agent: BeyondWords Importer
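On your own server, the check that these headers enable might look something like the sketch below. It is only an illustration: the header name `X-Importer-Token`, its value, and the function name are hypothetical placeholders for whatever you enter as Header Name and Header Value, and the code assumes the User-Agent string contains the text shown above.

```python
import hmac

# User-Agent string stated in this article.
IMPORTER_USER_AGENT = "BeyondWords Importer"

# Hypothetical values: replace with the Header Name and Header Value
# you configured in the Request headers settings.
AUTH_HEADER_NAME = "X-Importer-Token"
AUTH_HEADER_VALUE = "replace-with-your-secret"


def is_importer_request(headers: dict[str, str]) -> bool:
    """Return True if the request has the importer User-Agent and the shared secret."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "")
    token = normalised.get(AUTH_HEADER_NAME.lower(), "")
    # A User-Agent is easy to spoof, so the secret header is the real check.
    # compare_digest avoids leaking information through comparison timing.
    return IMPORTER_USER_AGENT in user_agent and hmac.compare_digest(
        token.encode(), AUTH_HEADER_VALUE.encode()
    )
```

A request that passes this check would then be served the full article rather than the paywalled preview.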
Static IP
If your website requires IP allowlisting, you may need to enable this option to grant our servers access to your content.
- Enable Static IP.
- Ensure your server allows full access to your content.
Requests will be sent from 20.234.8.180 or 176.34.249.78
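If your allowlist is enforced in application code rather than at a firewall, a minimal sketch of the check could look like this. The two addresses are the ones listed above; the function and constant names are illustrative assumptions, not part of any BeyondWords API.

```python
import ipaddress

# Static IP addresses listed in this article.
BEYONDWORDS_IPS = frozenset(
    ipaddress.ip_address(ip) for ip in ("20.234.8.180", "176.34.249.78")
)


def is_beyondwords_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed static IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in BEYONDWORDS_IPS
    except ValueError:
        # Not a valid IP address string, so it cannot be on the allowlist.
        return False
```

If your site sits behind a proxy or CDN, the address your application sees may be the proxy's, so the check would need to use whichever client address your infrastructure forwards and trusts.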