Content Extraction

The extension uses content scripts to extract page content before sending it to the LoomBrain API.

  1. Type detection — URL patterns determine the content type (tweet, video, repo, article)
  2. DOM extraction — A content script runs on the page to extract relevant HTML
  3. Sanitization — HTML is sanitized in an offscreen document to remove scripts and tracking
  4. Size check — If the sanitized content exceeds 5 MB, the extension falls back to URL-only capture
  5. API submission — Content is posted to the captures API with metadata
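
The type-detection step can be sketched as a URL classifier. The patterns below are illustrative guesses based on the four types named above, not the extension's actual rules:

```typescript
// Hypothetical sketch of step 1: mapping URLs to the four content types.
// The real URL patterns used by the extension are not documented here.
type ContentType = "tweet" | "video" | "repo" | "article";

function detectType(url: string): ContentType {
  const { hostname, pathname } = new URL(url);
  if (/(^|\.)(twitter|x)\.com$/.test(hostname) && /\/status\/\d+/.test(pathname)) {
    return "tweet";
  }
  if (/(^|\.)youtube\.com$/.test(hostname) || hostname === "youtu.be") {
    return "video";
  }
  // github.com/<owner>/<repo>: exactly two path segments
  if (hostname === "github.com" && pathname.split("/").filter(Boolean).length === 2) {
    return "repo";
  }
  return "article"; // everything else falls through to full-page capture
}
```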

Tweets

The content script targets the [data-testid="tweet"] element on Twitter/X. If the element isn’t found, the extension falls back to URL-only capture.
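
A minimal sketch of the tweet branch. Only the selector and the URL-only fallback come from the text above; the `Capture` shape and function name are illustrative, and the document parameter is kept structural so the sketch runs outside a browser:

```typescript
// Illustrative capture shape — not the extension's real API.
interface Capture {
  url: string;
  html?: string; // omitted => URL-only capture
}

// Accepts anything with querySelector so the sketch is testable without a DOM.
function extractTweet(
  url: string,
  doc: { querySelector(sel: string): { outerHTML: string } | null },
): Capture {
  const tweet = doc.querySelector('[data-testid="tweet"]');
  // Fall back to URL-only capture when the tweet element is missing.
  return tweet ? { url, html: tweet.outerHTML } : { url };
}
```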

Videos

No HTML is extracted. The extension sends only the URL to the server, which fetches the transcript and video metadata directly from YouTube.

Repositories

The content script extracts the README element ([data-testid="readme"] or #readme article) along with the repository description and topics metadata, and falls back to URL-only capture if no README is found.
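
The two README selectors suggest a fallback chain: try each in order, and signal URL-only capture if neither matches. The function name is hypothetical; the selectors are the ones documented above:

```typescript
// Sketch of the repo branch: try the documented selectors in order.
const README_SELECTORS = ['[data-testid="readme"]', "#readme article"];

function extractReadme(
  doc: { querySelector(sel: string): { outerHTML: string } | null },
): string | null {
  for (const selector of README_SELECTORS) {
    const el = doc.querySelector(selector);
    if (el) return el.outerHTML;
  }
  return null; // caller sends a URL-only capture
}
```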

Articles

The content script captures the full document.documentElement.outerHTML. This is the fallback for any URL that doesn’t match the tweet, video, or repo patterns.

All extracted HTML passes through sanitization in a secure offscreen document context. This removes:

  • Script tags and event handlers
  • Common tracking and analytics elements
  • Other potentially dangerous content

The sanitized HTML is what gets sent to the API. The original page is never modified.
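
The categories above can be illustrated with a toy string-based filter. This is illustrative only: regexes are not a safe way to sanitize HTML in production, and the extension presumably uses a real sanitizer inside its offscreen document — this sketch just shows the kinds of content that get stripped:

```typescript
// Toy illustration of the sanitization categories — NOT production-safe.
function stripDangerousContent(html: string): string {
  return html
    // Script tags and their contents
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")
    // Inline event handlers such as onclick="..."
    .replace(/\son\w+\s*=\s*(".*?"|'.*?'|\S+)/gi, "")
    // One common tracking pattern: 1x1 pixel images
    .replace(/<img\b[^>]*width="1"[^>]*height="1"[^>]*>/gi, "");
}
```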

Limit                Value
Max content size     5 MB
Extraction timeout   10 seconds

If either limit is exceeded, the extension sends only the URL. The server then fetches and processes the content independently.
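
The two limits can be sketched as a size check plus a deadline on extraction. The 5 MB and 10 s values come from the table above; the helper names and the null-means-URL-only convention are illustrative:

```typescript
const MAX_BYTES = 5 * 1024 * 1024;
const TIMEOUT_MS = 10_000;

function withinSizeLimit(html: string): boolean {
  // Measure encoded bytes, not string length: multi-byte characters count.
  return new TextEncoder().encode(html).byteLength <= MAX_BYTES;
}

// Race extraction against a deadline; resolve to null (=> URL-only capture)
// if the extractor times out or the result is oversized.
async function extractWithTimeout(
  extract: () => Promise<string>,
  timeoutMs = TIMEOUT_MS,
): Promise<string | null> {
  const timeout = new Promise<null>((resolve) =>
    setTimeout(() => resolve(null), timeoutMs),
  );
  const html = await Promise.race([extract(), timeout]);
  return html !== null && withinSizeLimit(html) ? html : null;
}
```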

The API scans capture fields (title, why, selected_text, URL) for prompt injection attempts. If detected, the capture is flagged for review but still processed.
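
A hypothetical sketch of the server-side scan. The field list comes from the text above; the patterns and the boolean flag are illustrative — the API's real heuristics are not documented here:

```typescript
// Capture fields named in the docs; the scan patterns are invented examples.
interface CaptureFields {
  title: string;
  why: string;
  selected_text: string;
  url: string;
}

const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /system prompt/i,
];

// Returns true when any field looks like an injection attempt. The capture
// is still processed either way; the flag only marks it for review.
function flagForReview(fields: CaptureFields): boolean {
  return Object.values(fields).some((value) =>
    INJECTION_PATTERNS.some((pattern) => pattern.test(value)),
  );
}
```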