Source: Highlights
Format: JSON

Export PDF Highlights to JSON

The raw data of your highlights. Build your own apps, dashboards, or NLP pipelines with ease.

Securely extract highlights and comments in seconds.

Example Output

preview.json

[
  {
    "content": "Data privacy regulations have evolved...",
    "page": 12,
    "author": "Reviewer A"
  }
]

For Developers & Data Scientists

Parsing PDF objects is notoriously difficult. We abstract away the complexity of the PDF specification and give you clean, valid JSON.

Rich Metadata Extraction

We don't just give you the text. Our JSON schema includes details that are impossible to get with simple copy-pasting:

Geometry: Precise [x, y, w, h] bounding-box coordinates (`rect`) for every highlight, useful for rendering overlays in web apps.
Color: Exact RGB or hex values, so you can filter notes programmatically.
Sequence: Highlights are ordered by appearance, preserving the narrative flow.

Use this data to feed Natural Language Processing (NLP) models, build custom PDF viewers, or create automated archival scripts in Python or Node.js.
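As a rough illustration of filtering by color, here is a minimal Python sketch. Only `content`, `page`, `author`, and `rect` appear on this page; the `color` key name, its hex string format, and the sample values below are assumptions made for the example, so check the documented schema before relying on them.

```python
import json

# Hypothetical export. "rect" is named on this page; the "color" key and its
# hex format are assumptions for this sketch, as are all sample values.
raw = """
[
  {"content": "Data privacy regulations have evolved...", "page": 12,
   "author": "Reviewer A", "color": "#FFFF00", "rect": [72.0, 540.0, 210.5, 14.0]},
  {"content": "Consent must be explicit.", "page": 13,
   "author": "Reviewer A", "color": "#FF0000", "rect": [72.0, 300.0, 180.0, 14.0]}
]
"""

highlights = json.loads(raw)


def by_color(items, hex_color):
    """Return highlights whose color matches hex_color, keeping export order."""
    wanted = hex_color.lower()
    return [h for h in items if h.get("color", "").lower() == wanted]


yellow = by_color(highlights, "#ffff00")
for h in yellow:
    print(f'p.{h["page"]}: {h["content"]}')
```

Because the list comprehension keeps the export order, the filtered notes stay in the sequence in which they appear in the document.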

Related Tools

Need more formats?

We support Notion, Obsidian, JSON, CSV, and Text exports.

Frequently Asked Questions

Do you provide coordinates?

Yes. We include the `rect` array for every annotation, ideal for mapping each highlight back to its position on the PDF page.
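One way to use `rect` for an overlay is sketched below in Python. It assumes, without confirmation from this page, that the values are PDF points with `y` measured from the bottom edge of the page, as in default PDF user space. If the export already uses a top-left origin, the vertical flip is unnecessary. The function name `rect_to_overlay` and its parameters are illustrative.

```python
def rect_to_overlay(rect, page_height, scale=1.0):
    """Convert [x, y, w, h] in PDF points to a top-left-origin box.

    Assumption: y is the bottom edge of the highlight, measured upward
    from the bottom of the page. scale converts points to viewer pixels.
    """
    x, y, w, h = rect
    return {
        "left": x * scale,
        "top": (page_height - y - h) * scale,  # flip the vertical axis
        "width": w * scale,
        "height": h * scale,
    }


# A US Letter page is 792 points tall; render at 1.5 pixels per point.
box = rect_to_overlay([72.0, 540.0, 210.5, 14.0], page_height=792.0, scale=1.5)
print(box)
```

The resulting dictionary maps directly onto absolutely positioned overlay elements in a viewer that draws pages from the top-left corner.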

Is the schema stable?

Yes. The output follows a strictly typed schema, which is described in our documentation.

Can I use it in Python?

Absolutely. It's standard JSON. Load it with `json.load()` and start analyzing with pandas immediately.
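A minimal sketch of that workflow, using only the standard library so it runs anywhere. The `io.StringIO` object stands in for an opened export file, and the second record is invented sample data in the same shape as the preview above.

```python
import io
import json
from collections import Counter

# io.StringIO stands in for open("preview.json") so the sketch is
# self-contained; the records are illustrative sample data.
export = io.StringIO("""
[
  {"content": "Data privacy regulations have evolved...", "page": 12, "author": "Reviewer A"},
  {"content": "Enforcement varies by region.", "page": 12, "author": "Reviewer B"}
]
""")

highlights = json.load(export)  # a plain list of dicts

# Count highlights per page and per author.
per_page = Counter(h["page"] for h in highlights)
per_author = Counter(h["author"] for h in highlights)

print(per_page.most_common())
print(per_author.most_common())
```

With pandas installed, passing the same list to `pandas.DataFrame(highlights)` gives a table with one row per highlight.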