yoge 99e1314bf9 refact: eliminate blog/docs content overlap

- Delete blog/copy-math-to-word (EN+ZH) — identical to docs/copy-to-word
- Rewrite blog/pdf-formula-issues as narrative troubleshooting story;
  operational steps now link out to docs/pdf-extraction
- Add "Further reading" cross-links: 4 docs → relevant blog posts
- Add "See also" cross-links: 3 blog posts → relevant docs

Docs = product reference; Blog = narrative/use cases/opinions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-26 16:52:27 +08:00

PDF Extraction

TexPixel can process entire PDF documents and automatically extract the formulas from every page. This is useful for textbooks, research papers, or any multi-page document with mathematical content.

How to Extract from a PDF

1. Click the upload zone or drag and drop your PDF file.
2. TexPixel detects all pages and identifies formula regions.
3. Each recognized formula is listed in the result panel.
4. Copy individual formulas or export the entire document as DOCX.

What Gets Extracted

TexPixel identifies formulas in PDFs regardless of whether they were:

Typeset in LaTeX (rendered as vector math)
Embedded as images (scanned pages)
A mix of both

For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95–99%. For scanned or image-based PDFs, accuracy depends on image quality, following the same guidelines as regular image uploads.

Supported PDF Types

Type	Description	Accuracy
Vector PDF	Created from LaTeX, Word, or typesetting tools	95–99%
Scanned PDF (high quality)	300 DPI scan of printed text	90–97%
Scanned PDF (low quality)	< 150 DPI or poor contrast	60–80%
Photo PDF	Photographed pages embedded as images	75–90%
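To see which scanned-PDF row a document is likely to fall into, you can estimate its effective resolution from the pixel width of a page image and the physical width of the paper. The sketch below is only an illustration: the function names are hypothetical, and treating anything at or above 300 DPI as "high quality" is an assumption, since the table lists only a 300 DPI row and a below-150 DPI row.

```python
def estimate_dpi(pixel_width: int, page_width_in: float) -> float:
    """Effective resolution: pixels across the page divided by its width in inches."""
    if pixel_width <= 0 or page_width_in <= 0:
        raise ValueError("pixel_width and page_width_in must be positive")
    return pixel_width / page_width_in


def scan_quality(dpi: float) -> str:
    """Map a DPI value onto the scanned-PDF rows of the table above."""
    if dpi >= 300:  # assumption: 300 DPI or more counts as the high-quality row
        return "high quality (90-97% per the table)"
    if dpi < 150:
        return "low quality (60-80% per the table)"
    return "between the table's rows (no accuracy figure given)"


# A US Letter page is 8.5 inches wide; 2550 pixels across works out to 300 DPI.
print(scan_quality(estimate_dpi(2550, 8.5)))
```

Scans between 150 and 300 DPI are not covered by the table, so the sketch reports them separately rather than guessing an accuracy range.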

File Limits

Max file size: 20 MB
Max pages: 50 pages per upload (Pro plan: unlimited)
Processing time: ~2–5 seconds per page

For documents exceeding these limits, split the PDF into smaller chunks before uploading.
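Splitting can be scripted. The following is a minimal sketch, not part of TexPixel: `chunk_ranges` and `split_pdf` are hypothetical helper names, and `split_pdf` assumes the third-party `pypdf` package is installed. It splits by page count only, so each output file should still be checked against the 20 MB size limit.

```python
def chunk_ranges(total_pages: int, max_pages: int = 50) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages <= 0:
        raise ValueError("total_pages must be >= 0 and max_pages must be > 0")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def split_pdf(path: str, max_pages: int = 50) -> list[str]:
    """Write one PDF per chunk next to the input and return the new file names."""
    # Imported lazily so chunk_ranges works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    outputs = []
    for first, last in chunk_ranges(len(reader.pages), max_pages):
        writer = PdfWriter()
        for index in range(first - 1, last):  # pypdf pages are 0-based
            writer.add_page(reader.pages[index])
        out_name = f"{path.removesuffix('.pdf')}_p{first}-{last}.pdf"
        with open(out_name, "wb") as handle:
            writer.write(handle)
        outputs.append(out_name)
    return outputs
```

For a 120-page document, `chunk_ranges(120)` yields pages 1–50, 51–100, and 101–120, which become three separate uploads.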

Exporting PDF Results

After extraction, you can export in several ways:

Copy individual formula — click any recognized formula to copy its LaTeX
DOCX export — download the full document with formulas as native Word equations
Batch copy — copy all formulas as a list (Pro feature)

Tips for Better PDF Results

Use the original PDF, not a re-scanned copy — vector PDFs give the best results
Avoid password-protected PDFs — these cannot be processed
Crop pages if a PDF has wide margins with no content — smaller pages process faster
Split by chapter for very large documents to stay within page limits
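Some of these problems can be caught before uploading. The sketch below is a rough local pre-check using only the Python standard library. The `preflight` function is a hypothetical helper, reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption, and looking for the `/Encrypt` key is a heuristic that may occasionally misfire.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB


def preflight(path: str) -> list[str]:
    """Return a list of likely problems; an empty list means none were spotted."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 20 MB")
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with a PDF header")
    # Heuristic: encrypted PDFs carry an /Encrypt entry in their trailer.
    if b"/Encrypt" in data:
        problems.append("file appears to be encrypted")
    return problems
```

An empty result does not guarantee the upload will succeed; it only rules out the size and encryption problems mentioned above.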

Common Issues

"No formulas found": The PDF may be encrypted, have formulas stored as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.

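One way to convert a single page to PNG is Poppler's `pdftoppm` command-line tool, driven here from Python. This is a sketch under assumptions: it requires `pdftoppm` to be installed and on the PATH, the helper names are hypothetical, and 300 DPI is chosen to match the high-quality scan row in the table, not because TexPixel requires it.

```python
import subprocess


def pdftoppm_command(pdf_path: str, page: int, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm call that renders one page of pdf_path as PNG."""
    return [
        "pdftoppm",
        "-png",            # write PNG output
        "-r", str(dpi),    # rendering resolution in DPI
        "-f", str(page),   # first page to render
        "-l", str(page),   # last page to render (same page, so just one image)
        pdf_path,
        out_prefix,        # output file name is derived from this prefix
    ]


def render_page(pdf_path: str, page: int, out_prefix: str, dpi: int = 300) -> None:
    """Run pdftoppm; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(pdftoppm_command(pdf_path, page, out_prefix, dpi), check=True)
```

The resulting PNG can then be uploaded like any other image.
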
Formulas recognized but garbled: This often happens with very low-DPI scans. Rescan the document at 300 DPI, for example with a scanner app, before uploading.

Processing is slow: Large PDFs take a while. At roughly 2–5 seconds per page, expect 30–60 seconds for a dozen or so pages and a few minutes for a full 50-page upload. This is normal. The result will appear when processing is complete.
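For a rough sense of how long a given upload might take, the ~2–5 seconds per page figure from File Limits can be turned into a simple range. This sketch assumes processing time scales linearly with page count, which the documentation does not state explicitly, and `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(pages: int, low: float = 2.0, high: float = 5.0) -> tuple[float, float]:
    """Return a (best case, worst case) processing time in seconds."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low, pages * high


best, worst = estimate_seconds(12)
print(f"12 pages: roughly {best:.0f}-{worst:.0f} seconds")  # prints "12 pages: roughly 24-60 seconds"
```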

Further reading: I tried to extract formulas from my professor's PDF — real-world troubleshooting →

Upload a PDF and extract formulas →
