feat: optimize docs pages and add 4 new doc articles (en + zh)
- Rewrote DocsListPage and DocDetailPage with landing.css aesthetic (icon cards, skeleton loader, prose styles, CTA box)
- Added docs-specific CSS to landing.css
- Created image-to-latex, copy-to-word, ocr-accuracy, pdf-extraction articles in both English and Chinese
- Updated DocsSeoSection guide cards to link to real doc slugs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
75
content/docs/en/pdf-extraction.md
Normal file
@@ -0,0 +1,75 @@
---
title: PDF Extraction
description: Extract and convert formulas from PDF documents automatically with TexPixel
slug: pdf-extraction
date: 2026-03-25
tags: [PDF, extraction]
order: 6
---

# PDF Extraction

TexPixel can process an entire PDF and extract the formulas from every page automatically. This is useful for textbooks, research papers, and other multi-page documents with mathematical content.

## How to Extract from a PDF

1. Click the upload zone or drag and drop your PDF file.
2. TexPixel detects all pages and identifies formula regions.
3. Each recognized formula is listed in the result panel.
4. Copy individual formulas or export the entire document as DOCX.

## What Gets Extracted

TexPixel identifies formulas in PDFs regardless of whether they were:
- Typeset in LaTeX (rendered as vector math)
- Embedded as images (scanned pages)
- A mix of both

For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95%+. For scanned/image PDFs, accuracy follows the same image quality guidelines as regular image uploads.

## Supported PDF Types

| Type | Description | Accuracy |
|---|---|---|
| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
| Scanned PDF (low quality) | Below 150 DPI or poor contrast | 60–80% |
| Photo PDF | Photographed pages embedded as images | 75–90% |

## File Limits

- **Max file size:** 20 MB
- **Max pages:** 50 pages per upload (Pro plan: unlimited)
- **Processing time:** ~2–5 seconds per page

For documents exceeding these limits, split the PDF into smaller chunks before uploading.
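
As a rough planning aid, the sketch below works out which page ranges to split a long document into so that each chunk stays within the 50-page limit, and gives a ballpark processing time per chunk from the ~2–5 seconds per page figure. The function names `plan_chunks` and `estimate_seconds` are illustrative only and not part of TexPixel. The time estimate assumes pages are processed one after another, which this page does not state. The 20 MB size limit is not checked here.

```python
MAX_PAGES_PER_UPLOAD = 50   # page limit quoted in the list above (non-Pro uploads)
SECONDS_PER_PAGE = (2, 5)   # approximate per-page processing time quoted above


def plan_chunks(total_pages, max_pages=MAX_PAGES_PER_UPLOAD):
    """Return 1-based, inclusive (first_page, last_page) ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def estimate_seconds(page_count, per_page=SECONDS_PER_PAGE):
    """Rough (low, high) time in seconds, assuming sequential page processing."""
    low, high = per_page
    return page_count * low, page_count * high


if __name__ == "__main__":
    # Example: a 120-page textbook split into upload-sized chunks.
    for first, last in plan_chunks(120):
        pages = last - first + 1
        low, high = estimate_seconds(pages)
        print(f"pages {first}-{last}: {pages} pages, roughly {low}-{high} s")
```

The resulting ranges can be fed to whatever PDF splitting tool you already use; the sketch only decides where the cuts go.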

## Exporting PDF Results

After extraction, you can export in several ways:

- **Copy individual formula**: click any recognized formula to copy its LaTeX
- **DOCX export**: download the full document with formulas as native Word equations
- **Batch copy**: copy all formulas as a list (Pro feature)

## Tips for Better PDF Results

- **Use the original PDF**, not a re-scanned copy: vector PDFs give the best results
- **Avoid password-protected PDFs**: these cannot be processed
- **Crop pages** if a PDF has wide margins with no content: smaller pages process faster
- **Split by chapter** for very large documents to stay within page limits
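
Some of these checks can be run locally before uploading. The sketch below is a minimal, standard-library-only pre-flight check, not anything TexPixel provides: `preflight` is a hypothetical helper name, the 20 MB limit is assumed to mean 20 × 1024 × 1024 bytes, and the encryption test is only a heuristic that looks for the `/Encrypt` key that encrypted PDFs carry in their trailer. It can give false positives, so treat its output as a hint rather than a verdict.

```python
import os

MAX_FILE_BYTES = 20 * 1024 * 1024   # 20 MB limit; assumes 1 MB = 1024 * 1024 bytes


def preflight(path):
    """Return a list of human-readable warnings for a PDF about to be uploaded."""
    warnings = []

    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        warnings.append(
            f"file is {size / (1024 * 1024):.1f} MB, over the 20 MB limit"
        )

    with open(path, "rb") as f:
        data = f.read()

    # A well-formed PDF starts with a "%PDF-" version header.
    if not data.startswith(b"%PDF-"):
        warnings.append("file does not start with a PDF header")

    # Heuristic only: encrypted PDFs reference an /Encrypt dictionary.
    if b"/Encrypt" in data:
        warnings.append("file appears to be encrypted or password-protected")

    return warnings
```

An empty list means none of these particular checks fired; it does not guarantee that formulas will be recognized.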

## Common Issues

**"No formulas found"**
The PDF may be encrypted, have formulas stored as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.

**Formulas recognized but garbled**
This often happens with very low DPI scans. Try using a PDF scanner app to rescan at 300 DPI before uploading.
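
The PNG workaround suggested under "No formulas found" can be scripted. The sketch below is one possible approach, assuming the Poppler command-line tool `pdftoppm` is installed; TexPixel does not require or ship it. The helper names `pdftoppm_command` and `render_page` are made up for illustration. The flags used (`-png`, `-r`, `-f`, `-l`) are standard `pdftoppm` options for output format, resolution, and first/last page.

```python
import shutil
import subprocess


def pdftoppm_command(pdf_path, out_prefix, page, dpi=300):
    """Build a pdftoppm command that renders a single PDF page to PNG."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "pdftoppm",
        "-png",            # write PNG instead of the default PPM
        "-r", str(dpi),    # rendering resolution in DPI
        "-f", str(page),   # first page to render
        "-l", str(page),   # last page to render (same page: render just one)
        pdf_path,
        out_prefix,        # pdftoppm appends the page number and ".png"
    ]


def render_page(pdf_path, out_prefix, page, dpi=300):
    """Run pdftoppm if it is available on PATH; raise otherwise."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install Poppler first")
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, page, dpi), check=True)
```

For example, `render_page("paper.pdf", "page", 3)` would render page 3 of a hypothetical `paper.pdf` at 300 DPI, and the resulting PNG can then be uploaded like any other image.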

**Processing is slow**
Large PDFs with many pages can take 30–60 seconds. This is normal. The result will appear when processing is complete.

---

[Upload a PDF and extract formulas →](/app)