feat: optimize docs pages and add 4 new doc articles (en + zh)

- Rewrote DocsListPage and DocDetailPage with landing.css aesthetic (icon cards, skeleton loader, prose styles, CTA box)
- Added docs-specific CSS to landing.css
- Created image-to-latex, copy-to-word, ocr-accuracy, pdf-extraction articles in both English and Chinese
- Updated DocsSeoSection guide cards to link to real doc slugs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 16:15:22 +08:00
parent dceb775a1b
commit 409bbf742e
14 changed files with 2855 additions and 67 deletions
--- a/content/docs/en/pdf-extraction.md
+++ b/content/docs/en/pdf-extraction.md
@@ -0,0 +1,75 @@
+---
+title: PDF Extraction
+description: Extract and convert formulas from PDF documents automatically with TexPixel
+slug: pdf-extraction
+date: 2026-03-25
+tags: [PDF, extraction]
+order: 6
+---
+
+# PDF Extraction
+
+TexPixel can process an entire PDF document and automatically extract the formulas it detects on each page. This is useful for textbooks, research papers, or any multi-page document with mathematical content.
+
+## How to Extract from a PDF
+
+1. Click the upload zone or drag and drop your PDF file.
+2. TexPixel detects all pages and identifies formula regions.
+3. Each recognized formula is listed in the result panel.
+4. Copy individual formulas or export the entire document as DOCX.
+
+## What Gets Extracted
+
+TexPixel identifies formulas in PDFs regardless of whether they were:
+- Typeset in LaTeX (rendered as vector math)
+- Embedded as images (scanned pages)
+- A mix of both
+
+For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95% or higher. For scanned or image-based PDFs, accuracy depends on image quality, just as it does for regular image uploads.
+
+## Supported PDF Types
+
+| Type | Description | Accuracy |
+|---|---|---|
+| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
+| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
+| Scanned PDF (low quality) | < 150 DPI or poor contrast | 60–80% |
+| Photo PDF | Photographed pages embedded as images | 75–90% |
+
+## File Limits
+
+- **Max file size:** 20 MB
+- **Max pages:** 50 pages per upload (Pro plan: unlimited)
+- **Processing time:** ~2–5 seconds per page
+
+For documents exceeding these limits, split the PDF into smaller chunks before uploading.
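As a rough illustration of the splitting advice above, the sketch below plans page ranges that each stay within the 50-page limit. The function names `plan_chunks` and `within_limits` are hypothetical and not part of TexPixel, and treating 1 MB as 1024 * 1024 bytes is an assumption. It only plans the ranges; the actual split still has to be done with a PDF tool of your choice.

```python
# Limits quoted in the File Limits section (non-Pro plan).
MAX_FILE_SIZE_MB = 20
MAX_PAGES_PER_UPLOAD = 50


def plan_chunks(total_pages, max_pages=MAX_PAGES_PER_UPLOAD):
    """Return 1-based inclusive (first, last) page ranges, each at most max_pages long."""
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def within_limits(size_bytes, pages):
    """Check a file against both limits (assumes 1 MB = 1024 * 1024 bytes)."""
    max_bytes = MAX_FILE_SIZE_MB * 1024 * 1024
    return size_bytes <= max_bytes and pages <= MAX_PAGES_PER_UPLOAD


print(plan_chunks(120))  # [(1, 50), (51, 100), (101, 120)]
```

Splitting by page count does not guarantee that each chunk is under 20 MB, so it is worth checking the size of each resulting file as well.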
+
+## Exporting PDF Results
+
+After extraction, you can export in several ways:
+
+- **Copy individual formula** — click any recognized formula to copy its LaTeX
+- **DOCX export** — download the full document with formulas as native Word equations
+- **Batch copy** — copy all formulas as a list (Pro feature)
+
+## Tips for Better PDF Results
+
+- **Use the original PDF**, not a re-scanned copy — vector PDFs give the best results
+- **Avoid password-protected PDFs** — these cannot be processed
+- **Crop pages** if a PDF has wide margins with no content — smaller pages process faster
+- **Split by chapter** for very large documents to stay within page limits
+
+## Common Issues
+
+**"No formulas found"**
+The PDF may be encrypted, have formulas stored as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.
+
+**Formulas recognized but garbled**
+This often happens with low-resolution scans (below roughly 150 DPI). Try using a PDF scanner app to rescan at 300 DPI before uploading.
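If you are unsure what resolution a scan has, it can be estimated from the pixel width of the page image and the physical width of the paper. This is a minimal sketch of that arithmetic; `effective_dpi` is a hypothetical helper, and the US Letter width of 8.5 inches is used only as an example.

```python
def effective_dpi(pixel_width, page_width_inches):
    """Dots per inch = pixels across the page / physical page width in inches."""
    if pixel_width <= 0 or page_width_inches <= 0:
        raise ValueError("both arguments must be positive")
    return pixel_width / page_width_inches


# A US Letter page (8.5 inches wide) scanned to 2550 pixels across.
print(effective_dpi(2550, 8.5))  # 300.0
# The same page scanned to only 1275 pixels across.
print(effective_dpi(1275, 8.5))  # 150.0
```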
+
+**Processing is slow**
+Large PDFs with many pages can take 30–60 seconds. This is normal. The result will appear when processing is complete.
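For a rough sense of how long a given document might take, the ~2–5 seconds per page figure from File Limits can be multiplied out. The sketch below assumes processing time scales linearly with page count, which this guide does not confirm; `estimate_seconds` is a hypothetical helper.

```python
# Approximate per-page processing time from the File Limits section, in seconds.
SECONDS_PER_PAGE = (2, 5)


def estimate_seconds(pages, per_page=SECONDS_PER_PAGE):
    """Return a (low, high) estimate in seconds, assuming linear scaling with pages."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    low, high = per_page
    return pages * low, pages * high


print(estimate_seconds(12))  # (24, 60)
```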
+
+---
+
+[Upload a PDF and extract formulas →](/app)