feat: add 5 new blog posts (en + zh)

- how-ai-reads-math: plain-English explainer of the recognition pipeline
- student-workflow: lecture-to-LaTeX workflow for students
- pdf-formula-issues: troubleshooting guide for PDF extraction errors
- copy-math-to-word: 3 methods for getting formulas into Word, ranked
- researcher-workflow: digitizing handwritten research notes at scale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 16:46:31 +08:00
parent 012748fc3d
commit 76f1bde56d
10 changed files with 702 additions and 0 deletions
--- a/content/blog/en/2026-01-15-how-ai-reads-math.md
+++ b/content/blog/en/2026-01-15-how-ai-reads-math.md
@@ -0,0 +1,51 @@
+---
+title: "How AI Reads Math: Inside TexPixel's Recognition Engine"
+description: A plain-English explanation of how TexPixel turns a photo of a formula into clean LaTeX code
+slug: how-ai-reads-math
+date: 2026-01-15
+tags: [explainer, technology]
+---
+
+# How AI Reads Math: Inside TexPixel's Recognition Engine
+
+When you upload a photo of a handwritten integral and get back clean LaTeX in under a second, it feels like magic. It's not — but the engineering behind it is genuinely interesting. Here's a plain-English explanation of how TexPixel turns pixels into math.
+
+## Step 1: Image Preprocessing
+
+Before any recognition happens, the image is cleaned up. This step matters more than most people realize.
+
+TexPixel normalizes contrast, removes noise, deskews tilted images, and isolates the formula region from surrounding whitespace, printed text, or ruled lines. A formula photographed under harsh side-lighting — or scanned at a slight angle — is corrected before the model ever sees it.
+
+This is why image quality affects accuracy so much: preprocessing can compensate for minor flaws, but severe blur or extremely low resolution (below ~72 DPI) leaves too little information to work with.
+
+## Step 2: Symbol Detection
+
+The preprocessed image is fed into a visual encoder — a neural network that has learned, from millions of math images, what mathematical symbols look like.
+
+The key challenge here isn't recognizing individual symbols in isolation. It's recognizing them *in context*. A handwritten cross shape may be the variable `x` or a multiplication sign (`\times`), and it looks different in every handwriting style. The model learns to tell these apart from the surrounding context: is there a dot nearby? What's the vertical position relative to a fraction bar?
+
+This contextual understanding is what separates a good math OCR system from a general-purpose character recognizer.
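+
+As a small illustration of that ambiguity (the expressions are our own examples, not TexPixel output), the same cross-shaped stroke ends up as different LaTeX depending on what surrounds it:
+
+```latex
+% The same handwritten cross shape, read two ways depending on context.
+\[ 3x + 1 \]            % the cross is the variable x
+\[ 3 \times 10^{4} \]   % the cross is a multiplication sign
+```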
+
+## Step 3: Structure Parsing
+
+Recognizing symbols is only half the problem. Math is two-dimensional in a way that ordinary text is not. A fraction has a numerator above a denominator. An integral has limits at the top and bottom. A matrix arranges expressions in rows and columns.
+
+TexPixel's parser builds a structural tree from the detected symbols, recording, for example, that one expression is a subscript of a particular symbol and that another sits inside a square root. This tree is then serialized into LaTeX, where the structural relationships are encoded as commands like `\frac{}{}`, `\sqrt{}`, and `\sum_{}^{}`.
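+
+A rough sketch of how nesting in such a tree maps onto nested LaTeX commands. The formula and the tree description in the comments are illustrative assumptions, not TexPixel's internal representation:
+
+```latex
+% Hypothetical tree for the expression below:
+%   sqrt
+%     frac
+%       numerator:   x with subscript i and superscript 2
+%       denominator: 2n
+\[
+\sqrt{\frac{x_i^{2}}{2n}}
+\]
+```
+
+Each level of the tree becomes one pair of braces, which is why a single misread spatial relationship (say, a subscript taken for a denominator) changes the whole output.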
+
+## Step 4: LaTeX Generation
+
+The final step is walking the structural tree and emitting valid LaTeX. This includes choosing the right command for ambiguous cases. For example, a large `Σ` should become `\sum` when it is a summation operator (typically with limits above and below) and `\Sigma` when it is the capital Greek letter used as an ordinary symbol. Context decides which.
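+
+To make the `\sum` versus `\Sigma` choice concrete, here are the two commands side by side. Both expressions are standard notation chosen for illustration:
+
+```latex
+% \sum is the summation operator: it takes limits and is sized as a large operator.
+\[ \sum_{i=1}^{n} x_i \]
+% \Sigma is the capital Greek letter, used here as an ordinary symbol (a matrix).
+\[ A = U \Sigma V^{T} \]
+```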
+
+The output is then validated to ensure it compiles without errors before being returned.
+
+## Why Handwriting Is Harder Than Print
+
+Printed math (from textbooks or PDFs) has consistent, high-contrast strokes. Handwriting varies enormously in size, slant, stroke weight, and letter formation. One person's handwritten `7` can look nearly identical to another person's `1`, while two people's `β` can look completely different.
+
+TexPixel's model was trained on a large, diverse dataset of handwritten math to handle this variation. Even so, accuracy on handwriting is generally lower than on print: typically 88–95% vs. 95–99%. The [tips in our handwriting guide](/blog/handwriting-tips) can push that toward the upper end.
+
+## The Whole Pipeline in One Second
+
+Preprocessing → symbol detection → structure parsing → LaTeX generation: all of this runs in under a second. It's a well-engineered pipeline, not magic — but the speed still surprises most people the first time they try it.
+
+[Upload a formula and see it in action →](/app)
--- a/content/blog/en/2026-02-01-student-workflow.md
+++ b/content/blog/en/2026-02-01-student-workflow.md
@@ -0,0 +1,71 @@
+---
+title: "From Whiteboard to LaTeX in 3 Seconds: A Student's Workflow"
+description: How students use TexPixel to turn lecture notes and homework into clean digital documents without retyping a single formula
+slug: student-workflow
+date: 2026-02-01
+tags: [tutorial, workflow, students]
+---
+
+# From Whiteboard to LaTeX in 3 Seconds: A Student's Workflow
+
+If you've ever spent 20 minutes wrestling with `\underbrace`, `\overset`, or a nested fraction in LaTeX just to transcribe something your professor wrote in 10 seconds on a whiteboard — this workflow is for you.
+
+## The Problem With Retyping
+
+Retyping formulas by hand is slow, error-prone, and interrupts the flow of note-taking. A single misplaced brace breaks compilation. A wrong symbol — `\mu` instead of `\upsilon`, say — can change the meaning entirely. And some constructs, like large piecewise functions or multi-line aligned systems, take real LaTeX expertise to format correctly.
+
+TexPixel removes all of this friction.
+
+## The Workflow
+
+### During the Lecture
+
+Photograph each formula as it appears on the board. Don't worry about perfect framing — a quick phone shot is fine. A 150+ DPI photo taken under decent lighting gives TexPixel everything it needs.
+
+You don't have to process anything during class. Just build up a folder of photos.
+
+### After Class
+
+1. Open TexPixel. Drag and drop the first photo.
+2. In under a second, you get LaTeX output — paste it directly into your Overleaf document or VS Code `.tex` file.
+3. Repeat for each formula.
+
+For a typical lecture with 10–15 formulas, this takes about 2 minutes. Compare that to 20–30 minutes of manual retyping.
+
+### For Homework
+
+When working through problem sets:
+
+1. Solve the problem on paper as you normally would.
+2. Take a photo of your work.
+3. Upload to TexPixel to extract the key formulas.
+4. Paste into your write-up.
+
+This is especially useful for multi-step derivations where you want to show your work digitally.
+
+## Exporting to Word
+
+Not using LaTeX? If your professor requires Word submissions, use TexPixel's DOCX export. It produces native Word equations — not images — so you can still edit them in Word's equation editor after exporting.
+
+## A Real Example
+
+Here's a typical formula from a linear algebra lecture:
+
+$$A = U \Sigma V^T$$
+
+Manual LaTeX: `A = U \Sigma V^T` — straightforward, but you need to know `\Sigma` and `V^T`.
+
+With TexPixel: photograph it, get `A = U \Sigma V^T` in one second, paste. For more complex expressions, such as a full singular value decomposition written out with summation notation and indexed entries, the time savings are even more dramatic.
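+
+For a sense of what the longer form looks like, here is the same decomposition expanded by hand. The notation is standard but is our assumption rather than TexPixel output: `r` is the rank of `A`, `\sigma_i` are the singular values, and `u_i`, `v_i` are the columns of `U` and `V`.
+
+```latex
+% Expanded form of A = U \Sigma V^T, as a sum and entry by entry.
+\[
+A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T}, \qquad
+A_{jk} = \sum_{i=1}^{r} \sigma_i \, U_{ji} V_{ki}
+\]
+```
+
+Typing this by hand means getting every subscript and brace right, which is where retyping tends to go wrong.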
+
+## Tips for Lecture Photography
+
+- **Position yourself centrally** — formulas at the edges of the board get distorted by perspective
+- **Wait for the professor to finish writing** — partial formulas confuse the parser
+- **Avoid flash** — it creates glare and washes out chalk or whiteboard markers
+- **Crop if needed** — if a photo contains multiple formulas, crop before uploading
+
+## Building a Formula Library
+
+Over a semester, you'll accumulate dozens of recognized formulas. Consider organizing them: paste each into a reference `.tex` file with a short comment. By exam time, you'll have a searchable personal formula sheet that took almost no effort to build.
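+
+One possible layout for such a reference file is sketched below. The topic tags and lecture numbers are placeholders, and the two formulas are simply ones that appear elsewhere on this blog:
+
+```latex
+% formulas.tex: a hypothetical personal formula sheet
+\documentclass{article}
+\usepackage{amsmath}
+\begin{document}
+
+% [linear-algebra] Lecture 12: singular value decomposition
+\[ A = U \Sigma V^{T} \]
+
+% [algebra] Lecture 3: quadratic formula
+\[ x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} \]
+
+\end{document}
+```
+
+Keeping a consistent tag at the start of each comment makes the file easy to search by topic later.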
+
+[Start digitizing your notes →](/app)
--- a/content/blog/en/2026-02-15-pdf-formula-issues.md
+++ b/content/blog/en/2026-02-15-pdf-formula-issues.md
@@ -0,0 +1,73 @@
+---
+title: "Why Your PDF Formulas Come Out Wrong (and How to Fix It)"
+description: The most common reasons PDF formula extraction produces errors, and exactly how to fix each one
+slug: pdf-formula-issues
+date: 2026-02-15
+tags: [troubleshooting, PDF, tips]
+---
+
+# Why Your PDF Formulas Come Out Wrong (and How to Fix It)
+
+PDF formula extraction should be simple — upload, get LaTeX, done. But sometimes the output looks garbled, symbols are missing, or the extractor says no formulas were found. Here's a breakdown of the most common causes and how to fix each one.
+
+## Problem 1: The PDF is a Scan
+
+**Symptoms:** Symbols look correct on screen but extraction output is garbage or empty.
+
+**Why it happens:** A scanned PDF is just a collection of page images. If it has a text layer at all, that layer comes from OCR performed at scan time, which is often poor quality for math; otherwise there is no text to extract, only pixels.
+
+**Fix:** Run TexPixel's image-based pipeline instead. Export individual pages as PNG at 300 DPI from a PDF viewer (in Preview, File → Export… with PNG as the format; in Adobe Acrobat, the Export PDF feature), then upload the PNG directly. Image-based recognition handles scans correctly; direct PDF text extraction does not.
+
+## Problem 2: Low-DPI Scan
+
+**Symptoms:** Some symbols recognized correctly, others replaced with wrong characters or dropped entirely.
+
+**Why it happens:** Below about 150 DPI, the strokes in small symbols like `\prime`, `\cdot`, or subscript characters shrink to only a pixel or two wide, which is too little detail to distinguish them reliably.
+
+**Fix:** Rescan at 300 DPI. Many scanners default to a lower setting such as 200 DPI; moving to 300 DPI gives 2.25 times as many pixels, so files grow somewhat, but small symbols become far more legible. For phone scans, use a dedicated scanner app (e.g., Adobe Scan, Microsoft Lens), which applies automatic sharpening and perspective correction.
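+
+To see why resolution matters so much for small symbols, here is a back-of-the-envelope calculation. The 0.2 mm stroke width is an assumed, illustrative value; 25.4 is the number of millimetres in an inch.
+
+```latex
+% Stroke width in pixels for an assumed 0.2 mm pen stroke (requires amsmath for \text).
+\[
+w_{\text{px}} = w_{\text{mm}} \cdot \frac{\text{DPI}}{25.4}, \qquad
+w_{\text{px}}(150) = 0.2 \cdot \frac{150}{25.4} \approx 1.2, \qquad
+w_{\text{px}}(300) = 0.2 \cdot \frac{300}{25.4} \approx 2.4
+\]
+```
+
+Under that assumption, doubling the resolution takes a thin stroke from roughly one pixel to more than two, which gives the recognizer an actual shape to work with.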
+
+## Problem 3: Password-Protected PDF
+
+**Symptoms:** "No formulas found" or upload fails entirely.
+
+**Why it happens:** Encrypted PDFs require a password to access their content stream. TexPixel cannot process the content of a locked file.
+
+**Fix:** Remove the password protection before uploading. In Preview (Mac), open with the password, then File → Export as PDF — the exported file won't have the password. In Adobe Reader, use File → Print → Save as PDF.
+
+## Problem 4: Formulas Stored as Vector Paths
+
+**Symptoms:** PDF looks perfect, but extraction returns nothing or incorrect text.
+
+**Why it happens:** Some PDF generators (certain Word versions, some online LaTeX renderers) convert math into vector paths or embedded images, so the formulas are essentially drawings, not characters. There's no character stream to extract.
+
+**Fix:** Export the page as a high-resolution PNG (300 DPI), then upload as an image. TexPixel's visual recognition pipeline handles vector-rendered formulas well.
+
+## Problem 5: Multi-Column Layout
+
+**Symptoms:** Formulas from two columns are merged or interleaved in the output.
+
+**Why it happens:** PDF text streams don't always encode reading order correctly, especially in two-column academic papers.
+
+**Fix:** Crop to a single column before uploading. Use any image editor to crop the page into left and right halves, then upload each separately.
+
+## Problem 6: Handwritten Annotations
+
+**Symptoms:** Handwritten notes over a printed formula confuse the output.
+
+**Why it happens:** TexPixel sees both the printed formula and the handwritten annotations together. It may try to recognize the annotations as part of the formula.
+
+**Fix:** Crop tightly to just the printed formula, excluding any handwriting around it.
+
+## Quick Diagnostic Checklist
+
+Before uploading a problematic PDF:
+
+- [ ] Is it a scan or a born-digital PDF?
+- [ ] If a scan, what DPI was it scanned at?
+- [ ] Is it password-protected?
+- [ ] Can you select the formulas as text, or are they drawn as vector paths?
+- [ ] Does it have a two-column layout?
+- [ ] Are there handwritten annotations?
+
+Working through this list resolves the issue 90% of the time.
+
+[Upload your PDF →](/app)
--- a/content/blog/en/2026-03-01-copy-math-to-word.md
+++ b/content/blog/en/2026-03-01-copy-math-to-word.md
@@ -0,0 +1,74 @@
+---
+title: "Copy Math to Word Without Losing Formatting — The Right Way"
+description: Three methods for getting recognized formulas into Microsoft Word, ranked by quality and effort
+slug: copy-math-to-word
+date: 2026-03-01
+tags: [tutorial, Word, export]
+---
+
+# Copy Math to Word Without Losing Formatting — The Right Way
+
+Most people's first instinct when they need a formula in a Word document is to take a screenshot. It works — until you need to resize the document, change the font, or edit the formula. Screenshots break. Native equations don't.
+
+Here are three ways to get TexPixel's output into Word, from best to worst.
+
+## Method 1: DOCX Export (Best)
+
+The cleanest option. TexPixel converts your recognized formula into a native Word equation (OMML format) and packages it in a `.docx` file.
+
+**How:**
+1. Upload your formula image to TexPixel.
+2. Click **Export** → select **DOCX**.
+3. Open the downloaded file in Word.
+4. Select the equation, copy, paste into your target document.
+
+**Why it's best:** The formula is fully editable in Word's built-in equation editor. Double-click it to open the editor, change any symbol, resize it — it behaves exactly like an equation you typed yourself. It also scales correctly when you change font sizes.
+
+**Limitation:** Each upload produces one `.docx` file. If you have many formulas to insert, you'll need to repeat the process or batch them (see below).
+
+## Method 2: Paste LaTeX into Word's Equation Editor (Good)
+
+Word 2019+ and Microsoft 365 support pasting LaTeX directly into equations.
+
+**How:**
+1. Get the LaTeX output from TexPixel (e.g., `x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}`).
+2. In Word, insert a new equation: **Insert → Equation** (or press `Alt+=`).
+3. Switch the equation to **LaTeX** input: on the **Equation** tab, in the **Conversions** group, select **LaTeX**.
+4. Paste the LaTeX string, then choose **Convert → Current - Professional** (or press `Ctrl+=`) to render it.
+
+Word converts the LaTeX to a rendered, editable equation.
+
+**Why it's good:** Fast for single formulas. No file download required.
+
+**Limitation:** Word's LaTeX parser doesn't support all LaTeX commands. Obscure or complex expressions may not render correctly. Test before relying on it for important documents.
+
+## Method 3: Image Export (Worst, But Sometimes Necessary)
+
+Export the formula as a PNG and insert it as an image in Word.
+
+**When to use:** Only when you need the formula in a document being shared with someone who doesn't have Word's equation editor (e.g., older Word versions, third-party editors). Or when a complex formula doesn't render correctly via Methods 1 or 2.
+
+**Downsides:** Not editable. Doesn't scale well. Accessibility tools can't read it.
+
+## Handling Multiple Formulas
+
+If you have many formulas to insert into a single document:
+
+1. Upload each formula image and collect the LaTeX strings.
+2. Open a new Word document.
+3. For each formula, use the **Alt+=** method above to insert them in sequence.
+4. Once all formulas are inserted, copy and paste the entire equation block into your target document.
+
+This is faster than one DOCX export per formula.
+
+## Google Docs
+
+Google Docs doesn't natively support LaTeX paste. Options:
+
+- Use the **Auto-LaTeX Equations** Google Docs add-on, which renders LaTeX strings as inline images.
+- Export as DOCX and open in Google Docs (equations may not survive the import as editable objects, so check the result).
+- Use a tool like `mathpix-markdown-it` to convert to Markdown and render in a Markdown-compatible environment.
+
+For serious equation-heavy work, Word or Overleaf remain better choices than Google Docs.
+
+[Export your next formula to Word →](/app)
--- a/content/blog/en/2026-03-08-researcher-workflow.md
+++ b/content/blog/en/2026-03-08-researcher-workflow.md
@@ -0,0 +1,82 @@
+---
+title: "Digitizing a Decade of Research Notes with TexPixel"
+description: How researchers use TexPixel to convert years of handwritten math into searchable, editable LaTeX documents
+slug: researcher-workflow
+date: 2026-03-08
+tags: [workflow, research, tutorial]
+---
+
+# Digitizing a Decade of Research Notes with TexPixel
+
+Researchers accumulate notebooks. Derivations sketched out at conferences, margin notes on printed papers, whiteboard captures from group meetings, half-finished proofs from 3 AM. For most of history, this material was effectively unsearchable — trapped in physical form, accessible only by paging through stacks of notebooks.
+
+TexPixel changes the equation (so to speak).
+
+## The Scope of the Problem
+
+A typical active researcher might accumulate 5–10 filled notebooks per year, each containing hundreds of equations. Retyping every formula in LaTeX by hand is impractical. At 3 minutes per formula and roughly 800 formulas per notebook, ten notebooks (about 8,000 formulas) would take around 400 hours to transcribe manually.
+
+With TexPixel, each formula takes about 3 seconds from photo to LaTeX. The same year's worth of notes: under 7 hours.
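+
+The arithmetic behind those two totals can be sketched as follows. The inputs are assumptions chosen for illustration: about 8,000 formulas in a year's notebooks (ten notebooks at roughly 800 formulas each), 3 minutes per formula by hand, and about 3 seconds per formula with TexPixel.
+
+```latex
+% Assumed inputs: 8000 formulas, 3 min each by hand, 3 s each with TexPixel (requires amsmath).
+\[
+T_{\text{manual}} = 8000 \times 3\ \text{min} = 24\,000\ \text{min} = 400\ \text{h}, \qquad
+T_{\text{TexPixel}} = 8000 \times 3\ \text{s} = 24\,000\ \text{s} \approx 6.7\ \text{h}
+\]
+```
+
+The ratio between the two is simply 3 minutes over 3 seconds, a factor of 60, so the comparison holds for any notebook count.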
+
+## A Practical Digitization Workflow
+
+### Step 1: Photograph the Notebooks
+
+Use a phone with a good camera and a document scanner app (Adobe Scan, Microsoft Lens, or Apple's built-in document scanner). These apps:
+- Automatically detect page edges
+- Correct perspective distortion
+- Apply contrast enhancement for faded ink or pencil
+- Export to PDF
+
+Scanning a full notebook this way takes about 15–20 minutes.
+
+### Step 2: Identify Formula-Dense Pages
+
+Not every page needs digitizing. Quickly flip through and flag pages with equations you'll actually need. A single key derivation or set of equations is often worth digitizing even if the surrounding text isn't.
+
+### Step 3: Batch Process with TexPixel
+
+For each flagged page:
+1. Export the page or crop area as a PNG
+2. Upload to TexPixel
+3. Copy the LaTeX output into your notes
+
+For formula-dense pages, consider cropping individual formulas rather than uploading the full page — this gives more accurate results and cleaner output.
+
+### Step 4: Organize into a Reference Document
+
+Create a `.tex` document (or Overleaf project) structured by topic. Paste each extracted formula with a brief comment about its context:
+
+```latex
+% Variational lower bound, from 2022 NeurIPS derivation (needs amssymb for \mathbb)
+\[
+\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
+\]
+```
+
+After a few sessions, you'll have a searchable, compilable reference document that took a fraction of the time of manual transcription.
+
+## Working with Whiteboards
+
+Conference room whiteboards are particularly valuable targets. A single group meeting might produce 20–30 key equations that would otherwise be lost when someone erases the board.
+
+**Best practice:** Photograph the whiteboard before it's erased, and also photograph intermediate steps: derivations that get overwritten as the discussion progresses. The intermediate steps are often where the insight lives.
+
+For whiteboards:
+- Photograph straight-on, not at an angle
+- Use even lighting: room lights on and no flash usually works better than flash, which creates glare on glossy boards
+- Crop each distinct equation before uploading
+
+## Working with Printed Papers
+
+For annotated printed papers, TexPixel can extract both the printed formulas and (with somewhat lower accuracy) handwritten margin notes. Crop tightly to the region you need, and upload each formula separately from its annotations.
+
+## Building a Long-Term Knowledge Base
+
+The real value of digitization compounds over time. A well-organized LaTeX reference document from 5 years of notes is something you can:
+- Search with `grep` or your editor's search
+- Cross-reference with a citation manager
+- Share with collaborators
+- Build on directly when writing new papers
+
+Start with the past year's notebooks. The 7-hour investment pays dividends for years.
+
+[Start digitizing your notes →](/app)