

title: Digitizing a Decade of Research Notes with TexPixel
description: How researchers use TexPixel to convert years of handwritten math into searchable, editable LaTeX documents
slug: researcher-workflow
date: 2026-03-08
tags: workflow, research, tutorial

Digitizing a Decade of Research Notes with TexPixel

Researchers accumulate notebooks. Derivations sketched out at conferences, margin notes on printed papers, whiteboard captures from group meetings, half-finished proofs from 3 AM. For most of history, this material was effectively unsearchable — trapped in physical form, accessible only by paging through stacks of notebooks.

TexPixel changes the equation (so to speak).

The Scope of the Problem

A typical active researcher might fill 5 to 10 notebooks per year, each containing hundreds of equations. Digitizing this by hand, retyping each formula in LaTeX, is impractical. At 3 minutes per formula and roughly 500 formulas per notebook, ten notebooks' worth of notes (about 5,000 formulas) would take around 250 hours to transcribe manually.

With TexPixel, each formula takes under 5 seconds from photo to LaTeX. The same year's worth of notes: under 7 hours.
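
These estimates are easy to rerun with your own numbers. The sketch below is a back-of-the-envelope calculation only: the notebook count and formulas-per-notebook values are illustrative assumptions (ten notebooks, 500 formulas each), and the per-formula times are the 3 minutes and 5 seconds quoted above. It ignores the time spent photographing and cropping.

```python
def transcription_hours(notebooks: int, formulas_per_notebook: int,
                        seconds_per_formula: float) -> float:
    """Total hours needed to process every formula at a fixed per-formula rate."""
    total_formulas = notebooks * formulas_per_notebook
    return total_formulas * seconds_per_formula / 3600


# Illustrative assumptions, not measurements: 10 notebooks, 500 formulas each.
NOTEBOOKS = 10
FORMULAS_PER_NOTEBOOK = 500

manual_hours = transcription_hours(NOTEBOOKS, FORMULAS_PER_NOTEBOOK,
                                   seconds_per_formula=3 * 60)
texpixel_hours = transcription_hours(NOTEBOOKS, FORMULAS_PER_NOTEBOOK,
                                     seconds_per_formula=5)

print(f"manual: {manual_hours:.0f} h, TexPixel: {texpixel_hours:.1f} h")
# prints "manual: 250 h, TexPixel: 6.9 h"
```

Changing the two constants to match your own notebooks gives a quick sense of whether a full backlog is realistic or whether to digitize only flagged pages.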

A Practical Digitization Workflow

Step 1: Photograph the Notebooks

Use a phone with a good camera and a document scanner app (Adobe Scan, Microsoft Lens, or Apple's built-in document scanner). These apps:

  • Automatically detect page edges
  • Correct perspective distortion
  • Apply contrast enhancement for faded ink or pencil
  • Export to PDF

Scanning a full notebook takes about 15 to 20 minutes.

Step 2: Identify Formula-Dense Pages

Not every page needs digitizing. Quickly flip through and flag pages with equations you'll actually need. A single key derivation or set of equations is often worth digitizing even if the surrounding text isn't.

Step 3: Batch Process with TexPixel

For each flagged page:

  1. Export the page or crop area as a PNG
  2. Upload to TexPixel
  3. Copy the LaTeX output into your notes

For formula-dense pages, consider cropping individual formulas rather than uploading the full page — this gives more accurate results and cleaner output.
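
If you crop many formulas from the same scans, it can help to compute the crop regions rather than drag them by hand each time. The helper below is a hypothetical sketch and not part of TexPixel: `padded_crop_box` and its `margin` parameter are names invented for illustration. It takes a rough bounding box in pixels, adds a small margin so strokes at the edge are not cut off (an assumption worth tuning), and clamps the result to the page. The returned tuple uses the (left, upper, right, lower) order that imaging libraries such as Pillow's `Image.crop` accept.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def padded_crop_box(box: Box, page_size: Tuple[int, int], margin: int = 20) -> Box:
    """Expand a bounding box by `margin` pixels on every side, clamped to the page."""
    left, upper, right, lower = box
    width, height = page_size
    if not (0 <= left < right <= width and 0 <= upper < lower <= height):
        raise ValueError(f"box {box} does not fit inside page of size {page_size}")
    return (
        max(0, left - margin),
        max(0, upper - margin),
        min(width, right + margin),
        min(height, lower + margin),
    )


# A formula near the left edge of a 2480x3508 px scan (A4 at 300 dpi).
print(padded_crop_box((10, 300, 900, 420), page_size=(2480, 3508)))
# prints "(0, 280, 920, 440)"
```

The left edge is clamped to 0 because the margin would otherwise run off the page; the other three sides simply grow by 20 pixels.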

Step 4: Organize into a Reference Document

Create a .tex document (or Overleaf project) structured by topic. Paste each extracted formula with a brief comment about its context:

% Variational lower bound — from 2022 NeurIPS derivation
\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}(q_\phi(z|x) \| p(z))

After a few sessions, you'll have a searchable, compilable reference document that took a fraction of the time of manual transcription.
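
If you keep the extracted formulas in a simple list as you go, the reference document can be generated instead of assembled by hand. The following is a minimal sketch under stated assumptions: the `(topic, comment, formula)` entry layout and the function name `build_reference_tex` are hypothetical, and the preamble is just one reasonable choice (`amsmath` for `equation*`, `amssymb` for `\mathbb`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (topic, context comment, LaTeX formula)


def build_reference_tex(entries: Iterable[Entry]) -> str:
    """Return a LaTeX document with one section per topic, topics sorted by name."""
    by_topic: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for topic, comment, formula in entries:
        by_topic[topic].append((comment, formula))

    lines = [
        r"\documentclass{article}",
        r"\usepackage{amsmath,amssymb}  % amssymb provides \mathbb",
        r"\begin{document}",
    ]
    for topic in sorted(by_topic):
        lines.append(rf"\section{{{topic}}}")
        for comment, formula in by_topic[topic]:
            # Keep the comment on one line so it cannot leak into the equation.
            lines.append("% " + comment.replace("\n", " "))
            lines.append(r"\begin{equation*}")
            lines.append(formula)
            lines.append(r"\end{equation*}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


entries = [
    (
        "Variational inference",
        "Variational lower bound, from 2022 NeurIPS derivation",
        r"\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right]"
        r" - D_{KL}(q_\phi(z|x) \| p(z))",
    ),
]
tex = build_reference_tex(entries)
print(tex)
```

Topic names are inserted into `\section{...}` verbatim, so characters that are special in LaTeX (such as `&` or `%`) would need escaping first.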

Working with Whiteboards

Conference room whiteboards are particularly valuable targets. A single group meeting might produce 20 to 30 key equations that would otherwise be lost when someone erases the board.

Best practice: Photograph the whiteboard before it's erased (obvious) but also photograph intermediate steps — derivations that get overwritten as the discussion progresses. The intermediate steps are often where the insight lives.

For whiteboards:

  • Photograph straight-on, not at an angle
  • Use even lighting — a photo taken with the lights on and no flash usually works better than using flash, which creates glare on glossy boards
  • Crop each distinct equation before uploading

Working with Printed Papers

For annotated printed papers, TexPixel can extract both the printed formulas and (with somewhat lower accuracy) handwritten margin notes. Crop tightly to the region you need, and upload each formula separately from its annotations.

Building a Long-Term Knowledge Base

The real value of digitization compounds over time. A well-organized LaTeX reference document from 5 years of notes is something you can:

  • Search with grep or your editor's search
  • Cross-reference with a citation manager
  • Share with collaborators
  • Build on directly when writing new papers
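
Searching is where the context comments pay off, since a comment such as "from 2022 NeurIPS derivation" is often easier to remember than the formula itself. As a rough illustration, the sketch below does in Python roughly what `grep -in` does on the command line; the function name `search_formulas` and the sample text are assumptions made for the example.

```python
from typing import List, Tuple


def search_formulas(tex_source: str, term: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs whose text contains `term`, ignoring case."""
    needle = term.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(tex_source.splitlines(), start=1)
        if needle in line.lower()
    ]


sample = "\n".join([
    r"% Variational lower bound, from 2022 NeurIPS derivation",
    r"\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right]"
    r" - D_{KL}(q_\phi(z|x) \| p(z))",
])

# Matches the formula on line 2; searching for "neurips" would match the comment on line 1.
for number, line in search_formulas(sample, "D_{KL}"):
    print(number, line)
```

In practice you would read the text from your `.tex` file instead of a sample string; the matching logic stays the same.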

Start with the past year's notebooks. The 7-hour investment pays dividends for years.

See also: For PDF file limits, supported types, and export options, see the PDF Extraction documentation →

Start digitizing your notes →