170 Commits
v1.0.1 ... main

Author SHA1 Message Date
6813f3d4f7 fix: dockerfile
Some checks failed
Sphinx: Render docs / build (push) Has been cancelled
Python Linting / lint (push) Has been cancelled
Run Tests with Pytest / test (push) Has been cancelled
2025-12-15 23:21:47 +08:00
ba0968b2da feat: add dockerfile
Some checks failed
Sphinx: Render docs / build (push) Has been cancelled
Python Linting / lint (push) Has been cancelled
Run Tests with Pytest / test (push) Has been cancelled
2025-12-15 22:31:13 +08:00
OleehyO
9b88cec77b Update 2025-08-22 21:45:41 +08:00
OleehyO
154c8fcab5 📝 [docs] Update README with TexTeller 3.0 technical report and dataset release
- Added technical report and dataset release announcements to changelog
- Updated both English and Chinese README files
- Reordered Docker badge in Chinese README for consistency

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-22 15:26:47 +08:00
OleehyO
30f7e93c49 📝 [docs] Update README badges and branding consistency
Add arXiv paper badge, fix TexTeller3.0 capitalization, and update documentation links for improved consistency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-14 22:41:35 +08:00
OleehyO
4f88499de5 🔧 [chore] Replace pre-commit with ruff for linting workflow
- Update CI workflow to use ruff instead of pre-commit
- Remove E999 from ruff ignore rules in pyproject.toml

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-14 22:34:42 +08:00
OleehyO
bfe070f976 📦️ [chore] Update project for TexTeller 3.0 release
- Update dataset references from TexTeller 1.0 to 3.0 in README files
- Add paper.pdf to assets directory
- Configure pre-commit to exclude assets/ from large file checks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-13 22:01:17 +08:00
OleehyO
af56271e1c 🧑 [chore] Add Claude Code configuration for Git workflow automation
Add Claude agents and commands to enhance developer experience:
- commit-crafter agent for standardized conventional commits
- staged-code-reviewer agent for automated code review
- Commands for code review, GitHub issue fixing, and commit creation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-13 21:59:12 +08:00
三洋三洋
30f88d55ac Upload compare 2025-04-23 22:21:40 +08:00
OleehyO
3d430735a4 [docs] Using uv to install deps 2025-04-23 10:40:12 +00:00
OleehyO
184c890437 [chore] Correct file url 2025-04-23 10:39:24 +00:00
OleehyO
c758dc277b [deps] Pin transformers to 4.47 2025-04-21 12:24:03 +00:00
OleehyO
0ab938aad4 [chore] Setup deps for doc build 2025-04-21 12:24:00 +00:00
OleehyO
90e16fd868 [chore] Update 2025-04-21 12:24:00 +00:00
OleehyO
cab9d664f2 [CD] Add documentation auto-deployment 2025-04-21 12:23:56 +00:00
OleehyO
3f930fdaaf [deps] Add sphinx extension deps 2025-04-21 08:38:06 +00:00
OleehyO
324ab8a03f [docs] Fix typo 2025-04-21 08:21:16 +00:00
OleehyO
a62600f384 [chore] Change logo font 2025-04-21 08:20:16 +00:00
OleehyO
511f69555c 🔧 Fix all ruff typo errors & test CI/CD workflow (#109)
* [chore] Fix ruff typo

* [robot] Fix welcome robot
2025-04-21 13:52:16 +08:00
OleehyO
ae776aa9c7 [CI] Fix deps installation 2025-04-21 05:17:12 +00:00
OleehyO
d46be980ee [CD] Change trigger condition 2025-04-21 05:12:38 +00:00
OleehyO
1201c67237 [chore] Update README_zh.md 2025-04-21 05:11:47 +00:00
OleehyO
d5938c6a2a [deps] Grouped deps & setup vcs 2025-04-21 04:48:29 +00:00
OleehyO
9a388cdfc5 [chore] Update README.md 2025-04-21 04:47:53 +00:00
OleehyO
59bc9bdd41 [CI/CD] Setup complete workflow 2025-04-21 03:00:06 +00:00
OleehyO
7490fa9c5a [chore] Setup vcs and deps 2025-04-21 02:41:46 +00:00
OleehyO
5cf9960a7c [chore] Ignore images 2025-04-21 02:41:06 +00:00
OleehyO
9006edb949 [docs] Set up documentation structure with API reference 2025-04-21 02:38:36 +00:00
OleehyO
d6c659d576 Upload logo.svg 2025-04-21 02:37:28 +00:00
OleehyO
ff02336007 [feat] Support dynamic package vcs 2025-04-21 02:36:13 +00:00
OleehyO
789006894c [docs] Add comprehensive function documentation 2025-04-21 02:34:56 +00:00
OleehyO
2c9ce6b6c1 Add globals test 2025-04-21 02:32:05 +00:00
OleehyO
57b757c0f0 [test] Init 2025-04-19 16:36:48 +00:00
OleehyO
a7a296025a [feat] Add texteller training script 2025-04-19 16:36:43 +00:00
OleehyO
991d6bc00d [CI] Update ruff hook 2025-04-19 14:32:32 +00:00
OleehyO
06edd104e2 [refactor] Init 2025-04-19 14:32:28 +00:00
OleehyO
0e32f3f3bf [chore] Cleanup 2025-04-17 07:08:47 +00:00
OleehyO
6bd68ad3b7 [feat] Support n-gram stop criteria 2025-04-02 03:23:27 +00:00
OleehyO
aae7af445f [deps] Change onnx-gpu to manually install 2025-04-02 02:48:23 +00:00
三洋三洋
38e7c6293f [feat][formatter] Integrate LaTeX formatter for improved formula readability
- Add latex_formatter.py based on tex-fmt (https://github.com/WGUNDERWOOD/tex-fmt)
- Update to_katex.py to use the new formatter
- Enhance LaTeX formula output with better formatting and readability

This integration helps make generated LaTeX formulas more readable and
maintainable by applying consistent formatting rules.
2025-03-01 00:55:41 +08:00
三洋三洋
192e8d6352 [chore] Ignore ruff lint E741 2025-03-01 00:54:57 +08:00
三洋三洋
110cb29d6c [fix] Add project prefix 2025-02-28 23:38:12 +08:00
三洋三洋
abd6057378 [feat] Remove bold style 2025-02-28 23:38:12 +08:00
三洋三洋
e214b508d2 [deps] Add ray serve & python-multipart 2025-02-28 23:37:53 +08:00
三洋三洋
de9deacaf2 [chore] Add build system and package location 2025-02-28 23:18:06 +08:00
三洋三洋
cd0f397f20 [chore] Add python related rules 2025-02-28 23:18:03 +08:00
三洋三洋
5668a2e26c [chore] Remove unused files 2025-02-28 20:54:51 +08:00
三洋三洋
3d546f9993 [chore] exclude paddleocr directory from pre-commit hooks 2025-02-28 20:01:54 +08:00
三洋三洋
a8a005ae10 [chore] Setup project infrastructure 2025-02-28 20:01:52 +08:00
三洋三洋
52fce4d39d [deps] pin transformers to 4.45.2 and sentence-transformers to 3.1.1 2025-02-01 13:00:44 +08:00
OleehyO
b8100517c6 Merge pull request #78 from OleehyO/pre_release
Change to better import dependency
2024-08-07 12:43:15 +08:00
三洋三洋
06701415cc Change to better import dependency 2024-08-07 01:19:26 +08:00
OleehyO
c6eb1b6ea2 Merge pull request #67 from OleehyO/pre_release
Change setting name
2024-07-11 20:34:50 +08:00
三洋三洋
1b685054c9 Change setting name 2024-07-11 20:33:51 +08:00
OleehyO
c835cedcf5 Merge pull request #60 from OleehyO/pre_release
Pre release
2024-06-23 22:16:09 +08:00
三洋三洋
9f3a46e8a9 Update README 2024-06-23 22:14:05 +08:00
三洋三洋
569c72ffe3 Remove onnxruntime-gpu 2024-06-23 22:13:51 +08:00
OleehyO
b4f70a09e0 Merge pull request #59 from OleehyO/pre_release
Pre release
2024-06-22 23:56:45 +08:00
三洋三洋
36a2680d28 Update model config 2024-06-22 22:08:08 +08:00
三洋三洋
c5e859517a Update README 2024-06-22 22:00:14 +08:00
三洋三洋
9638c0030d Support onnx runtime 2024-06-22 22:00:05 +08:00
三洋三洋
8da3fd7418 Add optimum 2024-06-22 21:49:47 +08:00
OleehyO
fb6784b535 Merge pull request #58 from OleehyO/pre_release
Add formula detection service
2024-06-17 21:26:35 +08:00
三洋三洋
76eeb18b83 Add formula detection service 2024-06-17 21:23:55 +08:00
OleehyO
e2d0e91a77 Merge pull request #56 from OleehyO/pre_release
Add docker link
2024-06-11 13:22:17 +08:00
三洋三洋
0d5cd9a75d Add docker link 2024-06-11 13:20:32 +08:00
三洋三洋
624f9531b4 Update server.py
1. Change the default host address to 0.0.0.0.
2. Convert the output to KaTeX.
2024-06-07 12:26:24 +00:00
三洋三洋
aa14674097 Update README 2024-06-07 06:54:23 +00:00
三洋三洋
a7044e0369 Add Apache2.0 license 2024-06-06 13:06:16 +00:00
三洋三洋
837cb6021f Add cover.png 2024-06-06 13:06:16 +00:00
三洋三洋
354833aac8 Modify the names of options in the web.py
Formula only       -> Formula recognition
Text formula mixed -> Paragraph recognition

Improved display during mixed inference
2024-06-06 13:06:16 +00:00
三洋三洋
760bd78c10 Refine mix_inference
1. Add the formula number back to the isolated formula and merge multiple tags.
2. Remove the bold effect from inline formulas.
3. Change the split environment into aligned.
2024-06-06 13:06:11 +00:00
三洋三洋
c0e730f697 Bugfix: to_katex.py
1. Added `change_all` function to fix a bug where some LaTeX formulas with the same wrapper were causing issues.
2. Removed some unnecessary formatting commands.

Bugfix: to_katex.py
2024-06-06 08:25:50 +00:00
三洋三洋
7aad0839c4 Update 2024-05-28 09:51:53 +00:00
三洋三洋
5420e92cc4 Added releasing file 2024-05-28 07:50:09 +00:00
三洋三洋
89aa396cbb Change the model configuration to trocr 2024-05-28 07:50:09 +00:00
三洋三洋
9b11689f22 Using paddleocr with onnxruntime
Deleted the code for test time.
2024-05-28 07:50:09 +00:00
三洋三洋
85d558f772 Added mixed recognition
change suryaocr to paddleocr
2024-05-28 07:50:08 +00:00
三洋三洋
2af1e067c1 Added ONNX file for PaddleOCR model 2024-05-28 07:50:08 +00:00
三洋三洋
6b852d561d Update .gitignore 2024-05-28 07:50:08 +00:00
三洋三洋
e193fe3798 Added code for PaddleOCR inference 2024-05-28 07:50:08 +00:00
三洋三洋
714fef4def Eliminated dependency on paddleocr
Change to trocr
2024-05-28 07:50:08 +00:00
三洋三洋
edef073812 update 2024-05-28 07:50:08 +00:00
OleehyO
1b8f6ba0b6 bugfix: ocr_aug.py
Change "lhy_custom" in ink_swap_color to "random"
2024-05-28 07:49:55 +00:00
三洋三洋
a27cf716ee bugfix: missing filter_fn and inference/train transform 2024-05-12 07:49:04 +00:00
三洋三洋
8557e81374 update 2024-05-12 07:47:35 +00:00
三洋三洋
10e22259a2 update 2024-05-10 03:48:31 +00:00
TonyLee1256
9875fedb1b Update requirements.txt 2024-05-09 00:23:32 +08:00
TonyLee1256
83da4262fd Update mix_inference.py
Replace the text OCR model with paddleocr
2024-05-09 00:23:02 +08:00
TonyLee1256
bd2aaa3e00 Update inference.py
Replace the text OCR model with paddleocr
2024-05-09 00:22:01 +08:00
TonyLee1256
fe7e4a7af0 Update inference.py
Add a timing feature
2024-05-09 00:20:32 +08:00
TonyLee1256
48043d11e3 Update infer_det.py
Add support for running ONNX model inference on the GPU
2024-05-09 00:19:39 +08:00
三洋三洋
e495640690 bugfix 2024-05-08 14:34:01 +00:00
三洋三洋
84fa43321f Added Language option in mixed mode 2024-05-07 07:44:24 +00:00
三洋三洋
b116dfae55 Update README 2024-05-07 07:30:29 +00:00
三洋三洋
85b22ff9c7 bugfix 2024-05-07 07:11:34 +00:00
三洋三洋
42959cd6a5 Add train_config.yaml 2024-05-07 07:11:05 +00:00
三洋三洋
4c182aecda update .gitignore 2024-05-07 06:54:53 +00:00
TonyLee1256
d2c1e5e10f bugfix inference.py 2024-05-07 13:28:07 +08:00
TonyLee1256
c5dd0dacd8 Update README_zh.md 2024-05-07 13:27:23 +08:00
TonyLee1256
8981df6bc9 Update README.md 2024-05-07 13:26:50 +08:00
TonyLee1256
bb0594815a Update README.md 2024-05-07 13:25:28 +08:00
TonyLee1256
8c85575260 bugfix inference.py 2024-05-07 13:19:43 +08:00
三洋三洋
7c5a547b1f update 2024-05-02 09:10:21 +00:00
三洋三洋
c6e6622aaf Merge remote-tracking branch 'origin/pre_release' into pre_release 2024-04-21 16:13:49 +00:00
三洋三洋
8fa462b434 update README.md 2024-04-21 16:13:45 +00:00
TonyLee1256
1a7939190f Update rec_infer_from_crop_imgs.py 2024-04-22 00:08:36 +08:00
TonyLee1256
0bb11bebfc Update infer_det.py 2024-04-22 00:07:41 +08:00
TonyLee1256
be19ed8d63 Update README.md 2024-04-21 22:14:23 +08:00
TonyLee1256
0079c07be2 Update README.md 2024-04-21 22:12:22 +08:00
TonyLee1256
b3dd73c716 Update README_zh.md 2024-04-21 22:09:58 +08:00
三洋三洋
188ab88e07 Merge branch 'dev' into pre_release 2024-04-21 13:14:49 +00:00
三洋三洋
9018c62f66 Update README.md 2024-04-21 13:06:01 +00:00
三洋三洋
5cbbfb38d6 1) Fix a bug in to_katex.py; 2) Write the conversion results from Box.py to logs 2024-04-21 12:09:26 +00:00
三洋三洋
11df230200 Adjust the project structure after merging dev 2024-04-21 00:48:24 +08:00
三洋三洋
e6dca76123 Remove resizer after merging dev 2024-04-21 00:13:21 +08:00
三洋三洋
185b2e3db6 1) Implement mixed text-formula recognition; 2) Restructure the project 2024-04-21 00:05:14 +08:00
三洋三洋
eab6e4c85d update infer_det.py 2024-04-18 00:06:05 +08:00
三洋三洋
48f778eeda Restructure the directories to support mixed inference 2024-04-17 15:24:06 +00:00
三洋三洋
7883d3c07f Fix inconsistent parameter names introduced by merging the pre_release branch 2024-04-17 14:47:58 +00:00
三洋三洋
a064b7dbb0 Merge branch 'pre_release' into dev 2024-04-17 10:32:22 +00:00
三洋三洋
f81a31a8c9 checkpoint 2024-04-17 10:20:15 +00:00
三洋三洋
ec3e744376 update README.md 2024-04-17 10:08:46 +00:00
三洋三洋
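3cebc2eb2a Update the frontend and inference.py
1) The frontend supports pasting images from the clipboard.
2) The frontend supports model configuration.
3) Changed the interface of inference.py.
4) Removed unnecessary files.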
2024-04-17 09:36:40 +00:00
三洋三洋
66d4902871 add contributor 2024-04-12 07:29:36 +00:00
三洋三洋
78d29d49ef update README 2024-04-12 06:16:37 +00:00
三洋三洋
7d1d8ddd77 work in progress 2024-04-12 03:20:04 +00:00
OleehyO
9e8b15ef3a Merge pull request #14 from TonyLee1256/pre_release
Add formula detection module
2024-04-12 00:46:45 +08:00
TonyLee1256
9e8ac666b0 Add formula detection module 2024-04-11 16:44:19 +00:00
三洋三洋
1538cb73f8 Fix a bug in inference_transform in transforms.py: PNG images were not converted to np.ndarray during the eval phase of training 2024-04-11 07:04:58 +00:00
三洋三洋
762012be1f Optimize trim_white_border in transform.py 2024-04-10 16:09:13 +00:00
三洋三洋
1589fb3217 Increase the probability of data augmentation 2024-04-09 13:50:35 +00:00
三洋三洋
1db514bdbf inference.py supports KaTeX syntax 2024-04-06 12:06:08 +00:00
三洋三洋
840be6b843 update README.md 2024-04-06 11:57:50 +00:00
三洋三洋
93fc22adf5 inference.py supports KaTeX 2024-04-06 11:38:59 +00:00
三洋三洋
8d6d889efa update README.md 2024-04-06 07:43:03 +00:00
三洋三洋
ecd5481bea web demo supports KaTeX; a locally installed xelatex renderer is no longer required 2024-04-06 07:28:46 +00:00
三洋三洋
b5f7166e58 web demo adds KaTeX support; a locally installed xelatex renderer is no longer required 2024-04-06 07:18:40 +00:00
三洋三洋
c9c15d27bd inference_transform bugfix 2024-04-06 05:09:50 +00:00
三洋三洋
87ddb86e5e Complete the v3 version: add data augmentation for natural scenes 2024-04-05 08:11:06 +00:00
三洋三洋
a4e878da96 Merge remote-tracking branch 'origin/dev' into dev 2024-04-05 08:00:11 +00:00
三洋三洋
70dce92e19 Merge remote-tracking branch 'origin/dev' into dev 2024-04-05 07:52:40 +00:00
三洋三洋
e16f46e856 Update the inference.py template for v3 (supports recognition in natural scenes and mixed-text scenes) 2024-04-05 07:27:07 +00:00
三洋三洋
67426c439f update README.md 2024-04-05 05:19:27 +00:00
三洋三洋
d2090c0d61 Merge remote-tracking branch 'origin/dev' into dev 2024-03-28 14:33:46 +00:00
三洋三洋
5a259065a4 merge v3_nature_scence 2024-03-28 14:33:25 +00:00
三洋三洋
8d94611aba merge v3_nature_scence 2024-03-28 14:22:23 +00:00
三洋三洋
a6a5d07430 Merge remote-tracking branch 'origin/dev' into dev 2024-03-28 13:28:47 +00:00
三洋三洋
63b8e04dab TexTellerv2 release 2024-03-25 13:22:11 +00:00
OleehyO
14b637cd6b Update README_zh.md 2024-03-25 16:35:34 +08:00
OleehyO
86443d0cf7 Update README_zh.md 2024-03-25 16:35:34 +08:00
OleehyO
88d2730752 Update README.md 2024-03-25 16:34:46 +08:00
三洋三洋
3f4b3c9645 update 2024-03-25 08:32:17 +00:00
三洋三洋
5e191ff0fe update 2024-03-25 07:53:11 +00:00
三洋三洋
9c3bb1c22a update mp4 2024-03-25 07:32:33 +00:00
三洋三洋
ef218d67f6 TexTeller v2 2024-03-25 07:11:10 +00:00
三洋三洋
74341c7e8a update 2024-03-19 14:43:03 +00:00
三洋三洋
5d089b5a7f update 2024-03-03 12:09:14 +08:00
三洋三洋
d9ee6b0d9e update 2024-03-01 22:42:15 +08:00
三洋三洋
2d21d2d215 update 2024-02-27 07:44:35 +00:00
三洋三洋
3527a4af47 updated API usage (supports remote calls) 2024-02-27 07:13:36 +00:00
三洋三洋
b4537944d0 Update README_zh.md 2024-02-12 16:33:49 +00:00
三洋三洋
72a60f8611 Update README 2024-02-12 16:27:58 +00:00
三洋三洋
3683623925 Update README_zh.md 2024-02-12 15:02:31 +00:00
三洋三洋
94b0781d84 Update README 2024-02-12 11:46:26 +00:00
三洋三洋
9bc165f955 Update files 2024-02-12 11:40:51 +00:00
三洋三洋
fa6bcda721 update README 2024-02-12 08:44:45 +00:00
三洋三洋
6e2e45a8d6 update README 2024-02-12 08:41:33 +00:00
三洋三洋
b4962bfa98 Initial commit 2024-02-11 10:44:42 +00:00
三洋三洋
f057490bdb Initial commit 2024-02-11 09:14:40 +00:00
165 changed files with 15013 additions and 125911 deletions


@@ -0,0 +1,164 @@
---
name: commit-crafter
description: Expertly creates clean, conventional, and atomic Git commits with pre-commit checks.
---
You are an expert Git assistant. Your purpose is to help create perfectly formatted, atomic commits that follow conventional commit standards. You enforce code quality by running pre-commit checks (if any exist) and help maintain a clean project history by splitting large changes into logical units.
## Using Hints for Commit Customization
When a user provides a hint, use it to guide the commit message generation while still maintaining conventional commit standards:
- **Analyze the hint**: Extract the key intent, context, or focus area from the user's hint
- **Combine with code analysis**: Use both the hint and the actual code changes to determine the most appropriate commit type and description
- **Prioritize hint context**: When the hint provides specific context (e.g., "fix login bug"), use it to craft a more targeted and meaningful commit message
- **Maintain standards**: The hint should guide the message content, but the format must still follow conventional commit standards
- **Resolve conflicts**: If the hint conflicts with what the code changes suggest, prioritize the code changes but incorporate the hint's context where applicable
## Best Practices for Commits
- **Verify before committing**: Ensure code is linted, builds correctly, and documentation is updated
- **Use hints effectively**: When a hint is provided, incorporate its context into the commit message while ensuring the message accurately reflects the actual code changes
- **Atomic commits**: Each commit should contain related changes that serve a single purpose
- **Split large changes**: If changes touch multiple concerns, split them into separate commits
- **Conventional commit format**: Use the format `[<type>] <description>`. Common values of `<type>` are:
- feat: A new feature
- fix: A bug fix
- docs: Documentation changes
- style: Code style changes (formatting, etc)
- refactor: Code changes that neither fix bugs nor add features
- perf: Performance improvements
- test: Adding or fixing tests
- chore: Changes to the build process, tools, etc.
- **Present tense, imperative mood**: Write commit messages as commands (e.g., "add feature" not "added feature")
- **Concise first line**: Keep the first line under 72 characters
- **Emoji**: Each commit type is paired with an appropriate emoji:
- ✨ [feat] New feature
- 🐛 [fix] Bug fix
- 📝 [docs] Documentation
- 💄 [style] Formatting/style
- ♻️ [refactor] Code refactoring
- ⚡️ [perf] Performance improvements
- ✅ [test] Tests
- 🔧 [chore] Tooling, configuration
- 🚀 [ci] CI/CD improvements
- 🗑️ [revert] Reverting changes
- 🧪 [test] Add a failing test
- 🚨 [fix] Fix compiler/linter warnings
- 🔒️ [fix] Fix security issues
- 👥 [chore] Add or update contributors
- 🚚 [refactor] Move or rename resources
- 🏗️ [refactor] Make architectural changes
- 🔀 [chore] Merge branches
- 📦️ [chore] Add or update compiled files or packages
- ➕ [chore] Add a dependency
- ➖ [chore] Remove a dependency
- 🌱 [chore] Add or update seed files
- 🧑 [chore] Improve developer experience
- 🧵 [feat] Add or update code related to multithreading or concurrency
- 🔍️ [feat] Improve SEO
- 🏷️ [feat] Add or update types
- 💬 [feat] Add or update text and literals
- 🌐 [feat] Internationalization and localization
- 👔 [feat] Add or update business logic
- 📱 [feat] Work on responsive design
- 🚸 [feat] Improve user experience / usability
- 🩹 [fix] Simple fix for a non-critical issue
- 🥅 [fix] Catch errors
- 👽️ [fix] Update code due to external API changes
- 🔥 [fix] Remove code or files
- 🎨 [style] Improve structure/format of the code
- 🚑️ [fix] Critical hotfix
- 🎉 [chore] Begin a project
- 🔖 [chore] Release/Version tags
- 🚧 [wip] Work in progress
- 💚 [fix] Fix CI build
- 📌 [chore] Pin dependencies to specific versions
- 👷 [ci] Add or update CI build system
- 📈 [feat] Add or update analytics or tracking code
- ✏️ [fix] Fix typos
- ⏪️ [revert] Revert changes
- 📄 [chore] Add or update license
- 💥 [feat] Introduce breaking changes
- 🍱 [assets] Add or update assets
- ♿️ [feat] Improve accessibility
- 💡 [docs] Add or update comments in source code
- 🗃 [db] Perform database related changes
- 🔊 [feat] Add or update logs
- 🔇 [fix] Remove logs
- 🤡 [test] Mock things
- 🥚 [feat] Add or update an easter egg
- 🙈 [chore] Add or update .gitignore file
- 📸 [test] Add or update snapshots
- ⚗️ [experiment] Perform experiments
- 🚩 [feat] Add, update, or remove feature flags
- 💫 [ui] Add or update animations and transitions
- ⚰️ [refactor] Remove dead code
- 🦺 [feat] Add or update code related to validation
- ✈️ [feat] Improve offline support
## Guidelines for Splitting Commits
When analyzing the diff, consider splitting commits based on these criteria:
1. **Different concerns**: Changes to unrelated parts of the codebase
2. **Different types of changes**: Mixing features, fixes, refactoring, etc.
3. **File patterns**: Changes to different types of files (e.g., source code vs documentation)
4. **Logical grouping**: Changes that would be easier to understand or review separately
5. **Size**: Very large changes that would be clearer if broken down
## Examples
Good commit messages:
- ✨ [feat] Add user authentication system
- 🐛 [fix] Resolve memory leak in rendering process
- 📝 [docs] Update API documentation with new endpoints
- ♻️ [refactor] Simplify error handling logic in parser
- 🚨 [fix] Resolve linter warnings in component files
- 🧑 [chore] Improve developer tooling setup process
- 👔 [feat] Implement business logic for transaction validation
- 🩹 [fix] Address minor styling inconsistency in header
- 🚑 [fix] Patch critical security vulnerability in auth flow
- 🎨 [style] Reorganize component structure for better readability
- 🔥 [fix] Remove deprecated legacy code
- 🦺 [feat] Add input validation for user registration form
- 💚 [fix] Resolve failing CI pipeline tests
- 📈 [feat] Implement analytics tracking for user engagement
- 🔒️ [fix] Strengthen authentication password requirements
- ♿️ [feat] Improve form accessibility for screen readers
Examples with hints:
**Hint: "fix user login bug"**
- Code changes: Fix null pointer exception in auth service
- Generated: 🐛 [fix] Resolve null pointer exception in user login flow
**Hint: "API refactoring"**
- Code changes: Extract common validation logic into separate service
- Generated: ♻️ [refactor] Extract API validation logic into shared service
**Hint: "add dark mode support"**
- Code changes: Add CSS variables and theme toggle component
- Generated: ✨ [feat] Implement dark mode support with theme toggle
**Hint: "performance optimization"**
- Code changes: Implement memoization for expensive calculations
- Generated: ⚡️ [perf] Add memoization to optimize calculation performance
Example of splitting commits:
- First commit: ✨ [feat] Add new solc version type definitions
- Second commit: 📝 [docs] Update documentation for new solc versions
- Third commit: 🔧 [chore] Update package.json dependencies
- Fourth commit: 🏷 [feat] Add type definitions for new API endpoints
- Fifth commit: 🧵 [feat] Improve concurrency handling in worker threads
- Sixth commit: 🚨 [fix] Resolve linting issues in new code
- Seventh commit: ✅ [test] Add unit tests for new solc version features
- Eighth commit: 🔒️ [fix] Update dependencies with security vulnerabilities
## Important Notes
- **If no files are staged, abort the process immediately**.
- **Commit staged files only**: Unstaged files are assumed to be intentionally excluded from the current commit.
- **Do not make any pre-commit checks**. If a pre-commit hook is triggered and fails during the commit process, abort the process immediately.
- **Process hints carefully**: When a hint is provided, analyze it to understand the user's intent, but always verify it aligns with the actual code changes.
- **Hint priority**: Use hints to provide context and focus, but the actual code changes should determine the commit type and scope.
- Before committing, review the diff to **identify if multiple commits would be more appropriate**.
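The header rules above (optional emoji, `[<type>]`, a description, and a first line under 72 characters) can be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `check_commit_header` and the `KNOWN_TYPES` set are assumptions made for illustration, and the set only collects the type tags that appear in the lists above.

```python
import re

# Type tags that appear in the agent definition above (illustrative, not exhaustive).
KNOWN_TYPES = {
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
    "ci", "revert", "wip", "assets", "db", "experiment", "ui",
}

# Optional leading emoji (any run of non-space characters), then "[type] description".
HEADER_RE = re.compile(r"^(?:(?P<emoji>\S+)\s+)?\[(?P<type>[a-z]+)\]\s+(?P<desc>\S.*)$")


def check_commit_header(header: str, max_len: int = 72) -> list:
    """Return a list of problems found in the first line of a commit message."""
    problems = []
    # "Keep the first line under 72 characters" is read here as len(header) < 72.
    if len(header) >= max_len:
        problems.append(f"header has {len(header)} characters, expected fewer than {max_len}")
    match = HEADER_RE.match(header)
    if match is None:
        problems.append("header does not match '<emoji> [<type>] <description>'")
        return problems
    if match.group("type") not in KNOWN_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    return problems


if __name__ == "__main__":
    for line in ("✨ [feat] Add user authentication system", "added stuff"):
        print(repr(line), check_commit_header(line))
```

The check only looks at the header format. Whether the message describes the staged changes accurately still needs a review of the diff itself.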


@@ -0,0 +1,71 @@
---
name: staged-code-reviewer
description: Reviews staged git changes for quality, security, and performance. Analyzes files in the git index (git diff --cached) and provides actionable, line-by-line feedback.
---
You are a specialized code review agent. Your sole function is to analyze git changes that have been staged for commit. You must ignore unstaged changes, untracked files, and non-code files (e.g., binaries, data). Your review should be direct, objective, and focused on providing actionable improvements.
## Core Directives
1. Analyze Staged Code: Use the output of `git diff --cached` as the exclusive source for your review.
2. Prioritize by Impact: Focus first on security vulnerabilities and critical bugs, then on performance, and finally on code quality and style.
3. Provide Actionable Feedback: Every identified issue must be accompanied by a concrete suggestion for improvement.
## Review Criteria
For each change, evaluate the following:
* Security: Check for hardcoded secrets, injection vulnerabilities (SQL, XSS), insecure direct object references, and missing authentication/authorization.
* Correctness & Reliability: Verify the logic works as intended, includes proper error handling, and considers edge cases.
* Performance: Identify inefficient algorithms, potential bottlenecks, and expensive operations (e.g., N+1 database queries).
* Code Quality: Assess readability, simplicity, naming conventions, and code duplication (DRY principle).
* Test Coverage: Ensure that new logic is accompanied by meaningful tests.
## Critical Issues to Flag Immediately
* Hardcoded credentials, API keys, or tokens.
* SQL or command injection vulnerabilities.
* Cross-Site Scripting (XSS) vulnerabilities.
* Missing or incorrect authentication/authorization checks.
* Use of unsafe functions like eval() without proper sanitization.
## Output Format
Your entire response must follow this structure. Do not deviate.
Start with a summary header:
Staged Code Review
---
Files Reviewed: [List of staged files]
Total Changes: [Number of lines added/removed]
---
Then, for each file with issues, create a section:
### filename.ext
(One-line summary of the changes in this file.)
**CRITICAL ISSUES**
* (Line X): [Concise Issue Title]
Problem: [Clear description of the issue.]
Suggestion: [Specific, actionable improvement.]
Reasoning: [Why the change is necessary (e.g., security, performance).]
**MAJOR ISSUES**
* (Line Y): [Concise Issue Title]
Problem: [Clear description of the issue.]
Suggestion: [Specific, actionable improvement, including code examples if helpful.]
Reasoning: [Why the change is necessary.]
**MINOR ISSUES**
* (Line Z): [Concise Issue Title]
Problem: [Clear description of the issue.]
Suggestion: [Specific, actionable improvement.]
Reasoning: [Why the change is necessary.]
If a file has no issues, state: "No issues found."
If you see well-implemented code, you may optionally add a "Positive Feedback" section to acknowledge it.
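The "Files Reviewed" and "Total Changes" fields of the summary header can be derived from the text that `git diff --cached` prints. The sketch below assumes the default unified diff format with `a/` and `b/` path prefixes; the function name `summarize_staged_diff` is hypothetical and the code is not taken from the repository.

```python
def _strip_prefix(path: str) -> str:
    """Drop the 'a/' or 'b/' prefix that git puts in front of paths."""
    return path[2:] if path.startswith(("a/", "b/")) else path


def summarize_staged_diff(diff_text: str) -> dict:
    """Map each file path to an (added, removed) line count."""
    stats = {}
    old_path = current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; header lines follow until the first hunk.
            old_path = current = None
            in_hunk = False
        elif not in_hunk and line.startswith("--- "):
            old_path = _strip_prefix(line[4:])
        elif not in_hunk and line.startswith("+++ "):
            new_path = _strip_prefix(line[4:])
            # Deleted files have /dev/null as the new path, so fall back to the old one.
            current = new_path if new_path != "/dev/null" else old_path
            stats.setdefault(current, [0, 0])
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None:
            if line.startswith("+"):
                stats[current][0] += 1
            elif line.startswith("-"):
                stats[current][1] += 1
    return {path: tuple(counts) for path, counts in stats.items()}
```

In practice the input could come from `subprocess.run(["git", "diff", "--cached"], capture_output=True, text=True).stdout`. Tracking whether the parser is inside a hunk keeps removed lines whose content begins with `--` from being mistaken for file headers.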


@@ -0,0 +1 @@
Use staged-code-reviewer sub agent to perform code review


@@ -0,0 +1,13 @@
Please analyze and fix the GitHub issue: $ARGUMENTS.
Follow these steps:
1. Use `gh issue view` to get the issue details
2. Understand the problem described in the issue
3. Search the codebase for relevant files
4. Implement the necessary changes to fix the issue
5. Write and run tests to verify the fix
6. Ensure code passes linting and type checking
7. Create a descriptive commit message
Remember to use the GitHub CLI (`gh`) for all GitHub-related tasks.
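Step 1 can also be scripted. The sketch below shells out to `gh issue view` with its `--json` flag to get machine-readable issue details; the helper names `build_issue_command` and `fetch_issue` are assumptions for illustration, and the field list (`title,body,labels`) is one possible choice rather than something the command file prescribes.

```python
import json
import subprocess


def build_issue_command(issue) -> list:
    """Build the gh invocation that prints selected issue fields as JSON."""
    return ["gh", "issue", "view", str(issue), "--json", "title,body,labels"]


def fetch_issue(issue, run=subprocess.run) -> dict:
    """Run gh and parse its JSON output; `run` is injectable so tests can fake it."""
    result = run(build_issue_command(issue), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Requires an authenticated gh CLI inside a clone of the repository.
    # The issue number is a placeholder.
    details = fetch_issue(1)
    print(details["title"])
```

Passing `check=True` makes a failed `gh` call raise `subprocess.CalledProcessError` instead of silently returning empty output.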


@@ -0,0 +1,16 @@
Use commit-crafter sub agent to make a standardized commit
## Usage
```
/make-commit [hint]
```
**Parameters:**
- `hint` (optional): A brief description or context to help customize the commit message. The hint will be used to guide the commit message generation while maintaining conventional commit standards.
**Examples:**
- `/make-commit` - Generate commit message based purely on code changes
- `/make-commit "API refactoring"` - Guide the commit to focus on API-related changes
- `/make-commit "fix user login bug"` - Provide context about the specific issue being fixed
- `/make-commit "add dark mode support"` - Indicate the feature being added

57
.dockerignore Normal file

@@ -0,0 +1,57 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
*.egg
# Virtual environments
venv/
env/
ENV/
.venv
# IDEs
.vscode/
.idea/
*.swp
*.swo
*~
# Git
.git/
.gitignore
# Testing
.pytest_cache/
.coverage
htmlcov/
# Documentation
docs/_build/
# OS
.DS_Store
Thumbs.db
# Cache
.cache/
*.log
# Jupyter
.ipynb_checkpoints/
# Model files (will be mounted from host)
models/
*.pth
*.onnx
examples/
assets/
docs/
tests/
README.docker.md

31
.github/workflows/deploy-doc.yml vendored Normal file

@@ -0,0 +1,31 @@
name: "Sphinx: Render docs"
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - name: Build HTML
        uses: ammaraskar/sphinx-action@7.0.0
        with:
          pre-build-command: |
            apt-get update && apt-get install -y git
            pip install uv
            uv pip install --system . .[docs]
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: html-docs
          path: docs/build/html/
      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.ref == 'refs/heads/main'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/build/html

41
.github/workflows/pr-welcome.yml vendored Normal file

@@ -0,0 +1,41 @@
name: PR Welcome Bot
on:
  pull_request:
    types: [opened]
permissions:
  pull-requests: write
  issues: write
jobs:
  welcome:
    runs-on: ubuntu-latest
    steps:
      - name: Post Welcome Comment
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const prNumber = context.issue.number;
            const prAuthor = context.payload.pull_request.user.login;
            const welcomeMessage = `
            👋 Hello @${prAuthor}, thank you for contributing to this project! 🎉
            We've received your Pull Request and the team will review it as soon as possible.
            In the meantime, please ensure:
            - [ ] Your code follows the project's coding style
            - [ ] Relevant tests have been added and are passing
            - [ ] Documentation has been updated if needed
            If you have any questions, feel free to ask here. Happy coding! 😊
            `;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: prNumber,
              body: welcomeMessage
            });

34
.github/workflows/publish.yml vendored Normal file

@@ -0,0 +1,34 @@
name: Publish to PyPI
on:
  push:
    tags:
      - 'v*'
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.cargo/bin" >> $GITHUB_PATH
      - name: Build package with uv
        run: |
          uv build
      - name: Publish to PyPI
        env:
          UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
        run: |
          uv publish --token $UV_PUBLISH_TOKEN
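The publish workflow only runs for pushed tags that match the filter `'v*'`. As a rough way to reason about which tag names qualify, the sketch below approximates GitHub's filter syntax for the two wildcards `*` (any characters except `/`) and `**` (any characters); other filter features such as `?`, `+`, character ranges, and `!` negation are deliberately left out. The function names are hypothetical and this is not how GitHub itself evaluates filters.

```python
import re


def filter_to_regex(pattern: str):
    """Translate a tag filter that uses only '*' and '**' into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # '**' may span '/' separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stops at '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def tag_triggers_publish(tag: str, patterns=("v*",)) -> bool:
    """Return True if the tag name fully matches at least one filter pattern."""
    return any(filter_to_regex(p).fullmatch(tag) is not None for p in patterns)


if __name__ == "__main__":
    for tag in ("v1.0.1", "release-1", "v1/beta"):
        print(tag, tag_triggers_publish(tag))
```

Under this approximation a tag such as `v1.0.1` triggers the workflow, while `release-1` and a tag containing a slash such as `v1/beta` do not.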

27
.github/workflows/python-lint.yml vendored Normal file

@@ -0,0 +1,27 @@
name: Python Linting
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff
      - name: Run ruff
        run: ruff check .

35
.github/workflows/test.yaml vendored Normal file

@@ -0,0 +1,35 @@
name: Run Tests with Pytest
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.cargo/bin" >> $GITHUB_PATH
      - name: Install dependencies
        run: |
          uv sync --extra test
      - name: Run tests with pytest
        run: |
          uv run pytest -v tests/

357
.gitignore vendored

@@ -1,10 +1,349 @@
**/__pycache__
**/.vscode
**/train_result
# Created by https://www.toptal.com/developers/gitignore/api/macos,visualstudiocode,pycharm,python
# Edit at https://www.toptal.com/developers/gitignore?templates=macos,visualstudiocode,pycharm,python
**/logs
**/.cache
**/tmp*
**/data
**/*cache
**/ckpt
### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
### macOS Patch ###
# iCloud generated files
*.icloud
### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf
# AWS User-specific
.idea/**/aws.xml
# Generated files
.idea/**/contentModel.xml
# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml
# Gradle
.idea/**/gradle.xml
.idea/**/libraries
# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr
# CMake
cmake-build-*/
# Mongo Explorer plugin
.idea/**/mongoSettings.xml
# File-based project format
*.iws
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# SonarLint plugin
.idea/sonarlint/
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
# Editor-based Rest Client
.idea/httpRequests
# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr
# Sonarlint plugin
# https://plugins.jetbrains.com/plugin/7973-sonarlint
.idea/**/sonarlint/
# SonarQube Plugin
# https://plugins.jetbrains.com/plugin/7238-sonarqube-community-plugin
.idea/**/sonarIssues.xml
# Markdown Navigator plugin
# https://plugins.jetbrains.com/plugin/7896-markdown-navigator-enhanced
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator-enh.xml
.idea/**/markdown-navigator/
# Cache file creation bug
# See https://youtrack.jetbrains.com/issue/JBR-2257
.idea/$CACHE_FILE$
# CodeStream plugin
# https://plugins.jetbrains.com/plugin/12206-codestream
.idea/codestream.xml
# Azure Toolkit for IntelliJ plugin
# https://plugins.jetbrains.com/plugin/8053-azure-toolkit-for-intellij
.idea/**/azureSettings.xml
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
### Python Patch ###
# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml
# ruff
.ruff_cache/
# LSP config files
pyrightconfig.json
### VisualStudioCode ###
**/.vscode
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide
# End of https://www.toptal.com/developers/gitignore/api/macos,visualstudiocode,pycharm,python
uv.lock
**/train_result
**/*.onnx
**/*.png
**/*.jpg
**/augraphy_cache

23
.pre-commit-config.yaml Normal file

@@ -0,0 +1,23 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.6
hooks:
- id: ruff
args: [--fix, --respect-gitignore, --config=pyproject.toml]
exclude: ^texteller/models/thrid_party/paddleocr/
- id: ruff-format
args: [--config=pyproject.toml]
exclude: ^texteller/models/thrid_party/paddleocr/
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-toml
- id: check-added-large-files
exclude: assets/
- id: check-case-conflict
- id: check-merge-conflict
- id: debug-statements

1
.python-version Normal file

@@ -0,0 +1 @@
3.10

69
Dockerfile Normal file

@@ -0,0 +1,69 @@
# Use NVIDIA CUDA base image with Python 3.12 (CUDA 12.8 for RTX 5080)
FROM nvidia/cuda:12.8.0-base-ubuntu24.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
CUDA_VISIBLE_DEVICES=0
# Configure apt to use Tsinghua mirror (清华源)
RUN sed -i 's@//archive.ubuntu.com@//mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list.d/ubuntu.sources && \
sed -i 's@//security.ubuntu.com@//mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list.d/ubuntu.sources
# Install Python and system dependencies (Ubuntu 24.04 uses Python 3.12)
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
python3-venv \
git \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender-dev \
libgomp1 \
wget \
&& rm -rf /var/lib/apt/lists/*
# Create symlink for python command
RUN ln -sf /usr/bin/python3 /usr/bin/python
# Configure pip to use Tsinghua mirror (清华源) and allow system-wide installs
RUN python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
python3 -m pip config set global.break-system-packages true
# Upgrade pip (ignore system-installed packages)
RUN pip install --upgrade --ignore-installed pip setuptools wheel
# Set working directory
WORKDIR /app
# Copy project files
COPY . /app/
# Install PyTorch with CUDA support first
# Note: no --index-url is passed below, so pip uses the Tsinghua index configured above and installs whichever CUDA build is the default wheel there
RUN pip install torch torchvision
# Install the package and dependencies
# Set version manually since .git is excluded by .dockerignore
ENV SETUPTOOLS_SCM_PRETEND_VERSION=1.0.0
RUN pip install -e .
# Install additional dependencies for server
RUN pip install requests
# Expose port for Ray Serve
EXPOSE 8001
# Create cache directory for models
RUN mkdir -p /root/.cache/huggingface/hub
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python3 -c "import requests; requests.get('http://localhost:8001/', timeout=5)" || exit 1
# Default command to start the server (port 8001)
CMD ["texteller", "launch", "-p", "8001"]

202
LICENSE Normal file

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright OleehyO
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

253
README.docker.md Normal file

@@ -0,0 +1,253 @@
# TexTeller Docker Deployment Guide
This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).
## Prerequisites
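1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later (RTX 50-series GPUs such as the RTX 5080 need a newer driver branch; check NVIDIA's release notes for the minimum version)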
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`
## Setup NVIDIA Container Toolkit
If you haven't installed the NVIDIA Container Toolkit:
```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
```
## Quick Start
The easiest way to deploy is using the provided deployment script:
```bash
# Run all checks and deploy
./deploy.sh deploy
# Or check system requirements first
./deploy.sh check
# View available commands
./deploy.sh
```
## Build and Run
### Using the Deployment Script (Recommended)
```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy
# Just build the image
./deploy.sh build
# Start/stop the service
./deploy.sh start
./deploy.sh stop
# View logs
./deploy.sh logs
# Check status
./deploy.sh status
```
### Using Docker Compose
```bash
# Build and start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
```
### Using Docker directly
```bash
# Build the image
docker build -t texteller:latest .
# Run the container
docker run -d \
--name texteller-server \
--gpus '"device=0"' \
-p 8001:8001 \
-v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
-e CUDA_VISIBLE_DEVICES=0 \
texteller:latest
```
## API Usage
The server accepts JSON requests with either base64-encoded images or image URLs at the `/predict` endpoint.
### Using base64-encoded image
```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
-H "Content-Type: application/json" \
-d '{
"image_base64": "..."
}'
```
### Using image URL
```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/math_equation.png"
}'
```
### Python client example
```python
import requests
import base64
# Method 1: Using base64
with open("equation.png", "rb") as f:
image_base64 = base64.b64encode(f.read()).decode()
response = requests.post(
"http://localhost:8001/predict",
json={"image_base64": image_base64}
)
print(response.json())
# Method 2: Using URL
response = requests.post(
"http://localhost:8001/predict",
json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
```
Or use the provided test script:
```bash
# Test with a local image
python examples/test_server.py path/to/equation.png
# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```
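The two request shapes in this section differ only in the JSON key. A minimal sketch of a helper that builds either body, assuming the `image_base64` and `image_url` field names shown above; `build_payload` is a hypothetical name used for illustration and is not part of TexTeller:

```python
import base64


def build_payload(image_bytes=None, image_url=None):
    """Return the JSON body for /predict (hypothetical helper, not a TexTeller API)."""
    # Exactly one input source should be given.
    if (image_bytes is None) == (image_url is None):
        raise ValueError("pass exactly one of image_bytes or image_url")
    if image_bytes is not None:
        # Raw bytes are base64-encoded and sent as ASCII text.
        return {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    return {"image_url": image_url}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the client example above.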
### Response format
Success response:
```json
{
"result": "\\frac{a}{b} = c"
}
```
Error response:
```json
{
"error": "Failed to decode image"
}
```
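A client can tell the two response shapes apart by checking for the `error` key. The following is a sketch under the assumption that every response is a JSON object with either `result` or `error`, as in the examples above; `extract_latex` is a hypothetical helper name:

```python
import json


def extract_latex(body):
    """Return the LaTeX string from a /predict response body, or raise on error."""
    data = json.loads(body)
    if "error" in data:
        # The guide shows errors as {"error": "<message>"}.
        raise RuntimeError(data["error"])
    return data["result"]
```

With `requests`, the body would be `response.text`. Raising on the error shape keeps callers from mistaking an error message for a formula.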
## Configuration
You can configure the service by modifying environment variables in `docker-compose.yml`:
- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)
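To illustrate how these variables and their defaults fit together, here is a small sketch that resolves them from an environment mapping. How the container actually consumes the variables is not shown in this guide, so `read_settings` and the returned key names are assumptions made for illustration:

```python
import os

# Defaults copied from the list above.
DEFAULTS = {
    "CUDA_VISIBLE_DEVICES": "0",
    "RAY_NUM_REPLICAS": "1",
    "RAY_NCPU_PER_REPLICA": "4",
    "RAY_NGPU_PER_REPLICA": "1",
}


def read_settings(environ=None):
    """Resolve the documented variables, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ
    raw = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "gpu_ids": [part.strip() for part in raw["CUDA_VISIBLE_DEVICES"].split(",")],
        "num_replicas": int(raw["RAY_NUM_REPLICAS"]),
        "ncpu_per_replica": int(raw["RAY_NCPU_PER_REPLICA"]),
        # Parsed as float so that a fractional GPU share would also be accepted.
        "ngpu_per_replica": float(raw["RAY_NGPU_PER_REPLICA"]),
    }
```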
## Monitoring
```bash
# Check container status
docker ps
# View real-time logs
docker-compose logs -f texteller
# Check GPU usage
nvidia-smi
# Check container resource usage
docker stats texteller-server
```
## Troubleshooting
### GPU not detected
```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Port already in use
Change the port mapping in `docker-compose.yml`:
```yaml
ports:
- "8080:8001" # Host port 8080 -> Container port 8001
```
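Before picking a new host port, it can help to check whether that port is free. A minimal sketch using only the standard library; `is_port_free` is a hypothetical helper, and it only probes the loopback address by default, so a service bound to a different interface would not be detected:

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another socket already holds the port.
            return False
    return True
```

For example, `is_port_free(8080)` can be checked before mapping host port 8080 in `docker-compose.yml`.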
### Model not found
Ensure the model is downloaded to the correct location:
```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```
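The same check can be scripted. This sketch assumes the cache location given in the prerequisites and treats a non-empty directory as "downloaded"; it does not verify that the weights are complete, and both function names are hypothetical:

```python
from pathlib import Path


def model_cache_dir(home=None):
    """Path where this guide expects the pre-downloaded model."""
    home = Path.home() if home is None else Path(home)
    return home / ".cache" / "huggingface" / "hub" / "models--OleehyO--TexTeller"


def is_model_cached(home=None):
    """True if the cache directory exists and contains at least one entry."""
    cache = model_cache_dir(home)
    return cache.is_dir() and any(cache.iterdir())
```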
## Performance Notes
- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading
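Because model loading can take up to a minute, clients that start right after the container may want to poll until the server answers. The following is a sketch, not part of TexTeller: `wait_until_ready` and `http_probe` are hypothetical names, the 90-second timeout is an arbitrary choice, and the probed URL is the one used by the Dockerfile health check.

```python
import time
import urllib.error
import urllib.request


def http_probe(url="http://localhost:8001/"):
    """Return True once the server answers with any HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the server is up and answering.
        return True


def wait_until_ready(probe, timeout=90.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and similar errors mean "not up yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

A typical call would be `wait_until_ready(http_probe)` before sending the first request. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.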
## Advanced Configuration
### Multiple GPUs
To use multiple GPUs, modify `docker-compose.yml`:
```yaml
environment:
- CUDA_VISIBLE_DEVICES=0,1
- RAY_NUM_REPLICAS=2
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0', '1']
capabilities: [gpu]
```
### Production deployment
For production, consider:
1. Using a reverse proxy (nginx/traefik) for SSL/TLS
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling

221
README.md Normal file

@@ -0,0 +1,221 @@
📄 English | <a href="./assets/README_zh.md">中文</a>
<div align="center">
<h1>
<img src="./assets/fire.svg" width="60" height="60">
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
<img src="./assets/fire.svg" width="60" height="60">
</h1>
[![](https://img.shields.io/badge/API-Docs-orange.svg?logo=read-the-docs)](https://oleehyo.github.io/TexTeller/)
[![](https://img.shields.io/badge/Data-Texteller3.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas-80M)
[![](https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
[![](https://img.shields.io/badge/docker-pull-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
[![](https://img.shields.io/badge/License-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)
</div>
https://github.com/OleehyO/TexTeller/assets/56267907/532d1471-a72e-4960-9677-ec6c19db289f
TexTeller is an end-to-end formula recognition model, capable of converting images into corresponding LaTeX formulas.
TexTeller was trained on **80M image-formula pairs** (the previous dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared with [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases.
>[!NOTE]
> If you would like to provide feedback or suggestions for this project, feel free to start a discussion in the [Discussions section](https://github.com/OleehyO/TexTeller/discussions).
---
<table>
<tr>
<td>
## 🔖 Table of Contents
- [Getting Started](#-getting-started)
- [Web Demo](#-web-demo)
- [Server](#-server)
- [Python API](#-python-api)
- [Formula Detection](#-formula-detection)
- [Training](#-training)
</td>
<td>
<div align="center">
<figure>
<img src="assets/cover.png" width="800">
<figcaption>
<p>Images that can be recognized by TexTeller</p>
</figcaption>
</figure>
<div>
</div>
</div>
</td>
</tr>
</table>
## 📮 Change Log
<!-- - [2025-08-15] We have published the [technical report](https://arxiv.org/abs/2508.09220) of TexTeller. The model evaluated on the Benchmark (which was trained from scratch and had its handwritten subset filtered based on the test set) is available at https://huggingface.co/OleehyO/TexTeller_en. **Please do not directly use the open-source version of TexTeller3.0 to reproduce the experimental results of handwritten formulas**, as this model includes the test sets of these benchmarks. -->
- [2025-08-15] We have open-sourced the [training dataset](https://huggingface.co/datasets/OleehyO/latex-formulas-80M) of TexTeller 3.0. Please note that the handwritten* subset of this dataset is collected from existing open-source handwritten datasets (including both training and test sets). If you need to use the handwritten* subset for your experimental ablation, please filter the test labels first.
- [2024-06-06] **TexTeller3.0 released!** The training data has been increased to **80M** (**10x more than** TexTeller2.0 and also improved in data diversity). TexTeller3.0's new features:
- Support for scanned images, handwritten formulas, and mixed English/Chinese formulas.
- OCR abilities in both Chinese and English for printed images.
- [2024-05-02] Support **paragraph recognition**.
- [2024-04-12] **Formula detection model** released!
- [2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has been increased to 7.5M (15x more than TexTeller1.0 and also improved in data quality). The trained TexTeller2.0 demonstrated **superior performance** in the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
> [Here](./assets/test.pdf) are more test images and a horizontal comparison of various recognition models.
## 🚀 Getting Started
1. Install uv:
```bash
pip install uv
```
2. Install the project's dependencies:
```bash
uv pip install texteller
```
3. If you are using the CUDA backend, you may need to install `onnxruntime-gpu`:
```bash
uv pip install texteller[onnxruntime-gpu]
```
4. Run the following command to start inference:
```bash
texteller inference "/path/to/image.{jpg,png}"
```
> See `texteller inference --help` for more details
## 🌐 Web Demo
Run the following command:
```bash
texteller web
```
Enter `http://localhost:8501` in a browser to view the web demo.
> [!NOTE]
> Paragraph recognition cannot restore the structure of a document; it can only recognize its content.
## 🖥️ Server
We use [ray serve](https://github.com/ray-project/ray) to provide an API server for TexTeller. To start the server, run the following command:
```bash
texteller launch
```
| Parameter | Description |
| --------- | -------- |
| `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*. |
| `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*. |
| `-p` | The server's service port, *default is 8000*. |
| `--num-replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput. |
| `--ncpu-per-replica` | The number of CPU cores used per service replica, *default is 1*. |
| `--ngpu-per-replica` | The number of GPUs used per service replica, *default is 1*. You can set this value between 0 and 1 to run multiple service replicas on one GPU, thereby improving GPU utilization. (Note: if `--num-replicas` is 2 and `--ngpu-per-replica` is 0.7, then 2 GPUs must be available.) |
| `--num-beams` | The number of beams for beam search, *default is 1*. |
| `--use-onnx` | Perform inference using ONNX Runtime, *disabled by default*. |
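
The GPU note for `--ngpu-per-replica` follows from the fact that a replica cannot be split across GPUs: with a share of 0.7, only one replica fits on each GPU. A minimal sketch of that arithmetic, assuming shares in the range (0, 1] and that every replica must fit entirely on one GPU; `gpus_required` is a hypothetical helper, not a TexTeller or Ray function:

```python
import math


def gpus_required(num_replicas, ngpu_per_replica):
    """Whole GPUs needed when each replica must fit on a single GPU."""
    if not 0 < ngpu_per_replica <= 1:
        raise ValueError("this sketch only covers shares in (0, 1]")
    # Replicas that fit side by side on one GPU (epsilon guards float rounding).
    per_gpu = math.floor(1 / ngpu_per_replica + 1e-9)
    return math.ceil(num_replicas / per_gpu)
```

For instance, `gpus_required(2, 0.7)` gives 2, matching the note in the table, while two replicas at a share of 0.5 fit on a single GPU.
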
To send requests to the server:
```python
# client_demo.py
import requests
server_url = "http://127.0.0.1:8000/predict"
img_path = "/path/to/your/image"
with open(img_path, 'rb') as img:
files = {'img': img}
response = requests.post(server_url, files=files)
print(response.text)
```
## 🐍 Python API
We provide several easy-to-use Python APIs for formula OCR scenarios. Please refer to our [documentation](https://oleehyo.github.io/TexTeller/) to learn about the corresponding API interfaces and usage.
## 🔍 Formula Detection
TexTeller's formula detection model is trained on 3,415 images of Chinese materials and 8,272 images from the [IBEM dataset](https://zenodo.org/records/4757865).
<div align="center">
<img src="./assets/det_rec.png" width=250>
</div>
We provide a formula detection interface in the Python API. Please refer to our [API documentation](https://oleehyo.github.io/TexTeller/) for more details.
## 🏋️‍♂️ Training
Please set up your environment before training:
1. Install the dependencies for training:
```bash
uv pip install texteller[train]
```
2. Clone the repository:
```bash
git clone https://github.com/OleehyO/TexTeller.git
```
### Dataset
We provide an example dataset in the `examples/train_texteller/dataset/train` directory. You can arrange your own training data in the same format as the example dataset.
### Training the Model
In the `examples/train_texteller/` directory, run the following command:
```bash
accelerate launch train.py
```
Training arguments can be adjusted in [`train_config.yaml`](./examples/train_texteller/train_config.yaml).
## 📅 Plans
- [X] ~~Train the model with a larger dataset~~
- [X] ~~Recognition of scanned images~~
- [X] ~~Support for English and Chinese scenarios~~
- [X] ~~Handwritten formulas support~~
- [ ] PDF document recognition
- [ ] Inference acceleration
## ⭐️ Stargazers over time
[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)
## 👥 Contributors
<a href="https://github.com/OleehyO/TexTeller/graphs/contributors">
<img src="https://contrib.rocks/image?repo=OleehyO/TexTeller" />
</a>

219
assets/README_zh.md Normal file

@@ -0,0 +1,219 @@
📄 中文 | [English](../README.md)
<div align="center">
<h1>
<img src="./fire.svg" width=60, height=60>
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
<img src="./fire.svg" width=60, height=60>
</h1>
[![](https://img.shields.io/badge/API-文档-orange.svg?logo=read-the-docs)](https://oleehyo.github.io/TexTeller/)
[![](https://img.shields.io/badge/数据-TexTeller3.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas-80M)
[![](https://img.shields.io/badge/权重-TexTeller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
[![](https://img.shields.io/badge/docker-镜像-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
[![](https://img.shields.io/badge/协议-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)
</div>
https://github.com/OleehyO/TexTeller/assets/56267907/532d1471-a72e-4960-9677-ec6c19db289f
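TexTeller is an end-to-end formula recognition model that converts images into the corresponding LaTeX formulas.
TexTeller was trained on **80M image-formula pairs** (the previous dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared with [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which uses a dataset on the order of 100K pairs, TexTeller has **stronger generalization** and **higher accuracy**, covering most use cases.
>[!NOTE]
> If you would like to give feedback or suggestions for this project, feel free to start a discussion in the [Discussions](https://github.com/OleehyO/TexTeller/discussions) section.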
---
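<table>
<tr>
<td>
## 🔖 Table of Contents
- [Quick Start](#-quick-start)
- [Web Demo](#-web-demo)
- [Server](#-server)
- [Python API](#-python-api)
- [Formula Detection](#-formula-detection)
- [Training](#-training)
</td>
<td>
<div align="center">
<figure>
<img src="cover.png" width="800">
<figcaption>
<p>Examples of images that TexTeller can recognize</p>
</figcaption>
</figure>
<div>
</div>
</div>
</td>
</tr>
</table>
## 📮 Change Log
<!-- - [2025-08-15] We released the [technical report](https://arxiv.org/abs/2508.09220) for TexTeller. The model evaluated on the benchmark (trained from scratch, with the handwritten subset filtered against the test sets) is available at https://huggingface.co/OleehyO/TexTeller_en. **Please do not directly use the open-source TexTeller3.0 release to reproduce the handwritten-formula results in the experiments**, because its training data includes the test sets of these benchmarks. -->
- [2025-08-15] We open-sourced the [training dataset](https://huggingface.co/datasets/OleehyO/latex-formulas-80M) for TexTeller 3.0. The handwritten* subset comes from existing open-source handwritten datasets (**including both training and test sets**); please do not use this subset for experimental ablations.
- [2024-06-06] **TexTeller3.0 released!** The training data has grown to **80M** (**10x** that of TexTeller2.0, with improved data diversity). New features of TexTeller3.0:
  - Support for scanned images, handwritten formulas, and mixed English/Chinese formulas
  - OCR for printed formulas with mixed Chinese and English text
- [2024-05-02] Support for **paragraph recognition**.
- [2024-04-12] **Formula detection model** released!
- [2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has grown to 7.5M (15x that of the previous version, with improved data quality). The trained TexTeller2.0 showed **better performance** on the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
> [Here](./test.pdf) are more test images and a side-by-side comparison of various recognition models.
## 🚀 Quick Start
1. Install uv:
```bash
pip install uv
```
2. Install the project's dependencies:
```bash
uv pip install texteller
```
3. If you use the CUDA backend, you may need to install `onnxruntime-gpu`:
```bash
uv pip install texteller[onnxruntime-gpu]
```
4. Run the following command to start inference:
```bash
texteller inference "/path/to/image.{jpg,png}"
```
> See `texteller inference --help` for more parameters.
## 🌐 Web Demo
Run from the command line:
```bash
texteller web
```
Enter `http://localhost:8501` in your browser to view the web demo.
> [!NOTE]
> Paragraph recognition cannot restore the structure of a document; it only recognizes its content.
## 🖥️ Server
We use [ray serve](https://github.com/ray-project/ray) to provide an API server for TexTeller. To start the server:
```bash
texteller launch
```
| Parameter | Description |
| --------- | -------- |
| `-ckpt` | Path to the weights file, *default is TexTeller's pretrained weights*. |
| `-tknz` | Path to the tokenizer, *default is TexTeller's tokenizer*. |
| `-p` | The server's service port, *default is 8000*. |
| `--num-replicas` | The number of service replicas, *default is 1*. Use more replicas to achieve greater throughput. |
| `--ncpu-per-replica` | The number of CPU cores used per service replica, *default is 1*. |
| `--ngpu-per-replica` | The number of GPUs used per service replica, *default is 1*. Set this to a value between 0 and 1 to run multiple service replicas on one GPU, thereby improving GPU utilization. (Note: if `--num-replicas` is 2 and `--ngpu-per-replica` is 0.7, then 2 GPUs must be available.) |
| `--num-beams` | The number of beams for beam search, *default is 1*. |
| `--use-onnx` | Perform inference using ONNX Runtime, *disabled by default*. |
To send requests to the server:
```python
# client_demo.py
import requests

server_url = "http://127.0.0.1:8000/predict"
img_path = "/path/to/your/image"

# Upload the image as multipart form data under the "img" field.
with open(img_path, 'rb') as img:
    files = {'img': img}
    response = requests.post(server_url, files=files)

# Print the body of the server's response.
print(response.text)
```
## 🐍 Python API
We provide several easy-to-use Python APIs for formula OCR scenarios. See the [API documentation](https://oleehyo.github.io/TexTeller/) for the available interfaces and their usage.
## 🔍 Formula Detection
TexTeller's formula detection model is trained on 3,415 images of Chinese materials and 8,272 images from the [IBEM dataset](https://zenodo.org/records/4757865).
<div align="center">
<img src="./det_rec.png" width=250>
</div>
We provide a formula detection interface in the Python API. See the [API documentation](https://oleehyo.github.io/TexTeller/) for details.
## 🏋️‍♂️ Training
Set up your environment before training:
1. Install the dependencies for training:
```bash
uv pip install texteller[train]
```
2. Clone the repository:
```bash
git clone https://github.com/OleehyO/TexTeller.git
```
### Dataset
We provide an example dataset in the `examples/train_texteller/dataset/train` directory. You can place your own training data there, following the format of the example dataset.
### Training the Model
In the `examples/train_texteller/` directory, run the following command:
```bash
accelerate launch train.py
```
Training arguments can be adjusted in [`train_config.yaml`](../examples/train_texteller/train_config.yaml).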
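## 📅 Plans
- [X] ~~Train the model with a larger dataset~~
- [X] ~~Recognition of scanned images~~
- [X] ~~Support for English and Chinese scenarios~~
- [X] ~~Handwritten formulas support~~
- [ ] PDF document recognition
- [ ] Inference acceleration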
## ⭐️ Stargazers over time
[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)
## 👥 Contributors
<a href="https://github.com/OleehyO/TexTeller/graphs/contributors">
<img src="https://contrib.rocks/image?repo=OleehyO/TexTeller" />
</a>

Binary file not shown.

Binary file not shown.

BIN
assets/cover.png Normal file

Binary file not shown.

Size: 3.4 MiB

BIN
assets/det_rec.png Normal file

Binary file not shown.

Size: 919 KiB

460
assets/fire.svg Normal file

@@ -0,0 +1,460 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="" width="200px" height="100px" viewBox="0 0 100 100" preserveAspectRatio="xMidYMid">
<defs>
<filter id="ldio-ekpf7uvh2aq-filter" filterUnits="userSpaceOnUse" x="0" y="0" width="100" height="100">
<feGaussianBlur in="SourceGraphic" stdDeviation="3"></feGaussianBlur>
<feComponentTransfer result="cutoff">
<feFuncA type="linear" slope="10" intercept="-5"></feFuncA>
</feComponentTransfer>
</filter>
</defs><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="45" cy="154.67770829199992" r="42" fill="#e15b64">
<animate attributeName="cy" values="154.67770829199992;-27.568110790210763" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7914508173328552s"></animate>
<animate attributeName="r" values="42;0;0" keyTimes="0;0.6593879177915443;1" dur="1s" repeatCount="indefinite" begin="-0.7914508173328552s"></animate>
</circle><circle cx="53" cy="156.51873756667007" r="43" fill="#e15b64">
<animate attributeName="cy" values="156.51873756667007;-28.593472199379597" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8990601299952956s"></animate>
<animate attributeName="r" values="43;0;0" keyTimes="0;0.9199190750649376;1" dur="1s" repeatCount="indefinite" begin="-0.8990601299952956s"></animate>
</circle><circle cx="22" cy="118.4676277511406" r="6" fill="#e15b64">
<animate attributeName="cy" values="118.4676277511406;-1.812134766063739" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2574158626531723s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.7424894336620584;1" dur="1s" repeatCount="indefinite" begin="-0.2574158626531723s"></animate>
</circle><circle cx="56" cy="143.3980016480395" r="34" fill="#e15b64">
<animate attributeName="cy" values="143.3980016480395;-23.264651741765398" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5292591072219247s"></animate>
<animate attributeName="r" values="34;0;0" keyTimes="0;0.8257208789488842;1" dur="1s" repeatCount="indefinite" begin="-0.5292591072219247s"></animate>
</circle><circle cx="43" cy="154.61226210156264" r="43" fill="#e15b64">
<animate attributeName="cy" values="154.61226210156264;-39.72257238426019" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9349241678635103s"></animate>
<animate attributeName="r" values="43;0;0" keyTimes="0;0.6655411648349204;1" dur="1s" repeatCount="indefinite" begin="-0.9349241678635103s"></animate>
</circle><circle cx="36" cy="141.18233539125538" r="23" fill="#e15b64">
<animate attributeName="cy" values="141.18233539125538;-11.919782601799477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9661184430026497s"></animate>
<animate attributeName="r" values="23;0;0" keyTimes="0;0.7340510315067473;1" dur="1s" repeatCount="indefinite" begin="-0.9661184430026497s"></animate>
</circle><circle cx="55" cy="137.61381349909033" r="35" fill="#e15b64">
<animate attributeName="cy" values="137.61381349909033;-27.023105799592948" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7882390392923937s"></animate>
<animate attributeName="r" values="35;0;0" keyTimes="0;0.5596286394923506;1" dur="1s" repeatCount="indefinite" begin="-0.7882390392923937s"></animate>
</circle><circle cx="81" cy="116.42482869722863" r="6" fill="#e15b64">
<animate attributeName="cy" values="116.42482869722863;2.642571962973477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6838551001109257s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.8530428185299654;1" dur="1s" repeatCount="indefinite" begin="-0.6838551001109257s"></animate>
</circle><circle cx="51" cy="144.1337397120671" r="41" fill="#e15b64">
<animate attributeName="cy" values="144.1337397120671;-35.62888188299487" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8931867510460544s"></animate>
<animate attributeName="r" values="41;0;0" keyTimes="0;0.9351064787950636;1" dur="1s" repeatCount="indefinite" begin="-0.8931867510460544s"></animate>
</circle><circle cx="22" cy="127.94124738258117" r="20" fill="#e15b64">
<animate attributeName="cy" values="127.94124738258117;-4.588101238414598" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9129507531699166s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.9626971761152365;1" dur="1s" repeatCount="indefinite" begin="-0.9129507531699166s"></animate>
</circle><circle cx="51" cy="130.13871763314205" r="21" fill="#e15b64">
<animate attributeName="cy" values="130.13871763314205;-2.771870373434613" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.16276671760313832s"></animate>
<animate attributeName="r" values="21;0;0" keyTimes="0;0.6367210977937845;1" dur="1s" repeatCount="indefinite" begin="-0.16276671760313832s"></animate>
</circle><circle cx="28" cy="130.94671647108635" r="26" fill="#e15b64">
<animate attributeName="cy" values="130.94671647108635;-20.54470862263146" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.010777607623041363s"></animate>
<animate attributeName="r" values="26;0;0" keyTimes="0;0.5986827903483527;1" dur="1s" repeatCount="indefinite" begin="-0.010777607623041363s"></animate>
</circle><circle cx="32" cy="133.57559887485095" r="18" fill="#e15b64">
<animate attributeName="cy" values="133.57559887485095;-13.998747273650661" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6849903294560423s"></animate>
<animate attributeName="r" values="18;0;0" keyTimes="0;0.9272684317035897;1" dur="1s" repeatCount="indefinite" begin="-0.6849903294560423s"></animate>
</circle><circle cx="50" cy="129.2368025879272" r="29" fill="#e15b64">
<animate attributeName="cy" values="129.2368025879272;-21.38222818211007" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2570532837614655s"></animate>
<animate attributeName="r" values="29;0;0" keyTimes="0;0.5349692982819836;1" dur="1s" repeatCount="indefinite" begin="-0.2570532837614655s"></animate>
</circle><circle cx="54" cy="147.67203918209864" r="32" fill="#e15b64">
<animate attributeName="cy" values="147.67203918209864;-23.292000640460095" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8840781999829185s"></animate>
<animate attributeName="r" values="32;0;0" keyTimes="0;0.9905440228534627;1" dur="1s" repeatCount="indefinite" begin="-0.8840781999829185s"></animate>
</circle><circle cx="49" cy="156.33097983975816" r="43" fill="#e15b64">
<animate attributeName="cy" values="156.33097983975816;-30.688836209655307" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6363282840605137s"></animate>
<animate attributeName="r" values="43;0;0" keyTimes="0;0.578321371334853;1" dur="1s" repeatCount="indefinite" begin="-0.6363282840605137s"></animate>
</circle><circle cx="53" cy="150.73132612778645" r="38" fill="#e15b64">
<animate attributeName="cy" values="150.73132612778645;-24.243875812169208" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6889884148164682s"></animate>
<animate attributeName="r" values="38;0;0" keyTimes="0;0.9820908894527897;1" dur="1s" repeatCount="indefinite" begin="-0.6889884148164682s"></animate>
</circle><circle cx="58" cy="136.92364235316566" r="30" fill="#e15b64">
<animate attributeName="cy" values="136.92364235316566;-14.514104757207221" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3274028295945308s"></animate>
<animate attributeName="r" values="30;0;0" keyTimes="0;0.9109990458833535;1" dur="1s" repeatCount="indefinite" begin="-0.3274028295945308s"></animate>
</circle><circle cx="21" cy="125.47085228007643" r="18" fill="#e15b64">
<animate attributeName="cy" values="125.47085228007643;-8.232426956653288" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11103461733078768s"></animate>
<animate attributeName="r" values="18;0;0" keyTimes="0;0.7718042613876622;1" dur="1s" repeatCount="indefinite" begin="-0.11103461733078768s"></animate>
</circle><circle cx="57" cy="154.13251799723747" r="37" fill="#e15b64">
<animate attributeName="cy" values="154.13251799723747;-18.665203993986026" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8263441768461145s"></animate>
<animate attributeName="r" values="37;0;0" keyTimes="0;0.7148325280461965;1" dur="1s" repeatCount="indefinite" begin="-0.8263441768461145s"></animate>
</circle><circle cx="52" cy="163.55969451733722" r="47" fill="#e15b64">
<animate attributeName="cy" values="163.55969451733722;-45.32343944696123" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08605155305311041s"></animate>
<animate attributeName="r" values="47;0;0" keyTimes="0;0.8554524873372089;1" dur="1s" repeatCount="indefinite" begin="-0.08605155305311041s"></animate>
</circle><circle cx="43" cy="150.72861891310126" r="42" fill="#e15b64">
<animate attributeName="cy" values="150.72861891310126;-23.942286768617272" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8013052401764136s"></animate>
<animate attributeName="r" values="42;0;0" keyTimes="0;0.6681090498432822;1" dur="1s" repeatCount="indefinite" begin="-0.8013052401764136s"></animate>
</circle><circle cx="62" cy="109.2607457626771" r="2" fill="#e15b64">
<animate attributeName="cy" values="109.2607457626771;3.194634855160243" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7901767326521292s"></animate>
<animate attributeName="r" values="2;0;0" keyTimes="0;0.7018579919397697;1" dur="1s" repeatCount="indefinite" begin="-0.7901767326521292s"></animate>
</circle><circle cx="29" cy="132.04950518708117" r="26" fill="#e15b64">
<animate attributeName="cy" values="132.04950518708117;-24.268419710129816" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9729317633977274s"></animate>
<animate attributeName="r" values="26;0;0" keyTimes="0;0.8277305604086497;1" dur="1s" repeatCount="indefinite" begin="-0.9729317633977274s"></animate>
</circle><circle cx="54" cy="150.69697127653222" r="41" fill="#e15b64">
<animate attributeName="cy" values="150.69697127653222;-27.168516505190766" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5902016146688314s"></animate>
<animate attributeName="r" values="41;0;0" keyTimes="0;0.8175867220161461;1" dur="1s" repeatCount="indefinite" begin="-0.5902016146688314s"></animate>
</circle><circle cx="50" cy="115.01352405454155" r="7" fill="#e15b64">
<animate attributeName="cy" values="115.01352405454155;-4.5076288690789195" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5091907734741129s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.6751846924914742;1" dur="1s" repeatCount="indefinite" begin="-0.5091907734741129s"></animate>
</circle><circle cx="65" cy="137.6419430633514" r="34" fill="#e15b64">
<animate attributeName="cy" values="137.6419430633514;-17.00344965868893" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.34747192063247945s"></animate>
<animate attributeName="r" values="34;0;0" keyTimes="0;0.5212737600536792;1" dur="1s" repeatCount="indefinite" begin="-0.34747192063247945s"></animate>
</circle><circle cx="34" cy="127.0455079544209" r="14" fill="#e15b64">
<animate attributeName="cy" values="127.0455079544209;-3.6990759299641454" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4890615261218786s"></animate>
<animate attributeName="r" values="14;0;0" keyTimes="0;0.6183470012170013;1" dur="1s" repeatCount="indefinite" begin="-0.4890615261218786s"></animate>
</circle><circle cx="12" cy="120.43345098845494" r="3" fill="#e15b64">
<animate attributeName="cy" values="120.43345098845494;9.74374931913883" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3026505339978601s"></animate>
<animate attributeName="r" values="3;0;0" keyTimes="0;0.5414300978949788;1" dur="1s" repeatCount="indefinite" begin="-0.3026505339978601s"></animate>
</circle><circle cx="49" cy="161.35205628493102" r="43" fill="#e15b64">
<animate attributeName="cy" values="161.35205628493102;-37.872089939512506" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.38741962448531564s"></animate>
<animate attributeName="r" values="43;0;0" keyTimes="0;0.5096615889177538;1" dur="1s" repeatCount="indefinite" begin="-0.38741962448531564s"></animate>
</circle><circle cx="54" cy="146.5769009919314" r="44" fill="#e15b64">
<animate attributeName="cy" values="146.5769009919314;-38.33530354334875" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.34335748774106034s"></animate>
<animate attributeName="r" values="44;0;0" keyTimes="0;0.743420827137904;1" dur="1s" repeatCount="indefinite" begin="-0.34335748774106034s"></animate>
</circle><circle cx="20" cy="111.24659457696168" r="7" fill="#e15b64">
<animate attributeName="cy" values="111.24659457696168;10.851798254886354" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6282307990647713s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.8297799829349941;1" dur="1s" repeatCount="indefinite" begin="-0.6282307990647713s"></animate>
</circle><circle cx="50" cy="164.0676485495781" r="45" fill="#e15b64">
<animate attributeName="cy" values="164.0676485495781;-31.499414285176986" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7760446285439819s"></animate>
<animate attributeName="r" values="45;0;0" keyTimes="0;0.5740694195049653;1" dur="1s" repeatCount="indefinite" begin="-0.7760446285439819s"></animate>
</circle><circle cx="63" cy="121.15583070803987" r="16" fill="#e15b64">
<animate attributeName="cy" values="121.15583070803987;-2.1042758907266066" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2305276534763374s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.5205278426126575;1" dur="1s" repeatCount="indefinite" begin="-0.2305276534763374s"></animate>
</circle><circle cx="70" cy="143.94247592516618" r="29" fill="#e15b64">
<animate attributeName="cy" values="143.94247592516618;-23.62297573618442" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5284797120514513s"></animate>
<animate attributeName="r" values="29;0;0" keyTimes="0;0.9336811516026573;1" dur="1s" repeatCount="indefinite" begin="-0.5284797120514513s"></animate>
</circle><circle cx="21" cy="122.79868387744153" r="20" fill="#e15b64">
<animate attributeName="cy" values="122.79868387744153;-13.104461771681535" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8845782118773111s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.904216846935756;1" dur="1s" repeatCount="indefinite" begin="-0.8845782118773111s"></animate>
</circle><circle cx="46" cy="143.70707265719267" r="24" fill="#e15b64">
<animate attributeName="cy" values="143.70707265719267;-20.28891701845349" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23245576862802375s"></animate>
<animate attributeName="r" values="24;0;0" keyTimes="0;0.6586288079548765;1" dur="1s" repeatCount="indefinite" begin="-0.23245576862802375s"></animate>
</circle><circle cx="65" cy="140.13731645312657" r="22" fill="#e15b64">
<animate attributeName="cy" values="140.13731645312657;-5.338876455584764" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7182419259629308s"></animate>
<animate attributeName="r" values="22;0;0" keyTimes="0;0.8813907372203135;1" dur="1s" repeatCount="indefinite" begin="-0.7182419259629308s"></animate>
</circle><circle cx="37" cy="139.00958710472267" r="35" fill="#e15b64">
<animate attributeName="cy" values="139.00958710472267;-25.68265144780311" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7030100698848409s"></animate>
<animate attributeName="r" values="35;0;0" keyTimes="0;0.7320613459176248;1" dur="1s" repeatCount="indefinite" begin="-0.7030100698848409s"></animate>
</circle><circle cx="45" cy="146.6744507961619" r="44" fill="#e15b64">
<animate attributeName="cy" values="146.6744507961619;-38.087338695486295" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8319540053556033s"></animate>
<animate attributeName="r" values="44;0;0" keyTimes="0;0.5904241586083279;1" dur="1s" repeatCount="indefinite" begin="-0.8319540053556033s"></animate>
</circle><circle cx="53" cy="116.16529146873187" r="15" fill="#e15b64">
<animate attributeName="cy" values="116.16529146873187;-3.17669223153381" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7864341362651808s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.589186107816807;1" dur="1s" repeatCount="indefinite" begin="-0.7864341362651808s"></animate>
</circle><circle cx="29" cy="141.6902909599232" r="23" fill="#e15b64">
<animate attributeName="cy" values="141.6902909599232;-16.250272669063218" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.18084365714200346s"></animate>
<animate attributeName="r" values="23;0;0" keyTimes="0;0.8116571311237253;1" dur="1s" repeatCount="indefinite" begin="-0.18084365714200346s"></animate>
</circle><circle cx="65" cy="143.73302386926983" r="32" fill="#e15b64">
<animate attributeName="cy" values="143.73302386926983;-24.229369251904558" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5786484558188305s"></animate>
<animate attributeName="r" values="32;0;0" keyTimes="0;0.8515606125902615;1" dur="1s" repeatCount="indefinite" begin="-0.5786484558188305s"></animate>
</circle><circle cx="39" cy="143.3951504366216" r="33" fill="#e15b64">
<animate attributeName="cy" values="143.3951504366216;-27.75171362166084" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.1481578769905092s"></animate>
<animate attributeName="r" values="33;0;0" keyTimes="0;0.797255218191478;1" dur="1s" repeatCount="indefinite" begin="-0.1481578769905092s"></animate>
</circle><circle cx="59" cy="129.28605384114482" r="27" fill="#e15b64">
<animate attributeName="cy" values="129.28605384114482;-12.095864862844131" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23581997562886903s"></animate>
<animate attributeName="r" values="27;0;0" keyTimes="0;0.8271538616610963;1" dur="1s" repeatCount="indefinite" begin="-0.23581997562886903s"></animate>
</circle><circle cx="70" cy="144.09835508207823" r="28" fill="#e15b64">
<animate attributeName="cy" values="144.09835508207823;-13.162793363728145" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23606519556482253s"></animate>
<animate attributeName="r" values="28;0;0" keyTimes="0;0.73085815703799;1" dur="1s" repeatCount="indefinite" begin="-0.23606519556482253s"></animate>
</circle><circle cx="48" cy="145.01565757702042" r="44" fill="#e15b64">
<animate attributeName="cy" values="145.01565757702042;-32.30510020024561" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8615348704203486s"></animate>
<animate attributeName="r" values="44;0;0" keyTimes="0;0.9694373671371078;1" dur="1s" repeatCount="indefinite" begin="-0.8615348704203486s"></animate>
</circle><circle cx="95" cy="113.78554320990165" r="4" fill="#e15b64">
<animate attributeName="cy" values="113.78554320990165;-1.2652564238335904" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21370544900580335s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.5334621383741172;1" dur="1s" repeatCount="indefinite" begin="-0.21370544900580335s"></animate>
</circle><circle cx="57" cy="136.06708935936715" r="34" fill="#e15b64">
<animate attributeName="cy" values="136.06708935936715;-19.758990054858902" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7755376997281404s"></animate>
<animate attributeName="r" values="34;0;0" keyTimes="0;0.9943252777203475;1" dur="1s" repeatCount="indefinite" begin="-0.7755376997281404s"></animate>
</circle><circle cx="72" cy="123.8422572942333" r="19" fill="#e15b64">
<animate attributeName="cy" values="123.8422572942333;-1.0000700639794928" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9670461872772004s"></animate>
<animate attributeName="r" values="19;0;0" keyTimes="0;0.7801926792335607;1" dur="1s" repeatCount="indefinite" begin="-0.9670461872772004s"></animate>
</circle></g><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="27" cy="136.75172282051147" r="17" fill="#f47e60">
<animate attributeName="cy" values="136.75172282051147;-5.48853662281188" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4403846891955857s"></animate>
<animate attributeName="r" values="17;0;0" keyTimes="0;0.7894732341719188;1" dur="1s" repeatCount="indefinite" begin="-0.4403846891955857s"></animate>
</circle><circle cx="34" cy="132.08290473906044" r="28" fill="#f47e60">
<animate attributeName="cy" values="132.08290473906044;-16.339029232048958" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7882134883361418s"></animate>
<animate attributeName="r" values="28;0;0" keyTimes="0;0.5035175026787356;1" dur="1s" repeatCount="indefinite" begin="-0.7882134883361418s"></animate>
</circle><circle cx="66" cy="127.45606892584162" r="23" fill="#f47e60">
<animate attributeName="cy" values="127.45606892584162;-11.56763185745981" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23537267190332678s"></animate>
<animate attributeName="r" values="23;0;0" keyTimes="0;0.7818578332234903;1" dur="1s" repeatCount="indefinite" begin="-0.23537267190332678s"></animate>
</circle><circle cx="29" cy="124.28337961013858" r="15" fill="#f47e60">
<animate attributeName="cy" values="124.28337961013858;0.8461921465181206" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.30918442080681285s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.9741475377259025;1" dur="1s" repeatCount="indefinite" begin="-0.30918442080681285s"></animate>
</circle><circle cx="61" cy="147.91603256008383" r="31" fill="#f47e60">
<animate attributeName="cy" values="147.91603256008383;-14.754981670358578" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0033816756583812113s"></animate>
<animate attributeName="r" values="31;0;0" keyTimes="0;0.6463193577485268;1" dur="1s" repeatCount="indefinite" begin="-0.0033816756583812113s"></animate>
</circle><circle cx="25" cy="120.64483537229628" r="9" fill="#f47e60">
<animate attributeName="cy" values="120.64483537229628;-7.193123212298179" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6891092543031828s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.8637808572418493;1" dur="1s" repeatCount="indefinite" begin="-0.6891092543031828s"></animate>
</circle><circle cx="12" cy="121.18727231753691" r="4" fill="#f47e60">
<animate attributeName="cy" values="121.18727231753691;15.883181236637633" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.24454851002004097s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.8215012014926046;1" dur="1s" repeatCount="indefinite" begin="-0.24454851002004097s"></animate>
</circle><circle cx="58" cy="136.64954415018815" r="19" fill="#f47e60">
<animate attributeName="cy" values="136.64954415018815;-13.637628862199563" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7672442553828805s"></animate>
<animate attributeName="r" values="19;0;0" keyTimes="0;0.7534841891330046;1" dur="1s" repeatCount="indefinite" begin="-0.7672442553828805s"></animate>
</circle><circle cx="69" cy="120.72538023727738" r="10" fill="#f47e60">
<animate attributeName="cy" values="120.72538023727738;-5.651458016294906" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6587915764098667s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.5977129956186352;1" dur="1s" repeatCount="indefinite" begin="-0.6587915764098667s"></animate>
</circle><circle cx="46" cy="122.63158963579554" r="20" fill="#f47e60">
<animate attributeName="cy" values="122.63158963579554;-8.99196405151625" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3698350873089088s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.5563937567659611;1" dur="1s" repeatCount="indefinite" begin="-0.3698350873089088s"></animate>
</circle><circle cx="7" cy="121.15700947168602" r="2" fill="#f47e60">
<animate attributeName="cy" values="121.15700947168602;0.605011189845321" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.514133243834255s"></animate>
<animate attributeName="r" values="2;0;0" keyTimes="0;0.7510335363256938;1" dur="1s" repeatCount="indefinite" begin="-0.514133243834255s"></animate>
</circle><circle cx="19" cy="117.69071117783832" r="7" fill="#f47e60">
<animate attributeName="cy" values="117.69071117783832;-2.4512162536532234" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4163222368875168s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.9697983093212361;1" dur="1s" repeatCount="indefinite" begin="-0.4163222368875168s"></animate>
</circle><circle cx="34" cy="122.22172344680293" r="22" fill="#f47e60">
<animate attributeName="cy" values="122.22172344680293;-14.875000336072436" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8346904488502503s"></animate>
<animate attributeName="r" values="22;0;0" keyTimes="0;0.9284864899458874;1" dur="1s" repeatCount="indefinite" begin="-0.8346904488502503s"></animate>
</circle><circle cx="48" cy="118.34245443793573" r="12" fill="#f47e60">
<animate attributeName="cy" values="118.34245443793573;6.1569446890589035" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7372012265846987s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.9146509122657862;1" dur="1s" repeatCount="indefinite" begin="-0.7372012265846987s"></animate>
</circle><circle cx="38" cy="108.37260349538107" r="4" fill="#f47e60">
<animate attributeName="cy" values="108.37260349538107;-3.9166184571860483" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6955752887050161s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.9793871272170744;1" dur="1s" repeatCount="indefinite" begin="-0.6955752887050161s"></animate>
</circle><circle cx="50" cy="120.05611377372627" r="20" fill="#f47e60">
<animate attributeName="cy" values="120.05611377372627;-19.59128463520709" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8198691615147322s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.6017320767396992;1" dur="1s" repeatCount="indefinite" begin="-0.8198691615147322s"></animate>
</circle><circle cx="69" cy="133.11553485199934" r="21" fill="#f47e60">
<animate attributeName="cy" values="133.11553485199934;-7.230262198733577" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6502042470386947s"></animate>
<animate attributeName="r" values="21;0;0" keyTimes="0;0.9802383350633911;1" dur="1s" repeatCount="indefinite" begin="-0.6502042470386947s"></animate>
</circle><circle cx="60" cy="138.10205797824347" r="31" fill="#f47e60">
<animate attributeName="cy" values="138.10205797824347;-21.149182634283513" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8527464543018912s"></animate>
<animate attributeName="r" values="31;0;0" keyTimes="0;0.5593223005306734;1" dur="1s" repeatCount="indefinite" begin="-0.8527464543018912s"></animate>
</circle><circle cx="72" cy="121.45841247692351" r="16" fill="#f47e60">
<animate attributeName="cy" values="121.45841247692351;-5.0851516529984195" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4077549975882817s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.5763111141098053;1" dur="1s" repeatCount="indefinite" begin="-0.4077549975882817s"></animate>
</circle><circle cx="56" cy="118.12349945951125" r="10" fill="#f47e60">
<animate attributeName="cy" values="118.12349945951125;-7.082779421666896" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21747152423150562s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.6868094744383062;1" dur="1s" repeatCount="indefinite" begin="-0.21747152423150562s"></animate>
</circle><circle cx="77" cy="119.41951761904794" r="17" fill="#f47e60">
<animate attributeName="cy" values="119.41951761904794;-9.114276721599797" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.48345793287516814s"></animate>
<animate attributeName="r" values="17;0;0" keyTimes="0;0.5135663211192452;1" dur="1s" repeatCount="indefinite" begin="-0.48345793287516814s"></animate>
</circle><circle cx="78" cy="125.60192795392818" r="11" fill="#f47e60">
<animate attributeName="cy" values="125.60192795392818;-6.73068982191926" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23667812050200931s"></animate>
<animate attributeName="r" values="11;0;0" keyTimes="0;0.9898092475181265;1" dur="1s" repeatCount="indefinite" begin="-0.23667812050200931s"></animate>
</circle><circle cx="51" cy="138.224179154187" r="24" fill="#f47e60">
<animate attributeName="cy" values="138.224179154187;-8.55653503677315" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5735700676741093s"></animate>
<animate attributeName="r" values="24;0;0" keyTimes="0;0.9566960986989479;1" dur="1s" repeatCount="indefinite" begin="-0.5735700676741093s"></animate>
</circle><circle cx="41" cy="131.14944604607328" r="21" fill="#f47e60">
<animate attributeName="cy" values="131.14944604607328;-17.847508222350655" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.07696580759865079s"></animate>
<animate attributeName="r" values="21;0;0" keyTimes="0;0.6865631531399743;1" dur="1s" repeatCount="indefinite" begin="-0.07696580759865079s"></animate>
</circle><circle cx="49" cy="128.787268826053" r="17" fill="#f47e60">
<animate attributeName="cy" values="128.787268826053;1.143259231969072" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7890428937034474s"></animate>
<animate attributeName="r" values="17;0;0" keyTimes="0;0.5926722445396657;1" dur="1s" repeatCount="indefinite" begin="-0.7890428937034474s"></animate>
</circle><circle cx="17" cy="120.22416295842616" r="13" fill="#f47e60">
<animate attributeName="cy" values="120.22416295842616;5.932998615440596" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.25642472915187764s"></animate>
<animate attributeName="r" values="13;0;0" keyTimes="0;0.5738477034101163;1" dur="1s" repeatCount="indefinite" begin="-0.25642472915187764s"></animate>
</circle><circle cx="73" cy="127.02191586426626" r="24" fill="#f47e60">
<animate attributeName="cy" values="127.02191586426626;-19.34982189589097" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9257599774553938s"></animate>
<animate attributeName="r" values="24;0;0" keyTimes="0;0.6060248140675957;1" dur="1s" repeatCount="indefinite" begin="-0.9257599774553938s"></animate>
</circle><circle cx="29" cy="122.37303701766326" r="22" fill="#f47e60">
<animate attributeName="cy" values="122.37303701766326;-17.181874655618834" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11979523584713825s"></animate>
<animate attributeName="r" values="22;0;0" keyTimes="0;0.5778892301319281;1" dur="1s" repeatCount="indefinite" begin="-0.11979523584713825s"></animate>
</circle><circle cx="30" cy="132.91741320840808" r="18" fill="#f47e60">
<animate attributeName="cy" values="132.91741320840808;0.24294121648419775" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6890213202603488s"></animate>
<animate attributeName="r" values="18;0;0" keyTimes="0;0.8587373770805918;1" dur="1s" repeatCount="indefinite" begin="-0.6890213202603488s"></animate>
</circle><circle cx="80" cy="116.72839679840811" r="14" fill="#f47e60">
<animate attributeName="cy" values="116.72839679840811;4.82183707831593" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08182847032405782s"></animate>
<animate attributeName="r" values="14;0;0" keyTimes="0;0.6809633164153448;1" dur="1s" repeatCount="indefinite" begin="-0.08182847032405782s"></animate>
</circle><circle cx="31" cy="125.20247260666616" r="13" fill="#f47e60">
<animate attributeName="cy" values="125.20247260666616;2.008326413572634" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8369662812852767s"></animate>
<animate attributeName="r" values="13;0;0" keyTimes="0;0.5845779670186058;1" dur="1s" repeatCount="indefinite" begin="-0.8369662812852767s"></animate>
</circle><circle cx="60" cy="125.0794549947879" r="16" fill="#f47e60">
<animate attributeName="cy" values="125.0794549947879;0.7338248372355807" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8948237868324189s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.9120596722058173;1" dur="1s" repeatCount="indefinite" begin="-0.8948237868324189s"></animate>
</circle><circle cx="25" cy="126.90612837175388" r="8" fill="#f47e60">
<animate attributeName="cy" values="126.90612837175388;4.0472618983783715" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.39581604043317986s"></animate>
<animate attributeName="r" values="8;0;0" keyTimes="0;0.8074064845720312;1" dur="1s" repeatCount="indefinite" begin="-0.39581604043317986s"></animate>
</circle><circle cx="37" cy="131.42028038990128" r="25" fill="#f47e60">
<animate attributeName="cy" values="131.42028038990128;-22.403977227715075" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.04301794169924622s"></animate>
<animate attributeName="r" values="25;0;0" keyTimes="0;0.524891315929541;1" dur="1s" repeatCount="indefinite" begin="-0.04301794169924622s"></animate>
</circle><circle cx="41" cy="149.05000141391616" r="31" fill="#f47e60">
<animate attributeName="cy" values="149.05000141391616;-19.10046896539864" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7213401886638007s"></animate>
<animate attributeName="r" values="31;0;0" keyTimes="0;0.6890520162965066;1" dur="1s" repeatCount="indefinite" begin="-0.7213401886638007s"></animate>
</circle><circle cx="36" cy="138.58798523568342" r="27" fill="#f47e60">
<animate attributeName="cy" values="138.58798523568342;-15.572058043829461" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40556498158772736s"></animate>
<animate attributeName="r" values="27;0;0" keyTimes="0;0.8506348676044777;1" dur="1s" repeatCount="indefinite" begin="-0.40556498158772736s"></animate>
</circle><circle cx="78" cy="137.9707233461312" r="20" fill="#f47e60">
<animate attributeName="cy" values="137.9707233461312;-3.6945948738885512" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8880631706610672s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.9304971995517395;1" dur="1s" repeatCount="indefinite" begin="-0.8880631706610672s"></animate>
</circle><circle cx="79" cy="134.71673525431498" r="18" fill="#f47e60">
<animate attributeName="cy" values="134.71673525431498;-10.261412982322742" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2848983056723242s"></animate>
<animate attributeName="r" values="18;0;0" keyTimes="0;0.7526875949615255;1" dur="1s" repeatCount="indefinite" begin="-0.2848983056723242s"></animate>
</circle><circle cx="82" cy="111.49802891873294" r="5" fill="#f47e60">
<animate attributeName="cy" values="111.49802891873294;12.140748225430922" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40945179236345397s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.703997116139137;1" dur="1s" repeatCount="indefinite" begin="-0.40945179236345397s"></animate>
</circle><circle cx="68" cy="140.96466884045572" r="22" fill="#f47e60">
<animate attributeName="cy" values="140.96466884045572;-4.079142984351218" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40439383112303107s"></animate>
<animate attributeName="r" values="22;0;0" keyTimes="0;0.5493704483007363;1" dur="1s" repeatCount="indefinite" begin="-0.40439383112303107s"></animate>
</circle><circle cx="41" cy="116.24169615516264" r="16" fill="#f47e60">
<animate attributeName="cy" values="116.24169615516264;-13.644720096932094" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22449184929827926s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.6587866247823291;1" dur="1s" repeatCount="indefinite" begin="-0.22449184929827926s"></animate>
</circle><circle cx="20" cy="124.66929057881916" r="15" fill="#f47e60">
<animate attributeName="cy" values="124.66929057881916;2.5505611618972814" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.017560126563357925s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.6128429739262174;1" dur="1s" repeatCount="indefinite" begin="-0.017560126563357925s"></animate>
</circle><circle cx="63" cy="126.5115900704738" r="26" fill="#f47e60">
<animate attributeName="cy" values="126.5115900704738;-20.921901271813873" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5285257319858678s"></animate>
<animate attributeName="r" values="26;0;0" keyTimes="0;0.9007468611639214;1" dur="1s" repeatCount="indefinite" begin="-0.5285257319858678s"></animate>
</circle><circle cx="90" cy="111.61440083571019" r="6" fill="#f47e60">
<animate attributeName="cy" values="111.61440083571019;11.61930520437923" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8167452043810126s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.9810779841180124;1" dur="1s" repeatCount="indefinite" begin="-0.8167452043810126s"></animate>
</circle><circle cx="78" cy="122.50775060552778" r="20" fill="#f47e60">
<animate attributeName="cy" values="122.50775060552778;-4.59807973956865" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11755589684814727s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.6705237343698631;1" dur="1s" repeatCount="indefinite" begin="-0.11755589684814727s"></animate>
</circle><circle cx="31" cy="127.90703241028092" r="9" fill="#f47e60">
<animate attributeName="cy" values="127.90703241028092;0.829718008041219" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5851309189776632s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.6889560303799027;1" dur="1s" repeatCount="indefinite" begin="-0.5851309189776632s"></animate>
</circle><circle cx="65" cy="117.43435709704966" r="4" fill="#f47e60">
<animate attributeName="cy" values="117.43435709704966;15.28596080488979" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8492165554334472s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.5287459347086204;1" dur="1s" repeatCount="indefinite" begin="-0.8492165554334472s"></animate>
</circle><circle cx="89" cy="122.93132420091489" r="3" fill="#f47e60">
<animate attributeName="cy" values="122.93132420091489;5.980513428860888" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.06884209677796871s"></animate>
<animate attributeName="r" values="3;0;0" keyTimes="0;0.5868616814040618;1" dur="1s" repeatCount="indefinite" begin="-0.06884209677796871s"></animate>
</circle><circle cx="68" cy="129.1441504106191" r="26" fill="#f47e60">
<animate attributeName="cy" values="129.1441504106191;-22.781245889673905" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.26191875209122073s"></animate>
<animate attributeName="r" values="26;0;0" keyTimes="0;0.6200648439404779;1" dur="1s" repeatCount="indefinite" begin="-0.26191875209122073s"></animate>
</circle><circle cx="22" cy="130.63745849588264" r="20" fill="#f47e60">
<animate attributeName="cy" values="130.63745849588264;-10.695329441338862" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6192951915425052s"></animate>
<animate attributeName="r" values="20;0;0" keyTimes="0;0.6969346125529845;1" dur="1s" repeatCount="indefinite" begin="-0.6192951915425052s"></animate>
</circle></g><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="57" cy="123.68953191890479" r="12" fill="#f8b26a">
<animate attributeName="cy" values="123.68953191890479;4.854991577389438" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9097135632734302s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.9463910575266388;1" dur="1s" repeatCount="indefinite" begin="-0.9097135632734302s"></animate>
</circle><circle cx="24" cy="124.54645838615471" r="12" fill="#f8b26a">
<animate attributeName="cy" values="124.54645838615471;-11.813810322332547" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.007050694143823311s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.7078891674964196;1" dur="1s" repeatCount="indefinite" begin="-0.007050694143823311s"></animate>
</circle><circle cx="54" cy="110.08044357995595" r="3" fill="#f8b26a">
<animate attributeName="cy" values="110.08044357995595;13.402947007936334" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.994432759852213s"></animate>
<animate attributeName="r" values="3;0;0" keyTimes="0;0.8430605754104277;1" dur="1s" repeatCount="indefinite" begin="-0.994432759852213s"></animate>
</circle><circle cx="49" cy="127.80477114160061" r="16" fill="#f8b26a">
<animate attributeName="cy" values="127.80477114160061;2.7658256519770603" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.07188593356616135s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.6049768163612267;1" dur="1s" repeatCount="indefinite" begin="-0.07188593356616135s"></animate>
</circle><circle cx="52" cy="112.09746694041411" r="10" fill="#f8b26a">
<animate attributeName="cy" values="112.09746694041411;-2.8104821907767574" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4132445270517203s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.7843188648425736;1" dur="1s" repeatCount="indefinite" begin="-0.4132445270517203s"></animate>
</circle><circle cx="68" cy="119.76797510227266" r="15" fill="#f8b26a">
<animate attributeName="cy" values="119.76797510227266;-2.3187957684067317" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6317748306797277s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.8464277838946668;1" dur="1s" repeatCount="indefinite" begin="-0.6317748306797277s"></animate>
</circle><circle cx="17" cy="121.7997527406382" r="5" fill="#f8b26a">
<animate attributeName="cy" values="121.7997527406382;13.556957891026624" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9136732084136533s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.5349721785314134;1" dur="1s" repeatCount="indefinite" begin="-0.9136732084136533s"></animate>
</circle><circle cx="59" cy="116.30296558149124" r="4" fill="#f8b26a">
<animate attributeName="cy" values="116.30296558149124;-1.0433564145924477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08891813207741484s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.6574981312374213;1" dur="1s" repeatCount="indefinite" begin="-0.08891813207741484s"></animate>
</circle><circle cx="88" cy="113.1583378513422" r="12" fill="#f8b26a">
<animate attributeName="cy" values="113.1583378513422;1.456869512308952" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.14992898603700067s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.9565108058771807;1" dur="1s" repeatCount="indefinite" begin="-0.14992898603700067s"></animate>
</circle><circle cx="84" cy="112.41279273844411" r="10" fill="#f8b26a">
<animate attributeName="cy" values="112.41279273844411;1.6491176590177243" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5833010262862421s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.5438806242531744;1" dur="1s" repeatCount="indefinite" begin="-0.5833010262862421s"></animate>
</circle><circle cx="87" cy="120.26530337145327" r="5" fill="#f8b26a">
<animate attributeName="cy" values="120.26530337145327;9.388664939149207" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.05018189342538548s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.637897648645736;1" dur="1s" repeatCount="indefinite" begin="-0.05018189342538548s"></animate>
</circle><circle cx="24" cy="123.99448894779877" r="9" fill="#f8b26a">
<animate attributeName="cy" values="123.99448894779877;2.3750067806866078" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8890495329191316s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.663064102718458;1" dur="1s" repeatCount="indefinite" begin="-0.8890495329191316s"></animate>
</circle><circle cx="73" cy="120.00019528994846" r="12" fill="#f8b26a">
<animate attributeName="cy" values="120.00019528994846;-9.503507375076166" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6351313241419324s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.9354194941922095;1" dur="1s" repeatCount="indefinite" begin="-0.6351313241419324s"></animate>
</circle><circle cx="74" cy="113.88820186698781" r="4" fill="#f8b26a">
<animate attributeName="cy" values="113.88820186698781;10.570535200732685" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7132998998028989s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.91895021859856;1" dur="1s" repeatCount="indefinite" begin="-0.7132998998028989s"></animate>
</circle><circle cx="68" cy="129.5841522641359" r="12" fill="#f8b26a">
<animate attributeName="cy" values="129.5841522641359;3.894919008898638" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.29330391921510546s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.9096568793749455;1" dur="1s" repeatCount="indefinite" begin="-0.29330391921510546s"></animate>
</circle><circle cx="53" cy="119.31720358172306" r="9" fill="#f8b26a">
<animate attributeName="cy" values="119.31720358172306;9.73624644875764" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9958245939061628s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.8571965277158554;1" dur="1s" repeatCount="indefinite" begin="-0.9958245939061628s"></animate>
</circle><circle cx="76" cy="134.80739606982607" r="17" fill="#f8b26a">
<animate attributeName="cy" values="134.80739606982607;0.3932385595869441" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8607153243461125s"></animate>
<animate attributeName="r" values="17;0;0" keyTimes="0;0.8654455107706405;1" dur="1s" repeatCount="indefinite" begin="-0.8607153243461125s"></animate>
</circle><circle cx="75" cy="122.61568996754474" r="7" fill="#f8b26a">
<animate attributeName="cy" values="122.61568996754474;10.652526875734779" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.959721298983397s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.6271803990132601;1" dur="1s" repeatCount="indefinite" begin="-0.959721298983397s"></animate>
</circle><circle cx="87" cy="115.0788054109218" r="12" fill="#f8b26a">
<animate attributeName="cy" values="115.0788054109218;-8.15567938666852" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0690058777440068s"></animate>
<animate attributeName="r" values="12;0;0" keyTimes="0;0.6627211388649489;1" dur="1s" repeatCount="indefinite" begin="-0.0690058777440068s"></animate>
</circle><circle cx="21" cy="118.08738171978098" r="9" fill="#f8b26a">
<animate attributeName="cy" values="118.08738171978098;-4.9475469075625504" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7078831683260647s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.9501044367725069;1" dur="1s" repeatCount="indefinite" begin="-0.7078831683260647s"></animate>
</circle><circle cx="24" cy="128.09150085659442" r="9" fill="#f8b26a">
<animate attributeName="cy" values="128.09150085659442;2.7320353690265122" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.521121701341132s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.7357531229285373;1" dur="1s" repeatCount="indefinite" begin="-0.521121701341132s"></animate>
</circle><circle cx="26" cy="127.49368345428452" r="15" fill="#f8b26a">
<animate attributeName="cy" values="127.49368345428452;-10.361246269666196" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9420307783603239s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.7467409545014994;1" dur="1s" repeatCount="indefinite" begin="-0.9420307783603239s"></animate>
</circle><circle cx="39" cy="114.20744515306558" r="6" fill="#f8b26a">
<animate attributeName="cy" values="114.20744515306558;5.606516894440285" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.49268347147689695s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.5874854761603912;1" dur="1s" repeatCount="indefinite" begin="-0.49268347147689695s"></animate>
</circle><circle cx="61" cy="123.10463246179438" r="11" fill="#f8b26a">
<animate attributeName="cy" values="123.10463246179438;-5.189366828773049" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21359109324800063s"></animate>
<animate attributeName="r" values="11;0;0" keyTimes="0;0.6970744691674484;1" dur="1s" repeatCount="indefinite" begin="-0.21359109324800063s"></animate>
</circle><circle cx="37" cy="115.40335155247101" r="10" fill="#f8b26a">
<animate attributeName="cy" values="115.40335155247101;3.4285850566842946" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5344545499798534s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.9983685792824288;1" dur="1s" repeatCount="indefinite" begin="-0.5344545499798534s"></animate>
</circle><circle cx="22" cy="124.59228223795324" r="7" fill="#f8b26a">
<animate attributeName="cy" values="124.59228223795324;-3.5076355130396912" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8102510016775601s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.6369981578428732;1" dur="1s" repeatCount="indefinite" begin="-0.8102510016775601s"></animate>
</circle><circle cx="34" cy="111.69621652751701" r="5" fill="#f8b26a">
<animate attributeName="cy" values="111.69621652751701;13.965538669421832" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3819120829819431s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.9240036927970401;1" dur="1s" repeatCount="indefinite" begin="-0.3819120829819431s"></animate>
</circle><circle cx="61" cy="121.99207528226256" r="6" fill="#f8b26a">
<animate attributeName="cy" values="121.99207528226256;-1.1884130816048284" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.351012424136126s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.9527855705617168;1" dur="1s" repeatCount="indefinite" begin="-0.351012424136126s"></animate>
</circle><circle cx="32" cy="115.36386365084275" r="13" fill="#f8b26a">
<animate attributeName="cy" values="115.36386365084275;-7.635796261623495" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22026693987990997s"></animate>
<animate attributeName="r" values="13;0;0" keyTimes="0;0.6822821982216503;1" dur="1s" repeatCount="indefinite" begin="-0.22026693987990997s"></animate>
</circle><circle cx="38" cy="123.93260454500944" r="10" fill="#f8b26a">
<animate attributeName="cy" values="123.93260454500944;-9.019646946232784" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5897767052001425s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.747643174639248;1" dur="1s" repeatCount="indefinite" begin="-0.5897767052001425s"></animate>
</circle><circle cx="91" cy="111.20360670124936" r="4" fill="#f8b26a">
<animate attributeName="cy" values="111.20360670124936;-2.7511383786778185" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5936715943771124s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.5292863982274825;1" dur="1s" repeatCount="indefinite" begin="-0.5936715943771124s"></animate>
</circle><circle cx="93" cy="109.08688866758263" r="6" fill="#f8b26a">
<animate attributeName="cy" values="109.08688866758263;13.986514639855155" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.20182465253134418s"></animate>
<animate attributeName="r" values="6;0;0" keyTimes="0;0.9578727930035874;1" dur="1s" repeatCount="indefinite" begin="-0.20182465253134418s"></animate>
</circle><circle cx="90" cy="115.44258946143852" r="3" fill="#f8b26a">
<animate attributeName="cy" values="115.44258946143852;7.971557449807172" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8138344996352406s"></animate>
<animate attributeName="r" values="3;0;0" keyTimes="0;0.822677504532275;1" dur="1s" repeatCount="indefinite" begin="-0.8138344996352406s"></animate>
</circle><circle cx="24" cy="130.98782632438636" r="15" fill="#f8b26a">
<animate attributeName="cy" values="130.98782632438636;-11.868426017755008" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8574009914089539s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.8610318085552064;1" dur="1s" repeatCount="indefinite" begin="-0.8574009914089539s"></animate>
</circle><circle cx="49" cy="122.24309971563434" r="14" fill="#f8b26a">
<animate attributeName="cy" values="122.24309971563434;3.5685994935617273" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4267384904796552s"></animate>
<animate attributeName="r" values="14;0;0" keyTimes="0;0.5503829186981541;1" dur="1s" repeatCount="indefinite" begin="-0.4267384904796552s"></animate>
</circle><circle cx="18" cy="117.38217971971676" r="9" fill="#f8b26a">
<animate attributeName="cy" values="117.38217971971676;6.631006164776416" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6828218424869835s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.6808177575913787;1" dur="1s" repeatCount="indefinite" begin="-0.6828218424869835s"></animate>
</circle><circle cx="78" cy="124.28678852303256" r="15" fill="#f8b26a">
<animate attributeName="cy" values="124.28678852303256;1.3740946843405304" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4161035078940827s"></animate>
<animate attributeName="r" values="15;0;0" keyTimes="0;0.6388001474427218;1" dur="1s" repeatCount="indefinite" begin="-0.4161035078940827s"></animate>
</circle><circle cx="44" cy="106.6189204965897" r="3" fill="#f8b26a">
<animate attributeName="cy" values="106.6189204965897;16.750815514807034" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0510803765953457s"></animate>
<animate attributeName="r" values="3;0;0" keyTimes="0;0.7907276882734477;1" dur="1s" repeatCount="indefinite" begin="-0.0510803765953457s"></animate>
</circle><circle cx="41" cy="119.64799537397232" r="5" fill="#f8b26a">
<animate attributeName="cy" values="119.64799537397232;6.398667601394809" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4280945050279754s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.5751942250658201;1" dur="1s" repeatCount="indefinite" begin="-0.4280945050279754s"></animate>
</circle><circle cx="19" cy="120.0916729802829" r="10" fill="#f8b26a">
<animate attributeName="cy" values="120.0916729802829;-9.513704965243033" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.043405970368113445s"></animate>
<animate attributeName="r" values="10;0;0" keyTimes="0;0.5435267537060107;1" dur="1s" repeatCount="indefinite" begin="-0.043405970368113445s"></animate>
</circle><circle cx="61" cy="123.62714133794762" r="5" fill="#f8b26a">
<animate attributeName="cy" values="123.62714133794762;2.362315551662477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5256540407430482s"></animate>
<animate attributeName="r" values="5;0;0" keyTimes="0;0.9222037100732456;1" dur="1s" repeatCount="indefinite" begin="-0.5256540407430482s"></animate>
</circle><circle cx="64" cy="115.25525614926073" r="13" fill="#f8b26a">
<animate attributeName="cy" values="115.25525614926073;-10.304511881341815" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6633519944592159s"></animate>
<animate attributeName="r" values="13;0;0" keyTimes="0;0.5401283508859178;1" dur="1s" repeatCount="indefinite" begin="-0.6633519944592159s"></animate>
</circle><circle cx="12" cy="129.13660549492693" r="11" fill="#f8b26a">
<animate attributeName="cy" values="129.13660549492693;-7.965594883525825" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9929282227674491s"></animate>
<animate attributeName="r" values="11;0;0" keyTimes="0;0.9536114994321867;1" dur="1s" repeatCount="indefinite" begin="-0.9929282227674491s"></animate>
</circle><circle cx="39" cy="106.95504126040025" r="2" fill="#f8b26a">
<animate attributeName="cy" values="106.95504126040025;5.834416891524681" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22005892301327157s"></animate>
<animate attributeName="r" values="2;0;0" keyTimes="0;0.6089960643653531;1" dur="1s" repeatCount="indefinite" begin="-0.22005892301327157s"></animate>
</circle><circle cx="30" cy="112.12744151244388" r="8" fill="#f8b26a">
<animate attributeName="cy" values="112.12744151244388;-4.465606537168944" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.24710322548242414s"></animate>
<animate attributeName="r" values="8;0;0" keyTimes="0;0.7479705418636007;1" dur="1s" repeatCount="indefinite" begin="-0.24710322548242414s"></animate>
</circle><circle cx="67" cy="124.83294711941956" r="16" fill="#f8b26a">
<animate attributeName="cy" values="124.83294711941956;-7.6291463245052284" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.614066023590482s"></animate>
<animate attributeName="r" values="16;0;0" keyTimes="0;0.7584434636145084;1" dur="1s" repeatCount="indefinite" begin="-0.614066023590482s"></animate>
</circle><circle cx="22" cy="119.36463088979876" r="4" fill="#f8b26a">
<animate attributeName="cy" values="119.36463088979876;12.12664234343379" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.527385385953813s"></animate>
<animate attributeName="r" values="4;0;0" keyTimes="0;0.5661680148267347;1" dur="1s" repeatCount="indefinite" begin="-0.527385385953813s"></animate>
</circle><circle cx="12" cy="122.52124979151506" r="7" fill="#f8b26a">
<animate attributeName="cy" values="122.52124979151506;3.7506712743784085" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.37225883133903837s"></animate>
<animate attributeName="r" values="7;0;0" keyTimes="0;0.9003327357718601;1" dur="1s" repeatCount="indefinite" begin="-0.37225883133903837s"></animate>
</circle><circle cx="69" cy="130.5210986475815" r="14" fill="#f8b26a">
<animate attributeName="cy" values="130.5210986475815;-0.30973651460238827" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6062299863585278s"></animate>
<animate attributeName="r" values="14;0;0" keyTimes="0;0.9220180768904789;1" dur="1s" repeatCount="indefinite" begin="-0.6062299863585278s"></animate>
</circle><circle cx="20" cy="114.80243604193255" r="9" fill="#f8b26a">
<animate attributeName="cy" values="114.80243604193255;7.19374553530416" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6866227460985781s"></animate>
<animate attributeName="r" values="9;0;0" keyTimes="0;0.6690048284116141;1" dur="1s" repeatCount="indefinite" begin="-0.6866227460985781s"></animate>
</circle></g>
</svg>

14
assets/logo.svg Normal file
View File

@@ -0,0 +1,14 @@
<svg xmlns="http://www.w3.org/2000/svg" width="430" height="80" viewBox="0 0 430 80">
<text
x="50%"
y="50%"
font-family="monaco"
font-size="55"
text-anchor="middle"
dominant-baseline="middle">
<tspan fill="#F37726">{</tspan><tspan fill="#616161">Tex</tspan><tspan fill="#F37726">}</tspan><tspan fill="#616161">Teller</tspan>
</text>
</svg>

BIN
assets/scss.png Normal file

Binary file not shown.

BIN
assets/test.pdf Normal file

Binary file not shown.

BIN
assets/web_demo.gif Normal file

Binary file not shown.

259
deploy.sh Executable file
View File

@@ -0,0 +1,259 @@
#!/bin/bash
# TexTeller Docker Deployment Script
set -e # Exit on error
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Configuration
MODEL_PATH="$HOME/.cache/huggingface/hub/models--OleehyO--TexTeller"
CONTAINER_NAME="texteller-server"
IMAGE_NAME="texteller:latest"
PORT=8001
# Function to print colored messages
print_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
print_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
print_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Check if NVIDIA GPU is available
check_nvidia() {
print_info "Checking NVIDIA GPU availability..."
if ! command -v nvidia-smi &> /dev/null; then
print_error "nvidia-smi not found. Please install NVIDIA drivers."
exit 1
fi
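# Test the command directly: under `set -e`, a failing bare command would
# abort the script before `$?` could be inspected.
if nvidia-smi > /dev/null 2>&1; then
print_info "NVIDIA GPU detected:"
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
print_error "NVIDIA GPU not detected or drivers not working."
exit 1
fi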
}
# Check if Docker is installed
check_docker() {
print_info "Checking Docker installation..."
if ! command -v docker &> /dev/null; then
print_error "Docker not found. Please install Docker."
exit 1
fi
print_info "Docker version: $(docker --version)"
}
# Check if NVIDIA Container Toolkit is installed
check_nvidia_docker() {
print_info "Checking NVIDIA Container Toolkit..."
if ! docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi &> /dev/null; then
print_error "NVIDIA Container Toolkit not working properly."
print_info "Please install it with:"
echo " sudo apt-get install -y nvidia-container-toolkit"
echo " sudo systemctl restart docker"
exit 1
fi
print_info "NVIDIA Container Toolkit is working."
}
# Check if model exists
check_model() {
print_info "Checking model availability..."
if [ ! -d "$MODEL_PATH" ]; then
print_error "Model not found at: $MODEL_PATH"
print_info "Please download the model first using:"
echo " python -c 'from texteller import load_model; load_model()'"
exit 1
fi
print_info "Model found at: $MODEL_PATH"
}
# Build Docker image
build_image() {
print_info "Building Docker image..."
# Run the build inside the `if` so the error branch is reachable under `set -e`.
if docker build -t "$IMAGE_NAME" .; then
print_info "Docker image built successfully: $IMAGE_NAME"
else
print_error "Failed to build Docker image."
exit 1
fi
}
# Stop and remove existing container
stop_container() {
if [ "$(docker ps -q -f name=$CONTAINER_NAME)" ]; then
print_info "Stopping existing container..."
docker stop $CONTAINER_NAME
fi
if [ "$(docker ps -aq -f name=$CONTAINER_NAME)" ]; then
print_info "Removing existing container..."
docker rm $CONTAINER_NAME
fi
}
# Start container
start_container() {
print_info "Starting TexTeller server container..."
# Run the container inside the `if` so the error branch is reachable under `set -e`.
if docker run -d \
--name "$CONTAINER_NAME" \
--gpus '"device=0"' \
-p "$PORT":8001 \
--shm-size=2g \
-v "$HOME/.cache/huggingface:/root/.cache/huggingface:ro" \
-e CUDA_VISIBLE_DEVICES=0 \
-e HF_HOME=/root/.cache/huggingface \
-e HF_HUB_OFFLINE=1 \
-e TRANSFORMERS_OFFLINE=1 \
-e RAY_NUM_REPLICAS=1 \
-e RAY_NCPU_PER_REPLICA=4 \
-e RAY_NGPU_PER_REPLICA=1 \
--restart unless-stopped \
"$IMAGE_NAME"; then
print_info "Container started successfully!"
print_info "Server will be available at: http://localhost:$PORT/predict"
else
print_error "Failed to start container."
exit 1
fi
}
# Wait for server to be ready
wait_for_server() {
print_info "Waiting for server to be ready..."
max_attempts=60
attempt=0
while [ $attempt -lt $max_attempts ]; do
if curl -s http://localhost:$PORT/ > /dev/null 2>&1; then
print_info "Server is ready!"
return 0
fi
attempt=$((attempt + 1))
echo -n "."
sleep 1
done
echo ""
print_warn "Server might still be initializing. Check logs with: docker logs -f $CONTAINER_NAME"
}
# Show logs
show_logs() {
print_info "Showing container logs (Ctrl+C to exit)..."
docker logs -f $CONTAINER_NAME
}
# Main deployment workflow
case "${1:-deploy}" in
check)
check_nvidia
check_docker
check_nvidia_docker
check_model
print_info "All checks passed!"
;;
build)
check_docker
build_image
;;
deploy)
check_nvidia
check_docker
check_nvidia_docker
check_model
build_image
stop_container
start_container
wait_for_server
print_info ""
print_info "======================================"
print_info "TexTeller server deployed successfully!"
print_info "======================================"
print_info "API endpoint: http://localhost:$PORT/predict"
print_info ""
print_info "Test the server with:"
print_info " python examples/test_server.py path/to/image.png"
print_info ""
print_info "View logs with:"
print_info " docker logs -f $CONTAINER_NAME"
print_info ""
print_info "Stop the server with:"
print_info " docker stop $CONTAINER_NAME"
;;
start)
if [ "$(docker ps -aq -f name=$CONTAINER_NAME)" ]; then
docker start $CONTAINER_NAME
print_info "Container started."
else
print_error "Container does not exist. Run './deploy.sh deploy' first."
exit 1
fi
;;
stop)
stop_container
print_info "Container stopped."
;;
restart)
docker restart $CONTAINER_NAME
print_info "Container restarted."
;;
logs)
show_logs
;;
status)
if [ "$(docker ps -q -f name=$CONTAINER_NAME)" ]; then
print_info "Container is running."
docker stats --no-stream $CONTAINER_NAME
else
print_warn "Container is not running."
fi
;;
clean)
stop_container
print_info "Removing Docker image..."
docker rmi $IMAGE_NAME 2>/dev/null || true
print_info "Cleanup complete."
;;
*)
echo "Usage: $0 {check|build|deploy|start|stop|restart|logs|status|clean}"
echo ""
echo "Commands:"
echo " check - Check system requirements"
echo " build - Build Docker image only"
echo " deploy - Full deployment (build + start)"
echo " start - Start existing container"
echo " stop - Stop container"
echo " restart - Restart container"
echo " logs - Show container logs"
echo " status - Show container status"
echo " clean - Remove container and image"
exit 1
;;
esac
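
The readiness loop in `wait_for_server` can also be sketched in Python, which may be convenient when the check has to run from a test suite rather than a shell. This is a minimal sketch, not part of the repository: the names `http_probe` and the `probe`/`sleep` parameters are hypothetical, and the default URL simply mirrors the `PORT=8001` setting above. Like `curl -s`, the probe treats any HTTP answer (even a non-2xx status) as "server is up".

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers at all, similar to `curl -s URL`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_for_server(
    url: str = "http://localhost:8001/",
    max_attempts: int = 60,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` up to `max_attempts` times; return True once it responds."""
    for _ in range(max_attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

Unlike the shell version, this sketch reports a timeout through its return value, so a caller can decide whether to print a success banner or stop.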

38
docker-compose.yml Normal file
View File

@@ -0,0 +1,38 @@
version: '3.8'

services:
  texteller:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: texteller-server
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - CUDA_VISIBLE_DEVICES=0
      # Ray Serve configuration
      - RAY_NUM_REPLICAS=1
      - RAY_NCPU_PER_REPLICA=4
      - RAY_NGPU_PER_REPLICA=1
    ports:
      - "8001:8001"
    volumes:
      # Mount the model cache directory to avoid downloading models
      - ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']  # Use first GPU (RTX 5080)
              capabilities: [gpu]
    restart: unless-stopped
    command: ["texteller", "launch", "server", "-p", "8001"]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8001/', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

20
docs/Makefile Normal file
View File

@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

35
docs/make.bat Normal file
View File

@@ -0,0 +1,35 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

0
docs/requirements.txt Normal file
View File

39
docs/source/api.rst Normal file
View File

@@ -0,0 +1,39 @@
API Reference
=============

This section provides detailed API documentation for the TexTeller package. TexTeller is a tool for detecting and recognizing LaTeX formulas in images and converting mixed text and formula images to Markdown.

.. contents:: Table of Contents
   :local:
   :depth: 2

Image to LaTeX Conversion
-------------------------

.. autofunction:: texteller.api.img2latex

Paragraph to Markdown Conversion
--------------------------------

.. autofunction:: texteller.api.paragraph2md

LaTeX Detection
---------------

.. autofunction:: texteller.api.detection.latex_detect

Model Loading
-------------

.. autofunction:: texteller.api.load_model
.. autofunction:: texteller.api.load_tokenizer
.. autofunction:: texteller.api.load_latexdet_model
.. autofunction:: texteller.api.load_textdet_model
.. autofunction:: texteller.api.load_textrec_model

KaTeX Conversion
----------------

.. autofunction:: texteller.api.to_katex

75
docs/source/conf.py Normal file
View File

@@ -0,0 +1,75 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute.
import os
import sys
sys.path.insert(0, os.path.abspath("../.."))
# -- Project information -----------------------------------------------------
project = "TexTeller"
copyright = "2025, TexTeller Team"
author = "TexTeller Team"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"myst_parser",
"sphinx.ext.duration",
"sphinx.ext.intersphinx",
"sphinx.ext.autosectionlabel",
"sphinx.ext.autodoc",
"sphinx.ext.viewcode",
"sphinx.ext.napoleon",
"sphinx.ext.autosummary",
"sphinx_copybutton",
# 'sphinx.ext.linkcode',
# 'sphinxarg.ext',
"sphinx_design",
"nbsphinx",
]
templates_path = ["_templates"]
exclude_patterns = []
# Autodoc settings
autodoc_member_order = "bysource"
add_module_names = False
autoclass_content = "both"
autodoc_default_options = {
"members": True,
"member-order": "bysource",
"undoc-members": True,
"show-inheritance": True,
"imported-members": True,
}
# Intersphinx settings
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable", None),
"torch": ("https://pytorch.org/docs/stable", None),
"transformers": ("https://huggingface.co/docs/transformers/main/en", None),
}
html_theme = "sphinx_book_theme"
html_theme_options = {
"repository_url": "https://github.com/OleehyO/TexTeller",
"use_repository_button": True,
"use_issues_button": True,
"use_edit_page_button": True,
"use_download_button": True,
}
html_logo = "../../assets/logo.svg"

77
docs/source/index.rst Normal file
View File

@@ -0,0 +1,77 @@
.. TexTeller documentation master file, created by
   sphinx-quickstart on Sun Apr 20 13:05:53 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

TexTeller Documentation
===========================================

Features
--------

- **Image to LaTeX Conversion**: Convert images containing LaTeX formulas to LaTeX code
- **LaTeX Detection**: Detect and locate LaTeX formulas in mixed text/formula images
- **Paragraph to Markdown**: Convert mixed text and formula images to Markdown format

Installation
------------

You can install TexTeller using pip and uv:

.. code-block:: bash

   pip install uv
   uv pip install texteller

Quick Start
-----------

Converting an image to LaTeX:

.. code-block:: python

   from texteller import load_model, load_tokenizer, img2latex

   # Load models
   model = load_model(use_onnx=False)
   tokenizer = load_tokenizer()

   # Convert image to LaTeX
   latex = img2latex(model, tokenizer, ["path/to/image.png"])[0]

Processing a mixed text/formula image:

.. code-block:: python

   from texteller import (
       load_model, load_tokenizer, load_latexdet_model,
       load_textdet_model, load_textrec_model, paragraph2md
   )

   # Load all required models
   latex_model = load_model()
   tokenizer = load_tokenizer()
   latex_detector = load_latexdet_model()
   text_detector = load_textdet_model()
   text_recognizer = load_textrec_model()

   # Convert to markdown
   markdown = paragraph2md(
       "path/to/mixed_image.png",
       latex_detector,
       text_detector,
       text_recognizer,
       latex_model,
       tokenizer
   )

API Documentation
-----------------

For detailed API documentation, please see :doc:`./api`.

.. toctree::
   :maxdepth: 2
   :hidden:

   api

10
examples/client_demo.py Normal file
View File

@@ -0,0 +1,10 @@
import requests

server_url = "http://127.0.0.1:8000/predict"

img_path = "/path/to/your/image"
with open(img_path, "rb") as img:
    files = {"img": img}
    response = requests.post(server_url, files=files)

print(response.text)

77
examples/test_server.py Normal file
View File

@@ -0,0 +1,77 @@
#!/usr/bin/env python3
"""
Example client script to test the TexTeller server API.
"""

import requests
import base64
import sys
from pathlib import Path


def test_base64_request(image_path: str, server_url: str = "http://localhost:8001/predict"):
    """Test the server with a base64-encoded image."""
    # Read and encode the image
    with open(image_path, "rb") as f:
        image_data = f.read()
    image_base64 = base64.b64encode(image_data).decode()

    # Send request
    response = requests.post(server_url, json={"image_base64": image_base64}, headers={"Content-Type": "application/json"})

    # Print result
    if response.status_code == 200:
        result = response.json()
        print("✓ Success!")
        print(f"Result: {result.get('result', 'N/A')}")
        return result
    else:
        print(f"✗ Error: {response.status_code}")
        print(f"Response: {response.text}")
        return None


def test_url_request(image_url: str, server_url: str = "http://localhost:8001/predict"):
    """Test the server with an image URL."""
    # Send request
    response = requests.post(server_url, json={"image_url": image_url}, headers={"Content-Type": "application/json"})

    # Print result
    if response.status_code == 200:
        result = response.json()
        print("✓ Success!")
        print(f"Result: {result.get('result', 'N/A')}")
        return result
    else:
        print(f"✗ Error: {response.status_code}")
        print(f"Response: {response.text}")
        return None


if __name__ == "__main__":
    print("=" * 50)
    print("TexTeller Server API Test")
    print("=" * 50)

    # Test with local image if provided
    if len(sys.argv) > 1:
        image_path = sys.argv[1]
        if Path(image_path).exists():
            print("\nTest 1: Base64 request with local image")
            print(f"Image: {image_path}")
            test_base64_request(image_path)
        else:
            print(f"Error: Image file not found: {image_path}")

    # Test with URL if provided
    if len(sys.argv) > 2:
        image_url = sys.argv[2]
        print("\nTest 2: URL request")
        print(f"URL: {image_url}")
        test_url_request(image_url)

    # No arguments: show usage
    if len(sys.argv) == 1:
        print("\nUsage:")
        print(f"  python {sys.argv[0]} <image_path> [image_url]")
        print("\nExamples:")
        print(f"  python {sys.argv[0]} equation.png")
        print(f"  python {sys.argv[0]} equation.png https://example.com/formula.png")
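
The base64 request body that `test_base64_request` sends can be checked without a running server. The sketch below uses only the standard library; `build_base64_payload` and `decode_base64_payload` are hypothetical helper names, and the decoding side is only an assumption about how a server could recover the bytes from the `image_base64` field, not the actual TexTeller handler.

```python
import base64
import json


def build_base64_payload(image_bytes: bytes) -> str:
    """Serialize raw image bytes into the JSON body used by test_base64_request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded})


def decode_base64_payload(body: str) -> bytes:
    """Hypothetical inverse: recover the raw bytes from the JSON body."""
    return base64.b64decode(json.loads(body)["image_base64"])
```

A round trip through both helpers should return the original bytes unchanged, which is a cheap way to rule out encoding mistakes before debugging the server itself.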


View File

@@ -0,0 +1,35 @@
{"file_name": "0.png", "latex_formula": "\\[\\mathbb{C}^{4}\\stackrel{{\\pi_{1}}}{{\\longleftarrow}}\\mathcal{ F}\\stackrel{{\\pi_{2}}}{{\\rightarrow}}\\mathcal{PT},\\]"}
{"file_name": "1.png", "latex_formula": "\\[W^{*}_{Z}(x_{1},x_{2})=W_{f\\lrcorner Z}(y_{1},y_{2})=\\mathcal{P}\\exp\\left( \\int_{\\gamma}A_{\\mu}dx^{\\mu}\\right).\\]"}
{"file_name": "2.png", "latex_formula": "\\[G=W^{*}_{Z}(q,p)=\\tilde{H}H^{-1}\\]"}
{"file_name": "3.png", "latex_formula": "\\[H=W^{*}_{Z}(p,x),\\ \\ \\tilde{H}=W^{*}_{Z}(q,x).\\]"}
{"file_name": "4.png", "latex_formula": "\\[v\\cdot f^{*}A|_{x}=(f\\lrcorner Z)_{*}v\\cdot A|_{f\\lrcorner Z(x)},\\quad x\\in Z, \\ v\\in T_{x}Z.\\]"}
{"file_name": "5.png", "latex_formula": "\\[(f\\lrcorner Z)_{*}v\\cdot A|_{f\\lrcorner Z(x)}=v^{\\alpha\\dot{\\alpha}}\\Big{(} \\frac{\\partial y^{\\beta\\dot{\\beta}}}{\\partial x^{\\alpha\\dot{\\alpha}}}A_{\\beta \\dot{\\beta}}\\Big{)}\\Big{|}_{f\\lrcorner Z(x)},\\ x\\in Z,\\ v\\in T_{x}Z,\\]"}
{"file_name": "6.png", "latex_formula": "\\[\\{T_{i},T_{j}\\}=\\{\\tilde{T}^{i},\\tilde{T}^{j}\\}=0,\\ \\ \\{T_{i},\\tilde{T}^{j}\\}=2i \\delta^{j}_{i}D,\\]"}
{"file_name": "7.png", "latex_formula": "\\[(\\partial_{s},q_{i},\\tilde{q}^{k})\\rightarrow(D,M^{j}_{i}T_{j},\\tilde{M}^{k}_ {l}\\tilde{T}^{l}),\\]"}
{"file_name": "8.png", "latex_formula": "\\[M^{i}_{j}\\tilde{M}^{j}_{k}=\\delta^{i}_{k}.\\]"}
{"file_name": "9.png", "latex_formula": "\\[Q_{i\\alpha}=q_{i\\alpha}+\\omega_{i\\alpha},\\ \\tilde{Q}^{i}_{\\dot{\\alpha}}=q^{i}_{ \\dot{\\alpha}}+\\tilde{\\omega}^{i}_{\\dot{\\alpha}},\\ D_{\\alpha\\dot{\\alpha}}= \\partial_{\\alpha\\dot{\\alpha}}+A_{\\alpha\\dot{\\alpha}}.\\]"}
{"file_name": "10.png", "latex_formula": "\\[\\hat{f}(g,\\theta^{i\\alpha},\\tilde{\\theta}^{\\dot{\\alpha}}_{j})=(f(g),[V^{-1}]^ {\\alpha}_{\\beta}\\theta^{i\\beta},[\\tilde{V}^{-1}]^{\\dot{\\alpha}}_{\\dot{\\beta}} \\tilde{\\theta}^{\\dot{\\beta}}_{j}),\\ g\\in{\\cal G},\\]"}
{"file_name": "11.png", "latex_formula": "\\[v^{\\beta\\dot{\\beta}}V^{\\alpha}_{\\beta}\\tilde{V}^{\\dot{\\alpha}}_{\\dot{\\beta}} =((f\\lrcorner L_{0})_{*}v)^{\\alpha\\dot{\\alpha}},\\]"}
{"file_name": "12.png", "latex_formula": "\\[\\omega_{i\\alpha}=\\tilde{\\theta}^{\\dot{\\alpha}}_{i}h_{\\alpha\\dot{\\alpha}}(x^{ \\beta\\dot{\\beta}},\\tau^{\\beta\\dot{\\beta}}),\\ \\ \\tilde{\\omega}^{i}_{\\alpha}=\\theta^{i\\alpha}\\tilde{h}_{\\alpha\\dot{\\alpha}}(x^{ \\beta\\dot{\\beta}},\\tau^{\\beta\\dot{\\beta}}),\\]"}
{"file_name": "13.png", "latex_formula": "\\[\\begin{split}&\\lambda^{\\alpha}\\hat{f}^{*}\\omega_{i\\alpha}(z)= \\tilde{\\theta}^{\\dot{\\beta}}_{i}\\lambda^{\\alpha}\\left(V^{\\beta}_{\\alpha}h_{ \\beta\\dot{\\beta}}(x^{\\prime},\\tau^{\\prime})\\right),\\\\ &\\tilde{\\lambda}^{\\dot{\\alpha}}\\hat{f}^{*}\\tilde{\\omega}^{i}_{ \\dot{\\alpha}}(z)=\\theta^{i\\beta}\\tilde{\\lambda}^{\\dot{\\alpha}}\\left(\\tilde{V}^ {\\dot{\\beta}}_{\\dot{\\alpha}}\\tilde{h}_{\\beta\\dot{\\beta}}(x^{\\prime},\\tau^{ \\prime})\\right),\\end{split}\\]"}
{"file_name": "14.png", "latex_formula": "\\[A_{\\alpha\\dot{\\alpha}}=A_{\\alpha\\dot{\\alpha}}(x^{\\beta\\dot{\\beta}},\\tau^{ \\beta\\dot{\\beta}})\\]"}
{"file_name": "15.png", "latex_formula": "\\[D=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}D_{\\alpha\\dot{\\alpha}}\\]"}
{"file_name": "16.png", "latex_formula": "\\[D=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}\\partial_{\\alpha\\dot{\\alpha}}\\]"}
{"file_name": "17.png", "latex_formula": "\\[[v_{1}\\cdot D^{*},v_{2}\\cdot D^{*}]=0\\]"}
{"file_name": "18.png", "latex_formula": "\\[\\Phi_{A}=(\\omega_{i\\alpha},\\tilde{\\omega}^{i}_{\\dot{\\alpha}},A_{\\alpha\\dot{ \\alpha}})\\]"}
{"file_name": "19.png", "latex_formula": "\\[\\hat{f}:{\\cal F}^{6|4N}\\rightarrow{\\cal F}^{6|4N}\\]"}
{"file_name": "20.png", "latex_formula": "\\[\\sigma=(s,\\xi^{i},\\tilde{\\xi}_{j})\\in\\mathbb{C}^{1|2N}\\]"}
{"file_name": "21.png", "latex_formula": "\\[\\tau^{\\alpha\\dot{\\alpha}}(h_{\\alpha\\dot{\\alpha}}+\\tilde{h}_{\\alpha\\dot{\\alpha} })=0\\]"}
{"file_name": "22.png", "latex_formula": "\\[\\tau^{\\alpha\\dot{\\alpha}}\\rightarrow[V^{-1}]^{\\alpha}_{\\beta}[\\tilde{V}^{-1}]^{ \\dot{\\alpha}}_{\\dot{\\beta}}\\tau^{\\beta\\dot{\\beta}}\\]"}
{"file_name": "23.png", "latex_formula": "\\[\\tau^{\\beta\\dot{\\beta}}=\\sum_{i}\\theta^{i\\beta}\\tilde{\\theta}^{\\dot{\\beta}}_{i}\\]"}
{"file_name": "24.png", "latex_formula": "\\[\\theta^{i\\alpha}\\omega_{i\\alpha}+\\tilde{\\theta}^{i}_{\\dot{\\alpha}}\\tilde{ \\omega}^{\\dot{\\alpha}}_{i}=0\\]"}
{"file_name": "25.png", "latex_formula": "\\[\\tilde{T}^{i}=\\tilde{\\lambda}^{\\dot{\\alpha}}\\tilde{Q}^{i}_{\\dot{\\alpha}}\\]"}
{"file_name": "26.png", "latex_formula": "\\[\\tilde{T}^{i}=\\tilde{\\lambda}^{\\dot{\\alpha}}\\tilde{q}^{i}_{\\dot{\\alpha}}\\]"}
{"file_name": "27.png", "latex_formula": "\\[\\tilde{\\lambda}^{\\dot{\\alpha}}f^{*}A_{\\alpha\\dot{\\alpha}}=H^{-1}\\tilde{ \\lambda}^{\\dot{\\alpha}}\\partial_{\\alpha\\dot{\\alpha}}H\\]"}
{"file_name": "28.png", "latex_formula": "\\[\\tilde{q}^{i}=\\partial_{\\tilde{\\xi}_{i}}+i\\xi^{i}\\partial_{s}\\]"}
{"file_name": "29.png", "latex_formula": "\\[\\tilde{q}^{i}_{\\dot{\\alpha}}=\\frac{\\partial}{\\partial\\tilde{\\theta}^{\\dot{ \\alpha}}_{i}}+i\\theta^{i\\alpha}\\frac{\\partial}{\\partial x^{\\alpha\\dot{\\alpha}}}\\]"}
{"file_name": "30.png", "latex_formula": "\\[f\\lrcorner L(z)=\\pi_{1}\\circ f(z,\\lambda,\\tilde{\\lambda})\\ \\forall z\\in L\\]"}
{"file_name": "31.png", "latex_formula": "\\[q_{i\\alpha}=\\frac{\\partial}{\\partial\\theta^{i\\alpha}}+i\\tilde{\\theta}^{\\dot{ \\alpha}}_{i}\\frac{\\partial}{\\partial x^{\\alpha\\dot{\\alpha}}}\\]"}
{"file_name": "32.png", "latex_formula": "\\[q_{i}=\\partial_{\\xi^{i}}+i\\tilde{\\xi}_{i}\\partial_{s}\\]"}
{"file_name": "33.png", "latex_formula": "\\[v^{\\alpha\\dot{\\alpha}}=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}\\]"}
{"file_name": "34.png", "latex_formula": "\\[z^{A}=(x^{\\alpha\\dot{\\alpha}},\\theta^{i\\alpha},\\tilde{\\theta}^{\\dot{\\alpha}}_{ j})\\]"}
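
Each line of the metadata file above is a standalone JSON object with a `file_name` and a `latex_formula` field. A minimal sketch of a loader that validates this shape before training is given below; the function name `load_metadata` is hypothetical, and the assumption that both keys are required on every line is inferred from the 35 records shown, not from the library.

```python
import json
from typing import Iterable

# Keys that every record in the listing above carries.
REQUIRED_KEYS = {"file_name", "latex_formula"}


def load_metadata(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines records and fail early on malformed entries."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        records.append(record)
    return records
```

Passing an open file object works directly, since iterating over a text file yields its lines.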

View File

@@ -0,0 +1,71 @@
from functools import partial

import yaml
from datasets import load_dataset
from transformers import (
    Trainer,
    TrainingArguments,
)

from texteller import load_model, load_tokenizer
from texteller.constants import MIN_HEIGHT, MIN_WIDTH

from examples.train_texteller.utils import (
    collate_fn,
    filter_fn,
    img_inf_transform,
    img_train_transform,
    tokenize_fn,
)


def train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer, training_config):
    # training_config is the dict loaded from train_config.yaml
    training_args = TrainingArguments(**training_config)
    trainer = Trainer(
        model,
        training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        data_collator=collate_fn_with_tokenizer,
    )

    trainer.train(resume_from_checkpoint=None)


if __name__ == "__main__":
    dataset = load_dataset("imagefolder", data_dir="dataset")["train"]
    # Drop images that are too small to be useful
    dataset = dataset.filter(
        lambda x: x["image"].height > MIN_HEIGHT and x["image"].width > MIN_WIDTH
    )
    dataset = dataset.shuffle(seed=42)
    dataset = dataset.flatten_indices()

    tokenizer = load_tokenizer()
    # To use your own tokenizer, pass its path instead
    # tokenizer = load_tokenizer("/path/to/your/tokenizer")
    filter_fn_with_tokenizer = partial(filter_fn, tokenizer=tokenizer)
    dataset = dataset.filter(filter_fn_with_tokenizer, num_proc=8)

    map_fn = partial(tokenize_fn, tokenizer=tokenizer)
    tokenized_dataset = dataset.map(
        map_fn, batched=True, remove_columns=dataset.column_names, num_proc=8
    )

    # Split dataset into train and eval, ratio 9:1
    split_dataset = tokenized_dataset.train_test_split(test_size=0.1, seed=42)
    train_dataset, eval_dataset = split_dataset["train"], split_dataset["test"]
    train_dataset = train_dataset.with_transform(img_train_transform)
    eval_dataset = eval_dataset.with_transform(img_inf_transform)
    collate_fn_with_tokenizer = partial(collate_fn, tokenizer=tokenizer)

    # Train from scratch
    model = load_model()

    # To train from a pre-trained model, pass the path to your checkpoint instead
    # model = load_model("/path/to/your/model_checkpoint")

    enable_train = True
    # Load the training arguments, closing the file once it has been read
    with open("train_config.yaml") as f:
        training_config = yaml.safe_load(f)
    if enable_train:
        train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer, training_config)

View File

@@ -0,0 +1,32 @@
# For more information, please refer to the official documentation: https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments
seed: 42 # Random seed for reproducibility
use_cpu: false # Whether to use CPU (it's easier to debug with CPU when starting to test the code)
learning_rate: 5.0e-5 # Learning rate
num_train_epochs: 10 # Total number of training epochs
per_device_train_batch_size: 4 # Batch size per GPU for training
per_device_eval_batch_size: 8 # Batch size per GPU for evaluation
output_dir: "train_result" # Output directory
overwrite_output_dir: false # If the output directory exists, do not delete its content
report_to:
- tensorboard # Report logs to TensorBoard
save_strategy: "steps" # Strategy to save checkpoints
save_steps: 500 # Interval of steps to save checkpoints, can be int or a float (0~1), when float it represents the ratio of total training steps (e.g., can set to 1.0 / 2000)
save_total_limit: 5 # Maximum number of models to save. The oldest models will be deleted if this number is exceeded
logging_strategy: "steps" # Log every certain number of steps
logging_steps: 500 # Number of steps between each log
logging_nan_inf_filter: false # Record logs for loss=nan or inf
optim: "adamw_torch" # Optimizer
lr_scheduler_type: "cosine" # Learning rate scheduler
warmup_ratio: 0.1 # Ratio of warmup steps in total training steps (e.g., for 1000 steps, the first 100 steps gradually increase lr from 0 to the set lr)
max_grad_norm: 1.0 # For gradient clipping, ensure the norm of the gradients does not exceed 1.0 (default 1.0)
fp16: false # Whether to use 16-bit floating point for training (generally not recommended, as loss can easily explode)
bf16: false # Whether to use Brain Floating Point (bfloat16) for training (recommended if architecture supports it)
gradient_accumulation_steps: 1 # Gradient accumulation steps, consider this parameter to achieve large batch size effects when batch size cannot be large
jit_mode_eval: false # Whether to use PyTorch jit trace during eval (can speed up the model, but the model must be static, otherwise will throw errors)
torch_compile: false # Whether to use torch.compile to compile the model (for better training and inference performance)
dataloader_pin_memory: true # Can speed up data transfer between CPU and GPU
dataloader_num_workers: 1 # Number of subprocesses for data loading (0 loads data in the main process), usually set to 4*number of GPUs used
evaluation_strategy: "steps" # Evaluation strategy, can be "steps" or "epoch"
eval_steps: 500 # Number of steps between evaluations, used when evaluation_strategy="steps"
remove_unused_columns: false # Don't change this unless you really know what you are doing.
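
To see how `per_device_train_batch_size`, `gradient_accumulation_steps`, `num_train_epochs` and `warmup_ratio` interact, the arithmetic can be sketched as below. This is an approximation under stated assumptions, not the Trainer's own bookkeeping: `schedule_summary`, `num_examples` and `num_devices` are hypothetical names, and the exact step count the Trainer derives can differ slightly (for example in how partial batches are handled).

```python
import math


def schedule_summary(
    num_examples: int,
    per_device_batch_size: int = 4,
    gradient_accumulation_steps: int = 1,
    num_devices: int = 1,
    num_train_epochs: int = 10,
    warmup_ratio: float = 0.1,
) -> dict:
    """Estimate optimizer steps and warmup length for a training run."""
    # Examples consumed per optimizer update across all devices
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    # Steps during which the learning rate ramps up from 0 to learning_rate
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }
```

With the defaults from the config and 400 training examples on one GPU, this gives 1000 total steps and 100 warmup steps, matching the example in the `warmup_ratio` comment. It also shows that `save_steps`, `logging_steps` and `eval_steps` of 500 would each fire only twice in such a run.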

View File

@@ -0,0 +1,17 @@
from .functional import (
collate_fn,
filter_fn,
tokenize_fn,
)
from .transforms import (
img_train_transform,
img_inf_transform,
)
__all__ = [
"collate_fn",
"filter_fn",
"tokenize_fn",
"img_train_transform",
"img_inf_transform",
]

View File

@@ -1,14 +1,40 @@
from augraphy import *
"""
Custom augraphy pipeline for training
This file implements a custom augraphy data augmentation pipeline. We found that using augraphy's
default pipeline can cause significant degradation to formula images, potentially losing semantic
information. Therefore, we carefully selected several common augmentation effects,
adjusting their parameters and combination methods to preserve the original semantic information
of the images as much as possible.
"""
from augraphy import (
InkColorSwap,
LinesDegradation,
OneOf,
Dithering,
InkBleed,
InkShifter,
NoiseTexturize,
BrightnessTexturize,
ColorShift,
DirtyDrum,
LightingGradient,
Brightness,
Gamma,
SubtleNoise,
Jpeg,
AugraphyPipeline,
)
import random
def ocr_augmentation_pipeline():
pre_phase = [
# Rescale(scale="optimal", target_dpi = 300, p = 1.0),
]
def get_custom_augraphy():
pre_phase = []
ink_phase = [
InkColorSwap(
ink_swap_color="lhy_custom",
ink_swap_color="random",
ink_swap_sequence_number_range=(5, 10),
ink_swap_min_width_range=(2, 3),
ink_swap_max_width_range=(100, 120),
@@ -16,7 +42,7 @@ def ocr_augmentation_pipeline():
ink_swap_max_height_range=(100, 120),
ink_swap_min_area_range=(10, 20),
ink_swap_max_area_range=(400, 500),
p=0.2
p=0.2,
),
LinesDegradation(
line_roi=(0.0, 0.0, 1.0, 1.0),
@@ -28,9 +54,8 @@ def ocr_augmentation_pipeline():
line_long_to_short_ratio=(5, 7),
line_replacement_probability=(0.4, 0.5),
line_replacement_thickness=(1, 3),
p=0.2
p=0.2,
),
# ============================
OneOf(
[
@@ -44,10 +69,9 @@ def ocr_augmentation_pipeline():
severity=(0.4, 0.6),
),
],
p=0.2
p=0.2,
),
# ============================
# ============================
InkShifter(
text_shift_scale_range=(18, 27),
@@ -56,38 +80,32 @@ def ocr_augmentation_pipeline():
blur_kernel_size=(5, 5),
blur_sigma=0,
noise_type="perlin",
p=0.2
p=0.2,
),
# ============================
]
paper_phase = [
NoiseTexturize( # tested
NoiseTexturize(
sigma_range=(3, 10),
turbulence_range=(2, 5),
texture_width_range=(300, 500),
texture_height_range=(300, 500),
p=0.2
p=0.2,
),
BrightnessTexturize( # tested
texturize_range=(0.9, 0.99),
deviation=0.03,
p=0.2
)
BrightnessTexturize(texturize_range=(0.9, 0.99), deviation=0.03, p=0.2),
]
post_phase = [
ColorShift( # tested
ColorShift(
color_shift_offset_x_range=(3, 5),
color_shift_offset_y_range=(3, 5),
color_shift_iterations=(2, 3),
color_shift_brightness_range=(0.9, 1.1),
color_shift_gaussian_kernel_range=(3, 3),
p=0.2
p=0.2,
),
DirtyDrum( # tested
DirtyDrum(
line_width_range=(1, 6),
line_concentration=random.uniform(0.05, 0.15),
direction=random.randint(0, 2),
@@ -95,9 +113,8 @@ def ocr_augmentation_pipeline():
noise_value=(64, 224),
ksize=random.choice([(3, 3), (5, 5), (7, 7)]),
sigmaX=0,
p=0.2
p=0.2,
),
# =====================================
OneOf(
[
@@ -119,10 +136,9 @@ def ocr_augmentation_pipeline():
gamma_range=(0.9, 1.1),
),
],
p=0.2
p=0.2,
),
# =====================================
# =====================================
OneOf(
[
@@ -130,10 +146,10 @@ def ocr_augmentation_pipeline():
subtle_range=random.randint(5, 10),
),
Jpeg(
quality_range=(85, 95),
quality_range=(70, 95),
),
],
p=0.2
p=0.2,
),
# =====================================
]
@@ -143,7 +159,7 @@ def ocr_augmentation_pipeline():
paper_phase=paper_phase,
post_phase=post_phase,
pre_phase=pre_phase,
log=False
log=False,
)
return pipeline
return pipeline
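One detail worth keeping in mind when reading `get_custom_augraphy()`: arguments such as `line_concentration=random.uniform(0.05, 0.15)` are ordinary Python expressions, so they are evaluated once, when the pipeline object is built, and not once per image. The sketch below illustrates this with a hypothetical stand-in class (`FakeAugmentation` and `build_pipeline` are illustrative names, not part of augraphy), assuming the pipeline is built a single time and then reused:

```python
import random


class FakeAugmentation:
    """Hypothetical stand-in for an augmentation that stores its arguments."""

    def __init__(self, line_concentration: float):
        self.line_concentration = line_concentration

    def __call__(self, image):
        # A real augmentation would transform the image; this only reports the stored value.
        return self.line_concentration


def build_pipeline() -> FakeAugmentation:
    # random.uniform(...) runs here, at construction time, exactly once.
    return FakeAugmentation(line_concentration=random.uniform(0.05, 0.15))


pipeline = build_pipeline()
values = {pipeline(image=None) for _ in range(5)}
print(len(values))  # the same stored value is reused for every call
```

Parameters given as ranges, such as `line_width_range=(1, 6)`, are passed through unevaluated, so this only concerns the arguments built with `random.*` calls.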


@@ -0,0 +1,47 @@
from typing import Any
import torch
from transformers import DataCollatorForLanguageModeling
from texteller.constants import MAX_TOKEN_SIZE, MIN_HEIGHT, MIN_WIDTH
def _left_move(x: torch.Tensor, pad_val):
assert len(x.shape) == 2, "x should be 2-dimensional"
lefted_x = torch.ones_like(x)
lefted_x[:, :-1] = x[:, 1:]
lefted_x[:, -1] = pad_val
return lefted_x
def tokenize_fn(samples: dict[str, list[Any]], tokenizer=None) -> dict[str, list[Any]]:
assert tokenizer is not None, "tokenizer should not be None"
tokenized_formula = tokenizer(samples["latex_formula"], return_special_tokens_mask=True)
tokenized_formula["pixel_values"] = samples["image"]
return tokenized_formula
def collate_fn(samples: list[dict[str, Any]], tokenizer=None) -> dict[str, list[Any]]:
assert tokenizer is not None, "tokenizer should not be None"
pixel_values = [dic.pop("pixel_values") for dic in samples]
clm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
batch = clm_collator(samples)
batch["pixel_values"] = pixel_values
batch["decoder_input_ids"] = batch.pop("input_ids")
batch["decoder_attention_mask"] = batch.pop("attention_mask")
batch["labels"] = _left_move(batch["labels"], -100)
# convert list of Image to a tensor with (B, C, H, W)
batch["pixel_values"] = torch.stack(batch["pixel_values"], dim=0)
return batch
def filter_fn(sample, tokenizer=None) -> bool:
return (
sample["image"].height > MIN_HEIGHT
and sample["image"].width > MIN_WIDTH
and len(tokenizer(sample["latex_formula"])["input_ids"]) < MAX_TOKEN_SIZE - 10
)
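The label shift done by `_left_move` above can be sketched without tensors. This is a minimal pure-Python illustration, assuming non-empty rows of token ids (`left_move` and the sample ids are made up for the example). `-100` is the index that PyTorch's cross-entropy loss ignores by default, and the shift presumably makes position `i` of the labels hold the token that follows position `i` of `decoder_input_ids`:

```python
def left_move(rows: list[list[int]], pad_val: int) -> list[list[int]]:
    """Shift every row one position to the left and pad the last slot."""
    return [row[1:] + [pad_val] for row in rows]


labels = [[0, 11, 12, 2], [0, 21, 2, -100]]
shifted = left_move(labels, -100)
print(shifted)  # [[11, 12, 2, -100], [21, 2, -100, -100]]
```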


@@ -0,0 +1,154 @@
import torch
import random
import numpy as np
import cv2
from torchvision.transforms import v2
from typing import Any
from PIL import Image
from collections import Counter
from texteller.constants import (
IMG_CHANNELS,
MAX_RESIZE_RATIO,
MIN_RESIZE_RATIO,
)
from texteller.utils import transform as inference_transform
from .augraphy_pipe import get_custom_augraphy
augraphy_pipeline = get_custom_augraphy()
def trim_white_border(image: np.ndarray):
if len(image.shape) != 3 or image.shape[2] != 3:
raise ValueError("Image is not in RGB format or channel is not in third dimension")
if image.dtype != np.uint8:
raise ValueError("Image should be stored in uint8")
corners = [tuple(image[0, 0]), tuple(image[0, -1]), tuple(image[-1, 0]), tuple(image[-1, -1])]
bg_color = Counter(corners).most_common(1)[0][0]
bg_color_np = np.array(bg_color, dtype=np.uint8)
h, w = image.shape[:2]
bg = np.full((h, w, 3), bg_color_np, dtype=np.uint8)
diff = cv2.absdiff(image, bg)
# The input is RGB (see the check above), so convert with RGB2GRAY
mask = cv2.cvtColor(diff, cv2.COLOR_RGB2GRAY)
threshold = 15
_, diff = cv2.threshold(mask, threshold, 255, cv2.THRESH_BINARY)
x, y, w, h = cv2.boundingRect(diff)
trimmed_image = image[y : y + h, x : x + w]
return trimmed_image
def add_white_border(image: np.ndarray, max_size: int) -> np.ndarray:
randi = [random.randint(0, max_size) for _ in range(4)]
pad_height_size = randi[1] + randi[3]
pad_width_size = randi[0] + randi[2]
if pad_height_size + image.shape[0] < 30:
compensate_height = int((30 - (pad_height_size + image.shape[0])) * 0.5) + 1
randi[1] += compensate_height
randi[3] += compensate_height
if pad_width_size + image.shape[1] < 30:
compensate_width = int((30 - (pad_width_size + image.shape[1])) * 0.5) + 1
randi[0] += compensate_width
randi[2] += compensate_width
return v2.functional.pad(
torch.from_numpy(image).permute(2, 0, 1),
padding=randi,
padding_mode="constant",
fill=(255, 255, 255),
)
def padding(images: list[torch.Tensor], required_size: int) -> list[torch.Tensor]:
images = [
v2.functional.pad(
img, padding=[0, 0, required_size - img.shape[2], required_size - img.shape[1]]
)
for img in images
]
return images
def random_resize(images: list[np.ndarray], minr: float, maxr: float) -> list[np.ndarray]:
if len(images[0].shape) != 3 or images[0].shape[2] != 3:
raise ValueError("Image is not in RGB format or channel is not in third dimension")
ratios = [random.uniform(minr, maxr) for _ in range(len(images))]
return [
# Anti-aliasing
cv2.resize(
img, (int(img.shape[1] * r), int(img.shape[0] * r)), interpolation=cv2.INTER_LANCZOS4
)
for img, r in zip(images, ratios)
]
def rotate(image: np.ndarray, min_angle: int, max_angle: int) -> np.ndarray:
# Get the center of the image to define the point of rotation
image_center = tuple(np.array(image.shape[1::-1]) / 2)
# Generate a random angle within the specified range
angle = random.randint(min_angle, max_angle)
# Get the rotation matrix for rotating the image around its center
rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
# Determine the size of the rotated image
cos = np.abs(rotation_mat[0, 0])
sin = np.abs(rotation_mat[0, 1])
new_width = int((image.shape[0] * sin) + (image.shape[1] * cos))
new_height = int((image.shape[0] * cos) + (image.shape[1] * sin))
# Adjust the rotation matrix to take into account translation
rotation_mat[0, 2] += (new_width / 2) - image_center[0]
rotation_mat[1, 2] += (new_height / 2) - image_center[1]
# Rotate the image with the specified border color (white in this case)
rotated_image = cv2.warpAffine(
image, rotation_mat, (new_width, new_height), borderValue=(255, 255, 255)
)
return rotated_image
def ocr_aug(image: np.ndarray) -> np.ndarray:
if random.random() < 0.2:
image = rotate(image, -5, 5)
image = add_white_border(image, max_size=25).permute(1, 2, 0).numpy()
image = augraphy_pipeline(image)
return image
def train_transform(images: list[Image.Image]) -> list[torch.Tensor]:
assert IMG_CHANNELS == 1, "Only support grayscale images for now"
images = [np.array(img.convert("RGB")) for img in images]
# random resize first
images = random_resize(images, MIN_RESIZE_RATIO, MAX_RESIZE_RATIO)
images = [trim_white_border(image) for image in images]
# OCR augmentation
images = [ocr_aug(image) for image in images]
# general transform pipeline
images = inference_transform(images)
return images
def img_train_transform(samples: dict[str, list[Any]]) -> dict[str, list[Any]]:
processed_img = train_transform(samples["pixel_values"])
samples["pixel_values"] = processed_img
return samples
def img_inf_transform(samples: dict[str, list[Any]]) -> dict[str, list[Any]]:
processed_img = inference_transform(samples["pixel_values"])
samples["pixel_values"] = processed_img
return samples
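The size computation inside `rotate` above can be checked in isolation. The following is a minimal sketch of just the bounding-box arithmetic, using `math` instead of the OpenCV rotation matrix (`rotated_size` is a hypothetical helper, and a scale of 1.0 is assumed, as in the call to `cv2.getRotationMatrix2D`):

```python
import math


def rotated_size(height: int, width: int, angle_deg: float) -> tuple[int, int]:
    """Return (new_width, new_height) of the box that contains the rotated image."""
    cos = abs(math.cos(math.radians(angle_deg)))
    sin = abs(math.sin(math.radians(angle_deg)))
    new_width = int(height * sin + width * cos)
    new_height = int(height * cos + width * sin)
    return new_width, new_height


print(rotated_size(100, 200, 0))  # (200, 100)
print(rotated_size(100, 200, 5))  # slightly larger in both dimensions
```

For the small angles used in `ocr_aug` (between -5 and 5 degrees) the canvas grows only a little, and the new border area is filled with white by `borderValue=(255, 255, 255)`.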

pyproject.toml Normal file

@@ -0,0 +1,86 @@
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
[project]
name = "texteller"
authors = [
{ name="OleehyO", email="leehy0357@gmail.com" }
]
dynamic = ["version"]
description = "TexTeller is a tool for converting rendered images to the original LaTeX code"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.10"
dependencies = [
"click>=8.1.8",
"colorama>=0.4.6",
"opencv-python-headless>=4.11.0.86",
"pyclipper>=1.3.0.post6",
"shapely>=2.1.0",
"streamlit>=1.44.1",
"streamlit-paste-button>=0.1.2",
"torch>=2.6.0",
"torchvision>=0.21.0",
"transformers==4.47",
"wget>=3.2",
"optimum[onnxruntime]>=1.24.0",
"python-multipart>=0.0.20",
"ray[serve]>=2.44.1",
]
[tool.hatch.version]
source = "vcs"
[tool.ruff]
exclude = [".git", ".mypy_cache", ".ruff_cache", ".venv", "dist"]
target-version = "py310"
line-length = 100
[tool.ruff.format]
line-ending = "lf"
quote-style = "double"
[tool.ruff.lint]
select = ["E", "W"]
ignore = [
"EXE001",
"UP009",
"F401",
"TID252",
"F403",
"F841",
"E501",
"W291",
"W293",
"E741",
"E712",
]
[tool.hatch.build.targets.wheel]
packages = ["texteller"]
[project.scripts]
texteller = "texteller.cli:cli"
[project.optional-dependencies]
onnxruntime-gpu = [
"onnxruntime-gpu>=1.21.0",
]
test = [
"pytest>=8.3.5",
]
train = [
"accelerate>=1.6.0",
"augraphy>=8.2.6",
"datasets>=3.5.0",
"tensorboardx>=2.6.2.2",
]
docs = [
"myst-parser>=4.0.1",
"nbsphinx>=0.9.7",
"sphinx>=8.1.3",
"sphinx-book-theme>=1.1.4",
"sphinx-copybutton>=0.5.2",
"sphinx-design>=0.6.1",
]


@@ -1,13 +0,0 @@
transformers
datasets
evaluate
streamlit
opencv-python
ray[serve]
accelerate
tensorboardX
nltk
python-multipart
pdf2image
augraphy


@@ -1,16 +0,0 @@
import requests
# URL of the service
url = "http://127.0.0.1:9900/predict"
# Replace with the path of the image you want to predict
img_path = "/home/lhy/code/TeXify/src/7.png"
# Build the request payload
data = {"img_path": img_path}
# Send the POST request
response = requests.post(url, json=data)
# Print the response
print(response.text)


@@ -1,40 +0,0 @@
import os
import argparse
from pathlib import Path
from models.ocr_model.utils.inference import inference
from models.ocr_model.model.TexTeller import TexTeller
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'-img',
type=str,
required=True,
help='path to the input image'
)
parser.add_argument(
'-cuda',
default=False,
action='store_true',
help='use cuda or not'
)
args = parser.parse_args([
'-img', './models/ocr_model/test_img/1.png',
'-cuda'
])
script_dirpath = Path(__file__).resolve().parent
os.chdir(script_dirpath)
model = TexTeller.from_pretrained('./models/ocr_model/model_checkpoint')
tokenizer = TexTeller.get_tokenizer('./models/tokenizer/roberta-tokenizer-550K')
# base = '/home/lhy/code/TeXify/src/models/ocr_model/test_img'
# img_path = [base + f'/{i}.png' for i in range(7, 12)]
img_path = [args.img]
res = inference(model, tokenizer, img_path, args.cuda)
print(res[0])


@@ -1,60 +0,0 @@
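# Mean and standard deviation of formula images (after grayscale conversion)
IMAGE_MEAN = 0.9545467
IMAGE_STD = 0.15394445
# ========================= Parameters for the OCR model ============================= #
# Minimum and maximum width and height of input images
MIN_HEIGHT = 32
MAX_HEIGHT = 512
MIN_WIDTH = 32
MAX_WIDTH = 1280
# In LaTeX-OCR these are 32, 192, 32, and 672 respectively
# Density value (dpi) used when converting PDFs to images for the OCR model's dataset
TEXIFY_INPUT_DENSITY = 100
# Vocabulary size of the OCR model's tokenizer
VOCAB_SIZE = 15000
# Whether the OCR model uses a fixed input image size
OCR_FIX_SIZE = True
# Fixed input image size during OCR model training (when OCR_FIX_SIZE is True)
OCR_IMG_SIZE = 448
# Maximum input image width and height during OCR model training (when OCR_FIX_SIZE is False)
OCR_IMG_MAX_HEIGHT = 512
OCR_IMG_MAX_WIDTH = 768
# Number of channels of the OCR model's input images
OCR_IMG_CHANNELS = 1 # grayscale
# Maximum number of tokens in the OCR model's training dataset
MAX_TOKEN_SIZE = 1024 # maximum embedding length of the model (default 512)
# MAX_TOKEN_SIZE = 2048 # maximum embedding length of the model (default 512)
# MAX_TOKEN_SIZE = 600
# Random resize ratio range used during OCR model training
MAX_RESIZE_RATIO = 1.15
MIN_RESIZE_RATIO = 0.75
# Minimum width and height required of OCR model input images (filters out junk data)
MIN_HEIGHT = 12
MIN_WIDTH = 30
# ============================================================================= #
# ========================= Parameters for the Resizer model ============================= #
# Density rendering value used for images in the Resizer model's dataset
RESIZER_INPUT_DENSITY = 200
LABEL_RATIO = 1.0 * TEXIFY_INPUT_DENSITY / RESIZER_INPUT_DENSITY
NUM_CLASSES = 1 # the model predicts by regression
NUM_CHANNELS = 1 # single-channel input images (grayscale)
# Fixed image size used when training the Resizer
RESIZER_IMG_SIZE = 448
# ============================================================================= #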


@@ -1,6 +0,0 @@
* Encoder-Decoder architecture
* The encoder uses Deit_{BASE}
* The decoder uses RoBERTa_{LARGE}
* The decoder's tokenizer is also that of RoBERTa_{LARGE}


@@ -1,65 +0,0 @@
from pathlib import Path
from models.globals import (
VOCAB_SIZE,
OCR_IMG_SIZE,
OCR_IMG_CHANNELS,
MAX_TOKEN_SIZE
)
from transformers import (
ViTConfig,
ViTModel,
TrOCRConfig,
TrOCRForCausalLM,
RobertaTokenizerFast,
VisionEncoderDecoderModel,
)
class TexTeller(VisionEncoderDecoderModel):
def __init__(self, decoder_path=None, tokenizer_path=None):
encoder = ViTModel(ViTConfig(
image_size=OCR_IMG_SIZE,
num_channels=OCR_IMG_CHANNELS
))
decoder = TrOCRForCausalLM(TrOCRConfig(
vocab_size=VOCAB_SIZE,
max_position_embeddings=MAX_TOKEN_SIZE
))
super().__init__(encoder=encoder, decoder=decoder)
@classmethod
def from_pretrained(cls, model_path: str):
model_path = Path(model_path).resolve()
return VisionEncoderDecoderModel.from_pretrained(str(model_path))
@classmethod
def get_tokenizer(cls, tokenizer_path: str) -> RobertaTokenizerFast:
tokenizer_path = Path(tokenizer_path).resolve()
return RobertaTokenizerFast.from_pretrained(str(tokenizer_path))
if __name__ == "__main__":
pause = 1
# texteller = TexTeller()
# from ..utils.inference import inference
# model = TexTeller.from_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt')
# model.save_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt2', safe_serialization=False)
# tokenizer = TexTeller.get_tokenizer('/home/lhy/code/TeXify/src/models/tokenizer/roberta-tokenizer-550Kformulas')
# base = '/home/lhy/code/TeXify/src/models/ocr_model/model'
# imgs_path = [
# # base + '/1.jpg',
# # base + '/2.jpg',
# # base + '/3.jpg',
# # base + '/4.jpg',
# # base + '/5.jpg',
# # base + '/6.jpg',
# base + '/foo.jpg'
# ]
# # res = inference(model, [img1, img2, img3, img4, img5, img6, img7], tokenizer)
# res = inference(model, imgs_path, tokenizer)
# pause = 1


@@ -1,14 +0,0 @@
Congratulations on your download of this fine Rotodesign brand font product. We hope it will bring you many hours of typesetting pleasure and riches beyond your wildest dreams. We DO NOT, however, guarantee either of these things. Your mileage may vary.
This font is freeware, and is provided with no warranties as to its quality or its utility. After all, how much did you pay? Anyway, this font can be copied and used as you wish provided all copies include this readme file. Don't lie to your friends and tell 'em you made it yourself. You only cheat yourself when you do that. In the unlikely event you use this font to design something really cool or that makes you a ton of cash money, that's okay with me, just send me a copy or two of the finished item, and remember me when you get rich and famous. Enjoy!
©2006
Patrick Broderick
Rotodesign
http://www.rotodesign.com
roto@rotodesign.net
Rotodesign
1288 Columbus Ave. #176
San Francisco, CA 94133


@@ -1,168 +0,0 @@
# Copyright 2020 The HuggingFace Evaluate Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Google BLEU (aka GLEU) metric. """
from typing import Dict, List
import datasets
from nltk.translate import gleu_score
import evaluate
from evaluate import MetricInfo
from .tokenizer_13a import Tokenizer13a
_CITATION = """\
@misc{wu2016googles,
title={Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation},
author={Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey
and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin
Johnson and Xiaobing Liu and Łukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto
Kazawa and Keith Stevens and George Kurian and Nishant Patil and Wei Wang and Cliff Young and
Jason Smith and Jason Riesa and Alex Rudnick and Oriol Vinyals and Greg Corrado and Macduff Hughes
and Jeffrey Dean},
year={2016},
eprint={1609.08144},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""
_DESCRIPTION = """\
The BLEU score has some undesirable properties when used for single
sentences, as it was designed to be a corpus measure. We therefore
use a slightly different score for our RL experiments which we call
the 'GLEU score'. For the GLEU score, we record all sub-sequences of
1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
compute a recall, which is the ratio of the number of matching n-grams
to the number of total n-grams in the target (ground truth) sequence,
and a precision, which is the ratio of the number of matching n-grams
to the number of total n-grams in the generated output sequence. Then
GLEU score is simply the minimum of recall and precision. This GLEU
score's range is always between 0 (no matches) and 1 (all match) and
it is symmetrical when switching output and target. According to
our experiments, GLEU score correlates quite well with the BLEU
metric on a corpus level but does not have its drawbacks for our per
sentence reward objective.
"""
_KWARGS_DESCRIPTION = """\
Computes corpus-level Google BLEU (GLEU) score of translated segments against one or more references.
Instead of averaging the sentence level GLEU scores (i.e. macro-average precision), Wu et al. (2016) sum up the matching
tokens and the max of hypothesis and reference tokens for each sentence, then compute using the aggregate values.
Args:
predictions (list of str): list of translations to score.
references (list of list of str): list of lists of references for each translation.
tokenizer : approach used for tokenizing `predictions` and `references`.
The default tokenizer is `tokenizer_13a`, a minimal tokenization approach that is equivalent to `mteval-v13a`, used by WMT.
This can be replaced by any function that takes a string as input and returns a list of tokens as output.
min_len (int): The minimum order of n-gram this function should extract. Defaults to 1.
max_len (int): The maximum order of n-gram this function should extract. Defaults to 4.
Returns:
'google_bleu': google_bleu score
Examples:
Example 1:
>>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
'he read the book because he was interested in world history']
>>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat'], \
['he was interested in world history because he read the book']]
>>> google_bleu = evaluate.load("google_bleu")
>>> results = google_bleu.compute(predictions=predictions, references=references)
>>> print(round(results["google_bleu"], 2))
0.44
Example 2:
>>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
'he read the book because he was interested in world history']
>>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
['he was interested in world history because he read the book']]
>>> google_bleu = evaluate.load("google_bleu")
>>> results = google_bleu.compute(predictions=predictions, references=references)
>>> print(round(results["google_bleu"], 2))
0.61
Example 3:
>>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
'he read the book because he was interested in world history']
>>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
['he was interested in world history because he read the book']]
>>> google_bleu = evaluate.load("google_bleu")
>>> results = google_bleu.compute(predictions=predictions, references=references, min_len=2)
>>> print(round(results["google_bleu"], 2))
0.53
Example 4:
>>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
'he read the book because he was interested in world history']
>>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
['he was interested in world history because he read the book']]
>>> google_bleu = evaluate.load("google_bleu")
>>> results = google_bleu.compute(predictions=predictions,references=references, min_len=2, max_len=6)
>>> print(round(results["google_bleu"], 2))
0.4
"""
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class GoogleBleu(evaluate.Metric):
def _info(self) -> MetricInfo:
return evaluate.MetricInfo(
description=_DESCRIPTION,
citation=_CITATION,
inputs_description=_KWARGS_DESCRIPTION,
features=[
datasets.Features(
{
"predictions": datasets.Value("string", id="sequence"),
"references": datasets.Sequence(datasets.Value("string", id="sequence"), id="references"),
}
),
datasets.Features(
{
"predictions": datasets.Value("string", id="sequence"),
"references": datasets.Value("string", id="sequence"),
}
),
],
)
def _compute(
self,
predictions: List[str],
references: List[List[str]],
tokenizer=Tokenizer13a(),
min_len: int = 1,
max_len: int = 4,
) -> Dict[str, float]:
# if only one reference is provided make sure we still use list of lists
if isinstance(references[0], str):
references = [[ref] for ref in references]
references = [[tokenizer(r) for r in ref] for ref in references]
predictions = [tokenizer(p) for p in predictions]
return {
"google_bleu": gleu_score.corpus_gleu(
list_of_references=references, hypotheses=predictions, min_len=min_len, max_len=max_len
)
}
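The GLEU definition quoted in `_DESCRIPTION` above (the minimum of n-gram precision and recall) can be sketched for a single sentence pair. This is an illustrative pure-Python version under simplifying assumptions: one reference, pre-tokenized input, and no corpus-level aggregation. It is not the NLTK `corpus_gleu` implementation that the file calls, and `ngrams` and `sentence_gleu` are names made up for the example:

```python
from collections import Counter


def ngrams(tokens: list[str], min_len: int, max_len: int) -> Counter:
    """Count all n-grams of order min_len..max_len."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    return counts


def sentence_gleu(reference: list[str], hypothesis: list[str],
                  min_len: int = 1, max_len: int = 4) -> float:
    """min(precision, recall) over n-grams, for one reference and one hypothesis."""
    ref_counts = ngrams(reference, min_len, max_len)
    hyp_counts = ngrams(hypothesis, min_len, max_len)
    # Counter intersection keeps the clipped number of matching n-grams.
    matches = sum((ref_counts & hyp_counts).values())
    total_ref = sum(ref_counts.values())
    total_hyp = sum(hyp_counts.values())
    if total_ref == 0 or total_hyp == 0:
        return 0.0
    # min(matches / total_hyp, matches / total_ref) == matches / max(total_hyp, total_ref)
    return matches / max(total_hyp, total_ref)


print(sentence_gleu("a b c e".split(), "a b c d".split()))  # 0.6
```

In the example each sentence has 10 n-grams of order 1 to 4, and 6 of them match (three unigrams, two bigrams, one trigram), which gives 6 / 10.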


@@ -1,100 +0,0 @@
# Source: https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/tokenizers/tokenizer_13a.py
# Copyright 2020 SacreBLEU Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import re
from functools import lru_cache
class BaseTokenizer:
"""A base dummy tokenizer to derive from."""
def signature(self):
"""
Returns a signature for the tokenizer.
:return: signature string
"""
return "none"
def __call__(self, line):
"""
Tokenizes an input line with the tokenizer.
:param line: a segment to tokenize
:return: the tokenized line
"""
return line
class TokenizerRegexp(BaseTokenizer):
def signature(self):
return "re"
def __init__(self):
self._re = [
# language-dependent part (assuming Western languages)
(re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])"), r" \1 "),
# tokenize period and comma unless preceded by a digit
(re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
# tokenize period and comma unless followed by a digit
(re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
# tokenize dash when preceded by a digit
(re.compile(r"([0-9])(-)"), r"\1 \2 "),
# one space only between words
# NOTE: Doing this in Python (below) is faster
# (re.compile(r'\s+'), r' '),
]
@lru_cache(maxsize=2**16)
def __call__(self, line):
"""Common post-processing tokenizer for `13a` and `zh` tokenizers.
:param line: a segment to tokenize
:return: the tokenized line
"""
for (_re, repl) in self._re:
line = _re.sub(repl, line)
# no leading or trailing spaces, single space within words
# return ' '.join(line.split())
# This line is changed with regards to the original tokenizer (seen above) to return individual words
return line.split()
class Tokenizer13a(BaseTokenizer):
def signature(self):
return "13a"
def __init__(self):
self._post_tokenizer = TokenizerRegexp()
@lru_cache(maxsize=2**16)
def __call__(self, line):
"""Tokenizes an input line using a relatively minimal tokenization
that is however equivalent to mteval-v13a, used by WMT.
:param line: a segment to tokenize
:return: the tokenized line
"""
# language-independent part:
line = line.replace("<skipped>", "")
line = line.replace("-\n", "")
line = line.replace("\n", " ")
if "&" in line:
line = line.replace("&quot;", '"')
line = line.replace("&amp;", "&")
line = line.replace("&lt;", "<")
line = line.replace("&gt;", ">")
return self._post_tokenizer(f" {line} ")
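To see what the period and comma rules in `TokenizerRegexp` do, the sketch below applies just those two substitutions and then splits on whitespace. It is a reduced illustration, not the full 13a tokenizer: the punctuation-class rule, the dash rule, and the entity unescaping are left out, and `simple_tokenize` is a name made up for the example:

```python
import re

# Two of the rules shown above: split off '.' and ',' unless they have a digit on both sides.
RULES = [
    (re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
    (re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
]


def simple_tokenize(line: str) -> list[str]:
    line = f" {line} "  # pad so that the rules also apply at the string boundaries
    for pattern, repl in RULES:
        line = pattern.sub(repl, line)
    return line.split()


print(simple_tokenize("Hello, world. Pi is 3.14"))
# ['Hello', ',', 'world', '.', 'Pi', 'is', '3.14']
```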


@@ -1,114 +0,0 @@
import os
from functools import partial
from pathlib import Path
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, Seq2SeqTrainer, Seq2SeqTrainingArguments, GenerationConfig
from .training_args import CONFIG
from ..model.TexTeller import TexTeller
from ..utils.functional import tokenize_fn, collate_fn, img_train_transform, img_inf_transform, filter_fn
from ..utils.metrics import bleu_metric
from ...globals import MAX_TOKEN_SIZE, MIN_WIDTH, MIN_HEIGHT
def train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer):
training_args = TrainingArguments(**CONFIG)
debug_mode = False
if debug_mode:
training_args.auto_find_batch_size = False
training_args.num_train_epochs = 2
# training_args.per_device_train_batch_size = 3
training_args.per_device_train_batch_size = 2
training_args.per_device_eval_batch_size = 2 * training_args.per_device_train_batch_size
training_args.jit_mode_eval = False
training_args.torch_compile = False
training_args.dataloader_num_workers = 1
trainer = Trainer(
model,
training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
data_collator=collate_fn_with_tokenizer,
)
trainer.train(resume_from_checkpoint=None)
# trainer.train(resume_from_checkpoint='/home/lhy/code/TexTeller/src/models/ocr_model/train/train_result/TexTellerv2/checkpoint-288000')
def evaluate(model, tokenizer, eval_dataset, collate_fn):
eval_config = CONFIG.copy()
eval_config['predict_with_generate'] = True
generate_config = GenerationConfig(
max_length=MAX_TOKEN_SIZE-100,
num_beams=1,
do_sample=False,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
bos_token_id=tokenizer.bos_token_id,
)
eval_config['generation_config'] = generate_config
eval_config['auto_find_batch_size'] = False
seq2seq_config = Seq2SeqTrainingArguments(**eval_config)
trainer = Seq2SeqTrainer(
model,
seq2seq_config,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
data_collator=collate_fn,
compute_metrics=partial(bleu_metric, tokenizer=tokenizer)
)
res = trainer.evaluate()
print(res)
if __name__ == '__main__':
cur_path = os.getcwd()
script_dirpath = Path(__file__).resolve().parent
os.chdir(script_dirpath)
dataset = load_dataset(
'/home/lhy/code/TexTeller/src/models/ocr_model/train/data/loader.py'
)['train']
tokenizer = TexTeller.get_tokenizer('/home/lhy/code/TexTeller/src/models/tokenizer/roberta-tokenizer-7Mformulas')
filter_fn_with_tokenizer = partial(filter_fn, tokenizer=tokenizer)
# dataset = dataset.filter(lambda x: x['image'].height > MIN_HEIGHT and x['image'].width > MIN_WIDTH)
dataset = dataset.filter(filter_fn_with_tokenizer, num_proc=16)
dataset = dataset.shuffle(seed=42)
dataset = dataset.flatten_indices()
map_fn = partial(tokenize_fn, tokenizer=tokenizer)
tokenized_dataset = dataset.map(map_fn, batched=True, remove_columns=dataset.column_names, num_proc=8, load_from_cache_file=True)
split_dataset = tokenized_dataset.train_test_split(test_size=0.005, seed=42)
train_dataset, eval_dataset = split_dataset['train'], split_dataset['test']
train_dataset = train_dataset.with_transform(img_train_transform)
eval_dataset = eval_dataset.with_transform(img_inf_transform)
collate_fn_with_tokenizer = partial(collate_fn, tokenizer=tokenizer)
# model = TexTeller()
model = TexTeller.from_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt')
# ================= debug =======================
# foo = train_dataset[:50]
# bar = eval_dataset[:50]
# ================= debug =======================
enable_train = True
enable_evaluate = True
if enable_train:
train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer)
if enable_evaluate:
evaluate(model, tokenizer, eval_dataset, collate_fn_with_tokenizer)
os.chdir(cur_path)
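The script above binds the tokenizer to `filter_fn`, `tokenize_fn`, and `collate_fn` with `functools.partial`, so that each can be handed to the dataset or trainer as a callable that no longer needs the tokenizer passed in. A minimal sketch of that pattern, using a hypothetical whitespace tokenizer and a made-up `max_tokens` limit in place of the real tokenizer and `MAX_TOKEN_SIZE - 10`:

```python
from functools import partial


def fake_tokenizer(text: str) -> dict[str, list[int]]:
    """Hypothetical tokenizer: one id per whitespace-separated token."""
    return {"input_ids": list(range(len(text.split())))}


def filter_fn(sample: dict, tokenizer=None, max_tokens: int = 8) -> bool:
    # Keep only samples whose formula tokenizes to fewer than max_tokens ids.
    return len(tokenizer(sample["latex_formula"])["input_ids"]) < max_tokens


# Bind the tokenizer once; the result takes a single sample, as a dataset filter expects.
keep = partial(filter_fn, tokenizer=fake_tokenizer)

samples = [
    {"latex_formula": r"a + b"},
    {"latex_formula": r"\frac { a } { b } + \frac { c } { d }"},
]
print([keep(s) for s in samples])  # [True, False]
```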


@@ -1,84 +0,0 @@
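CONFIG = {
"seed": 42, # Random seed, used to ensure experiments are reproducible
"use_cpu": False, # Whether to use the CPU (running on the CPU first when you start testing the code makes debugging easier)
# "data_seed": 42, # Also fixes the sampling of the data sampler
# "full_determinism": True, # Makes the whole training run fully deterministic (this setting hurts model training; use it only for debugging)
"output_dir": "train_result/TexTellerv3", # Output directory
"overwrite_output_dir": False, # If the output directory exists, do not delete its existing contents
"report_to": ["tensorboard"], # Write logs to TensorBoard
#+ View the logs by running tensorboard --logdir ./logs on the command line
"logging_dir": None, # Directory for the TensorBoard log files (uses the default)
"log_level": "warning", # Other options: debug, info, warning, error and critical (from lowest to highest level)
"logging_strategy": "steps", # Log once every fixed number of steps
"logging_steps": 4000, # Logging interval in steps; can be an int or a float in (0~1); a float is interpreted as a ratio of the total training steps (for example, 1.0 / 2000)
#+ Usually the same as eval_steps
"logging_nan_inf_filter": False, # Log losses that are nan or inf
"num_train_epochs": 4, # Total number of training epochs
# "max_steps": 3, # Maximum number of training steps. If this parameter is set,
#+ num_train_epochs is ignored (usually used for debugging)
# "label_names": ['your_label_name'], # Names of the labels in the data_loader; defaults to 'labels' if not specified
"per_device_train_batch_size": 3, # Batch size per GPU
"per_device_eval_batch_size": 6, # Evaluation batch size per GPU
# "auto_find_batch_size": True, # Automatically search for a suitable batch size (exponential decay)
"auto_find_batch_size": False, # Automatically search for a suitable batch size (exponential decay)
"optim": "adamw_torch", # Many AdamW variants are also available that are more efficient than classic AdamW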
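#+ Once optim is set, there is no need to pass an optimizer to the Trainer
"lr_scheduler_type": "cosine", # Sets the lr_scheduler
"warmup_ratio": 0.1, # Fraction of the total training steps used for warmup (e.g. when training for 1000 steps, lr rises gradually from 0 to the configured lr over the first 100 steps)
# "warmup_steps": 500, # Number of warmup steps; this parameter conflicts with warmup_ratio
"weight_decay": 0, # Weight decay
"learning_rate": 5e-5, # Learning rate
"max_grad_norm": 1.0, # Used for gradient clipping; keeps the gradient norm at or below 1.0 (default 1.0)
"fp16": False, # Whether to train with 16-bit floating point (generally not recommended; the loss blows up easily)
"bf16": False, # Whether to train with bfloat16 (recommended if the hardware supports it)
"gradient_accumulation_steps": 2, # Gradient accumulation steps; consider this parameter to get the effect of a large batch size when the batch size itself cannot be made large
"gradient_checkpointing": False, # When True, some intermediate values needed for backward are discarded during forward to reduce GPU memory pressure, but training is slower because they are recomputed during backward
"label_smoothing_factor": 0.0, # Label smoothing (soft labels); 0 means disabled
# "debug": "underflow_overflow", # Checks for overflow/underflow during training and warns if it occurs; usually used for debugging
"jit_mode_eval": False, # Whether to use PyTorch JIT trace during eval; it can speed up the model, but the model must be static or an error is raised
"torch_compile": False, # Whether to compile the model with torch.compile for better training and inference performance
#+ Requires torch > 2.0; this feature works well and can be enabled once the model runs end to end
# "deepspeed": "your_json_path", # Train with DeepSpeed; requires the path to ds_config.json
#+ When using DeepSpeed with the Trainer, make sure the settings in ds_config.json match the Trainer's (learning rate, batch size, gradient accumulation steps, etc.)
#+ If they do not match, very strange bugs appear that are usually hard to track down
"dataloader_pin_memory": True, # Can speed up data transfer between CPU and GPU
"dataloader_num_workers": 16, # By default data is not loaded with multiple processes; usually set to 4 * the number of GPUs used
"dataloader_drop_last": True, # Drop the last minibatch to keep training gradients stable
"evaluation_strategy": "steps", # Evaluation strategy, either "steps" or "epoch"
"eval_steps": 4000, # Used when evaluation_strategy="steps"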
#+默认情况下与logging_steps一样可以是int也可以是(0~1)的float当是float时表示总的训练步数的ratio(比方说可以设置成1.0 / 2000)
"save_strategy": "steps", # 保存checkpoint的策略
"save_steps": 4000, # checkpoint保存的步数间隔可以是int也可以是(0~1)的float当是float时表示总的训练步数的ratio(比方说可以设置成1.0 / 2000)
"save_total_limit": 10, # 保存的模型的最大数量。如果超过这个数量,最旧的模型将被删除
"load_best_model_at_end": True, # 训练结束时是否加载最佳模型
#+当设置True时会保存训练时评估结果最好的checkpoint
#+当设置True时evaluation_strategy必须与save_strategy一样并且save_steps必须是eval_steps的整数倍
"metric_for_best_model": "eval_loss", # 用于选择最佳模型的指标(必须与load_best_model_at_end一起用)
#+可以使用compute_metrics输出的evaluation的结果中一个字典的某个值
#+注意Trainer会在compute_metrics输出的字典的键前面加上一个prefix默认就是“eval_”
"greater_is_better": False, # 指标值越小越好(必须与metric_for_best_model一起用)
"do_train": True, # 是否进行训练,通常用于调试
"do_eval": True, # 是否进行评估,通常用于调试
"remove_unused_columns": False, # 是否删除没有用到的列特征默认为True
#+当删除了没用到的列后making it easier to unpack inputs into the models call function
#+注意remove_unused_columns去除列的操作会把传入的dataset的columns_names与模型forward方法中的参数名进行配对对于不存在forward方法中的列名就会直接删掉整个feature
#+因此如果在dataset.with_transform(..)中给数据进行改名那么这个remove操作会直接把原始的数据直接删掉从而导致之后会拿到一个空的dataset导致在对dataset进行切片取值时出问题
#+例如读进来的dataset图片对应的feature name叫"images"而模型forward方法中对应的参数名叫“pixel_values”
#+此时如果是在data.withtransfrom(..)中根据这个"images"生成其他模型forward方法中需要的参数然后再把"images"改名成“pixel_values”那么整个过程就会出问题
#+因为设置了remove_unused_columns=True后会先给dataset进行列名检查然后“images”这个feature会直接被删掉导致with_transform的transform_fn拿不到“images”这个feature
#+所以一个good practice就是对于要改名的特征先提前使用dataset.rename_column进行改名
"push_to_hub": False, # 是否训练完后上传hub需要先在命令行huggingface-cli login进行登录认证的配置配置完后认证信息会存到cache文件夹里
}
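
Several of the settings above interact: the number of samples per optimizer step depends on the per-device batch size, the gradient accumulation steps, and the device count, and `load_best_model_at_end` constrains the evaluation and save intervals. The following is a minimal sketch of those relationships in plain Python. The helper names (`effective_batch_size`, `warmup_steps`, `check_best_model_settings`), the device count of 4, and the total of 1000 steps are illustrative assumptions and are not part of this repository or of the Trainer API.

```python
import math


def effective_batch_size(cfg: dict, num_devices: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


def warmup_steps(cfg: dict, total_steps: int) -> int:
    """Steps spent ramping the learning rate up, derived from warmup_ratio."""
    return math.ceil(total_steps * cfg["warmup_ratio"])


def check_best_model_settings(cfg: dict) -> None:
    """Raise ValueError if the load_best_model_at_end constraints are violated."""
    if not cfg.get("load_best_model_at_end"):
        return
    if cfg["evaluation_strategy"] != cfg["save_strategy"]:
        raise ValueError("evaluation_strategy must equal save_strategy")
    if cfg["save_steps"] % cfg["eval_steps"] != 0:
        raise ValueError("save_steps must be a multiple of eval_steps")


# Values copied from the configuration above; num_devices is hypothetical.
cfg = {
    "per_device_train_batch_size": 3,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.1,
    "evaluation_strategy": "steps",
    "save_strategy": "steps",
    "eval_steps": 4000,
    "save_steps": 4000,
    "load_best_model_at_end": True,
}
print(effective_batch_size(cfg, num_devices=4))
print(warmup_steps(cfg, total_steps=1000))
check_best_model_settings(cfg)  # passes silently for these values
```

Running such a check before launching a long job is cheaper than discovering a mismatch after the first evaluation interval.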

@@ -1,59 +0,0 @@
import torch
from transformers import DataCollatorForLanguageModeling
from typing import List, Dict, Any
from .transforms import train_transform, inference_transform
from ...globals import MIN_HEIGHT, MIN_WIDTH, MAX_TOKEN_SIZE


def left_move(x: torch.Tensor, pad_val):
    # Shift every row one position to the left and fill the last column with pad_val
    assert len(x.shape) == 2, 'x should be 2-dimensional'
    lefted_x = torch.ones_like(x)
    lefted_x[:, :-1] = x[:, 1:]
    lefted_x[:, -1] = pad_val
    return lefted_x


def tokenize_fn(samples: Dict[str, List[Any]], tokenizer=None) -> Dict[str, List[Any]]:
    assert tokenizer is not None, 'tokenizer should not be None'
    tokenized_formula = tokenizer(samples['latex_formula'], return_special_tokens_mask=True)
    tokenized_formula['pixel_values'] = samples['image']
    return tokenized_formula


def collate_fn(samples: List[Dict[str, Any]], tokenizer=None) -> Dict[str, List[Any]]:
    assert tokenizer is not None, 'tokenizer should not be None'
    pixel_values = [dic.pop('pixel_values') for dic in samples]
    clm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    batch = clm_collator(samples)
    batch['pixel_values'] = pixel_values
    batch['decoder_input_ids'] = batch.pop('input_ids')
    batch['decoder_attention_mask'] = batch.pop('attention_mask')
    # Shift the labels left by one position; the vacated last slot is filled with -100
    batch['labels'] = left_move(batch['labels'], -100)
    # Stack the list of image tensors into a single tensor of shape (B, C, H, W)
    batch['pixel_values'] = torch.stack(batch['pixel_values'], dim=0)
    return batch


def img_train_transform(samples: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    processed_img = train_transform(samples['pixel_values'])
    samples['pixel_values'] = processed_img
    return samples


def img_inf_transform(samples: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    processed_img = inference_transform(samples['pixel_values'])
    samples['pixel_values'] = processed_img
    return samples


def filter_fn(sample, tokenizer=None) -> bool:
    return (
        sample['image'].height > MIN_HEIGHT and sample['image'].width > MIN_WIDTH
        and len(tokenizer(sample['latex_formula'])['input_ids']) < MAX_TOKEN_SIZE - 10
    )
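
The label shift performed by `left_move` in `collate_fn` can be illustrated without tensors. The sketch below applies the same row-wise shift to plain Python lists: after shifting, the label at position t is the token that originally sat at position t + 1, so each decoder input position is paired with the next token as its target, and the vacated final slot receives -100. The name `left_move_rows` and the token ids are made up for illustration, and rows are assumed to be non-empty.

```python
from typing import List

IGNORE_INDEX = -100  # the fill value used by collate_fn above


def left_move_rows(rows: List[List[int]], pad_val: int) -> List[List[int]]:
    """Shift each row one position to the left; fill the last slot with pad_val."""
    return [row[1:] + [pad_val] for row in rows]


# Hypothetical token ids: 0 = BOS, 2 = EOS; the second row is already padded.
labels = [
    [0, 11, 12, 2],
    [0, 21, 2, IGNORE_INDEX],
]
shifted = left_move_rows(labels, IGNORE_INDEX)
print(shifted)  # [[11, 12, 2, -100], [21, 2, -100, -100]]
```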

@@ -1,39 +0,0 @@
import cv2
import numpy as np
from typing import List
from PIL import Image


def convert2rgb(image_paths: List[str]) -> List[np.ndarray]:
    # Each returned np.ndarray has layout [H, W, C]: channels are in the third dimension, in RGB order
    processed_images = []
    for path in image_paths:
        # Read the image
        image = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        if image is None:
            print(f"Image at {path} could not be read.")
            continue
        # Check whether the image is stored as uint16
        if image.dtype == np.uint16:
            raise ValueError(f"Image at {path} is stored in uint16, which is not supported.")
        # Get the number of channels
        channels = 1 if len(image.shape) == 2 else image.shape[2]
        # BGRA (4 channels, as loaded by OpenCV): convert to RGB
        if channels == 4:
            image = cv2.cvtColor(image, cv2.COLOR_BGRA2RGB)
        # Single-channel grayscale: convert to RGB
        elif channels == 1:
            image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
        # BGR (3 channels): convert to RGB
        elif channels == 3:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        processed_images.append(image)
    return processed_images
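
The channel dispatch in `convert2rgb` depends only on the array shape, so it can be sketched without OpenCV. The hypothetical helper `conversion_for` below returns the name of the `cv2` conversion constant that the function above would pick for a given shape. Unlike the original, which passes images with any other channel count through unchanged, this sketch raises an error for them; that stricter behavior is a design choice of the example, not something the repository does.

```python
from typing import Tuple

# Names of the cv2 constants used in convert2rgb, keyed by channel count.
CONVERSIONS = {
    1: "COLOR_GRAY2RGB",
    3: "COLOR_BGR2RGB",
    4: "COLOR_BGRA2RGB",
}


def conversion_for(shape: Tuple[int, ...]) -> str:
    """Return the conversion name for an image array of the given shape."""
    # A 2-D array is a single-channel image; otherwise channels are the third axis.
    channels = 1 if len(shape) == 2 else shape[2]
    if channels not in CONVERSIONS:
        raise ValueError(f"Unsupported channel count: {channels}")
    return CONVERSIONS[channels]


for shape in [(32, 128), (32, 128, 3), (32, 128, 4)]:
    print(shape, conversion_for(shape))
```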

@@ -1,39 +0,0 @@
import torch
from transformers import RobertaTokenizerFast, GenerationConfig
from typing import List
from models.ocr_model.model.TexTeller import TexTeller
from models.ocr_model.utils.transforms import inference_transform
from models.ocr_model.utils.helpers import convert2rgb
from models.globals import MAX_TOKEN_SIZE


def inference(
    model: TexTeller,
    tokenizer: RobertaTokenizerFast,
    imgs_path: List[str],
    use_cuda: bool,
    num_beams: int = 1,
) -> List[str]:
    model.eval()
    imgs = convert2rgb(imgs_path)
    imgs = inference_transform(imgs)
    pixel_values = torch.stack(imgs)
    if use_cuda:
        model = model.to('cuda')
        pixel_values = pixel_values.to('cuda')
    generate_config = GenerationConfig(
        max_new_tokens=MAX_TOKEN_SIZE,
        num_beams=num_beams,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.bos_token_id,
    )
    pred = model.generate(pixel_values, generation_config=generate_config)
    res = tokenizer.batch_decode(pred, skip_special_tokens=True)
    return res

@@ -1,17 +0,0 @@
import evaluate
import numpy as np
from transformers import EvalPrediction, RobertaTokenizer
from typing import Dict


def bleu_metric(eval_preds:EvalPrediction, tokenizer:RobertaTokenizer) -> Dict:
    metric = evaluate.load('/home/lhy/code/TexTeller/src/models/ocr_model/train/google_bleu')  # This needs network access, so it may hang
    logits, labels = eval_preds.predictions, eval_preds.label_ids
    preds = logits
    # preds = np.argmax(logits, axis=1)  # Convert logits to the predicted label ids
    # Replace the ignore index -100 with 1 (presumably the tokenizer's pad token id) so the labels can be decoded
    labels = np.where(labels == -100, 1, labels)
    preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return metric.compute(predictions=preds, references=labels)
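
The `np.where` call in `bleu_metric` exists because -100 is not a valid token id and cannot be decoded. A plain-Python sketch of the same replacement is shown below. The helper name `replace_ignore_index` and the sample ids are illustrative, and the pad token id of 1 mirrors the hard-coded value above rather than a value read from a real tokenizer.

```python
from typing import List


def replace_ignore_index(
    label_rows: List[List[int]],
    pad_token_id: int,
    ignore_index: int = -100,
) -> List[List[int]]:
    """Swap every ignore_index entry for pad_token_id so rows can be decoded."""
    return [
        [pad_token_id if token == ignore_index else token for token in row]
        for row in label_rows
    ]


labels = [[11, 12, 2, -100], [21, 2, -100, -100]]
print(replace_ignore_index(labels, pad_token_id=1))
# [[11, 12, 2, 1], [21, 2, 1, 1]]
```

Decoding with `skip_special_tokens=True`, as the function above does, then drops the substituted pad tokens from the reference strings.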

@@ -1,221 +0,0 @@
import torch
import random
import numpy as np
import cv2
from torchvision.transforms import v2
from typing import List
from PIL import Image
from ...globals import (
    OCR_IMG_CHANNELS,
    OCR_IMG_SIZE,
    OCR_FIX_SIZE,
    IMAGE_MEAN, IMAGE_STD,
    MAX_RESIZE_RATIO, MIN_RESIZE_RATIO
)
from .ocr_aug import ocr_augmentation_pipeline

# train_pipeline = default_augraphy_pipeline(scan_only=True)
train_pipeline = ocr_augmentation_pipeline()

general_transform_pipeline = v2.Compose([
    v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
    #+ Returns a list of torchvision Image objects whose length equals batch_size,
    #+ so the whole Compose pipeline also outputs a list of torchvision Image objects.
    #+ Note: it does not return one batched Image; the batch dimension stays outside, as the list
    v2.ToDtype(torch.uint8, scale=True),  # optional, most input are already uint8 at this point
    v2.Grayscale(),  # Convert to grayscale (depends on the task)
    v2.Resize(  # Resize to fit within a fixed square
        size=OCR_IMG_SIZE - 1,  # size must be smaller than max_size
        interpolation=v2.InterpolationMode.BICUBIC,
        max_size=OCR_IMG_SIZE,
        antialias=True
    ),
    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[IMAGE_MEAN], std=[IMAGE_STD]),
    # v2.ToPILImage()  # For debugging: inspect whether the transformed result is correct
])
def trim_white_border(image: np.ndarray):
    # image is a 3-dimensional RGB ndarray with layout [H, W, C] (channels in the third dimension)
    # # Check whether the first element of images is a nested list structure
    # if isinstance(image, list):
    #     image = np.array(image, dtype=np.uint8)
    # Check that the image is RGB and that the channel dimension is the third one
    if len(image.shape) != 3 or image.shape[2] != 3:
        raise ValueError("Image is not in RGB format or channel is not in third dimension")
    # Check that the image is stored as uint8
    if image.dtype != np.uint8:
        raise ValueError("Image should be stored in uint8")
    # Create a pure white background image of the same size as the input
    h, w = image.shape[:2]
    bg = np.full((h, w, 3), 255, dtype=np.uint8)
    # Compute the difference
    diff = cv2.absdiff(image, bg)
    # Any difference greater than 1 is set to 255
    _, diff = cv2.threshold(diff, 1, 255, cv2.THRESH_BINARY)
    # Convert the difference to grayscale
    gray_diff = cv2.cvtColor(diff, cv2.COLOR_RGB2GRAY)
    # Compute the minimal bounding rectangle of the non-zero pixels
    x, y, w, h = cv2.boundingRect(gray_diff)
    # Crop the image
    trimmed_image = image[y:y+h, x:x+w]
    return trimmed_image


def add_white_border(image: np.ndarray, max_size: int) -> np.ndarray:
    # Random padding per side, in the order [left, top, right, bottom]
    randi = [random.randint(0, max_size) for _ in range(4)]
    pad_height_size = randi[1] + randi[3]
    pad_width_size = randi[0] + randi[2]
    # Make sure the padded image is at least 30 pixels high and wide
    if (pad_height_size + image.shape[0] < 30):
        compensate_height = int((30 - (pad_height_size + image.shape[0])) * 0.5) + 1
        randi[1] += compensate_height
        randi[3] += compensate_height
    if (pad_width_size + image.shape[1] < 30):
        compensate_width = int((30 - (pad_width_size + image.shape[1])) * 0.5) + 1
        randi[0] += compensate_width
        randi[2] += compensate_width
    return v2.functional.pad(
        torch.from_numpy(image).permute(2, 0, 1),
        padding=randi,
        padding_mode='constant',
        fill=(255, 255, 255)
    )


def padding(images: List[torch.Tensor], required_size: int) -> List[torch.Tensor]:
    # Pad on the right and bottom so every image becomes required_size x required_size
    images = [
        v2.functional.pad(
            img,
            padding=[0, 0, required_size - img.shape[2], required_size - img.shape[1]]
        )
        for img in images
    ]
    return images
def random_resize(
    images: List[np.ndarray],
    minr: float,
    maxr: float
) -> List[np.ndarray]:
    # Each np.ndarray is 3-dimensional RGB with layout [H, W, C] (channels in the third dimension)
    # # Check whether the first element of images is a nested list structure
    # if isinstance(images[0], list):
    #     # Convert the nested list structure to np.ndarray
    #     images = [np.array(img, dtype=np.uint8) for img in images]
    if len(images[0].shape) != 3 or images[0].shape[2] != 3:
        raise ValueError("Image is not in RGB format or channel is not in third dimension")
    ratios = [random.uniform(minr, maxr) for _ in range(len(images))]
    return [
        cv2.resize(img, (int(img.shape[1] * r), int(img.shape[0] * r)), interpolation=cv2.INTER_LANCZOS4)  # anti-aliasing
        for img, r in zip(images, ratios)
    ]


def rotate(image: np.ndarray, min_angle: int, max_angle: int) -> np.ndarray:
    # Get the center of the image to define the point of rotation
    image_center = tuple(np.array(image.shape[1::-1]) / 2)
    # Generate a random angle within the specified range
    angle = random.randint(min_angle, max_angle)
    # Get the rotation matrix for rotating the image around its center
    rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    # Determine the size of the rotated image
    cos = np.abs(rotation_mat[0, 0])
    sin = np.abs(rotation_mat[0, 1])
    new_width = int((image.shape[0] * sin) + (image.shape[1] * cos))
    new_height = int((image.shape[0] * cos) + (image.shape[1] * sin))
    # Adjust the rotation matrix to take into account translation
    rotation_mat[0, 2] += (new_width / 2) - image_center[0]
    rotation_mat[1, 2] += (new_height / 2) - image_center[1]
    # Rotate the image with the specified border color (white in this case)
    rotated_image = cv2.warpAffine(image, rotation_mat, (new_width, new_height), borderValue=(255, 255, 255))
    return rotated_image
def ocr_aug(image: np.ndarray) -> np.ndarray:
    # Rotate randomly with 20% probability
    if random.random() < 0.2:
        image = rotate(image, -5, 5)
    # Add a white border
    image = add_white_border(image, max_size=25).permute(1, 2, 0).numpy()
    # Data augmentation
    image = train_pipeline(image)
    return image


def train_transform(images: List[Image.Image]) -> List[torch.Tensor]:
    assert OCR_IMG_CHANNELS == 1 , "Only support grayscale images for now"
    assert OCR_FIX_SIZE == True, "Only support fixed size images for now"
    images = [np.array(img.convert('RGB')) for img in images]
    # random resize first
    images = random_resize(images, MIN_RESIZE_RATIO, MAX_RESIZE_RATIO)
    # Trim the white border
    images = [trim_white_border(image) for image in images]
    # OCR augmentation
    images = [ocr_aug(image) for image in images]
    # general transform pipeline
    images = [general_transform_pipeline(image) for image in images]
    # padding to fixed size
    images = padding(images, OCR_IMG_SIZE)
    return images


def inference_transform(images: List[np.ndarray]) -> List[torch.Tensor]:
    assert OCR_IMG_CHANNELS == 1 , "Only support grayscale images for now"
    assert OCR_FIX_SIZE == True, "Only support fixed size images for now"
    # Callers pass either RGB ndarrays (from convert2rgb) or PIL images; only PIL images have .convert
    images = [np.array(img.convert('RGB')) if isinstance(img, Image.Image) else img for img in images]
    # Trim the white border
    images = [trim_white_border(image) for image in images]
    # general transform pipeline
    images = [general_transform_pipeline(image) for image in images]  # images: list of image tensors
    # padding to fixed size
    images = padding(images, OCR_IMG_SIZE)
    return images
if __name__ == '__main__':
    from pathlib import Path
    from .helpers import convert2rgb
    base_dir = Path('/home/lhy/code/TeXify/src/models/ocr_model/model')
    imgs_path = [
        base_dir / '1.jpg',
        base_dir / '2.jpg',
        base_dir / '3.jpg',
        base_dir / '4.jpg',
        base_dir / '5.jpg',
        base_dir / '6.jpg',
        base_dir / '7.jpg',
    ]
    imgs_path = [str(img_path) for img_path in imgs_path]
    imgs = convert2rgb(imgs_path)
    res = random_resize(imgs, 0.5, 1.5)
    pause = 1
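
The canvas-size computation inside `rotate` is plain trigonometry: the rotated image must fit the axis-aligned bounding box of the rotated rectangle. The sketch below reproduces that arithmetic with the standard library only, taking the angle directly instead of reading the cosine and sine out of the OpenCV rotation matrix. The function name `rotated_canvas_size` and the sample dimensions are illustrative assumptions.

```python
import math
from typing import Tuple


def rotated_canvas_size(width: int, height: int, angle_deg: float) -> Tuple[int, int]:
    """Width and height of the bounding box of a width x height image rotated by angle_deg."""
    theta = math.radians(angle_deg)
    cos = abs(math.cos(theta))
    sin = abs(math.sin(theta))
    # Same formulas as in rotate(): each new side mixes both original sides.
    new_width = int(height * sin + width * cos)
    new_height = int(height * cos + width * sin)
    return new_width, new_height


print(rotated_canvas_size(100, 40, 0))  # (100, 40)
print(rotated_canvas_size(100, 40, 5))  # (103, 48)
```

For the small angles used by `ocr_aug` (between -5 and 5 degrees), a wide formula image grows mostly in height, which is why the rotation is followed by padding to a fixed size later in the pipeline.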

@@ -1,44 +0,0 @@
#!/usr/bin/env python3
import os
import argparse
import torch
from pathlib import Path
from PIL import Image
from .model.Resizer import Resizer
from .utils import preprocess_fn
from munch import Munch


def inference(args):
    img = Image.open(args.image)
    img = img.convert('RGB') if img.format == 'PNG' else img
    processed_img = preprocess_fn({"pixel_values": [img]})
    ckt_path = Path(args.checkpoint).resolve()
    model = Resizer.from_pretrained(ckt_path)
    model.eval()
    inpu = torch.stack(processed_img['pixel_values'])
    pred = model(inpu) * 1.25
    print(pred)
    ...


if __name__ == "__main__":
    cur_dirpath = os.getcwd()
    script_dirpath = Path(__file__).resolve().parent
    os.chdir(script_dirpath)
    parser = argparse.ArgumentParser()
    parser.add_argument('-img', '--image', type=str, required=True)
    parser.add_argument('-ckt', '--checkpoint', type=str, required=True)
    args = parser.parse_args([
        '-img', '/home/lhy/code/TeXify/src/models/resizer/foo5_140h.jpg',
        '-ckt', '/home/lhy/code/TeXify/src/models/resizer/train/train_result_pred_height_v5'
    ])
    inference(args)
    os.chdir(cur_dirpath)

Some files were not shown because too many files have changed in this diff.