Update README.md

三洋三洋
2024-04-21 13:06:01 +00:00
parent 5f62c7fbf0
commit 1997145cf6
2 changed files with 12 additions and 20 deletions

@@ -29,7 +29,6 @@ TexTeller was trained with ~~550K~~7.5M image-formula pairs (dataset available [
* 📮[2024-04-12] Trained a **formula detection model**, thereby enhancing the capability to detect and recognize formulas in entire documents (whole-image inference)!
## 🔑 Prerequisites
python=3.10
@@ -85,9 +84,7 @@ TexTeller also supports **formula detection and recognition** on full images, al
### Download Weights
English document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco_trained_on_IBEM_en_papers.onnx?download=true)]: Trained on 8272 images from the [IBEM dataset](https://zenodo.org/records/4757865).
Chinese document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/blob/main/rtdetr_r50vd_6x_coco_trained_on_cn_textbook.onnx)]: Trained on 2560 Chinese textbook images (100+ layouts).
Chinese-English document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true)]: Trained on 3415 Chinese textbook images (130+ layouts).
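The weight links point at ONNX files in the `TonyLee1256/texteller_det` Hugging Face repository. The Chinese-textbook link uses a `/blob/` URL, which opens Hugging Face's file-viewer page, whereas the `/resolve/` form serves the raw file. A minimal standard-library sketch that normalizes such links and splits them into a repo ID and filename; the helper names `to_direct_download` and `split_repo_file` are hypothetical and not part of TexTeller.

```python
from urllib.parse import urlsplit


def _file_segments(url: str) -> list[str]:
    """Split a Hugging Face file URL path into segments, validating the layout."""
    segments = urlsplit(url).path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/<blob|resolve>/<revision>/<path/to/file>
    if len(segments) < 5 or segments[2] not in ("blob", "resolve"):
        raise ValueError(f"not a Hugging Face file URL: {url}")
    return segments


def to_direct_download(url: str) -> str:
    """Return the direct-download (/resolve/) form of a Hugging Face file URL."""
    parts = urlsplit(url)
    segments = _file_segments(url)
    segments[2] = "resolve"  # /blob/ is the viewer page; /resolve/ serves raw bytes
    query = parts.query or "download=true"  # keep an existing query string as-is
    return f"{parts.scheme}://{parts.netloc}/{'/'.join(segments)}?{query}"


def split_repo_file(url: str) -> tuple[str, str]:
    """Return (repo_id, filename) parsed from a Hugging Face file URL."""
    segments = _file_segments(url)
    return "/".join(segments[:2]), "/".join(segments[4:])
```

If you prefer the Hub client over raw URLs, the `(repo_id, filename)` pair can be passed to `huggingface_hub.hf_hub_download(repo_id=..., filename=...)`.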
### Formula Detection

@@ -1,4 +1,4 @@
📄 <a href="../README.md">English</a> | 中文
📄 `<a href="../README.md">`English`</a>` | 中文
<div align="center">
<h1>
@@ -48,6 +48,7 @@ python=3.10
```
2. [Install PyTorch](https://pytorch.org/get-started/locally/#start-locally)
3. Install this project's dependencies:
```bash
@@ -112,9 +113,7 @@ TexTeller also supports **formula detection + formula recognition** on entire images, thereby
### Download Weights
English document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco_trained_on_IBEM_en_papers.onnx?download=true)]: Trained on 8272 images from the [IBEM dataset](https://zenodo.org/records/4757865).
Chinese document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/blob/main/rtdetr_r50vd_6x_coco_trained_on_cn_textbook.onnx)]: Trained on 2560 Chinese textbook images (100+ layouts).
Chinese-English document formula detection [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true)]: Trained on 3415 Chinese textbook images (130+ layouts).
### Formula Detection
@@ -149,7 +148,7 @@ python server.py
```
| Parameter | Description |
| - | - |
| --- | --- |
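| `-ckpt` | Path to the weights file, *defaults to TexTeller's pretrained weights*. |
| `-tknz` | Path to the tokenizer, *defaults to TexTeller's tokenizer*. |
| `-port` | Port the server listens on, *defaults to 8000*. |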
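The parameter table lists the options accepted by `python server.py`. As a hedged illustration of how these flags map onto Python's `argparse`, here is a hypothetical parser using the same flag names and the documented default port. It is not the actual `server.py` code, and using `None` to mean "fall back to TexTeller's pretrained weights/tokenizer" is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch mirroring the documented flags; not TexTeller's server.py.
    parser = argparse.ArgumentParser(description="TexTeller server (sketch)")
    # None stands in for "use TexTeller's pretrained weights" (assumption).
    parser.add_argument("-ckpt", default=None, help="path to the weights file")
    # None stands in for "use TexTeller's tokenizer" (assumption).
    parser.add_argument("-tknz", default=None, help="path to the tokenizer")
    # The documented default port is 8000.
    parser.add_argument("-port", type=int, default=8000, help="server port")
    return parser
```

For example, `build_parser().parse_args(["-port", "9000"])` yields `port == 9000`, while `ckpt` and `tknz` stay `None`.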
@@ -207,13 +206,9 @@ python -m models.ocr_model.train.train
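## 📅 Plans
- [X] ~~Train the model on a larger dataset (7.5M samples, coming soon)~~
- [ ] Recognition of scanned images
- [ ] PDF document recognition + support for Chinese and English scenarios
- [ ] Inference acceleration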
- [ ] ...
## ⭐️ Stargazers over time