[deps] pin transformers to 4.45.2 and sentence-transformers to 3.1.1

Merge pull request #78 from OleehyO/pre_release
Change to better import dependency
2025-02-01 13:00:44 +08:00 · 2024-08-07 12:43:15 +08:00 · 2024-08-07 01:19:26 +08:00 · 2024-07-11 20:34:50 +08:00 · 2024-07-11 20:33:51 +08:00 · 2024-06-23 22:16:09 +08:00
113 changed files with 12288 additions and 125372 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,10 +1,28 @@
+**/.DS_Store
 **/__pycache__
 **/.vscode
-**/train_result
+**/pyrightconfig.json

-**/logs
-**/.cache
-**/tmp*
-**/data
-**/*cache
+**/dist
+**/build
+*.egg-info
+
+**/train_result
 **/ckpt
+**/ckpts
+**/*.safetensor
+**/trocr-*
+**/large*.onnx
+**/rtdetr_r50vd_6x_coco.onnx
+
+**/*cache
+**/.cache
+
+**/tmp
+**/tmp*
+**/log
+**/logs
+
+**/data
+
+**/*.bin
--- a/202
+++ b/202
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright OleehyO
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include README.md
--- a/README.md
+++ b/README.md
@@ -0,0 +1,282 @@
+📄 English | <a href="./assets/README_zh.md">中文</a>
+
+<div align="center">
+    <h1>
+        <img src="./assets/fire.svg" width=30, height=30> 
+        𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
+        <img src="./assets/fire.svg" width=30, height=30>
+    </h1>
+    <!-- <p align="center">
+        🤗 <a href="https://huggingface.co/OleehyO/TexTeller"> Hugging Face </a>
+    </p> -->
+
+  [![](https://img.shields.io/badge/License-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)
+  [![](https://img.shields.io/badge/docker-pull-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
+  [![](https://img.shields.io/badge/Data-Texteller1.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas)
+  [![](https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
+
+</div>
+
+<!-- <p align="center">
+
+  <a href="https://opensource.org/licenses/Apache-2.0">
+    <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License">
+  </a>
+  <a href="https://github.com/OleehyO/TexTeller/issues">
+    <img src="https://img.shields.io/badge/Maintained%3F-yes-green.svg" alt="Maintenance">
+  </a>
+  <a href="https://github.com/OleehyO/TexTeller/pulls">
+    <img src="https://img.shields.io/badge/Contributions-welcome-brightgreen.svg?style=flat" alt="Contributions welcome">
+  </a>
+  <a href="https://huggingface.co/datasets/OleehyO/latex-formulas">
+    <img src="https://img.shields.io/badge/Data-Texteller1.0-brightgreen.svg" alt="Data">
+  </a>
+  <a href="https://huggingface.co/OleehyO/TexTeller">
+    <img src="https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg" alt="Weights">
+  </a>
+
+</p> -->
+
+https://github.com/OleehyO/TexTeller/assets/56267907/532d1471-a72e-4960-9677-ec6c19db289f
+
+TexTeller is an end-to-end formula recognition model based on [TrOCR](https://arxiv.org/abs/2109.10282), capable of converting images into corresponding LaTeX formulas.
+
+TexTeller was trained on **80M image-formula pairs** (the previous dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases.
+
+>[!NOTE]
+> If you would like to provide feedback or suggestions for this project, feel free to start a discussion in the [Discussions section](https://github.com/OleehyO/TexTeller/discussions).
+> 
+> Additionally, if you find this project helpful, please don't forget to give it a star⭐️🙏️
+
+---
+
+<table>
+<tr>
+<td>
+
+## 🔖 Table of Contents
+- [Change Log](#-change-log)
+- [Getting Started](#-getting-started)
+- [Web Demo](#-web-demo)
+- [Formula Detection](#-formula-detection)
+- [API Usage](#-api-usage)
+- [Training](#️️-training)
+- [Plans](#-plans)
+- [Stargazers over time](#️-stargazers-over-time)
+- [Contributors](#-contributors)
+
+</td>
+<td>
+
+<div align="center">
+  <figure>
+    <img src="assets/cover.png" width="800">
+    <figcaption>
+      <p>Images that can be recognized by TexTeller</p>
+    </figcaption>
+  </figure>
+  <div>
+    <p>
+      Thanks to the
+      <i>
+        Super Computing Platform of Beijing University of Posts and Telecommunications
+      </i>
+        for supporting this work😘
+    </p>
+    <!-- <img src="assets/scss.png" width="200"> -->
+  </div>
+</div>
+
+
+</td>
+</tr>
+</table>
+
+## 🔄 Change Log
+
+- 📮[2024-06-06] **TexTeller3.0 released!** The training data has been increased to **80M** (**10x more than** TexTeller2.0 and also improved in data diversity). TexTeller3.0's new features:
+
+  - Support for scanned images, handwritten formulas, and mixed English/Chinese formulas.
+
+  - General OCR for both Chinese and English text in printed images.
+
+- 📮[2024-05-02] Support **paragraph recognition**.
+
+- 📮[2024-04-12] **Formula detection model** released!
+
+- 📮[2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has been increased to 7.5M (15x more than TexTeller1.0, with improved data quality). TexTeller2.0 demonstrated **superior performance** on the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
+
+  > [Here](./assets/test.pdf) are more test images and a horizontal comparison of various recognition models.
+
+## 🚀 Getting Started
+
+1. Clone the repository:
+
+   ```bash
+   git clone https://github.com/OleehyO/TexTeller
+   ```
+
+2. Install the project's dependencies:
+
+   ```bash
+   pip install texteller
+   ```
+
+3. Enter the `src/` directory and run the following command in the terminal to start inference:
+
+   ```bash
+   python inference.py -img "/path/to/image.{jpg,png}"
+   # Use the --inference-mode option to enable GPU (cuda or mps) inference,
+   # e.g. python inference.py -img "img.jpg" --inference-mode cuda
+   ```
+
+   > The first time you run it, the required checkpoints will be downloaded from Hugging Face.
+
+### Paragraph Recognition
+
+As demonstrated in the video, TexTeller is also capable of recognizing entire text paragraphs. Although TexTeller has general text OCR capabilities, we still recommend using paragraph recognition for better results:
+
+1. [Download the weights](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true) of the formula detection model to the `src/models/det_model/model/` directory.
+
+2. Run `inference.py` in the `src/` directory with the `-mix` option. The results are output in Markdown format.
+
+   ```bash
+   python inference.py -img "/path/to/image.{jpg,png}" -mix
+   ```
+
+TexTeller uses the lightweight [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) model by default for recognizing both Chinese and English text. You can try using a larger model to achieve better recognition results for both Chinese and English:
+
+| Checkpoints | Model Description | Size |
+|-------------|-------------------| ---- |
+| [ch_PP-OCRv4_det.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_det.onnx?download=true) | **Default detection model**, supports Chinese-English text detection | 4.70M |
+| [ch_PP-OCRv4_server_det.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_server_det.onnx?download=true) | High accuracy model, supports Chinese-English text detection | 115M |
+| [ch_PP-OCRv4_rec.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_rec.onnx?download=true) | **Default recognition model**, supports Chinese-English text recognition | 10.80M |
+| [ch_PP-OCRv4_server_rec.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_server_rec.onnx?download=true) | High accuracy model, supports Chinese-English text recognition | 90.60M |
+
+Place the weights of the detection or recognition model in the `det/` or `rec/` directory, respectively, within `src/models/third_party/paddleocr/checkpoints/`, and rename the file to `default_model.onnx`.
+
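The placement and renaming step can be scripted. The following is a minimal sketch, not part of TexTeller: `install_checkpoint` is a hypothetical helper, and it assumes the weights file has already been downloaded locally and that the script runs from the repository root.

```python
import shutil
from pathlib import Path

# Checkpoint root named in the README, relative to the repository root.
CHECKPOINTS_ROOT = Path("src/models/third_party/paddleocr/checkpoints")


def install_checkpoint(downloaded: Path, kind: str, root: Path = CHECKPOINTS_ROOT) -> Path:
    """Copy a downloaded .onnx file to <root>/<kind>/default_model.onnx.

    `kind` is "det" for a detection model or "rec" for a recognition model.
    This helper is hypothetical; it is not shipped with TexTeller.
    """
    if kind not in ("det", "rec"):
        raise ValueError(f"kind must be 'det' or 'rec', got {kind!r}")
    target_dir = root / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "default_model.onnx"
    shutil.copyfile(downloaded, target)  # overwrites any existing default model
    return target
```

For example, `install_checkpoint(Path("ch_PP-OCRv4_server_det.onnx"), "det")` would install the high-accuracy detection model as the default.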
+> [!NOTE]
+> Paragraph recognition cannot restore the structure of a document; it only recognizes the content.
+
+## 🌐 Web Demo
+
+Go to the `src/` directory and run the following command:
+
+```bash
+./start_web.sh
+```
+
+Enter `http://localhost:8501` in a browser to view the web demo.
+
+> [!NOTE]
+> 1. For Windows users, please run the `start_web.bat` file.
+> 2. When using onnxruntime + GPU for inference, you need to install onnxruntime-gpu.
+
+## 🔍 Formula Detection
+
+TexTeller’s formula detection model is trained on 3,415 images of Chinese educational materials (with over 130 layouts) and 8,272 images from the [IBEM dataset](https://zenodo.org/records/4757865), and it supports formula detection across entire images.
+
+<div align="center">
+    <img src="./assets/det_rec.png" width=250> 
+</div>
+
+1. Download the model weights and place them in `src/models/det_model/model/` [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true)].
+
+2. Run `python infer_det.py` in the `src/` directory. The results will be saved in `src/subimages/`.
+
+<details>
+<summary>Advanced: batch formula recognition</summary>
+
+After **formula detection**, run the following command in the `src/` directory:
+
+```shell
+python rec_infer_from_crop_imgs.py
+```
+
+This will use the results of the previous formula detection to perform batch recognition on all cropped formulas, saving the recognition results as txt files in `src/results/`.
+
+</details>
+
+## 📡 API Usage
+
+We use [ray serve](https://github.com/ray-project/ray) to provide an API interface for TexTeller, allowing you to integrate TexTeller into your own projects. To start the server, you first need to enter the `src/` directory and then run the following command:
+
+```bash
+python server.py
+```
+
+| Parameter | Description |
+| --------- | -------- |
+| `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*. |
+| `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*. |
+| `-port` | The server's service port, *default is 8000*. |
+| `--inference-mode` | Whether to use "cuda" or "mps" for inference, *default is "cpu"*. |
+| `--num_beams` | The number of beams for beam search, *default is 1*. |
+| `--num_replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput. |
+| `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
+| `--ngpu_per_replica` | The number of GPUs used per service replica, *default is 1*. You can set this value between 0 and 1 to run multiple service replicas on one GPU, thereby improving GPU utilization. (Note: if `--num_replicas` is 2 and `--ngpu_per_replica` is 0.7, then 2 GPUs must be available.) |
+| `-onnx` | Perform inference using ONNX Runtime, *disabled by default*. |
+
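The note on `--ngpu_per_replica` can be checked with a little arithmetic. The sketch below assumes that a replica's GPU share must fit entirely on one physical GPU (shares are never split across devices), which is how the note in the table reads. `gpus_needed` is a hypothetical helper and is not part of `server.py`.

```python
import math


def gpus_needed(num_replicas: int, ngpu_per_replica: float) -> int:
    """Estimate how many physical GPUs a deployment needs.

    Assumes each replica's share must fit entirely on one GPU.
    """
    if ngpu_per_replica <= 0:
        return 0  # CPU-only replicas need no GPU
    if ngpu_per_replica >= 1:
        return num_replicas * math.ceil(ngpu_per_replica)
    # Number of replicas that can share one GPU; the epsilon guards against
    # floating-point error in values such as 1 / 0.1.
    replicas_per_gpu = math.floor(1 / ngpu_per_replica + 1e-9)
    return math.ceil(num_replicas / replicas_per_gpu)


print(gpus_needed(2, 0.7))  # 2: two 0.7 shares do not fit on one GPU
print(gpus_needed(2, 0.5))  # 1: two 0.5 shares fit on one GPU
```

Under this assumption, lowering the share to 0.5 lets both replicas run on a single GPU, whereas 0.7 leaves 0.3 of each GPU unused.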
+> [!NOTE]
+> A client demo can be found at `src/client/demo.py`. Refer to it for how to send requests to the server.
+
+## 🏋️‍♂️ Training
+
+### Dataset
+
+We provide an example dataset in the `src/models/ocr_model/train/dataset/` directory. Place your own images in the `images/` directory and annotate each image with its corresponding formula in `formulas.jsonl`.
+
+After preparing your dataset, you need to **change the `DIR_URL` variable to your own dataset's path** in `**/train/dataset/loader.py`
+
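Annotations can also be generated programmatically. The sketch below writes one JSON object per line to `formulas.jsonl`. The field names `img_name` and `formula` are assumptions made for illustration, so check the example dataset for the exact schema before relying on them; `write_annotations` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_annotations(dataset_dir: Path, annotations: dict[str, str]) -> Path:
    """Write {image file name: LaTeX formula} pairs as JSON Lines.

    The keys "img_name" and "formula" are assumed, not confirmed by the README.
    """
    out_path = dataset_dir / "formulas.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for img_name, formula in annotations.items():
            record = {"img_name": img_name, "formula": formula}
            # ensure_ascii=False keeps any non-ASCII characters readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path
```

A call such as `write_annotations(Path("dataset"), {"0.png": r"\frac{a}{b}"})` produces a file with one line per image, which keeps large datasets easy to append to and to stream.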
+### Retraining the Tokenizer
+
+If you are using a different dataset, you might need to retrain the tokenizer to obtain a different vocabulary. After configuring your dataset, you can train your own tokenizer with the following steps:
+
+1. In `src/models/tokenizer/train.py`, change `new_tokenizer.save_pretrained('./your_dir_name')` to your custom output directory
+
+   > If you want to use a different vocabulary size (default 15K), you need to change the `VOCAB_SIZE` variable in `src/models/globals.py`
+   >
+2. **In the `src/` directory**, run the following command:
+
+   ```bash
+   python -m models.tokenizer.train
+   ```
+
+### Training the Model
+
+1. Modify `num_processes` in `src/train_config.yaml` to match the number of GPUs available for training (default is 1).
+2. In the `src/` directory, run the following command:
+
+   ```bash
+   accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+   ```
+
+You can set your own tokenizer and checkpoint paths in `src/models/ocr_model/train/train.py` (refer to `train.py` for more information). If you are using the same architecture and vocabulary as TexTeller, you can also fine-tune TexTeller's default weights with your own dataset.
+
+In `src/models/globals.py` and `src/models/ocr_model/train/train_args.py`, you can change the model's architecture and training hyperparameters.
+
+> [!NOTE]
+> Our training scripts use the [Hugging Face Transformers](https://github.com/huggingface/transformers) library, so you can refer to their [documentation](https://huggingface.co/docs/transformers/v4.32.1/main_classes/trainer#transformers.TrainingArguments) for more details and configurations on training parameters.
+
+## 📅 Plans
+
+- [X] ~~Train the model with a larger dataset~~
+- [X] ~~Recognition of scanned images~~
+- [X] ~~Support for English and Chinese scenarios~~
+- [X] ~~Handwritten formulas support~~
+- [ ] PDF document recognition
+- [ ] Inference acceleration
+- [ ] ...
+
+## ⭐️ Stargazers over time
+
+[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)
+
+
+## 👥 Contributors
+
+<a href="https://github.com/OleehyO/TexTeller/graphs/contributors">
+   <img src="https://contrib.rocks/image?repo=OleehyO/TexTeller" />
+</a>
--- a/assets/README_zh.md
+++ b/assets/README_zh.md
@@ -0,0 +1,317 @@
+📄 <a href="../README.md">English</a> | 中文
+
+<div align="center">
+    <h1>
+        <img src="./fire.svg" width=30, height=30> 
+        𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
+        <img src="./fire.svg" width=30, height=30>
+    </h1>
+    <!-- <p align="center">
+        🤗 <a href="https://huggingface.co/OleehyO/TexTeller"> Hugging Face </a>
+    </p> -->
+
+  [![](https://img.shields.io/badge/License-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)
+  [![](https://img.shields.io/badge/docker-pull-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
+  [![](https://img.shields.io/badge/Data-Texteller1.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas)
+  [![](https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
+
+</div>
+
+<!-- <p align="center">
+
+  <a href="https://opensource.org/licenses/Apache-2.0">
+    <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License">
+  </a>
+  <a href="https://github.com/OleehyO/TexTeller/issues">
+    <img src="https://img.shields.io/badge/Maintained%3F-yes-green.svg" alt="Maintenance">
+  </a>
+  <a href="https://github.com/OleehyO/TexTeller/pulls">
+    <img src="https://img.shields.io/badge/Contributions-welcome-brightgreen.svg?style=flat" alt="Contributions welcome">
+  </a>
+  <a href="https://huggingface.co/datasets/OleehyO/latex-formulas">
+    <img src="https://img.shields.io/badge/Data-Texteller1.0-brightgreen.svg" alt="Data">
+  </a>
+  <a href="https://huggingface.co/OleehyO/TexTeller">
+    <img src="https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg" alt="Weights">
+  </a>
+
+</p> -->
+
+https://github.com/OleehyO/TexTeller/assets/56267907/532d1471-a72e-4960-9677-ec6c19db289f
+
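+TexTeller is an end-to-end formula recognition model based on [TrOCR](https://arxiv.org/abs/2109.10282) that converts images into the corresponding LaTeX formulas.
+
+TexTeller was trained on **80M** image-formula pairs (the previous dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases.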
+
+> [!NOTE]
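+> If you would like to provide feedback or suggestions for this project, feel free to start a discussion in the [Discussions section](https://github.com/OleehyO/TexTeller/discussions).
+> 
+> Also, if you find this project helpful, please don't forget to give it a Star⭐️🙏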
+
+---
+
+<table>
+<tr>
+<td>
+
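+## 🔖 Table of Contents
+
+- [Change Log](#-change-log)
+- [Getting Started](#-getting-started)
+- [FAQ: Cannot Connect to Hugging Face](#-faq-cannot-connect-to-hugging-face)
+- [Web Demo](#-web-demo)
+- [Formula Detection](#-formula-detection)
+- [API Usage](#-api-usage)
+- [Training](#️️-training)
+- [Plans](#-计划)
+- [Stargazers over time](#️-观星曲线)
+- [Contributors](#-贡献者)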
+
+</td>
+<td>
+
+<div align="center">
+  <figure>
+    <img src="cover.png" width="800">
+    <figcaption>
+      <p>Images that can be recognized by TexTeller</p>
+    </figcaption>
+  </figure>
+  <div>
+    <p>
+      Thanks to the
+      <i>
+        Super Computing Platform of Beijing University of Posts and Telecommunications
+      </i>
+      for supporting this work😘
+    </p>
+  </div>
+</div>
+
+</td>
+</tr>
+</table>
+
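+## 🔄 Change Log
+
+- 📮[2024-06-06] **TexTeller3.0 released!** The training dataset has grown to **80M** (**10x larger** than TexTeller2.0, with improved data diversity). New features in TexTeller3.0:
+  - Support for scanned images, handwritten formulas, and mixed Chinese/English formulas.
+  - General Chinese and English OCR for printed images.
+
+- 📮[2024-05-02] Support for **paragraph recognition**.
+
+- 📮[2024-04-12] **Formula detection model** released!
+
+- 📮[2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has grown to 7.5M (~15x more than TexTeller1.0, with improved data quality). The trained TexTeller2.0 showed superior performance on the test set, especially for rare symbols, complex multi-line formulas, and matrices.
+
+  > More test images and a side-by-side comparison of various recognition models are available [here](./test.pdf).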
+
+## 🚀 Getting Started
+
+1. Clone the repository:
+
+   ```bash
+   git clone https://github.com/OleehyO/TexTeller
+   ```
+
+2. Install the project's dependencies:
+
+   ```bash
+   pip install texteller
+   ```
+
+3. Enter the `src/` directory and run the following command in the terminal to start inference:
+
+   ```bash
+   python inference.py -img "/path/to/image.{jpg,png}"
+   # Use the --inference-mode option to enable GPU (cuda or mps) inference,
+   # e.g. python inference.py -img "img.jpg" --inference-mode cuda
+   ```
+
+   > The first time you run it, the required weights will be downloaded from Hugging Face.
+
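+### Paragraph Recognition
+
+As shown in the demo video, TexTeller can also recognize entire text paragraphs. Although TexTeller has general text OCR capabilities, we still recommend using paragraph recognition for better results:
+
+1. Download the weights of the formula detection model to the `src/models/det_model/model/` directory [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true)]
+
+2. Run `inference.py` in the `src/` directory with the `-mix` option. The results are output in Markdown format.
+
+   ```bash
+   python inference.py -img "/path/to/image.{jpg,png}" -mix
+   ```
+
+TexTeller uses the lightweight [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) model by default to recognize Chinese and English text. You can try a larger model for better Chinese and English recognition results:
+
+| Checkpoints | Model Description | Size |
+|-------------|-------------------| ---- |
+| [ch_PP-OCRv4_det.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_det.onnx?download=true) | **Default detection model**, supports Chinese-English text detection | 4.70M |
+| [ch_PP-OCRv4_server_det.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_server_det.onnx?download=true) | High accuracy model, supports Chinese-English text detection | 115M |
+| [ch_PP-OCRv4_rec.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_rec.onnx?download=true) | **Default recognition model**, supports Chinese-English text recognition | 10.80M |
+| [ch_PP-OCRv4_server_rec.onnx](https://huggingface.co/OleehyO/paddleocrv4.onnx/resolve/main/ch_PP-OCRv4_server_rec.onnx?download=true) | High accuracy model, supports Chinese-English text recognition | 90.60M |
+
+Place the weights of the detection or recognition model in the `det/` or `rec/` directory, respectively, within `src/models/third_party/paddleocr/checkpoints/`, and rename the file to `default_model.onnx`.
+
+> [!NOTE]
+> Paragraph recognition only recognizes the content of a document; it cannot restore the document's structure.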
+
+## ❓ FAQ: Cannot Connect to Hugging Face
+
+By default, the model weights are downloaded from Hugging Face. **If your remote server cannot connect to Hugging Face**, you can load them with the following steps:
+
+1. Install the huggingface_hub package:
+
+   ```bash
+   pip install -U "huggingface_hub[cli]"
+   ```
+
+2. On a machine that can connect to Hugging Face, download the model weights:
+
+   ```bash
+   huggingface-cli download \
+       OleehyO/TexTeller \
+       --repo-type model \
+       --local-dir "your/dir/path" \
+       --local-dir-use-symlinks False
+   ```
+
+3. Upload the directory containing the weights to the remote server, then change `REPO_NAME = 'OleehyO/TexTeller'` to `REPO_NAME = 'your/dir/path'` in `src/models/ocr_model/model/TexTeller.py`.
+
+<!-- If you also want to enable evaluation while training the model, you need to download the metric script in advance and upload it to the remote server:
+
+1. On a machine that can connect to Hugging Face, download the metric script
+
+   ```bash
+   huggingface-cli download \
+       evaluate-metric/google_bleu \
+       --repo-type space \
+       --local-dir "your/dir/path" \
+       --local-dir-use-symlinks False
+   ```
+
+2. Upload this directory to the remote server, and in `TexTeller/src/models/ocr_model/utils/metrics.py` change `evaluate.load('google_bleu')` to `evaluate.load('your/dir/path/google_bleu.py')` -->
+
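+## 🌐 Web Demo
+
+Go to the `src/` directory and run the following command:
+
+```bash
+./start_web.sh
+```
+
+Enter `http://localhost:8501` in a browser to view the web demo.
+
+> [!NOTE]
+> 1. For Windows users, please run the `start_web.bat` file.
+> 2. When using onnxruntime + GPU for inference, you need to install onnxruntime-gpu.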
+
+## 🔍 公式检测
+
+TexTeller的公式检测模型在3415张中文教材数据(130+版式)和8272张[IBEM数据集](https://zenodo.org/records/4757865)上训练得到，支持对整张图片进行**公式检测**。
+
+<div align="center">
+    <img src="det_rec.png" width=250> 
+</div>
+
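+1. Download the formula detection model weights to the `src/models/det_model/model/` directory [[link](https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true)]
+
+2. Run the following command in the `src/` directory; the results are saved in `src/subimages/`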
+
+   ```bash
+   python infer_det.py
+   ```
+
+<details>
+<summary>Going further: batch formula recognition</summary>
+
+**After running formula detection**, run the following command in the `src/` directory:
+
+```shell
+python rec_infer_from_crop_imgs.py
+```
+
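+This uses the formula detection results from the previous step to recognize all cropped formulas in a batch, and saves the recognition results as txt files in `src/results/`.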
+</details>
+
+## 📡 API Usage
+
+We use [ray serve](https://github.com/ray-project/ray) to expose an API for TexTeller, which lets you integrate TexTeller into your own projects. To start the server, first go to the `src/` directory and then run the following command:
+
+```bash
+python server.py 
+```
+
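+| Parameter | Description |
+| --- | --- |
+| `-ckpt` | Path to the weights file; *defaults to TexTeller's pretrained weights*. |
+| `-tknz` | Path to the tokenizer; *defaults to TexTeller's tokenizer*. |
+| `-port` | Port the server listens on; *defaults to 8000*. |
+| `--inference-mode` | Use "cuda" or "mps" for inference; *defaults to "cpu"*. |
+| `--num_beams` | Number of beams for beam search; *defaults to 1*. |
+| `--num_replicas` | Number of service replicas to run on the server; *defaults to 1*. You can use more replicas for higher throughput. |
+| `--ncpu_per_replica` | Number of CPU cores used by each replica; *defaults to 1*. |
+| `--ngpu_per_replica` | Number of GPUs used by each replica; *defaults to 1*. You can set this to a value between 0 and 1 to run several replicas on one GPU and share it, which improves GPU utilization. (Note: with `--num_replicas 2` and `--ngpu_per_replica 0.7`, 2 GPUs must be available.) |
+| `-onnx` | Use ONNX Runtime for inference; *disabled by default*. |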
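+
+The GPU requirement in the `--ngpu_per_replica` note can be worked out with a little arithmetic. The sketch below is a simplified model, not code from TexTeller or Ray: `gpus_required` is a hypothetical helper, and it assumes that a replica's fractional share must fit entirely on one physical GPU, which is why two replicas at 0.7 each cannot share a single device.
+
+```python
+import math
+
+
+def gpus_required(num_replicas: int, ngpu_per_replica: float) -> int:
+    """Estimate the GPUs needed, assuming a replica cannot be split across GPUs."""
+    if num_replicas < 1:
+        raise ValueError("num_replicas must be at least 1")
+    if ngpu_per_replica == 0:
+        return 0  # CPU-only replicas need no GPU
+    if not 0 < ngpu_per_replica <= 1:
+        raise ValueError("this sketch only covers values in the range 0 to 1")
+    # How many replicas fit on one GPU; the epsilon guards against float rounding.
+    replicas_per_gpu = math.floor(1 / ngpu_per_replica + 1e-9)
+    return math.ceil(num_replicas / replicas_per_gpu)
+```
+
+Under this model, `gpus_required(2, 0.7)` gives 2 (matching the note in the table), while `gpus_required(2, 0.5)` gives 1 because both replicas fit on the same GPU.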
+
+> [!NOTE]
+> A client demo can be found at `TexTeller/client/demo.py`; you can refer to `demo.py` to send requests to the server.
+
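+## 🏋️‍♂️ Training
+
+### Dataset
+
+We provide an example dataset in the `src/models/ocr_model/train/dataset/` directory. You can put your own images in the `images` directory and annotate each image with its formula in `formulas.jsonl`.
+
+After preparing the dataset, you need to **change the `DIR_URL` variable to the path of your own dataset** in `**/train/dataset/loader.py`.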
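+
+Before training, it can help to confirm that every image has an annotation. The following is a minimal sketch (Python 3.9+), not part of TexTeller: `find_unlabeled_images` is a hypothetical helper, and the JSON key `img_name` is an assumption about the record layout, so check the provided example `formulas.jsonl` and pass the actual key via `name_key` if it differs.
+
+```python
+import json
+from pathlib import Path
+
+
+def find_unlabeled_images(dataset_dir: Path, name_key: str = "img_name") -> list[str]:
+    """Return file names in images/ that have no record in formulas.jsonl.
+
+    `name_key` is an assumed field name for the image file name in each JSON line.
+    """
+    labeled = set()
+    with open(dataset_dir / "formulas.jsonl", encoding="utf-8") as f:
+        for line in f:
+            if line.strip():  # skip blank lines
+                labeled.add(json.loads(line)[name_key])
+    images = sorted(p.name for p in (dataset_dir / "images").iterdir() if p.is_file())
+    return [name for name in images if name not in labeled]
+```
+
+An empty result means every file in `images` is referenced by some line of `formulas.jsonl`.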
+
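+### Retraining the Tokenizer
+
+If you use a different dataset, you may need to retrain the tokenizer to obtain a different vocabulary. After configuring your dataset, you can train your own tokenizer as follows:
+
+1. In `src/models/tokenizer/train.py`, change `new_tokenizer.save_pretrained('./your_dir_name')` to your custom output directory
+
+   > Note: If you want to use a vocabulary of a different size (default: 15K tokens), you need to change the `VOCAB_SIZE` variable in `src/models/globals.py`
+
+2. **In the `src/` directory**, run the following command: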
+
+   ```bash
+   python -m models.tokenizer.train
+   ```
+
+### Training the Model
+
+1. In `src/train_config.yaml`, set `num_processes` to the number of GPUs used for training (default: 1)
+
+2. In the `src/` directory, run the following command:
+
+   ```bash
+   accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+   ```
+
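+You can set your own tokenizer and checkpoint paths in `src/models/ocr_model/train/train.py` (see `train.py` for details). If you use the same architecture and vocabulary as TexTeller, you can also fine-tune TexTeller's default weights on your own dataset.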
+
+> [!NOTE]
+> Our training scripts use the [Hugging Face Transformers](https://github.com/huggingface/transformers) library, so you can refer to its [documentation](https://huggingface.co/docs/transformers/v4.32.1/main_classes/trainer#transformers.TrainingArguments) for more details on the training arguments and how to configure them.
+
+## 📅 Plans
+
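+- [X] ~~Train the model on a larger dataset~~
+- [X] ~~Recognition of scanned images~~
+- [X] ~~Support for Chinese and English scenarios~~
+- [X] ~~Handwritten formula recognition~~
+- [ ] PDF document recognition
+- [ ] Inference acceleration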
+
+## ⭐️ Stargazers over time
+
+[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)
+
+## 👥 Contributors
+
+<a href="https://github.com/OleehyO/TexTeller/graphs/contributors">
+   <img src="https://contrib.rocks/image?repo=OleehyO/TexTeller" />
+</a>
--- a/assets/cover.png
+++ b/assets/cover.png
--- a/assets/det_rec.png
+++ b/assets/det_rec.png
--- a/assets/fire.svg
+++ b/assets/fire.svg
@@ -0,0 +1,460 @@
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="" width="200px" height="100px" viewBox="0 0 100 100" preserveAspectRatio="xMidYMid">
+<defs>
+  <filter id="ldio-ekpf7uvh2aq-filter" filterUnits="userSpaceOnUse" x="0" y="0" width="100" height="100">
+    <feGaussianBlur in="SourceGraphic" stdDeviation="3"></feGaussianBlur>
+    <feComponentTransfer result="cutoff">
+      <feFuncA type="linear" slope="10" intercept="-5"></feFuncA>
+    </feComponentTransfer>
+  </filter>
+</defs><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="45" cy="154.67770829199992" r="42" fill="#e15b64">
+  <animate attributeName="cy" values="154.67770829199992;-27.568110790210763" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7914508173328552s"></animate>
+  <animate attributeName="r" values="42;0;0" keyTimes="0;0.6593879177915443;1" dur="1s" repeatCount="indefinite" begin="-0.7914508173328552s"></animate>
+</circle><circle cx="53" cy="156.51873756667007" r="43" fill="#e15b64">
+  <animate attributeName="cy" values="156.51873756667007;-28.593472199379597" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8990601299952956s"></animate>
+  <animate attributeName="r" values="43;0;0" keyTimes="0;0.9199190750649376;1" dur="1s" repeatCount="indefinite" begin="-0.8990601299952956s"></animate>
+</circle><circle cx="22" cy="118.4676277511406" r="6" fill="#e15b64">
+  <animate attributeName="cy" values="118.4676277511406;-1.812134766063739" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2574158626531723s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.7424894336620584;1" dur="1s" repeatCount="indefinite" begin="-0.2574158626531723s"></animate>
+</circle><circle cx="56" cy="143.3980016480395" r="34" fill="#e15b64">
+  <animate attributeName="cy" values="143.3980016480395;-23.264651741765398" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5292591072219247s"></animate>
+  <animate attributeName="r" values="34;0;0" keyTimes="0;0.8257208789488842;1" dur="1s" repeatCount="indefinite" begin="-0.5292591072219247s"></animate>
+</circle><circle cx="43" cy="154.61226210156264" r="43" fill="#e15b64">
+  <animate attributeName="cy" values="154.61226210156264;-39.72257238426019" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9349241678635103s"></animate>
+  <animate attributeName="r" values="43;0;0" keyTimes="0;0.6655411648349204;1" dur="1s" repeatCount="indefinite" begin="-0.9349241678635103s"></animate>
+</circle><circle cx="36" cy="141.18233539125538" r="23" fill="#e15b64">
+  <animate attributeName="cy" values="141.18233539125538;-11.919782601799477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9661184430026497s"></animate>
+  <animate attributeName="r" values="23;0;0" keyTimes="0;0.7340510315067473;1" dur="1s" repeatCount="indefinite" begin="-0.9661184430026497s"></animate>
+</circle><circle cx="55" cy="137.61381349909033" r="35" fill="#e15b64">
+  <animate attributeName="cy" values="137.61381349909033;-27.023105799592948" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7882390392923937s"></animate>
+  <animate attributeName="r" values="35;0;0" keyTimes="0;0.5596286394923506;1" dur="1s" repeatCount="indefinite" begin="-0.7882390392923937s"></animate>
+</circle><circle cx="81" cy="116.42482869722863" r="6" fill="#e15b64">
+  <animate attributeName="cy" values="116.42482869722863;2.642571962973477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6838551001109257s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.8530428185299654;1" dur="1s" repeatCount="indefinite" begin="-0.6838551001109257s"></animate>
+</circle><circle cx="51" cy="144.1337397120671" r="41" fill="#e15b64">
+  <animate attributeName="cy" values="144.1337397120671;-35.62888188299487" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8931867510460544s"></animate>
+  <animate attributeName="r" values="41;0;0" keyTimes="0;0.9351064787950636;1" dur="1s" repeatCount="indefinite" begin="-0.8931867510460544s"></animate>
+</circle><circle cx="22" cy="127.94124738258117" r="20" fill="#e15b64">
+  <animate attributeName="cy" values="127.94124738258117;-4.588101238414598" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9129507531699166s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.9626971761152365;1" dur="1s" repeatCount="indefinite" begin="-0.9129507531699166s"></animate>
+</circle><circle cx="51" cy="130.13871763314205" r="21" fill="#e15b64">
+  <animate attributeName="cy" values="130.13871763314205;-2.771870373434613" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.16276671760313832s"></animate>
+  <animate attributeName="r" values="21;0;0" keyTimes="0;0.6367210977937845;1" dur="1s" repeatCount="indefinite" begin="-0.16276671760313832s"></animate>
+</circle><circle cx="28" cy="130.94671647108635" r="26" fill="#e15b64">
+  <animate attributeName="cy" values="130.94671647108635;-20.54470862263146" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.010777607623041363s"></animate>
+  <animate attributeName="r" values="26;0;0" keyTimes="0;0.5986827903483527;1" dur="1s" repeatCount="indefinite" begin="-0.010777607623041363s"></animate>
+</circle><circle cx="32" cy="133.57559887485095" r="18" fill="#e15b64">
+  <animate attributeName="cy" values="133.57559887485095;-13.998747273650661" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6849903294560423s"></animate>
+  <animate attributeName="r" values="18;0;0" keyTimes="0;0.9272684317035897;1" dur="1s" repeatCount="indefinite" begin="-0.6849903294560423s"></animate>
+</circle><circle cx="50" cy="129.2368025879272" r="29" fill="#e15b64">
+  <animate attributeName="cy" values="129.2368025879272;-21.38222818211007" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2570532837614655s"></animate>
+  <animate attributeName="r" values="29;0;0" keyTimes="0;0.5349692982819836;1" dur="1s" repeatCount="indefinite" begin="-0.2570532837614655s"></animate>
+</circle><circle cx="54" cy="147.67203918209864" r="32" fill="#e15b64">
+  <animate attributeName="cy" values="147.67203918209864;-23.292000640460095" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8840781999829185s"></animate>
+  <animate attributeName="r" values="32;0;0" keyTimes="0;0.9905440228534627;1" dur="1s" repeatCount="indefinite" begin="-0.8840781999829185s"></animate>
+</circle><circle cx="49" cy="156.33097983975816" r="43" fill="#e15b64">
+  <animate attributeName="cy" values="156.33097983975816;-30.688836209655307" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6363282840605137s"></animate>
+  <animate attributeName="r" values="43;0;0" keyTimes="0;0.578321371334853;1" dur="1s" repeatCount="indefinite" begin="-0.6363282840605137s"></animate>
+</circle><circle cx="53" cy="150.73132612778645" r="38" fill="#e15b64">
+  <animate attributeName="cy" values="150.73132612778645;-24.243875812169208" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6889884148164682s"></animate>
+  <animate attributeName="r" values="38;0;0" keyTimes="0;0.9820908894527897;1" dur="1s" repeatCount="indefinite" begin="-0.6889884148164682s"></animate>
+</circle><circle cx="58" cy="136.92364235316566" r="30" fill="#e15b64">
+  <animate attributeName="cy" values="136.92364235316566;-14.514104757207221" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3274028295945308s"></animate>
+  <animate attributeName="r" values="30;0;0" keyTimes="0;0.9109990458833535;1" dur="1s" repeatCount="indefinite" begin="-0.3274028295945308s"></animate>
+</circle><circle cx="21" cy="125.47085228007643" r="18" fill="#e15b64">
+  <animate attributeName="cy" values="125.47085228007643;-8.232426956653288" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11103461733078768s"></animate>
+  <animate attributeName="r" values="18;0;0" keyTimes="0;0.7718042613876622;1" dur="1s" repeatCount="indefinite" begin="-0.11103461733078768s"></animate>
+</circle><circle cx="57" cy="154.13251799723747" r="37" fill="#e15b64">
+  <animate attributeName="cy" values="154.13251799723747;-18.665203993986026" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8263441768461145s"></animate>
+  <animate attributeName="r" values="37;0;0" keyTimes="0;0.7148325280461965;1" dur="1s" repeatCount="indefinite" begin="-0.8263441768461145s"></animate>
+</circle><circle cx="52" cy="163.55969451733722" r="47" fill="#e15b64">
+  <animate attributeName="cy" values="163.55969451733722;-45.32343944696123" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08605155305311041s"></animate>
+  <animate attributeName="r" values="47;0;0" keyTimes="0;0.8554524873372089;1" dur="1s" repeatCount="indefinite" begin="-0.08605155305311041s"></animate>
+</circle><circle cx="43" cy="150.72861891310126" r="42" fill="#e15b64">
+  <animate attributeName="cy" values="150.72861891310126;-23.942286768617272" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8013052401764136s"></animate>
+  <animate attributeName="r" values="42;0;0" keyTimes="0;0.6681090498432822;1" dur="1s" repeatCount="indefinite" begin="-0.8013052401764136s"></animate>
+</circle><circle cx="62" cy="109.2607457626771" r="2" fill="#e15b64">
+  <animate attributeName="cy" values="109.2607457626771;3.194634855160243" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7901767326521292s"></animate>
+  <animate attributeName="r" values="2;0;0" keyTimes="0;0.7018579919397697;1" dur="1s" repeatCount="indefinite" begin="-0.7901767326521292s"></animate>
+</circle><circle cx="29" cy="132.04950518708117" r="26" fill="#e15b64">
+  <animate attributeName="cy" values="132.04950518708117;-24.268419710129816" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9729317633977274s"></animate>
+  <animate attributeName="r" values="26;0;0" keyTimes="0;0.8277305604086497;1" dur="1s" repeatCount="indefinite" begin="-0.9729317633977274s"></animate>
+</circle><circle cx="54" cy="150.69697127653222" r="41" fill="#e15b64">
+  <animate attributeName="cy" values="150.69697127653222;-27.168516505190766" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5902016146688314s"></animate>
+  <animate attributeName="r" values="41;0;0" keyTimes="0;0.8175867220161461;1" dur="1s" repeatCount="indefinite" begin="-0.5902016146688314s"></animate>
+</circle><circle cx="50" cy="115.01352405454155" r="7" fill="#e15b64">
+  <animate attributeName="cy" values="115.01352405454155;-4.5076288690789195" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5091907734741129s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.6751846924914742;1" dur="1s" repeatCount="indefinite" begin="-0.5091907734741129s"></animate>
+</circle><circle cx="65" cy="137.6419430633514" r="34" fill="#e15b64">
+  <animate attributeName="cy" values="137.6419430633514;-17.00344965868893" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.34747192063247945s"></animate>
+  <animate attributeName="r" values="34;0;0" keyTimes="0;0.5212737600536792;1" dur="1s" repeatCount="indefinite" begin="-0.34747192063247945s"></animate>
+</circle><circle cx="34" cy="127.0455079544209" r="14" fill="#e15b64">
+  <animate attributeName="cy" values="127.0455079544209;-3.6990759299641454" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4890615261218786s"></animate>
+  <animate attributeName="r" values="14;0;0" keyTimes="0;0.6183470012170013;1" dur="1s" repeatCount="indefinite" begin="-0.4890615261218786s"></animate>
+</circle><circle cx="12" cy="120.43345098845494" r="3" fill="#e15b64">
+  <animate attributeName="cy" values="120.43345098845494;9.74374931913883" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3026505339978601s"></animate>
+  <animate attributeName="r" values="3;0;0" keyTimes="0;0.5414300978949788;1" dur="1s" repeatCount="indefinite" begin="-0.3026505339978601s"></animate>
+</circle><circle cx="49" cy="161.35205628493102" r="43" fill="#e15b64">
+  <animate attributeName="cy" values="161.35205628493102;-37.872089939512506" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.38741962448531564s"></animate>
+  <animate attributeName="r" values="43;0;0" keyTimes="0;0.5096615889177538;1" dur="1s" repeatCount="indefinite" begin="-0.38741962448531564s"></animate>
+</circle><circle cx="54" cy="146.5769009919314" r="44" fill="#e15b64">
+  <animate attributeName="cy" values="146.5769009919314;-38.33530354334875" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.34335748774106034s"></animate>
+  <animate attributeName="r" values="44;0;0" keyTimes="0;0.743420827137904;1" dur="1s" repeatCount="indefinite" begin="-0.34335748774106034s"></animate>
+</circle><circle cx="20" cy="111.24659457696168" r="7" fill="#e15b64">
+  <animate attributeName="cy" values="111.24659457696168;10.851798254886354" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6282307990647713s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.8297799829349941;1" dur="1s" repeatCount="indefinite" begin="-0.6282307990647713s"></animate>
+</circle><circle cx="50" cy="164.0676485495781" r="45" fill="#e15b64">
+  <animate attributeName="cy" values="164.0676485495781;-31.499414285176986" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7760446285439819s"></animate>
+  <animate attributeName="r" values="45;0;0" keyTimes="0;0.5740694195049653;1" dur="1s" repeatCount="indefinite" begin="-0.7760446285439819s"></animate>
+</circle><circle cx="63" cy="121.15583070803987" r="16" fill="#e15b64">
+  <animate attributeName="cy" values="121.15583070803987;-2.1042758907266066" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2305276534763374s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.5205278426126575;1" dur="1s" repeatCount="indefinite" begin="-0.2305276534763374s"></animate>
+</circle><circle cx="70" cy="143.94247592516618" r="29" fill="#e15b64">
+  <animate attributeName="cy" values="143.94247592516618;-23.62297573618442" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5284797120514513s"></animate>
+  <animate attributeName="r" values="29;0;0" keyTimes="0;0.9336811516026573;1" dur="1s" repeatCount="indefinite" begin="-0.5284797120514513s"></animate>
+</circle><circle cx="21" cy="122.79868387744153" r="20" fill="#e15b64">
+  <animate attributeName="cy" values="122.79868387744153;-13.104461771681535" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8845782118773111s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.904216846935756;1" dur="1s" repeatCount="indefinite" begin="-0.8845782118773111s"></animate>
+</circle><circle cx="46" cy="143.70707265719267" r="24" fill="#e15b64">
+  <animate attributeName="cy" values="143.70707265719267;-20.28891701845349" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23245576862802375s"></animate>
+  <animate attributeName="r" values="24;0;0" keyTimes="0;0.6586288079548765;1" dur="1s" repeatCount="indefinite" begin="-0.23245576862802375s"></animate>
+</circle><circle cx="65" cy="140.13731645312657" r="22" fill="#e15b64">
+  <animate attributeName="cy" values="140.13731645312657;-5.338876455584764" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7182419259629308s"></animate>
+  <animate attributeName="r" values="22;0;0" keyTimes="0;0.8813907372203135;1" dur="1s" repeatCount="indefinite" begin="-0.7182419259629308s"></animate>
+</circle><circle cx="37" cy="139.00958710472267" r="35" fill="#e15b64">
+  <animate attributeName="cy" values="139.00958710472267;-25.68265144780311" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7030100698848409s"></animate>
+  <animate attributeName="r" values="35;0;0" keyTimes="0;0.7320613459176248;1" dur="1s" repeatCount="indefinite" begin="-0.7030100698848409s"></animate>
+</circle><circle cx="45" cy="146.6744507961619" r="44" fill="#e15b64">
+  <animate attributeName="cy" values="146.6744507961619;-38.087338695486295" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8319540053556033s"></animate>
+  <animate attributeName="r" values="44;0;0" keyTimes="0;0.5904241586083279;1" dur="1s" repeatCount="indefinite" begin="-0.8319540053556033s"></animate>
+</circle><circle cx="53" cy="116.16529146873187" r="15" fill="#e15b64">
+  <animate attributeName="cy" values="116.16529146873187;-3.17669223153381" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7864341362651808s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.589186107816807;1" dur="1s" repeatCount="indefinite" begin="-0.7864341362651808s"></animate>
+</circle><circle cx="29" cy="141.6902909599232" r="23" fill="#e15b64">
+  <animate attributeName="cy" values="141.6902909599232;-16.250272669063218" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.18084365714200346s"></animate>
+  <animate attributeName="r" values="23;0;0" keyTimes="0;0.8116571311237253;1" dur="1s" repeatCount="indefinite" begin="-0.18084365714200346s"></animate>
+</circle><circle cx="65" cy="143.73302386926983" r="32" fill="#e15b64">
+  <animate attributeName="cy" values="143.73302386926983;-24.229369251904558" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5786484558188305s"></animate>
+  <animate attributeName="r" values="32;0;0" keyTimes="0;0.8515606125902615;1" dur="1s" repeatCount="indefinite" begin="-0.5786484558188305s"></animate>
+</circle><circle cx="39" cy="143.3951504366216" r="33" fill="#e15b64">
+  <animate attributeName="cy" values="143.3951504366216;-27.75171362166084" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.1481578769905092s"></animate>
+  <animate attributeName="r" values="33;0;0" keyTimes="0;0.797255218191478;1" dur="1s" repeatCount="indefinite" begin="-0.1481578769905092s"></animate>
+</circle><circle cx="59" cy="129.28605384114482" r="27" fill="#e15b64">
+  <animate attributeName="cy" values="129.28605384114482;-12.095864862844131" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23581997562886903s"></animate>
+  <animate attributeName="r" values="27;0;0" keyTimes="0;0.8271538616610963;1" dur="1s" repeatCount="indefinite" begin="-0.23581997562886903s"></animate>
+</circle><circle cx="70" cy="144.09835508207823" r="28" fill="#e15b64">
+  <animate attributeName="cy" values="144.09835508207823;-13.162793363728145" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23606519556482253s"></animate>
+  <animate attributeName="r" values="28;0;0" keyTimes="0;0.73085815703799;1" dur="1s" repeatCount="indefinite" begin="-0.23606519556482253s"></animate>
+</circle><circle cx="48" cy="145.01565757702042" r="44" fill="#e15b64">
+  <animate attributeName="cy" values="145.01565757702042;-32.30510020024561" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8615348704203486s"></animate>
+  <animate attributeName="r" values="44;0;0" keyTimes="0;0.9694373671371078;1" dur="1s" repeatCount="indefinite" begin="-0.8615348704203486s"></animate>
+</circle><circle cx="95" cy="113.78554320990165" r="4" fill="#e15b64">
+  <animate attributeName="cy" values="113.78554320990165;-1.2652564238335904" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21370544900580335s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.5334621383741172;1" dur="1s" repeatCount="indefinite" begin="-0.21370544900580335s"></animate>
+</circle><circle cx="57" cy="136.06708935936715" r="34" fill="#e15b64">
+  <animate attributeName="cy" values="136.06708935936715;-19.758990054858902" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7755376997281404s"></animate>
+  <animate attributeName="r" values="34;0;0" keyTimes="0;0.9943252777203475;1" dur="1s" repeatCount="indefinite" begin="-0.7755376997281404s"></animate>
+</circle><circle cx="72" cy="123.8422572942333" r="19" fill="#e15b64">
+  <animate attributeName="cy" values="123.8422572942333;-1.0000700639794928" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9670461872772004s"></animate>
+  <animate attributeName="r" values="19;0;0" keyTimes="0;0.7801926792335607;1" dur="1s" repeatCount="indefinite" begin="-0.9670461872772004s"></animate>
+</circle></g><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="27" cy="136.75172282051147" r="17" fill="#f47e60">
+  <animate attributeName="cy" values="136.75172282051147;-5.48853662281188" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4403846891955857s"></animate>
+  <animate attributeName="r" values="17;0;0" keyTimes="0;0.7894732341719188;1" dur="1s" repeatCount="indefinite" begin="-0.4403846891955857s"></animate>
+</circle><circle cx="34" cy="132.08290473906044" r="28" fill="#f47e60">
+  <animate attributeName="cy" values="132.08290473906044;-16.339029232048958" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7882134883361418s"></animate>
+  <animate attributeName="r" values="28;0;0" keyTimes="0;0.5035175026787356;1" dur="1s" repeatCount="indefinite" begin="-0.7882134883361418s"></animate>
+</circle><circle cx="66" cy="127.45606892584162" r="23" fill="#f47e60">
+  <animate attributeName="cy" values="127.45606892584162;-11.56763185745981" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23537267190332678s"></animate>
+  <animate attributeName="r" values="23;0;0" keyTimes="0;0.7818578332234903;1" dur="1s" repeatCount="indefinite" begin="-0.23537267190332678s"></animate>
+</circle><circle cx="29" cy="124.28337961013858" r="15" fill="#f47e60">
+  <animate attributeName="cy" values="124.28337961013858;0.8461921465181206" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.30918442080681285s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.9741475377259025;1" dur="1s" repeatCount="indefinite" begin="-0.30918442080681285s"></animate>
+</circle><circle cx="61" cy="147.91603256008383" r="31" fill="#f47e60">
+  <animate attributeName="cy" values="147.91603256008383;-14.754981670358578" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0033816756583812113s"></animate>
+  <animate attributeName="r" values="31;0;0" keyTimes="0;0.6463193577485268;1" dur="1s" repeatCount="indefinite" begin="-0.0033816756583812113s"></animate>
+</circle><circle cx="25" cy="120.64483537229628" r="9" fill="#f47e60">
+  <animate attributeName="cy" values="120.64483537229628;-7.193123212298179" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6891092543031828s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.8637808572418493;1" dur="1s" repeatCount="indefinite" begin="-0.6891092543031828s"></animate>
+</circle><circle cx="12" cy="121.18727231753691" r="4" fill="#f47e60">
+  <animate attributeName="cy" values="121.18727231753691;15.883181236637633" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.24454851002004097s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.8215012014926046;1" dur="1s" repeatCount="indefinite" begin="-0.24454851002004097s"></animate>
+</circle><circle cx="58" cy="136.64954415018815" r="19" fill="#f47e60">
+  <animate attributeName="cy" values="136.64954415018815;-13.637628862199563" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7672442553828805s"></animate>
+  <animate attributeName="r" values="19;0;0" keyTimes="0;0.7534841891330046;1" dur="1s" repeatCount="indefinite" begin="-0.7672442553828805s"></animate>
+</circle><circle cx="69" cy="120.72538023727738" r="10" fill="#f47e60">
+  <animate attributeName="cy" values="120.72538023727738;-5.651458016294906" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6587915764098667s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.5977129956186352;1" dur="1s" repeatCount="indefinite" begin="-0.6587915764098667s"></animate>
+</circle><circle cx="46" cy="122.63158963579554" r="20" fill="#f47e60">
+  <animate attributeName="cy" values="122.63158963579554;-8.99196405151625" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3698350873089088s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.5563937567659611;1" dur="1s" repeatCount="indefinite" begin="-0.3698350873089088s"></animate>
+</circle><circle cx="7" cy="121.15700947168602" r="2" fill="#f47e60">
+  <animate attributeName="cy" values="121.15700947168602;0.605011189845321" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.514133243834255s"></animate>
+  <animate attributeName="r" values="2;0;0" keyTimes="0;0.7510335363256938;1" dur="1s" repeatCount="indefinite" begin="-0.514133243834255s"></animate>
+</circle><circle cx="19" cy="117.69071117783832" r="7" fill="#f47e60">
+  <animate attributeName="cy" values="117.69071117783832;-2.4512162536532234" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4163222368875168s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.9697983093212361;1" dur="1s" repeatCount="indefinite" begin="-0.4163222368875168s"></animate>
+</circle><circle cx="34" cy="122.22172344680293" r="22" fill="#f47e60">
+  <animate attributeName="cy" values="122.22172344680293;-14.875000336072436" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8346904488502503s"></animate>
+  <animate attributeName="r" values="22;0;0" keyTimes="0;0.9284864899458874;1" dur="1s" repeatCount="indefinite" begin="-0.8346904488502503s"></animate>
+</circle><circle cx="48" cy="118.34245443793573" r="12" fill="#f47e60">
+  <animate attributeName="cy" values="118.34245443793573;6.1569446890589035" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7372012265846987s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.9146509122657862;1" dur="1s" repeatCount="indefinite" begin="-0.7372012265846987s"></animate>
+</circle><circle cx="38" cy="108.37260349538107" r="4" fill="#f47e60">
+  <animate attributeName="cy" values="108.37260349538107;-3.9166184571860483" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6955752887050161s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.9793871272170744;1" dur="1s" repeatCount="indefinite" begin="-0.6955752887050161s"></animate>
+</circle><circle cx="50" cy="120.05611377372627" r="20" fill="#f47e60">
+  <animate attributeName="cy" values="120.05611377372627;-19.59128463520709" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8198691615147322s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.6017320767396992;1" dur="1s" repeatCount="indefinite" begin="-0.8198691615147322s"></animate>
+</circle><circle cx="69" cy="133.11553485199934" r="21" fill="#f47e60">
+  <animate attributeName="cy" values="133.11553485199934;-7.230262198733577" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6502042470386947s"></animate>
+  <animate attributeName="r" values="21;0;0" keyTimes="0;0.9802383350633911;1" dur="1s" repeatCount="indefinite" begin="-0.6502042470386947s"></animate>
+</circle><circle cx="60" cy="138.10205797824347" r="31" fill="#f47e60">
+  <animate attributeName="cy" values="138.10205797824347;-21.149182634283513" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8527464543018912s"></animate>
+  <animate attributeName="r" values="31;0;0" keyTimes="0;0.5593223005306734;1" dur="1s" repeatCount="indefinite" begin="-0.8527464543018912s"></animate>
+</circle><circle cx="72" cy="121.45841247692351" r="16" fill="#f47e60">
+  <animate attributeName="cy" values="121.45841247692351;-5.0851516529984195" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4077549975882817s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.5763111141098053;1" dur="1s" repeatCount="indefinite" begin="-0.4077549975882817s"></animate>
+</circle><circle cx="56" cy="118.12349945951125" r="10" fill="#f47e60">
+  <animate attributeName="cy" values="118.12349945951125;-7.082779421666896" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21747152423150562s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.6868094744383062;1" dur="1s" repeatCount="indefinite" begin="-0.21747152423150562s"></animate>
+</circle><circle cx="77" cy="119.41951761904794" r="17" fill="#f47e60">
+  <animate attributeName="cy" values="119.41951761904794;-9.114276721599797" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.48345793287516814s"></animate>
+  <animate attributeName="r" values="17;0;0" keyTimes="0;0.5135663211192452;1" dur="1s" repeatCount="indefinite" begin="-0.48345793287516814s"></animate>
+</circle><circle cx="78" cy="125.60192795392818" r="11" fill="#f47e60">
+  <animate attributeName="cy" values="125.60192795392818;-6.73068982191926" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.23667812050200931s"></animate>
+  <animate attributeName="r" values="11;0;0" keyTimes="0;0.9898092475181265;1" dur="1s" repeatCount="indefinite" begin="-0.23667812050200931s"></animate>
+</circle><circle cx="51" cy="138.224179154187" r="24" fill="#f47e60">
+  <animate attributeName="cy" values="138.224179154187;-8.55653503677315" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5735700676741093s"></animate>
+  <animate attributeName="r" values="24;0;0" keyTimes="0;0.9566960986989479;1" dur="1s" repeatCount="indefinite" begin="-0.5735700676741093s"></animate>
+</circle><circle cx="41" cy="131.14944604607328" r="21" fill="#f47e60">
+  <animate attributeName="cy" values="131.14944604607328;-17.847508222350655" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.07696580759865079s"></animate>
+  <animate attributeName="r" values="21;0;0" keyTimes="0;0.6865631531399743;1" dur="1s" repeatCount="indefinite" begin="-0.07696580759865079s"></animate>
+</circle><circle cx="49" cy="128.787268826053" r="17" fill="#f47e60">
+  <animate attributeName="cy" values="128.787268826053;1.143259231969072" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7890428937034474s"></animate>
+  <animate attributeName="r" values="17;0;0" keyTimes="0;0.5926722445396657;1" dur="1s" repeatCount="indefinite" begin="-0.7890428937034474s"></animate>
+</circle><circle cx="17" cy="120.22416295842616" r="13" fill="#f47e60">
+  <animate attributeName="cy" values="120.22416295842616;5.932998615440596" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.25642472915187764s"></animate>
+  <animate attributeName="r" values="13;0;0" keyTimes="0;0.5738477034101163;1" dur="1s" repeatCount="indefinite" begin="-0.25642472915187764s"></animate>
+</circle><circle cx="73" cy="127.02191586426626" r="24" fill="#f47e60">
+  <animate attributeName="cy" values="127.02191586426626;-19.34982189589097" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9257599774553938s"></animate>
+  <animate attributeName="r" values="24;0;0" keyTimes="0;0.6060248140675957;1" dur="1s" repeatCount="indefinite" begin="-0.9257599774553938s"></animate>
+</circle><circle cx="29" cy="122.37303701766326" r="22" fill="#f47e60">
+  <animate attributeName="cy" values="122.37303701766326;-17.181874655618834" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11979523584713825s"></animate>
+  <animate attributeName="r" values="22;0;0" keyTimes="0;0.5778892301319281;1" dur="1s" repeatCount="indefinite" begin="-0.11979523584713825s"></animate>
+</circle><circle cx="30" cy="132.91741320840808" r="18" fill="#f47e60">
+  <animate attributeName="cy" values="132.91741320840808;0.24294121648419775" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6890213202603488s"></animate>
+  <animate attributeName="r" values="18;0;0" keyTimes="0;0.8587373770805918;1" dur="1s" repeatCount="indefinite" begin="-0.6890213202603488s"></animate>
+</circle><circle cx="80" cy="116.72839679840811" r="14" fill="#f47e60">
+  <animate attributeName="cy" values="116.72839679840811;4.82183707831593" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08182847032405782s"></animate>
+  <animate attributeName="r" values="14;0;0" keyTimes="0;0.6809633164153448;1" dur="1s" repeatCount="indefinite" begin="-0.08182847032405782s"></animate>
+</circle><circle cx="31" cy="125.20247260666616" r="13" fill="#f47e60">
+  <animate attributeName="cy" values="125.20247260666616;2.008326413572634" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8369662812852767s"></animate>
+  <animate attributeName="r" values="13;0;0" keyTimes="0;0.5845779670186058;1" dur="1s" repeatCount="indefinite" begin="-0.8369662812852767s"></animate>
+</circle><circle cx="60" cy="125.0794549947879" r="16" fill="#f47e60">
+  <animate attributeName="cy" values="125.0794549947879;0.7338248372355807" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8948237868324189s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.9120596722058173;1" dur="1s" repeatCount="indefinite" begin="-0.8948237868324189s"></animate>
+</circle><circle cx="25" cy="126.90612837175388" r="8" fill="#f47e60">
+  <animate attributeName="cy" values="126.90612837175388;4.0472618983783715" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.39581604043317986s"></animate>
+  <animate attributeName="r" values="8;0;0" keyTimes="0;0.8074064845720312;1" dur="1s" repeatCount="indefinite" begin="-0.39581604043317986s"></animate>
+</circle><circle cx="37" cy="131.42028038990128" r="25" fill="#f47e60">
+  <animate attributeName="cy" values="131.42028038990128;-22.403977227715075" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.04301794169924622s"></animate>
+  <animate attributeName="r" values="25;0;0" keyTimes="0;0.524891315929541;1" dur="1s" repeatCount="indefinite" begin="-0.04301794169924622s"></animate>
+</circle><circle cx="41" cy="149.05000141391616" r="31" fill="#f47e60">
+  <animate attributeName="cy" values="149.05000141391616;-19.10046896539864" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7213401886638007s"></animate>
+  <animate attributeName="r" values="31;0;0" keyTimes="0;0.6890520162965066;1" dur="1s" repeatCount="indefinite" begin="-0.7213401886638007s"></animate>
+</circle><circle cx="36" cy="138.58798523568342" r="27" fill="#f47e60">
+  <animate attributeName="cy" values="138.58798523568342;-15.572058043829461" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40556498158772736s"></animate>
+  <animate attributeName="r" values="27;0;0" keyTimes="0;0.8506348676044777;1" dur="1s" repeatCount="indefinite" begin="-0.40556498158772736s"></animate>
+</circle><circle cx="78" cy="137.9707233461312" r="20" fill="#f47e60">
+  <animate attributeName="cy" values="137.9707233461312;-3.6945948738885512" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8880631706610672s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.9304971995517395;1" dur="1s" repeatCount="indefinite" begin="-0.8880631706610672s"></animate>
+</circle><circle cx="79" cy="134.71673525431498" r="18" fill="#f47e60">
+  <animate attributeName="cy" values="134.71673525431498;-10.261412982322742" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.2848983056723242s"></animate>
+  <animate attributeName="r" values="18;0;0" keyTimes="0;0.7526875949615255;1" dur="1s" repeatCount="indefinite" begin="-0.2848983056723242s"></animate>
+</circle><circle cx="82" cy="111.49802891873294" r="5" fill="#f47e60">
+  <animate attributeName="cy" values="111.49802891873294;12.140748225430922" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40945179236345397s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.703997116139137;1" dur="1s" repeatCount="indefinite" begin="-0.40945179236345397s"></animate>
+</circle><circle cx="68" cy="140.96466884045572" r="22" fill="#f47e60">
+  <animate attributeName="cy" values="140.96466884045572;-4.079142984351218" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.40439383112303107s"></animate>
+  <animate attributeName="r" values="22;0;0" keyTimes="0;0.5493704483007363;1" dur="1s" repeatCount="indefinite" begin="-0.40439383112303107s"></animate>
+</circle><circle cx="41" cy="116.24169615516264" r="16" fill="#f47e60">
+  <animate attributeName="cy" values="116.24169615516264;-13.644720096932094" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22449184929827926s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.6587866247823291;1" dur="1s" repeatCount="indefinite" begin="-0.22449184929827926s"></animate>
+</circle><circle cx="20" cy="124.66929057881916" r="15" fill="#f47e60">
+  <animate attributeName="cy" values="124.66929057881916;2.5505611618972814" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.017560126563357925s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.6128429739262174;1" dur="1s" repeatCount="indefinite" begin="-0.017560126563357925s"></animate>
+</circle><circle cx="63" cy="126.5115900704738" r="26" fill="#f47e60">
+  <animate attributeName="cy" values="126.5115900704738;-20.921901271813873" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5285257319858678s"></animate>
+  <animate attributeName="r" values="26;0;0" keyTimes="0;0.9007468611639214;1" dur="1s" repeatCount="indefinite" begin="-0.5285257319858678s"></animate>
+</circle><circle cx="90" cy="111.61440083571019" r="6" fill="#f47e60">
+  <animate attributeName="cy" values="111.61440083571019;11.61930520437923" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8167452043810126s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.9810779841180124;1" dur="1s" repeatCount="indefinite" begin="-0.8167452043810126s"></animate>
+</circle><circle cx="78" cy="122.50775060552778" r="20" fill="#f47e60">
+  <animate attributeName="cy" values="122.50775060552778;-4.59807973956865" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.11755589684814727s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.6705237343698631;1" dur="1s" repeatCount="indefinite" begin="-0.11755589684814727s"></animate>
+</circle><circle cx="31" cy="127.90703241028092" r="9" fill="#f47e60">
+  <animate attributeName="cy" values="127.90703241028092;0.829718008041219" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5851309189776632s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.6889560303799027;1" dur="1s" repeatCount="indefinite" begin="-0.5851309189776632s"></animate>
+</circle><circle cx="65" cy="117.43435709704966" r="4" fill="#f47e60">
+  <animate attributeName="cy" values="117.43435709704966;15.28596080488979" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8492165554334472s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.5287459347086204;1" dur="1s" repeatCount="indefinite" begin="-0.8492165554334472s"></animate>
+</circle><circle cx="89" cy="122.93132420091489" r="3" fill="#f47e60">
+  <animate attributeName="cy" values="122.93132420091489;5.980513428860888" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.06884209677796871s"></animate>
+  <animate attributeName="r" values="3;0;0" keyTimes="0;0.5868616814040618;1" dur="1s" repeatCount="indefinite" begin="-0.06884209677796871s"></animate>
+</circle><circle cx="68" cy="129.1441504106191" r="26" fill="#f47e60">
+  <animate attributeName="cy" values="129.1441504106191;-22.781245889673905" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.26191875209122073s"></animate>
+  <animate attributeName="r" values="26;0;0" keyTimes="0;0.6200648439404779;1" dur="1s" repeatCount="indefinite" begin="-0.26191875209122073s"></animate>
+</circle><circle cx="22" cy="130.63745849588264" r="20" fill="#f47e60">
+  <animate attributeName="cy" values="130.63745849588264;-10.695329441338862" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6192951915425052s"></animate>
+  <animate attributeName="r" values="20;0;0" keyTimes="0;0.6969346125529845;1" dur="1s" repeatCount="indefinite" begin="-0.6192951915425052s"></animate>
+</circle></g><g filter="url(#ldio-ekpf7uvh2aq-filter)"><circle cx="57" cy="123.68953191890479" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="123.68953191890479;4.854991577389438" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9097135632734302s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.9463910575266388;1" dur="1s" repeatCount="indefinite" begin="-0.9097135632734302s"></animate>
+</circle><circle cx="24" cy="124.54645838615471" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="124.54645838615471;-11.813810322332547" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.007050694143823311s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.7078891674964196;1" dur="1s" repeatCount="indefinite" begin="-0.007050694143823311s"></animate>
+</circle><circle cx="54" cy="110.08044357995595" r="3" fill="#f8b26a">
+  <animate attributeName="cy" values="110.08044357995595;13.402947007936334" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.994432759852213s"></animate>
+  <animate attributeName="r" values="3;0;0" keyTimes="0;0.8430605754104277;1" dur="1s" repeatCount="indefinite" begin="-0.994432759852213s"></animate>
+</circle><circle cx="49" cy="127.80477114160061" r="16" fill="#f8b26a">
+  <animate attributeName="cy" values="127.80477114160061;2.7658256519770603" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.07188593356616135s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.6049768163612267;1" dur="1s" repeatCount="indefinite" begin="-0.07188593356616135s"></animate>
+</circle><circle cx="52" cy="112.09746694041411" r="10" fill="#f8b26a">
+  <animate attributeName="cy" values="112.09746694041411;-2.8104821907767574" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4132445270517203s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.7843188648425736;1" dur="1s" repeatCount="indefinite" begin="-0.4132445270517203s"></animate>
+</circle><circle cx="68" cy="119.76797510227266" r="15" fill="#f8b26a">
+  <animate attributeName="cy" values="119.76797510227266;-2.3187957684067317" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6317748306797277s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.8464277838946668;1" dur="1s" repeatCount="indefinite" begin="-0.6317748306797277s"></animate>
+</circle><circle cx="17" cy="121.7997527406382" r="5" fill="#f8b26a">
+  <animate attributeName="cy" values="121.7997527406382;13.556957891026624" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9136732084136533s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.5349721785314134;1" dur="1s" repeatCount="indefinite" begin="-0.9136732084136533s"></animate>
+</circle><circle cx="59" cy="116.30296558149124" r="4" fill="#f8b26a">
+  <animate attributeName="cy" values="116.30296558149124;-1.0433564145924477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.08891813207741484s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.6574981312374213;1" dur="1s" repeatCount="indefinite" begin="-0.08891813207741484s"></animate>
+</circle><circle cx="88" cy="113.1583378513422" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="113.1583378513422;1.456869512308952" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.14992898603700067s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.9565108058771807;1" dur="1s" repeatCount="indefinite" begin="-0.14992898603700067s"></animate>
+</circle><circle cx="84" cy="112.41279273844411" r="10" fill="#f8b26a">
+  <animate attributeName="cy" values="112.41279273844411;1.6491176590177243" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5833010262862421s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.5438806242531744;1" dur="1s" repeatCount="indefinite" begin="-0.5833010262862421s"></animate>
+</circle><circle cx="87" cy="120.26530337145327" r="5" fill="#f8b26a">
+  <animate attributeName="cy" values="120.26530337145327;9.388664939149207" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.05018189342538548s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.637897648645736;1" dur="1s" repeatCount="indefinite" begin="-0.05018189342538548s"></animate>
+</circle><circle cx="24" cy="123.99448894779877" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="123.99448894779877;2.3750067806866078" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8890495329191316s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.663064102718458;1" dur="1s" repeatCount="indefinite" begin="-0.8890495329191316s"></animate>
+</circle><circle cx="73" cy="120.00019528994846" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="120.00019528994846;-9.503507375076166" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6351313241419324s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.9354194941922095;1" dur="1s" repeatCount="indefinite" begin="-0.6351313241419324s"></animate>
+</circle><circle cx="74" cy="113.88820186698781" r="4" fill="#f8b26a">
+  <animate attributeName="cy" values="113.88820186698781;10.570535200732685" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7132998998028989s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.91895021859856;1" dur="1s" repeatCount="indefinite" begin="-0.7132998998028989s"></animate>
+</circle><circle cx="68" cy="129.5841522641359" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="129.5841522641359;3.894919008898638" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.29330391921510546s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.9096568793749455;1" dur="1s" repeatCount="indefinite" begin="-0.29330391921510546s"></animate>
+</circle><circle cx="53" cy="119.31720358172306" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="119.31720358172306;9.73624644875764" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9958245939061628s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.8571965277158554;1" dur="1s" repeatCount="indefinite" begin="-0.9958245939061628s"></animate>
+</circle><circle cx="76" cy="134.80739606982607" r="17" fill="#f8b26a">
+  <animate attributeName="cy" values="134.80739606982607;0.3932385595869441" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8607153243461125s"></animate>
+  <animate attributeName="r" values="17;0;0" keyTimes="0;0.8654455107706405;1" dur="1s" repeatCount="indefinite" begin="-0.8607153243461125s"></animate>
+</circle><circle cx="75" cy="122.61568996754474" r="7" fill="#f8b26a">
+  <animate attributeName="cy" values="122.61568996754474;10.652526875734779" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.959721298983397s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.6271803990132601;1" dur="1s" repeatCount="indefinite" begin="-0.959721298983397s"></animate>
+</circle><circle cx="87" cy="115.0788054109218" r="12" fill="#f8b26a">
+  <animate attributeName="cy" values="115.0788054109218;-8.15567938666852" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0690058777440068s"></animate>
+  <animate attributeName="r" values="12;0;0" keyTimes="0;0.6627211388649489;1" dur="1s" repeatCount="indefinite" begin="-0.0690058777440068s"></animate>
+</circle><circle cx="21" cy="118.08738171978098" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="118.08738171978098;-4.9475469075625504" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.7078831683260647s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.9501044367725069;1" dur="1s" repeatCount="indefinite" begin="-0.7078831683260647s"></animate>
+</circle><circle cx="24" cy="128.09150085659442" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="128.09150085659442;2.7320353690265122" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.521121701341132s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.7357531229285373;1" dur="1s" repeatCount="indefinite" begin="-0.521121701341132s"></animate>
+</circle><circle cx="26" cy="127.49368345428452" r="15" fill="#f8b26a">
+  <animate attributeName="cy" values="127.49368345428452;-10.361246269666196" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9420307783603239s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.7467409545014994;1" dur="1s" repeatCount="indefinite" begin="-0.9420307783603239s"></animate>
+</circle><circle cx="39" cy="114.20744515306558" r="6" fill="#f8b26a">
+  <animate attributeName="cy" values="114.20744515306558;5.606516894440285" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.49268347147689695s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.5874854761603912;1" dur="1s" repeatCount="indefinite" begin="-0.49268347147689695s"></animate>
+</circle><circle cx="61" cy="123.10463246179438" r="11" fill="#f8b26a">
+  <animate attributeName="cy" values="123.10463246179438;-5.189366828773049" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.21359109324800063s"></animate>
+  <animate attributeName="r" values="11;0;0" keyTimes="0;0.6970744691674484;1" dur="1s" repeatCount="indefinite" begin="-0.21359109324800063s"></animate>
+</circle><circle cx="37" cy="115.40335155247101" r="10" fill="#f8b26a">
+  <animate attributeName="cy" values="115.40335155247101;3.4285850566842946" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5344545499798534s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.9983685792824288;1" dur="1s" repeatCount="indefinite" begin="-0.5344545499798534s"></animate>
+</circle><circle cx="22" cy="124.59228223795324" r="7" fill="#f8b26a">
+  <animate attributeName="cy" values="124.59228223795324;-3.5076355130396912" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8102510016775601s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.6369981578428732;1" dur="1s" repeatCount="indefinite" begin="-0.8102510016775601s"></animate>
+</circle><circle cx="34" cy="111.69621652751701" r="5" fill="#f8b26a">
+  <animate attributeName="cy" values="111.69621652751701;13.965538669421832" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.3819120829819431s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.9240036927970401;1" dur="1s" repeatCount="indefinite" begin="-0.3819120829819431s"></animate>
+</circle><circle cx="61" cy="121.99207528226256" r="6" fill="#f8b26a">
+  <animate attributeName="cy" values="121.99207528226256;-1.1884130816048284" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.351012424136126s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.9527855705617168;1" dur="1s" repeatCount="indefinite" begin="-0.351012424136126s"></animate>
+</circle><circle cx="32" cy="115.36386365084275" r="13" fill="#f8b26a">
+  <animate attributeName="cy" values="115.36386365084275;-7.635796261623495" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22026693987990997s"></animate>
+  <animate attributeName="r" values="13;0;0" keyTimes="0;0.6822821982216503;1" dur="1s" repeatCount="indefinite" begin="-0.22026693987990997s"></animate>
+</circle><circle cx="38" cy="123.93260454500944" r="10" fill="#f8b26a">
+  <animate attributeName="cy" values="123.93260454500944;-9.019646946232784" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5897767052001425s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.747643174639248;1" dur="1s" repeatCount="indefinite" begin="-0.5897767052001425s"></animate>
+</circle><circle cx="91" cy="111.20360670124936" r="4" fill="#f8b26a">
+  <animate attributeName="cy" values="111.20360670124936;-2.7511383786778185" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5936715943771124s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.5292863982274825;1" dur="1s" repeatCount="indefinite" begin="-0.5936715943771124s"></animate>
+</circle><circle cx="93" cy="109.08688866758263" r="6" fill="#f8b26a">
+  <animate attributeName="cy" values="109.08688866758263;13.986514639855155" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.20182465253134418s"></animate>
+  <animate attributeName="r" values="6;0;0" keyTimes="0;0.9578727930035874;1" dur="1s" repeatCount="indefinite" begin="-0.20182465253134418s"></animate>
+</circle><circle cx="90" cy="115.44258946143852" r="3" fill="#f8b26a">
+  <animate attributeName="cy" values="115.44258946143852;7.971557449807172" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8138344996352406s"></animate>
+  <animate attributeName="r" values="3;0;0" keyTimes="0;0.822677504532275;1" dur="1s" repeatCount="indefinite" begin="-0.8138344996352406s"></animate>
+</circle><circle cx="24" cy="130.98782632438636" r="15" fill="#f8b26a">
+  <animate attributeName="cy" values="130.98782632438636;-11.868426017755008" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.8574009914089539s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.8610318085552064;1" dur="1s" repeatCount="indefinite" begin="-0.8574009914089539s"></animate>
+</circle><circle cx="49" cy="122.24309971563434" r="14" fill="#f8b26a">
+  <animate attributeName="cy" values="122.24309971563434;3.5685994935617273" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4267384904796552s"></animate>
+  <animate attributeName="r" values="14;0;0" keyTimes="0;0.5503829186981541;1" dur="1s" repeatCount="indefinite" begin="-0.4267384904796552s"></animate>
+</circle><circle cx="18" cy="117.38217971971676" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="117.38217971971676;6.631006164776416" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6828218424869835s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.6808177575913787;1" dur="1s" repeatCount="indefinite" begin="-0.6828218424869835s"></animate>
+</circle><circle cx="78" cy="124.28678852303256" r="15" fill="#f8b26a">
+  <animate attributeName="cy" values="124.28678852303256;1.3740946843405304" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4161035078940827s"></animate>
+  <animate attributeName="r" values="15;0;0" keyTimes="0;0.6388001474427218;1" dur="1s" repeatCount="indefinite" begin="-0.4161035078940827s"></animate>
+</circle><circle cx="44" cy="106.6189204965897" r="3" fill="#f8b26a">
+  <animate attributeName="cy" values="106.6189204965897;16.750815514807034" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.0510803765953457s"></animate>
+  <animate attributeName="r" values="3;0;0" keyTimes="0;0.7907276882734477;1" dur="1s" repeatCount="indefinite" begin="-0.0510803765953457s"></animate>
+</circle><circle cx="41" cy="119.64799537397232" r="5" fill="#f8b26a">
+  <animate attributeName="cy" values="119.64799537397232;6.398667601394809" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.4280945050279754s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.5751942250658201;1" dur="1s" repeatCount="indefinite" begin="-0.4280945050279754s"></animate>
+</circle><circle cx="19" cy="120.0916729802829" r="10" fill="#f8b26a">
+  <animate attributeName="cy" values="120.0916729802829;-9.513704965243033" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.043405970368113445s"></animate>
+  <animate attributeName="r" values="10;0;0" keyTimes="0;0.5435267537060107;1" dur="1s" repeatCount="indefinite" begin="-0.043405970368113445s"></animate>
+</circle><circle cx="61" cy="123.62714133794762" r="5" fill="#f8b26a">
+  <animate attributeName="cy" values="123.62714133794762;2.362315551662477" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.5256540407430482s"></animate>
+  <animate attributeName="r" values="5;0;0" keyTimes="0;0.9222037100732456;1" dur="1s" repeatCount="indefinite" begin="-0.5256540407430482s"></animate>
+</circle><circle cx="64" cy="115.25525614926073" r="13" fill="#f8b26a">
+  <animate attributeName="cy" values="115.25525614926073;-10.304511881341815" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6633519944592159s"></animate>
+  <animate attributeName="r" values="13;0;0" keyTimes="0;0.5401283508859178;1" dur="1s" repeatCount="indefinite" begin="-0.6633519944592159s"></animate>
+</circle><circle cx="12" cy="129.13660549492693" r="11" fill="#f8b26a">
+  <animate attributeName="cy" values="129.13660549492693;-7.965594883525825" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.9929282227674491s"></animate>
+  <animate attributeName="r" values="11;0;0" keyTimes="0;0.9536114994321867;1" dur="1s" repeatCount="indefinite" begin="-0.9929282227674491s"></animate>
+</circle><circle cx="39" cy="106.95504126040025" r="2" fill="#f8b26a">
+  <animate attributeName="cy" values="106.95504126040025;5.834416891524681" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.22005892301327157s"></animate>
+  <animate attributeName="r" values="2;0;0" keyTimes="0;0.6089960643653531;1" dur="1s" repeatCount="indefinite" begin="-0.22005892301327157s"></animate>
+</circle><circle cx="30" cy="112.12744151244388" r="8" fill="#f8b26a">
+  <animate attributeName="cy" values="112.12744151244388;-4.465606537168944" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.24710322548242414s"></animate>
+  <animate attributeName="r" values="8;0;0" keyTimes="0;0.7479705418636007;1" dur="1s" repeatCount="indefinite" begin="-0.24710322548242414s"></animate>
+</circle><circle cx="67" cy="124.83294711941956" r="16" fill="#f8b26a">
+  <animate attributeName="cy" values="124.83294711941956;-7.6291463245052284" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.614066023590482s"></animate>
+  <animate attributeName="r" values="16;0;0" keyTimes="0;0.7584434636145084;1" dur="1s" repeatCount="indefinite" begin="-0.614066023590482s"></animate>
+</circle><circle cx="22" cy="119.36463088979876" r="4" fill="#f8b26a">
+  <animate attributeName="cy" values="119.36463088979876;12.12664234343379" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.527385385953813s"></animate>
+  <animate attributeName="r" values="4;0;0" keyTimes="0;0.5661680148267347;1" dur="1s" repeatCount="indefinite" begin="-0.527385385953813s"></animate>
+</circle><circle cx="12" cy="122.52124979151506" r="7" fill="#f8b26a">
+  <animate attributeName="cy" values="122.52124979151506;3.7506712743784085" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.37225883133903837s"></animate>
+  <animate attributeName="r" values="7;0;0" keyTimes="0;0.9003327357718601;1" dur="1s" repeatCount="indefinite" begin="-0.37225883133903837s"></animate>
+</circle><circle cx="69" cy="130.5210986475815" r="14" fill="#f8b26a">
+  <animate attributeName="cy" values="130.5210986475815;-0.30973651460238827" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6062299863585278s"></animate>
+  <animate attributeName="r" values="14;0;0" keyTimes="0;0.9220180768904789;1" dur="1s" repeatCount="indefinite" begin="-0.6062299863585278s"></animate>
+</circle><circle cx="20" cy="114.80243604193255" r="9" fill="#f8b26a">
+  <animate attributeName="cy" values="114.80243604193255;7.19374553530416" keyTimes="0;1" dur="1s" repeatCount="indefinite" begin="-0.6866227460985781s"></animate>
+  <animate attributeName="r" values="9;0;0" keyTimes="0;0.6690048284116141;1" dur="1s" repeatCount="indefinite" begin="-0.6866227460985781s"></animate>
+</circle></g>
+</svg>
--- a/assets/scss.png
+++ b/assets/scss.png
--- a/assets/test.pdf
+++ b/assets/test.pdf
--- a/assets/web_demo.gif
+++ b/assets/web_demo.gif
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,7 @@
-transformers
+transformers==4.45.2
+sentence-transformers==3.1.1
 datasets
 evaluate
-streamlit
 opencv-python
 ray[serve]
 accelerate
@@ -9,5 +9,11 @@ tensorboardX
 nltk
 python-multipart

-pdf2image
 augraphy
+
+streamlit==1.30
+streamlit-paste-button
+
+shapely
+pyclipper
+onnxruntime-gpu
--- a/setup.py
+++ b/setup.py
@@ -0,0 +1,42 @@
+from setuptools import setup, find_packages
+
+
+# Define the base dependencies
+install_requires = [
+    "torch",
+    "torchvision",
+    "transformers",
+    "datasets",
+    "evaluate",
+    "opencv-python",
+    "ray[serve]",
+    "accelerate",
+    "tensorboardX",
+    "nltk",
+    "python-multipart",
+    "augraphy",
+    "streamlit==1.30",
+    "streamlit-paste-button",
+    "shapely",
+    "pyclipper",
+
+    "optimum[exporters]",
+]
+
+setup(
+    name="texteller",
+    version="0.1.2",
+    author="OleehyO",
+    author_email="1258009915@qq.com",
+    description="A meta-package for installing dependencies",
+    long_description=open('README.md').read(),
+    long_description_content_type="text/markdown",
+    url="https://github.com/OleehyO/TexTeller",
+    packages=find_packages(),
+    install_requires=install_requires,
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "Operating System :: OS Independent",
+    ],
+    python_requires='>=3.10',
+)
--- a/src/client_demo.py
+++ b/src/client_demo.py
@@ -1,16 +1,12 @@
 import requests

-# URL of the service
-url = "http://127.0.0.1:9900/predict"
+rec_server_url = "http://127.0.0.1:8000/frec"
+det_server_url = "http://127.0.0.1:8000/fdet"

-# Replace with the path of the image you want to predict
-img_path = "/home/lhy/code/TeXify/src/7.png"
+img_path = "/your/image/path/"
+with open(img_path, 'rb') as img:
+    files = {'img': img}
+    response = requests.post(rec_server_url, files=files)
+    # response = requests.post(det_server_url, files=files)

-# Build the request payload
-data = {"img_path": img_path}
-
-# Send the POST request
-response = requests.post(url, json=data)
-
-# Print the response
 print(response.text)
--- a/src/infer_det.py
+++ b/src/infer_det.py
@@ -0,0 +1,85 @@
+import os
+import argparse
+import glob
+import subprocess
+
+import onnxruntime
+from pathlib import Path
+
+from models.det_model.inference import PredictConfig, predict_image
+
+
+parser = argparse.ArgumentParser(description=__doc__)
+parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml",
+                    default="./models/det_model/model/infer_cfg.yml")
+parser.add_argument('--onnx_file', type=str, help="onnx model file path",
+                    default="./models/det_model/model/rtdetr_r50vd_6x_coco.onnx")
+parser.add_argument("--image_dir", type=str, default='./testImgs')
+parser.add_argument("--image_file", type=str)
+parser.add_argument("--imgsave_dir", type=str, default="./detect_results")
+parser.add_argument('--use_gpu', action=argparse.BooleanOptionalAction, default=True, help='Whether to use GPU for inference (pass --no-use_gpu to run on CPU)')
+
+
+def get_test_images(infer_dir, infer_img):
+    """
+    Get image path list in TEST mode
+    """
+    assert infer_img is not None or infer_dir is not None, \
+        "--image_file or --image_dir should be set"
+    assert infer_img is None or os.path.isfile(infer_img), \
+            "{} is not a file".format(infer_img)
+    assert infer_dir is None or os.path.isdir(infer_dir), \
+            "{} is not a directory".format(infer_dir)
+
+    # infer_img has a higher priority
+    if infer_img and os.path.isfile(infer_img):
+        return [infer_img]
+
+    images = set()
+    infer_dir = os.path.abspath(infer_dir)
+    assert os.path.isdir(infer_dir), \
+        "infer_dir {} is not a directory".format(infer_dir)
+    exts = ['jpg', 'jpeg', 'png', 'bmp']
+    exts += [ext.upper() for ext in exts]
+    for ext in exts:
+        images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
+    images = list(images)
+
+    assert len(images) > 0, "no image found in {}".format(infer_dir)
+    print("Found {} inference images in total.".format(len(images)))
+
+    return images
+
+def download_file(url, filename):
+    print(f"Downloading {filename}...")
+    subprocess.run(["wget", "-q", "--show-progress", "-O", filename, url], check=True)
+    print("Download complete.")
+
+if __name__ == '__main__':
+    cur_path = os.getcwd()
+    script_dirpath = Path(__file__).resolve().parent
+    os.chdir(script_dirpath)
+
+    FLAGS = parser.parse_args()
+
+    if not os.path.exists(FLAGS.infer_cfg):
+        infer_cfg_url = "https://huggingface.co/TonyLee1256/texteller_det/resolve/main/infer_cfg.yml?download=true"
+        download_file(infer_cfg_url, FLAGS.infer_cfg)
+
+    if not os.path.exists(FLAGS.onnx_file):
+        onnx_file_url = "https://huggingface.co/TonyLee1256/texteller_det/resolve/main/rtdetr_r50vd_6x_coco.onnx?download=true"
+        download_file(onnx_file_url, FLAGS.onnx_file)
+    
+    # load image list
+    img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+
+    if FLAGS.use_gpu:
+        predictor = onnxruntime.InferenceSession(FLAGS.onnx_file, providers=['CUDAExecutionProvider'])
+    else:
+        predictor = onnxruntime.InferenceSession(FLAGS.onnx_file, providers=['CPUExecutionProvider'])
+    # load infer config
+    infer_config = PredictConfig(FLAGS.infer_cfg)
+
+    predict_image(FLAGS.imgsave_dir, infer_config, predictor, img_list)
+
+    os.chdir(cur_path)
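Review note on the provider selection at the end of `infer_det.py`: the script passes exactly one execution provider to `InferenceSession`, so behavior on a machine without CUDA depends on onnxruntime's own fallback. A minimal sketch of an explicit fallback follows. `pick_providers` is a hypothetical helper and not part of this PR; in real use the `available` argument would be the result of `onnxruntime.get_available_providers()`.

```python
from typing import List


def pick_providers(use_gpu: bool, available: List[str]) -> List[str]:
    """Return an ordered provider list for onnxruntime.InferenceSession.

    Hypothetical helper (assumption, not in the PR). `available` is expected
    to be the list returned by onnxruntime.get_available_providers().
    """
    providers = []
    # Prefer CUDA only when it was requested and is present in this build.
    if use_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # Always keep the CPU provider as the last-resort fallback.
    providers.append("CPUExecutionProvider")
    return providers


if __name__ == "__main__":
    print(pick_providers(True, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
    print(pick_providers(True, ["CPUExecutionProvider"]))
```

The returned list could then be passed as `providers=` when constructing the session, which would replace the `if FLAGS.use_gpu: ... else: ...` branch with a single call.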
--- a/src/inference.py
+++ b/src/inference.py
@@ -1,12 +1,22 @@
 import os
 import argparse
+import cv2 as cv

 from pathlib import Path
-from models.ocr_model.utils.inference import inference
+from onnxruntime import InferenceSession
+from models.thrid_party.paddleocr.infer import predict_det, predict_rec
+from models.thrid_party.paddleocr.infer import utility
+
+from models.utils import mix_inference
+from models.ocr_model.utils.to_katex import to_katex
+from models.ocr_model.utils.inference import inference as latex_inference
+
 from models.ocr_model.model.TexTeller import TexTeller
+from models.det_model.inference import PredictConfig


 if __name__ == '__main__':
+    os.chdir(Path(__file__).resolve().parent)
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-img', 
@@ -15,26 +25,61 @@ if __name__ == '__main__':
        help='path to the input image'
    )
    parser.add_argument(
-        '-cuda', 
-        default=False,
+        '--inference-mode', 
+        type=str,
+        default='cpu',
+        help='Inference mode, select one of cpu, cuda, or mps'
+    )
+    parser.add_argument(
+        '--num-beam', 
+        type=int,
+        default=1,
+        help='number of beams used in beam search decoding'
+    )
+    parser.add_argument(
+        '-mix', 
        action='store_true',
-        help='use cuda or not'
+        help='use mix mode'
    )
    
-    args = parser.parse_args([
-        '-img', './models/ocr_model/test_img/1.png',
-        '-cuda'
-    ])
+    args = parser.parse_args()
    
-    script_dirpath = Path(__file__).resolve().parent
-    os.chdir(script_dirpath)
+    # You can use your own checkpoint and tokenizer path.
+    print('Loading model and tokenizer...')
+    latex_rec_model = TexTeller.from_pretrained()
+    tokenizer = TexTeller.get_tokenizer()
+    print('Model and tokenizer loaded.')

-    model = TexTeller.from_pretrained('./models/ocr_model/model_checkpoint')
-    tokenizer = TexTeller.get_tokenizer('./models/tokenizer/roberta-tokenizer-550K')
+    img_path = args.img
+    img = cv.imread(img_path)
+    print('Inference...')
+    if not args.mix:
+        res = latex_inference(latex_rec_model, tokenizer, [img], args.inference_mode, args.num_beam)
+        res = to_katex(res[0])
+        print(res)
+    else:
+        infer_config = PredictConfig("./models/det_model/model/infer_cfg.yml")
+        latex_det_model = InferenceSession("./models/det_model/model/rtdetr_r50vd_6x_coco.onnx")

-    # base = '/home/lhy/code/TeXify/src/models/ocr_model/test_img'
-    # img_path = [base + f'/{i}.png' for i in range(7, 12)]
-    img_path = [args.img]
+        use_gpu = args.inference_mode == 'cuda'
+        SIZE_LIMIT = 20 * 1024 * 1024
+        det_model_dir =  "./models/thrid_party/paddleocr/checkpoints/det/default_model.onnx"
+        rec_model_dir =  "./models/thrid_party/paddleocr/checkpoints/rec/default_model.onnx"
+        # The CPU inference of the detection model will be faster than the GPU inference (in onnxruntime)
+        det_use_gpu = False
+        rec_use_gpu = use_gpu and not (os.path.getsize(rec_model_dir) < SIZE_LIMIT)

-    res = inference(model, tokenizer, img_path, args.cuda)
-    print(res[0])
+        paddleocr_args = utility.parse_args()
+        paddleocr_args.use_onnx = True
+        paddleocr_args.det_model_dir = det_model_dir
+        paddleocr_args.rec_model_dir = rec_model_dir
+
+        paddleocr_args.use_gpu = det_use_gpu
+        detector = predict_det.TextDetector(paddleocr_args)
+        paddleocr_args.use_gpu = rec_use_gpu
+        recognizer = predict_rec.TextRecognizer(paddleocr_args)
+        
+        lang_ocr_models = [detector, recognizer]
+        latex_rec_models = [latex_rec_model, tokenizer]
+        res = mix_inference(img_path, infer_config, latex_det_model, lang_ocr_models, latex_rec_models, args.inference_mode, args.num_beam)
+        print(res)
--- a/src/models/det_model/Bbox.py
+++ b/src/models/det_model/Bbox.py
@@ -0,0 +1,91 @@
+import os
+
+from PIL import Image, ImageDraw
+from typing import List
+from pathlib import Path
+
+
+class Point:
+    def __init__(self, x: int, y: int):
+        self.x = int(x)
+        self.y = int(y)
+    
+    def __repr__(self) -> str:
+        return f"Point(x={self.x}, y={self.y})"
+
+
+class Bbox:
+    THREADHOLD = 0.4
+
+    def __init__(self, x, y, h, w, label: str = None, confidence: float = 0, content: str = None):
+        self.p = Point(x, y)
+        self.h = int(h)
+        self.w = int(w)
+        self.label = label
+        self.confidence = confidence
+        self.content = content
+
+    @property
+    def ul_point(self) -> Point:
+        return self.p
+    
+    @property
+    def ur_point(self) -> Point:
+        return Point(self.p.x + self.w, self.p.y)
+    
+    @property
+    def ll_point(self) -> Point:
+        return Point(self.p.x, self.p.y + self.h)
+    
+    @property
+    def lr_point(self) -> Point:
+        return Point(self.p.x + self.w, self.p.y + self.h)
+    
+    
+    def same_row(self, other) -> bool:
+        if (
+            (self.p.y >= other.p.y and self.ll_point.y <= other.ll_point.y)
+            or (self.p.y <= other.p.y and self.ll_point.y >= other.ll_point.y)
+        ):
+            return True
+        if self.ll_point.y <= other.p.y or self.p.y >= other.ll_point.y:
+            return False
+        return 1.0 * abs(self.p.y - other.p.y) / max(self.h, other.h) < self.THREADHOLD
+    
+    def __lt__(self, other) -> bool:
+        '''
+        from top to bottom, from left to right
+        '''
+        if not self.same_row(other):
+            return self.p.y < other.p.y
+        else:
+            return self.p.x < other.p.x
+    
+    def __repr__(self) -> str:
+        return f"Bbox(upper_left_point={self.p}, h={self.h}, w={self.w}, label={self.label}, confidence={self.confidence}, content={self.content})"
+
+
+def draw_bboxes(img: Image.Image, bboxes: List[Bbox], name="annotated_image.png"):
+    curr_work_dir = Path(os.getcwd())
+    log_dir = curr_work_dir / "logs"
+    log_dir.mkdir(exist_ok=True)
+    drawer = ImageDraw.Draw(img)
+    for bbox in bboxes:
+        # Calculate the coordinates for the rectangle to be drawn
+        left = bbox.p.x
+        top = bbox.p.y
+        right = bbox.p.x + bbox.w
+        bottom = bbox.p.y + bbox.h
+        
+        # Draw the rectangle on the image
+        drawer.rectangle([left, top, right, bottom], outline="green", width=1)
+        
+        # Optionally, add text label if it exists
+        if bbox.label:
+            drawer.text((left, top), bbox.label, fill="blue")
+        
+        if bbox.content:
+            drawer.text((left, bottom - 10), bbox.content[:10], fill="red")
+
+    # Save the image with drawn rectangles
+    img.save(log_dir / name)
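Review note on `Bbox.__lt__`: the comparison sorts boxes in reading order (top to bottom, then left to right within a row), where "same row" tolerates a small vertical offset. The sketch below restates that logic on a stripped-down stand-in class so the ordering can be tried in isolation. `Box` and `THRESHOLD` are illustrative names, not part of the PR; the threshold value mirrors `Bbox.THREADHOLD`.

```python
from dataclasses import dataclass

THRESHOLD = 0.4  # same value as Bbox.THREADHOLD in the diff


@dataclass
class Box:
    """Stand-in for Bbox, reduced to the fields the ordering needs."""
    x: int
    y: int
    h: int
    w: int

    @property
    def bottom(self) -> int:
        return self.y + self.h

    def same_row(self, other: "Box") -> bool:
        # One box vertically contains the other: treat as the same row.
        if (self.y >= other.y and self.bottom <= other.bottom) or (
            self.y <= other.y and self.bottom >= other.bottom
        ):
            return True
        # No vertical overlap at all: different rows.
        if self.bottom <= other.y or self.y >= other.bottom:
            return False
        # Partial overlap: same row if the tops are close relative to height.
        return abs(self.y - other.y) / max(self.h, other.h) < THRESHOLD

    def __lt__(self, other: "Box") -> bool:
        # Different rows: the higher box comes first. Same row: the left one.
        if not self.same_row(other):
            return self.y < other.y
        return self.x < other.x


if __name__ == "__main__":
    right = Box(x=200, y=12, h=20, w=50)  # first row, slightly lower top
    left = Box(x=10, y=10, h=20, w=50)    # first row
    below = Box(x=10, y=60, h=20, w=50)   # second row
    print(sorted([right, below, left]))
```

Because `same_row` is a tolerance-based test, the relation is not guaranteed to be transitive for arbitrary layouts, so sorting results on crowded pages may depend on input order.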
--- a/src/models/det_model/inference.py
+++ b/src/models/det_model/inference.py
@@ -0,0 +1,195 @@
+import os
+import time
+import yaml
+import numpy as np
+import cv2
+
+from tqdm import tqdm
+from typing import List
+from .preprocess import Compose
+from .Bbox import Bbox
+
+
+# Global set of supported model architectures
+SUPPORT_MODELS = {
+    'YOLO', 'PPYOLOE', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet',
+    'S2ANet', 'JDE', 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet',
+    'TOOD', 'RetinaNet', 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet', 
+    'DETR'
+}
+
+
+class PredictConfig(object):
+    """set config of preprocess, postprocess and visualize
+    Args:
+        infer_config (str): path of infer_cfg.yml
+    """
+
+    def __init__(self, infer_config):
+        # parsing Yaml config for Preprocess
+        with open(infer_config) as f:
+            yml_conf = yaml.safe_load(f)
+        self.check_model(yml_conf)
+        self.arch = yml_conf['arch']
+        self.preprocess_infos = yml_conf['Preprocess']
+        self.min_subgraph_size = yml_conf['min_subgraph_size']
+        self.label_list = yml_conf['label_list']
+        self.use_dynamic_shape = yml_conf['use_dynamic_shape']
+        self.draw_threshold = yml_conf.get("draw_threshold", 0.5)
+        self.mask = yml_conf.get("mask", False)
+        self.tracker = yml_conf.get("tracker", None)
+        self.nms = yml_conf.get("NMS", None)
+        self.fpn_stride = yml_conf.get("fpn_stride", None)
+
+        color_pool = [(0, 255, 0), (255, 0, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255)]
+        self.colors = {label: color_pool[i % len(color_pool)] for i, label in enumerate(self.label_list)}
+
+        if self.arch == 'RCNN' and yml_conf.get('export_onnx', False):
+            print(
+                'The RCNN export model is used for ONNX and it only supports batch_size = 1'
+            )
+        self.print_config()
+
+    def check_model(self, yml_conf):
+        """
+        Raises:
+            ValueError: loaded model not in supported model type
+        """
+        for support_model in SUPPORT_MODELS:
+            if support_model in yml_conf['arch']:
+                return True
+        raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[
+            'arch'], SUPPORT_MODELS))
+
+    def print_config(self):
+        print('-----------  Model Configuration -----------')
+        print('%s: %s' % ('Model Arch', self.arch))
+        print('%s: ' % ('Transform Order'))
+        for op_info in self.preprocess_infos:
+            print('--%s: %s' % ('transform op', op_info['type']))
+        print('--------------------------------------------')
+
+
+def draw_bbox(image, outputs, infer_config):
+    for output in outputs:
+        cls_id, score, xmin, ymin, xmax, ymax = output
+        if score > infer_config.draw_threshold:
+            label = infer_config.label_list[int(cls_id)]
+            color = infer_config.colors[label]
+            cv2.rectangle(image, (int(xmin), int(ymin)), (int(xmax), int(ymax)), color, 2)
+            cv2.putText(image, "{}: {:.2f}".format(label, score),
+                        (int(xmin), int(ymin - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
+    return image
+
+
+def predict_image(imgsave_dir, infer_config, predictor, img_list):
+    # load preprocess transforms
+    transforms = Compose(infer_config.preprocess_infos)
+    errImgList = []
+
+    # Check and create subimg_save_dir if not exist
+    subimg_save_dir = os.path.join(imgsave_dir, 'subimages')
+    os.makedirs(subimg_save_dir, exist_ok=True)
+
+    first_image_skipped = False
+    total_time = 0
+    num_images = 0
+    # predict image
+    for img_path in tqdm(img_list):
+        img = cv2.imread(img_path)
+        if img is None:
+            print(f"Warning: Could not read image {img_path}. Skipping...")
+            errImgList.append(img_path)
+            continue
+
+        inputs = transforms(img_path)
+        inputs_name = [var.name for var in predictor.get_inputs()]
+        inputs = {k: inputs[k][None, ] for k in inputs_name}
+
+        # Start timing
+        start_time = time.time()
+
+        outputs = predictor.run(output_names=None, input_feed=inputs)
+
+        # Stop timing
+        end_time = time.time()
+        inference_time = end_time - start_time
+        if not first_image_skipped:
+            first_image_skipped = True
+        else:
+            total_time += inference_time
+            num_images += 1
+        print(f"ONNXRuntime predict time for {os.path.basename(img_path)}: {inference_time:.4f} seconds")
+
+        print("ONNXRuntime predict: ")
+        if infer_config.arch in ["HRNet"]:
+            print(np.array(outputs[0]))
+        else:
+            bboxes = np.array(outputs[0])
+            for bbox in bboxes:
+                if bbox[0] > -1 and bbox[1] > infer_config.draw_threshold:
+                    print(f"{int(bbox[0])} {bbox[1]} "
+                          f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}")
+
+        # Save the subimages (crop from the original image)
+        subimg_counter = 1
+        for output in np.array(outputs[0]):
+            cls_id, score, xmin, ymin, xmax, ymax = output
+            if score > infer_config.draw_threshold:
+                label = infer_config.label_list[int(cls_id)]
+                subimg = img[int(max(ymin, 0)):int(ymax), int(max(xmin, 0)):int(xmax)]
+                if subimg.size == 0:
+                    continue
+
+                subimg_filename = f"{os.path.splitext(os.path.basename(img_path))[0]}_{label}_{xmin:.2f}_{ymin:.2f}_{xmax:.2f}_{ymax:.2f}.jpg"
+                subimg_path = os.path.join(subimg_save_dir, subimg_filename)
+                cv2.imwrite(subimg_path, subimg)
+                subimg_counter += 1
+
+        # Draw bounding boxes and save the image with bounding boxes
+        img_with_mask = img.copy()
+        for output in np.array(outputs[0]):
+            cls_id, score, xmin, ymin, xmax, ymax = output
+            if score > infer_config.draw_threshold:
+                cv2.rectangle(img_with_mask, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (255, 255, 255), -1) # mask with white
+        
+        img_with_bbox = draw_bbox(img, np.array(outputs[0]), infer_config)
+
+        output_dir = imgsave_dir
+        os.makedirs(output_dir, exist_ok=True)
+        draw_box_dir = os.path.join(output_dir, 'draw_box')
+        mask_white_dir = os.path.join(output_dir, 'mask_white')
+        os.makedirs(draw_box_dir, exist_ok=True)
+        os.makedirs(mask_white_dir, exist_ok=True)
+
+        output_file_mask = os.path.join(mask_white_dir, os.path.basename(img_path))
+        output_file_bbox = os.path.join(draw_box_dir, os.path.basename(img_path))
+        cv2.imwrite(output_file_mask, img_with_mask)
+        cv2.imwrite(output_file_bbox, img_with_bbox)
+
+    avg_time_per_image = total_time / num_images if num_images > 0 else 0
+    print(f"Total inference time for {num_images} images: {total_time:.4f} seconds")
+    print(f"Average time per image: {avg_time_per_image:.4f} seconds")
+    print("ErrorImgs:")
+    print(errImgList)
+
+
+def predict(img_path: str, predictor, infer_config) -> List[Bbox]:
+    transforms = Compose(infer_config.preprocess_infos)
+    inputs = transforms(img_path)
+    inputs_name = [var.name for var in predictor.get_inputs()]
+    inputs = {k: inputs[k][None, ] for k in inputs_name}
+
+    outputs = predictor.run(output_names=None, input_feed=inputs)[0]
+    res = []
+    for output in outputs:
+        cls_name = infer_config.label_list[int(output[0])]
+        score = output[1]
+        xmin = int(max(output[2], 0))
+        ymin = int(max(output[3], 0))
+        xmax = int(output[4])
+        ymax = int(output[5])
+        if score > infer_config.draw_threshold:
+            res.append(Bbox(xmin, ymin, ymax - ymin, xmax - xmin, cls_name, score))
+
+    return res
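Review note on `predict`: each detector output row is read as `[cls_id, score, xmin, ymin, xmax, ymax]`, filtered by `draw_threshold`, clamped at the top-left, and converted to an `(x, y, h, w)` box. The sketch below restates that conversion in plain Python so it can be checked without a model. `filter_detections` is a hypothetical name, and it returns tuples rather than `Bbox` objects to stay self-contained.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[str, int, int, int, int]  # (label, x, y, h, w)


def filter_detections(
    rows: Sequence[Sequence[float]],
    label_list: List[str],
    threshold: float = 0.5,
) -> List[Detection]:
    """Keep rows scoring above `threshold` and convert them to (x, y, h, w).

    Hypothetical helper mirroring predict() in the diff; each row is
    [cls_id, score, xmin, ymin, xmax, ymax].
    """
    kept = []
    for cls_id, score, xmin, ymin, xmax, ymax in rows:
        if score <= threshold:
            continue
        # Clamp the top-left corner to the image, as predict() does.
        x0 = int(max(xmin, 0))
        y0 = int(max(ymin, 0))
        kept.append((label_list[int(cls_id)], x0, y0, int(ymax) - y0, int(xmax) - x0))
    return kept


if __name__ == "__main__":
    rows = [
        [0, 0.9, -3.2, 10.7, 50.9, 30.1],  # kept, xmin clamped to 0
        [1, 0.3, 0.0, 0.0, 5.0, 5.0],      # dropped, score below threshold
    ]
    print(filter_detections(rows, ["isolated", "embedding"]))
```

Note that only the top-left corner is clamped; `xmax` and `ymax` are not bounded by the image size in the original code, and this sketch keeps that behavior.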
--- a/src/models/det_model/model/infer_cfg.yml
+++ b/src/models/det_model/model/infer_cfg.yml
@@ -0,0 +1,27 @@
+mode: paddle
+draw_threshold: 0.5
+metric: COCO
+use_dynamic_shape: false
+arch: DETR
+min_subgraph_size: 3
+Preprocess:
+- interp: 2
+  keep_ratio: false
+  target_size:
+  - 1600
+  - 1600
+  type: Resize
+- mean:
+  - 0.0
+  - 0.0
+  - 0.0
+  norm_type: none
+  std:
+  - 1.0
+  - 1.0
+  - 1.0
+  type: NormalizeImage
+- type: Permute
+label_list:
+- isolated
+- embedding
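Review note on the `Resize` entry in `infer_cfg.yml`: with `keep_ratio: false` and a 1600 x 1600 `target_size`, the accompanying `Resize.generate_scale` in `preprocess.py` scales height and width independently, so the aspect ratio is not preserved. The sketch below reproduces both branches of that scale computation in plain Python. `resize_scale` is an illustrative name, not a function in the PR.

```python
from typing import Tuple


def resize_scale(
    origin_hw: Tuple[int, int],
    target_hw: Tuple[int, int],
    keep_ratio: bool,
) -> Tuple[float, float]:
    """Return (scale_y, scale_x), following Resize.generate_scale in the diff."""
    h, w = origin_hw
    target_h, target_w = target_hw
    if keep_ratio:
        # Fit the short side first, then shrink if the long side overflows.
        scale = min(target_hw) / min(h, w)
        if round(scale * max(h, w)) > max(target_hw):
            scale = max(target_hw) / max(h, w)
        return scale, scale
    # keep_ratio is false: each axis is stretched to its own target.
    return target_h / h, target_w / w


if __name__ == "__main__":
    # An 800 x 400 (h x w) page under the configuration above.
    print(resize_scale((800, 400), (1600, 1600), keep_ratio=False))
    print(resize_scale((800, 400), (1600, 1600), keep_ratio=True))
```

The two scale values are what `Resize.__call__` stores in `im_info['scale_factor']`, in `[y, x]` order.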
--- a/src/models/det_model/preprocess.py
+++ b/src/models/det_model/preprocess.py
@@ -0,0 +1,499 @@
+import numpy as np
+import cv2
+import copy
+
+
+def decode_image(img_path):
+    if isinstance(img_path, str):
+        with open(img_path, 'rb') as f:
+            im_read = f.read()
+        data = np.frombuffer(im_read, dtype='uint8')
+    else:
+        assert isinstance(img_path, np.ndarray)
+        data = img_path
+
+    im = cv2.imdecode(data, 1)  # BGR mode, but need RGB mode
+    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+    img_info = {
+        "im_shape": np.array(
+            im.shape[:2], dtype=np.float32),
+        "scale_factor": np.array(
+            [1., 1.], dtype=np.float32)
+    }
+    return im, img_info
+
+
+class Resize(object):
+    """resize image by target_size and max_size
+    Args:
+        target_size (int|list): the target size of image, [h, w] or a single int for both
+        keep_ratio (bool): whether to keep the aspect ratio, default True
+        interp (int): method of resize
+    """
+
+    def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
+        if isinstance(target_size, int):
+            target_size = [target_size, target_size]
+        self.target_size = target_size
+        self.keep_ratio = keep_ratio
+        self.interp = interp
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        assert len(self.target_size) == 2
+        assert self.target_size[0] > 0 and self.target_size[1] > 0
+        im_channel = im.shape[2]
+        im_scale_y, im_scale_x = self.generate_scale(im)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp)
+        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
+        im_info['scale_factor'] = np.array(
+            [im_scale_y, im_scale_x]).astype('float32')
+        return im, im_info
+
+    def generate_scale(self, im):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+        Returns:
+            im_scale_y: the resize ratio of Y
+            im_scale_x: the resize ratio of X
+        """
+        origin_shape = im.shape[:2]
+        im_c = im.shape[2]
+        if self.keep_ratio:
+            im_size_min = np.min(origin_shape)
+            im_size_max = np.max(origin_shape)
+            target_size_min = np.min(self.target_size)
+            target_size_max = np.max(self.target_size)
+            im_scale = float(target_size_min) / float(im_size_min)
+            if np.round(im_scale * im_size_max) > target_size_max:
+                im_scale = float(target_size_max) / float(im_size_max)
+            im_scale_x = im_scale
+            im_scale_y = im_scale
+        else:
+            resize_h, resize_w = self.target_size
+            im_scale_y = resize_h / float(origin_shape[0])
+            im_scale_x = resize_w / float(origin_shape[1])
+        return im_scale_y, im_scale_x
+
+
+class NormalizeImage(object):
+    """normalize image
+    Args:
+        mean (list): im - mean
+        std (list): im / std
+        is_scale (bool): whether need im / 255
+        norm_type (str): type in ['mean_std', 'none']
+    """
+
+    def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
+        self.mean = mean
+        self.std = std
+        self.is_scale = is_scale
+        self.norm_type = norm_type
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        im = im.astype(np.float32, copy=False)
+        if self.is_scale:
+            scale = 1.0 / 255.0
+            im *= scale
+
+        if self.norm_type == 'mean_std':
+            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+            std = np.array(self.std)[np.newaxis, np.newaxis, :]
+            im -= mean
+            im /= std
+        return im, im_info
+
+
+class Permute(object):
+    """permute image
+    Takes no arguments: the image is always transposed
+    from HWC to CHW layout, and no RGB/BGR conversion
+    is performed.
+    """
+
+    def __init__(self, ):
+        super(Permute, self).__init__()
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        im = im.transpose((2, 0, 1)).copy()
+        return im, im_info
+
+
+class PadStride(object):
+    """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config
+    Args:
+        stride (int): models with FPN need image shape % stride == 0
+    """
+
+    def __init__(self, stride=0):
+        self.coarsest_stride = stride
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        coarsest_stride = self.coarsest_stride
+        if coarsest_stride <= 0:
+            return im, im_info
+        im_c, im_h, im_w = im.shape
+        pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
+        pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
+        padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
+        padding_im[:, :im_h, :im_w] = im
+        return padding_im, im_info
+
+
+class LetterBoxResize(object):
+    def __init__(self, target_size):
+        """
+        Resize image to target size, convert normalized xywh to pixel xyxy
+        format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]).
+        Args:
+            target_size (int|list): image target size.
+        """
+        super(LetterBoxResize, self).__init__()
+        if isinstance(target_size, int):
+            target_size = [target_size, target_size]
+        self.target_size = target_size
+
+    def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)):
+        # letterbox: resize a rectangular image to a padded rectangular
+        shape = img.shape[:2]  # [height, width]
+        ratio_h = float(height) / shape[0]
+        ratio_w = float(width) / shape[1]
+        ratio = min(ratio_h, ratio_w)
+        new_shape = (round(shape[1] * ratio),
+                     round(shape[0] * ratio))  # [width, height]
+        padw = (width - new_shape[0]) / 2
+        padh = (height - new_shape[1]) / 2
+        top, bottom = round(padh - 0.1), round(padh + 0.1)
+        left, right = round(padw - 0.1), round(padw + 0.1)
+
+        img = cv2.resize(
+            img, new_shape, interpolation=cv2.INTER_AREA)  # resized, no border
+        img = cv2.copyMakeBorder(
+            img, top, bottom, left, right, cv2.BORDER_CONSTANT,
+            value=color)  # padded rectangular
+        return img, ratio, padw, padh
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        assert len(self.target_size) == 2
+        assert self.target_size[0] > 0 and self.target_size[1] > 0
+        height, width = self.target_size
+        h, w = im.shape[:2]
+        im, ratio, padw, padh = self.letterbox(im, height=height, width=width)
+
+        new_shape = [round(h * ratio), round(w * ratio)]
+        im_info['im_shape'] = np.array(new_shape, dtype=np.float32)
+        im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32)
+        return im, im_info
+
+
+class Pad(object):
+    def __init__(self, size, fill_value=[114.0, 114.0, 114.0]):
+        """
+        Pad image to a specified size.
+        Args:
+            size (list[int]): image target size
+            fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0)
+        """
+        super(Pad, self).__init__()
+        if isinstance(size, int):
+            size = [size, size]
+        self.size = size
+        self.fill_value = fill_value
+
+    def __call__(self, im, im_info):
+        im_h, im_w = im.shape[:2]
+        h, w = self.size
+        if h == im_h and w == im_w:
+            im = im.astype(np.float32)
+            return im, im_info
+
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array(self.fill_value, dtype=np.float32)
+        canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
+        im = canvas
+        return im, im_info
+
+
+def rotate_point(pt, angle_rad):
+    """Rotate a point by an angle.
+
+    Args:
+        pt (list[float]): 2 dimensional point to be rotated
+        angle_rad (float): rotation angle in radians
+
+    Returns:
+        list[float]: Rotated point.
+    """
+    assert len(pt) == 2
+    sn, cs = np.sin(angle_rad), np.cos(angle_rad)
+    new_x = pt[0] * cs - pt[1] * sn
+    new_y = pt[0] * sn + pt[1] * cs
+    rotated_pt = [new_x, new_y]
+
+    return rotated_pt
+
+
+def _get_3rd_point(a, b):
+    """To calculate the affine matrix, three pairs of points are required. This
+    function is used to get the 3rd point, given 2D points a & b.
+
+    The 3rd point is defined by rotating vector `a - b` by 90 degrees
+    anticlockwise, using b as the rotation center.
+
+    Args:
+        a (np.ndarray): point(x,y)
+        b (np.ndarray): point(x,y)
+
+    Returns:
+        np.ndarray: The 3rd point.
+    """
+    assert len(a) == 2
+    assert len(b) == 2
+    direction = a - b
+    third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
+
+    return third_pt
+
+
+def get_affine_transform(center,
+                         input_size,
+                         rot,
+                         output_size,
+                         shift=(0., 0.),
+                         inv=False):
+    """Get the affine transform matrix, given the center/scale/rot/output_size.
+
+    Args:
+        center (np.ndarray[2, ]): Center of the bounding box (x, y).
+        input_size (float | list | np.ndarray[2, ]): Size of the bounding box
+            wrt [width, height]; a scalar is used for both dimensions.
+        rot (float): Rotation angle (degree).
+        output_size (np.ndarray[2, ]): Size of the destination heatmaps.
+        shift (0-100%): Shift translation ratio wrt the width/height.
+            Default (0., 0.).
+        inv (bool): Option to inverse the affine transform direction.
+            (inv=False: src->dst or inv=True: dst->src)
+
+    Returns:
+        np.ndarray: The transform matrix.
+    """
+    assert len(center) == 2
+    assert len(output_size) == 2
+    assert len(shift) == 2
+    if not isinstance(input_size, (np.ndarray, list)):
+        input_size = np.array([input_size, input_size], dtype=np.float32)
+    scale_tmp = input_size
+
+    shift = np.array(shift)
+    src_w = scale_tmp[0]
+    dst_w = output_size[0]
+    dst_h = output_size[1]
+
+    rot_rad = np.pi * rot / 180
+    src_dir = rotate_point([0., src_w * -0.5], rot_rad)
+    dst_dir = np.array([0., dst_w * -0.5])
+
+    src = np.zeros((3, 2), dtype=np.float32)
+    src[0, :] = center + scale_tmp * shift
+    src[1, :] = center + src_dir + scale_tmp * shift
+    src[2, :] = _get_3rd_point(src[0, :], src[1, :])
+
+    dst = np.zeros((3, 2), dtype=np.float32)
+    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
+    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
+    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
+
+    if inv:
+        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
+    else:
+        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
+
+    return trans
+
+
+class WarpAffine(object):
+    """Warp affine the image
+    """
+
+    def __init__(self,
+                 keep_res=False,
+                 pad=31,
+                 input_h=512,
+                 input_w=512,
+                 scale=0.4,
+                 shift=0.1):
+        self.keep_res = keep_res
+        self.pad = pad
+        self.input_h = input_h
+        self.input_w = input_w
+        self.scale = scale
+        self.shift = shift
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): input image
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray): warped image
+            im_info (dict): info of image, returned unchanged
+        """
+        img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
+
+        h, w = img.shape[:2]
+
+        if self.keep_res:
+            input_h = (h | self.pad) + 1
+            input_w = (w | self.pad) + 1
+            s = np.array([input_w, input_h], dtype=np.float32)
+            c = np.array([w // 2, h // 2], dtype=np.float32)
+
+        else:
+            s = max(h, w) * 1.0
+            input_h, input_w = self.input_h, self.input_w
+            c = np.array([w / 2., h / 2.], dtype=np.float32)
+
+        trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
+        img = cv2.resize(img, (w, h))
+        inp = cv2.warpAffine(
+            img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
+        return inp, im_info
+
+
+# keypoint preprocess
+def get_warp_matrix(theta, size_input, size_dst, size_target):
+    """This code is based on
+        https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
+
+        Calculate the transformation matrix under the unbiased constraint.
+    Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
+    Data Processing for Human Pose Estimation (CVPR 2020).
+
+    Args:
+        theta (float): Rotation angle in degrees.
+        size_input (np.ndarray): Size of input image [w, h].
+        size_dst (np.ndarray): Size of output image [w, h].
+        size_target (np.ndarray): Size of ROI in input plane [w, h].
+
+    Returns:
+        matrix (np.ndarray): A matrix for transformation.
+    """
+    theta = np.deg2rad(theta)
+    matrix = np.zeros((2, 3), dtype=np.float32)
+    scale_x = size_dst[0] / size_target[0]
+    scale_y = size_dst[1] / size_target[1]
+    matrix[0, 0] = np.cos(theta) * scale_x
+    matrix[0, 1] = -np.sin(theta) * scale_x
+    matrix[0, 2] = scale_x * (
+        -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
+        np.sin(theta) + 0.5 * size_target[0])
+    matrix[1, 0] = np.sin(theta) * scale_y
+    matrix[1, 1] = np.cos(theta) * scale_y
+    matrix[1, 2] = scale_y * (
+        -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
+        np.cos(theta) + 0.5 * size_target[1])
+    return matrix
+
+
+class TopDownEvalAffine(object):
+    """apply affine transform to image and coords
+
+    Args:
+        trainsize (list): [w, h], the standard size used for training
+        use_udp (bool): whether to use Unbiased Data Processing.
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contains the image and coords after being transformed
+
+    """
+
+    def __init__(self, trainsize, use_udp=False):
+        self.trainsize = trainsize
+        self.use_udp = use_udp
+
+    def __call__(self, image, im_info):
+        rot = 0
+        imshape = im_info['im_shape'][::-1]
+        center = im_info['center'] if 'center' in im_info else imshape / 2.
+        scale = im_info['scale'] if 'scale' in im_info else imshape
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot, center * 2.0,
+                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+        else:
+            trans = get_affine_transform(center, scale, rot, self.trainsize)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+
+        return image, im_info
+
+
+class Compose:
+    def __init__(self, transforms):
+        self.transforms = []
+        for op_info in transforms:
+            new_op_info = op_info.copy()
+            op_type = new_op_info.pop('type')
+            self.transforms.append(eval(op_type)(**new_op_info))
+
+    def __call__(self, img_path):
+        img, im_info = decode_image(img_path)
+        for t in self.transforms:
+            img, im_info = t(img, im_info)
+        inputs = copy.deepcopy(im_info)
+        inputs['image'] = img
+        return inputs
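The `keep_res` branch of `WarpAffine` above sizes the network input with `(h | self.pad) + 1`. A minimal sketch of what that expression computes, assuming `pad + 1` is a power of two (as with the default `pad=31`); `padded_size` is a hypothetical helper name used only for illustration, not part of the patch:

```python
def padded_size(length: int, pad: int = 31) -> int:
    """Mirror `(length | pad) + 1` from WarpAffine's keep_res branch.

    Assumption: pad + 1 is a power of two. OR-ing with pad then sets all
    low bits, and adding 1 carries into the next multiple of pad + 1.
    """
    return (length | pad) + 1


if __name__ == "__main__":
    for length in (100, 500, 511, 512):
        print(length, "->", padded_size(length))
```

Under that assumption the result is the smallest multiple of `pad + 1` that is strictly greater than `length`, so an already aligned side of 512 grows to 544 rather than staying at 512.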
--- a/src/models/globals.py
+++ b/src/models/globals.py
@@ -1,60 +1,23 @@
-# Mean and variance of formula images (after grayscale conversion)
+# Mean and standard deviation of (grayscale) formula images
 IMAGE_MEAN = 0.9545467
 IMAGE_STD  = 0.15394445

-
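-# =========================   Parameters for the OCR model   ============================= #
-
-# Min/max width and height of input images
-MIN_HEIGHT = 32
-MAX_HEIGHT = 512
-MIN_WIDTH  = 32
-MAX_WIDTH  = 1280
-# In LaTeX-OCR these are 32, 192, 32 and 672 respectively
-
-# Density value (dpi) used to convert PDFs to images for the OCR model's dataset
-TEXIFY_INPUT_DENSITY = 100
-
-# Vocabulary size of the OCR model's tokenizer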
+# Vocabulary size for TexTeller
 VOCAB_SIZE = 15000

-# Whether the OCR model uses a fixed input image size
-OCR_FIX_SIZE = True
-# Fixed input image size when training the OCR model (when OCR_FIX_SIZE is True)
-OCR_IMG_SIZE = 448
-# Max input image width and height when training the OCR model (when OCR_FIX_SIZE is False)
-OCR_IMG_MAX_HEIGHT = 512
-OCR_IMG_MAX_WIDTH  = 768
+# Fixed input image size for TexTeller
+FIXED_IMG_SIZE = 448

-# Number of channels of the OCR model's input images
-OCR_IMG_CHANNELS = 1  # grayscale image
+# Number of image channels for TexTeller
+IMG_CHANNELS = 1  # grayscale image

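-# Max number of tokens in the OCR model's training dataset
-MAX_TOKEN_SIZE = 1024     # max embedding length of the model (default 512)
-# MAX_TOKEN_SIZE = 2048     # max embedding length of the model (default 512)
-# MAX_TOKEN_SIZE = 600
+# Max number of tokens (decoder position-embedding length)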
+MAX_TOKEN_SIZE = 1024

-# Random scaling ratio when training the OCR model
+# Scaling ratio for random resizing when training
 MAX_RESIZE_RATIO = 1.15
 MIN_RESIZE_RATIO = 0.75

-# Minimum width and height required of OCR model input images (filters out garbage data)
+# Minimum height and width for input image for TexTeller
 MIN_HEIGHT = 12
 MIN_WIDTH  = 30
-
-# ============================================================================= #
-
-
-# =========================   Parameters for the Resizer model   ============================= #
-
-# Density value used to render images in the Resizer model's dataset
-RESIZER_INPUT_DENSITY  = 200
-
-LABEL_RATIO   = 1.0 * TEXIFY_INPUT_DENSITY / RESIZER_INPUT_DENSITY
-
-NUM_CLASSES   = 1      # the model predicts by regression
-NUM_CHANNELS  = 1      # single-channel input image (grayscale)
-
-# Fixed image size when training the Resizer
-RESIZER_IMG_SIZE = 448    
-# ============================================================================= #
--- a/src/models/ocr_model/README.md
+++ b/src/models/ocr_model/README.md
@@ -1,6 +0,0 @@
-* Encoder-Decoder architecture
-
-* The encoder uses Deit_{BASE}
-
-* The decoder uses RoBERTa_{LARGE}
-    * The decoder's tokenizer is also the RoBERTa_{LARGE} one
--- a/src/models/ocr_model/model/TexTeller.py
+++ b/src/models/ocr_model/model/TexTeller.py
@@ -1,65 +1,45 @@
 from pathlib import Path

-from models.globals import (
+from ...globals import (
    VOCAB_SIZE,
-    OCR_IMG_SIZE,
-    OCR_IMG_CHANNELS,
+    FIXED_IMG_SIZE,
+    IMG_CHANNELS,
    MAX_TOKEN_SIZE
 )

 from transformers import (
-    ViTConfig,
-    ViTModel,
-    TrOCRConfig,
-    TrOCRForCausalLM,
    RobertaTokenizerFast,
    VisionEncoderDecoderModel,
+    VisionEncoderDecoderConfig
 )


 class TexTeller(VisionEncoderDecoderModel):
-    def __init__(self, decoder_path=None, tokenizer_path=None):
-        encoder = ViTModel(ViTConfig(
-            image_size=OCR_IMG_SIZE,
-            num_channels=OCR_IMG_CHANNELS
-        ))
-        decoder = TrOCRForCausalLM(TrOCRConfig(
-            vocab_size=VOCAB_SIZE,
-            max_position_embeddings=MAX_TOKEN_SIZE
-        ))
-        super().__init__(encoder=encoder, decoder=decoder)
+    REPO_NAME = 'OleehyO/TexTeller'
+    def __init__(self):
+        config = VisionEncoderDecoderConfig.from_pretrained(Path(__file__).resolve().parent / "config.json")
+        config.encoder.image_size              = FIXED_IMG_SIZE
+        config.encoder.num_channels            = IMG_CHANNELS
+        config.decoder.vocab_size              = VOCAB_SIZE
+        config.decoder.max_position_embeddings = MAX_TOKEN_SIZE
+
+        super().__init__(config=config)
    
    @classmethod
-    def from_pretrained(cls, model_path: str):
+    def from_pretrained(cls, model_path: str = None, use_onnx=False, onnx_provider=None):
+        if model_path is None or model_path == 'default':
+            if not use_onnx:
+                return VisionEncoderDecoderModel.from_pretrained(cls.REPO_NAME)
+            else:
+                from optimum.onnxruntime import ORTModelForVision2Seq
+                use_gpu = onnx_provider == 'cuda'
+                return ORTModelForVision2Seq.from_pretrained(cls.REPO_NAME, provider="CUDAExecutionProvider" if use_gpu else "CPUExecutionProvider")
        model_path = Path(model_path).resolve()
        return VisionEncoderDecoderModel.from_pretrained(str(model_path))

    @classmethod
-    def get_tokenizer(cls, tokenizer_path: str) -> RobertaTokenizerFast:
+    def get_tokenizer(cls, tokenizer_path: str = None) -> RobertaTokenizerFast:
+        if tokenizer_path is None or tokenizer_path == 'default':
+            return RobertaTokenizerFast.from_pretrained(cls.REPO_NAME)
        tokenizer_path = Path(tokenizer_path).resolve()
        return RobertaTokenizerFast.from_pretrained(str(tokenizer_path))
-
-
-if __name__ == "__main__":
-    pause = 1
-    # texteller = TexTeller()
-    # from ..utils.inference import inference
-    # model = TexTeller.from_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt')
-    # model.save_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt2', safe_serialization=False)
-    # tokenizer = TexTeller.get_tokenizer('/home/lhy/code/TeXify/src/models/tokenizer/roberta-tokenizer-550Kformulas')
-
-    # base = '/home/lhy/code/TeXify/src/models/ocr_model/model'
-    # imgs_path = [
-    #     # base + '/1.jpg',
-    #     # base + '/2.jpg',
-    #     # base + '/3.jpg',
-    #     # base + '/4.jpg',
-    #     # base + '/5.jpg',
-    #     # base + '/6.jpg',
-    #     base + '/foo.jpg'
-    # ]
-
-    # # res = inference(model, [img1, img2, img3, img4, img5, img6, img7], tokenizer)
-    # res = inference(model, imgs_path, tokenizer)
-    # pause = 1
-
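The ONNX branch of `from_pretrained` above maps the `onnx_provider` argument to an ONNX Runtime execution provider name. The sketch below isolates that mapping so the fallback behaviour is easy to see. `select_provider` is a hypothetical helper, not a function in the patch, and it only mirrors the string comparison shown in the diff:

```python
from typing import Optional


def select_provider(onnx_provider: Optional[str] = None) -> str:
    """Return the provider name the patch would pass on for this argument."""
    # Only the exact string 'cuda' selects the GPU provider; every other
    # value (None, 'cpu', 'CUDA', ...) falls back to the CPU provider.
    use_gpu = onnx_provider == 'cuda'
    return "CUDAExecutionProvider" if use_gpu else "CPUExecutionProvider"


if __name__ == "__main__":
    for value in (None, "cpu", "cuda", "CUDA"):
        print(repr(value), "->", select_provider(value))
```

Because the comparison is case-sensitive, a caller passing `'CUDA'` would silently run on the CPU; normalising the argument before calling is one possible way to avoid that.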
--- a/src/models/ocr_model/model/config.json
+++ b/src/models/ocr_model/model/config.json
@@ -0,0 +1,168 @@
+{
+  "_name_or_path": "OleehyO/TexTeller",
+  "architectures": [
+    "VisionEncoderDecoderModel"
+  ],
+  "decoder": {
+    "_name_or_path": "",
+    "activation_dropout": 0.0,
+    "activation_function": "gelu",
+    "add_cross_attention": true,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "classifier_dropout": 0.0,
+    "cross_attention_hidden_size": 768,
+    "d_model": 1024,
+    "decoder_attention_heads": 16,
+    "decoder_ffn_dim": 4096,
+    "decoder_layerdrop": 0.0,
+    "decoder_layers": 12,
+    "decoder_start_token_id": 2,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.1,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "init_std": 0.02,
+    "is_decoder": true,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layernorm_embedding": true,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 1024,
+    "min_length": 0,
+    "model_type": "trocr",
+    "no_repeat_ngram_size": 0,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "scale_embedding": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_cache": false,
+    "use_learned_position_embeddings": true,
+    "vocab_size": 15000
+  },
+  "encoder": {
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_probs_dropout_prob": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": null,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "encoder_stride": 16,
+    "eos_token_id": null,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "gelu",
+    "hidden_dropout_prob": 0.0,
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "image_size": 448,
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-12,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "vit",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_channels": 1,
+    "num_hidden_layers": 12,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "patch_size": 16,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "qkv_bias": false,
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false
+  },
+  "is_encoder_decoder": true,
+  "model_type": "vision-encoder-decoder",
+  "tie_word_embeddings": false,
+  "transformers_version": "4.41.2",
+  "use_cache": true
+}
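The encoder settings above fix the input geometry: a 448x448 single-channel image cut into 16x16 patches. A small sketch of the resulting encoder sequence length, assuming the usual ViT layout of one embedding per non-overlapping patch plus one leading class token (the class token is an assumption about the encoder implementation, not something this config states; `vit_sequence_length` is a hypothetical name):

```python
def vit_sequence_length(image_size: int = 448,
                        patch_size: int = 16,
                        class_token: bool = True) -> int:
    """Number of encoder positions for a square image split into patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size  # 448 // 16 = 28
    num_patches = patches_per_side ** 2          # 28 * 28 = 784
    # Assumption: one extra position for the class token.
    return num_patches + (1 if class_token else 0)


if __name__ == "__main__":
    print(vit_sequence_length())
```

Each of those positions is a vector of `hidden_size` 768, which matches the decoder's `cross_attention_hidden_size` of 768 in the same config.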
--- a/src/models/ocr_model/train/dataset/formulas.jsonl
+++ b/src/models/ocr_model/train/dataset/formulas.jsonl
@@ -0,0 +1,35 @@
+{"img_name": "0.png", "formula": "\\[\\mathbb{C}^{4}\\stackrel{{\\pi_{1}}}{{\\longleftarrow}}\\mathcal{ F}\\stackrel{{\\pi_{2}}}{{\\rightarrow}}\\mathcal{PT},\\]"}
+{"img_name": "1.png", "formula": "\\[W^{*}_{Z}(x_{1},x_{2})=W_{f\\lrcorner Z}(y_{1},y_{2})=\\mathcal{P}\\exp\\left( \\int_{\\gamma}A_{\\mu}dx^{\\mu}\\right).\\]"}
+{"img_name": "2.png", "formula": "\\[G=W^{*}_{Z}(q,p)=\\tilde{H}H^{-1}\\]"}
+{"img_name": "3.png", "formula": "\\[H=W^{*}_{Z}(p,x),\\ \\ \\tilde{H}=W^{*}_{Z}(q,x).\\]"}
+{"img_name": "4.png", "formula": "\\[v\\cdot f^{*}A|_{x}=(f\\lrcorner Z)_{*}v\\cdot A|_{f\\lrcorner Z(x)},\\quad x\\in Z, \\ v\\in T_{x}Z.\\]"}
+{"img_name": "5.png", "formula": "\\[(f\\lrcorner Z)_{*}v\\cdot A|_{f\\lrcorner Z(x)}=v^{\\alpha\\dot{\\alpha}}\\Big{(} \\frac{\\partial y^{\\beta\\dot{\\beta}}}{\\partial x^{\\alpha\\dot{\\alpha}}}A_{\\beta \\dot{\\beta}}\\Big{)}\\Big{|}_{f\\lrcorner Z(x)},\\ x\\in Z,\\ v\\in T_{x}Z,\\]"}
+{"img_name": "6.png", "formula": "\\[\\{T_{i},T_{j}\\}=\\{\\tilde{T}^{i},\\tilde{T}^{j}\\}=0,\\ \\ \\{T_{i},\\tilde{T}^{j}\\}=2i \\delta^{j}_{i}D,\\]"}
+{"img_name": "7.png", "formula": "\\[(\\partial_{s},q_{i},\\tilde{q}^{k})\\rightarrow(D,M^{j}_{i}T_{j},\\tilde{M}^{k}_ {l}\\tilde{T}^{l}),\\]"}
+{"img_name": "8.png", "formula": "\\[M^{i}_{j}\\tilde{M}^{j}_{k}=\\delta^{i}_{k}.\\]"}
+{"img_name": "9.png", "formula": "\\[Q_{i\\alpha}=q_{i\\alpha}+\\omega_{i\\alpha},\\ \\tilde{Q}^{i}_{\\dot{\\alpha}}=q^{i}_{ \\dot{\\alpha}}+\\tilde{\\omega}^{i}_{\\dot{\\alpha}},\\ D_{\\alpha\\dot{\\alpha}}= \\partial_{\\alpha\\dot{\\alpha}}+A_{\\alpha\\dot{\\alpha}}.\\]"}
+{"img_name": "10.png", "formula": "\\[\\hat{f}(g,\\theta^{i\\alpha},\\tilde{\\theta}^{\\dot{\\alpha}}_{j})=(f(g),[V^{-1}]^ {\\alpha}_{\\beta}\\theta^{i\\beta},[\\tilde{V}^{-1}]^{\\dot{\\alpha}}_{\\dot{\\beta}} \\tilde{\\theta}^{\\dot{\\beta}}_{j}),\\ g\\in{\\cal G},\\]"}
+{"img_name": "11.png", "formula": "\\[v^{\\beta\\dot{\\beta}}V^{\\alpha}_{\\beta}\\tilde{V}^{\\dot{\\alpha}}_{\\dot{\\beta}} =((f\\lrcorner L_{0})_{*}v)^{\\alpha\\dot{\\alpha}},\\]"}
+{"img_name": "12.png", "formula": "\\[\\omega_{i\\alpha}=\\tilde{\\theta}^{\\dot{\\alpha}}_{i}h_{\\alpha\\dot{\\alpha}}(x^{ \\beta\\dot{\\beta}},\\tau^{\\beta\\dot{\\beta}}),\\ \\ \\tilde{\\omega}^{i}_{\\alpha}=\\theta^{i\\alpha}\\tilde{h}_{\\alpha\\dot{\\alpha}}(x^{ \\beta\\dot{\\beta}},\\tau^{\\beta\\dot{\\beta}}),\\]"}
+{"img_name": "13.png", "formula": "\\[\\begin{split}&\\lambda^{\\alpha}\\hat{f}^{*}\\omega_{i\\alpha}(z)= \\tilde{\\theta}^{\\dot{\\beta}}_{i}\\lambda^{\\alpha}\\left(V^{\\beta}_{\\alpha}h_{ \\beta\\dot{\\beta}}(x^{\\prime},\\tau^{\\prime})\\right),\\\\ &\\tilde{\\lambda}^{\\dot{\\alpha}}\\hat{f}^{*}\\tilde{\\omega}^{i}_{ \\dot{\\alpha}}(z)=\\theta^{i\\beta}\\tilde{\\lambda}^{\\dot{\\alpha}}\\left(\\tilde{V}^ {\\dot{\\beta}}_{\\dot{\\alpha}}\\tilde{h}_{\\beta\\dot{\\beta}}(x^{\\prime},\\tau^{ \\prime})\\right),\\end{split}\\]"}
+{"img_name": "14.png", "formula": "\\[A_{\\alpha\\dot{\\alpha}}=A_{\\alpha\\dot{\\alpha}}(x^{\\beta\\dot{\\beta}},\\tau^{ \\beta\\dot{\\beta}})\\]"}
+{"img_name": "15.png", "formula": "\\[D=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}D_{\\alpha\\dot{\\alpha}}\\]"}
+{"img_name": "16.png", "formula": "\\[D=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}\\partial_{\\alpha\\dot{\\alpha}}\\]"}
+{"img_name": "17.png", "formula": "\\[[v_{1}\\cdot D^{*},v_{2}\\cdot D^{*}]=0\\]"}
+{"img_name": "18.png", "formula": "\\[\\Phi_{A}=(\\omega_{i\\alpha},\\tilde{\\omega}^{i}_{\\dot{\\alpha}},A_{\\alpha\\dot{ \\alpha}})\\]"}
+{"img_name": "19.png", "formula": "\\[\\hat{f}:{\\cal F}^{6|4N}\\rightarrow{\\cal F}^{6|4N}\\]"}
+{"img_name": "20.png", "formula": "\\[\\sigma=(s,\\xi^{i},\\tilde{\\xi}_{j})\\in\\mathbb{C}^{1|2N}\\]"}
+{"img_name": "21.png", "formula": "\\[\\tau^{\\alpha\\dot{\\alpha}}(h_{\\alpha\\dot{\\alpha}}+\\tilde{h}_{\\alpha\\dot{\\alpha} })=0\\]"}
+{"img_name": "22.png", "formula": "\\[\\tau^{\\alpha\\dot{\\alpha}}\\rightarrow[V^{-1}]^{\\alpha}_{\\beta}[\\tilde{V}^{-1}]^{ \\dot{\\alpha}}_{\\dot{\\beta}}\\tau^{\\beta\\dot{\\beta}}\\]"}
+{"img_name": "23.png", "formula": "\\[\\tau^{\\beta\\dot{\\beta}}=\\sum_{i}\\theta^{i\\beta}\\tilde{\\theta}^{\\dot{\\beta}}_{i}\\]"}
+{"img_name": "24.png", "formula": "\\[\\theta^{i\\alpha}\\omega_{i\\alpha}+\\tilde{\\theta}^{i}_{\\dot{\\alpha}}\\tilde{ \\omega}^{\\dot{\\alpha}}_{i}=0\\]"}
+{"img_name": "25.png", "formula": "\\[\\tilde{T}^{i}=\\tilde{\\lambda}^{\\dot{\\alpha}}\\tilde{Q}^{i}_{\\dot{\\alpha}}\\]"}
+{"img_name": "26.png", "formula": "\\[\\tilde{T}^{i}=\\tilde{\\lambda}^{\\dot{\\alpha}}\\tilde{q}^{i}_{\\dot{\\alpha}}\\]"}
+{"img_name": "27.png", "formula": "\\[\\tilde{\\lambda}^{\\dot{\\alpha}}f^{*}A_{\\alpha\\dot{\\alpha}}=H^{-1}\\tilde{ \\lambda}^{\\dot{\\alpha}}\\partial_{\\alpha\\dot{\\alpha}}H\\]"}
+{"img_name": "28.png", "formula": "\\[\\tilde{q}^{i}=\\partial_{\\tilde{\\xi}_{i}}+i\\xi^{i}\\partial_{s}\\]"}
+{"img_name": "29.png", "formula": "\\[\\tilde{q}^{i}_{\\dot{\\alpha}}=\\frac{\\partial}{\\partial\\tilde{\\theta}^{\\dot{ \\alpha}}_{i}}+i\\theta^{i\\alpha}\\frac{\\partial}{\\partial x^{\\alpha\\dot{\\alpha}}}\\]"}
+{"img_name": "30.png", "formula": "\\[f\\lrcorner L(z)=\\pi_{1}\\circ f(z,\\lambda,\\tilde{\\lambda})\\ \\forall z\\in L\\]"}
+{"img_name": "31.png", "formula": "\\[q_{i\\alpha}=\\frac{\\partial}{\\partial\\theta^{i\\alpha}}+i\\tilde{\\theta}^{\\dot{ \\alpha}}_{i}\\frac{\\partial}{\\partial x^{\\alpha\\dot{\\alpha}}}\\]"}
+{"img_name": "32.png", "formula": "\\[q_{i}=\\partial_{\\xi^{i}}+i\\tilde{\\xi}_{i}\\partial_{s}\\]"}
+{"img_name": "33.png", "formula": "\\[v^{\\alpha\\dot{\\alpha}}=\\lambda^{\\alpha}\\tilde{\\lambda}^{\\dot{\\alpha}}\\]"}
+{"img_name": "34.png", "formula": "\\[z^{A}=(x^{\\alpha\\dot{\\alpha}},\\theta^{i\\alpha},\\tilde{\\theta}^{\\dot{\\alpha}}_{ j})\\]"}
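Each line of `formulas.jsonl` is one JSON object, and the backslashes of the LaTeX source are doubled because of JSON string escaping. A minimal sketch of reading one record with the standard `json` module; the record is copied from the file above, and `parse_record` is a hypothetical helper name:

```python
import json
from typing import Tuple

# Raw string, so Python itself leaves the doubled backslashes untouched
# and json.loads sees exactly what is stored in the file.
LINE = r'{"img_name": "2.png", "formula": "\\[G=W^{*}_{Z}(q,p)=\\tilde{H}H^{-1}\\]"}'


def parse_record(line: str) -> Tuple[str, str]:
    """Decode one JSONL line into (image name, LaTeX formula)."""
    record = json.loads(line)
    return record["img_name"], record["formula"]


if __name__ == "__main__":
    name, formula = parse_record(LINE)
    print(name, formula)
```

After decoding, the formula contains single backslashes again, so it can be handed to a LaTeX renderer or tokenizer as-is.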
--- a/src/models/ocr_model/train/dataset/images/0.png
+++ b/src/models/ocr_model/train/dataset/images/0.png
--- a/src/models/ocr_model/train/dataset/images/1.png
+++ b/src/models/ocr_model/train/dataset/images/1.png
--- a/src/models/ocr_model/train/dataset/images/10.png
+++ b/src/models/ocr_model/train/dataset/images/10.png
--- a/src/models/ocr_model/train/dataset/images/11.png
+++ b/src/models/ocr_model/train/dataset/images/11.png
--- a/src/models/ocr_model/train/dataset/images/12.png
+++ b/src/models/ocr_model/train/dataset/images/12.png
--- a/src/models/ocr_model/train/dataset/images/13.png
+++ b/src/models/ocr_model/train/dataset/images/13.png
--- a/src/models/ocr_model/train/dataset/images/14.png
+++ b/src/models/ocr_model/train/dataset/images/14.png
--- a/src/models/ocr_model/train/dataset/images/15.png
+++ b/src/models/ocr_model/train/dataset/images/15.png
--- a/src/models/ocr_model/train/dataset/images/16.png
+++ b/src/models/ocr_model/train/dataset/images/16.png
--- a/src/models/ocr_model/train/dataset/images/17.png
+++ b/src/models/ocr_model/train/dataset/images/17.png
--- a/src/models/ocr_model/train/dataset/images/18.png
+++ b/src/models/ocr_model/train/dataset/images/18.png
--- a/src/models/ocr_model/train/dataset/images/19.png
+++ b/src/models/ocr_model/train/dataset/images/19.png
--- a/src/models/ocr_model/train/dataset/images/2.png
+++ b/src/models/ocr_model/train/dataset/images/2.png
--- a/src/models/ocr_model/train/dataset/images/20.png
+++ b/src/models/ocr_model/train/dataset/images/20.png
--- a/src/models/ocr_model/train/dataset/images/21.png
+++ b/src/models/ocr_model/train/dataset/images/21.png
--- a/src/models/ocr_model/train/dataset/images/22.png
+++ b/src/models/ocr_model/train/dataset/images/22.png
--- a/src/models/ocr_model/train/dataset/images/23.png
+++ b/src/models/ocr_model/train/dataset/images/23.png
--- a/src/models/ocr_model/train/dataset/images/24.png
+++ b/src/models/ocr_model/train/dataset/images/24.png
--- a/src/models/ocr_model/train/dataset/images/25.png
+++ b/src/models/ocr_model/train/dataset/images/25.png
--- a/src/models/ocr_model/train/dataset/images/26.png
+++ b/src/models/ocr_model/train/dataset/images/26.png
--- a/src/models/ocr_model/train/dataset/images/27.png
+++ b/src/models/ocr_model/train/dataset/images/27.png
--- a/src/models/ocr_model/train/dataset/images/28.png
+++ b/src/models/ocr_model/train/dataset/images/28.png
--- a/src/models/ocr_model/train/dataset/images/29.png
+++ b/src/models/ocr_model/train/dataset/images/29.png
--- a/src/models/ocr_model/train/dataset/images/3.png
+++ b/src/models/ocr_model/train/dataset/images/3.png
--- a/src/models/ocr_model/train/dataset/images/30.png
+++ b/src/models/ocr_model/train/dataset/images/30.png
--- a/src/models/ocr_model/train/dataset/images/31.png
+++ b/src/models/ocr_model/train/dataset/images/31.png
--- a/src/models/ocr_model/train/dataset/images/32.png
+++ b/src/models/ocr_model/train/dataset/images/32.png
--- a/src/models/ocr_model/train/dataset/images/33.png
+++ b/src/models/ocr_model/train/dataset/images/33.png
--- a/src/models/ocr_model/train/dataset/images/34.png
+++ b/src/models/ocr_model/train/dataset/images/34.png
--- a/src/models/ocr_model/train/dataset/images/4.png
+++ b/src/models/ocr_model/train/dataset/images/4.png
--- a/src/models/ocr_model/train/dataset/images/5.png
+++ b/src/models/ocr_model/train/dataset/images/5.png
--- a/src/models/ocr_model/train/dataset/images/6.png
+++ b/src/models/ocr_model/train/dataset/images/6.png
--- a/src/models/ocr_model/train/dataset/images/7.png
+++ b/src/models/ocr_model/train/dataset/images/7.png
--- a/src/models/ocr_model/train/dataset/images/8.png
+++ b/src/models/ocr_model/train/dataset/images/8.png
--- a/src/models/ocr_model/train/dataset/images/9.png
+++ b/src/models/ocr_model/train/dataset/images/9.png
--- a/src/models/ocr_model/train/dataset/loader.py
+++ b/src/models/ocr_model/train/dataset/loader.py
@@ -0,0 +1,50 @@
+from PIL import Image
+from pathlib import Path
+import datasets
+import json
+
+DIR_URL = Path('absolute/path/to/dataset/directory')
+# e.g. DIR_URL = Path('/home/OleehyO/TeXTeller/src/models/ocr_model/train/dataset')
+
+
+class LatexFormulas(datasets.GeneratorBasedBuilder):
+    BUILDER_CONFIGS = []
+
+    def _info(self):
+        return datasets.DatasetInfo(
+            features=datasets.Features({
+                "image": datasets.Image(),
+                "latex_formula": datasets.Value("string")
+            })
+        )
+
+    def _split_generators(self, dl_manager: datasets.DownloadManager):
+        dir_path = Path(dl_manager.download(str(DIR_URL)))
+        assert dir_path.is_dir()
+
+        return [
+            datasets.SplitGenerator(
+                name=datasets.Split.TRAIN,
+                gen_kwargs={
+                    'dir_path': dir_path,
+                }
+            )
+        ]
+
+    def _generate_examples(self, dir_path: Path):
+        images_path   = dir_path / 'images'
+        formulas_path = dir_path / 'formulas.jsonl'
+
+        img2formula = {}
+        with formulas_path.open('r', encoding='utf-8') as f:
+            for line in f:
+                single_json = json.loads(line)
+                img2formula[single_json['img_name']] = single_json['formula']
+
+        for img_path in images_path.iterdir():
+            if img_path.suffix not in ['.jpg', '.png']:
+                continue
+            yield str(img_path), {
+                "image": Image.open(img_path),
+                "latex_formula": img2formula[img_path.name]
+            }
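`_generate_examples` above looks up every image's label with `img2formula[img_path.name]`, which raises `KeyError` for an image that has no line in `formulas.jsonl`, and its suffix check is case-sensitive. The following sketch shows a more forgiving variant of the same pairing logic using only the standard library. `load_labels`, `pair_images` and the skip-on-missing behaviour are assumptions for illustration, not what the loader in this patch does:

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, Tuple


def load_labels(lines: Iterable[str]) -> Dict[str, str]:
    """Build the image-name -> formula mapping from JSONL lines."""
    labels: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        labels[record["img_name"]] = record["formula"]
    return labels


def pair_images(image_names: Iterable[str],
                labels: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (image name, formula) pairs, skipping unusable entries."""
    for name in sorted(image_names):
        # Case-insensitive suffix check (the patch compares case-sensitively).
        if Path(name).suffix.lower() not in {".jpg", ".png"}:
            continue
        formula = labels.get(name)
        if formula is None:  # image without a label: skip instead of KeyError
            continue
        yield name, formula
```

Whether unlabeled images should be skipped or treated as an error depends on the dataset; failing loudly, as the patch does, is the safer choice when every image is expected to have a formula.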
--- a/src/models/ocr_model/train/fonts/JINKY.ttf
+++ b/src/models/ocr_model/train/fonts/JINKY.ttf
--- a/src/models/ocr_model/train/fonts/Rotodesign
+++ b/src/models/ocr_model/train/fonts/Rotodesign
@@ -1,14 +0,0 @@
-Congratulations on your download of this fine Rotodesign brand font product. We hope it will bring you many hours of typesetting pleasure and riches beyond your wildest dreams. We DO NOT, however, guarantee either of these things. Your mileage may vary. 
-
-This font is freeware, and is provided with no warranties as to its quality or its utility. After all, how much did you pay? Anyway, this font can be copied and used as you wish provided all copies include this readme file. Don't lie to your friends and tell 'em you made it yourself. You only cheat yourself when you do that. In the unlikely event you use this font to design something really cool or that makes you a ton of cash money, that's okay with me, just send me a copy or two of the finished item, and remember me when you get rich and famous. Enjoy!
-
-©2006 
-Patrick Broderick
-Rotodesign
-
-http://www.rotodesign.com
-roto@rotodesign.net
-
-Rotodesign
-1288 Columbus Ave. #176
-San Francisco, CA 94133
--- a/src/models/ocr_model/train/fonts/font_type.zip
+++ b/src/models/ocr_model/train/fonts/font_type.zip
--- a/src/models/ocr_model/train/google_bleu/google_bleu.py
+++ b/src/models/ocr_model/train/google_bleu/google_bleu.py
@@ -1,168 +0,0 @@
-# Copyright 2020 The HuggingFace Evaluate Authors.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-""" Google BLEU (aka GLEU) metric. """
-
-from typing import Dict, List
-
-import datasets
-from nltk.translate import gleu_score
-
-import evaluate
-from evaluate import MetricInfo
-
-from .tokenizer_13a import Tokenizer13a
-
-
-_CITATION = """\
-@misc{wu2016googles,
-      title={Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation},
-      author={Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey
-              and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin
-              Johnson and Xiaobing Liu and Łukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto
-              Kazawa and Keith Stevens and George Kurian and Nishant Patil and Wei Wang and Cliff Young and
-              Jason Smith and Jason Riesa and Alex Rudnick and Oriol Vinyals and Greg Corrado and Macduff Hughes
-              and Jeffrey Dean},
-      year={2016},
-      eprint={1609.08144},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
-"""
-
-_DESCRIPTION = """\
-The BLEU score has some undesirable properties when used for single
-sentences, as it was designed to be a corpus measure. We therefore
-use a slightly different score for our RL experiments which we call
-the 'GLEU score'. For the GLEU score, we record all sub-sequences of
-1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
-compute a recall, which is the ratio of the number of matching n-grams
-to the number of total n-grams in the target (ground truth) sequence,
-and a precision, which is the ratio of the number of matching n-grams
-to the number of total n-grams in the generated output sequence. Then
-GLEU score is simply the minimum of recall and precision. This GLEU
-score's range is always between 0 (no matches) and 1 (all match) and
-it is symmetrical when switching output and target. According to
-our experiments, GLEU score correlates quite well with the BLEU
-metric on a corpus level but does not have its drawbacks for our per
-sentence reward objective.
-"""
-
-_KWARGS_DESCRIPTION = """\
-Computes corpus-level Google BLEU (GLEU) score of translated segments against one or more references.
-Instead of averaging the sentence level GLEU scores (i.e. macro-average precision), Wu et al. (2016) sum up the matching
-tokens and the max of hypothesis and reference tokens for each sentence, then compute using the aggregate values.
-
-Args:
-    predictions (list of str): list of translations to score.
-    references (list of list of str): list of lists of references for each translation.
-    tokenizer : approach used for tokenizing `predictions` and `references`.
-        The default tokenizer is `tokenizer_13a`, a minimal tokenization approach that is equivalent to `mteval-v13a`, used by WMT.
-        This can be replaced by any function that takes a string as input and returns a list of tokens as output.
-    min_len (int): The minimum order of n-gram this function should extract. Defaults to 1.
-    max_len (int): The maximum order of n-gram this function should extract. Defaults to 4.
-
-Returns:
-    'google_bleu': google_bleu score
-
-Examples:
-    Example 1:
-        >>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
-        'he read the book because he was interested in world history']
-        >>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat'], \
-        ['he was interested in world history because he read the book']]
-        >>> google_bleu = evaluate.load("google_bleu")
-        >>> results = google_bleu.compute(predictions=predictions, references=references)
-        >>> print(round(results["google_bleu"], 2))
-        0.44
-
-    Example 2:
-        >>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
-        'he read the book because he was interested in world history']
-        >>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
-        'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
-        'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
-        ['he was interested in world history because he read the book']]
-        >>> google_bleu = evaluate.load("google_bleu")
-        >>> results = google_bleu.compute(predictions=predictions, references=references)
-        >>> print(round(results["google_bleu"], 2))
-        0.61
-
-    Example 3:
-        >>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
-        'he read the book because he was interested in world history']
-        >>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
-        'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
-        'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
-        ['he was interested in world history because he read the book']]
-        >>> google_bleu = evaluate.load("google_bleu")
-        >>> results = google_bleu.compute(predictions=predictions, references=references, min_len=2)
-        >>> print(round(results["google_bleu"], 2))
-        0.53
-
-    Example 4:
-        >>> predictions = ['It is a guide to action which ensures that the rubber duck always disobeys the commands of the cat', \
-        'he read the book because he was interested in world history']
-        >>> references = [['It is the guiding principle which guarantees the rubber duck forces never being under the command of the cat', \
-        'It is a guide to action that ensures that the rubber duck will never heed the cat commands', \
-        'It is the practical guide for the rubber duck army never to heed the directions of the cat'], \
-        ['he was interested in world history because he read the book']]
-        >>> google_bleu = evaluate.load("google_bleu")
-        >>> results = google_bleu.compute(predictions=predictions,references=references, min_len=2, max_len=6)
-        >>> print(round(results["google_bleu"], 2))
-        0.4
-"""
-
-
-@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
-class GoogleBleu(evaluate.Metric):
-    def _info(self) -> MetricInfo:
-        return evaluate.MetricInfo(
-            description=_DESCRIPTION,
-            citation=_CITATION,
-            inputs_description=_KWARGS_DESCRIPTION,
-            features=[
-                datasets.Features(
-                    {
-                        "predictions": datasets.Value("string", id="sequence"),
-                        "references": datasets.Sequence(datasets.Value("string", id="sequence"), id="references"),
-                    }
-                ),
-                datasets.Features(
-                    {
-                        "predictions": datasets.Value("string", id="sequence"),
-                        "references": datasets.Value("string", id="sequence"),
-                    }
-                ),
-            ],
-        )
-
-    def _compute(
-        self,
-        predictions: List[str],
-        references: List[List[str]],
-        tokenizer=Tokenizer13a(),
-        min_len: int = 1,
-        max_len: int = 4,
-    ) -> Dict[str, float]:
-        # if only one reference is provided make sure we still use list of lists
-        if isinstance(references[0], str):
-            references = [[ref] for ref in references]
-
-        references = [[tokenizer(r) for r in ref] for ref in references]
-        predictions = [tokenizer(p) for p in predictions]
-        return {
-            "google_bleu": gleu_score.corpus_gleu(
-                list_of_references=references, hypotheses=predictions, min_len=min_len, max_len=max_len
-            )
-        }
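As a rough illustration of the per-sentence GLEU described in the removed docstring above, the following standalone sketch computes the minimum of n-gram precision and recall for one hypothesis against one reference. The names `ngram_counts` and `sentence_gleu` are hypothetical and are not part of the deleted module, which delegated to `gleu_score.corpus_gleu`. The sketch assumes pre-tokenized input and leaves out the corpus-level aggregation.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], min_len: int = 1, max_len: int = 4) -> Counter:
    """Count every n-gram of order min_len..max_len in a token list."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(
    reference: List[str],
    hypothesis: List[str],
    min_len: int = 1,
    max_len: int = 4,
) -> float:
    """Hypothetical helper: min(precision, recall) over matching n-grams."""
    ref = ngram_counts(reference, min_len, max_len)
    hyp = ngram_counts(hypothesis, min_len, max_len)
    # Counter intersection keeps the minimum count of each shared n-gram.
    matches = sum((ref & hyp).values())
    total_ref = sum(ref.values())
    total_hyp = sum(hyp.values())
    if total_ref == 0 or total_hyp == 0:
        return 0.0
    recall = matches / total_ref
    precision = matches / total_hyp
    return min(recall, precision)


score = sentence_gleu("the cat sat".split(), "the cat".split())
print(score)  # 0.5: 3 matching n-grams, 6 in the reference, 3 in the hypothesis
```

Because the score is the minimum of precision and recall, swapping the two arguments leaves the result unchanged, which is the symmetry the docstring mentions.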
--- a/src/models/ocr_model/train/google_bleu/tokenizer_13a.py
+++ b/src/models/ocr_model/train/google_bleu/tokenizer_13a.py
@@ -1,100 +0,0 @@
-# Source: https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/tokenizers/tokenizer_13a.py
-# Copyright 2020 SacreBLEU Authors.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import re
-from functools import lru_cache
-
-
-class BaseTokenizer:
-    """A base dummy tokenizer to derive from."""
-
-    def signature(self):
-        """
-        Returns a signature for the tokenizer.
-        :return: signature string
-        """
-        return "none"
-
-    def __call__(self, line):
-        """
-        Tokenizes an input line with the tokenizer.
-        :param line: a segment to tokenize
-        :return: the tokenized line
-        """
-        return line
-
-
-class TokenizerRegexp(BaseTokenizer):
-    def signature(self):
-        return "re"
-
-    def __init__(self):
-        self._re = [
-            # language-dependent part (assuming Western languages)
-            (re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])"), r" \1 "),
-            # tokenize period and comma unless preceded by a digit
-            (re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
-            # tokenize period and comma unless followed by a digit
-            (re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
-            # tokenize dash when preceded by a digit
-            (re.compile(r"([0-9])(-)"), r"\1 \2 "),
-            # one space only between words
-            # NOTE: Doing this in Python (below) is faster
-            # (re.compile(r'\s+'), r' '),
-        ]
-
-    @lru_cache(maxsize=2**16)
-    def __call__(self, line):
-        """Common post-processing tokenizer for `13a` and `zh` tokenizers.
-        :param line: a segment to tokenize
-        :return: the tokenized line
-        """
-        for (_re, repl) in self._re:
-            line = _re.sub(repl, line)
-
-        # no leading or trailing spaces, single space within words
-        # return ' '.join(line.split())
-        # This line is changed with regards to the original tokenizer (seen above) to return individual words
-        return line.split()
-
-
-class Tokenizer13a(BaseTokenizer):
-    def signature(self):
-        return "13a"
-
-    def __init__(self):
-        self._post_tokenizer = TokenizerRegexp()
-
-    @lru_cache(maxsize=2**16)
-    def __call__(self, line):
-        """Tokenizes an input line using a relatively minimal tokenization
-        that is however equivalent to mteval-v13a, used by WMT.
-
-        :param line: a segment to tokenize
-        :return: the tokenized line
-        """
-
-        # language-independent part:
-        line = line.replace("<skipped>", "")
-        line = line.replace("-\n", "")
-        line = line.replace("\n", " ")
-
-        if "&" in line:
-            line = line.replace("&quot;", '"')
-            line = line.replace("&amp;", "&")
-            line = line.replace("&lt;", "<")
-            line = line.replace("&gt;", ">")
-
-        return self._post_tokenizer(f" {line} ")
--- a/src/models/ocr_model/train/train.py
+++ b/src/models/ocr_model/train/train.py
@@ -4,7 +4,13 @@ from functools import partial
 from pathlib import Path

 from datasets import load_dataset
-from transformers import Trainer, TrainingArguments, Seq2SeqTrainer, Seq2SeqTrainingArguments, GenerationConfig
+from transformers import (
+    Trainer, 
+    TrainingArguments, 
+    Seq2SeqTrainer, 
+    Seq2SeqTrainingArguments, 
+    GenerationConfig
+)

 from .training_args import CONFIG
 from ..model.TexTeller import TexTeller
@@ -15,17 +21,6 @@ from ...globals import MAX_TOKEN_SIZE, MIN_WIDTH, MIN_HEIGHT

 def train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer):
    training_args = TrainingArguments(**CONFIG)
-    debug_mode = False
-    if debug_mode:
-        training_args.auto_find_batch_size = False
-        training_args.num_train_epochs = 2
-        # training_args.per_device_train_batch_size = 3
-        training_args.per_device_train_batch_size = 2
-        training_args.per_device_eval_batch_size = 2 * training_args.per_device_train_batch_size
-        training_args.jit_mode_eval = False
-        training_args.torch_compile = False
-        training_args.dataloader_num_workers = 1
-    
    trainer = Trainer(
        model,
        training_args,
@@ -38,14 +33,13 @@ def train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokeniz
    )

    trainer.train(resume_from_checkpoint=None)
-    # trainer.train(resume_from_checkpoint='/home/lhy/code/TexTeller/src/models/ocr_model/train/train_result/TexTellerv2/checkpoint-288000')


 def evaluate(model, tokenizer, eval_dataset, collate_fn):
    eval_config = CONFIG.copy()
    eval_config['predict_with_generate'] = True
    generate_config = GenerationConfig(
-        max_length=MAX_TOKEN_SIZE-100,
+        max_new_tokens=MAX_TOKEN_SIZE,
        num_beams=1,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
@@ -53,7 +47,6 @@ def evaluate(model, tokenizer, eval_dataset, collate_fn):
        bos_token_id=tokenizer.bos_token_id,
    )
    eval_config['generation_config'] = generate_config
-    eval_config['auto_find_batch_size'] = False
    seq2seq_config = Seq2SeqTrainingArguments(**eval_config)

    trainer = Seq2SeqTrainer(
@@ -66,49 +59,51 @@ def evaluate(model, tokenizer, eval_dataset, collate_fn):
        compute_metrics=partial(bleu_metric, tokenizer=tokenizer)
    )

-    res = trainer.evaluate()
-    print(res)
+    eval_res = trainer.evaluate()
+    print(eval_res)
    

 if __name__ == '__main__':
-    cur_path = os.getcwd()
    script_dirpath = Path(__file__).resolve().parent
    os.chdir(script_dirpath)

-    dataset = load_dataset(
-        '/home/lhy/code/TexTeller/src/models/ocr_model/train/data/loader.py'
-    )['train']
-    tokenizer = TexTeller.get_tokenizer('/home/lhy/code/TexTeller/src/models/tokenizer/roberta-tokenizer-7Mformulas')
-    filter_fn_with_tokenizer = partial(filter_fn, tokenizer=tokenizer)
-
-    # dataset = dataset.filter(lambda x: x['image'].height > MIN_HEIGHT and x['image'].width > MIN_WIDTH)
-    dataset = dataset.filter(filter_fn_with_tokenizer, num_proc=16)
+    dataset = load_dataset(str(Path('./dataset/loader.py').resolve()))['train']
+    dataset = dataset.filter(lambda x: x['image'].height > MIN_HEIGHT and x['image'].width > MIN_WIDTH)
    dataset = dataset.shuffle(seed=42)
    dataset = dataset.flatten_indices()

+    tokenizer = TexTeller.get_tokenizer()
+    # If you want to use your own tokenizer, please modify the path to your tokenizer
+    #+tokenizer = TexTeller.get_tokenizer('/path/to/your/tokenizer')
+    filter_fn_with_tokenizer = partial(filter_fn, tokenizer=tokenizer)
+    dataset = dataset.filter(
+        filter_fn_with_tokenizer,
+        num_proc=8
+    )
+
    map_fn = partial(tokenize_fn, tokenizer=tokenizer)
-    tokenized_dataset = dataset.map(map_fn, batched=True, remove_columns=dataset.column_names, num_proc=8, load_from_cache_file=True)
+    tokenized_dataset = dataset.map(map_fn, batched=True, remove_columns=dataset.column_names, num_proc=8)

-    split_dataset = tokenized_dataset.train_test_split(test_size=0.005, seed=42)
+    # Split dataset into train and eval, ratio 9:1
+    split_dataset = tokenized_dataset.train_test_split(test_size=0.1, seed=42)    
    train_dataset, eval_dataset = split_dataset['train'], split_dataset['test']
-
    train_dataset = train_dataset.with_transform(img_train_transform)
    eval_dataset  = eval_dataset.with_transform(img_inf_transform)
-
    collate_fn_with_tokenizer = partial(collate_fn, tokenizer=tokenizer)
-    # model = TexTeller()
-    model = TexTeller.from_pretrained('/home/lhy/code/TexTeller/src/models/ocr_model/model/ckpt')

-    # =================  debug  =======================
-    # foo = train_dataset[:50]
-    # bar = eval_dataset[:50]
-    # =================  debug  =======================
+    # Train from scratch
+    model = TexTeller()
+    # or train from TexTeller pre-trained model: model = TexTeller.from_pretrained()
+
+    # If you want to train from a pre-trained model, please modify the path to your pre-trained checkpoint
+    #+e.g.
+    #+model = TexTeller.from_pretrained(
+    #+    '/path/to/your/model_checkpoint'
+    #+)

    enable_train = True
-    enable_evaluate = True
+    enable_evaluate = False
    if enable_train:
        train(model, tokenizer, train_dataset, eval_dataset, collate_fn_with_tokenizer)  
-    if enable_evaluate:
+    if enable_evaluate and len(eval_dataset) > 0:
        evaluate(model, tokenizer, eval_dataset, collate_fn_with_tokenizer)
-
-    os.chdir(cur_path)
--- a/src/models/ocr_model/train/training_args.py
+++ b/src/models/ocr_model/train/training_args.py
@@ -1,84 +1,38 @@
 CONFIG = {
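-    "seed": 42,                            # Random seed, used to ensure experiment reproducibility
-    "use_cpu": False,                      # Whether to use the CPU (running on the CPU first makes debugging easier when you start testing the code)
-    # "data_seed": 42,                     # Also fix the sampling of the data sampler
-    # "full_determinism": True,            # Make the whole training run fully deterministic (this setting hurts model training; use it only for debugging)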
+    "seed": 42,                            # Random seed for reproducibility
+    "use_cpu": False,                      # Whether to use CPU (it's easier to debug with CPU when starting to test the code)
+    "learning_rate": 5e-5,                 # Learning rate
+    "num_train_epochs": 10,                # Total number of training epochs
+    "per_device_train_batch_size": 4,      # Batch size per GPU for training
+    "per_device_eval_batch_size": 8,       # Batch size per GPU for evaluation

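-    "output_dir": "train_result/TexTellerv3",          # Output directory
-    "overwrite_output_dir": False,         # If the output directory exists, do not delete its existing content
-    "report_to": ["tensorboard"],          # Report logs to TensorBoard;
-                                           #+view the logs by running on the command line: tensorboard --logdir ./logs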
+    "output_dir": "train_result",          # Output directory
+    "overwrite_output_dir": False,         # If the output directory exists, do not delete its content
+    "report_to": ["tensorboard"],          # Report logs to TensorBoard

-    "logging_dir": None,                   # Directory for TensorBoard log files (use the default)
-    "log_level": "warning",                   # Other options: 'debug', 'info', 'warning', 'error' and 'critical' (from lowest to highest level)
-    "logging_strategy": "steps",           # Log every certain number of steps
-    "logging_steps": 4000,                  # Number of steps between logs; can be an int or a float in (0~1); a float is a ratio of the total training steps (e.g., 1.0 / 2000)
-                                           #+usually the same as eval_steps
-    "logging_nan_inf_filter": False,       # Record logs for loss=nan or inf
+    "save_strategy": "steps",              # Strategy to save checkpoints
+    "save_steps": 500,                     # Interval of steps to save checkpoints, can be int or a float (0~1), when float it represents the ratio of total training steps (e.g., can set to 1.0 / 2000)
+    "save_total_limit": 5,                 # Maximum number of models to save. The oldest models will be deleted if this number is exceeded

-    "num_train_epochs": 4,                # Total number of training epochs
-    # "max_steps": 3,                      # Maximum number of training steps. If this parameter is set,
-                                           #+num_train_epochs is ignored (usually used for debugging)
+    "logging_strategy": "steps",           # Log every certain number of steps
+    "logging_steps": 500,                  # Number of steps between each log
+    "logging_nan_inf_filter": False,       # Record logs for loss=nan or inf

-    # "label_names": ['your_label_name'],  # Label names in the data_loader; defaults to 'labels' if not specified
+    "optim": "adamw_torch",                # Optimizer
+    "lr_scheduler_type": "cosine",         # Learning rate scheduler
+    "warmup_ratio": 0.1,                   # Ratio of warmup steps in total training steps (e.g., for 1000 steps, the first 100 steps gradually increase lr from 0 to the set lr)
+    "max_grad_norm": 1.0,                  # For gradient clipping, ensure the norm of the gradients does not exceed 1.0 (default 1.0)
+    "fp16": False,                         # Whether to use 16-bit floating point for training (generally not recommended, as loss can easily explode)
+    "bf16": False,                         # Whether to use Brain Floating Point (bfloat16) for training (recommended if architecture supports it)
+    "gradient_accumulation_steps": 1,      # Gradient accumulation steps; use this to get the effect of a large batch size when the per-device batch size cannot be large
+    "jit_mode_eval": False,                # Whether to use PyTorch jit trace during eval (can speed up the model, but the model must be static, otherwise will throw errors)
+    "torch_compile": False,                # Whether to use torch.compile to compile the model (for better training and inference performance)

-    "per_device_train_batch_size": 3,    # Batch size per GPU
-    "per_device_eval_batch_size": 6,      # Evaluation batch size per GPU
-    # "auto_find_batch_size": True,          # Automatically search for a suitable batch size (exponential decay)
-    "auto_find_batch_size": False,          # Automatically search for a suitable batch size (exponential decay)
+    "dataloader_pin_memory": True,         # Can speed up data transfer between CPU and GPU
+    "dataloader_num_workers": 1,           # Number of data loading worker processes (the library default does not use multiprocessing); usually set to 4 * the number of GPUs used

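-    "optim": "adamw_torch",                # Many AdamW variants are also provided (more efficient than the classic AdamW)
-                                           #+once optim is set, there is no need to pass an optimizer to the Trainer
-    "lr_scheduler_type": "cosine",         # Set the lr_scheduler
-    "warmup_ratio": 0.1,                   # Ratio of warmup steps to total training steps (e.g., with 1000 training steps, the first 100 steps gradually increase lr from 0 to the configured lr)
-    # "warmup_steps": 500,                 # Number of warmup steps; this parameter conflicts with warmup_ratio
-    "weight_decay": 0,                     # Weight decay
-    "learning_rate": 5e-5,                 # Learning rate
-    "max_grad_norm": 1.0,                  # For gradient clipping; ensures the gradient norm does not exceed 1.0 (default 1.0)
-    "fp16": False,                         # Whether to train with 16-bit floating point (generally not recommended; the loss explodes easily)
-    "bf16": False,                         # Whether to train with bfloat16 (recommended if the architecture supports it)
-    "gradient_accumulation_steps": 2,      # Gradient accumulation steps; when the batch size cannot be made large, consider this parameter to get the effect of a large batch size
-    "gradient_checkpointing": False,       # When True, some intermediate values (used for backward) are discarded during forward to reduce GPU memory pressure (at the cost of a longer forward pass)
-    "label_smoothing_factor": 0.0,         # Soft labels; 0 means disabled
-    # "debug": "underflow_overflow",       # Check for overflow during training and warn if it occurs (this mode is usually used for debugging)
-    "jit_mode_eval": False,                 # Whether to use PyTorch jit trace during eval (can speed up the model, but the model must be static, otherwise it raises errors)
-    "torch_compile": False,                 # Whether to use torch.compile to compile the model (for better training and inference performance)
-                                           #+ requires torch > 2.0; this feature works well and can be turned on once the model runs end to end
-    # "deepspeed": "your_json_path",       #  Train with deepspeed; requires the path to ds_config.json
-                                           #+ when using Deepspeed with the Trainer, make sure the settings in ds_config.json match the Trainer's (learning rate, batch size, gradient accumulation steps, etc.)
-                                           #+ if they do not match, very strange bugs appear (and they are usually hard to find)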
+    "evaluation_strategy": "steps",        # Evaluation strategy, can be "steps" or "epoch"
+    "eval_steps": 500,                     # Number of steps between evaluations when evaluation_strategy="steps"

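-    "dataloader_pin_memory": True,         # Can speed up data transfer between CPU and GPU
-    "dataloader_num_workers": 16,          # By default multiprocessing is not used for data loading; usually set to 4 * the number of GPUs used
-    "dataloader_drop_last": True,          # Drop the last minibatch to keep training gradients stable
-
-    "evaluation_strategy": "steps",        # Evaluation strategy, can be "steps" or "epoch"
-    "eval_steps": 4000,                     # if evaluation_strategy="step"
-                                           #+defaults to the same value as logging_steps; can be an int or a float in (0~1); a float is a ratio of the total training steps (e.g., 1.0 / 2000)
-
-    "save_strategy": "steps",              # Strategy for saving checkpoints
-    "save_steps": 4000,                     # Number of steps between checkpoint saves; can be an int or a float in (0~1); a float is a ratio of the total training steps (e.g., 1.0 / 2000)
-    "save_total_limit": 10,                 # Maximum number of saved models. If exceeded, the oldest models are deleted
-
-    "load_best_model_at_end": True,        # Whether to load the best model at the end of training
-                                           #+when True, the checkpoint with the best evaluation result during training is kept
-                                           #+when True, evaluation_strategy must match save_strategy, and save_steps must be an integer multiple of eval_steps
-    "metric_for_best_model": "eval_loss",  # Metric used to select the best model (must be used together with load_best_model_at_end)
-                                           #+can be any value from the evaluation result (a dict) returned by compute_metrics
-                                           #+note: the Trainer adds a prefix to the keys of the dict returned by compute_metrics; the default is "eval_"
-    "greater_is_better": False,            # Lower metric values are better (must be used together with metric_for_best_model)
-
-    "do_train": True,                      # Whether to run training, usually used for debugging
-    "do_eval": True,                       # Whether to run evaluation, usually used for debugging
-
-    "remove_unused_columns": False,        # Whether to remove unused columns (features); defaults to True
-                                           #+removing unused columns makes it easier to unpack inputs into the model's call function
-    #+note: remove_unused_columns matches the column names of the passed dataset against the parameter names of the model's forward method, and deletes the whole feature for any column name not found in forward
-    #+so if data is renamed inside dataset.with_transform(..), this removal deletes the original data outright, leaving an empty dataset and causing problems when slicing the dataset
-    #+for example, the image feature in the loaded dataset is named "images", while the corresponding parameter of the model's forward method is named "pixel_values";
-    #+if dataset.with_transform(..) then uses "images" to generate other parameters needed by the model's forward method and renames "images" to "pixel_values", the whole process breaks
-    #+because with remove_unused_columns=True the dataset's column names are checked first and the "images" feature is deleted outright (so the transform_fn of with_transform never receives the "images" feature)
-    #+so a good practice is: rename features that need renaming ahead of time with dataset.rename_column
-
-    "push_to_hub": False,                  # Whether to upload to the hub after training; first run huggingface-cli login on the command line to configure authentication; afterwards the credentials are stored in the cache folder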
+    "remove_unused_columns": False,        # Don't change this unless you really know what you are doing.
 }
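The interaction of `learning_rate`, `warmup_ratio` and `lr_scheduler_type: "cosine"` in the config above can be sketched as a plain function of the step index. This is a simplified illustration, not the scheduler code used by the Trainer: the function name `lr_at_step` is hypothetical, and the real implementation may round the warmup step count differently.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    # Assumption: warmup length is the truncated product of ratio and total steps.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 50, 100, 550, 1000):
    print(s, lr_at_step(s, total_steps=1000))
```

With 1000 total steps, the rate climbs linearly during the first 100 steps, peaks at `base_lr` at step 100, and then follows a half cosine down to roughly zero at the final step.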
--- a/src/models/ocr_model/utils/helpers.py
+++ b/src/models/ocr_model/utils/helpers.py
@@ -1,37 +1,24 @@
 import cv2
 import numpy as np
 from typing import List
-from PIL import Image


 def convert2rgb(image_paths: List[str]) -> List[np.ndarray]:
-    # Output np.ndarray format: [H, W, C] (channels in the third dimension), channel order RGB
    processed_images = []
-
    for path in image_paths:
-        # Read the image
        image = cv2.imread(path, cv2.IMREAD_UNCHANGED)
-
        if image is None:
            print(f"Image at {path} could not be read.")
            continue
-
-        # Check whether the image uses the uint16 type
        if image.dtype == np.uint16:
-            raise ValueError(f"Image at {path} is stored in uint16, which is not supported.")
+            print(f'Converting {path} to 8-bit, conversion may be lossy.')
+            image = cv2.convertScaleAbs(image, alpha=(255.0/65535.0))

-        # Get the number of image channels
        channels = 1 if len(image.shape) == 2 else image.shape[2]
-
-        # If RGBA (4 channels), convert to RGB
        if channels == 4:
            image = cv2.cvtColor(image, cv2.COLOR_BGRA2RGB)
-
-        # If mode I (single-channel grayscale), convert to RGB
        elif channels == 1:
            image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
-
-        # If BGR (3 channels), convert to RGB
        elif channels == 3:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        processed_images.append(image)
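The new uint16 branch above rescales 16-bit pixel values into the 8-bit range instead of raising an error. The following NumPy-only sketch approximates what that scaling does, under the assumption that `cv2.convertScaleAbs` scales, takes the absolute value, rounds, and saturates to `uint8`. The function name `to_uint8` is hypothetical and the exact rounding of OpenCV may differ in edge cases.

```python
import numpy as np


def to_uint8(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map uint16 pixel values [0, 65535] onto [0, 255]."""
    alpha = 255.0 / 65535.0  # same scale factor as in the diff
    scaled = np.abs(image.astype(np.float64) * alpha)
    # Round to the nearest integer and saturate to the uint8 range.
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


pixels = np.array([[0, 257, 32768, 65535]], dtype=np.uint16)
print(to_uint8(pixels))
```

Since 65536 input levels are mapped onto 256 output levels, about 257 distinct input values collapse into each output value, which is why the conversion is reported as lossy.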
--- a/src/models/ocr_model/utils/inference.py
+++ b/src/models/ocr_model/utils/inference.py
@@ -1,33 +1,43 @@
 import torch
+import numpy as np

 from transformers import RobertaTokenizerFast, GenerationConfig
-from typing import List
+from typing import List, Union

-from models.ocr_model.model.TexTeller import TexTeller
-from models.ocr_model.utils.transforms import inference_transform
-from models.ocr_model.utils.helpers import convert2rgb
-from models.globals import MAX_TOKEN_SIZE
+from .transforms import inference_transform
+from .helpers import convert2rgb
+from ..model.TexTeller import TexTeller
+from ...globals import MAX_TOKEN_SIZE


 def inference(
    model: TexTeller, 
    tokenizer: RobertaTokenizerFast,
-    imgs_path: List[str], 
-    use_cuda: bool,
+    imgs: Union[List[str], List[np.ndarray]], 
+    accelerator: str = 'cpu',
    num_beams: int = 1,
+    max_tokens = None
 ) -> List[str]:
+    if imgs == []:
+        return []
+    if hasattr(model, 'eval'):
+        # not an ONNX session, so put the model in eval mode
        model.eval()
-    imgs = convert2rgb(imgs_path)
+    if isinstance(imgs[0], str):
+        imgs = convert2rgb(imgs) 
+    else:  # already a list of numpy arrays (RGB format)
+        assert isinstance(imgs[0], np.ndarray)
+        imgs = imgs 
    imgs = inference_transform(imgs)
    pixel_values = torch.stack(imgs)

-    if use_cuda:
-        model = model.to('cuda')
-        pixel_values = pixel_values.to('cuda')
-
+    if hasattr(model, 'eval'):
+        # not onnx session, move weights to device
+        model = model.to(accelerator)
+    pixel_values = pixel_values.to(accelerator)

    generate_config = GenerationConfig(
-        max_new_tokens=MAX_TOKEN_SIZE,
+        max_new_tokens=MAX_TOKEN_SIZE if max_tokens is None else max_tokens,
        num_beams=num_beams,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
--- a/src/models/ocr_model/utils/metrics.py
+++ b/src/models/ocr_model/utils/metrics.py
@@ -1,14 +1,20 @@
 import evaluate
 import numpy as np
-from transformers import EvalPrediction, RobertaTokenizer
-from typing import Dict
+import os

-def bleu_metric(eval_preds:EvalPrediction, tokenizer:RobertaTokenizer) -> Dict:
-    metric = evaluate.load('/home/lhy/code/TexTeller/src/models/ocr_model/train/google_bleu')  # This needs network access, so it may hang
+from pathlib import Path
+from typing import Dict
+from transformers import EvalPrediction, RobertaTokenizer
+
+
+def bleu_metric(eval_preds: EvalPrediction, tokenizer: RobertaTokenizer) -> Dict:
+    cur_dir = Path(os.getcwd())
+    os.chdir(Path(__file__).resolve().parent)
+    metric = evaluate.load('google_bleu')  # Will download the metric from huggingface if not already downloaded
+    os.chdir(cur_dir)
    
    logits, labels = eval_preds.predictions, eval_preds.label_ids
    preds = logits
-    # preds = np.argmax(logits, axis=1)  # Convert logits to the corresponding predicted labels

    labels = np.where(labels == -100, 1, labels)
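The `np.where` line above replaces every `-100` entry in the label array before the labels are decoded. `-100` is the index that PyTorch's cross-entropy loss ignores by default, so it marks padded positions that have no valid token id. A minimal sketch of the step, assuming the literal `1` in the diff stands for the tokenizer's pad token id (the name `unmask_labels` is hypothetical):

```python
import numpy as np

IGNORE_INDEX = -100  # label value skipped by the loss
PAD_TOKEN_ID = 1     # assumption: the tokenizer's pad id, matching the literal 1 in the diff


def unmask_labels(labels: np.ndarray, pad_token_id: int = PAD_TOKEN_ID) -> np.ndarray:
    """Hypothetical helper: swap ignored positions for a decodable pad id."""
    return np.where(labels == IGNORE_INDEX, pad_token_id, labels)


labels = np.array([[0, 5, 7, 2, -100, -100]])
print(unmask_labels(labels))
```

Reading the id from `tokenizer.pad_token_id` instead of hard-coding `1` would keep this step correct if the tokenizer is swapped for one with a different pad id.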

--- a/src/models/ocr_model/utils/ocr_aug.py
+++ b/src/models/ocr_model/utils/ocr_aug.py
@@ -3,12 +3,11 @@ import random

 def ocr_augmentation_pipeline():
    pre_phase = [
-        # Rescale(scale="optimal", target_dpi = 300,  p = 1.0),
    ]

    ink_phase = [
        InkColorSwap(
-            ink_swap_color="lhy_custom",
+            ink_swap_color="random",
            ink_swap_sequence_number_range=(5, 10),
            ink_swap_min_width_range=(2, 3),
            ink_swap_max_width_range=(100, 120),
@@ -16,7 +15,8 @@ def ocr_augmentation_pipeline():
            ink_swap_max_height_range=(100, 120),
            ink_swap_min_area_range=(10, 20),
            ink_swap_max_area_range=(400, 500),
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        LinesDegradation(
            line_roi=(0.0, 0.0, 1.0, 1.0),
@@ -28,7 +28,8 @@ def ocr_augmentation_pipeline():
            line_long_to_short_ratio=(5, 7),
            line_replacement_probability=(0.4, 0.5),
            line_replacement_thickness=(1, 3),
-            p=0.2
+            # p=0.2
+            p=0.4
        ),

        #  ============================
@@ -44,7 +45,8 @@ def ocr_augmentation_pipeline():
                    severity=(0.4, 0.6),
                ),
            ],
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        #  ============================

@@ -56,7 +58,8 @@ def ocr_augmentation_pipeline():
            blur_kernel_size=(5, 5),
            blur_sigma=0,
            noise_type="perlin",
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        #  ============================

@@ -68,12 +71,14 @@ def ocr_augmentation_pipeline():
            turbulence_range=(2, 5),
            texture_width_range=(300, 500),
            texture_height_range=(300, 500),
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        BrightnessTexturize(  # tested
            texturize_range=(0.9, 0.99),
            deviation=0.03,
-            p=0.2
+            # p=0.2
+            p=0.4
        )
    ]

@@ -84,7 +89,8 @@ def ocr_augmentation_pipeline():
            color_shift_iterations=(2, 3),
            color_shift_brightness_range=(0.9, 1.1),
            color_shift_gaussian_kernel_range=(3, 3),
-            p=0.2
+            # p=0.2
+            p=0.4
        ),

        DirtyDrum(  # tested
@@ -95,7 +101,8 @@ def ocr_augmentation_pipeline():
            noise_value=(64, 224),
            ksize=random.choice([(3, 3), (5, 5), (7, 7)]),
            sigmaX=0,
-            p=0.2
+            # p=0.2
+            p=0.4
        ),

        # =====================================
@@ -119,7 +126,8 @@ def ocr_augmentation_pipeline():
                    gamma_range=(0.9, 1.1),
                ),
            ],
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        # =====================================

@@ -130,10 +138,11 @@ def ocr_augmentation_pipeline():
                    subtle_range=random.randint(5, 10),
                ),
                Jpeg(
-                    quality_range=(85, 95),
+                    quality_range=(70, 95),
                ),
            ],
-            p=0.2
+            # p=0.2
+            p=0.4
        ),
        # =====================================
    ]
--- a/src/models/ocr_model/utils/to_katex.py
+++ b/src/models/ocr_model/utils/to_katex.py
@@ -0,0 +1,180 @@
+import re
+
+
+def change(input_str, old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r):
+    result = ""
+    i = 0
+    n = len(input_str)
+    
+    while i < n:
+        if input_str[i:i+len(old_inst)] == old_inst:
+            # check if the old_inst is followed by old_surr_l
+            start = i + len(old_inst)
+        else:
+            result += input_str[i]
+            i += 1
+            continue
+
+        if start < n and input_str[start] == old_surr_l:
+            # found an old_inst followed by old_surr_l, now look for the matching old_surr_r
+            count = 1
+            j = start + 1
+            escaped = False
+            while j < n and count > 0:
+                if input_str[j] == '\\' and not escaped:
+                    escaped = True
+                    j += 1
+                    continue
+                if input_str[j] == old_surr_r and not escaped:
+                    count -= 1
+                    if count == 0:
+                        break
+                elif input_str[j] == old_surr_l and not escaped:
+                    count += 1
+                escaped = False
+                j += 1
+            
+            if count == 0:
+                assert j < n
+                assert input_str[start] == old_surr_l
+                assert input_str[j] == old_surr_r
+                inner_content = input_str[start + 1:j]
+                # Replace the content with new pattern
+                result += new_inst + new_surr_l + inner_content + new_surr_r
+                i = j + 1
+                continue
+            else:
+                assert count >= 1
+                assert j == n
+                print("Warning: unbalanced delimiter pair in input string")
+                result += new_inst + new_surr_l
+                i = start + 1
+                continue
+        else:
+            result += input_str[i:start]
+            i = start
+    
+    if old_inst != new_inst and (old_inst + old_surr_l) in result:
+        return change(result, old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r)
+    else:
+        return result
+
+
+def find_substring_positions(string, substring):
+    positions = [match.start() for match in re.finditer(re.escape(substring), string)]
+    return positions
+
+
+def rm_dollar_surr(content):
+    pattern = re.compile(r'\\[a-zA-Z]+\$.*?\$|\$.*?\$')
+    matches = pattern.findall(content)
+    
+    for match in matches:
+        if not re.match(r'\\[a-zA-Z]+', match):
+            new_match = match.strip('$')
+            content = content.replace(match, ' ' + new_match + ' ')
+    
+    return content
+
+
+def change_all(input_str, old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r):
+    pos = find_substring_positions(input_str, old_inst + old_surr_l)
+    res = list(input_str)
+    for p in pos[::-1]:
+        res[p:] = list(change(''.join(res[p:]), old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r))
+    res = ''.join(res)
+    return res
+
+
+def to_katex(formula: str) -> str:
+    res = formula
+    # remove mbox surrounding
+    res = change_all(res, r'\mbox ', r' ', r'{', r'}', r'', r'')
+    res = change_all(res, r'\mbox', r' ', r'{', r'}', r'', r'')
+    # remove hbox surrounding
+    res = re.sub(r'\\hbox to ?-? ?\d+\.\d+(pt)?\{', r'\\hbox{', res)
+    res = change_all(res, r'\hbox', r' ', r'{', r'}', r'', r' ')
+    # remove raise surrounding
+    res = re.sub(r'\\raise ?-? ?\d+\.\d+(pt)?', r' ', res)
+    # remove makebox
+    res = re.sub(r'\\makebox ?\[\d+\.\d+(pt)?\]\{', r'\\makebox{', res)
+    res = change_all(res, r'\makebox', r' ', r'{', r'}', r'', r' ')
+    # remove raisebox, scalebox, and vbox surrounding
+    res = re.sub(r'\\raisebox\{-? ?\d+\.\d+(pt)?\}\{', r'\\raisebox{', res)
+    res = re.sub(r'\\scalebox\{-? ?\d+\.\d+(pt)?\}\{', r'\\scalebox{', res)
+    res = change_all(res, r'\scalebox', r' ', r'{', r'}', r'', r' ')
+    res = change_all(res, r'\raisebox', r' ', r'{', r'}', r'', r' ')
+    res = change_all(res, r'\vbox', r' ', r'{', r'}', r'', r' ')
+
+
+    origin_instructions = [
+        r'\Huge',
+        r'\huge',
+        r'\LARGE',
+        r'\Large',
+        r'\large',
+        r'\normalsize',
+        r'\small',
+        r'\footnotesize',
+        r'\tiny'
+    ]
+    for (old_ins, new_ins) in zip(origin_instructions, origin_instructions):
+        res = change_all(res, old_ins, new_ins, r'$', r'$', '{', '}')
+    res = change_all(res, r'\boldmath ', r'\bm', r'{', r'}', r'{', r'}')
+    res = change_all(res, r'\boldmath', r'\bm', r'{', r'}', r'{', r'}')
+    res = change_all(res, r'\boldmath ', r'\bm', r'$', r'$', r'{', r'}')
+    res = change_all(res, r'\boldmath', r'\bm', r'$', r'$', r'{', r'}')
+    res = change_all(res, r'\scriptsize', r'\scriptsize', r'$', r'$', r'{', r'}')
+    res = change_all(res, r'\emph', r'\textit', r'{', r'}', r'{', r'}')
+    res = change_all(res, r'\emph ', r'\textit', r'{', r'}', r'{', r'}')
+    
+    origin_instructions = [
+        r'\left',
+        r'\middle',
+        r'\right',
+        r'\big',
+        r'\Big',
+        r'\bigg',
+        r'\Bigg',
+        r'\bigl',
+        r'\Bigl',
+        r'\biggl',
+        r'\Biggl',
+        r'\bigm',
+        r'\Bigm',
+        r'\biggm',
+        r'\Biggm',
+        r'\bigr',
+        r'\Bigr',
+        r'\biggr',
+        r'\Biggr'
+    ]
+    for origin_ins in origin_instructions:
+        res = change_all(res, origin_ins, origin_ins, r'{', r'}', r'', r'')
+
+    res = re.sub(r'\\\[(.*?)\\\]', r'\1\\newline', res)
+
+    if res.endswith(r'\newline'):
+        res = res[:-8]
+
+    # replace runs of LaTeX spacing commands (\, \! \; \:) with one space and drop \vspace{...}
+    res = re.sub(r'(\\,){1,}', ' ', res)
+    res = re.sub(r'(\\!){1,}', ' ', res)
+    res = re.sub(r'(\\;){1,}', ' ', res)
+    res = re.sub(r'(\\:){1,}', ' ', res)
+    res = re.sub(r'\\vspace\{.*?}', '', res)
+
+    # merge consecutive text
+    def merge_texts(match):
+        texts = match.group(0)
+        merged_content = ''.join(re.findall(r'\\text\{([^}]*)\}', texts))
+        return f'\\text{{{merged_content}}}'
+    res = re.sub(r'(\\text\{[^}]*\}\s*){2,}', merge_texts, res)
+
+    res = res.replace(r'\bf ', '')
+    res = rm_dollar_surr(res)
+
+    # remove extra spaces (keeping only one)
+    res = re.sub(r' +', ' ', res)
+
+    return res.strip()
--- a/src/models/ocr_model/utils/transforms.py
+++ b/src/models/ocr_model/utils/transforms.py
@@ -4,13 +4,13 @@ import numpy as np
 import cv2

 from torchvision.transforms import v2
-from typing import List
+from typing import List, Union
 from PIL import Image
+from collections import Counter

 from ...globals import (
-    OCR_IMG_CHANNELS,
-    OCR_IMG_SIZE,
-    OCR_FIX_SIZE,
+    IMG_CHANNELS,
+    FIXED_IMG_SIZE,
    IMAGE_MEAN, IMAGE_STD,
    MAX_RESIZE_RATIO, MIN_RESIZE_RATIO
 )
@@ -20,58 +20,47 @@ from .ocr_aug import ocr_augmentation_pipeline
 train_pipeline = ocr_augmentation_pipeline()

 general_transform_pipeline = v2.Compose([
-    v2.ToImage(),    # Convert to tensor, only needed if you had a PIL image
-                     #+returns a List of torchvision.Image whose length is the batch_size,
-                     #+so the output at the end of the whole Compose pipeline is also a List of torchvision.Image
-                     #+note: it does not return one single torchvision.Image; the batch_size dimension is split out
+    v2.ToImage(),    
    v2.ToDtype(torch.uint8, scale=True),  # optional, most input are already uint8 at this point
-    v2.Grayscale(),  # convert to grayscale (depending on the task)
+    v2.Grayscale(),

-    v2.Resize(       # resize to fit within a fixed square
-        size=OCR_IMG_SIZE - 1,  # size must be smaller than max_size
+    v2.Resize(
+        size=FIXED_IMG_SIZE - 1,
        interpolation=v2.InterpolationMode.BICUBIC,
-        max_size=OCR_IMG_SIZE,
+        max_size=FIXED_IMG_SIZE,
        antialias=True
    ),

    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[IMAGE_MEAN], std=[IMAGE_STD]),

-    # v2.ToPILImage()  # used to check whether the transformed result is correct (for debugging)
+    # v2.ToPILImage()
 ])


 def trim_white_border(image: np.ndarray):
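-    # image is a 3-D ndarray in RGB format with layout [H, W, C] (channel is the third dimension)
-
-    # # check whether the first element of images is a nested list structure
-    # if isinstance(image, list):
-    #     image = np.array(image, dtype=np.uint8)
-
-    # check that the image is in RGB format and that the channel is the third dimension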
    if len(image.shape) != 3 or image.shape[2] != 3:
        raise ValueError("Image is not in RGB format or channel is not in third dimension")

-    # check that the image uses the uint8 dtype
    if image.dtype != np.uint8:
        raise ValueError(f"Image should stored in uint8")

-    # create a pure-white background image the same size as the original
+    corners = [tuple(image[0, 0]), tuple(image[0, -1]),
+               tuple(image[-1, 0]), tuple(image[-1, -1])]
+    bg_color = Counter(corners).most_common(1)[0][0]
+    bg_color_np = np.array(bg_color, dtype=np.uint8)
+    
    h, w = image.shape[:2]
-    bg = np.full((h, w, 3), 255, dtype=np.uint8)
+    bg = np.full((h, w, 3), bg_color_np, dtype=np.uint8)

-    # compute the difference
    diff = cv2.absdiff(image, bg)
+    mask = cv2.cvtColor(diff, cv2.COLOR_RGB2GRAY)

-    # set every pixel whose difference exceeds 1 to 255
-    _, diff = cv2.threshold(diff, 1, 255, cv2.THRESH_BINARY)
+    threshold = 15
+    _, diff = cv2.threshold(mask, threshold, 255, cv2.THRESH_BINARY)

-    # convert the difference to grayscale
-    gray_diff = cv2.cvtColor(diff, cv2.COLOR_RGB2GRAY)
-    # compute the minimal bounding rectangle of the nonzero pixels
-    x, y, w, h = cv2.boundingRect(gray_diff) 
+    x, y, w, h = cv2.boundingRect(diff) 

-    # crop the image
    trimmed_image = image[y:y+h, x:x+w]

    return trimmed_image
@@ -113,13 +102,6 @@ def random_resize(
    minr: float, 
    maxr: float
 ) -> List[np.ndarray]:
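-    # np.ndarray format: 3-D, RGB, layout [H, W, C] (channel is the third dimension)
-
-    # # check whether the first element of images is a nested list structure
-    # if isinstance(images[0], list):
-    #     # convert the nested list structure to np.ndarray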
-    #     images = [np.array(img, dtype=np.uint8) for img in images]
-
    if len(images[0].shape) != 3 or images[0].shape[2] != 3:
        raise ValueError("Image is not in RGB format or channel is not in third dimension")

@@ -157,24 +139,19 @@ def rotate(image: np.ndarray, min_angle: int, max_angle: int) -> np.ndarray:


 def ocr_aug(image: np.ndarray) -> np.ndarray:
-    # apply a random rotation with 20% probability
    if random.random() < 0.2:
        image = rotate(image, -5, 5)
-    # add a white border
    image = add_white_border(image, max_size=25).permute(1, 2, 0).numpy()
-    # data augmentation
    image = train_pipeline(image)
    return image


 def train_transform(images: List[Image.Image]) -> List[torch.Tensor]:
-    assert OCR_IMG_CHANNELS == 1 , "Only support grayscale images for now"
-    assert OCR_FIX_SIZE == True, "Only support fixed size images for now"
+    assert IMG_CHANNELS == 1 , "Only support grayscale images for now"

    images = [np.array(img.convert('RGB')) for img in images]
    # random resize first
    images = random_resize(images, MIN_RESIZE_RATIO, MAX_RESIZE_RATIO)
-    # trim the white border
    images = [trim_white_border(image) for image in images]

    # OCR augmentation
@@ -183,39 +160,17 @@ def train_transform(images: List[Image.Image]) -> List[torch.Tensor]:
    # general transform pipeline
    images = [general_transform_pipeline(image) for image in  images]
    # padding to fixed size
-    images = padding(images, OCR_IMG_SIZE)
+    images = padding(images, FIXED_IMG_SIZE)
    return images


-def inference_transform(images: List[np.ndarray]) -> List[torch.Tensor]:
-    assert OCR_IMG_CHANNELS == 1 , "Only support grayscale images for now"
-    assert OCR_FIX_SIZE == True, "Only support fixed size images for now"
-    images = [np.array(img.convert('RGB')) for img in images]
-    # trim the white border
+def inference_transform(images: List[Union[np.ndarray, Image.Image]]) -> List[torch.Tensor]:
+    assert IMG_CHANNELS == 1 , "Only support grayscale images for now"
+    images = [np.array(img.convert('RGB')) if isinstance(img, Image.Image) else img for img in images]
    images = [trim_white_border(image) for image in images]
    # general transform pipeline
    images = [general_transform_pipeline(image) for image in  images]  # imgs: List[PIL.Image.Image]
    # padding to fixed size
-    images = padding(images, OCR_IMG_SIZE)
+    images = padding(images, FIXED_IMG_SIZE)

    return images
-
-
-if __name__ == '__main__':
-    from pathlib import Path
-    from .helpers import convert2rgb
-    base_dir = Path('/home/lhy/code/TeXify/src/models/ocr_model/model')
-    imgs_path = [
-        base_dir / '1.jpg',
-        base_dir / '2.jpg',
-        base_dir / '3.jpg',
-        base_dir / '4.jpg',
-        base_dir / '5.jpg',
-        base_dir / '6.jpg',
-        base_dir / '7.jpg',
-    ]
-    imgs_path = [str(img_path) for img_path in imgs_path]
-    imgs = convert2rgb(imgs_path)
-    res = random_resize(imgs, 0.5, 1.5)
-    pause = 1
-
--- a/src/models/resizer/inference.py
+++ b/src/models/resizer/inference.py
@@ -1,44 +0,0 @@
-#!/usr/bin/env python3
-import os
-import argparse
-import torch
-
-from pathlib import Path
-from PIL import Image
-from .model.Resizer import Resizer
-from .utils import preprocess_fn
-
-from munch import Munch
-
-
-def inference(args):
-    img = Image.open(args.image)
-    img = img.convert('RGB') if img.format == 'PNG' else img
-    processed_img = preprocess_fn({"pixel_values": [img]})
-
-    ckt_path = Path(args.checkpoint).resolve()
-    model = Resizer.from_pretrained(ckt_path)
-    model.eval()
-    inpu = torch.stack(processed_img['pixel_values'])
-    pred = model(inpu) * 1.25
-    print(pred)
-
-    ...
-
-
-if __name__ == "__main__":
-    cur_dirpath = os.getcwd()
-    script_dirpath = Path(__file__).resolve().parent
-    os.chdir(script_dirpath)
-
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-img', '--image', type=str, required=True)
-    parser.add_argument('-ckt', '--checkpoint', type=str, required=True)
-
-    args = parser.parse_args([
-        '-img', '/home/lhy/code/TeXify/src/models/resizer/foo5_140h.jpg',
-        '-ckt', '/home/lhy/code/TeXify/src/models/resizer/train/train_result_pred_height_v5'
-    ])
-    inference(args)
-
-    os.chdir(cur_dirpath)
--- a/src/models/resizer/model/Resizer.py
+++ b/src/models/resizer/model/Resizer.py
@@ -1,5 +0,0 @@
-from transformers import ResNetForImageClassification
-
-class Resizer(ResNetForImageClassification):
-    def __init__(self, config):
-        super().__init__(config)
--- a/src/models/resizer/train/train.py
+++ b/src/models/resizer/train/train.py
@@ -1,122 +0,0 @@
-import os
-import datasets
-
-from pathlib import Path
-from transformers import (
-    ResNetConfig,
-    TrainingArguments,
-    Trainer
-)
-
-from ..utils import preprocess_fn
-from ..model.Resizer import Resizer
-from ...globals import NUM_CHANNELS, NUM_CLASSES, RESIZER_IMG_SIZE
-
-
-def train():
-    cur_dirpath = os.getcwd()
-    script_dirpath = Path(__file__).resolve().parent
-    os.chdir(script_dirpath)
-
-    data = datasets.load_dataset("./dataset").shuffle(seed=42)
-    data = data.rename_column("images", "pixel_values")
-    data.flatten_indices()
-    data = data.with_transform(preprocess_fn)
-    train_data, test_data = data['train'], data['test']
-
-    config = ResNetConfig(
-        num_channels=NUM_CHANNELS,
-        num_labels=NUM_CLASSES,
-        img_size=RESIZER_IMG_SIZE
-    )
-    model = Resizer(config)
-    model = Resizer.from_pretrained("/home/lhy/code/TeXify/src/models/resizer/train/train_result_pred_height_v4/checkpoint-213000")
-
-    training_args = TrainingArguments(
-        # resume_from_checkpoint="/home/lhy/code/TeXify/src/models/resizer/train/train_result_pred_height_v3/checkpoint-94500",
-        max_grad_norm=1.0,
-        # use_cpu=True,
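-        seed=42,                            # random seed, to make experiments reproducible
-        # data_seed=42,                     # also fix the data sampler's sampling
-        # full_determinism=True,            # make the whole training fully deterministic (this hurts training; for debugging only)
-
-        output_dir='./train_result_pred_height_v5',        # output directory
-        overwrite_output_dir=False,         # if the output directory exists, do not delete its contents
-        report_to=["tensorboard"],          # write logs to TensorBoard;
-                                            #+view them from the command line with: tensorboard --logdir ./logs
-
-        logging_dir=None,               # directory where TensorBoard log files are stored
-        log_level="info",
-        logging_strategy="steps",           # log once every fixed number of steps
-        logging_steps=500,                  # number of steps between log entries
-        logging_nan_inf_filter=False,       # also log losses that are nan or inf
-
-        num_train_epochs=50,                 # total number of training epochs
-        # max_steps=3,                      # maximum number of training steps; if this is set,
-                                            #+num_train_epochs is ignored (usually for debugging)
-
-        # label_names = ['your_label_name'],    # label names in the data_loader; defaults to 'labels' if not given
-
-        per_device_train_batch_size=55,     # batch size per GPU
-        per_device_eval_batch_size=48*2,      # evaluation batch size per GPU
-        auto_find_batch_size=False,         # automatically search for a suitable batch size (exponential decay)
-
-        optim = 'adamw_torch',              # many AdamW variants are also available (more efficient than classic AdamW)
-                                            #+once optim is set, no optimizer needs to be passed to Trainer
-        lr_scheduler_type="cosine",         # set the lr_scheduler
-        warmup_ratio=0.1,                   # fraction of the total training steps used for warmup
-        # warmup_steps=500,                 # number of warmup steps
-        weight_decay=0,                     # weight decay
-        learning_rate=5e-5,                 # learning rate
-        fp16=False,                         # whether to train with 16-bit floats
-        gradient_accumulation_steps=1,      # gradient accumulation steps; emulates a large batch size when a large one does not fit
-        gradient_checkpointing=False,       # when True, drops some intermediate values (needed for backward) during forward to reduce GPU memory pressure (but increases forward time)
-        label_smoothing_factor=0.0,         # soft labels; 0 means disabled
-        # debug='underflow_overflow',       # check for overflow during training and warn if it occurs (usually for debugging)
-        torch_compile=True,                # whether to compile the model with torch.compile (for better training and inference performance)
-                                            #+ requires torch > 2.0, and this feature is not very stable yet
-        # deepspeed='your_json_path',       #  train with deepspeed; requires the path to ds_config.json
-                                            #+ when using Deepspeed with Trainer, make sure ds_config.json matches the Trainer settings (learning rate, batch size, gradient accumulation steps, etc.)
-                                            #+ otherwise very strange bugs appear (and they are usually hard to spot)
-
-        dataloader_pin_memory=True,         # can speed up data transfer between CPU and GPU
-        dataloader_num_workers=16,           # by default, data loading does not use multiple processes
-        dataloader_drop_last=True,          # drop the last minibatch
-
-        evaluation_strategy="steps",        # evaluation strategy, either "steps" or "epoch"
-        eval_steps=500,                       # if evaluation_strategy="step"
-        # eval_steps=10,                     # if evaluation_strategy="step"
-
-        save_strategy="steps",              # checkpoint saving strategy
-        save_steps=1500,                    # number of steps between checkpoint saves
-        save_total_limit=5,                 # maximum number of saved checkpoints; beyond this, the oldest is deleted
-
-        load_best_model_at_end=True,        # whether to load the best model when training ends
-        metric_for_best_model="eval_loss",  # metric used to select the best model
-        greater_is_better=False,            # lower metric values are better
-
-        do_train=True,                      # whether to train (usually for debugging)
-        do_eval=True,                       # whether to evaluate (usually for debugging)
-
-        remove_unused_columns=True,         # whether to remove unused columns (features); defaults to True
-                                            #+removing unused columns makes it easier to unpack inputs into the model's call function
-
-        push_to_hub=False,                  # whether to upload to the hub after training; first run huggingface-cli login on the command line, after which the credentials are stored in the cache folder
-        hub_model_id="a_different_name",    # name of the model
-                                            #+every time the model is saved, it is uploaded to the hub;
-                                            #+after training, remember to call trainer.push_to_hub(), which uploads the parameters used by the model and the validation results to the hub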
-    )
-
-    trainer = Trainer(
-        model,
-        training_args,
-        train_dataset=train_data,
-        eval_dataset=test_data,
-    )
-    trainer.train()
-
-    os.chdir(cur_dirpath)
-
-
-if __name__ == '__main__':
-    train()
--- a/src/models/resizer/utils/init.py
+++ b/src/models/resizer/utils/init.py
@@ -1 +0,0 @@
-from .preprocess import preprocess_fn
--- a/src/models/resizer/utils/preprocess.py
+++ b/src/models/resizer/utils/preprocess.py
@@ -1,75 +0,0 @@
-import torch
-from torchvision.transforms import v2
-
-from PIL import Image, ImageChops
-from ...globals import (
-    IMAGE_MEAN, IMAGE_STD, 
-    LABEL_RATIO,
-    RESIZER_IMG_SIZE,
-    NUM_CHANNELS
-)
-
-from typing import (
-    Any,
-    List,
-    Dict,
-)
-
-
-def trim_white_border(image: Image):
-    if image.mode == 'RGB':
-        bg_color = (255, 255, 255)
-    elif image.mode == 'RGBA':
-        bg_color = (255, 255, 255, 255)
-    elif image.mode == 'L':
-        bg_color = 255
-    else:
-        raise ValueError("Unsupported image mode")
-    bg = Image.new(image.mode, image.size, bg_color)
-    diff = ImageChops.difference(image, bg)
-    diff = ImageChops.add(diff, diff, 2.0, -100)
-    bbox = diff.getbbox()
-    if bbox:
-        return image.crop(bbox)
-
-
-def preprocess_fn(samples: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
-    imgs = samples['pixel_values']
-    imgs = [trim_white_border(img) for img in imgs]
-    labels = [float(img.height * LABEL_RATIO) for img in imgs]
-
-    assert NUM_CHANNELS == 1, "Only support grayscale images"
-    transform = v2.Compose([
-        v2.ToImage(),
-        v2.ToDtype(torch.uint8, scale=True),
-        v2.Grayscale(),
-        v2.Resize(
-            size=RESIZER_IMG_SIZE - 1,  # size must be smaller than max_size
-            interpolation=v2.InterpolationMode.BICUBIC,
-            max_size=RESIZER_IMG_SIZE,
-            antialias=True
-        ),
-        v2.ToDtype(torch.float32, scale=True),
-        v2.Normalize(mean=[IMAGE_MEAN], std=[IMAGE_STD]),
-    ])
-    imgs = transform(imgs)
-    imgs = [
-        v2.functional.pad(
-            img,
-            padding=[0, 0, RESIZER_IMG_SIZE - img.shape[2], RESIZER_IMG_SIZE - img.shape[1]]
-        )
-        for img in imgs
-    ]
-
-    res = {'pixel_values': imgs, 'labels': labels}
-    return res
-
-
-if __name__ == "__main__":  # unit test
-    import datasets
-    data = datasets.load_dataset("/home/lhy/code/TeXify/src/models/resizer/train/dataset/dataset.py").shuffle(seed=42)
-    data = data.with_transform(preprocess_fn)
-    train_data, test_data = data['train'], data['test']
-
-    inpu = train_data[:10]
-    pause = 1
--- a/src/models/thrid_party/paddleocr/checkpoints/det/default_model.onnx
+++ b/src/models/thrid_party/paddleocr/checkpoints/det/default_model.onnx
--- a/src/models/thrid_party/paddleocr/checkpoints/rec/default_model.onnx
+++ b/src/models/thrid_party/paddleocr/checkpoints/rec/default_model.onnx
--- a/src/models/thrid_party/paddleocr/infer/CTCLabelDecode.py
+++ b/src/models/thrid_party/paddleocr/infer/CTCLabelDecode.py
@@ -0,0 +1,215 @@
+import re
+import numpy as np
+import os
+from pathlib import Path
+
+
+class BaseRecLabelDecode(object):
+    """Convert between text-label and text-index"""
+
+    def __init__(self, character_dict_path=None, use_space_char=False):
+        cur_path = os.getcwd()
+        scriptDir = Path(__file__).resolve().parent
+        os.chdir(scriptDir)
+        character_dict_path = str(Path(scriptDir / "ppocr_keys_v1.txt"))
+
+        self.beg_str = "sos"
+        self.end_str = "eos"
+        self.reverse = False
+        self.character_str = []
+
+        if character_dict_path is None:
+            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+            dict_character = list(self.character_str)
+        else:
+            with open(character_dict_path, "rb") as fin:
+                lines = fin.readlines()
+                for line in lines:
+                    line = line.decode("utf-8").strip("\n").strip("\r\n")
+                    self.character_str.append(line)
+            if use_space_char:
+                self.character_str.append(" ")
+            dict_character = list(self.character_str)
+            if "arabic" in character_dict_path:
+                self.reverse = True
+
+        dict_character = self.add_special_char(dict_character)
+        self.dict = {}
+        for i, char in enumerate(dict_character):
+            self.dict[char] = i
+        self.character = dict_character
+        os.chdir(cur_path)
+
+    def pred_reverse(self, pred):
+        pred_re = []
+        c_current = ""
+        for c in pred:
+            if not bool(re.search("[a-zA-Z0-9 :*./%+-]", c)):
+                if c_current != "":
+                    pred_re.append(c_current)
+                pred_re.append(c)
+                c_current = ""
+            else:
+                c_current += c
+        if c_current != "":
+            pred_re.append(c_current)
+
+        return "".join(pred_re[::-1])
+
+    def add_special_char(self, dict_character):
+        return dict_character
+
+    def get_word_info(self, text, selection):
+        """
+        Group the decoded characters and record the corresponding decoded positions.
+
+        Args:
+            text: the decoded text
+            selection: the bool array that identifies which columns of features are decoded as non-separated characters
+        Returns:
+            word_list: list of the grouped words
+            word_col_list: list of decoding positions corresponding to each character in the grouped word
+            state_list: list of markers identifying the type of each grouped word; there are two types:
+                        - 'cn': continuous chinese characters (e.g., 你好啊)
+                        - 'en&num': continuous english characters (e.g., hello), numbers (e.g., 123, 1.123), or a mix of them connected by '-' (e.g., VGG-16)
+                        The remaining characters in text are treated as separators between groups (e.g., space, '(', ')', etc.).
+        """
+        state = None
+        word_content = []
+        word_col_content = []
+        word_list = []
+        word_col_list = []
+        state_list = []
+        valid_col = np.where(selection == True)[0]
+
+        for c_i, char in enumerate(text):
+            if "\u4e00" <= char <= "\u9fff":
+                c_state = "cn"
+            elif bool(re.search("[a-zA-Z0-9]", char)):
+                c_state = "en&num"
+            else:
+                c_state = "splitter"
+
+            if (
+                char == "."
+                and state == "en&num"
+                and c_i + 1 < len(text)
+                and bool(re.search("[0-9]", text[c_i + 1]))
+            ):  # grouping floating-point number
+                c_state = "en&num"
+            if (
+                char == "-" and state == "en&num"
+            ):  # grouping word with '-', such as 'state-of-the-art'
+                c_state = "en&num"
+
+            if state is None:
+                state = c_state
+
+            if state != c_state:
+                if len(word_content) != 0:
+                    word_list.append(word_content)
+                    word_col_list.append(word_col_content)
+                    state_list.append(state)
+                    word_content = []
+                    word_col_content = []
+                state = c_state
+
+            if state != "splitter":
+                word_content.append(char)
+                word_col_content.append(valid_col[c_i])
+
+        if len(word_content) != 0:
+            word_list.append(word_content)
+            word_col_list.append(word_col_content)
+            state_list.append(state)
+
+        return word_list, word_col_list, state_list
+
+    def decode(
+        self,
+        text_index,
+        text_prob=None,
+        is_remove_duplicate=False,
+        return_word_box=False,
+    ):
+        """convert text-index into text-label."""
+        result_list = []
+        ignored_tokens = self.get_ignored_tokens()
+        batch_size = len(text_index)
+        for batch_idx in range(batch_size):
+            selection = np.ones(len(text_index[batch_idx]), dtype=bool)
+            if is_remove_duplicate:
+                selection[1:] = text_index[batch_idx][1:] != text_index[batch_idx][:-1]
+            for ignored_token in ignored_tokens:
+                selection &= text_index[batch_idx] != ignored_token
+
+            char_list = [
+                self.character[text_id] for text_id in text_index[batch_idx][selection]
+            ]
+            if text_prob is not None:
+                conf_list = text_prob[batch_idx][selection]
+            else:
+                conf_list = [1] * len(selection)
+            if len(conf_list) == 0:
+                conf_list = [0]
+
+            text = "".join(char_list)
+
+            if self.reverse:  # for arabic rec
+                text = self.pred_reverse(text)
+
+            if return_word_box:
+                word_list, word_col_list, state_list = self.get_word_info(
+                    text, selection
+                )
+                result_list.append(
+                    (
+                        text,
+                        np.mean(conf_list).tolist(),
+                        [
+                            len(text_index[batch_idx]),
+                            word_list,
+                            word_col_list,
+                            state_list,
+                        ],
+                    )
+                )
+            else:
+                result_list.append((text, np.mean(conf_list).tolist()))
+        return result_list
+
+    def get_ignored_tokens(self):
+        return [0]  # for ctc blank
+
+
+class CTCLabelDecode(BaseRecLabelDecode):
+    """Convert between text-label and text-index"""
+
+    def __init__(self, character_dict_path=None, use_space_char=False, **kwargs):
+        super(CTCLabelDecode, self).__init__(character_dict_path, use_space_char)
+
+    def __call__(self, preds, label=None, return_word_box=False, *args, **kwargs):
+        if isinstance(preds, tuple) or isinstance(preds, list):
+            preds = preds[-1]
+        assert isinstance(preds, np.ndarray)
+        preds_idx = preds.argmax(axis=2)
+        preds_prob = preds.max(axis=2)
+        text = self.decode(
+            preds_idx,
+            preds_prob,
+            is_remove_duplicate=True,
+            return_word_box=return_word_box,
+        )
+        if return_word_box:
+            for rec_idx, rec in enumerate(text):
+                wh_ratio = kwargs["wh_ratio_list"][rec_idx]
+                max_wh_ratio = kwargs["max_wh_ratio"]
+                rec[2][0] = rec[2][0] * (wh_ratio / max_wh_ratio)
+        if label is None:
+            return text
+        label = self.decode(label)
+        return text, label
+
+    def add_special_char(self, dict_character):
+        dict_character = ["blank"] + dict_character
+        return dict_character
--- a/src/models/thrid_party/paddleocr/infer/DBPostProcess.py
+++ b/src/models/thrid_party/paddleocr/infer/DBPostProcess.py
@@ -0,0 +1,229 @@
+import numpy as np
+import cv2
+
+from shapely.geometry import Polygon
+import pyclipper
+
+
+class DBPostProcess(object):
+    """
+    The post process for Differentiable Binarization (DB).
+    """
+
+    def __init__(
+        self,
+        thresh=0.3,
+        box_thresh=0.7,
+        max_candidates=1000,
+        unclip_ratio=2.0,
+        use_dilation=False,
+        score_mode="fast",
+        box_type="quad",
+        **kwargs
+    ):
+        self.thresh = thresh
+        self.box_thresh = box_thresh
+        self.max_candidates = max_candidates
+        self.unclip_ratio = unclip_ratio
+        self.min_size = 3
+        self.score_mode = score_mode
+        self.box_type = box_type
+        assert score_mode in [
+            "slow",
+            "fast",
+        ], "Score mode must be in [slow, fast] but got: {}".format(score_mode)
+
+        self.dilation_kernel = None if not use_dilation else np.array([[1, 1], [1, 1]])
+
+    def polygons_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
+        """
+        _bitmap: single map with shape (H, W),
+            whose values are binarized as {0, 1}
+        """
+
+        bitmap = _bitmap
+        height, width = bitmap.shape
+
+        boxes = []
+        scores = []
+
+        contours, _ = cv2.findContours(
+            (bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE
+        )
+
+        for contour in contours[: self.max_candidates]:
+            epsilon = 0.002 * cv2.arcLength(contour, True)
+            approx = cv2.approxPolyDP(contour, epsilon, True)
+            points = approx.reshape((-1, 2))
+            if points.shape[0] < 4:
+                continue
+
+            score = self.box_score_fast(pred, points.reshape(-1, 2))
+            if self.box_thresh > score:
+                continue
+
+            if points.shape[0] > 2:
+                box = self.unclip(points, self.unclip_ratio)
+                if len(box) > 1:
+                    continue
+            else:
+                continue
+            box = box.reshape(-1, 2)
+
+            _, sside = self.get_mini_boxes(box.reshape((-1, 1, 2)))
+            if sside < self.min_size + 2:
+                continue
+
+            box = np.array(box)
+            box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
+            box[:, 1] = np.clip(
+                np.round(box[:, 1] / height * dest_height), 0, dest_height
+            )
+            boxes.append(box.tolist())
+            scores.append(score)
+        return boxes, scores
+
+    def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
+        """
+        _bitmap: single map with shape (H, W),
+                whose values are binarized as {0, 1}
+        """
+
+        bitmap = _bitmap
+        height, width = bitmap.shape
+
+        outs = cv2.findContours(
+            (bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE
+        )
+        if len(outs) == 3:
+            img, contours, _ = outs[0], outs[1], outs[2]
+        elif len(outs) == 2:
+            contours, _ = outs[0], outs[1]
+
+        num_contours = min(len(contours), self.max_candidates)
+
+        boxes = []
+        scores = []
+        for index in range(num_contours):
+            contour = contours[index]
+            points, sside = self.get_mini_boxes(contour)
+            if sside < self.min_size:
+                continue
+            points = np.array(points)
+            if self.score_mode == "fast":
+                score = self.box_score_fast(pred, points.reshape(-1, 2))
+            else:
+                score = self.box_score_slow(pred, contour)
+            if self.box_thresh > score:
+                continue
+
+            box = self.unclip(points, self.unclip_ratio).reshape(-1, 1, 2)
+            box, sside = self.get_mini_boxes(box)
+            if sside < self.min_size + 2:
+                continue
+            box = np.array(box)
+
+            box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
+            box[:, 1] = np.clip(
+                np.round(box[:, 1] / height * dest_height), 0, dest_height
+            )
+            boxes.append(box.astype("int32"))
+            scores.append(score)
+        return np.array(boxes, dtype="int32"), scores
+
+    def unclip(self, box, unclip_ratio):
+        poly = Polygon(box)
+        distance = poly.area * unclip_ratio / poly.length
+        offset = pyclipper.PyclipperOffset()
+        offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
+        expanded = np.array(offset.Execute(distance))
+        return expanded
+
+    def get_mini_boxes(self, contour):
+        bounding_box = cv2.minAreaRect(contour)
+        points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
+
+        index_1, index_2, index_3, index_4 = 0, 1, 2, 3
+        if points[1][1] > points[0][1]:
+            index_1 = 0
+            index_4 = 1
+        else:
+            index_1 = 1
+            index_4 = 0
+        if points[3][1] > points[2][1]:
+            index_2 = 2
+            index_3 = 3
+        else:
+            index_2 = 3
+            index_3 = 2
+
+        box = [points[index_1], points[index_2], points[index_3], points[index_4]]
+        return box, min(bounding_box[1])
+
+    def box_score_fast(self, bitmap, _box):
+        """
+        box_score_fast: use the mean score inside the box as the box score
+        """
+        h, w = bitmap.shape[:2]
+        box = _box.copy()
+        xmin = np.clip(np.floor(box[:, 0].min()).astype("int32"), 0, w - 1)
+        xmax = np.clip(np.ceil(box[:, 0].max()).astype("int32"), 0, w - 1)
+        ymin = np.clip(np.floor(box[:, 1].min()).astype("int32"), 0, h - 1)
+        ymax = np.clip(np.ceil(box[:, 1].max()).astype("int32"), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+        box[:, 0] = box[:, 0] - xmin
+        box[:, 1] = box[:, 1] - ymin
+        cv2.fillPoly(mask, box.reshape(1, -1, 2).astype("int32"), 1)
+        return cv2.mean(bitmap[ymin : ymax + 1, xmin : xmax + 1], mask)[0]
+
+    def box_score_slow(self, bitmap, contour):
+        """
+        box_score_slow: use the mean score inside the polygon as the box score
+        """
+        h, w = bitmap.shape[:2]
+        contour = contour.copy()
+        contour = np.reshape(contour, (-1, 2))
+
+        xmin = np.clip(np.min(contour[:, 0]), 0, w - 1)
+        xmax = np.clip(np.max(contour[:, 0]), 0, w - 1)
+        ymin = np.clip(np.min(contour[:, 1]), 0, h - 1)
+        ymax = np.clip(np.max(contour[:, 1]), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+
+        contour[:, 0] = contour[:, 0] - xmin
+        contour[:, 1] = contour[:, 1] - ymin
+
+        cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype("int32"), 1)
+        return cv2.mean(bitmap[ymin : ymax + 1, xmin : xmax + 1], mask)[0]
+
+    def __call__(self, outs_dict, shape_list):
+        pred = outs_dict["maps"]
+        assert isinstance(pred, np.ndarray)
+        pred = pred[:, 0, :, :]
+        segmentation = pred > self.thresh
+
+        boxes_batch = []
+        for batch_index in range(pred.shape[0]):
+            src_h, src_w, ratio_h, ratio_w = shape_list[batch_index]
+            if self.dilation_kernel is not None:
+                mask = cv2.dilate(
+                    np.array(segmentation[batch_index]).astype(np.uint8),
+                    self.dilation_kernel,
+                )
+            else:
+                mask = segmentation[batch_index]
+            if self.box_type == "poly":
+                boxes, scores = self.polygons_from_bitmap(
+                    pred[batch_index], mask, src_w, src_h
+                )
+            elif self.box_type == "quad":
+                boxes, scores = self.boxes_from_bitmap(
+                    pred[batch_index], mask, src_w, src_h
+                )
+            else:
+                raise ValueError("box_type can only be one of ['quad', 'poly']")
+
+            boxes_batch.append({"points": boxes})
+        return boxes_batch
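Note on `unclip` above: the offset distance handed to pyclipper is `poly.area * unclip_ratio / poly.length`, where Shapely's `length` of a polygon is its perimeter. The following is a minimal standard-library sketch of that arithmetic only, assuming a simple (non-self-intersecting) polygon. The helper names `polygon_area`, `polygon_perimeter`, and `unclip_distance` are hypothetical and are not part of this commit.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio):
    """Offset distance used to grow a shrunken text region (hypothetical helper)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# A 100 x 20 text box: area 2000, perimeter 240.
rect = [(0, 0), (100, 0), (100, 20), (0, 20)]
distance = unclip_distance(rect, 1.5)  # 2000 * 1.5 / 240 = 12.5
print(distance)
```

Because the distance scales with area over perimeter, long thin boxes are expanded by roughly `unclip_ratio` times half their short side, which is why a larger `det_db_unclip_ratio` yields visibly looser boxes.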
--- a/src/models/thrid_party/paddleocr/infer/operators.py
+++ b/src/models/thrid_party/paddleocr/infer/operators.py
@@ -0,0 +1,186 @@
+import numpy as np
+import cv2
+import math
+import sys
+
+
+class DetResizeForTest(object):
+    def __init__(self, **kwargs):
+        super(DetResizeForTest, self).__init__()
+        self.resize_type = 0
+        self.keep_ratio = False
+        if "image_shape" in kwargs:
+            self.image_shape = kwargs["image_shape"]
+            self.resize_type = 1
+            if "keep_ratio" in kwargs:
+                self.keep_ratio = kwargs["keep_ratio"]
+        elif "limit_side_len" in kwargs:
+            self.limit_side_len = kwargs["limit_side_len"]
+            self.limit_type = kwargs.get("limit_type", "min")
+        elif "resize_long" in kwargs:
+            self.resize_type = 2
+            self.resize_long = kwargs.get("resize_long", 960)
+        else:
+            self.limit_side_len = 736
+            self.limit_type = "min"
+
+    def __call__(self, data):
+        img = data["image"]
+        src_h, src_w, _ = img.shape
+        if sum([src_h, src_w]) < 64:
+            img = self.image_padding(img)
+
+        if self.resize_type == 0:
+            # img, shape = self.resize_image_type0(img)
+            img, [ratio_h, ratio_w] = self.resize_image_type0(img)
+        elif self.resize_type == 2:
+            img, [ratio_h, ratio_w] = self.resize_image_type2(img)
+        else:
+            # img, shape = self.resize_image_type1(img)
+            img, [ratio_h, ratio_w] = self.resize_image_type1(img)
+        data["image"] = img
+        data["shape"] = np.array([src_h, src_w, ratio_h, ratio_w])
+        return data
+
+    def image_padding(self, im, value=0):
+        h, w, c = im.shape
+        im_pad = np.zeros((max(32, h), max(32, w), c), np.uint8) + value
+        im_pad[:h, :w, :] = im
+        return im_pad
+
+    def resize_image_type1(self, img):
+        resize_h, resize_w = self.image_shape
+        ori_h, ori_w = img.shape[:2]  # (h, w, c)
+        if self.keep_ratio is True:
+            resize_w = ori_w * resize_h / ori_h
+            N = math.ceil(resize_w / 32)
+            resize_w = N * 32
+        ratio_h = float(resize_h) / ori_h
+        ratio_w = float(resize_w) / ori_w
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        # return img, np.array([ori_h, ori_w])
+        return img, [ratio_h, ratio_w]
+
+    def resize_image_type0(self, img):
+        """
+        Resize the image so that both sides are multiples of 32, as the network requires.
+        Args:
+            img (array): array with shape [h, w, c]
+        Returns (tuple):
+            img, [ratio_h, ratio_w]
+        """
+        limit_side_len = self.limit_side_len
+        h, w, c = img.shape
+
+        # limit the max side
+        if self.limit_type == "max":
+            if max(h, w) > limit_side_len:
+                if h > w:
+                    ratio = float(limit_side_len) / h
+                else:
+                    ratio = float(limit_side_len) / w
+            else:
+                ratio = 1.0
+        elif self.limit_type == "min":
+            if min(h, w) < limit_side_len:
+                if h < w:
+                    ratio = float(limit_side_len) / h
+                else:
+                    ratio = float(limit_side_len) / w
+            else:
+                ratio = 1.0
+        elif self.limit_type == "resize_long":
+            ratio = float(limit_side_len) / max(h, w)
+        else:
+            raise Exception("unsupported limit type: {}".format(self.limit_type))
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+
+        resize_h = max(int(round(resize_h / 32) * 32), 32)
+        resize_w = max(int(round(resize_w / 32) * 32), 32)
+
+        try:
+            if int(resize_w) <= 0 or int(resize_h) <= 0:
+                return None, (None, None)
+            img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        except Exception:
+            print(img.shape, resize_w, resize_h)
+            sys.exit(1)
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+        return img, [ratio_h, ratio_w]
+
+    def resize_image_type2(self, img):
+        h, w, _ = img.shape
+
+        resize_w = w
+        resize_h = h
+
+        if resize_h > resize_w:
+            ratio = float(self.resize_long) / resize_h
+        else:
+            ratio = float(self.resize_long) / resize_w
+
+        resize_h = int(resize_h * ratio)
+        resize_w = int(resize_w * ratio)
+
+        max_stride = 128
+        resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
+        resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+
+        return img, [ratio_h, ratio_w]
+
+
+class NormalizeImage(object):
+    """normalize image, e.g. subtract mean and divide by std"""
+
+    def __init__(self, scale=None, mean=None, std=None, order="chw", **kwargs):
+        if isinstance(scale, str):
+            scale = eval(scale)
+        self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+        mean = mean if mean is not None else [0.485, 0.456, 0.406]
+        std = std if std is not None else [0.229, 0.224, 0.225]
+
+        shape = (3, 1, 1) if order == "chw" else (1, 1, 3)
+        self.mean = np.array(mean).reshape(shape).astype("float32")
+        self.std = np.array(std).reshape(shape).astype("float32")
+
+    def __call__(self, data):
+        img = data["image"]
+        from PIL import Image
+
+        if isinstance(img, Image.Image):
+            img = np.array(img)
+        assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+        data["image"] = (img.astype("float32") * self.scale - self.mean) / self.std
+        return data
+
+
+class ToCHWImage(object):
+    """convert hwc image to chw image"""
+
+    def __init__(self, **kwargs):
+        pass
+
+    def __call__(self, data):
+        img = data["image"]
+        from PIL import Image
+
+        if isinstance(img, Image.Image):
+            img = np.array(img)
+        data["image"] = img.transpose((2, 0, 1))
+        return data
+
+
+class KeepKeys(object):
+    def __init__(self, keep_keys, **kwargs):
+        self.keep_keys = keep_keys
+
+    def __call__(self, data):
+        data_list = []
+        for key in self.keep_keys:
+            data_list.append(data[key])
+        return data_list
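Note on `DetResizeForTest.resize_image_type0` above: the target size is computed in two steps, first a ratio chosen by `limit_type`, then rounding each side to the nearest multiple of 32 with a floor of 32. The sketch below reproduces only that size computation without OpenCV, so the effect of `limit_side_len` and `limit_type` can be checked quickly. The function name `det_resize_target` is hypothetical and is not part of this commit.

```python
def det_resize_target(h, w, limit_side_len=736, limit_type="min"):
    """Return (resize_h, resize_w, ratio_h, ratio_w) for an h x w image.

    Hypothetical helper mirroring the size arithmetic of resize_image_type0.
    """
    if limit_type == "max":
        # Shrink only when the longer side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shorter side is below the limit.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    elif limit_type == "resize_long":
        # Always scale so the longer side matches the limit.
        ratio = limit_side_len / max(h, w)
    else:
        raise ValueError("unsupported limit type: {}".format(limit_type))

    # Round each side to the nearest multiple of 32, never below 32.
    resize_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    resize_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return resize_h, resize_w, resize_h / h, resize_w / w


print(det_resize_target(1000, 500, limit_side_len=960, limit_type="max"))
print(det_resize_target(100, 300, limit_side_len=736, limit_type="min"))
print(det_resize_target(10, 10, limit_side_len=960, limit_type="max"))
```

The returned ratios are computed from the rounded sizes, not from the initial ratio, which is what lets the post-processing step map boxes back to the source resolution.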
--- a/src/models/thrid_party/paddleocr/infer/ppocr_keys_v1.txt
+++ b/src/models/thrid_party/paddleocr/infer/ppocr_keys_v1.txt
--- a/src/models/thrid_party/paddleocr/infer/predict_det.py
+++ b/src/models/thrid_party/paddleocr/infer/predict_det.py
@@ -0,0 +1,298 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+sys.path.insert(0, os.path.abspath(os.path.join(__dir__, "../..")))
+
+os.environ["FLAGS_allocator_strategy"] = "auto_growth"
+
+import cv2
+import numpy as np
+import time
+import sys
+
+# import tools.infer.utility as utility
+import utility
+from utility import get_logger
+
+from DBPostProcess import DBPostProcess
+from operators import DetResizeForTest, KeepKeys, NormalizeImage, ToCHWImage
+
+
+def transform(data, ops=None):
+    """transform"""
+    if ops is None:
+        ops = []
+    for op in ops:
+        data = op(data)
+        if data is None:
+            return None
+    return data
+
+logger = get_logger()
+
+
+class TextDetector(object):
+    def __init__(self, args):
+        self.args = args
+        self.det_algorithm = args.det_algorithm
+        self.use_onnx = args.use_onnx
+        postprocess_params = {}
+        assert self.det_algorithm == "DB"
+        postprocess_params["name"] = "DBPostProcess"
+        postprocess_params["thresh"] = args.det_db_thresh
+        postprocess_params["box_thresh"] = args.det_db_box_thresh
+        postprocess_params["max_candidates"] = 1000
+        postprocess_params["unclip_ratio"] = args.det_db_unclip_ratio
+        postprocess_params["use_dilation"] = args.use_dilation
+        postprocess_params["score_mode"] = args.det_db_score_mode
+        postprocess_params["box_type"] = args.det_box_type
+
+        self.preprocess_op = [
+            DetResizeForTest(limit_side_len=args.det_limit_side_len, limit_type=args.det_limit_type),
+            NormalizeImage(std= [0.229, 0.224, 0.225], mean= [0.485, 0.456, 0.406], scale= 1./255., order= "hwc"),
+            ToCHWImage(),
+            KeepKeys(keep_keys= ["image", "shape"])
+        ]
+        self.postprocess_op = DBPostProcess(**postprocess_params)
+        (
+            self.predictor,
+            self.input_tensor,
+            self.output_tensors,
+            self.config,
+        ) = utility.create_predictor(args, "det", logger)
+
+        assert self.use_onnx
+        if self.use_onnx:
+            img_h, img_w = self.input_tensor.shape[2:]
+            if isinstance(img_h, str) or isinstance(img_w, str):
+                pass
+            elif img_h is not None and img_w is not None and img_h > 0 and img_w > 0:
+                self.preprocess_op[0] = DetResizeForTest(image_shape=[img_h, img_w])
+
+
+    def order_points_clockwise(self, pts):
+        rect = np.zeros((4, 2), dtype="float32")
+        s = pts.sum(axis=1)
+        rect[0] = pts[np.argmin(s)]
+        rect[2] = pts[np.argmax(s)]
+        tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
+        diff = np.diff(np.array(tmp), axis=1)
+        rect[1] = tmp[np.argmin(diff)]
+        rect[3] = tmp[np.argmax(diff)]
+        return rect
+
+    def clip_det_res(self, points, img_height, img_width):
+        for pno in range(points.shape[0]):
+            points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
+            points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1))
+        return points
+
+    def filter_tag_det_res(self, dt_boxes, image_shape):
+        img_height, img_width = image_shape[0:2]
+        dt_boxes_new = []
+        for box in dt_boxes:
+            if type(box) is list:
+                box = np.array(box)
+            box = self.order_points_clockwise(box)
+            box = self.clip_det_res(box, img_height, img_width)
+            rect_width = int(np.linalg.norm(box[0] - box[1]))
+            rect_height = int(np.linalg.norm(box[0] - box[3]))
+            if rect_width <= 3 or rect_height <= 3:
+                continue
+            dt_boxes_new.append(box)
+        dt_boxes = np.array(dt_boxes_new)
+        return dt_boxes
+
+    def filter_tag_det_res_only_clip(self, dt_boxes, image_shape):
+        img_height, img_width = image_shape[0:2]
+        dt_boxes_new = []
+        for box in dt_boxes:
+            if type(box) is list:
+                box = np.array(box)
+            box = self.clip_det_res(box, img_height, img_width)
+            dt_boxes_new.append(box)
+        dt_boxes = np.array(dt_boxes_new)
+        return dt_boxes
+
+    def predict(self, img):
+        ori_im = img.copy()
+        data = {"image": img}
+
+        st = time.time()
+
+        if self.args.benchmark:
+            self.autolog.times.start()
+
+        data = transform(data, self.preprocess_op)
+        if data is None or data[0] is None:
+            return None, 0
+        img, shape_list = data
+        img = np.expand_dims(img, axis=0)
+        shape_list = np.expand_dims(shape_list, axis=0)
+        img = img.copy()
+
+        if self.args.benchmark:
+            self.autolog.times.stamp()
+        if self.use_onnx:
+            input_dict = {}
+            input_dict[self.input_tensor.name] = img
+            outputs = self.predictor.run(self.output_tensors, input_dict)
+        else:
+            self.input_tensor.copy_from_cpu(img)
+            self.predictor.run()
+            outputs = []
+            for output_tensor in self.output_tensors:
+                output = output_tensor.copy_to_cpu()
+                outputs.append(output)
+            if self.args.benchmark:
+                self.autolog.times.stamp()
+
+        preds = {}
+        if self.det_algorithm == "EAST":
+            preds["f_geo"] = outputs[0]
+            preds["f_score"] = outputs[1]
+        elif self.det_algorithm == "SAST":
+            preds["f_border"] = outputs[0]
+            preds["f_score"] = outputs[1]
+            preds["f_tco"] = outputs[2]
+            preds["f_tvo"] = outputs[3]
+        elif self.det_algorithm in ["DB", "PSE", "DB++"]:
+            preds["maps"] = outputs[0]
+        elif self.det_algorithm == "FCE":
+            for i, output in enumerate(outputs):
+                preds["level_{}".format(i)] = output
+        elif self.det_algorithm == "CT":
+            preds["maps"] = outputs[0]
+            preds["score"] = outputs[1]
+        else:
+            raise NotImplementedError
+
+        post_result = self.postprocess_op(preds, shape_list)
+        dt_boxes = post_result[0]["points"]
+
+        if self.args.det_box_type == "poly":
+            dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes, ori_im.shape)
+        else:
+            dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape)
+
+        if self.args.benchmark:
+            self.autolog.times.end(stamp=True)
+        et = time.time()
+        return dt_boxes, et - st
+
+    def __call__(self, img):
+        # For images such as posters, where one side is much longer than the other,
+        # split the image into overlapping slices and run detection on each slice.
+        MIN_BOUND_DISTANCE = 50
+        dt_boxes = np.zeros((0, 4, 2), dtype=np.float32)
+        elapse = 0
+        if (
+            img.shape[0] / img.shape[1] > 2
+            and img.shape[0] > self.args.det_limit_side_len
+        ):
+            start_h = 0
+            end_h = 0
+            while end_h <= img.shape[0]:
+                end_h = start_h + img.shape[1] * 3 // 4
+                subimg = img[start_h:end_h, :]
+                if len(subimg) == 0:
+                    break
+                sub_dt_boxes, sub_elapse = self.predict(subimg)
+                offset = start_h
+                # To prevent text blocks from being cut off, roll back a certain buffer area.
+                if (
+                    len(sub_dt_boxes) == 0
+                    or img.shape[1] - max([x[-1][1] for x in sub_dt_boxes])
+                    > MIN_BOUND_DISTANCE
+                ):
+                    start_h = end_h
+                else:
+                    sorted_indices = np.argsort(sub_dt_boxes[:, 2, 1])
+                    sub_dt_boxes = sub_dt_boxes[sorted_indices]
+                    bottom_line = (
+                        0
+                        if len(sub_dt_boxes) <= 1
+                        else int(np.max(sub_dt_boxes[:-1, 2, 1]))
+                    )
+                    if bottom_line > 0:
+                        start_h += bottom_line
+                        sub_dt_boxes = sub_dt_boxes[
+                            sub_dt_boxes[:, 2, 1] <= bottom_line
+                        ]
+                    else:
+                        start_h = end_h
+                if len(sub_dt_boxes) > 0:
+                    if dt_boxes.shape[0] == 0:
+                        dt_boxes = sub_dt_boxes + np.array(
+                            [0, offset], dtype=np.float32
+                        )
+                    else:
+                        dt_boxes = np.append(
+                            dt_boxes,
+                            sub_dt_boxes + np.array([0, offset], dtype=np.float32),
+                            axis=0,
+                        )
+                elapse += sub_elapse
+        elif (
+            img.shape[1] / img.shape[0] > 3
+            and img.shape[1] > self.args.det_limit_side_len * 3
+        ):
+            start_w = 0
+            end_w = 0
+            while end_w <= img.shape[1]:
+                end_w = start_w + img.shape[0] * 3 // 4
+                subimg = img[:, start_w:end_w]
+                if len(subimg) == 0:
+                    break
+                sub_dt_boxes, sub_elapse = self.predict(subimg)
+                offset = start_w
+                if (
+                    len(sub_dt_boxes) == 0
+                    or img.shape[0] - max([x[-1][0] for x in sub_dt_boxes])
+                    > MIN_BOUND_DISTANCE
+                ):
+                    start_w = end_w
+                else:
+                    sorted_indices = np.argsort(sub_dt_boxes[:, 2, 0])
+                    sub_dt_boxes = sub_dt_boxes[sorted_indices]
+                    right_line = (
+                        0
+                        if len(sub_dt_boxes) <= 1
+                        else int(np.max(sub_dt_boxes[:-1, 1, 0]))
+                    )
+                    if right_line > 0:
+                        start_w += right_line
+                        sub_dt_boxes = sub_dt_boxes[sub_dt_boxes[:, 1, 0] <= right_line]
+                    else:
+                        start_w = end_w
+                if len(sub_dt_boxes) > 0:
+                    if dt_boxes.shape[0] == 0:
+                        dt_boxes = sub_dt_boxes + np.array(
+                            [offset, 0], dtype=np.float32
+                        )
+                    else:
+                        dt_boxes = np.append(
+                            dt_boxes,
+                            sub_dt_boxes + np.array([offset, 0], dtype=np.float32),
+                            axis=0,
+                        )
+                elapse += sub_elapse
+        else:
+            dt_boxes, elapse = self.predict(img)
+        return dt_boxes, elapse
+
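Note on `TextDetector.order_points_clockwise` above: the corner with the smallest `x + y` is taken as top-left and the largest as bottom-right, then the remaining two corners are separated by `y - x` (smallest is top-right, largest is bottom-left). The sketch below restates that rule in plain Python for four `(x, y)` tuples, assuming a non-degenerate quadrilateral in image coordinates (y grows downward). It is an illustration under those assumptions, not the NumPy implementation used in the commit.

```python
def order_points_clockwise(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    sums = [x + y for x, y in pts]
    tl_idx = sums.index(min(sums))  # smallest x + y: top-left
    br_idx = sums.index(max(sums))  # largest x + y: bottom-right

    # The two remaining corners are told apart by y - x.
    rest = [p for i, p in enumerate(pts) if i not in (tl_idx, br_idx)]
    diffs = [y - x for x, y in rest]
    top_right = rest[diffs.index(min(diffs))]
    bottom_left = rest[diffs.index(max(diffs))]

    return [pts[tl_idx], top_right, pts[br_idx], bottom_left]


quad = [(100, 40), (0, 0), (0, 40), (100, 0)]
print(order_points_clockwise(quad))
# [(0, 0), (100, 0), (100, 40), (0, 40)]
```

With this ordering, `box[0] - box[1]` is the top edge and `box[0] - box[3]` is the left edge, which is what `filter_tag_det_res` relies on when it discards boxes narrower or shorter than 3 pixels.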
--- a/src/models/thrid_party/paddleocr/infer/predict_rec.py
+++ b/src/models/thrid_party/paddleocr/infer/predict_rec.py
@@ -0,0 +1,383 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+from PIL import Image
+
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+sys.path.insert(0, os.path.abspath(os.path.join(__dir__, "../..")))
+
+os.environ["FLAGS_allocator_strategy"] = "auto_growth"
+
+import cv2
+import numpy as np
+import math
+import time
+
+import utility
+from utility import get_logger
+
+from CTCLabelDecode import CTCLabelDecode
+
+logger = get_logger()
+
+
+class TextRecognizer(object):
+    def __init__(self, args):
+        self.rec_image_shape = [int(v) for v in args.rec_image_shape.split(",")]
+        self.rec_batch_num = args.rec_batch_num
+        self.rec_algorithm = args.rec_algorithm
+        self.postprocess_op = CTCLabelDecode(character_dict_path=args.rec_char_dict_path, use_space_char=args.use_space_char)
+        (
+            self.predictor,
+            self.input_tensor,
+            self.output_tensors,
+            self.config,
+        ) = utility.create_predictor(args, "rec", logger)
+        self.benchmark = args.benchmark
+        self.use_onnx = args.use_onnx
+        self.return_word_box = args.return_word_box
+
+    def resize_norm_img(self, img, max_wh_ratio):
+        imgC, imgH, imgW = self.rec_image_shape
+        if self.rec_algorithm == "NRTR" or self.rec_algorithm == "ViTSTR":
+            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+            # return padding_im
+            image_pil = Image.fromarray(np.uint8(img))
+            if self.rec_algorithm == "ViTSTR":
+                img = image_pil.resize([imgW, imgH], Image.Resampling.BICUBIC)
+            else:
+                img = image_pil.resize([imgW, imgH], Image.Resampling.LANCZOS)
+            img = np.array(img)
+            norm_img = np.expand_dims(img, -1)
+            norm_img = norm_img.transpose((2, 0, 1))
+            if self.rec_algorithm == "ViTSTR":
+                norm_img = norm_img.astype(np.float32) / 255.0
+            else:
+                norm_img = norm_img.astype(np.float32) / 128.0 - 1.0
+            return norm_img
+        elif self.rec_algorithm == "RFL":
+            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+            resized_image = cv2.resize(img, (imgW, imgH), interpolation=cv2.INTER_CUBIC)
+            resized_image = resized_image.astype("float32")
+            resized_image = resized_image / 255
+            resized_image = resized_image[np.newaxis, :]
+            resized_image -= 0.5
+            resized_image /= 0.5
+            return resized_image
+
+        assert imgC == img.shape[2]
+        imgW = int((imgH * max_wh_ratio))
+        if self.use_onnx:
+            w = self.input_tensor.shape[3:][0]
+            if isinstance(w, str):
+                pass
+            elif w is not None and w > 0:
+                imgW = w
+        h, w = img.shape[:2]
+        ratio = w / float(h)
+        if math.ceil(imgH * ratio) > imgW:
+            resized_w = imgW
+        else:
+            resized_w = int(math.ceil(imgH * ratio))
+        if self.rec_algorithm == "RARE":
+            if resized_w > self.rec_image_shape[2]:
+                resized_w = self.rec_image_shape[2]
+            imgW = self.rec_image_shape[2]
+        resized_image = cv2.resize(img, (resized_w, imgH))
+        resized_image = resized_image.astype("float32")
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+        padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
+        padding_im[:, :, 0:resized_w] = resized_image
+        return padding_im
+
+    def resize_norm_img_vl(self, img, image_shape):
+        imgC, imgH, imgW = image_shape
+        img = img[:, :, ::-1]  # bgr2rgb
+        resized_image = cv2.resize(img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
+        resized_image = resized_image.astype("float32")
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        return resized_image
+
+    def resize_norm_img_srn(self, img, image_shape):
+        imgC, imgH, imgW = image_shape
+
+        img_black = np.zeros((imgH, imgW))
+        im_hei = img.shape[0]
+        im_wid = img.shape[1]
+
+        if im_wid <= im_hei * 1:
+            img_new = cv2.resize(img, (imgH * 1, imgH))
+        elif im_wid <= im_hei * 2:
+            img_new = cv2.resize(img, (imgH * 2, imgH))
+        elif im_wid <= im_hei * 3:
+            img_new = cv2.resize(img, (imgH * 3, imgH))
+        else:
+            img_new = cv2.resize(img, (imgW, imgH))
+
+        img_np = np.asarray(img_new)
+        img_np = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
+        img_black[:, 0 : img_np.shape[1]] = img_np
+        img_black = img_black[:, :, np.newaxis]
+
+        row, col, c = img_black.shape
+        c = 1
+
+        return np.reshape(img_black, (c, row, col)).astype(np.float32)
+
+    def srn_other_inputs(self, image_shape, num_heads, max_text_length):
+        imgC, imgH, imgW = image_shape
+        feature_dim = int((imgH / 8) * (imgW / 8))
+
+        encoder_word_pos = (
+            np.array(range(0, feature_dim)).reshape((feature_dim, 1)).astype("int64")
+        )
+        gsrm_word_pos = (
+            np.array(range(0, max_text_length))
+            .reshape((max_text_length, 1))
+            .astype("int64")
+        )
+
+        gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length))
+        gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape(
+            [-1, 1, max_text_length, max_text_length]
+        )
+        gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1, [1, num_heads, 1, 1]).astype(
+            "float32"
+        ) * [-1e9]
+
+        gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape(
+            [-1, 1, max_text_length, max_text_length]
+        )
+        gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2, [1, num_heads, 1, 1]).astype(
+            "float32"
+        ) * [-1e9]
+
+        encoder_word_pos = encoder_word_pos[np.newaxis, :]
+        gsrm_word_pos = gsrm_word_pos[np.newaxis, :]
+
+        return [
+            encoder_word_pos,
+            gsrm_word_pos,
+            gsrm_slf_attn_bias1,
+            gsrm_slf_attn_bias2,
+        ]
+
+    def process_image_srn(self, img, image_shape, num_heads, max_text_length):
+        norm_img = self.resize_norm_img_srn(img, image_shape)
+        norm_img = norm_img[np.newaxis, :]
+
+        [
+            encoder_word_pos,
+            gsrm_word_pos,
+            gsrm_slf_attn_bias1,
+            gsrm_slf_attn_bias2,
+        ] = self.srn_other_inputs(image_shape, num_heads, max_text_length)
+
+        gsrm_slf_attn_bias1 = gsrm_slf_attn_bias1.astype(np.float32)
+        gsrm_slf_attn_bias2 = gsrm_slf_attn_bias2.astype(np.float32)
+        encoder_word_pos = encoder_word_pos.astype(np.int64)
+        gsrm_word_pos = gsrm_word_pos.astype(np.int64)
+
+        return (
+            norm_img,
+            encoder_word_pos,
+            gsrm_word_pos,
+            gsrm_slf_attn_bias1,
+            gsrm_slf_attn_bias2,
+        )
+
+    def resize_norm_img_sar(self, img, image_shape, width_downsample_ratio=0.25):
+        imgC, imgH, imgW_min, imgW_max = image_shape
+        h = img.shape[0]
+        w = img.shape[1]
+        valid_ratio = 1.0
+        # make sure new_width is an integral multiple of width_divisor.
+        width_divisor = int(1 / width_downsample_ratio)
+        # resize
+        ratio = w / float(h)
+        resize_w = math.ceil(imgH * ratio)
+        if resize_w % width_divisor != 0:
+            resize_w = round(resize_w / width_divisor) * width_divisor
+        if imgW_min is not None:
+            resize_w = max(imgW_min, resize_w)
+        if imgW_max is not None:
+            valid_ratio = min(1.0, 1.0 * resize_w / imgW_max)
+            resize_w = min(imgW_max, resize_w)
+        resized_image = cv2.resize(img, (resize_w, imgH))
+        resized_image = resized_image.astype("float32")
+        # norm
+        if image_shape[0] == 1:
+            resized_image = resized_image / 255
+            resized_image = resized_image[np.newaxis, :]
+        else:
+            resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+        resize_shape = resized_image.shape
+        padding_im = -1.0 * np.ones((imgC, imgH, imgW_max), dtype=np.float32)
+        padding_im[:, :, 0:resize_w] = resized_image
+        pad_shape = padding_im.shape
+
+        return padding_im, resize_shape, pad_shape, valid_ratio
+
+    def resize_norm_img_spin(self, img):
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+        # return padding_im
+        img = cv2.resize(img, tuple([100, 32]), cv2.INTER_CUBIC)
+        img = np.array(img, np.float32)
+        img = np.expand_dims(img, -1)
+        img = img.transpose((2, 0, 1))
+        mean = [127.5]
+        std = [127.5]
+        mean = np.array(mean, dtype=np.float32)
+        std = np.array(std, dtype=np.float32)
+        mean = np.float32(mean.reshape(1, -1))
+        stdinv = 1 / np.float32(std.reshape(1, -1))
+        img -= mean
+        img *= stdinv
+        return img
+
+    def resize_norm_img_svtr(self, img, image_shape):
+        imgC, imgH, imgW = image_shape
+        resized_image = cv2.resize(img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
+        resized_image = resized_image.astype("float32")
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+        return resized_image
+
+    def resize_norm_img_cppd_padding(
+        self, img, image_shape, padding=True, interpolation=cv2.INTER_LINEAR
+    ):
+        imgC, imgH, imgW = image_shape
+        h = img.shape[0]
+        w = img.shape[1]
+        if not padding:
+            resized_image = cv2.resize(img, (imgW, imgH), interpolation=interpolation)
+            resized_w = imgW
+        else:
+            ratio = w / float(h)
+            if math.ceil(imgH * ratio) > imgW:
+                resized_w = imgW
+            else:
+                resized_w = int(math.ceil(imgH * ratio))
+            resized_image = cv2.resize(img, (resized_w, imgH))
+        resized_image = resized_image.astype("float32")
+        if image_shape[0] == 1:
+            resized_image = resized_image / 255
+            resized_image = resized_image[np.newaxis, :]
+        else:
+            resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+        padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
+        padding_im[:, :, 0:resized_w] = resized_image
+
+        return padding_im
+
+    def resize_norm_img_abinet(self, img, image_shape):
+        imgC, imgH, imgW = image_shape
+
+        resized_image = cv2.resize(img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
+        resized_image = resized_image.astype("float32")
+        resized_image = resized_image / 255.0
+
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        resized_image = (resized_image - mean[None, None, ...]) / std[None, None, ...]
+        resized_image = resized_image.transpose((2, 0, 1))
+        resized_image = resized_image.astype("float32")
+
+        return resized_image
+
+    def norm_img_can(self, img, image_shape):
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # CAN only predict gray scale image
+
+        if self.inverse:
+            img = 255 - img
+
+        if self.rec_image_shape[0] == 1:
+            h, w = img.shape
+            _, imgH, imgW = self.rec_image_shape
+            if h < imgH or w < imgW:
+                padding_h = max(imgH - h, 0)
+                padding_w = max(imgW - w, 0)
+                img_padded = np.pad(
+                    img,
+                    ((0, padding_h), (0, padding_w)),
+                    "constant",
+                    constant_values=(255),
+                )
+                img = img_padded
+
+        img = np.expand_dims(img, 0) / 255.0  # (h, w) -> (1, h, w), scaled to [0, 1]
+        img = img.astype("float32")
+
+        return img
+
+    def __call__(self, img_list):
+        img_num = len(img_list)
+        # Calculate the aspect ratio of all text bars
+        width_list = []
+        for img in img_list:
+            width_list.append(img.shape[1] / float(img.shape[0]))
+        # Sorting can speed up the recognition process
+        indices = np.argsort(np.array(width_list))
+        rec_res = [["", 0.0]] * img_num
+        batch_num = self.rec_batch_num
+        st = time.time()
+        if self.benchmark:
+            self.autolog.times.start()
+        for beg_img_no in range(0, img_num, batch_num):
+            end_img_no = min(img_num, beg_img_no + batch_num)
+            norm_img_batch = []
+            imgC, imgH, imgW = self.rec_image_shape[:3]
+            max_wh_ratio = imgW / imgH
+            wh_ratio_list = []
+            for ino in range(beg_img_no, end_img_no):
+                h, w = img_list[indices[ino]].shape[0:2]
+                wh_ratio = w * 1.0 / h
+                max_wh_ratio = max(max_wh_ratio, wh_ratio)
+                wh_ratio_list.append(wh_ratio)
+            for ino in range(beg_img_no, end_img_no):
+                norm_img = self.resize_norm_img(
+                    img_list[indices[ino]], max_wh_ratio
+                )
+                norm_img = norm_img[np.newaxis, :]
+                norm_img_batch.append(norm_img)
+            norm_img_batch = np.concatenate(norm_img_batch)
+            norm_img_batch = norm_img_batch.copy()
+            if self.benchmark:
+                self.autolog.times.stamp()
+
+            assert self.use_onnx
+            input_dict = {}
+            input_dict[self.input_tensor.name] = norm_img_batch
+            outputs = self.predictor.run(self.output_tensors, input_dict)
+            preds = outputs[0]
+            rec_result = self.postprocess_op(
+                preds,
+                return_word_box=self.return_word_box,
+                wh_ratio_list=wh_ratio_list,
+                max_wh_ratio=max_wh_ratio,
+            )
+            for rno in range(len(rec_result)):
+                rec_res[indices[beg_img_no + rno]] = rec_result[rno]
+            if self.benchmark:
+                self.autolog.times.end(stamp=True)
+        return rec_res, time.time() - st
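Review note on `__call__` above: the method sorts the crops by aspect ratio, runs them in batches of `rec_batch_num`, and writes each result back into the slot of the image it came from. The snippet below is a minimal standalone sketch of that bookkeeping only, not the vendored implementation. `run_in_sorted_batches` and `run_batch` are hypothetical names, and plain `sorted` stands in for `np.argsort` so the sketch has no dependencies.

```python
def run_in_sorted_batches(ratios, batch_size, run_batch):
    """Process items in ascending-ratio batches, return results in input order.

    ratios: one aspect ratio (w / h) per input item.
    run_batch: hypothetical callable taking a list of ratios and returning
        one result per entry, standing in for preprocessing plus inference.
    """
    # Indices of the inputs, ordered by ascending aspect ratio.
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    results = [None] * len(ratios)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        outputs = run_batch([ratios[i] for i in batch_idx])
        # Scatter each output back to the position of its source item.
        for i, out in zip(batch_idx, outputs):
            results[i] = out
    return results


if __name__ == "__main__":
    demo = run_in_sorted_batches(
        [3.0, 1.0, 2.0], 2, lambda batch: [r * 10 for r in batch]
    )
    print(demo)
```

Because `max_wh_ratio` is computed per batch in the code above, grouping crops of similar width presumably keeps the padded width of each batch close to what its members need, which matches the "Sorting can speed up the recognition process" comment.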
--- a/src/models/thrid_party/paddleocr/infer/utility.py
+++ b/src/models/thrid_party/paddleocr/infer/utility.py
@@ -0,0 +1,713 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import sys
+import functools
+import logging
+import cv2
+import numpy as np
+import PIL
+from PIL import Image, ImageDraw, ImageFont
+import math
+import random
+
+
+def str2bool(v):
+    return v.lower() in ("true", "yes", "t", "y", "1")
+
+
+def str2int_tuple(v):
+    return tuple([int(i.strip()) for i in v.split(",")])
+
+
+def init_args():
+    parser = argparse.ArgumentParser()
+    # params for prediction engine
+    parser.add_argument("--use_gpu", type=str2bool, default=True)
+    parser.add_argument("--use_xpu", type=str2bool, default=False)
+    parser.add_argument("--use_npu", type=str2bool, default=False)
+    parser.add_argument("--use_mlu", type=str2bool, default=False)
+    parser.add_argument("--ir_optim", type=str2bool, default=True)
+    parser.add_argument("--use_tensorrt", type=str2bool, default=False)
+    parser.add_argument("--min_subgraph_size", type=int, default=15)
+    parser.add_argument("--precision", type=str, default="fp32")
+    parser.add_argument("--gpu_mem", type=int, default=500)
+    parser.add_argument("--gpu_id", type=int, default=0)
+
+    # params for text detector
+    parser.add_argument("--image_dir", type=str)
+    parser.add_argument("--page_num", type=int, default=0)
+    parser.add_argument("--det_algorithm", type=str, default="DB")
+    parser.add_argument("--det_model_dir", type=str)
+    parser.add_argument("--det_limit_side_len", type=float, default=960)
+    parser.add_argument("--det_limit_type", type=str, default="max")
+    parser.add_argument("--det_box_type", type=str, default="quad")
+
+    # DB params
+    parser.add_argument("--det_db_thresh", type=float, default=0.3)
+    parser.add_argument("--det_db_box_thresh", type=float, default=0.6)
+    parser.add_argument("--det_db_unclip_ratio", type=float, default=1.5)
+    parser.add_argument("--max_batch_size", type=int, default=10)
+    parser.add_argument("--use_dilation", type=str2bool, default=False)
+    parser.add_argument("--det_db_score_mode", type=str, default="fast")
+
+    # EAST params
+    parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
+    parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
+    parser.add_argument("--det_east_nms_thresh", type=float, default=0.2)
+
+    # SAST params
+    parser.add_argument("--det_sast_score_thresh", type=float, default=0.5)
+    parser.add_argument("--det_sast_nms_thresh", type=float, default=0.2)
+
+    # PSE params
+    parser.add_argument("--det_pse_thresh", type=float, default=0)
+    parser.add_argument("--det_pse_box_thresh", type=float, default=0.85)
+    parser.add_argument("--det_pse_min_area", type=float, default=16)
+    parser.add_argument("--det_pse_scale", type=int, default=1)
+
+    # FCE params
+    parser.add_argument("--scales", type=list, default=[8, 16, 32])
+    parser.add_argument("--alpha", type=float, default=1.0)
+    parser.add_argument("--beta", type=float, default=1.0)
+    parser.add_argument("--fourier_degree", type=int, default=5)
+
+    # params for text recognizer
+    parser.add_argument("--rec_algorithm", type=str, default="SVTR_LCNet")
+    parser.add_argument("--rec_model_dir", type=str)
+    parser.add_argument("--rec_image_inverse", type=str2bool, default=True)
+    parser.add_argument("--rec_image_shape", type=str, default="3, 48, 320")
+    parser.add_argument("--rec_batch_num", type=int, default=6)
+    parser.add_argument("--max_text_length", type=int, default=25)
+    parser.add_argument(
+        "--rec_char_dict_path", type=str, default="./ppocr_keys_v1.txt"
+    )
+    parser.add_argument("--use_space_char", type=str2bool, default=True)
+    parser.add_argument("--vis_font_path", type=str, default="./doc/fonts/simfang.ttf")
+    parser.add_argument("--drop_score", type=float, default=0.5)
+
+    # params for e2e
+    parser.add_argument("--e2e_algorithm", type=str, default="PGNet")
+    parser.add_argument("--e2e_model_dir", type=str)
+    parser.add_argument("--e2e_limit_side_len", type=float, default=768)
+    parser.add_argument("--e2e_limit_type", type=str, default="max")
+
+    # PGNet params
+    parser.add_argument("--e2e_pgnet_score_thresh", type=float, default=0.5)
+    parser.add_argument(
+        "--e2e_char_dict_path", type=str, default="./ppocr/utils/ic15_dict.txt"
+    )
+    parser.add_argument("--e2e_pgnet_valid_set", type=str, default="totaltext")
+    parser.add_argument("--e2e_pgnet_mode", type=str, default="fast")
+
+    # params for text classifier
+    parser.add_argument("--use_angle_cls", type=str2bool, default=False)
+    parser.add_argument("--cls_model_dir", type=str)
+    parser.add_argument("--cls_image_shape", type=str, default="3, 48, 192")
+    parser.add_argument("--label_list", type=list, default=["0", "180"])
+    parser.add_argument("--cls_batch_num", type=int, default=6)
+    parser.add_argument("--cls_thresh", type=float, default=0.9)
+
+    parser.add_argument("--enable_mkldnn", type=str2bool, default=False)
+    parser.add_argument("--cpu_threads", type=int, default=10)
+    parser.add_argument("--use_pdserving", type=str2bool, default=False)
+    parser.add_argument("--warmup", type=str2bool, default=False)
+
+    # SR params
+    parser.add_argument("--sr_model_dir", type=str)
+    parser.add_argument("--sr_image_shape", type=str, default="3, 32, 128")
+    parser.add_argument("--sr_batch_num", type=int, default=1)
+
+    # params for saving visualization and crop results
+    parser.add_argument("--draw_img_save_dir", type=str, default="./inference_results")
+    parser.add_argument("--save_crop_res", type=str2bool, default=False)
+    parser.add_argument("--crop_res_save_dir", type=str, default="./output")
+
+    # multi-process
+    parser.add_argument("--use_mp", type=str2bool, default=False)
+    parser.add_argument("--total_process_num", type=int, default=1)
+    parser.add_argument("--process_id", type=int, default=0)
+
+    parser.add_argument("--benchmark", type=str2bool, default=False)
+    parser.add_argument("--save_log_path", type=str, default="./log_output/")
+
+    parser.add_argument("--show_log", type=str2bool, default=True)
+    parser.add_argument("--use_onnx", type=str2bool, default=False)
+
+    # extended function
+    parser.add_argument(
+        "--return_word_box",
+        type=str2bool,
+        default=False,
+        help="Whether to return the bbox of each word (split by space) or Chinese character. Only used in ppstructure for layout recovery.",
+    )
+
+    return parser
+
+
+def parse_args():
+    parser = init_args()
+    return parser.parse_args([])
+
+
+def create_predictor(args, mode, logger):
+    if mode == "det":
+        model_dir = args.det_model_dir
+    elif mode == "cls":
+        model_dir = args.cls_model_dir
+    elif mode == "rec":
+        model_dir = args.rec_model_dir
+    elif mode == "table":
+        model_dir = args.table_model_dir
+    elif mode == "ser":
+        model_dir = args.ser_model_dir
+    elif mode == "re":
+        model_dir = args.re_model_dir
+    elif mode == "sr":
+        model_dir = args.sr_model_dir
+    elif mode == "layout":
+        model_dir = args.layout_model_dir
+    else:
+        model_dir = args.e2e_model_dir
+
+    if model_dir is None:
+        logger.info("cannot find {} model file path {}".format(mode, model_dir))
+        sys.exit(0)
+    assert args.use_onnx
+
+    import onnxruntime as ort
+
+    model_file_path = model_dir
+    if not os.path.exists(model_file_path):
+        raise ValueError("model file path not found: {}".format(model_file_path))
+    if args.use_gpu:
+        sess = ort.InferenceSession(
+            model_file_path, providers=["CUDAExecutionProvider"]
+        )
+    else:
+        sess = ort.InferenceSession(model_file_path)
+    return sess, sess.get_inputs()[0], None, None
+
+
+
+def get_output_tensors(args, mode, predictor):
+    output_names = predictor.get_output_names()
+    output_tensors = []
+    if mode == "rec" and args.rec_algorithm in ["CRNN", "SVTR_LCNet", "SVTR_HGNet"]:
+        output_name = "softmax_0.tmp_0"
+        if output_name in output_names:
+            return [predictor.get_output_handle(output_name)]
+        else:
+            for output_name in output_names:
+                output_tensor = predictor.get_output_handle(output_name)
+                output_tensors.append(output_tensor)
+    else:
+        for output_name in output_names:
+            output_tensor = predictor.get_output_handle(output_name)
+            output_tensors.append(output_tensor)
+    return output_tensors
+
+
+def draw_e2e_res(dt_boxes, strs, img_path):
+    src_im = cv2.imread(img_path)
+    for box, text in zip(dt_boxes, strs):
+        box = box.astype(np.int32).reshape((-1, 1, 2))
+        cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2)
+        cv2.putText(
+            src_im,
+            text,
+            org=(int(box[0, 0, 0]), int(box[0, 0, 1])),
+            fontFace=cv2.FONT_HERSHEY_COMPLEX,
+            fontScale=0.7,
+            color=(0, 255, 0),
+            thickness=1,
+        )
+    return src_im
+
+
+def draw_text_det_res(dt_boxes, img):
+    for box in dt_boxes:
+        box = np.array(box).astype(np.int32).reshape(-1, 2)
+        cv2.polylines(img, [box], True, color=(255, 255, 0), thickness=2)
+    return img
+
+
+def resize_img(img, input_size=600):
+    """
+    resize img and limit the longest side of the image to input_size
+    """
+    img = np.array(img)
+    im_shape = img.shape
+    im_size_max = np.max(im_shape[0:2])
+    im_scale = float(input_size) / float(im_size_max)
+    img = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
+    return img
+
+
+def draw_ocr(
+    image,
+    boxes,
+    txts=None,
+    scores=None,
+    drop_score=0.5,
+    font_path="./doc/fonts/simfang.ttf",
+):
+    """
+    Visualize the results of OCR detection and recognition
+    args:
+        image(Image|array): RGB image
+        boxes(list): boxes with shape(N, 4, 2)
+        txts(list): the texts
+        scores(list): the scores corresponding to txts
+        drop_score(float): only boxes with scores of at least drop_score are visualized
+        font_path: the path of font which is used to draw text
+    return(array):
+        the visualized img
+    """
+    if scores is None:
+        scores = [1] * len(boxes)
+    box_num = len(boxes)
+    for i in range(box_num):
+        if scores is not None and (scores[i] < drop_score or math.isnan(scores[i])):
+            continue
+        box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64)
+        image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
+    if txts is not None:
+        img = np.array(resize_img(image, input_size=600))
+        txt_img = text_visual(
+            txts,
+            scores,
+            img_h=img.shape[0],
+            img_w=600,
+            threshold=drop_score,
+            font_path=font_path,
+        )
+        img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
+        return img
+    return image
+
+
+def draw_ocr_box_txt(
+    image,
+    boxes,
+    txts=None,
+    scores=None,
+    drop_score=0.5,
+    font_path="./doc/fonts/simfang.ttf",
+):
+    h, w = image.height, image.width
+    img_left = image.copy()
+    img_right = np.ones((h, w, 3), dtype=np.uint8) * 255
+    random.seed(0)
+
+    draw_left = ImageDraw.Draw(img_left)
+    if txts is None or len(txts) != len(boxes):
+        txts = [None] * len(boxes)
+    for idx, (box, txt) in enumerate(zip(boxes, txts)):
+        if scores is not None and scores[idx] < drop_score:
+            continue
+        color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
+        draw_left.polygon(box, fill=color)
+        img_right_text = draw_box_txt_fine((w, h), box, txt, font_path)
+        pts = np.array(box, np.int32).reshape((-1, 1, 2))
+        cv2.polylines(img_right_text, [pts], True, color, 1)
+        img_right = cv2.bitwise_and(img_right, img_right_text)
+    img_left = Image.blend(image, img_left, 0.5)
+    img_show = Image.new("RGB", (w * 2, h), (255, 255, 255))
+    img_show.paste(img_left, (0, 0, w, h))
+    img_show.paste(Image.fromarray(img_right), (w, 0, w * 2, h))
+    return np.array(img_show)
+
+
+def draw_box_txt_fine(img_size, box, txt, font_path="./doc/fonts/simfang.ttf"):
+    box_height = int(
+        math.sqrt((box[0][0] - box[3][0]) ** 2 + (box[0][1] - box[3][1]) ** 2)
+    )
+    box_width = int(
+        math.sqrt((box[0][0] - box[1][0]) ** 2 + (box[0][1] - box[1][1]) ** 2)
+    )
+
+    if box_height > 2 * box_width and box_height > 30:
+        img_text = Image.new("RGB", (box_height, box_width), (255, 255, 255))
+        draw_text = ImageDraw.Draw(img_text)
+        if txt:
+            font = create_font(txt, (box_height, box_width), font_path)
+            draw_text.text([0, 0], txt, fill=(0, 0, 0), font=font)
+        img_text = img_text.transpose(Image.ROTATE_270)
+    else:
+        img_text = Image.new("RGB", (box_width, box_height), (255, 255, 255))
+        draw_text = ImageDraw.Draw(img_text)
+        if txt:
+            font = create_font(txt, (box_width, box_height), font_path)
+            draw_text.text([0, 0], txt, fill=(0, 0, 0), font=font)
+
+    pts1 = np.float32(
+        [[0, 0], [box_width, 0], [box_width, box_height], [0, box_height]]
+    )
+    pts2 = np.array(box, dtype=np.float32)
+    M = cv2.getPerspectiveTransform(pts1, pts2)
+
+    img_text = np.array(img_text, dtype=np.uint8)
+    img_right_text = cv2.warpPerspective(
+        img_text,
+        M,
+        img_size,
+        flags=cv2.INTER_NEAREST,
+        borderMode=cv2.BORDER_CONSTANT,
+        borderValue=(255, 255, 255),
+    )
+    return img_right_text
+
+
+def create_font(txt, sz, font_path="./doc/fonts/simfang.ttf"):
+    font_size = int(sz[1] * 0.99)
+    font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
+    if int(PIL.__version__.split(".")[0]) < 10:
+        length = font.getsize(txt)[0]
+    else:
+        length = font.getlength(txt)
+
+    if length > sz[0]:
+        font_size = int(font_size * sz[0] / length)
+        font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
+    return font
+
+
+def str_count(s):
+    """
+    Count the display length of a string in units of Chinese characters:
+    each English letter, digit, or whitespace character counts as
+    half the width of a Chinese character.
+    args:
+        s(string): the input of string
+    return(int):
+        the length of the string in Chinese-character widths
+    """
+    import string
+
+    count_zh = count_pu = 0
+    s_len = len(s)
+    en_dg_count = 0
+    for c in s:
+        if c in string.ascii_letters or c.isdigit() or c.isspace():
+            en_dg_count += 1
+        elif c.isalpha():
+            count_zh += 1
+        else:
+            count_pu += 1
+    return s_len - math.ceil(en_dg_count / 2)
+
+
+def text_visual(
+    texts, scores, img_h=400, img_w=600, threshold=0.0, font_path="./doc/simfang.ttf"
+):
+    """
+    create new blank img and draw txt on it
+    args:
+        texts(list): the texts to be drawn
+        scores(list|None): corresponding score of each txt
+        img_h(int): the height of blank img
+        img_w(int): the width of blank img
+        font_path: the path of font which is used to draw text
+    return(array):
+    """
+    if scores is not None:
+        assert len(texts) == len(
+            scores
+        ), "The number of txts and corresponding scores must match"
+
+    def create_blank_img():
+        blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
+        blank_img[:, img_w - 1 :] = 0
+        blank_img = Image.fromarray(blank_img).convert("RGB")
+        draw_txt = ImageDraw.Draw(blank_img)
+        return blank_img, draw_txt
+
+    blank_img, draw_txt = create_blank_img()
+
+    font_size = 20
+    txt_color = (0, 0, 0)
+    font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
+
+    gap = font_size + 5
+    txt_img_list = []
+    count, index = 1, 0
+    for idx, txt in enumerate(texts):
+        index += 1
+        if scores[idx] < threshold or math.isnan(scores[idx]):
+            index -= 1
+            continue
+        first_line = True
+        while str_count(txt) >= img_w // font_size - 4:
+            tmp = txt
+            txt = tmp[: img_w // font_size - 4]
+            if first_line:
+                new_txt = str(index) + ": " + txt
+                first_line = False
+            else:
+                new_txt = "    " + txt
+            draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
+            txt = tmp[img_w // font_size - 4 :]
+            if count >= img_h // gap - 1:
+                txt_img_list.append(np.array(blank_img))
+                blank_img, draw_txt = create_blank_img()
+                count = 0
+            count += 1
+        if first_line:
+            new_txt = str(index) + ": " + txt + "   " + "%.3f" % (scores[idx])
+        else:
+            new_txt = "  " + txt + "  " + "%.3f" % (scores[idx])
+        draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
+        # whether add new blank img or not
+        if count >= img_h // gap - 1 and idx + 1 < len(texts):
+            txt_img_list.append(np.array(blank_img))
+            blank_img, draw_txt = create_blank_img()
+            count = 0
+        count += 1
+    txt_img_list.append(np.array(blank_img))
+    if len(txt_img_list) == 1:
+        blank_img = np.array(txt_img_list[0])
+    else:
+        blank_img = np.concatenate(txt_img_list, axis=1)
+    return np.array(blank_img)
+
+
+def base64_to_cv2(b64str):
+    import base64
+
+    data = base64.b64decode(b64str.encode("utf8"))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+
+def draw_boxes(image, boxes, scores=None, drop_score=0.5):
+    if scores is None:
+        scores = [1] * len(boxes)
+    for box, score in zip(boxes, scores):
+        if score < drop_score:
+            continue
+        box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
+        image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
+    return image
+
+
+def get_rotate_crop_image(img, points):
+    """
+    img_height, img_width = img.shape[0:2]
+    left = int(np.min(points[:, 0]))
+    right = int(np.max(points[:, 0]))
+    top = int(np.min(points[:, 1]))
+    bottom = int(np.max(points[:, 1]))
+    img_crop = img[top:bottom, left:right, :].copy()
+    points[:, 0] = points[:, 0] - left
+    points[:, 1] = points[:, 1] - top
+    """
+    assert len(points) == 4, "shape of points must be 4*2"
+    img_crop_width = int(
+        max(
+            np.linalg.norm(points[0] - points[1]), np.linalg.norm(points[2] - points[3])
+        )
+    )
+    img_crop_height = int(
+        max(
+            np.linalg.norm(points[0] - points[3]), np.linalg.norm(points[1] - points[2])
+        )
+    )
+    pts_std = np.float32(
+        [
+            [0, 0],
+            [img_crop_width, 0],
+            [img_crop_width, img_crop_height],
+            [0, img_crop_height],
+        ]
+    )
+    M = cv2.getPerspectiveTransform(points, pts_std)
+    dst_img = cv2.warpPerspective(
+        img,
+        M,
+        (img_crop_width, img_crop_height),
+        borderMode=cv2.BORDER_REPLICATE,
+        flags=cv2.INTER_CUBIC,
+    )
+    dst_img_height, dst_img_width = dst_img.shape[0:2]
+    if dst_img_height * 1.0 / dst_img_width >= 1.5:
+        dst_img = np.rot90(dst_img)
+    return dst_img
+
+
+def get_minarea_rect_crop(img, points):
+    bounding_box = cv2.minAreaRect(np.array(points).astype(np.int32))
+    points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
+
+    index_a, index_b, index_c, index_d = 0, 1, 2, 3
+    if points[1][1] > points[0][1]:
+        index_a = 0
+        index_d = 1
+    else:
+        index_a = 1
+        index_d = 0
+    if points[3][1] > points[2][1]:
+        index_b = 2
+        index_c = 3
+    else:
+        index_b = 3
+        index_c = 2
+
+    box = [points[index_a], points[index_b], points[index_c], points[index_d]]
+    crop_img = get_rotate_crop_image(img, np.array(box))
+    return crop_img
+
+
+# def check_gpu(use_gpu):
+#     if use_gpu and (
+#         not paddle.is_compiled_with_cuda() or paddle.device.get_device() == "cpu"
+#     ):
+#         use_gpu = False
+#     return use_gpu
+
+
+def _check_image_file(path):
+    img_end = {"jpg", "bmp", "png", "jpeg", "rgb", "tif", "tiff", "gif", "pdf"}
+    return any([path.lower().endswith(e) for e in img_end])
+
+
+def get_image_file_list(img_file, infer_list=None):
+    imgs_lists = []
+    if img_file is None or not os.path.exists(img_file):
+        raise Exception("not found any img file in {}".format(img_file))
+
+    if os.path.isfile(img_file) and _check_image_file(img_file):
+        imgs_lists.append(img_file)
+    elif os.path.isdir(img_file):
+        for single_file in os.listdir(img_file):
+            file_path = os.path.join(img_file, single_file)
+            if os.path.isfile(file_path) and _check_image_file(file_path):
+                imgs_lists.append(file_path)
+
+    if len(imgs_lists) == 0:
+        raise Exception("not found any img file in {}".format(img_file))
+    imgs_lists = sorted(imgs_lists)
+    return imgs_lists
+
+
+logger_initialized = {}
+@functools.lru_cache()
+def get_logger(name="ppocr", log_file=None, log_level=logging.DEBUG):
+    """Initialize and get a logger by name.
+    If the logger has not been initialized, this method will initialize the
+    logger by adding one or two handlers, otherwise the initialized logger will
+    be directly returned. During initialization, a StreamHandler will always be
+    added. If `log_file` is specified a FileHandler will also be added.
+    Args:
+        name (str): Logger name.
+        log_file (str | None): The log filename. If specified, a FileHandler
+            will be added to the logger.
+        log_level (int): The logger level. Note that only the process of
+            rank 0 is affected, and other processes will set the level to
+            "Error" thus be silent most of the time.
+    Returns:
+        logging.Logger: The expected logger.
+    """
+    logger = logging.getLogger(name)
+    if name in logger_initialized:
+        return logger
+    for logger_name in logger_initialized:
+        if name.startswith(logger_name):
+            return logger
+
+    formatter = logging.Formatter(
+        "[%(asctime)s] %(name)s %(levelname)s: %(message)s", datefmt="%Y/%m/%d %H:%M:%S"
+    )
+
+    stream_handler = logging.StreamHandler(stream=sys.stdout)
+    stream_handler.setFormatter(formatter)
+    logger.addHandler(stream_handler)
+    logger_initialized[name] = True
+    logger.propagate = False
+    return logger
+
+
+def get_rotate_crop_image(img, points):
+    """
+    img_height, img_width = img.shape[0:2]
+    left = int(np.min(points[:, 0]))
+    right = int(np.max(points[:, 0]))
+    top = int(np.min(points[:, 1]))
+    bottom = int(np.max(points[:, 1]))
+    img_crop = img[top:bottom, left:right, :].copy()
+    points[:, 0] = points[:, 0] - left
+    points[:, 1] = points[:, 1] - top
+    """
+    assert len(points) == 4, "shape of points must be 4*2"
+    img_crop_width = int(
+        max(
+            np.linalg.norm(points[0] - points[1]), np.linalg.norm(points[2] - points[3])
+        )
+    )
+    img_crop_height = int(
+        max(
+            np.linalg.norm(points[0] - points[3]), np.linalg.norm(points[1] - points[2])
+        )
+    )
+    pts_std = np.float32(
+        [
+            [0, 0],
+            [img_crop_width, 0],
+            [img_crop_width, img_crop_height],
+            [0, img_crop_height],
+        ]
+    )
+    M = cv2.getPerspectiveTransform(points, pts_std)
+    dst_img = cv2.warpPerspective(
+        img,
+        M,
+        (img_crop_width, img_crop_height),
+        borderMode=cv2.BORDER_REPLICATE,
+        flags=cv2.INTER_CUBIC,
+    )
+    dst_img_height, dst_img_width = dst_img.shape[0:2]
+    if dst_img_height * 1.0 / dst_img_width >= 1.5:
+        dst_img = np.rot90(dst_img)
+    return dst_img
+
+
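+def get_minarea_rect_crop(img, points):
+    """Crop the minimum-area rotated rectangle that encloses `points`."""
+    bounding_box = cv2.minAreaRect(np.array(points).astype(np.int32))
+    # Sort the four corners by x: the first two form the left pair, the last two the right pair.
+    points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
+
+    # Within each pair, the corner with the smaller y is the upper one (image y grows downward).
+    index_a, index_b, index_c, index_d = 0, 1, 2, 3
+    if points[1][1] > points[0][1]:
+        index_a = 0
+        index_d = 1
+    else:
+        index_a = 1
+        index_d = 0
+    if points[3][1] > points[2][1]:
+        index_b = 2
+        index_c = 3
+    else:
+        index_b = 3
+        index_c = 2
+
+    # Resulting order: top-left, top-right, bottom-right, bottom-left.
+    box = [points[index_a], points[index_b], points[index_c], points[index_d]]
+    crop_img = get_rotate_crop_image(img, np.array(box))
+    return crop_img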
+
+
+
+if __name__ == "__main__":
+    pass
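The corner ordering in get_minarea_rect_crop and the crop-size rule in get_rotate_crop_image can be checked without OpenCV. The following is a minimal pure-Python sketch, not part of the diff: `order_corners` and `crop_size` are hypothetical helper names introduced here for illustration, and image coordinates (y grows downward) are assumed.

```python
import math


def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left.

    Hypothetical helper: follows the index_a..index_d logic by sorting on x,
    then splitting the left pair and the right pair by y.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    pts = sorted(points, key=lambda p: p[0])
    left, right = pts[:2], pts[2:]
    top_left, bottom_left = sorted(left, key=lambda p: p[1])
    top_right, bottom_right = sorted(right, key=lambda p: p[1])
    return [top_left, top_right, bottom_right, bottom_left]


def crop_size(box):
    """Return (width, height) of the rectified crop for an ordered box.

    Hypothetical helper: takes the longer of each pair of opposite edges
    and truncates to int, as the diff does with np.linalg.norm.
    """
    tl, tr, br, bl = box
    width = int(max(math.dist(tl, tr), math.dist(br, bl)))
    height = int(max(math.dist(tl, bl), math.dist(tr, br)))
    return width, height


box = order_corners([(110, 40), (10, 10), (10, 40), (110, 10)])
print(box)             # [(10, 10), (110, 10), (110, 40), (10, 40)]
print(crop_size(box))  # (100, 30)
```

The ordered box lines up with pts_std in get_rotate_crop_image, whose destination corners are listed in the same top-left, top-right, bottom-right, bottom-left sequence.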
--- a/src/models/tokenizer/roberta-tokenizer-550K/merges.txt
+++ b/src/models/tokenizer/roberta-tokenizer-550K/merges.txt
--- a/src/models/tokenizer/roberta-tokenizer-550K/special_tokens_map.json
+++ b/src/models/tokenizer/roberta-tokenizer-550K/special_tokens_map.json
@@ -1,15 +0,0 @@
-{
-  "bos_token": "<s>",
-  "cls_token": "<s>",
-  "eos_token": "</s>",
-  "mask_token": {
-    "content": "<mask>",
-    "lstrip": true,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": "<pad>",
-  "sep_token": "</s>",
-  "unk_token": "<unk>"
-}
--- a/src/models/tokenizer/roberta-tokenizer-550K/tokenizer.json
+++ b/src/models/tokenizer/roberta-tokenizer-550K/tokenizer.json
--- a/src/models/tokenizer/roberta-tokenizer-550K/tokenizer_config.json
+++ b/src/models/tokenizer/roberta-tokenizer-550K/tokenizer_config.json
@@ -1,57 +0,0 @@
-{
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "0": {
-      "content": "<s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "1": {
-      "content": "<pad>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "2": {
-      "content": "</s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "3": {
-      "content": "<unk>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "4": {
-      "content": "<mask>",
-      "lstrip": true,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
-  },
-  "bos_token": "<s>",
-  "clean_up_tokenization_spaces": true,
-  "cls_token": "<s>",
-  "eos_token": "</s>",
-  "errors": "replace",
-  "mask_token": "<mask>",
-  "model_max_length": 1000000000000000019884624838656,
-  "pad_token": "<pad>",
-  "sep_token": "</s>",
-  "tokenizer_class": "RobertaTokenizer",
-  "trim_offsets": true,
-  "unk_token": "<unk>"
-}
--- a/src/models/tokenizer/roberta-tokenizer-550K/vocab.json
+++ b/src/models/tokenizer/roberta-tokenizer-550K/vocab.json
--- a/src/models/tokenizer/roberta-tokenizer-7Mformulas/merges.txt
+++ b/src/models/tokenizer/roberta-tokenizer-7Mformulas/merges.txt
--- a/src/models/tokenizer/roberta-tokenizer-7Mformulas/special_tokens_map.json
+++ b/src/models/tokenizer/roberta-tokenizer-7Mformulas/special_tokens_map.json
@@ -1,15 +0,0 @@
-{
-  "bos_token": "<s>",
-  "cls_token": "<s>",
-  "eos_token": "</s>",
-  "mask_token": {
-    "content": "<mask>",
-    "lstrip": true,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": "<pad>",
-  "sep_token": "</s>",
-  "unk_token": "<unk>"
-}
--- a/src/models/tokenizer/roberta-tokenizer-7Mformulas/tokenizer.json
+++ b/src/models/tokenizer/roberta-tokenizer-7Mformulas/tokenizer.json
--- a/src/models/tokenizer/roberta-tokenizer-7Mformulas/tokenizer_config.json
+++ b/src/models/tokenizer/roberta-tokenizer-7Mformulas/tokenizer_config.json
@@ -1,57 +0,0 @@
-{
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "0": {
-      "content": "<s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "1": {
-      "content": "<pad>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "2": {
-      "content": "</s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "3": {
-      "content": "<unk>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "4": {
-      "content": "<mask>",
-      "lstrip": true,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
-  },
-  "bos_token": "<s>",
-  "clean_up_tokenization_spaces": true,
-  "cls_token": "<s>",
-  "eos_token": "</s>",
-  "errors": "replace",
-  "mask_token": "<mask>",
-  "model_max_length": 1000000000000000019884624838656,
-  "pad_token": "<pad>",
-  "sep_token": "</s>",
-  "tokenizer_class": "RobertaTokenizer",
-  "trim_offsets": true,
-  "unk_token": "<unk>"
-}
--- a/src/models/tokenizer/roberta-tokenizer-7Mformulas/vocab.json
+++ b/src/models/tokenizer/roberta-tokenizer-7Mformulas/vocab.json
--- a/src/models/tokenizer/roberta-tokenizer-raw/config.json
+++ b/src/models/tokenizer/roberta-tokenizer-raw/config.json
@@ -1,21 +0,0 @@
-{
-  "architectures": [
-    "RobertaForMaskedLM"
-  ],
-  "attention_probs_dropout_prob": 0.1,
-  "bos_token_id": 0,
-  "eos_token_id": 2,
-  "hidden_act": "gelu",
-  "hidden_dropout_prob": 0.1,
-  "hidden_size": 768,
-  "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "layer_norm_eps": 1e-05,
-  "max_position_embeddings": 514,
-  "model_type": "roberta",
-  "num_attention_heads": 12,
-  "num_hidden_layers": 12,
-  "pad_token_id": 1,
-  "type_vocab_size": 1,
-  "vocab_size": 50265
-}
--- a/src/models/tokenizer/roberta-tokenizer-raw/merges.txt
+++ b/src/models/tokenizer/roberta-tokenizer-raw/merges.txt
Author	SHA1	Message	Date
三洋三洋	a0942db712	[deps] pin transformers to 4.45.2 and sentence-transformers to 3.1.1	2025-02-01 13:00:44 +08:00
OleehyO	cee83611b5	Merge pull request #78 from OleehyO/pre_release Change to better import dependency	2024-08-07 12:43:15 +08:00
三洋三洋	e1046ba3fa	Change to better import dependency	2024-08-07 01:19:26 +08:00
OleehyO	bbc8ecf88b	Merge pull request #67 from OleehyO/pre_release Change setting name	2024-07-11 20:34:50 +08:00
三洋三洋	7438dee7ac	Change setting name	2024-07-11 20:33:51 +08:00
OleehyO	be922cc952	Merge pull request #60 from OleehyO/pre_release Pre release	2024-06-23 22:16:09 +08:00
三洋三洋	bfb1810fb0	Update README	2024-06-23 22:14:05 +08:00
三洋三洋	838febf48c	Remove onnxruntime-gpu	2024-06-23 22:13:51 +08:00
OleehyO	69f53d7256	Merge pull request #59 from OleehyO/pre_release Pre release	2024-06-22 23:56:45 +08:00
三洋三洋	6793142557	Update model config	2024-06-22 22:08:08 +08:00
三洋三洋	25f6cddf72	Update README	2024-06-22 22:00:14 +08:00
三洋三洋	cd519d8e99	Support onnx runtime	2024-06-22 22:00:05 +08:00
三洋三洋	2ae59776fa	Add optimum	2024-06-22 21:49:47 +08:00
OleehyO	529fba4db6	Merge pull request #58 from OleehyO/pre_release Add formula detection service	2024-06-17 21:26:35 +08:00
三洋三洋	d8659cd3a9	Add formula detection service	2024-06-17 21:23:55 +08:00
OleehyO	18dc6497ae	Merge pull request #56 from OleehyO/pre_release Add docker link	2024-06-11 13:22:17 +08:00
三洋三洋	c849728ee7	Add docker link	2024-06-11 13:20:32 +08:00
三洋三洋	a1c2b5b1ef	Update server.py 1. Change the default host address to 0.0.0.0. 2. Convert the output to KaTeX.	2024-06-07 12:26:24 +00:00
三洋三洋	6fbd285658	Update README	2024-06-07 06:54:23 +00:00
三洋三洋	9f4058c64b	Add Apache2.0 license	2024-06-06 13:06:16 +00:00
三洋三洋	236489ba2a	Add cover.png	2024-06-06 13:06:16 +00:00
三洋三洋	2920b753a8	Modify the names of options in the web.py Formula only -> Formula recognition Text formula mixed -> Paragraph recognition Improved display during mixed inference	2024-06-06 13:06:16 +00:00
三洋三洋	dbbec511ef	Refine mix_inference 1. Add the formula number back to the isolated formula and merge multiple tags. 2. Remove bold effect from inline formulas. 3. Change split environment into aligned	2024-06-06 13:06:11 +00:00
三洋三洋	29e626c984	Bugfix: to_katex.py 1. Added `change_all` function to fix a bug where some LaTeX formulas with the same wrapper were causing issues. 2. Removed some unnecessary formatting commands. Bugfix: to_katex.py	2024-06-06 08:25:50 +00:00
三洋三洋	848726e6e2	Update	2024-05-28 09:51:53 +00:00
三洋三洋	e66f237cfd	Added releasing file	2024-05-28 07:50:09 +00:00
三洋三洋	f509b8c94a	Change the model configuration to trocr	2024-05-28 07:50:09 +00:00
三洋三洋	2ac159bfa2	Using paddleocr with onnxruntime Deleted the code for test time.	2024-05-28 07:50:09 +00:00
三洋三洋	226c1e1f76	Added mixed recognition change suryaocr to paddleocr	2024-05-28 07:50:08 +00:00
三洋三洋	a24ccd53ae	Added ONNX file for PaddleOCR model	2024-05-28 07:50:08 +00:00
三洋三洋	d3451d0ce7	Update .gitignore	2024-05-28 07:50:08 +00:00
三洋三洋	e2bf22dac8	Added code for PaddleOCR inference	2024-05-28 07:50:08 +00:00
三洋三洋	5c9cff2125	Eliminated dependency on paddleocr Change to trocr	2024-05-28 07:50:08 +00:00
三洋三洋	cc602f5a82	update	2024-05-28 07:50:08 +00:00
OleehyO	19827f1837	bugfix: ocr_aug.py Change "lhy_custom" in ink_swap_color to "random"	2024-05-28 07:49:55 +00:00
三洋三洋	0a51bde1c5	bugfix: missing filter_fn and inference/train transform	2024-05-12 07:49:04 +00:00
三洋三洋	249a4d5a5f	update	2024-05-12 07:47:35 +00:00
三洋三洋	720795e478	update	2024-05-10 03:48:31 +00:00
TonyLee1256	fac1cfdcda	Update requirements.txt	2024-05-09 00:23:32 +08:00
TonyLee1256	82f3eb67b7	Update mix_inference.py Replace the text OCR model with paddleocr	2024-05-09 00:23:02 +08:00
TonyLee1256	30fbc6dc2d	Update inference.py Replace the text OCR model with paddleocr	2024-05-09 00:22:01 +08:00
TonyLee1256	b869122dc6	Update inference.py Added timing functionality	2024-05-09 00:20:32 +08:00
TonyLee1256	eaed8d88ca	Update infer_det.py Added support for running onnx model inference on gpu	2024-05-09 00:19:39 +08:00
三洋三洋	5d95d2e65c	bugfix	2024-05-08 14:34:01 +00:00
三洋三洋	ad84fcfce8	Added Language option in mixed mode	2024-05-07 07:44:24 +00:00
三洋三洋	ec90b2fdb9	Update README	2024-05-07 07:30:29 +00:00
三洋三洋	ff1872d067	bugfix	2024-05-07 07:11:34 +00:00
三洋三洋	ef529f9234	Add train_config.yaml	2024-05-07 07:11:05 +00:00
三洋三洋	3c0ec95b26	update .gitignore	2024-05-07 06:54:53 +00:00
TonyLee1256	f3148ef32c	bugfix inference.py	2024-05-07 13:28:07 +08:00
TonyLee1256	3b18667541	Update README_zh.md	2024-05-07 13:27:23 +08:00
TonyLee1256	91efec1bfa	Update README.md	2024-05-07 13:26:50 +08:00
TonyLee1256	6aa4c49d33	Update README.md	2024-05-07 13:25:28 +08:00
TonyLee1256	7b2b947c47	bugfix inference.py	2024-05-07 13:19:43 +08:00
三洋三洋	a3b85c0d3d	update	2024-05-02 09:10:21 +00:00
三洋三洋	683e53c78d	Merge remote-tracking branch 'origin/pre_release' into pre_release	2024-04-21 16:13:49 +00:00
三洋三洋	cb02bc4313	update README.md	2024-04-21 16:13:45 +00:00
TonyLee1256	9e2d4347b1	Update rec_infer_from_crop_imgs.py	2024-04-22 00:08:36 +08:00
TonyLee1256	55823256ec	Update infer_det.py	2024-04-22 00:07:41 +08:00
TonyLee1256	58e565e2da	Update README.md	2024-04-21 22:14:23 +08:00
TonyLee1256	0de36b5523	Update README.md	2024-04-21 22:12:22 +08:00
TonyLee1256	7c50ae8595	Update README_zh.md	2024-04-21 22:09:58 +08:00
三洋三洋	dc57872bc9	Merge branch 'dev' into pre_release	2024-04-21 13:14:49 +00:00
三洋三洋	1997145cf6	Update README.md	2024-04-21 13:06:01 +00:00
三洋三洋	5f62c7fbf0	1) Fixed a bug in to_katex.py; 2) Write the conversion results from Box.py to logs	2024-04-21 12:09:26 +00:00
三洋三洋	9b7e392c66	Adjusted the project structure after merging dev	2024-04-21 00:48:24 +08:00
三洋三洋	eac7f455d6	Removed resizer after merging dev	2024-04-21 00:13:21 +08:00
三洋三洋	f84168a00b	1) Implemented mixed text-formula recognition; 2) Refactored the project structure	2024-04-21 00:05:14 +08:00
三洋三洋	3746ddd427	update infer_det.py	2024-04-18 00:06:05 +08:00
三洋三洋	d5eca45fcc	Restructured the directories to support mixed inference	2024-04-17 15:24:06 +00:00
三洋三洋	5a9138026f	Fixed a bug where parameter names became inconsistent after merging the pre_release branch	2024-04-17 14:47:58 +00:00
三洋三洋	891a9c310a	Merge branch 'pre_release' into dev	2024-04-17 10:32:22 +00:00
三洋三洋	7a8491b595	checkpoint	2024-04-17 10:20:15 +00:00
三洋三洋	fe273c0258	update README.md	2024-04-17 10:08:46 +00:00
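三洋三洋	b4b9e8cfc4	Frontend update, inference.py update 1) The frontend supports pasting images from the clipboard. 2) The frontend supports model configuration. 3) Changed the inference.py interface. 4) Removed unnecessary files	2024-04-17 09:36:40 +00:00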
三洋三洋	8e657bdc25	add contributor	2024-04-12 07:29:36 +00:00
三洋三洋	d80d7262ef	update README	2024-04-12 06:16:37 +00:00
三洋三洋	7d237d820c	work in progress	2024-04-12 03:20:04 +00:00
OleehyO	468f5c7a66	Merge pull request #14 from TonyLee1256/pre_release Add formula detection module	2024-04-12 00:46:45 +08:00
TonyLee1256	936744ea13	Add formula detection module	2024-04-11 16:44:19 +00:00
三洋三洋	574dcc2842	Fixed a bug in inference_transform in transforms.py: png images were not converted to np.ndarray during the eval stage of training	2024-04-11 07:04:58 +00:00
三洋三洋	5c58b88c96	Optimized trim_white_border in transform.py	2024-04-10 16:09:13 +00:00
三洋三洋	aaee57acd2	Increased the probability of data augmentation	2024-04-09 13:50:35 +00:00
三洋三洋	7e163928c7	inference.py supports katex syntax	2024-04-06 12:06:08 +00:00
三洋三洋	8fdaef43f9	update README.md	2024-04-06 11:57:50 +00:00
三洋三洋	35bc4e71a1	inference.py supports katex	2024-04-06 11:38:59 +00:00
三洋三洋	09f02166db	update README.md	2024-04-06 07:43:03 +00:00
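三洋三洋	6179cc3226	web demo supports katex; a locally installed xelatex renderer is no longer needed	2024-04-06 07:28:46 +00:00
三洋三洋	8d1e719455	Added katex support to the web demo; a locally installed xelatex renderer is no longer needed	2024-04-06 07:18:40 +00:00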
三洋三洋	dd00e11a98	inference_transform bugfix	2024-04-06 05:09:50 +00:00
三洋三洋	4d494520f8	Completed the v3 version: added data augmentation for natural scenes	2024-04-05 08:11:06 +00:00
三洋三洋	e99ca14d59	Merge remote-tracking branch 'origin/dev' into dev	2024-04-05 08:00:11 +00:00
三洋三洋	af34ac5552	Merge remote-tracking branch 'origin/dev' into dev	2024-04-05 07:52:40 +00:00
三洋三洋	34ac31504a	Modified the inference.py template for the v3 version (supports recognition of natural scenes and mixed-text scenes)	2024-04-05 07:27:07 +00:00
三洋三洋	5b730329b4	update README.md	2024-04-05 05:19:27 +00:00
三洋三洋	d8ee5e3b11	Merge remote-tracking branch 'origin/dev' into dev	2024-03-28 14:33:46 +00:00
三洋三洋	17c92cce37	merge v3_nature_scence	2024-03-28 14:33:25 +00:00
三洋三洋	bf220c1f7f	merge v3_nature_scence	2024-03-28 14:22:23 +00:00
三洋三洋	5b66e42df7	Merge remote-tracking branch 'origin/dev' into dev	2024-03-28 13:28:47 +00:00
三洋三洋	979301a768	TexTellerv2 release	2024-03-25 13:22:11 +00:00
OleehyO	14b637cd6b	Update README_zh.md	2024-03-25 16:35:34 +08:00
OleehyO	b64e119093	Update README_zh.md	2024-03-25 16:35:34 +08:00
OleehyO	c66b55638f	Update README.md	2024-03-25 16:34:46 +08:00
三洋三洋	3f4b3c9645	update	2024-03-25 08:32:17 +00:00
三洋三洋	5e191ff0fe	update	2024-03-25 07:53:11 +00:00
三洋三洋	9c3bb1c22a	update mp4	2024-03-25 07:32:33 +00:00
三洋三洋	ef218d67f6	TexTeller v2	2024-03-25 07:11:10 +00:00
三洋三洋	74341c7e8a	update	2024-03-19 14:43:03 +00:00
三洋三洋	5d089b5a7f	update	2024-03-03 12:09:14 +08:00
三洋三洋	d9ee6b0d9e	update	2024-03-01 22:42:15 +08:00
三洋三洋	2d21d2d215	update	2024-02-27 07:44:35 +00:00
三洋三洋	3527a4af47	updated API usage (supports remote calls)	2024-02-27 07:13:36 +00:00
三洋三洋	b4537944d0	Update README_zh.md	2024-02-12 16:33:49 +00:00
三洋三洋	72a60f8611	Update README	2024-02-12 16:27:58 +00:00
三洋三洋	3683623925	Update README_zh.md	2024-02-12 15:02:31 +00:00
三洋三洋	94b0781d84	Update README	2024-02-12 11:46:26 +00:00
三洋三洋	9bc165f955	Update files	2024-02-12 11:40:51 +00:00
三洋三洋	fa6bcda721	update README	2024-02-12 08:44:45 +00:00
三洋三洋	6e2e45a8d6	update README	2024-02-12 08:41:33 +00:00
三洋三洋	b4962bfa98	Initial commit	2024-02-11 10:44:42 +00:00
三洋三洋	f057490bdb	Initial commit	2024-02-11 09:14:40 +00:00