From ec90b2fdb93a4ddd5bb739dadf1696753ab159c0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=89=E6=B4=8B=E4=B8=89=E6=B4=8B?= <1258009915@qq.com>
Date: Tue, 7 May 2024 07:28:16 +0000
Subject: [PATCH] Update README

---
 README.md           | 10 ++++++----
 assets/README_zh.md | 10 ++++++----
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index cc9b1a8..360d181 100644
--- a/README.md
+++ b/README.md
@@ -160,11 +160,13 @@ If you are using a different dataset, you might need to retrain the tokenizer to
 
 ### Training the Model
 
-To train the model, you need to run the following command in the `TexTeller/src` directory:
+1. Modify `num_processes` in `src/train_config.yaml` to match the number of GPUs available for training (default is 1).
 
-```bash
-python -m models.ocr_model.train.train
-```
+2. In the `TexTeller/src` directory, run the following command:
+
+   ```bash
+   accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+   ```
 
 You can set your own tokenizer and checkpoint paths in `TexTeller/src/models/ocr_model/train/train.py` (refer to `train.py` for more information). If you are using the same architecture and dictionary as TexTeller, you can also fine-tune TexTeller's default weights with your own dataset.
 
diff --git a/assets/README_zh.md b/assets/README_zh.md
index fce448b..e8da0bd 100644
--- a/assets/README_zh.md
+++ b/assets/README_zh.md
@@ -198,11 +198,13 @@ python server.py
 
 ### 训练模型
 
-要想训练模型, 你需要在 `TexTeller/src`目录下运行以下命令:
+1. 修改`src/train_config.yaml`中的`num_processes`为训练用的显卡数(默认为1)
 
-```bash
-python -m models.ocr_model.train.train
-```
+2. 在`TexTeller/src`目录下运行以下命令:
+
+   ```bash
+   accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+   ```
 
 你可以在 `TexTeller/src/models/ocr_model/train/train.py`中设置自己的tokenizer和checkpoint路径(请参考 `train.py`)。如果你使用了与TexTeller一样的架构和相同的字典,你还可以用自己的数据集来微调TexTeller的默认权重。
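
Note for reviewers: the patch references `src/train_config.yaml` but does not show its contents. For context, a minimal single-machine multi-GPU Accelerate config with the `num_processes` field the new instructions point at might look like the sketch below; the field values are illustrative assumptions, not the repository's actual file.

```yaml
# Illustrative accelerate config; only num_processes is referenced by the patch.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU   # use NO for a single GPU
num_machines: 1
machine_rank: 0
num_processes: 1              # set to the number of GPUs used for training
mixed_precision: 'no'
```

With `num_processes: 1` the `accelerate launch` command in the diff behaves like a plain single-process `python -m models.ocr_model.train.train` run, which is why the patch can replace the old command without changing the default behavior.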