[deps] Pin transformers to 4.47

[chore] Setup deps for doc build
[chore] Update
17 changed files with 172 additions and 162 deletions
--- a/.github/workflows/deploy-doc.yml
+++ b/.github/workflows/deploy-doc.yml
@@ -11,18 +11,13 @@ jobs:
    - uses: actions/checkout@v4
      with:
        persist-credentials: false
-    - name: Set up Python
-      uses: actions/setup-python@v4
-      with:
-        python-version: '3.10'
-    - name: Install uv
-      run: pip install uv
-    - name: Install docs dependencies
-      run: uv pip install --system -e ".[docs]"
    - name: Build HTML
-      run: |
-        cd docs
-        make html
+      uses: ammaraskar/sphinx-action@7.0.0
+      with:
+        pre-build-command: |
+          apt-get update && apt-get install -y git
+          pip install uv
+          uv pip install --system . .[docs]
    - name: Upload artifacts
      uses: actions/upload-artifact@v4
      with:
@@ -33,4 +28,4 @@ jobs:
      if: github.ref == 'refs/heads/main'
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
-        publish_dir: docs/build/html/
+        publish_dir: docs/build/html
--- a/.github/workflows/pr-welcome.yml
+++ b/.github/workflows/pr-welcome.yml
@@ -4,6 +4,10 @@ on:
  pull_request:
    types: [opened]

+permissions:
+  pull-requests: write
+  issues: write
+
 jobs:
  welcome:
    runs-on: ubuntu-latest
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -2,8 +2,6 @@ name: Publish to PyPI

 on:
  push:
-    branches:
-      - 'main'
    tags:
      - 'v*'

--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -28,8 +28,8 @@ jobs:

    - name: Install dependencies
      run: |
-        uv sync --group test
+        uv sync --extra test

    - name: Run tests with pytest
      run: |
-        uv run pytest tests/
+        uv run pytest -v tests/
--- a/README.md
+++ b/README.md
@@ -56,37 +56,43 @@ TexTeller was trained with **80M image-formula pairs** (previous dataset can be
 </tr>
 </table>

-## 🔄 Change Log
+## 📮 Change Log

- 📮[2024-06-06] **TexTeller3.0 released!** The training data has been increased to **80M** (**10x more than** TexTeller2.0 and also improved in data diversity). TexTeller3.0's new features:
+- [2024-06-06] **TexTeller3.0 released!** The training data has been increased to **80M** (**10x more than** TexTeller2.0 and also improved in data diversity). TexTeller3.0's new features:

  - Support scanned image, handwritten formulas, English(Chinese) mixed formulas.

  - OCR abilities in both Chinese and English for printed images.

- 📮[2024-05-02] Support **paragraph recognition**.
+- [2024-05-02] Support **paragraph recognition**.

- 📮[2024-04-12] **Formula detection model** released!
+- [2024-04-12] **Formula detection model** released!

- 📮[2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has been increased to 7.5M (15x more than TexTeller1.0 and also improved in data quality). The trained TexTeller2.0 demonstrated **superior performance** in the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
+- [2024-03-25] TexTeller2.0 released! The training data for TexTeller2.0 has been increased to 7.5M (15x more than TexTeller1.0 and also improved in data quality). The trained TexTeller2.0 demonstrated **superior performance** in the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.

  > [Here](./assets/test.pdf) are more test images and a horizontal comparison of various recognition models.

 ## 🚀 Getting Started

-1. Install the project's dependencies:
+1. Install uv:

   ```bash
-   pip install texteller
+   pip install uv
   ```

-2. If your are using CUDA backend, you may need to install `onnxruntime-gpu`:
+2. Install the project's dependencies:

   ```bash
-   pip install texteller[onnxruntime-gpu]
+   uv pip install texteller
   ```

-3. Run the following command to start inference:
+3. If you are using the CUDA backend, you may need to install `onnxruntime-gpu`:
+
+   ```bash
+   uv pip install texteller[onnxruntime-gpu]
+   ```
+
+4. Run the following command to start inference:

   ```bash
   texteller inference "/path/to/image.{jpg,png}"
@@ -164,7 +170,7 @@ Please setup your environment before training:
 1. Install the dependencies for training:

   ```bash
-   pip install texteller[train]
+   uv pip install texteller[train]
   ```

 2. Clone the repository:
--- a/assets/README_zh.md
+++ b/assets/README_zh.md
@@ -74,19 +74,25 @@ TexTeller 使用 **8千万图像-公式对** 进行训练（前代数据集可

 ## 🚀 快速开始

-1. 安装项目依赖：
+1. 安装uv：

   ```bash
-   pip install texteller
+   pip install uv
   ```

-2. 若使用 CUDA 后端，可能需要安装 `onnxruntime-gpu`：
+2. 安装项目依赖：

   ```bash
-   pip install texteller[onnxruntime-gpu]
+   uv pip install texteller
   ```

-3. 运行以下命令开始推理：
+3. 若使用 CUDA 后端，可能需要安装 `onnxruntime-gpu`：
+
+   ```bash
+   uv pip install texteller[onnxruntime-gpu]
+   ```
+
+4. 运行以下命令开始推理：

   ```bash
   texteller inference "/path/to/image.{jpg,png}"
@@ -96,7 +102,7 @@ TexTeller 使用 **8千万图像-公式对** 进行训练（前代数据集可

 ## 🌐 网页演示

-运行命令：
+命令行运行：

 ```bash
 texteller web
@@ -152,7 +158,7 @@ print(response.text)
 TexTeller的公式检测模型在3415张中文资料图像和8272张[IBEM数据集](https://zenodo.org/records/4757865)图像上训练。

 <div align="center">
-    <img src="./assets/det_rec.png" width=250>
+    <img src="./det_rec.png" width=250>
 </div>

 我们在Python接口中提供了公式检测接口，详见[接口文档](https://oleehyo.github.io/TexTeller/)。
@@ -164,7 +170,7 @@ TexTeller的公式检测模型在3415张中文资料图像和8272张[IBEM数据
 1. 安装训练依赖：

   ```bash
-   pip install texteller[train]
+   uv pip install texteller[train]
   ```

 2. 克隆仓库：
--- a/assets/logo.svg
+++ b/assets/logo.svg
@@ -1,9 +1,10 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="354" height="100" viewBox="0 0 354 100">
+
+<svg xmlns="http://www.w3.org/2000/svg" width="430" height="80" viewBox="0 0 430 80">

  <text
    x="50%"
    y="50%"
-    font-family="Arial, sans-serif"
+    font-family="monaco"
    font-size="55"
    text-anchor="middle"
    dominant-baseline="middle">
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -12,64 +12,64 @@
 import os
 import sys

-sys.path.insert(0, os.path.abspath('../..'))
+sys.path.insert(0, os.path.abspath("../.."))

 # -- Project information -----------------------------------------------------

-project = 'TexTeller'
-copyright = '2025, TexTeller Team'
-author = 'TexTeller Team'
+project = "TexTeller"
+copyright = "2025, TexTeller Team"
+author = "TexTeller Team"

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

 extensions = [
-    'myst_parser',
-    'sphinx.ext.duration',
-    'sphinx.ext.intersphinx',
-    'sphinx.ext.autosectionlabel',
-    'sphinx.ext.autodoc',
-    'sphinx.ext.viewcode',
-    'sphinx.ext.napoleon',
-    'sphinx.ext.autosummary',
-    'sphinx_copybutton',
+    "myst_parser",
+    "sphinx.ext.duration",
+    "sphinx.ext.intersphinx",
+    "sphinx.ext.autosectionlabel",
+    "sphinx.ext.autodoc",
+    "sphinx.ext.viewcode",
+    "sphinx.ext.napoleon",
+    "sphinx.ext.autosummary",
+    "sphinx_copybutton",
    # 'sphinx.ext.linkcode',
    # 'sphinxarg.ext',
-    'sphinx_design',
-    'nbsphinx',
+    "sphinx_design",
+    "nbsphinx",
 ]

-templates_path = ['_templates']
+templates_path = ["_templates"]
 exclude_patterns = []

 # Autodoc settings
-autodoc_member_order = 'bysource'
+autodoc_member_order = "bysource"
 add_module_names = False
-autoclass_content = 'both'
+autoclass_content = "both"
 autodoc_default_options = {
-    'members': True,
-    'member-order': 'bysource',
-    'undoc-members': True,
-    'show-inheritance': True,
-    'imported-members': True,
+    "members": True,
+    "member-order": "bysource",
+    "undoc-members": True,
+    "show-inheritance": True,
+    "imported-members": True,
 }

 # Intersphinx settings
 intersphinx_mapping = {
-    'python': ('https://docs.python.org/3', None),
-    'numpy': ('https://numpy.org/doc/stable', None),
-    'torch': ('https://pytorch.org/docs/stable', None),
-    'transformers': ('https://huggingface.co/docs/transformers/main/en', None),
+    "python": ("https://docs.python.org/3", None),
+    "numpy": ("https://numpy.org/doc/stable", None),
+    "torch": ("https://pytorch.org/docs/stable", None),
+    "transformers": ("https://huggingface.co/docs/transformers/main/en", None),
 }

-html_theme = 'sphinx_book_theme'
+html_theme = "sphinx_book_theme"

 html_theme_options = {
-    'repository_url': 'https://github.com/OleehyO/TexTeller',
-    'use_repository_button': True,
-    'use_issues_button': True,
-    'use_edit_page_button': True,
-    'use_download_button': True,
+    "repository_url": "https://github.com/OleehyO/TexTeller",
+    "use_repository_button": True,
+    "use_issues_button": True,
+    "use_edit_page_button": True,
+    "use_download_button": True,
 }

 html_logo = "../../assets/logo.svg"
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -40,7 +40,7 @@ Converting an image to LaTeX:

 Processing a mixed text/formula image:

-.. code-block::python
+.. code-block:: python

   from texteller import (
       load_model, load_tokenizer, load_latexdet_model,
--- a/examples/client_demo.py
+++ b/examples/client_demo.py
@@ -3,8 +3,8 @@ import requests
 server_url = "http://127.0.0.1:8000/predict"

 img_path = "/path/to/your/image"
-with open(img_path, 'rb') as img:
-    files = {'img': img}
+with open(img_path, "rb") as img:
+    files = {"img": img}
    response = requests.post(server_url, files=files)

 print(response.text)
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -22,7 +22,7 @@ dependencies = [
    "streamlit-paste-button>=0.1.2",
    "torch>=2.6.0",
    "torchvision>=0.21.0",
-    "transformers==4.45.2",
+    "transformers==4.47",
    "wget>=3.2",
    "optimum[onnxruntime]>=1.24.0",
    "python-multipart>=0.0.20",
--- a/texteller/api/format.py
+++ b/texteller/api/format.py
@@ -19,8 +19,8 @@ TEXT_LINE_START = ""
 COMMENT_LINE_START = "% "

 # Opening and closing delimiters
-OPENS = ['{', '(', '[']
-CLOSES = ['}', ')', ']']
+OPENS = ["{", "(", "["]
+CLOSES = ["}", ")", "]"]

 # Names of LaTeX verbatim environments
 VERBATIMS = ["verbatim", "Verbatim", "lstlisting", "minted", "comment"]
@@ -138,7 +138,7 @@ class Pattern:
                contains_env_end=ENV_END in s,
                contains_item=ITEM in s,
                contains_splitting=True,
-                contains_comment='%' in s,
+                contains_comment="%" in s,
            )
        else:
            return cls(
@@ -146,7 +146,7 @@ class Pattern:
                contains_env_end=False,
                contains_item=False,
                contains_splitting=False,
-                contains_comment='%' in s,
+                contains_comment="%" in s,
            )


@@ -169,11 +169,11 @@ def find_comment_index(line: str, pattern: Pattern) -> Optional[int]:

    in_command = False
    for i, c in enumerate(line):
-        if c == '\\':
+        if c == "\\":
            in_command = True
        elif in_command and not c.isalpha():
            in_command = False
-        elif c == '%' and not in_command:
+        elif c == "%" and not in_command:
            return i

    return None
@@ -390,10 +390,10 @@ def find_wrap_point(line: str, indent_length: int, args: Args) -> Optional[int]:
        line_width += 1
        if line_width > wrap_boundary and wrap_point is not None:
            break
-        if c == ' ' and prev_char != '\\':
+        if c == " " and prev_char != "\\":
            if after_char:
                wrap_point = i
-        elif c != '%':
+        elif c != "%":
            after_char = True
        prev_char = c

@@ -483,8 +483,8 @@ def split_line(line: str, state: State, file: str, args: Args, logs: List[Log])
    if not match:
        return line, ""

-    prev = match.group('prev')
-    rest = match.group('env')
+    prev = match.group("prev")
+    rest = match.group("env")

    if args.verbosity >= 3:  # Trace level
        logs.append(
@@ -517,8 +517,8 @@ def clean_text(text: str, args: Args) -> str:
    text = RE_NEWLINES.sub(f"{LINE_END}{LINE_END}", text)

    # Remove tabs if they shouldn't be used
-    if args.tabchar != '\t':
-        text = text.replace('\t', ' ' * args.tabsize)
+    if args.tabchar != "\t":
+        text = text.replace("\t", " " * args.tabsize)

    # Remove trailing spaces
    text = RE_TRAIL.sub(LINE_END, text)
@@ -577,7 +577,7 @@ def _format_latex(old_text: str, file: str, args: Args) -> Tuple[str, List[Log]]
    new_text = ""

    # Select the character used for indentation
-    indent_char = '\t' if args.tabchar == '\t' else ' '
+    indent_char = "\t" if args.tabchar == "\t" else " "

    # Get any extra environments to be indented as lists
    lists_begin = [f"\\begin{{{l}}}" for l in args.lists]
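
Review note on `find_comment_index`: the hunk above changes only the quote style, but the scanning loop it touches is easy to misread. The sketch below reproduces just that loop so its behavior can be checked in isolation. The name `find_comment_index_sketch` is hypothetical, and the sketch omits the `pattern` argument and any code that falls outside the hunk, so it is an approximation of the real function rather than a copy.

```python
from typing import Optional


def find_comment_index_sketch(line: str) -> Optional[int]:
    """Return the index of the first '%' that starts a comment, else None.

    Simplified from the loop shown in the diff; the real function also
    takes a `pattern` argument that is not modeled here.
    """
    in_command = False
    for i, c in enumerate(line):
        if c == "\\":
            # A backslash opens a command such as \% or \alpha.
            in_command = True
        elif in_command and not c.isalpha():
            # The first non-letter ends the command (and is itself skipped).
            in_command = False
        elif c == "%" and not in_command:
            return i
    return None


print(find_comment_index_sketch(r"50\% done % note"))  # 10
print(find_comment_index_sketch("x = 1"))              # None
```

The escaped `\%` at index 3 is skipped because it directly follows a backslash, so the first real comment marker is the `%` at index 10.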
--- a/texteller/api/katex.py
+++ b/texteller/api/katex.py
@@ -5,13 +5,13 @@ from .format import format_latex


 def _rm_dollar_surr(content):
-    pattern = re.compile(r'\\[a-zA-Z]+\$.*?\$|\$.*?\$')
+    pattern = re.compile(r"\\[a-zA-Z]+\$.*?\$|\$.*?\$")
    matches = pattern.findall(content)

    for match in matches:
-        if not re.match(r'\\[a-zA-Z]+', match):
-            new_match = match.strip('$')
-            content = content.replace(match, ' ' + new_match + ' ')
+        if not re.match(r"\\[a-zA-Z]+", match):
+            new_match = match.strip("$")
+            content = content.replace(match, " " + new_match + " ")

    return content

@@ -33,97 +33,97 @@ def to_katex(formula: str) -> str:
    """
    res = formula
    # remove mbox surrounding
-    res = change_all(res, r'\mbox ', r' ', r'{', r'}', r'', r'')
-    res = change_all(res, r'\mbox', r' ', r'{', r'}', r'', r'')
+    res = change_all(res, r"\mbox ", r" ", r"{", r"}", r"", r"")
+    res = change_all(res, r"\mbox", r" ", r"{", r"}", r"", r"")
    # remove hbox surrounding
-    res = re.sub(r'\\hbox to ?-? ?\d+\.\d+(pt)?\{', r'\\hbox{', res)
-    res = change_all(res, r'\hbox', r' ', r'{', r'}', r'', r' ')
+    res = re.sub(r"\\hbox to ?-? ?\d+\.\d+(pt)?\{", r"\\hbox{", res)
+    res = change_all(res, r"\hbox", r" ", r"{", r"}", r"", r" ")
    # remove raise surrounding
-    res = re.sub(r'\\raise ?-? ?\d+\.\d+(pt)?', r' ', res)
+    res = re.sub(r"\\raise ?-? ?\d+\.\d+(pt)?", r" ", res)
    # remove makebox
-    res = re.sub(r'\\makebox ?\[\d+\.\d+(pt)?\]\{', r'\\makebox{', res)
-    res = change_all(res, r'\makebox', r' ', r'{', r'}', r'', r' ')
+    res = re.sub(r"\\makebox ?\[\d+\.\d+(pt)?\]\{", r"\\makebox{", res)
+    res = change_all(res, r"\makebox", r" ", r"{", r"}", r"", r" ")
    # remove vbox surrounding, scalebox surrounding
-    res = re.sub(r'\\raisebox\{-? ?\d+\.\d+(pt)?\}\{', r'\\raisebox{', res)
-    res = re.sub(r'\\scalebox\{-? ?\d+\.\d+(pt)?\}\{', r'\\scalebox{', res)
-    res = change_all(res, r'\scalebox', r' ', r'{', r'}', r'', r' ')
-    res = change_all(res, r'\raisebox', r' ', r'{', r'}', r'', r' ')
-    res = change_all(res, r'\vbox', r' ', r'{', r'}', r'', r' ')
+    res = re.sub(r"\\raisebox\{-? ?\d+\.\d+(pt)?\}\{", r"\\raisebox{", res)
+    res = re.sub(r"\\scalebox\{-? ?\d+\.\d+(pt)?\}\{", r"\\scalebox{", res)
+    res = change_all(res, r"\scalebox", r" ", r"{", r"}", r"", r" ")
+    res = change_all(res, r"\raisebox", r" ", r"{", r"}", r"", r" ")
+    res = change_all(res, r"\vbox", r" ", r"{", r"}", r"", r" ")

    origin_instructions = [
-        r'\Huge',
-        r'\huge',
-        r'\LARGE',
-        r'\Large',
-        r'\large',
-        r'\normalsize',
-        r'\small',
-        r'\footnotesize',
-        r'\tiny',
+        r"\Huge",
+        r"\huge",
+        r"\LARGE",
+        r"\Large",
+        r"\large",
+        r"\normalsize",
+        r"\small",
+        r"\footnotesize",
+        r"\tiny",
    ]
    for old_ins, new_ins in zip(origin_instructions, origin_instructions):
-        res = change_all(res, old_ins, new_ins, r'$', r'$', '{', '}')
-    res = change_all(res, r'\mathbf', r'\bm', r'{', r'}', r'{', r'}')
-    res = change_all(res, r'\boldmath ', r'\bm', r'{', r'}', r'{', r'}')
-    res = change_all(res, r'\boldmath', r'\bm', r'{', r'}', r'{', r'}')
-    res = change_all(res, r'\boldmath ', r'\bm', r'$', r'$', r'{', r'}')
-    res = change_all(res, r'\boldmath', r'\bm', r'$', r'$', r'{', r'}')
-    res = change_all(res, r'\scriptsize', r'\scriptsize', r'$', r'$', r'{', r'}')
-    res = change_all(res, r'\emph', r'\textit', r'{', r'}', r'{', r'}')
-    res = change_all(res, r'\emph ', r'\textit', r'{', r'}', r'{', r'}')
+        res = change_all(res, old_ins, new_ins, r"$", r"$", "{", "}")
+    res = change_all(res, r"\mathbf", r"\bm", r"{", r"}", r"{", r"}")
+    res = change_all(res, r"\boldmath ", r"\bm", r"{", r"}", r"{", r"}")
+    res = change_all(res, r"\boldmath", r"\bm", r"{", r"}", r"{", r"}")
+    res = change_all(res, r"\boldmath ", r"\bm", r"$", r"$", r"{", r"}")
+    res = change_all(res, r"\boldmath", r"\bm", r"$", r"$", r"{", r"}")
+    res = change_all(res, r"\scriptsize", r"\scriptsize", r"$", r"$", r"{", r"}")
+    res = change_all(res, r"\emph", r"\textit", r"{", r"}", r"{", r"}")
+    res = change_all(res, r"\emph ", r"\textit", r"{", r"}", r"{", r"}")

    # remove bold command
-    res = change_all(res, r'\bm', r' ', r'{', r'}', r'', r'')
+    res = change_all(res, r"\bm", r" ", r"{", r"}", r"", r"")

    origin_instructions = [
-        r'\left',
-        r'\middle',
-        r'\right',
-        r'\big',
-        r'\Big',
-        r'\bigg',
-        r'\Bigg',
-        r'\bigl',
-        r'\Bigl',
-        r'\biggl',
-        r'\Biggl',
-        r'\bigm',
-        r'\Bigm',
-        r'\biggm',
-        r'\Biggm',
-        r'\bigr',
-        r'\Bigr',
-        r'\biggr',
-        r'\Biggr',
+        r"\left",
+        r"\middle",
+        r"\right",
+        r"\big",
+        r"\Big",
+        r"\bigg",
+        r"\Bigg",
+        r"\bigl",
+        r"\Bigl",
+        r"\biggl",
+        r"\Biggl",
+        r"\bigm",
+        r"\Bigm",
+        r"\biggm",
+        r"\Biggm",
+        r"\bigr",
+        r"\Bigr",
+        r"\biggr",
+        r"\Biggr",
    ]
    for origin_ins in origin_instructions:
-        res = change_all(res, origin_ins, origin_ins, r'{', r'}', r'', r'')
+        res = change_all(res, origin_ins, origin_ins, r"{", r"}", r"", r"")

-    res = re.sub(r'\\\[(.*?)\\\]', r'\1\\newline', res)
+    res = re.sub(r"\\\[(.*?)\\\]", r"\1\\newline", res)

-    if res.endswith(r'\newline'):
+    if res.endswith(r"\newline"):
        res = res[:-8]

    # remove multiple spaces
-    res = re.sub(r'(\\,){1,}', ' ', res)
-    res = re.sub(r'(\\!){1,}', ' ', res)
-    res = re.sub(r'(\\;){1,}', ' ', res)
-    res = re.sub(r'(\\:){1,}', ' ', res)
-    res = re.sub(r'\\vspace\{.*?}', '', res)
+    res = re.sub(r"(\\,){1,}", " ", res)
+    res = re.sub(r"(\\!){1,}", " ", res)
+    res = re.sub(r"(\\;){1,}", " ", res)
+    res = re.sub(r"(\\:){1,}", " ", res)
+    res = re.sub(r"\\vspace\{.*?}", "", res)

    # merge consecutive text
    def merge_texts(match):
        texts = match.group(0)
-        merged_content = ''.join(re.findall(r'\\text\{([^}]*)\}', texts))
-        return f'\\text{{{merged_content}}}'
+        merged_content = "".join(re.findall(r"\\text\{([^}]*)\}", texts))
+        return f"\\text{{{merged_content}}}"

-    res = re.sub(r'(\\text\{[^}]*\}\s*){2,}', merge_texts, res)
+    res = re.sub(r"(\\text\{[^}]*\}\s*){2,}", merge_texts, res)

-    res = res.replace(r'\bf ', '')
+    res = res.replace(r"\bf ", "")
    res = _rm_dollar_surr(res)

    # remove extra spaces (keeping only one)
-    res = re.sub(r' +', ' ', res)
+    res = re.sub(r" +", " ", res)

    # format latex
    res = res.strip()
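
Review note on `katex.py`: the regular expressions in this file are unchanged apart from quoting, so a quick behavioral check on two of the steps may help confirm that nothing moved. The sketch below copies the body of `_rm_dollar_surr` from the diff and wraps the two `\newline` lines of `to_katex` in a hypothetical helper named `display_to_newline`. Both function names are assumptions made for this illustration, and the sample inputs are invented.

```python
import re


def rm_dollar_surr(content: str) -> str:
    # Same logic as _rm_dollar_surr in the diff: unwrap $...$ spans,
    # except when the span directly follows a command such as \text.
    pattern = re.compile(r"\\[a-zA-Z]+\$.*?\$|\$.*?\$")
    for match in pattern.findall(content):
        if not re.match(r"\\[a-zA-Z]+", match):
            content = content.replace(match, " " + match.strip("$") + " ")
    return content


def display_to_newline(res: str) -> str:
    # Hypothetical wrapper around two lines of to_katex: turn \[...\]
    # into "...\newline" and drop one trailing \newline.
    res = re.sub(r"\\\[(.*?)\\\]", r"\1\\newline", res)
    if res.endswith(r"\newline"):
        res = res[:-8]  # len(r"\newline") == 8
    return res


print(repr(rm_dollar_surr("a $x+y$ b")))      # 'a  x+y  b'
print(repr(rm_dollar_surr(r"\text$z$")))      # '\\text$z$'
print(repr(display_to_newline(r"\[a\]\[b\]")))  # 'a\\newlineb'
```

The doubled spaces in the first result are expected at this stage; `to_katex` collapses runs of spaces in a later step.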
--- a/texteller/models/__init__.py
+++ b/texteller/models/__init__.py
@@ -1,3 +1,3 @@
 from .texteller import TexTeller

-__all__ = ['TexTeller']
+__all__ = ["TexTeller"]
--- a/texteller/utils/image.py
+++ b/texteller/utils/image.py
@@ -41,7 +41,7 @@ def readimgs(image_paths: list[str]) -> list[np.ndarray]:
        if image is None:
            raise ValueError(f"Image at {path} could not be read.")
        if image.dtype == np.uint16:
-            _logger.warning(f'Converting {path} to 8-bit, image may be lossy.')
+            _logger.warning(f"Converting {path} to 8-bit, image may be lossy.")
            image = cv2.convertScaleAbs(image, alpha=(255.0 / 65535.0))

        channels = 1 if len(image.shape) == 2 else image.shape[2]
@@ -112,7 +112,7 @@ def transform(images: List[Union[np.ndarray, Image.Image]]) -> List[torch.Tensor

    assert IMG_CHANNELS == 1, "Only support grayscale images for now"
    images = [
-        np.array(img.convert('RGB')) if isinstance(img, Image.Image) else img for img in images
+        np.array(img.convert("RGB")) if isinstance(img, Image.Image) else img for img in images
    ]
    images = [trim_white_border(image) for image in images]
    images = [general_transform_pipeline(image) for image in images]
--- a/texteller/utils/latex.py
+++ b/texteller/utils/latex.py
@@ -21,7 +21,7 @@ def _change(input_str, old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, n
            j = start + 1
            escaped = False
            while j < n and count > 0:
-                if input_str[j] == '\\' and not escaped:
+                if input_str[j] == "\\" and not escaped:
                    escaped = True
                    j += 1
                    continue
@@ -71,10 +71,10 @@ def change_all(input_str, old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l
    for p in pos[::-1]:
        res[p:] = list(
            _change(
-                ''.join(res[p:]), old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r
+                "".join(res[p:]), old_inst, new_inst, old_surr_l, old_surr_r, new_surr_l, new_surr_r
            )
        )
-    res = ''.join(res)
+    res = "".join(res)
    return res


@@ -121,7 +121,7 @@ def add_newlines(latex_str: str) -> str:

    # 4. Cleanup: Collapse multiple consecutive newlines into a single newline.
    # This handles cases where the replacements above might have created \n\n.
-    processed_str = re.sub(r'\n{2,}', '\n', processed_str)
+    processed_str = re.sub(r"\n{2,}", "\n", processed_str)

    # Remove leading/trailing whitespace (including potential single newlines
    # at the very start/end resulting from the replacements) from the entire result.
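
Review note on the quote-style changes: most Python hunks in this comparison only swap single quotes for double quotes. One way to confirm that such a change is behavior-neutral is to compare the parsed syntax trees of the old and new source. The sketch below is a minimal illustration using the standard `ast` module; the helper name `same_ast` is hypothetical, and the two snippets are taken from the `OPENS` line in `texteller/api/format.py`.

```python
import ast


def same_ast(old_src: str, new_src: str) -> bool:
    """True when two source strings parse to identical syntax trees."""
    # ast.dump omits line/column attributes by default, and string
    # constants are stored by value, so quote style does not matter.
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


old = "OPENS = ['{', '(', '[']\n"
new = 'OPENS = ["{", "(", "["]\n'

print(same_ast(old, new))              # True
print(same_ast("x = 'a'", "x = 'b'"))  # False
```

Running the same comparison over whole files before and after the commit would flag any hunk that changes more than formatting.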
Author	SHA1	Message	Date
OleehyO	12e6bb4312	[deps] Pin transformers to 4.47	2025-04-21 12:24:03 +00:00
OleehyO	4292be86f2	[chore] Setup deps for doc build	2025-04-21 12:24:00 +00:00
OleehyO	3e0d236f6b	[chore] Update	2025-04-21 12:24:00 +00:00
OleehyO	b653b9e784	[CD] Add documentation auto-deployment	2025-04-21 12:23:56 +00:00
OleehyO	b85979b258	[deps] Add sphnix extension deps	2025-04-21 08:38:06 +00:00
OleehyO	05e494af4b	[docs] Fix typo	2025-04-21 08:21:16 +00:00
OleehyO	4b1b8d10de	[chore] Change logo font	2025-04-21 08:20:16 +00:00
OleehyO	c8e08a22aa	🔧 Fix all ruff typo errors & test CI/CD workflow (#109 ) * [chore] Fix ruff typo * [robot] Fix welcome robot	2025-04-21 13:52:16 +08:00
OleehyO	4d3be22956	[CI] Fix deps installation	2025-04-21 05:17:12 +00:00
OleehyO	4e92a38682	[CD] Change trigger condition	2025-04-21 05:12:38 +00:00
OleehyO	3e5272a476	[chore] Update README_zh.md	2025-04-21 05:11:47 +00:00