📝 [docs] Update README badges and branding consistency

Add arXiv paper badge, fix TexTeller3.0 capitalization, and update documentation links for improved consistency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
🔧 [chore] Replace pre-commit with ruff for linting workflow
13 changed files with 284 additions and 16 deletions
--- a/.claude/agents/commit-crafter.md
+++ b/.claude/agents/commit-crafter.md
@@ -0,0 +1,164 @@
+---
+name: commit-crafter
+description: Expertly creates clean, conventional, and atomic Git commits with pre-commit checks.
+---
+
+You are an expert Git assistant. Your purpose is to help create perfectly formatted, atomic commits that follow conventional commit standards. You enforce code quality by running pre-commit checks (if any exist) and help maintain a clean project history by splitting large changes into logical units.
+
+## Using Hints for Commit Customization
+
+When a user provides a hint, use it to guide the commit message generation while still maintaining conventional commit standards:
+
+- **Analyze the hint**: Extract the key intent, context, or focus area from the user's hint
+- **Combine with code analysis**: Use both the hint and the actual code changes to determine the most appropriate commit type and description
+- **Prioritize hint context**: When the hint provides specific context (e.g., "fix login bug"), use it to craft a more targeted and meaningful commit message
+- **Maintain standards**: The hint should guide the message content, but the format must still follow conventional commit standards
+- **Resolve conflicts**: If the hint conflicts with what the code changes suggest, prioritize the code changes but incorporate the hint's context where applicable
+
+## Best Practices for Commits
+
+- **Verify before committing**: Ensure code is linted, builds correctly, and documentation is updated
+- **Use hints effectively**: When a hint is provided, incorporate its context into the commit message while ensuring the message accurately reflects the actual code changes
+- **Atomic commits**: Each commit should contain related changes that serve a single purpose
+- **Split large changes**: If changes touch multiple concerns, split them into separate commits
+- **Conventional commit format**: Use the format `[<type>] <description>`. Common values of `<type>` include:
+  - feat: A new feature
+  - fix: A bug fix
+  - docs: Documentation changes
+  - style: Code style changes (formatting, etc)
+  - refactor: Code changes that neither fix bugs nor add features
+  - perf: Performance improvements
+  - test: Adding or fixing tests
+  - chore: Changes to the build process, tools, etc.
+- **Present tense, imperative mood**: Write commit messages as commands (e.g., "add feature" not "added feature")
+- **Concise first line**: Keep the first line under 72 characters
+- **Emoji**: Each commit type is paired with an appropriate emoji:
+  - ✨ [feat] New feature
+  - 🐛 [fix] Bug fix
+  - 📝 [docs] Documentation
+  - 💄 [style] Formatting/style
+  - ♻️ [refactor] Code refactoring
+  - ⚡️ [perf] Performance improvements
+  - ✅ [test] Tests
+  - 🔧 [chore] Tooling, configuration
+  - 🚀 [ci] CI/CD improvements
+  - 🗑️ [revert] Reverting changes
+  - 🧪 [test] Add a failing test
+  - 🚨 [fix] Fix compiler/linter warnings
+  - 🔒️ [fix] Fix security issues
+  - 👥 [chore] Add or update contributors
+  - 🚚 [refactor] Move or rename resources
+  - 🏗️ [refactor] Make architectural changes
+  - 🔀 [chore] Merge branches
+  - 📦️ [chore] Add or update compiled files or packages
+  - ➕ [chore] Add a dependency
+  - ➖ [chore] Remove a dependency
+  - 🌱 [chore] Add or update seed files
+  - 🧑 [chore] Improve developer experience
+  - 🧵 [feat] Add or update code related to multithreading or concurrency
+  - 🔍️ [feat] Improve SEO
+  - 🏷️ [feat] Add or update types
+  - 💬 [feat] Add or update text and literals
+  - 🌐 [feat] Internationalization and localization
+  - 👔 [feat] Add or update business logic
+  - 📱 [feat] Work on responsive design
+  - 🚸 [feat] Improve user experience / usability
+  - 🩹 [fix] Simple fix for a non-critical issue
+  - 🥅 [fix] Catch errors
+  - 👽️ [fix] Update code due to external API changes
+  - 🔥 [fix] Remove code or files
+  - 🎨 [style] Improve structure/format of the code
+  - 🚑️ [fix] Critical hotfix
+  - 🎉 [chore] Begin a project
+  - 🔖 [chore] Release/Version tags
+  - 🚧 [wip] Work in progress
+  - 💚 [fix] Fix CI build
+  - 📌 [chore] Pin dependencies to specific versions
+  - 👷 [ci] Add or update CI build system
+  - 📈 [feat] Add or update analytics or tracking code
+  - ✏️ [fix] Fix typos
+  - ⏪️ [revert] Revert changes
+  - 📄 [chore] Add or update license
+  - 💥 [feat] Introduce breaking changes
+  - 🍱 [assets] Add or update assets
+  - ♿️ [feat] Improve accessibility
+  - 💡 [docs] Add or update comments in source code
+  - 🗃️ [db] Perform database-related changes
+  - 🔊 [feat] Add or update logs
+  - 🔇 [fix] Remove logs
+  - 🤡 [test] Mock things
+  - 🥚 [feat] Add or update an easter egg
+  - 🙈 [chore] Add or update .gitignore file
+  - 📸 [test] Add or update snapshots
+  - ⚗️ [experiment] Perform experiments
+  - 🚩 [feat] Add, update, or remove feature flags
+  - 💫 [ui] Add or update animations and transitions
+  - ⚰️ [refactor] Remove dead code
+  - 🦺 [feat] Add or update code related to validation
+  - ✈️ [feat] Improve offline support
+
+## Guidelines for Splitting Commits
+
+When analyzing the diff, consider splitting commits based on these criteria:
+
+1. **Different concerns**: Changes to unrelated parts of the codebase
+2. **Different types of changes**: Mixing features, fixes, refactoring, etc.
+3. **File patterns**: Changes to different types of files (e.g., source code vs documentation)
+4. **Logical grouping**: Changes that would be easier to understand or review separately
+5. **Size**: Very large changes that would be clearer if broken down
+
+## Examples
+
+Good commit messages:
+- ✨ [feat] Add user authentication system
+- 🐛 [fix] Resolve memory leak in rendering process
+- 📝 [docs] Update API documentation with new endpoints
+- ♻️ [refactor] Simplify error handling logic in parser
+- 🚨 [fix] Resolve linter warnings in component files
+- 🧑 [chore] Improve developer tooling setup process
+- 👔 [feat] Implement business logic for transaction validation
+- 🩹 [fix] Address minor styling inconsistency in header
+- 🚑️ [fix] Patch critical security vulnerability in auth flow
+- 🎨 [style] Reorganize component structure for better readability
+- 🔥 [fix] Remove deprecated legacy code
+- 🦺 [feat] Add input validation for user registration form
+- 💚 [fix] Resolve failing CI pipeline tests
+- 📈 [feat] Implement analytics tracking for user engagement
+- 🔒️ [fix] Strengthen authentication password requirements
+- ♿️ [feat] Improve form accessibility for screen readers
+
+Examples with hints:
+**Hint: "fix user login bug"**
+- Code changes: Fix null pointer exception in auth service
+- Generated: 🐛 [fix] Resolve null pointer exception in user login flow
+
+**Hint: "API refactoring"**
+- Code changes: Extract common validation logic into separate service
+- Generated: ♻️ [refactor] Extract API validation logic into shared service
+
+**Hint: "add dark mode support"**
+- Code changes: Add CSS variables and theme toggle component
+- Generated: ✨ [feat] Implement dark mode support with theme toggle
+
+**Hint: "performance optimization"**
+- Code changes: Implement memoization for expensive calculations
+- Generated: ⚡️ [perf] Add memoization to optimize calculation performance
+
+Example of splitting commits:
+- First commit: ✨ [feat] Add new solc version type definitions
+- Second commit: 📝 [docs] Update documentation for new solc versions
+- Third commit: 🔧 [chore] Update package.json dependencies
+- Fourth commit: 🏷️ [feat] Add type definitions for new API endpoints
+- Fifth commit: 🧵 [feat] Improve concurrency handling in worker threads
+- Sixth commit: 🚨 [fix] Resolve linting issues in new code
+- Seventh commit: ✅ [test] Add unit tests for new solc version features
+- Eighth commit: 🔒️ [fix] Update dependencies with security vulnerabilities
+
+## Important Notes
+
+- **If no files are staged, abort the process immediately**.
+- **Commit staged files only**: Unstaged files are assumed to be intentionally excluded from the current commit.
+- **Do not run any pre-commit checks**. If a pre-commit hook is triggered and fails during the commit process, abort the process immediately.
+- **Process hints carefully**: When a hint is provided, analyze it to understand the user's intent, but always verify it aligns with the actual code changes.
+- **Hint priority**: Use hints to provide context and focus, but the actual code changes should determine the commit type and scope.
+- Before committing, review the diff to **identify if multiple commits would be more appropriate**.
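
The subject-line rules in commit-crafter.md could be checked mechanically. The following is a minimal sketch, assuming the `<emoji> [<type>] <description>` shape used in the file's examples. The helper name `check_subject`, the `KNOWN_TYPES` set (collected from the emoji table in the file), and the error wording are illustrative assumptions and are not part of this change.

```python
import re

# Assumption: the set of types is collected from the emoji table in
# commit-crafter.md; extend it if the project adopts more types.
KNOWN_TYPES = {
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
    "ci", "revert", "wip", "assets", "db", "experiment", "ui",
}

# One emoji token, a space, "[type]", a space, then a non-empty description.
SUBJECT_RE = re.compile(r"^(?P<emoji>\S+) \[(?P<type>[a-z]+)\] (?P<description>\S.*)$")

# The file asks for a first line "under 72 characters".
MAX_SUBJECT_LEN = 72


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject line (empty if none)."""
    problems = []
    # len() counts code points, so an emoji with a variation selector counts as 2.
    if len(subject) >= MAX_SUBJECT_LEN:
        problems.append(
            f"subject is {len(subject)} characters; keep it under {MAX_SUBJECT_LEN}"
        )
    match = SUBJECT_RE.match(subject)
    if match is None:
        problems.append("expected '<emoji> [<type>] <description>'")
        return problems
    if match["type"] not in KNOWN_TYPES:
        problems.append(f"unknown type: {match['type']}")
    return problems


if __name__ == "__main__":
    for line in ("✨ [feat] Add user authentication system", "Update stuff"):
        print(repr(line), "->", check_subject(line) or "ok")
```

The check is deliberately structural: it does not try to verify imperative mood or whether the emoji matches the type, both of which the agent instructions leave to judgment.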
--- a/.claude/agents/staged-code-reviewer.md
+++ b/.claude/agents/staged-code-reviewer.md
@@ -0,0 +1,71 @@
+---
+name: staged-code-reviewer
+description: Reviews staged git changes for quality, security, and performance. Analyzes files in the git index (git diff --cached) and provides actionable, line-by-line feedback.
+---
+
+You are a specialized code review agent. Your sole function is to analyze git changes that have been staged for commit. You must ignore unstaged changes, untracked files, and non-code files (e.g., binaries, data). Your review should be direct, objective, and focused on providing actionable improvements.
+
+## Core Directives
+
+1.  Analyze Staged Code: Use the output of `git diff --cached` as the exclusive source for your review.
+2.  Prioritize by Impact: Focus first on security vulnerabilities and critical bugs, then on performance, and finally on code quality and style.
+3.  Provide Actionable Feedback: Every identified issue must be accompanied by a concrete suggestion for improvement.
+
+## Review Criteria
+
+For each change, evaluate the following:
+
+* Security: Check for hardcoded secrets, injection vulnerabilities (SQL, XSS), insecure direct object references, and missing authentication/authorization.
+* Correctness & Reliability: Verify the logic works as intended, includes proper error handling, and considers edge cases.
+* Performance: Identify inefficient algorithms, potential bottlenecks, and expensive operations (e.g., N+1 database queries).
+* Code Quality: Assess readability, simplicity, naming conventions, and code duplication (DRY principle).
+* Test Coverage: Ensure that new logic is accompanied by meaningful tests.
+
+## Critical Issues to Flag Immediately
+
+* Hardcoded credentials, API keys, or tokens.
+* SQL or command injection vulnerabilities.
+* Cross-Site Scripting (XSS) vulnerabilities.
+* Missing or incorrect authentication/authorization checks.
+* Use of unsafe functions like eval() without proper sanitization.
+
+## Output Format
+
+Your entire response must follow this structure. Do not deviate.
+
+Start with a summary header:
+
+Staged Code Review
+---
+Files Reviewed: [List of staged files]
+Total Changes: [Number of lines added/removed]
+
+---
+
+Then, for each file with issues, create a section:
+
+### filename.ext
+
+(One-line summary of the changes in this file.)
+
+**CRITICAL ISSUES**
+* (Line X): [Concise Issue Title]
+    Problem: [Clear description of the issue.]
+    Suggestion: [Specific, actionable improvement.]
+    Reasoning: [Why the change is necessary (e.g., security, performance).]
+
+**MAJOR ISSUES**
+* (Line Y): [Concise Issue Title]
+    Problem: [Clear description of the issue.]
+    Suggestion: [Specific, actionable improvement, including code examples if helpful.]
+    Reasoning: [Why the change is necessary.]
+
+**MINOR ISSUES**
+* (Line Z): [Concise Issue Title]
+    Problem: [Clear description of the issue.]
+    Suggestion: [Specific, actionable improvement.]
+    Reasoning: [Why the change is necessary.]
+
+If a file has no issues, state: "No issues found."
+
+If you see well-implemented code, you may optionally add a "Positive Feedback" section to acknowledge it.
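
The summary header required by staged-code-reviewer.md can be derived from the git index. The sketch below assumes `git diff --cached --numstat` as the data source (the file itself only names `git diff --cached`); the function names and the `+N / -M` rendering of "Total Changes" are hypothetical choices, not something the agent definition prescribes.

```python
import subprocess


def parse_numstat(text: str) -> tuple[list[str], int, int]:
    """Parse `git diff --numstat` output into (files, lines_added, lines_removed).

    Each line has the form "<added>\t<removed>\t<path>". Binary files report
    "-" for both counts and are skipped, since the reviewer ignores binaries.
    """
    files, added, removed = [], 0, 0
    for line in text.splitlines():
        if not line.strip():
            continue
        a, d, path = line.split("\t", 2)
        if a == "-" or d == "-":
            continue  # binary file: no line counts available
        files.append(path)
        added += int(a)
        removed += int(d)
    return files, added, removed


def summary_header(files: list[str], added: int, removed: int) -> str:
    """Render the header block described under "Output Format"."""
    return "\n".join([
        "Staged Code Review",
        "---",
        f"Files Reviewed: {', '.join(files)}",
        f"Total Changes: +{added} / -{removed}",
        "",
        "---",
    ])


def staged_summary() -> str:
    """Build the header from the current index; must run inside a git repository."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        check=True, capture_output=True, text=True,
    ).stdout
    return summary_header(*parse_numstat(out))
```

Keeping the parsing separate from the `git` call makes the logic easy to exercise on a captured sample without a repository.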
--- a/.claude/commands/code-review.md
+++ b/.claude/commands/code-review.md
@@ -0,0 +1 @@
+Use staged-code-reviewer sub agent to perform code review
--- a/.claude/commands/fix-github-issue.md
+++ b/.claude/commands/fix-github-issue.md
@@ -0,0 +1,13 @@
+Please analyze and fix the GitHub issue: $ARGUMENTS.
+
+Follow these steps:
+
+1. Use `gh issue view` to get the issue details
+2. Understand the problem described in the issue
+3. Search the codebase for relevant files
+4. Implement the necessary changes to fix the issue
+5. Write and run tests to verify the fix
+6. Ensure code passes linting and type checking
+7. Create a descriptive commit message
+
+Remember to use the GitHub CLI (`gh`) for all GitHub-related tasks.
--- a/.claude/commands/make-commit.md
+++ b/.claude/commands/make-commit.md
@@ -0,0 +1,16 @@
+Use commit-crafter sub agent to make a standardized commit
+
+## Usage
+
+```
+/make-commit [hint]
+```
+
+**Parameters:**
+- `hint` (optional): A brief description or context to help customize the commit message. The hint will be used to guide the commit message generation while maintaining conventional commit standards.
+
+**Examples:**
+- `/make-commit` - Generate commit message based purely on code changes
+- `/make-commit "API refactoring"` - Guide the commit to focus on API-related changes
+- `/make-commit "fix user login bug"` - Provide context about the specific issue being fixed
+- `/make-commit "add dark mode support"` - Indicate the feature being added
--- a/.github/workflows/python-lint.yml
+++ b/.github/workflows/python-lint.yml
@@ -21,7 +21,7 @@ jobs:
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
-          pip install pre-commit
+          pip install ruff

-      - name: Run pre-commit
-        run: pre-commit run --all-files
+      - name: Run ruff
+        run: ruff check .
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -17,6 +17,7 @@ repos:
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
+        exclude: assets/
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: debug-statements
--- a/README.md
+++ b/README.md
@@ -2,15 +2,16 @@

 <div align="center">
    <h1>
-        <img src="./assets/fire.svg" width=30, height=30>
+        <img src="./assets/fire.svg" width=60, height=60>
        𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
-        <img src="./assets/fire.svg" width=30, height=30>
+        <img src="./assets/fire.svg" width=60, height=60>
    </h1>

  [![](https://img.shields.io/badge/API-Docs-orange.svg?logo=read-the-docs)](https://oleehyo.github.io/TexTeller/)
-  [![](https://img.shields.io/badge/docker-pull-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
-  [![](https://img.shields.io/badge/Data-Texteller1.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas)
+  [![arXiv](https://img.shields.io/badge/arXiv-2508.09200-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.09220)
+  [![](https://img.shields.io/badge/Data-Texteller3.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas-80M)
  [![](https://img.shields.io/badge/Weights-Texteller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
+  [![](https://img.shields.io/badge/docker-pull-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
  [![](https://img.shields.io/badge/License-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)

 </div>
--- a/assets/README_zh.md
+++ b/assets/README_zh.md
@@ -1,16 +1,17 @@
-📄 中文 | [English](./README.md)
+📄 中文 | [English](../README.md)

 <div align="center">
    <h1>
-        <img src="./fire.svg" width=30, height=30>
+        <img src="./fire.svg" width=60, height=60>
        𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
-        <img src="./fire.svg" width=30, height=30>
+        <img src="./fire.svg" width=60, height=60>
    </h1>

  [![](https://img.shields.io/badge/API-文档-orange.svg?logo=read-the-docs)](https://oleehyo.github.io/TexTeller/)
+  [![arXiv](https://img.shields.io/badge/arXiv-2508.09200-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.09220)
  [![](https://img.shields.io/badge/docker-镜像-green.svg?logo=docker)](https://hub.docker.com/r/oleehyo/texteller)
-  [![](https://img.shields.io/badge/数据-Texteller1.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas)
-  [![](https://img.shields.io/badge/权重-Texteller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
+  [![](https://img.shields.io/badge/数据-TexTeller3.0-brightgreen.svg?logo=huggingface)](https://huggingface.co/datasets/OleehyO/latex-formulas-80M)
+  [![](https://img.shields.io/badge/权重-TexTeller3.0-yellow.svg?logo=huggingface)](https://huggingface.co/OleehyO/TexTeller)
  [![](https://img.shields.io/badge/协议-Apache_2.0-blue.svg?logo=github)](https://opensource.org/licenses/Apache-2.0)

 </div>
@@ -70,7 +71,7 @@ TexTeller 使用 **8千万图像-公式对** 进行训练（前代数据集可

 - [2024-03-25] TexTeller2.0 发布！TexTeller2.0 的训练数据增至750万（是前代的15倍并提升了数据质量）。训练后的 TexTeller2.0 在测试集中展现了**更优性能**，特别是在识别罕见符号、复杂多行公式和矩阵方面表现突出。

-  > [此处](./assets/test.pdf) 展示了更多测试图像及各类识别模型的横向对比。
+  > [此处](./test.pdf) 展示了更多测试图像及各类识别模型的横向对比。

 ## 🚀 快速开始

@@ -191,7 +192,7 @@ TexTeller的公式检测模型在3415张中文资料图像和8272张[IBEM数据
   accelerate launch train.py
   ```

-训练参数可通过[`train_config.yaml`](./examples/train_texteller/train_config.yaml)调整。
+训练参数可通过[`train_config.yaml`](../examples/train_texteller/train_config.yaml)调整。

 ## 📅 计划列表
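
The link changes in this file follow from its location: relative links are resolved against the directory that contains the document, and README_zh.md lives under `assets/`. As a rough illustration (the `resolve` helper is hypothetical and only mimics how a renderer resolves relative paths), the old and new targets can be compared like this:

```python
import posixpath


def resolve(doc_path: str, link: str) -> str:
    """Resolve a relative link against the directory containing doc_path."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(doc_path), link))


DOC = "assets/README_zh.md"

# (old link, new link) pairs taken from the hunks above.
for old, new in [
    ("./README.md", "../README.md"),
    ("./assets/test.pdf", "./test.pdf"),
    ("./examples/train_texteller/train_config.yaml",
     "../examples/train_texteller/train_config.yaml"),
]:
    print(f"{old} -> {resolve(DOC, old)}")
    print(f"{new} -> {resolve(DOC, new)}")
```

With the old links, the targets resolve inside `assets/` (for example `assets/README.md`), while the updated links resolve to the repository-level paths.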

--- a/assets/compare/handwritten_compare.pdf
+++ b/assets/compare/handwritten_compare.pdf
--- a/assets/compare/other_compare.pdf
+++ b/assets/compare/other_compare.pdf
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -20,7 +20,8 @@ You can install TexTeller using pip:

 .. code-block:: bash

-   pip install texteller
+   pip install uv
+   uv pip install texteller

 Quick Start
 ----------
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -44,7 +44,6 @@ quote-style = "double"
 [tool.ruff.lint]
 select = ["E", "W"]
 ignore = [
-    "E999",
    "EXE001",
    "UP009",
    "F401",
Author	SHA1	Message	Date
OleehyO	30f7e93c49	📝 [docs] Update README badges and branding consistency Add arXiv paper badge, fix TexTeller3.0 capitalization, and update documentation links for improved consistency. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-14 22:41:35 +08:00
OleehyO	4f88499de5	🔧 [chore] Replace pre-commit with ruff for linting workflow - Update CI workflow to use ruff instead of pre-commit - Remove E999 from ruff ignore rules in pyproject.toml 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-14 22:34:42 +08:00
OleehyO	bfe070f976	📦️ [chore] Update project for TexTeller 3.0 release - Update dataset references from TexTeller 1.0 to 3.0 in README files - Add paper.pdf to assets directory - Configure pre-commit to exclude assets/ from large file checks 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-13 22:01:17 +08:00
OleehyO	af56271e1c	🧑 [chore] Add Claude Code configuration for Git workflow automation Add Claude agents and commands to enhance developer experience: - commit-crafter agent for standardized conventional commits - staged-code-reviewer agent for automated code review - Commands for code review, GitHub issue fixing, and commit creation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-13 21:59:12 +08:00
三洋三洋	30f88d55ac	Upload compare	2025-04-23 22:21:40 +08:00
OleehyO	3d430735a4	[docs] Using uv to install deps	2025-04-23 10:40:12 +00:00
OleehyO	184c890437	[chore] Correct file url	2025-04-23 10:39:24 +00:00
OleehyO	c758dc277b	[deps] Pin transformers to 4.47	2025-04-21 12:24:03 +00:00
OleehyO	0ab938aad4	[chore] Setup deps for doc build	2025-04-21 12:24:00 +00:00
OleehyO	90e16fd868	[chore] Update	2025-04-21 12:24:00 +00:00
OleehyO	cab9d664f2	[CD] Add documentation auto-deployment	2025-04-21 12:23:56 +00:00
OleehyO	3f930fdaaf	[deps] Add sphinx extension deps	2025-04-21 08:38:06 +00:00
OleehyO	324ab8a03f	[docs] Fix typo	2025-04-21 08:21:16 +00:00
OleehyO	a62600f384	[chore] Change logo font	2025-04-21 08:20:16 +00:00
OleehyO	511f69555c	🔧 Fix all ruff typo errors & test CI/CD workflow (#109 ) * [chore] Fix ruff typo * [robot] Fix welcome robot	2025-04-21 13:52:16 +08:00
OleehyO	ae776aa9c7	[CI] Fix deps installation	2025-04-21 05:17:12 +00:00
OleehyO	d46be980ee	[CD] Change trigger condition	2025-04-21 05:12:38 +00:00
OleehyO	1201c67237	[chore] Update README_zh.md	2025-04-21 05:11:47 +00:00
OleehyO	d5938c6a2a	[deps] Grouped deps & setup vcs	2025-04-21 04:48:29 +00:00
OleehyO	9a388cdfc5	[chore] Update README.md	2025-04-21 04:47:53 +00:00
OleehyO	59bc9bdd41	[CI/CD] Setup complete workflow	2025-04-21 03:00:06 +00:00
OleehyO	7490fa9c5a	[chore] Setup vcs and deps	2025-04-21 02:41:46 +00:00
OleehyO	5cf9960a7c	[chore] Ignore images	2025-04-21 02:41:06 +00:00
OleehyO	9006edb949	[docs] Set up documentation structure with API reference	2025-04-21 02:38:36 +00:00
OleehyO	d6c659d576	Upload logo.svg	2025-04-21 02:37:28 +00:00
OleehyO	ff02336007	[feat] Support dynamic package vcs	2025-04-21 02:36:13 +00:00
OleehyO	789006894c	[docs] Add comprehensive function documentation	2025-04-21 02:34:56 +00:00
OleehyO	2c9ce6b6c1	Add globals test	2025-04-21 02:32:05 +00:00
OleehyO	57b757c0f0	[test] Init	2025-04-19 16:36:48 +00:00
OleehyO	a7a296025a	[feat] Add texteller training script	2025-04-19 16:36:43 +00:00
OleehyO	991d6bc00d	[CI] Update ruff hook	2025-04-19 14:32:32 +00:00
OleehyO	06edd104e2	[refactor] Init	2025-04-19 14:32:28 +00:00
OleehyO	0e32f3f3bf	[chore] Cleanup	2025-04-17 07:08:47 +00:00
OleehyO	6bd68ad3b7	[feat] Support n-gram stop criteria	2025-04-02 03:23:27 +00:00
OleehyO	aae7af445f	[deps] Change onnx-gpu to manually install	2025-04-02 02:48:23 +00:00
三洋三洋	38e7c6293f	[feat][formatter] Integrate LaTeX formatter for improved formula readability - Add latex_formatter.py based on tex-fmt (https://github.com/WGUNDERWOOD/tex-fmt) - Update to_katex.py to use the new formatter - Enhance LaTeX formula output with better formatting and readability This integration helps make generated LaTeX formulas more readable and maintainable by applying consistent formatting rules.	2025-03-01 00:55:41 +08:00
三洋三洋	192e8d6352	[chore] Ignore ruff lint E741	2025-03-01 00:54:57 +08:00
三洋三洋	110cb29d6c	[fix] Add project prefix	2025-02-28 23:38:12 +08:00
三洋三洋	abd6057378	[feat] Remove bold style	2025-02-28 23:38:12 +08:00
三洋三洋	e214b508d2	[deps] Add ray serve & python-multipart	2025-02-28 23:37:53 +08:00
三洋三洋	de9deacaf2	[chore] Add build system and package location	2025-02-28 23:18:06 +08:00
三洋三洋	cd0f397f20	[chore] Add python related rules	2025-02-28 23:18:03 +08:00
三洋三洋	5668a2e26c	[chore] Remove unused files	2025-02-28 20:54:51 +08:00
三洋三洋	3d546f9993	[chore] exclude paddleocr directory from pre-commit hooks	2025-02-28 20:01:54 +08:00
三洋三洋	a8a005ae10	[chore] Setup project infrastructure	2025-02-28 20:01:52 +08:00
三洋三洋	52fce4d39d	[deps] pin transformers to 4.45.2 and sentence-transformers to 3.1.1	2025-02-01 13:00:44 +08:00
OleehyO	b8100517c6	Merge pull request #78 from OleehyO/pre_release Change to better import dependency	2024-08-07 12:43:15 +08:00
三洋三洋	06701415cc	Change to better import dependency	2024-08-07 01:19:26 +08:00
OleehyO	c6eb1b6ea2	Merge pull request #67 from OleehyO/pre_release Change setting name	2024-07-11 20:34:50 +08:00
三洋三洋	1b685054c9	Change setting name	2024-07-11 20:33:51 +08:00
OleehyO	c835cedcf5	Merge pull request #60 from OleehyO/pre_release Pre release	2024-06-23 22:16:09 +08:00
三洋三洋	9f3a46e8a9	Update README	2024-06-23 22:14:05 +08:00
三洋三洋	569c72ffe3	Remove onnxruntime-gpu	2024-06-23 22:13:51 +08:00
OleehyO	b4f70a09e0	Merge pull request #59 from OleehyO/pre_release Pre release	2024-06-22 23:56:45 +08:00
三洋三洋	36a2680d28	Update model config	2024-06-22 22:08:08 +08:00
三洋三洋	c5e859517a	Update README	2024-06-22 22:00:14 +08:00
三洋三洋	9638c0030d	Support onnx runtime	2024-06-22 22:00:05 +08:00
三洋三洋	8da3fd7418	Add optimum	2024-06-22 21:49:47 +08:00
OleehyO	fb6784b535	Merge pull request #58 from OleehyO/pre_release Add formula detection service	2024-06-17 21:26:35 +08:00
三洋三洋	76eeb18b83	Add formula detection service	2024-06-17 21:23:55 +08:00
OleehyO	e2d0e91a77	Merge pull request #56 from OleehyO/pre_release Add docker link	2024-06-11 13:22:17 +08:00
三洋三洋	0d5cd9a75d	Add docker link	2024-06-11 13:20:32 +08:00
三洋三洋	624f9531b4	Update server.py 1. Change the default host address to 0.0.0.0. 2. Convert the output to KaTeX.	2024-06-07 12:26:24 +00:00
三洋三洋	aa14674097	Update README	2024-06-07 06:54:23 +00:00
三洋三洋	a7044e0369	Add Apache2.0 license	2024-06-06 13:06:16 +00:00
三洋三洋	837cb6021f	Add cover.png	2024-06-06 13:06:16 +00:00
三洋三洋	354833aac8	Modify the names of options in the web.py Formula only -> Formula recognition Text formula mixed -> Paragraph recognition Improved display during mixed inference	2024-06-06 13:06:16 +00:00
三洋三洋	760bd78c10	Refine mix_inference 1. Add the formula number back to the isolated formula and merge multiple tag. 2. remove bold effect from inline formuals 3. change split environment into aligned	2024-06-06 13:06:11 +00:00
三洋三洋	c0e730f697	Bugfix: to_katex.py 1. Added `change_all` function to fix a bug where some LaTeX formulas with the same wrapper were causing issues. 2. Removed some unnecessary formatting commands. Bugfix: to_katex.py	2024-06-06 08:25:50 +00:00
三洋三洋	7aad0839c4	Update	2024-05-28 09:51:53 +00:00
三洋三洋	5420e92cc4	Added releasing file	2024-05-28 07:50:09 +00:00
三洋三洋	89aa396cbb	Change the model configuration to trocr	2024-05-28 07:50:09 +00:00
三洋三洋	9b11689f22	Using paddleocr with onnxruntime Deleted the code for test time.	2024-05-28 07:50:09 +00:00
三洋三洋	85d558f772	Added mixed recognition change suryaocr to paddleocr	2024-05-28 07:50:08 +00:00
三洋三洋	2af1e067c1	Added ONNX file for PaddleOCR model	2024-05-28 07:50:08 +00:00
三洋三洋	6b852d561d	Update .gitignore	2024-05-28 07:50:08 +00:00
三洋三洋	e193fe3798	Added code for PaddleOCR inference	2024-05-28 07:50:08 +00:00
三洋三洋	714fef4def	Eliminated dependency on paddleocr Change to trocr	2024-05-28 07:50:08 +00:00
三洋三洋	edef073812	update	2024-05-28 07:50:08 +00:00
OleehyO	1b8f6ba0b6	bugfix: ocr_aug.py Change "lhy_custom" in ink_swap_color to "random"	2024-05-28 07:49:55 +00:00
三洋三洋	a27cf716ee	bugfix: missing filter_fn and inference/train transform	2024-05-12 07:49:04 +00:00
三洋三洋	8557e81374	update	2024-05-12 07:47:35 +00:00
三洋三洋	10e22259a2	update	2024-05-10 03:48:31 +00:00
TonyLee1256	9875fedb1b	Update requirements.txt	2024-05-09 00:23:32 +08:00
TonyLee1256	83da4262fd	Update mix_inference.py Replace the text OCR model with paddleocr	2024-05-09 00:23:02 +08:00
TonyLee1256	bd2aaa3e00	Update inference.py Replace the text OCR model with paddleocr	2024-05-09 00:22:01 +08:00
TonyLee1256	fe7e4a7af0	Update inference.py Add timing functionality	2024-05-09 00:20:32 +08:00
TonyLee1256	48043d11e3	Update infer_det.py Add support for running ONNX model inference on GPU	2024-05-09 00:19:39 +08:00
三洋三洋	e495640690	bugfix	2024-05-08 14:34:01 +00:00
三洋三洋	84fa43321f	Added Language option in mixed mode	2024-05-07 07:44:24 +00:00
三洋三洋	b116dfae55	Update README	2024-05-07 07:30:29 +00:00
三洋三洋	85b22ff9c7	bugfix	2024-05-07 07:11:34 +00:00
三洋三洋	42959cd6a5	Add train_config.yaml	2024-05-07 07:11:05 +00:00
三洋三洋	4c182aecda	update .gitignore	2024-05-07 06:54:53 +00:00
TonyLee1256	d2c1e5e10f	bugfix inference.py	2024-05-07 13:28:07 +08:00
TonyLee1256	c5dd0dacd8	Update README_zh.md	2024-05-07 13:27:23 +08:00
TonyLee1256	8981df6bc9	Update README.md	2024-05-07 13:26:50 +08:00
TonyLee1256	bb0594815a	Update README.md	2024-05-07 13:25:28 +08:00
TonyLee1256	8c85575260	bugfix inference.py	2024-05-07 13:19:43 +08:00
三洋三洋	7c5a547b1f	update	2024-05-02 09:10:21 +00:00
三洋三洋	c6e6622aaf	Merge remote-tracking branch 'origin/pre_release' into pre_release	2024-04-21 16:13:49 +00:00
三洋三洋	8fa462b434	update README.md	2024-04-21 16:13:45 +00:00
TonyLee1256	1a7939190f	Update rec_infer_from_crop_imgs.py	2024-04-22 00:08:36 +08:00
TonyLee1256	0bb11bebfc	Update infer_det.py	2024-04-22 00:07:41 +08:00
TonyLee1256	be19ed8d63	Update README.md	2024-04-21 22:14:23 +08:00
TonyLee1256	0079c07be2	Update README.md	2024-04-21 22:12:22 +08:00
TonyLee1256	b3dd73c716	Update README_zh.md	2024-04-21 22:09:58 +08:00
三洋三洋	188ab88e07	Merge branch 'dev' into pre_release	2024-04-21 13:14:49 +00:00
三洋三洋	9018c62f66	Update README.md	2024-04-21 13:06:01 +00:00
三洋三洋	5cbbfb38d6	1) Fix bug in to_katex.py; 2) Write the conversion results from Box.py to logs	2024-04-21 12:09:26 +00:00
三洋三洋	11df230200	Adjust project structure after merging dev	2024-04-21 00:48:24 +08:00
三洋三洋	e6dca76123	Remove resizer after merging dev	2024-04-21 00:13:21 +08:00
三洋三洋	185b2e3db6	1) Implement mixed text-formula recognition; 2) Restructure the project	2024-04-21 00:05:14 +08:00
三洋三洋	eab6e4c85d	update infer_det.py	2024-04-18 00:06:05 +08:00
三洋三洋	48f778eeda	Restructure directories to support mixed inference	2024-04-17 15:24:06 +00:00
三洋三洋	7883d3c07f	Fix bug with inconsistent parameter names introduced by merging the pre_release branch	2024-04-17 14:47:58 +00:00
三洋三洋	a064b7dbb0	Merge branch 'pre_release' into dev	2024-04-17 10:32:22 +00:00
三洋三洋	f81a31a8c9	checkpoint	2024-04-17 10:20:15 +00:00
三洋三洋	ec3e744376	update README.md	2024-04-17 10:08:46 +00:00
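三洋三洋	3cebc2eb2a	Update frontend and inference.py 1) Frontend supports pasting images from the clipboard. 2) Frontend supports model configuration. 3) Change the inference.py interface. 4) Remove unnecessary files	2024-04-17 09:36:40 +00:00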
三洋三洋	66d4902871	add contributor	2024-04-12 07:29:36 +00:00
三洋三洋	78d29d49ef	update README	2024-04-12 06:16:37 +00:00
三洋三洋	7d1d8ddd77	work in progress	2024-04-12 03:20:04 +00:00
OleehyO	9e8b15ef3a	Merge pull request #14 from TonyLee1256/pre_release Add formula detection module	2024-04-12 00:46:45 +08:00
TonyLee1256	9e8ac666b0	Add formula detection module	2024-04-11 16:44:19 +00:00
三洋三洋	1538cb73f8	Fix inference_transform bug in transforms.py: PNG images were not converted to np.ndarray during the eval stage of training	2024-04-11 07:04:58 +00:00
三洋三洋	762012be1f	Optimize trim_white_border in transform.py	2024-04-10 16:09:13 +00:00
三洋三洋	1589fb3217	Increase the probability of data augmentation	2024-04-09 13:50:35 +00:00
三洋三洋	1db514bdbf	inference.py supports KaTeX syntax	2024-04-06 12:06:08 +00:00
三洋三洋	840be6b843	update README.md	2024-04-06 11:57:50 +00:00
三洋三洋	93fc22adf5	inference.py supports KaTeX	2024-04-06 11:38:59 +00:00
三洋三洋	8d6d889efa	update README.md	2024-04-06 07:43:03 +00:00
三洋三洋	ecd5481bea	Web demo supports KaTeX; a local xelatex renderer no longer needs to be installed	2024-04-06 07:28:46 +00:00
三洋三洋	b5f7166e58	Add KaTeX support to web demo; a local xelatex renderer no longer needs to be installed	2024-04-06 07:18:40 +00:00
三洋三洋	c9c15d27bd	inference_transform bugfix	2024-04-06 05:09:50 +00:00
三洋三洋	87ddb86e5e	Complete v3: add data augmentation for natural scenes	2024-04-05 08:11:06 +00:00
三洋三洋	a4e878da96	Merge remote-tracking branch 'origin/dev' into dev	2024-04-05 08:00:11 +00:00
三洋三洋	70dce92e19	Merge remote-tracking branch 'origin/dev' into dev	2024-04-05 07:52:40 +00:00
三洋三洋	e16f46e856	Update the inference.py template for v3 (supports natural-scene and mixed-text scene recognition)	2024-04-05 07:27:07 +00:00
三洋三洋	67426c439f	update README.md	2024-04-05 05:19:27 +00:00
三洋三洋	d2090c0d61	Merge remote-tracking branch 'origin/dev' into dev	2024-03-28 14:33:46 +00:00
三洋三洋	5a259065a4	merge v3_nature_scence	2024-03-28 14:33:25 +00:00
三洋三洋	8d94611aba	merge v3_nature_scence	2024-03-28 14:22:23 +00:00
三洋三洋	a6a5d07430	Merge remote-tracking branch 'origin/dev' into dev	2024-03-28 13:28:47 +00:00
三洋三洋	63b8e04dab	TexTellerv2 release	2024-03-25 13:22:11 +00:00
OleehyO	86443d0cf7	Update README_zh.md	2024-03-25 16:35:34 +08:00
OleehyO	88d2730752	Update README.md	2024-03-25 16:34:46 +08:00