Initial commit: working RIP/INEX_TM help processing pipeline

- help_processor.py: parses .docx/.html/.pdf/.doc/.txt, extracts images,
  classifies sections via Claude API, writes to SQL Server
- generate_html.py: builds interactive HTML viewer (Home/Editor/Search/Generator)
- save_keywords.py: applies keyword edits back to DB
- Prefix-scoped DB schema (RIP_help_files, RIP_help_sections) so multiple
  projects share the same database without collision
- BAT launchers per project (RIP_load.bat, INEX_TM_load.bat, ...) load
  credentials from gitignored .env via _load_env.bat
- Rich HTML preservation for .html sources (html_text column)
- Image extraction for all formats with MS Word / LibreOffice fallback for .doc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-20 11:52:11 +03:00
commit 711053b8bd
16 changed files with 2421 additions and 0 deletions

30
.gitignore vendored Normal file
View File

@@ -0,0 +1,30 @@
# Credentials
.env
.env.local
*.local
# Python
__pycache__/
*.pyc
*.pyo
# Logs
*.log
# Generated outputs
help_viewer.html
keywords_changes*.json
# Output processing folders (на отделен диск, не за git)
Output/
output/
# Archives
*.zip
*.tar.gz
# IDE / tools
.vscode/
.idea/
.claude/
*.swp