Initial commit: working RIP/INEX_TM help processing pipeline

- help_processor.py: parses .docx/.html/.pdf/.doc/.txt, extracts images,
  classifies sections via Claude API, writes to SQL Server
- generate_html.py: builds interactive HTML viewer (Home/Editor/Search/Generator)
- save_keywords.py: applies keyword edits back to DB
- Prefix-scoped DB schema (RIP_help_files, RIP_help_sections) so multiple
  projects share the same database without collision
- BAT launchers per project (RIP_load.bat, INEX_TM_load.bat, ...) load
  credentials from gitignored .env via _load_env.bat
- Rich HTML preservation for .html sources (html_text column)
- Image extraction for all formats with MS Word / LibreOffice fallback for .doc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-20 11:52:11 +03:00
commit 711053b8bd
16 changed files with 2421 additions and 0 deletions

9
INEX_TM_load_force.bat Normal file
View File

@@ -0,0 +1,9 @@
:@echo off
chcp 65001 > nul
call "%~dp0_load_env.bat" || exit /b 1
set PYTHONIOENCODING=utf-8
echo === FORCE + PURGE prefix=INEX_TM ===
echo.
python help_processor.py --prefix=INEX_TM --force --purge-missing "q:\___Proekti\2022 INEX Технологична модернизация" "q:\___Proekti\2022 INEX Технологична модернизация\Output"
pause