Initial commit: working RIP/INEX_TM help processing pipeline

- help_processor.py: parses .docx/.html/.pdf/.doc/.txt, extracts images,
  classifies sections via Claude API, writes to SQL Server
- generate_html.py: builds interactive HTML viewer (Home/Editor/Search/Generator)
- save_keywords.py: applies keyword edits back to DB
- Prefix-scoped DB schema (RIP_help_files, RIP_help_sections) so multiple
  projects share the same database without collision
- BAT launchers per project (RIP_load.bat, INEX_TM_load.bat, ...) load
  credentials from gitignored .env via _load_env.bat
- Rich HTML preservation for .html sources (html_text column)
- Image extraction for all formats with MS Word / LibreOffice fallback for .doc
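
The prefix-scoped naming described above can be sketched as follows. This is a minimal illustration, not the code from help_processor.py: the helper name `table_names` and the validation rule are hypothetical, and only the `RIP_help_files` / `RIP_help_sections` names come from this commit message.

```python
import re

# Hypothetical rule: a prefix is a plain identifier, so it is safe to
# interpolate into a table name (table names cannot be bound as parameters).
_PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def table_names(prefix):
    """Return the per-project table names for a given project prefix."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid project prefix: {prefix!r}")
    return {
        "files": f"{prefix}_help_files",
        "sections": f"{prefix}_help_sections",
    }
```

Under this assumption, each project gets its own pair of tables from its prefix alone, which is what lets several projects share one database.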

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 11:52:11 +03:00
commit 711053b8bd
16 changed files with 2421 additions and 0 deletions

_load_env.bat Normal file

@@ -0,0 +1,9 @@
@echo off
REM Loads ANTHROPIC_API_KEY and HELP_DB_CONN from .env into the current cmd environment.
REM Invoke with: call _load_env.bat
if not exist .env (
    echo [ERROR] Missing .env file. Copy .env.example to .env and fill it in.
    exit /b 1
)
for /f "usebackq tokens=1,* delims== eol=#" %%A in (".env") do set "%%A=%%B"
exit /b 0
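
For readers less familiar with `for /f`, the loop above skips blank lines and lines starting with `#`, then splits each remaining line at the first `=` into a name and a value. A rough Python equivalent is sketched below. It is an approximation and not part of this commit: the function names `parse_env_lines` and `load_env` are hypothetical, and edge cases (lines without `=`, repeated `=` delimiters) are handled more simply than cmd does.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines."""
    pairs = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        # Split at the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep or not key:
            continue  # assumption: ignore lines that are not KEY=VALUE
        pairs[key] = value
    return pairs


def load_env(path=".env"):
    """Load variables from path into os.environ and return them."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            "Missing .env file. Copy .env.example to .env and fill it in."
        )
    with open(path, encoding="utf-8") as fh:
        pairs = parse_env_lines(fh)
    os.environ.update(pairs)
    return pairs
```

Splitting only at the first `=` matters for values such as connection strings, which usually contain further `=` characters.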