Initial commit: working RIP/INEX_TM help processing pipeline

- help_processor.py: parses .docx/.html/.pdf/.doc/.txt, extracts images,
  classifies sections via Claude API, writes to SQL Server
- generate_html.py: builds interactive HTML viewer (Home/Editor/Search/Generator)
- save_keywords.py: applies keyword edits back to DB
- Prefix-scoped DB schema (RIP_help_files, RIP_help_sections) so multiple
  projects share the same database without collision
- BAT launchers per project (RIP_load.bat, INEX_TM_load.bat, ...) load
  credentials from gitignored .env via _load_env.bat
- Rich HTML preservation for .html sources (html_text column)
- Image extraction for all formats with MS Word / LibreOffice fallback for .doc
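
A minimal sketch of the prefix-scoped naming described above, assuming each
table name is built as <prefix>_help_files / <prefix>_help_sections. The
function name table_names and the prefix validation are hypothetical and not
taken from help_processor.py; the INEX_TM table names are inferred from the
RIP_ pattern.

```python
import re

# Hypothetical rule: a prefix must look like a plain SQL identifier so it can
# be embedded in a table name without quoting or injection risk.
_PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def table_names(prefix: str) -> dict[str, str]:
    """Return the per-project table names for a given project prefix."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid project prefix: {prefix!r}")
    return {
        "files": f"{prefix}_help_files",
        "sections": f"{prefix}_help_sections",
    }


if __name__ == "__main__":
    # Two projects map to disjoint table sets in the same database.
    print(table_names("RIP"))
    print(table_names("INEX_TM"))
```

Because every project gets its own pair of tables, two launchers can point at
the same database without touching each other's rows.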

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 11:52:11 +03:00
commit 711053b8bd
16 changed files with 2421 additions and 0 deletions

help_processor.py: 1162 lines changed (file diff suppressed because it is too large)