
vergleich

Excel comparison tool for the workbook in data/Auftreten_Ausprägung_Vergleich.xlsx.

The script reads the sheets Kristallin, Salz, and Christa and writes the result into the sheet Vergleich.

Current behavior:

  • Column B (Auftreten) is compared directly across the three source sheets.
  • Column D (Epoche) is compared directly across the three source sheets.
  • Columns C and E are compared with an LLM.
  • Salz is treated as the baseline for the LLM comparison.
  • The first LLM pass writes short migration-oriented comparison notes into Vergleich columns C and E in the format Diff Kristallin: ... and Diff Christa: ....
  • A second LLM pass uses the comparison notes plus the original texts from Salz, Kristallin, and Christa to create a minimal migrated target text for Wirtsgestein Kristallin.
  • The migrated target texts are written to Vergleich columns F and G.
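The matching-and-diff step described above can be sketched as follows. This is a minimal illustration, assuming each sheet has been loaded (e.g. with openpyxl) into a dict keyed by the column-A Prozess value; all function names here are hypothetical, not the script's actual API.

```python
# Sketch of the direct comparison for columns B and D, and of the note
# format written into Vergleich columns C and E. Sheet data is assumed
# to be preloaded into dicts keyed by the column-A "Prozess" value.

def compare_direct(salz, kristallin, christa, prozess):
    """Return a multiline sheet-by-sheet report for one row of B or D."""
    values = {
        "Salz": salz.get(prozess),
        "Kristallin": kristallin.get(prozess),
        "Christa": christa.get(prozess),
    }
    if len(set(values.values())) == 1:
        return "identical"  # all three sheets agree for this row
    return "\n".join(f"{sheet}: {value}" for sheet, value in values.items())

def diff_note(diff_kristallin, diff_christa):
    """Format first-pass LLM output in the documented Diff ... format."""
    return f"Diff Kristallin: {diff_kristallin}\nDiff Christa: {diff_christa}"

salz = {"P1": "ja"}
kristallin = {"P1": "ja"}
christa = {"P1": "nein"}
print(compare_direct(salz, kristallin, christa, "P1"))
```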

Setup (Dev Container)

  1. Open this repository in VS Code.
  2. Run Dev Containers: Reopen in Container.
  3. On a GPU host, the devcontainer starts two services:
    • the workspace development container
    • a llama.cpp server on http://localhost:8000/v1
  4. Wait for postCreateCommand to complete:
    • uv python install
    • uv sync --extra dev

This creates .venv in the workspace and installs the project and dev dependencies from pyproject.toml.

GPU devcontainer target

The devcontainer is configured for the future NVIDIA GPU server and starts a dedicated llama.cpp service with:

  • image: ghcr.io/ggml-org/llama.cpp:server-cuda
  • Hugging Face repo: mistralai/Ministral-3-14B-Instruct-2512-GGUF
  • default quant: Ministral-3-14B-Instruct-2512-Q5_K_M.gguf
  • API endpoint: http://localhost:8000/v1/chat/completions
  • persistent model volume: llama-models

Host prerequisites:

  • nvidia-smi must work on the host
  • Docker must have NVIDIA GPU support available
  • the NVIDIA Container Toolkit must be installed on the host

Important:

  • This GPU devcontainer setup is meant for the future graphics server.
  • On the current no-GPU machine, the llama-server service is not expected to start successfully with CUDA.
  • The model choice is centralized in docker-compose.yml via the top-level x-model-repo and x-model-file values.
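The centralized model choice could look like the following docker-compose.yml fragment. Only the x-model-repo and x-model-file keys and the values listed above come from this README; the service wiring, flags, and cache path are assumptions for illustration.

```yaml
# Illustrative sketch -- only x-model-repo and x-model-file are documented;
# the llama-server flags and cache path are assumed.
x-model-repo: &model-repo mistralai/Ministral-3-14B-Instruct-2512-GGUF
x-model-file: &model-file Ministral-3-14B-Instruct-2512-Q5_K_M.gguf

services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    command: ["--hf-repo", *model-repo, "--hf-file", *model-file, "--port", "8000"]
    ports:
      - "8000:8000"
    volumes:
      - llama-models:/root/.cache/llama.cpp  # assumed cache location

volumes:
  llama-models:
```

Changing the model then only requires editing the two top-level x- values.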

Optional local setup (without Dev Container)

bash ./.scripts/bootstrap.sh

LLM configuration

For columns C and E, the script expects an OpenAI-compatible local chat completions endpoint.

Environment variables:

  • VERGLEICH_LLM_BASE_URL
    • Default: http://localhost:8000/v1
  • VERGLEICH_LLM_MODEL
    • Default: mistralai/Ministral-3-8B-Reasoning-2512
  • VERGLEICH_LLM_API_KEY
    • Default: dummy
  • VERGLEICH_LLM_TIMEOUT
    • Default: 300
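The defaults above suggest a configuration step along these lines. This is a sketch; the actual variable handling in src/vergleich/main.py may differ.

```python
import os

# Read the LLM endpoint configuration from the environment, falling back
# to the defaults documented above. Variable names mirror the README.
def llm_config() -> dict:
    return {
        "base_url": os.environ.get("VERGLEICH_LLM_BASE_URL", "http://localhost:8000/v1"),
        "model": os.environ.get("VERGLEICH_LLM_MODEL", "mistralai/Ministral-3-8B-Reasoning-2512"),
        "api_key": os.environ.get("VERGLEICH_LLM_API_KEY", "dummy"),
        "timeout": float(os.environ.get("VERGLEICH_LLM_TIMEOUT", "300")),
    }

print(llm_config()["base_url"])
```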

Example:

export VERGLEICH_LLM_BASE_URL="http://localhost:8000/v1"
export VERGLEICH_LLM_MODEL="mistralai/Ministral-3-8B-Reasoning-2512"
export VERGLEICH_LLM_API_KEY="dummy"
uv run vergleich

If no compatible server is running, the script will fail when it reaches the LLM comparison for columns C and E.

When you open the GPU devcontainer, VERGLEICH_LLM_BASE_URL is set automatically to http://llama-server:8000/v1 inside the workspace container.
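The OpenAI-compatible call the script makes for columns C and E has roughly this shape. The request construction below is illustrative only; the real prompts and client code live in src/vergleich/main.py.

```python
import json
import urllib.request

# Build an OpenAI-compatible chat completions request against the
# configured base URL. Prompt wording and parameters are illustrative.
def build_request(base_url: str, model: str, api_key: str, prompt: str):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_request(
    "http://localhost:8000/v1",
    "mistralai/Ministral-3-8B-Reasoning-2512",
    "dummy",
    "example prompt",
)
print(req.full_url)
```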

Run

uv run vergleich

The script updates the workbook in place:

  • input and output file: data/Auftreten_Ausprägung_Vergleich.xlsx
  • output sheet: Vergleich
  • output columns F and G: migrated target texts for Wirtsgestein Kristallin derived from source columns C and E

Notes

  • The script currently assumes the source sheets are named exactly Kristallin, Salz, and Christa.
  • It matches rows by the value in column A (Prozess).
  • Exact text differences in B and D are written as multiline sheet-by-sheet output.
  • The first LLM prompt identifies only the technically necessary adaptations (fachlich notwendige Anpassungen) for migrating the Steinsalz text to Kristallin.
  • The second LLM prompt applies those instructions with minimal edits to the Salz text.
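Put together, the two LLM passes described in these notes form a pipeline roughly like the following. The helper names and prompt wording are hypothetical; the real prompts are in src/vergleich/main.py.

```python
# Sketch of the two-pass flow for one row and one text column (C or E).
# `llm` stands for any callable that sends a prompt to the chat endpoint
# and returns the completion text.

def first_pass_prompt(salz_text, kristallin_text, christa_text):
    """Ask only for the necessary adaptations, with Salz as the baseline."""
    return (
        "Baseline (Salz): " + salz_text + "\n"
        "Kristallin: " + kristallin_text + "\n"
        "Christa: " + christa_text + "\n"
        "List only the technically necessary adaptations for Kristallin."
    )

def second_pass_prompt(salz_text, notes):
    """Apply the noted adaptations with minimal edits to the Salz text."""
    return (
        "Original (Salz): " + salz_text + "\n"
        "Adaptations: " + notes + "\n"
        "Return the Salz text with only these minimal edits applied."
    )

def migrate_row(llm, salz_text, kristallin_text, christa_text):
    notes = llm(first_pass_prompt(salz_text, kristallin_text, christa_text))
    return llm(second_pass_prompt(salz_text, notes))
```

The result of migrate_row corresponds to the migrated target text written into Vergleich columns F and G.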

Project structure

src/
  vergleich/
    main.py