Multimodal, agentic triage decision-support for rural healthcare in the Global South. A phone photo plus a typed narrative becomes a top-3 condition guess, a Red / Yellow / Green urgency, and a structured pre-visit SOAP for the clinic doctor — contextualized by village distance, cost, and harvest season. It never diagnoses.
A community health worker (or the patient) sends a phone photo of a skin lesion plus a short text history. Path to Care runs the image and narrative through a multimodal agentic pipeline and returns:
Every patient-facing string passes through a deterministic
cardinal-rule rewriter that strips diagnostic phrasing
("you have X" → "signs suggest X") before it
leaves the API. The output is decision support, not a diagnosis.
NarrativeToSOAP signature.
--enable-lora.
:8000. The Next.js frontend talks to it directly.
The triage head is a LoRA SFT pass over Gemma 4 31B-it, run end to end on a single AMD Instinct MI300X. Two epochs, twenty-one training examples, finished in 32 seconds. Training script in training/lora_sft.py; full log in logs/lora_train.log.
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31B-it (multimodal, dense) |
| Adapter | LoRA · r=16, alpha=32, dropout=0.05, bias=none |
| Target modules | language-model self-attention (q_proj, k_proj, v_proj, o_proj) |
| Trainable parameters | 45.0 M of 31.3 B (0.14%) |
| Training rows | 21 (image + SOAP + village context → urgency + reasoning) |
| Epochs · effective batch size | 2 · 4 |
| Wall-clock time | 32 seconds |
| Training loss | 3.90 → 0.58 |
Adapter weights on Hugging Face Hub at
sankara68/path-to-care-triage-gemma4-lora.
Served by vLLM with
--enable-lora --lora-modules triage=adapters/triage-gemma4-lora;
the orchestrator opts in via PTC_VLLM_LORA_NAME=triage.
The eval delta from this adapter is the +7 pp top-1
accuracy lift in the Results table below (Image classification ·
SCIN top-16).
Two complementary evaluations: a 30-case adversarial test set authored to probe the safety property (red flags, contradictions, off-distribution variants), and a 100-case held-out slice of the SCIN dermatology dataset to probe image-grounded classification.
Reward R = 1.0 exact / 0.5 adjacent /
0.0 off-by-two.
| Run | Mean reward | Exact match | FN Red → Green |
|---|---|---|---|
| Zero-shot baseline (Gemma 4 31B) | 0.983 | 96.7% | 0.0% |
| LoRA-tuned (180 MB adapter) | 0.983 | 96.7% | 0.0% |
Both runs hit the same ceiling — the single residual error is a Yellow → Green slip; no Red was missed. The headline here is the false-negative Red → Green rate at 0.0% — the safety property that matters in the field.
Top-1 accuracy on a held-out slice of the Stanford SCIN dermatology dataset, restricted to the 16 most-frequent conditions.
| Run | Top-1 accuracy | Δ vs baseline |
|---|---|---|
| Zero-shot baseline (Gemma 4 31B) | 28.0% | — |
| LoRA-tuned (same 180 MB adapter) | 35.0% | +7.0 pp / +25% rel |
A 32-second LoRA training run on the MI300X moved top-1 from 28% to 35% — a real learning signal beyond the saturated triage table above. Per-case results in results/scin_top16_topk_tuned.json and scin_top16_topk_baseline.json.
--enable-lora.
Path to Care never produces diagnostic statements. The output is always "signs suggest infection", never "you have cellulitis"; image output is always top-3 with confidence, never a single class label, never binary sick / healthy. Enforcement is defense-in-depth: a system prompt rule, a deterministic regex rewriter on every model output, and a unit test suite that fails the build on diagnostic phrasing.