# RunPod Qwen Image Edit

## RTX PRO 6000 Blackwell compatibility

The default Docker build targets RTX PRO 6000 Blackwell/SM120 instances. It uses `pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel` as the base image, requires a host NVIDIA R570+ driver, and disables Flash Attention 3 by default.

Build:

```bash
docker build -t runpod-qwen .
```

The GitHub-installed Hugging Face packages are pinned to commit hashes in the Dockerfile. To intentionally refresh them later, override `ACCELERATE_REF`, `DIFFUSERS_REF`, or `PEFT_REF` at build time.
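For example, a refresh build might look like this. The `main` refs below are placeholders, not values from the Dockerfile; pin exact commit hashes if you need reproducible builds:

```shell
docker build \
  --build-arg ACCELERATE_REF=main \
  --build-arg DIFFUSERS_REF=main \
  --build-arg PEFT_REF=main \
  -t runpod-qwen:hf-refresh .
```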

Run:

```bash
docker run --gpus all -p 7860:7860 -v /workspace:/workspace runpod-qwen
```

Flash Attention 3 is intentionally off because the bundled `kernels-community/vllm-flash-attn3` path is not a safe default for Blackwell/SM120. Leave it off for RTX PRO 6000:

```bash
docker run --gpus all \
  -e QWEN_ENABLE_FA3=0 \
  -p 7860:7860 \
  -v /workspace:/workspace \
  runpod-qwen
```

The startup script runs `/app/runtime_preflight.py` before loading the model. It fails fast if CUDA is not visible, if a Blackwell RTX GPU is running under a CUDA build older than 12.8, or if the host driver is older than R570. This is deliberate: failing before model downloads and inference is much cheaper than failing after them.
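You can reproduce the driver-age part of that check by hand. This is a sketch, not the preflight script itself; the version string shown is an example value you would normally obtain from `nvidia-smi`:

```shell
# The real value would come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver_version="570.133.07"   # example value; substitute the nvidia-smi query above

# Compare only the major version against the R570 requirement.
major="${driver_version%%.*}"
if [ "$major" -ge 570 ]; then
  echo "driver OK ($driver_version)"
else
  echo "driver too old ($driver_version): need R570+" >&2
  exit 1
fi
```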

The image includes `nano` for quick edits inside a running container.

Expose the output export app as well:

```bash
docker run --gpus all -p 7860:7860 -p 7861:7861 -v /workspace:/workspace runpod-qwen
```

Open port `7861` to create a ZIP from `/workspace/output` and upload it to your server endpoint. The upload is sent as `multipart/form-data` with the ZIP in field `file`.
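A manual equivalent of what the export app does, sketched with a throwaway directory; the endpoint URL is a placeholder, and `python3 -m zipfile` stands in for whatever archiver the app actually uses:

```shell
# Stand-in for /workspace/output with one fake result file.
outdir=/tmp/demo_output
mkdir -p "$outdir"
printf 'fake image bytes' > "$outdir/result_0001.png"

# Build the ZIP (stdlib CLI; recurses into the directory).
python3 -m zipfile -c /tmp/output.zip "$outdir"

# Upload as multipart/form-data with the archive in field "file".
# Placeholder endpoint -- substitute your own server:
# curl -F "file=@/tmp/output.zip" "https://example.com/upload"

python3 -m zipfile -l /tmp/output.zip
```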

In the Gradio UI, both batch processing and single-image testing expose output-size controls under Advanced Settings. Keep custom width/height at `0` to preserve the aspect ratio and scale by the output long side, or set one or both custom dimensions for a specific target size.
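The long-side scaling can be worked through with a small arithmetic sketch. Variable names here are illustrative, not taken from `app.py`; the point is that the longer edge is resized to the target and the shorter edge follows the aspect ratio:

```shell
# Source image 2048x1024, output long side 1024.
src_w=2048 src_h=1024
long_side=1024

if [ "$src_w" -ge "$src_h" ]; then
  out_w=$long_side
  out_h=$(( src_h * long_side / src_w ))   # shorter edge scales proportionally
else
  out_h=$long_side
  out_w=$(( src_w * long_side / src_h ))
fi

echo "${out_w}x${out_h}"
```

For this input the result is `1024x512`, i.e. the 2:1 aspect ratio is preserved.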

Restart the GUI and reload models from a shell inside the container:

```bash
/app/restart_gui.sh
```

This stops the existing `app.py` process, starts a fresh Gradio process, and writes logs to `/workspace/gui.log`. If `app.py` is PID 1, restart the pod/container instead; killing PID 1 exits the container before the script can start a replacement process.
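A quick way to check the PID-1 case before running the script. The `pgrep` usage is an assumption about how you might locate the process inside the container, not something `restart_gui.sh` is documented to do:

```shell
# Find the oldest process whose full command line mentions app.py.
pid="$(pgrep -of 'app\.py' 2>/dev/null || true)"

if [ -z "$pid" ]; then
  msg="app.py is not running"
elif [ "$pid" -eq 1 ]; then
  msg="app.py is PID 1: restart the pod/container instead"
else
  msg="safe to restart: app.py is PID $pid"
fi
echo "$msg"
```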

Override the base image only if you deliberately want a different PyTorch/CUDA build:

```bash
docker build \
  --build-arg BASE_IMAGE=pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel \
  -t runpod-qwen:cu128 .
```

Do not run the Blackwell build on a host without a compatible NVIDIA driver and NVIDIA Container Toolkit. If you intentionally need a CPU/debug run, set `QWEN_REQUIRE_CUDA=0`, but that is not suitable for real image generation.
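A hypothetical CPU/debug invocation using that flag (no GPU, no real image generation):

```shell
docker run \
  -e QWEN_REQUIRE_CUDA=0 \
  -p 7860:7860 \
  -v /workspace:/workspace \
  runpod-qwen
```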

## Startup checks

At startup the app prints `torch.__version__`, `torch.version.cuda`, `CUDA_VISIBLE_DEVICES`, GPU name, compute capability, and selected device. Flash Attention 3 is opt-in only:

```bash
docker run --gpus all \
  -e QWEN_ENABLE_FA3=1 \
  -e QWEN_ALLOW_FA3_ON_BLACKWELL=1 \
  -p 7860:7860 \
  -v /workspace:/workspace \
  runpod-qwen
```

Do not use those FA3 flags on RTX PRO 6000 unless you have already verified the exact kernel build on that instance.
