---
title: "ONNX Model Import"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ONNX Model Import}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE, collapse = TRUE, comment = "#>")
library(ggmlR)
```

ggmlR includes a built-in, zero-dependency ONNX loader (a hand-written protobuf parser in C). Load any compatible ONNX model and run inference on CPU or Vulkan GPU: no Python, no TensorFlow, no ONNX Runtime required.

> **Note:** The examples below require a valid `.onnx` model file.
> Replace `"path/to/model.onnx"` with the actual path on your system.

```{r, eval=FALSE}
library(ggmlR)
```

---

## 1. Load and inspect a model

```{r, eval=FALSE}
model <- onnx_load("path/to/model.onnx")

# Model summary (layers, ops, parameters)
onnx_summary(model)

# Input tensor info (name, shape, dtype)
onnx_inputs(model)
```

---

## 2. Run inference

Inputs are named R arrays in NCHW order (matching the ONNX model's expected layout).

```{r, eval=FALSE}
# Random image batch; replace with real data
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))

# `input_name` is a placeholder: use the name reported by onnx_inputs(model)
result <- onnx_run(model, list(input_name = input))

cat("Output shape:", paste(dim(result[[1]]), collapse = " x "), "\n")
```

For models with multiple inputs, pass a named list:

```{r, eval=FALSE}
# `tokens` is an integer vector of token IDs produced by your tokenizer
result <- onnx_run(model, list(
  input_ids      = array(as.integer(tokens), dim = c(1L, length(tokens))),
  attention_mask = array(1L, dim = c(1L, length(tokens)))
))
```

---

## 3. GPU inference

By default ggmlR tries Vulkan first and falls back to CPU automatically. To force a specific backend:

```{r, eval=FALSE}
# Check what's available
if (ggml_vulkan_available()) {
  cat("Vulkan GPU ready\n")
  ggml_vulkan_status()
}

# Load with explicit device
model_gpu <- onnx_load("path/to/model.onnx", device = "vulkan")
model_cpu <- onnx_load("path/to/model.onnx", device = "cpu")
```

Weights are transferred to the GPU once at load time.
Repeated calls to `onnx_run()` do not re-transfer weights.

---

## 4. Dynamic input shapes

Some models accept variable-length inputs. Override shapes at load time:

```{r, eval=FALSE}
model <- onnx_load("path/to/bert.onnx",
                   input_shapes = list(input_ids = c(1L, 128L)))
```

---

## 5. FP16 inference

Run in half precision for faster GPU inference:

```{r, eval=FALSE}
model_fp16 <- onnx_load("path/to/model.onnx", dtype = "f16")

# `input` is an R array shaped for the model; name the list element
# after the model's actual input tensor
result <- onnx_run(model_fp16, list(input = input))
```

---

## 6. Supported operators

ggmlR supports 50+ ONNX operators, including:

- **Convolution:** Conv, ConvTranspose, MaxPool, AveragePool, GlobalAveragePool
- **Linear:** Gemm, MatMul, Linear
- **Activations:** Relu, Sigmoid, Tanh, Gelu, HardSigmoid, Mish, Clip, Elu
- **Normalization:** BatchNormalization, LayerNormalization, GroupNormalization
- **Shape ops:** Reshape, Transpose, Flatten, Squeeze, Unsqueeze, Concat, Split, Slice, Gather, ScatterElements
- **Elementwise:** Add, Sub, Mul, Div, Pow, Sqrt, Exp, Log, Abs, Neg
- **Reduction:** ReduceMean, ReduceSum, ReduceMax
- **Attention:** Attention (fused), MultiHeadAttention
- **Quantized:** QLinearConv, QLinearMatMul, DynamicQuantizeLinear
- **Other:** Cast, Pad, Resize, Dropout (identity at inference), LSTM, GRU, Einsum

Custom fused ops: **RelPosBias2D** (BoTNet).

---

## 7. Examples

For full working examples with real ONNX Zoo models, see:

```{r, eval=FALSE}
# GPU vs CPU benchmark across multiple models
# inst/examples/benchmark_onnx.R

# FP16 inference benchmark
# inst/examples/benchmark_onnx_fp16.R

# Run all supported ONNX Zoo models
# inst/examples/test_all_onnx.R

# BERT sentence similarity
# inst/examples/bert_similarity.R
```

---

## 8. Debugging tips

If a model fails to load or produces wrong results:

1. **Check operator support:** print the model's op list with Python's `onnx` package and compare it against the operator list in section 6.
2. **Verify protobuf field numbers:** the built-in parser is hand-written, so an unexpected field can cause silent mis-parsing.
3. **NaN tracing:** inspect each node through the eval callback rather than scanning tensors after compute. By then intermediate buffers have been aliased, so a post-compute scan gives false readings.
4. **Repeated-run aliasing:** `ggml_backend_sched` aliases intermediate buffers over weight buffers. ggmlR calls `sched_alloc_and_load()` before each compute to reset allocation. Correct results on the first run followed by garbage on later runs point to this aliasing.
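
A quick way to look for the repeated-run symptom and for NaN outputs is to run the same inference several times and compare the results. The sketch below is only an illustration: `check_repeat_runs()` is a hypothetical helper, not part of ggmlR, and it inspects final outputs only, so it complements the eval callback rather than replacing it. It takes any zero-argument function that returns a numeric output.

```{r, eval=FALSE}
# Hypothetical helper (not part of ggmlR): calls `run_once()` several times
# and reports whether the outputs stay equal and free of NaN.
check_repeat_runs <- function(run_once, n_runs = 3L, tolerance = 1e-6) {
  stopifnot(is.function(run_once), n_runs >= 2L)

  first <- as.numeric(run_once())
  consistent <- TRUE
  has_nan <- any(is.nan(first))

  for (i in seq_len(n_runs - 1L)) {
    again <- as.numeric(run_once())
    has_nan <- has_nan || any(is.nan(again))
    # all.equal() returns TRUE or a character description of the difference
    if (!isTRUE(all.equal(first, again, tolerance = tolerance))) {
      consistent <- FALSE
    }
  }

  list(consistent = consistent, has_nan = has_nan)
}

# Usage, assuming `model` and `input` as in section 2 and `input_name`
# replaced by the model's real input name:
# check_repeat_runs(function() onnx_run(model, list(input_name = input))[[1]])
```

A result with `consistent = FALSE` on a deterministic model suggests the aliasing issue described in tip 4; `has_nan = TRUE` is a cue to trace per node with the eval callback.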