OpenGL Geometry Benchmark Guide: Tests, Metrics, and Best Practices
Overview
This guide covers how to design and run OpenGL geometry benchmarks, which measure how the CPU and GPU handle vertex processing, primitive assembly, and draw submission. It explains common tests, key metrics to collect, configuration choices that affect results, and best practices for producing repeatable, meaningful measurements.
Goals and scope
- Goal: Measure how efficiently a system renders geometric data (vertices, indices, and primitives) under realistic and synthetic workloads.
- Scope: Vertex processing performance, draw-call overhead, indexed and non-indexed drawing, instancing, and how shader complexity and state changes affect throughput. This guide does not cover fragment-heavy tests (e.g., complex full-screen shading) except where relevant to geometry-bound scenarios.
Test types
Vertex Throughput Test
- Render large streams of vertices with minimal fragment cost.
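- Use a simple pass-through vertex shader and a minimal fragment shader (single constant color). To skip rasterization entirely, enable GL_RASTERIZER_DISCARD (OpenGL 3.0+); glColorMask and glDepthMask only suppress framebuffer writes, so fragments may still be generated and shaded.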
- Measure maximum sustained vertex rate (vertices/sec).
Indexed vs Non-Indexed
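- Compare drawing with and without index buffers to measure the effect of vertex reuse and memory locality (indexed drawing lets the GPU reuse already-transformed vertices from its post-transform cache).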
- Vary index patterns: sequential, random, and locality-friendly (striped or clustered).
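The index patterns above can be compared offline before running anything on the GPU. The following is a minimal sketch, assuming a simple FIFO model of the post-transform vertex cache; real hardware may use a different policy and size, and the names `grid_indices`, `shuffle_triangles`, and `acmr` are hypothetical helpers, not part of any library. It reports the average cache miss ratio (vertex-shader invocations per triangle), where 3.0 corresponds to no reuse at all.

```python
import random
from collections import deque


def grid_indices(width, height):
    """Triangle-list indices for a grid of width x height quads, row by row."""
    indices = []
    stride = width + 1  # vertices per grid row
    for y in range(height):
        for x in range(width):
            v0 = y * stride + x
            v1 = v0 + 1
            v2 = v0 + stride
            v3 = v2 + 1
            indices += [v0, v2, v1, v1, v2, v3]
    return indices


def shuffle_triangles(indices, seed=0):
    """Random pattern: the same triangles, submitted in shuffled order."""
    tris = [indices[i:i + 3] for i in range(0, len(indices), 3)]
    random.Random(seed).shuffle(tris)
    return [i for tri in tris for i in tri]


def acmr(indices, cache_size=16):
    """Misses per triangle under an assumed FIFO vertex cache."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)  # a full deque drops its oldest entry
    return misses / (len(indices) // 3)


if __name__ == "__main__":
    grid = grid_indices(32, 32)
    print("locality-friendly:", acmr(grid))
    print("random:", acmr(shuffle_triangles(grid)))
    print("no reuse:", acmr(list(range(len(grid)))))
```

A pattern whose simulated ratio is close to 3.0 should behave like non-indexed drawing in the real test, which gives a rough sanity check on the measured numbers.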
Draw Call Overhead
- Issue many small draw calls (e.g., 1–1000 triangles per call) to stress the CPU submission path.
- Measure draw calls/sec and the maximum number of draw calls per frame that still holds a target frame rate.
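One way to vary the draw-call size is to split a single triangle list into fixed-size batches. The sketch below is illustrative only: `plan_draws` and `draw_rate` are hypothetical helper names, and each returned pair is meant to be passed as the `first` and `count` arguments of `glDrawArrays(GL_TRIANGLES, first, count)` by whatever harness issues the GL calls.

```python
def plan_draws(total_triangles, triangles_per_draw):
    """Return (first_vertex, vertex_count) pairs covering a triangle list."""
    if triangles_per_draw < 1:
        raise ValueError("triangles_per_draw must be at least 1")
    draws = []
    first_tri = 0
    while first_tri < total_triangles:
        # The last batch may be smaller than the requested size.
        count = min(triangles_per_draw, total_triangles - first_tri)
        draws.append((first_tri * 3, count * 3))
        first_tri += count
    return draws


def draw_rate(draws_per_frame, frames, elapsed_seconds):
    """Draw calls per second over a measured window."""
    return draws_per_frame * frames / elapsed_seconds


if __name__ == "__main__":
    plan = plan_draws(total_triangles=10, triangles_per_draw=4)
    print(plan)
    print(draw_rate(len(plan), frames=600, elapsed_seconds=10.0))
```

Keeping the total triangle count fixed while sweeping `triangles_per_draw` means any change in frame time comes from submission overhead rather than from extra geometry.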
Instancing
- Render many instances of a small mesh using glDrawArraysInstanced or glDrawElementsInstanced.
- Vary instance count and per-instance attribute size to assess instancing overhead and rate.
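The instance-count and attribute-size sweep can be planned with a few lines of code. This is a rough sketch under the assumption that per-instance data is an array of 32-bit floats uploaded to a buffer whose attributes advance once per instance (for example via `glVertexAttribDivisor(index, 1)`); `pack_instances` and `sweep` are hypothetical names.

```python
import struct


def pack_instances(count, floats_per_instance):
    """Pack per-instance data as little-endian 32-bit floats."""
    fmt = f"<{floats_per_instance}f"
    out = bytearray()
    for i in range(count):
        # Placeholder payload: every float of instance i holds the value i.
        out += struct.pack(fmt, *([float(i)] * floats_per_instance))
    return bytes(out)


def sweep(instance_counts, floats_per_instance_options):
    """Yield (instances, bytes_per_instance, total_bytes) per combination."""
    for n in instance_counts:
        for f in floats_per_instance_options:
            yield n, f * 4, n * f * 4


if __name__ == "__main__":
    for row in sweep([1, 1000, 100000], [4, 16]):
        print(row)
```

Logging the total byte size next to each timing makes it easier to tell whether a slowdown tracks the instance count or the amount of per-instance data fetched.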
State Change Sensitivity
- Introduce controlled changes to GL state between draws: shader switches, VAO/VBO binds, texture binds, blending toggles.
- Measure the performance impact of each state-change type separately.
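A simple way to attribute cost to one kind of state change is to compare a run with changes against a baseline that draws the same geometry sorted by state. The arithmetic is sketched below; the function names are hypothetical and the model assumes the cost is roughly linear in the number of changes, which will not hold exactly on every driver.

```python
def count_state_changes(sequence):
    """Number of times consecutive draws use a different state object."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


def cost_per_change(baseline_ms, test_ms, changes):
    """Average extra frame time per state change, in milliseconds."""
    if changes <= 0:
        raise ValueError("test run must contain at least one state change")
    return (test_ms - baseline_ms) / changes


if __name__ == "__main__":
    interleaved = ["shaderA", "shaderB"] * 500   # switch on every draw
    sorted_by_state = sorted(interleaved)         # a single switch
    extra = count_state_changes(interleaved) - count_state_changes(sorted_by_state)
    print(extra, "additional switches")
    print(cost_per_change(baseline_ms=4.0, test_ms=9.0, changes=extra), "ms each")
```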
Shader Complexity Sweep
- Keep geometry constant and vary vertex shader arithmetic and attribute count.
- Identify the point where shader execution becomes the bottleneck.
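Shader variants for such a sweep can be generated rather than written by hand. The following sketch emits GLSL 3.30 vertex shader source with a configurable number of input attributes and a dependent multiply-add chain; `make_vertex_shader` is a hypothetical helper, the attribute names are made up for illustration, and a driver's compiler may still simplify the arithmetic, so results should be checked against measured timings.

```python
def make_vertex_shader(num_attributes, num_ops):
    """Build vertex shader source with extra attributes and ALU work."""
    if num_attributes < 1:
        raise ValueError("need at least the position attribute")
    lines = [
        "#version 330 core",
        "layout(location = 0) in vec4 a_position;",
    ]
    # Implementations guarantee at least 16 attribute locations.
    for i in range(1, num_attributes):
        lines.append(f"layout(location = {i}) in vec4 a_extra{i};")
    lines.append("void main() {")
    lines.append("    vec4 acc = a_position;")
    for i in range(1, num_attributes):
        # Read every attribute so its fetch contributes to the result.
        lines.append(f"    acc += a_extra{i} * 1e-6;")
    for _ in range(num_ops):
        # Each step depends on the previous one.
        lines.append("    acc = acc * 1.0001 + vec4(1e-6);")
    lines.append("    gl_Position = acc;")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_vertex_shader(num_attributes=3, num_ops=4))
```

Sweeping `num_ops` with `num_attributes` fixed, and then the reverse, keeps the two factors separate so the bottleneck can be attributed to arithmetic or to attribute fetch.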
Large Vertex Buffer Streaming
- Update VBOs frequently (streaming) using glBufferSubData or buffer orphaning (calling glBufferData with a NULL data pointer to obtain fresh storage before refilling) to evaluate dynamic data upload cost.
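The bookkeeping behind an orphaning strategy can be sketched independently of the GL calls. The class below is an assumption-laden illustration, not a reference implementation: `StreamCursor` is a hypothetical name, and the harness is expected to call `glBufferData(target, capacity, NULL, GL_STREAM_DRAW)` whenever `orphan` is true, then upload with `glBufferSubData` at the returned offset.

```python
class StreamCursor:
    """Tracks the write offset in a fixed-size streaming buffer."""

    def __init__(self, capacity, alignment=16):
        self.capacity = capacity
        self.alignment = alignment
        self.offset = 0

    def reserve(self, size):
        """Return (offset, orphan) for an upload of `size` bytes."""
        if size > self.capacity:
            raise ValueError("upload larger than the buffer")
        # Round the current offset up to the next aligned position.
        start = -(-self.offset // self.alignment) * self.alignment
        orphan = start + size > self.capacity
        if orphan:
            start = 0  # storage is replaced, so writing restarts at the top
        self.offset = start + size
        return start, orphan


if __name__ == "__main__":
    cursor = StreamCursor(capacity=100, alignment=16)
    for _ in range(4):
        print(cursor.reserve(40))
```

Counting how often `orphan` is true per frame gives a useful extra column in the logs when comparing buffer sizes.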
Key metrics to collect
- Vertices/sec — primary metric for pure geometry throughput.
- Primitives/sec (triangles/sec) — reflects rendered primitive rate.
- Draw calls/sec — indicates CPU submission capacity.
- Frames/sec (FPS) — end-to-end throughput under a test scenario.
- CPU usage (per core/thread) — to identify CPU-side bottlenecks.
- GPU busy/idle time / GPU utilization — if available from vendor tools.
- API call timings — time spent in draw calls and buffer uploads.
- Memory bandwidth and cache hit metrics — where supported by profiling tools.
- GPU pipeline stalls / synchronization waits — to detect pipeline bubbles.
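The rate metrics above all derive from a few counters accumulated over the measured window. A minimal sketch, assuming the harness counts frames, draw calls, vertices, and triangles itself (the `WindowCounters` type and `throughput` function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class WindowCounters:
    frames: int
    draw_calls: int
    vertices: int
    triangles: int
    elapsed_seconds: float


def throughput(c):
    """Convert raw counters into per-second rates."""
    if c.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    t = c.elapsed_seconds
    return {
        "fps": c.frames / t,
        "draw_calls_per_sec": c.draw_calls / t,
        "vertices_per_sec": c.vertices / t,
        "triangles_per_sec": c.triangles / t,
    }


if __name__ == "__main__":
    window = WindowCounters(frames=3000, draw_calls=300000,
                            vertices=9_000_000_000, triangles=3_000_000_000,
                            elapsed_seconds=30.0)
    for name, value in throughput(window).items():
        print(f"{name}: {value:,.0f}")
```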
Test configuration and methodology
- Use a controlled fixed timestep and a sufficient run duration (e.g., a warm-up of several seconds followed by a 30–60 s measured window) to avoid transient effects.
- Disable vsync for raw throughput tests; enable it only for real-world interactive scenarios.
- Run tests at a low resolution with render state that keeps fragment load minimal (e.g., 256×256 with simple shaders) for geometry-bound profiling.
- Use timer queries (glQueryCounter with GL_TIMESTAMP, or glBeginQuery/glEndQuery with GL_TIME_ELAPSED) to measure GPU time. Combine them with CPU timers for end-to-end latency.
- Isolate variables: change one factor per experiment (e.g., only shader complexity) to attribute performance differences correctly.
- Repeat runs and report median and variance, not single-run peaks.
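Timer query results are reported in nanoseconds, so post-processing is mostly unit conversion and summary statistics. The sketch below uses Python's `statistics` module; `gpu_ms` and `summarize` are hypothetical helper names, and the `warmup` parameter simply discards the first few samples.

```python
import statistics


def gpu_ms(start_ns, end_ns):
    """Convert a pair of GPU timestamps (nanoseconds) to milliseconds."""
    return (end_ns - start_ns) / 1e6


def summarize(samples, warmup=0):
    """Median and sample variance of measurements after warm-up."""
    measured = samples[warmup:]
    if len(measured) < 2:
        raise ValueError("need at least two measured samples")
    return {
        "median": statistics.median(measured),
        "variance": statistics.variance(measured),
        "min": min(measured),
        "max": max(measured),
    }


if __name__ == "__main__":
    frame_times_ms = [100.0, 10.0, 12.0, 14.0]  # first sample is warm-up
    print(summarize(frame_times_ms, warmup=1))
    print(gpu_ms(1_000_000, 3_500_000), "ms")
```

Reporting the minimum and maximum alongside the median helps spot runs disturbed by background activity.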
Test harness recommendations
- Provide command-line params to configure mesh size, draw-call size, instancing count, shader variants, and buffering strategies.
- Support synthetic models: point cloud, line lists, triangle lists, strips, indexed grids, and small repeated meshes.
- Include validation mode (render a reference image or per-frame checksum) to ensure correctness during aggressive optimizations.
- Export detailed logs: per-frame metrics, histogram
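As an illustration of the harness options listed above, here is a small sketch of a command-line parser and a per-frame checksum. All flag names are invented for this example, and `frame_checksum` assumes the harness has already read the framebuffer back into a bytes object (for instance with `glReadPixels`).

```python
import argparse
import zlib


def build_parser():
    """Command-line options for a hypothetical geometry benchmark."""
    p = argparse.ArgumentParser(description="Geometry benchmark (sketch)")
    p.add_argument("--mesh-vertices", type=int, default=1_000_000)
    p.add_argument("--triangles-per-draw", type=int, default=1000)
    p.add_argument("--instances", type=int, default=1)
    p.add_argument("--shader-variant", default="passthrough")
    p.add_argument("--upload", choices=["static", "subdata", "orphan"],
                   default="static")
    p.add_argument("--validate", action="store_true",
                   help="compare per-frame checksums against a reference")
    return p


def frame_checksum(pixels):
    """CRC-32 of raw framebuffer bytes, as an unsigned 32-bit value."""
    return zlib.crc32(pixels) & 0xFFFFFFFF


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
    print(hex(frame_checksum(bytes(256 * 256 * 4))))
```

A checksum only detects exact differences; tests that tolerate small rounding changes across drivers would need an image comparison with a threshold instead.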