Part 1: Architecture, Initialization & Hardware Discovery
This is Part 1 of a three-part series exploring Nova, a new open-source GPU driver written in Rust for NVIDIA GPUs in the Linux kernel. This post covers the two-driver architecture, PCI initialization, BAR0 mapping, GPU identification, and safe register access.
Note: Nova is under active development. The code shown here reflects the upstream state as of writing, but APIs are evolving — some features described are work-in-progress. For the current state, check the nova-gpu mailing list and the NVIDIA/nova development branch.
Prerequisites: This post assumes basic familiarity with systems programming concepts (memory addresses, pointers, hardware registers). Rust experience helps but isn’t required — patterns are explained as they appear. Kernel-specific concepts (MMIO, DMA, PCI) are introduced when first used.
How to read this series
This series explains Nova’s design patterns and the reasoning behind them. Code snippets illustrate architectural decisions — they’re snapshots of a moving target, but the patterns they demonstrate are durable. When an API is actively being reworked, I’ll (try to) note it. For current implementation details, the source of truth is drivers/gpu/nova-core/ and drivers/gpu/drm/nova/ in the kernel tree (or the NVIDIA/nova development branch).
The goal is to give you enough understanding of how the pieces fit together to start reading patches, following mailing list discussions, and eventually contributing.
What is Nova?
Nova is a from-scratch GPU driver for NVIDIA GPUs (Turing and later) being developed directly in the upstream Linux kernel. Unlike traditional GPU drivers written in C, Nova is written in Rust and targets exclusively GPUs that use NVIDIA’s GPU System Processor (GSP) firmware model — RTX 20 series onward.
The project is developed in the open on the nova-gpu@vger.kernel.org mailing list. It landed in Linux 6.15 as an initial stub and has been rapidly growing functionality since then.
Nova’s key thesis is that a GPU driver is essentially a resource management problem: it manages the lifetimes of PCI mappings, DMA buffers, firmware objects, and user-space handles. Rust’s ownership system (a set of compile-time rules that track which code “owns” each resource and is responsible for releasing it) confines lifetime-related bugs (use-after-free, double-free, forgotten cleanup) to clearly marked unsafe boundaries. In the vast majority of driver code, the part that grows and changes, such bugs are ruled out at compile time rather than merely caught later by testing or code review.
Why Not Just Improve Nouveau?
Nouveau is the existing open-source NVIDIA driver in Linux, and it deserves recognition for what it accomplished. Over nearly two decades of community-driven reverse engineering, Nouveau delivered open-source support across a remarkable range of NVIDIA hardware — from NV04 through modern GPUs — without any official vendor cooperation for most of its history. It enabled Linux users to run NVIDIA hardware with a fully free driver stack long before NVIDIA began opening up.
However, the GPU landscape has fundamentally changed. Starting with Turing (RTX 20 series), NVIDIA introduced the GSP firmware model: a microcontroller on the GPU itself handles most of the hardware management, and the kernel driver’s role shifts from direct register programming to communicating with this firmware. This represents a fundamentally different driver architecture — one that Nouveau has been adapting to, but which creates an opportunity to rethink the design from scratch.
Nova takes that opportunity. Rather than retrofitting a new architecture onto an existing C codebase, it starts fresh with Rust and a design built around the GSP model from day one.
Nova is intended as the successor to Nouveau for all GSP-based GPUs. However, Nouveau continues to serve the vast catalog of pre-Turing hardware — NV04 through Volta — that Nova will never support. The two drivers coexist in the kernel because they target different hardware generations: Nova supersedes Nouveau for GSP-era GPUs, while Nouveau continues supporting older hardware.
| | Nouveau | Nova |
|---|---|---|
| Language | C | Rust |
| Architecture | Single monolithic module | Two cooperating modules |
| GPU support | NV04 through current (broad) | Turing+ only (GSP-based) |
| Resource lifetimes | Manual, convention-based | Compiler-enforced |
| Firmware model | Mixed (custom + GSP) | GSP-only |
| Development history | ~20 years of community effort | Started 2024, upstream 2025 |

[Figure: Nova vs. Nouveau architecture]
The Two-Driver Architecture
Nova is split into two kernel modules with a clear boundary between them:
- nova-core (drivers/gpu/nova-core/): The PCI driver that owns the GPU hardware. It manages operations related to the physical GPU device, such as BAR0 register access, falcon microcontrollers, GSP firmware boot, and framebuffer layout. It also serves as the common base for both nova-drm and a future vGPU manager VFIO driver.
- nova-drm (drivers/gpu/drm/nova/): An auxiliary DRM driver (DRM, the Direct Rendering Manager, is the Linux kernel’s subsystem for managing GPUs and displays) that exposes the user-space API: IOCTLs (requests issued through the ioctl system call, by which applications ask the driver for services), GEM buffer objects, and (eventually) command submission.
Why Split?
- Separation of concerns. Hardware lifecycle management (power sequencing, firmware, register programming) is fundamentally different from user-space API design (IOCTLs, memory management, scheduling).
- Rebindability. nova-drm can be unbound and rebound without resetting the GPU. This is useful for driver updates, debugging, and vGPU scenarios.
- Multiple consumers. nova-core is designed to also serve as the backend for a vGPU manager (VFIO driver), not just nova-drm.
- Independent development velocity. Teams can iterate on the user-space API without touching hardware abstraction code and vice versa.
The Auxiliary Bus Bridge
The two modules communicate through the kernel’s auxiliary bus. The auxiliary bus is a kernel mechanism that lets a parent driver create virtual “sub-devices” that other drivers can bind to — without those child drivers needing to know about the physical bus (PCI, platform, etc.) underneath.
The flow:
- nova-core registers an auxiliary device named "nova-drm" during its PCI probe, the point at which the driver initializes the physical device.
- nova-drm declares a device table matching ("nova-core", "nova-drm").
- The kernel’s bus infrastructure automatically probes nova-drm when nova-core creates the device.
- The bus guarantees ordering: nova-drm is always unbound before nova-core. The parent outlives the child.
From drivers/gpu/nova-core/driver.rs — nova-core creates the auxiliary device:
#[pin_data]
pub(crate) struct NovaCore<'bound> {
    #[pin]
    pub(crate) gpu: Gpu<'bound>,
    bar: pci::Bar<'bound, BAR0_SIZE>,
    // Creates a "nova-drm" auxiliary device on the bus.
    // When this Registration is dropped (nova-core unbinds), the auxiliary
    // device is removed, forcing nova-drm to unbind first.
    #[allow(clippy::type_complexity)]
    _reg: auxiliary::Registration<'bound, ForLt!(())>,
}
From drivers/gpu/drm/nova/driver.rs — nova-drm declares what it binds to:
// Static device table: "I bind to auxiliary devices named 'nova-drm'
// created by the module 'nova-core'."
kernel::auxiliary_device_table!(
    AUX_TABLE,
    MODULE_AUX_TABLE,
    <NovaDriver as auxiliary::Driver>::IdInfo,
    [(
        auxiliary::DeviceId::new(NOVA_CORE_MODULE_NAME, AUXILIARY_NAME),
        () // No per-match metadata needed.
    )]
);
Where NOVA_CORE_MODULE_NAME = c"nova-core" and AUXILIARY_NAME = c"nova-drm".
When the kernel sees nova-core create an auxiliary device whose name matches nova-drm’s table, it calls nova-drm’s probe(), passing a reference to the auxiliary device. At that point, the two modules are connected.
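The matching step can be modeled in a few lines of ordinary Rust. This is a user-space sketch, not the kernel's implementation: `AuxId`, `AuxDevice`, and `table_matches` are hypothetical names, and the "module.device" match string with a trailing instance id is a simplified assumption about how the auxiliary bus composes and compares names.

```rust
/// Hypothetical stand-in for one entry of a driver's auxiliary ID table.
#[derive(Debug, Clone, PartialEq)]
pub struct AuxId {
    /// Match string in "<module>.<device>" form (assumed format).
    pub match_name: String,
}

impl AuxId {
    pub fn new(module: &str, device: &str) -> Self {
        AuxId { match_name: format!("{}.{}", module, device) }
    }
}

/// Hypothetical stand-in for a device registered by the parent driver.
#[derive(Debug, Clone, PartialEq)]
pub struct AuxDevice {
    pub module: String,
    pub name: String,
    pub id: u32,
}

impl AuxDevice {
    /// Full device name, e.g. "nova-core.nova-drm.0".
    pub fn full_name(&self) -> String {
        format!("{}.{}.{}", self.module, self.name, self.id)
    }
}

/// Returns true if any table entry matches the device, ignoring the
/// trailing ".<id>" instance suffix.
pub fn table_matches(table: &[AuxId], dev: &AuxDevice) -> bool {
    let full = dev.full_name();
    // Strip the instance id: everything after the last '.'.
    let prefix = match full.rfind('.') {
        Some(pos) => &full[..pos],
        None => return false,
    };
    table.iter().any(|id| id.match_name == prefix)
}
```

In this model, a table built with `AuxId::new("nova-core", "nova-drm")` matches a device whose full name is "nova-core.nova-drm.0", while a device with the same name registered by a different module does not match.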

[Figure: Nova’s initialization sequence]
Nova-Core: PCI Probe and BAR0 Mapping
The Probe Flow
When the kernel finds an NVIDIA GPU on the PCI bus, nova-core’s probe function runs. The PCI device table tells the kernel which hardware this driver handles — in this case, any NVIDIA device classified as a VGA or 3D controller:
// "Match any NVIDIA GPU that's either a VGA controller or a 3D controller."
// This is broad — the driver will reject unsupported chipsets later during
// GPU identification (Spec::new), returning ENODEV for pre-Turing GPUs.
kernel::pci_device_table!(
    PCI_TABLE,
    MODULE_PCI_TABLE,
    <NovaCoreDriver as pci::Driver>::IdInfo,
    [
        (pci::DeviceId::from_class_and_vendor(
            Class::DISPLAY_VGA, ClassMask::ClassSubclass, Vendor::NVIDIA), ()),
        (pci::DeviceId::from_class_and_vendor(
            Class::DISPLAY_3D, ClassMask::ClassSubclass, Vendor::NVIDIA), ()),
    ]
);
The probe function itself is remarkably compact thanks to Rust’s try_pin_init! macro, which initializes all fields in declaration order while maintaining pinning guarantees.
A brief explanation of why this matters: kernel objects often can’t be moved in memory after creation — other subsystems hold raw pointers to them (linked lists, wait queues, PCI bookkeeping). Normal Rust would construct a struct in one place in memory and then move it to its final location, which would invalidate those pointers.
try_pin_init! solves this by initializing each field directly at its final pinned address. Additionally, if initialization fails at any point, previously-initialized fields are automatically dropped in reverse order — no explicit goto cleanup chains:
impl pci::Driver for NovaCoreDriver {
    type IdInfo = ();
    // The driver's per-device state. `'bound` ties it to the PCI binding lifetime.
    type Data<'bound> = NovaCore<'bound>;
    const ID_TABLE: pci::IdTable<Self::IdInfo> = &PCI_TABLE;
    fn probe<'bound>(
        pdev: &'bound pci::Device<Core<'_>>,
        _info: &'bound Self::IdInfo,
    ) -> impl PinInit<Self::Data<'bound>, Error> + 'bound {
        // Returns a PinInit: a "recipe" for initializing NovaCore in-place
        // at its final pinned address. The kernel calls this recipe after
        // allocating the memory.
        pin_init::pin_init_scope(move || {
            pdev.enable_device_mem()?; // Enable PCI memory BARs
            // (The `?` operator: if this call fails, return the error immediately
            // to the caller. It's Rust's concise error propagation syntax.)
            pdev.set_master(); // Enable bus mastering (needed for DMA)
            Ok(try_pin_init!(NovaCore {
                // Map BAR0 (16 MB MMIO region). Compile-time size check.
                bar: pdev.iomap_region_sized::<BAR0_SIZE>(0, c"nova-core/bar0")?,
                // Initialize the GPU. `<-` means "pinned sub-initializer":
                // Gpu is constructed in-place at its final address.
                gpu <- Gpu::new(pdev, unsafe { &*core::ptr::from_ref(bar) }),
                // Register the auxiliary device that triggers nova-drm's probe.
                _reg: auxiliary::Registration::new(
                    pdev.as_ref(), c"nova-drm",
                    AUXILIARY_ID_COUNTER.fetch_add(1, Relaxed),
                    crate::MODULE_NAME, (),
                )?,
            }))
        })
    }
}
The initialization sequence:
- enable_device_mem(): Enables PCI memory BARs so we can map them.
- set_master(): Enables bus mastering for DMA transfers (DMA, Direct Memory Access, lets the GPU read and write system RAM independently, without the CPU moving every byte).
- iomap_region_sized::<BAR0_SIZE>(0, c"nova-core/bar0"): Maps BAR0 (16 MB) into kernel virtual address space.
- Gpu::new(): Runs the full GPU initialization sequence.
- auxiliary::Registration::new(): Creates the auxiliary device that triggers nova-drm’s probe.
Typed BAR0: MMIO with Compile-Time Safety
BAR0 is the 16 MB memory-mapped I/O region through which the CPU accesses GPU registers. In a traditional C driver, you’d use ioremap() and raw pointer arithmetic. In Nova, BAR0 is a typed, lifetime-bound reference:
const BAR0_SIZE: usize = SZ_16M; // 16 MB — fixed for all supported GPUs.
// Bar0 is just a reference to a sized, lifetime-bound MMIO mapping.
// Three invariants are encoded in this one type alias.
pub(crate) type Bar0<'a> = &'a pci::Bar<'a, BAR0_SIZE>;
This one type alias encodes three critical invariants:
- Lifetime 'a: The BAR0 reference cannot outlive the PCI device binding. Once the driver unbinds, all references to BAR0 are statically invalidated.
- Const generic BAR0_SIZE: The size is baked into the type at compile time, enabling bounds checking on register offsets without runtime overhead.
- pci::Bar wrapper: Uses the kernel crate’s Io trait for MMIO reads/writes rather than raw pointer derefs.
Reading a register is a single method call that combines offset resolution, bounds checking, a volatile MMIO read (one the compiler can’t optimize away — necessary because hardware registers can change value between reads), and typed bitfield construction:
// Returns an NV_PMC_BOOT_0 struct — a typed bitfield wrapper around the raw u32.
// The register's offset (0x0) is const — out-of-bounds offsets won't compile.
let boot0 = bar.read(regs::NV_PMC_BOOT_0);
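To make the compile-time bounds check concrete, here is a minimal user-space sketch of the idea. It is not the kernel's pci::Bar: the `Bar` type below is backed by ordinary heap memory rather than MMIO, and `InBounds`, `read32`, and `write32` are hypothetical names chosen for illustration. The technique is to route each offset through an associated constant whose evaluation fails when the access would fall outside the region.

```rust
/// Toy stand-in for a mapped BAR: SIZE bytes of ordinary heap memory.
/// A real driver would perform volatile MMIO accesses instead.
pub struct Bar<const SIZE: usize> {
    mem: Vec<u8>,
}

/// Helper whose associated constant fails to evaluate (a compile error)
/// when a 32-bit access at OFF would be misaligned or fall outside SIZE.
struct InBounds<const OFF: usize, const SIZE: usize>;

impl<const OFF: usize, const SIZE: usize> InBounds<OFF, SIZE> {
    const OK: () = assert!(OFF % 4 == 0 && OFF + 4 <= SIZE, "bad register offset");
}

impl<const SIZE: usize> Bar<SIZE> {
    pub fn new() -> Self {
        Bar { mem: vec![0u8; SIZE] }
    }

    /// Reads a little-endian u32 at the compile-time offset OFF.
    pub fn read32<const OFF: usize>(&self) -> u32 {
        // Referencing the constant forces the check during compilation.
        let () = InBounds::<OFF, SIZE>::OK;
        u32::from_le_bytes([
            self.mem[OFF],
            self.mem[OFF + 1],
            self.mem[OFF + 2],
            self.mem[OFF + 3],
        ])
    }

    /// Writes a little-endian u32 at the compile-time offset OFF.
    pub fn write32<const OFF: usize>(&mut self, value: u32) {
        let () = InBounds::<OFF, SIZE>::OK;
        self.mem[OFF..OFF + 4].copy_from_slice(&value.to_le_bytes());
    }
}
```

With a `Bar::<0x1000>`, a call such as `bar.read32::<0xa00>()` compiles, while `bar.read32::<0x2000>()` is rejected by the compiler because the assertion in `InBounds::OK` fails during constant evaluation.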
What is MMIO?
For readers unfamiliar with GPU driver internals: Memory-Mapped I/O is how the CPU talks to GPU hardware. The GPU exposes a window of “registers” (really just fixed-address control/status locations) through PCIe Base Address Register 0.
For instance, when the OS maps BAR0 into kernel virtual address space, reading offset 0x0 returns the GPU’s boot identification register, reading offset 0xa00 returns extended chipset info, and so on for thousands of control registers across the 16 MB window.
Every GPU operation — identification, firmware loading, falcon microcontroller control — ultimately comes down to reading and writing specific offsets within this 16 MB window.
GPU Identification
Once BAR0 is mapped, the first thing Nova does is figure out which GPU it’s talking to.
Boot Registers
NVIDIA GPUs expose two identification registers:
- NV_PMC_BOOT_0 at offset 0x0: A legacy register present on all NVIDIA GPUs since NV04. Used to filter out pre-Fermi GPUs that Nova doesn’t support.
- NV_PMC_BOOT_42 at offset 0xa00: An extended register (available since Fermi) containing the architecture and implementation fields needed to precisely identify the GPU.
From regs.rs — the register! macro declares each register’s MMIO offset and its bit-level structure. Reading the register returns a typed struct with accessor methods for each field:
register! {
    /// Basic revision information about the GPU.
    /// Offset 0x0, present on all NVIDIA GPUs since NV04.
    pub(crate) NV_PMC_BOOT_0(u32) @ 0x00000000 {
        // Each line: `hi:lo field_name` defines a bit range and its accessor.
        28:24 architecture_0;
        23:20 implementation;
        8:8 architecture_1;
        7:4 major_revision;
        3:0 minor_revision;
    }
    /// Extended architecture information.
    /// Offset 0xa00, available since Fermi; the authoritative ID register.
    pub(crate) NV_PMC_BOOT_42(u32) @ 0x00000a00 {
        // `?=> Architecture` means: cast bits to u32, then TryFrom into the
        // Architecture enum. Returns Result, Rust's "either success or error"
        // type (fallible, since unknown values are possible).
        29:24 architecture ?=> Architecture;
        23:20 implementation;
        19:16 major_revision;
        15:12 minor_revision;
    }
}
The Identification Algorithm
The Spec::new() function implements the two-stage identification:
fn new(dev: &device::Device, bar: Bar0<'_>) -> Result<Spec> {
    let boot0 = bar.read(regs::NV_PMC_BOOT_0);
    // Stage 1: Reject pre-Fermi GPUs (they don't have reliable boot42).
    if boot0.is_older_than_fermi() {
        return Err(ENODEV); // ENODEV = "no such device", a kernel error code
    }
    // Stage 2: Use boot42 for precise identification.
    let boot42 = bar.read(regs::NV_PMC_BOOT_42);
    Spec::try_from(boot42).inspect_err(|_| {
        dev_err!(dev, "Unsupported chipset: {}\n", boot42);
    })
}
The chipset code is formed by combining the architecture and implementation fields. This reverses how NVIDIA packs the chipset into the register: the architecture bits are shifted left by the width of the implementation field, and the implementation bits fill the low bits:
impl NV_PMC_BOOT_42 {
    /// Combines the architecture and implementation fields into a unique chipset code.
    /// E.g., architecture 0x16 (Turing) + implementation 0x2 = chipset 0x162 (TU102).
    pub(crate) fn chipset(self) -> Result<Chipset> {
        self.architecture()
            .map(|arch| {
                // Shift architecture left by the implementation field width, OR in impl.
                ((arch as u32) << Self::IMPLEMENTATION_RANGE.len())
                    | u32::from(self.implementation())
            })
            .and_then(Chipset::try_from) // Match against known chipsets or return ENODEV.
    }
}
For example, a TU102 GPU has architecture 0x16 (Turing) and implementation 0x2, producing chipset code 0x162.
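The arithmetic can be checked by hand with a small stand-alone sketch. The `field` and `chipset_code` helpers and the raw value 0x1620_0000 are hypothetical, constructed here from the field positions given for NV_PMC_BOOT_42 (architecture in bits 29:24, implementation in bits 23:20); the revision bits are left at zero.

```rust
/// Extracts bits hi..=lo (inclusive, hi >= lo, both below 32) from a raw value.
pub fn field(raw: u32, hi: u32, lo: u32) -> u32 {
    // Build a mask with (hi - lo + 1) low bits set, without overflowing
    // the shift when the field spans all 32 bits.
    let mask = u32::MAX >> (31 - (hi - lo));
    (raw >> lo) & mask
}

/// Width in bits of the implementation field (bits 23:20).
pub const IMPLEMENTATION_BITS: u32 = 4;

/// Combines architecture (bits 29:24) and implementation (bits 23:20)
/// of a raw NV_PMC_BOOT_42 value into a chipset code.
pub fn chipset_code(boot42: u32) -> u32 {
    let architecture = field(boot42, 29, 24);
    let implementation = field(boot42, 23, 20);
    (architecture << IMPLEMENTATION_BITS) | implementation
}
```

For the raw value 0x1620_0000, the architecture field is 0x16 and the implementation field is 0x2, so `chipset_code` yields (0x16 << 4) | 0x2 = 0x162.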

[Figure: GPU identification sequence]
The define_chipset! Macro
Rather than writing the Chipset enum, its TryFrom<u32>, and its name() method by hand, Nova uses a macro that generates all three from a single definition:
define_chipset!({
    // Turing (RTX 20 series)
    TU102 = 0x162,
    TU104 = 0x164,
    TU106 = 0x166,
    // ...
    // Ampere (RTX 30 series)
    GA102 = 0x172,
    GA104 = 0x174,
    // ...
    // Ada (RTX 40 series)
    AD102 = 0x192,
    AD104 = 0x194,
    // ...
    // Blackwell (RTX 50 series)
    GB202 = 0x1b2,
    GB203 = 0x1b3,
    // ...
});
(The full list in the source includes every supported variant across Turing, Ampere, Hopper, Ada, and Blackwell generations.)
This generates:
- The Chipset enum with each variant.
- TryFrom<u32> matching each value to its variant, returning ENODEV for unknown chipsets.
- Chipset::ALL: a &'static [Chipset] slice of all supported chipsets.
- Chipset::name(): a lowercase string version ("tu102", "ga100", etc.) used for firmware path construction.
The Chipset type also provides additional methods such as arch(), which maps each chipset to its Architecture generation, and helper predicates such as needs_fwsec_bootloader() and uses_fsp(), which encode hardware generation boundaries.
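For readers new to declarative macros, the sketch below shows how a single list of `NAME = value` pairs can expand into an enum, a `TryFrom<u32>` implementation, an `ALL` slice, and a `name()` method. It is a simplified user-space approximation, not Nova's macro: the `UnknownChipset` error type is hypothetical (the driver returns ENODEV), only three chipsets are listed, and `name()` lowercases at runtime and returns a `String`, which is an assumption made for brevity.

```rust
use std::convert::TryFrom;

/// Hypothetical error for chipset codes that are not in the table.
#[derive(Debug, PartialEq)]
pub struct UnknownChipset(pub u32);

macro_rules! define_chipset {
    ({ $($variant:ident = $value:literal),* $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        #[repr(u32)]
        pub enum Chipset {
            $($variant = $value),*
        }

        impl Chipset {
            /// Every chipset listed in the macro invocation.
            pub const ALL: &'static [Chipset] = &[$(Chipset::$variant),*];

            /// Lowercase name, e.g. "tu102".
            pub fn name(self) -> String {
                match self {
                    $(Chipset::$variant => stringify!($variant).to_lowercase()),*
                }
            }
        }

        impl TryFrom<u32> for Chipset {
            type Error = UnknownChipset;

            fn try_from(code: u32) -> Result<Self, Self::Error> {
                match code {
                    $($value => Ok(Chipset::$variant),)*
                    other => Err(UnknownChipset(other)),
                }
            }
        }
    };
}

define_chipset!({
    TU102 = 0x162,
    GA102 = 0x172,
    AD102 = 0x192,
});
```

Adding a chipset to this sketch means adding one line to the invocation; the enum variant, the conversion arm, the `ALL` entry, and the name all follow from it.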
Safe Register Access and Bitfields
We’ve already seen the register! macro in use during GPU identification — it declared NV_PMC_BOOT_0 and NV_PMC_BOOT_42 with typed fields. But the macro handles more than simple fixed-offset registers. This section covers the full design: how Nova creates a compile-time-checked register access layer, and the complementary bitfield! macro for non-MMIO structured values.
To see why this matters, compare how a C driver reads the same register:
// C (traditional approach — e.g., Nouveau)
u32 val = nvkm_rd32(device, 0xa00); // raw offset, no bounds check possible
u32 arch = (val >> 24) & 0x3f; // manual bit extract, easy to misshift
// If you typo the offset or shift amount, you get silent data corruption at runtime.
// Rust (Nova)
let boot42 = bar.read(regs::NV_PMC_BOOT_42); // offset checked at compile time, lifetime enforced
let arch = boot42.architecture()?; // typed enum, unknown values must be handled
// Wrong offset = compile error. Wrong field = type error. Unknown chipset = forced error handling.
The register! Macro
The register! macro (provided by the kernel crate’s io::register module) defines typed register structs with compile-time offsets and named bit fields:
register! {
    pub(crate) NV_PMC_BOOT_0(u32) @ 0x00000000 {
        28:24 architecture_0;
        23:20 implementation;
        8:8 architecture_1;
        7:4 major_revision;
        3:0 minor_revision;
    }
}
This generates a struct NV_PMC_BOOT_0 that the Io trait can read from BAR0 at offset 0x0. The field declarations produce accessor methods: boot0.architecture_0(), boot0.implementation(), etc.
The macro supports several offset forms, demonstrating how one macro handles the full range of register addressing patterns found in GPU hardware:
register! {
    // Fixed absolute offset: the common case.
    pub(crate) NV_PMC_BOOT_42(u32) @ 0x00000a00 { ... }
    // Array with stride: 64 contiguous 32-bit registers starting at 0x1400.
    // Accessed as NV_PBUS_SW_SCRATCH[index].
    pub(crate) NV_PBUS_SW_SCRATCH(u32)[64] @ 0x00001400 {}
    // Alias into an array element: gives a meaningful name to scratch reg 0xe.
    pub(crate) NV_PBUS_SW_SCRATCH_0E_FRTS_ERR(u32) => NV_PBUS_SW_SCRATCH[0xe] { ... }
    // Base-relative offset: the offset is added to a base address determined
    // by the type parameter E (a FalconEngine). Resolved at compile time.
    pub(crate) NV_PFALCON_FALCON_HWCFG1(u32) @ PFalconBase + 0x0012c { ... }
}
The base-relative form is how Nova handles registers that exist at different addresses on different falcon engines (GSP at 0x110000, SEC2 at 0x840000, FSP at 0x8f2000). The WithBase::of::<E>() mechanism resolves the engine’s base address at compile time — zero runtime cost, wrong-engine access is a type error:
// Reads HWCFG1 register from the GSP falcon's register space.
// The offset PFalconBase + 0x0012c is resolved at compile time to 0x11012c
// because Gsp implements RegisterBase<PFalconBase> with BASE = 0x110000.
let hwcfg1 = bar.read(regs::NV_PFALCON_FALCON_HWCFG1::of::<Gsp>());
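The base-relative idea can be sketched with an associated constant on a trait. This is an illustration only: the `FalconEngine` trait and `FalconHwcfg1` type below are hypothetical stand-ins for the kernel's RegisterBase and WithBase machinery, using the GSP and SEC2 base addresses quoted above.

```rust
use std::marker::PhantomData;

/// Hypothetical trait: each falcon engine supplies its register base.
pub trait FalconEngine {
    const BASE: usize;
}

pub struct Gsp;
pub struct Sec2;

impl FalconEngine for Gsp {
    const BASE: usize = 0x110000;
}

impl FalconEngine for Sec2 {
    const BASE: usize = 0x840000;
}

/// A register at a fixed offset from whichever engine base E provides.
pub struct FalconHwcfg1<E: FalconEngine>(PhantomData<E>);

impl<E: FalconEngine> FalconHwcfg1<E> {
    /// Offset relative to the engine base.
    pub const OFFSET: usize = 0x12c;
    /// Absolute BAR0 address, computed during compilation.
    pub const ADDRESS: usize = E::BASE + Self::OFFSET;
}
```

Because the engine is a type parameter, `FalconHwcfg1::<Gsp>::ADDRESS` and `FalconHwcfg1::<Sec2>::ADDRESS` are distinct constants (0x11012c and 0x84012c), and a type that does not implement `FalconEngine` cannot be used at all.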
The bitfield! Macro
While register! handles the MMIO mechanics, Nova defines a complementary bitfield! macro for standalone typed wrappers around raw integers. This is used for values that aren’t directly MMIO registers but still have structured bit fields (e.g., firmware headers, command queue entries):
bitfield! {
    // A u32 wrapper with two named fields.
    pub struct ControlReg(u32) {
        // Bit 7: boolean field. Accessor returns a State enum via From<bool>.
        7:7 state as bool => State;
        // Bits 3:0: 4-bit field. Accessor returns Result<Mode> via TryFrom<u8>
        // (fallible, since not all 4-bit values are valid Mode variants).
        3:0 mode as u8 ?=> Mode;
    }
}
The field type annotations control how raw bits become meaningful values.
Two Rust traits are central here: From and TryFrom. Both define a standard way to convert one type into another. From is infallible — the conversion always succeeds (e.g., every possible single-bit value maps to true or false). TryFrom is fallible — the conversion can fail and returns a Result that the caller must handle (e.g., a 6-bit field has 64 possible values, but only 7 of them correspond to known GPU architectures).
- as bool: Single-bit field, returned as boolean.
- as u8: Multi-bit field, cast to the specified integer type.
- => EnumType: Infallible conversion via From. The enum must cover all possible bit patterns.
- ?=> EnumType: Fallible conversion via TryFrom. Returns Result for registers with reserved/invalid bit patterns. The caller is forced to handle the “unknown value” case; the compiler won’t let you ignore it.
The macro generates a zero-overhead wrapper around the raw integer with:
- Typed getter methods for each field.
- Builder-pattern setters (.set_mode(value)) for constructing register values.
- Automatic Debug and Default implementations.
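Written out by hand, the kind of code such a macro might generate for the ControlReg example looks roughly like the sketch below. It is an approximation for illustration, not the macro's actual expansion: the `State` and `Mode` variants are hypothetical, and the error type of the fallible conversion is simply the raw `u8`.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Disabled,
    Enabled,
}

// Infallible: both values of a single bit map to a variant.
impl From<bool> for State {
    fn from(bit: bool) -> Self {
        if bit { State::Enabled } else { State::Disabled }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Idle = 0,
    Run = 1,
    Halt = 2,
}

// Fallible: only 3 of the 16 possible 4-bit values are valid.
impl TryFrom<u8> for Mode {
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, u8> {
        match raw {
            0 => Ok(Mode::Idle),
            1 => Ok(Mode::Run),
            2 => Ok(Mode::Halt),
            other => Err(other),
        }
    }
}

/// Typed wrapper around a raw u32 with two named fields.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ControlReg(pub u32);

impl ControlReg {
    /// Bit 7, converted through From<bool>.
    pub fn state(self) -> State {
        State::from((self.0 >> 7) & 1 == 1)
    }

    /// Bits 3:0, converted through TryFrom<u8>.
    pub fn mode(self) -> Result<Mode, u8> {
        Mode::try_from((self.0 & 0xf) as u8)
    }

    /// Builder-style setter for bit 7.
    pub fn set_state(self, on: bool) -> Self {
        ControlReg((self.0 & !(1 << 7)) | ((on as u32) << 7))
    }

    /// Builder-style setter for bits 3:0.
    pub fn set_mode(self, mode: Mode) -> Self {
        ControlReg((self.0 & !0xf) | mode as u32)
    }
}
```

A value can then be built with `ControlReg::default().set_mode(Mode::Run).set_state(true)`, and reading `mode()` back from a raw value with a reserved bit pattern returns an `Err` that the caller has to deal with.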
Compile-Time Safety Guarantees
Together, these macros provide several guarantees that C drivers cannot:
- Offset bounds checking. Register offsets are const, so accessing an offset beyond BAR0’s 16 MB size is a compile error.
- Bit range validation. Boolean fields must be exactly one bit wide. MSB (most significant bit, the highest-numbered bit in the range) must be ≥ LSB (least significant bit, the lowest-numbered). These are checked with build_assert!.
- Type-safe conversions. Reading a register field that maps to an enum with ?=> forces the caller to handle the possibility of an invalid value.
- Lifetime enforcement. Bar0<'a> cannot outlive the PCI device binding, so accessing registers after device unbind is structurally impossible.
Hardware Abstraction: The HAL Pattern
Nova uses traits to abstract hardware differences between generations. Each subsystem defines a Hardware Abstraction Layer trait — a contract that each generation must fulfill. The trait methods represent the operations that differ between chip families:
/// Trait defining generation-specific GPU behavior.
/// Each method represents something that works differently across architectures.
pub(crate) trait GpuHal {
    /// Wait for the GPU’s own firmware to finish initializing (devinit).
    /// No-op on Hopper+ where FSP handles this.
    fn wait_gfw_boot_completion(&self, bar: Bar0<'_>) -> Result;
    /// DMA address width supported by this architecture (e.g., 47-bit or 52-bit).
    fn dma_mask(&self) -> DmaMask;
    /// Location of the PCI config mirror in BAR0 (varies by generation).
    fn pci_config_mirror_range(&self) -> Range<u32>;
}
The gpu_hal() function selects the appropriate implementation based on the chipset’s architecture — called once during init, then the returned reference is used throughout the driver’s lifetime:
/// Returns the GpuHal implementation for the given chipset.
/// Groups of architectures share behavior — Turing/Ampere/Ada all use the
/// TU102 HAL, while Hopper and Blackwell use GH100.
pub(super) fn gpu_hal(chipset: Chipset) -> &'static dyn GpuHal {
    match chipset.arch() {
        Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
        Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
            gh100::GH100_HAL
        }
    }
}
However, not all HALs can be global constants like GpuHal. HALs with generic type parameters (like FalconHal<E>, parameterized over the engine type) are allocated at runtime instead, since Rust can’t define all the generic combinations as static constants.
This pattern is repeated across nova-core:
| Subsystem | Trait | Selection |
|---|---|---|
| GPU top-level | GpuHal | Global constant, picked once at init |
| Falcon engines | FalconHal<E> | Allocated at runtime (generic) |
| Framebuffer | FbHal | Global constant, picked once at init |
| GSP boot | GspHal | Global constant, picked once at init |
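The "global constant, picked once" flavor of the pattern can be reproduced in plain Rust. The sketch below is a simplified model and only covers that flavor, not the runtime-allocated generic HALs: the trait has two hypothetical methods (`name` and `waits_for_gfw_boot`), the Blackwell variants are collapsed into one, and the HAL objects are ordinary statics.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Architecture {
    Turing,
    Ampere,
    Ada,
    Hopper,
    Blackwell,
}

/// Hypothetical, reduced HAL trait: one method per generation-specific behavior.
pub trait GpuHal: Sync {
    fn name(&self) -> &'static str;
    /// Whether the driver polls for GFW boot completion on this generation.
    fn waits_for_gfw_boot(&self) -> bool;
}

struct Tu102Hal;
struct Gh100Hal;

impl GpuHal for Tu102Hal {
    fn name(&self) -> &'static str {
        "tu102"
    }
    fn waits_for_gfw_boot(&self) -> bool {
        true
    }
}

impl GpuHal for Gh100Hal {
    fn name(&self) -> &'static str {
        "gh100"
    }
    fn waits_for_gfw_boot(&self) -> bool {
        false // FSP handles early initialization on these generations.
    }
}

static TU102_HAL: Tu102Hal = Tu102Hal;
static GH100_HAL: Gh100Hal = Gh100Hal;

/// Selects the HAL once; callers keep the returned reference.
pub fn gpu_hal(arch: Architecture) -> &'static dyn GpuHal {
    match arch {
        Architecture::Turing | Architecture::Ampere | Architecture::Ada => &TU102_HAL,
        Architecture::Hopper | Architecture::Blackwell => &GH100_HAL,
    }
}
```

Because the `match` has no wildcard arm, adding a new `Architecture` variant produces a compile error until a HAL is assigned to it.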
GFW Boot: Waiting for the GPU to Initialize Itself
Before Nova can do anything meaningful, it must wait for the GPU’s own firmware (devinit) to finish initializing. Upon reset, several GPU microcontrollers run firmware from the VBIOS to set up clocks, timing circuits, VRAM timings, and power sequencing.
On Turing through Ada GPUs, the driver polls a scratch register until the GPU signals completion. This is a GpuHal method — on Hopper/Blackwell, the implementation is an empty no-op:
fn wait_gfw_boot_completion(&self, bar: Bar0<'_>) -> Result {
    // read_poll_timeout: repeatedly call the closure (the `|| { ... }` anonymous
    // function below), check the predicate, sleep 1ms between attempts, give up
    // after 4 seconds.
    read_poll_timeout(
        || {
            Ok(
                // Step 1: Check that FWSEC has lowered its privilege level.
                // The GFW_BOOT register is inaccessible until this happens.
                bar.read(regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK)
                    .read_protection_level0()
                    // Step 2: Read the actual boot completion flag.
                    && bar
                        .read(regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_0_GFW_BOOT)
                        .completed(),
            )
        },
        |&gfw_booted| gfw_booted, // Predicate: stop when true.
        Delta::from_millis(1),    // Poll interval.
        Delta::from_secs(4),      // Timeout.
    )
    .map(|_| ())
}
There’s a subtle two-step check here:
- First, verify that FWSEC (running on the GSP in Heavy-Secured mode) has lowered its privilege level — the status register is inaccessible until this happens.
- Then read the actual completion flag.
On Hopper and Blackwell GPUs, this step is a no-op — the FSP (Foundation Security Processor) handles all early initialization, and the driver’s boot path uses a different mechanism entirely.
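The polling helper itself is easy to model outside the kernel. The following is a user-space sketch of the same shape, not the kernel's read_poll_timeout: it uses std::time and std::thread, and the `PollError` type is a hypothetical stand-in for the kernel's error codes.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum PollError<E> {
    /// The read operation itself failed.
    Read(E),
    /// The condition did not become true before the deadline.
    Timeout,
}

/// Calls `read` repeatedly until `done` accepts the value or `timeout` elapses,
/// sleeping for `interval` between attempts.
pub fn read_poll_timeout<T, E>(
    mut read: impl FnMut() -> Result<T, E>,
    mut done: impl FnMut(&T) -> bool,
    interval: Duration,
    timeout: Duration,
) -> Result<T, PollError<E>> {
    let deadline = Instant::now() + timeout;
    loop {
        // A failed read aborts the poll immediately.
        let value = read().map_err(PollError::Read)?;
        if done(&value) {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(PollError::Timeout);
        }
        thread::sleep(interval);
    }
}
```

In this sketch the value is always read at least once, so a condition that is already true succeeds even with a zero timeout.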
The Full Initialization Sequence
Putting it all together, here’s what happens when Nova initializes a GPU. (Note: SysmemFlush is a DMA page that enables the GPU to flush pending PCIe memory writes into system RAM — a hardware prerequisite before any falcon microcontroller reset. It’s covered in detail in Part 3.)

[Figure: Nova’s full initialization sequence]
The key design insight: the initialization is expressed as a single try_pin_init! block where each field is initialized in declaration order. If any step fails, previously-initialized fields are dropped in reverse order — the PinnedDrop implementation (Rust’s destructor for pinned types — code that runs automatically when the object is destroyed) on GspResources automatically runs the GSP unload sequence. No explicit error-handling cleanup paths needed.
Compare to the equivalent pattern in C kernel drivers:
// C error handling (typical pattern)
err = step1_identify_gpu();
if (err) return err;
err = step2_setup_dma();
if (err) goto undo_step1;
err = step3_register_flush();
if (err) goto undo_step2;
err = step4_boot_gsp();
if (err) goto undo_step3;
// ...
undo_step3: unregister_flush();
undo_step2: tear_down_dma();
undo_step1: release_gpu_id();
return err;
Missing a goto label, cleaning up in the wrong order, or forgetting a resource are common bugs in this pattern. In Nova, the compiler generates the cleanup chain from the struct definition: each resource-owning field releases itself through its own Drop implementation, so an error path cannot skip a cleanup step or run the steps in the wrong order.
The pattern for GPU initialization looks roughly like this (simplified — see gpu.rs for the full implementation):
// drivers/gpu/nova-core/gpu.rs (simplified)
impl<'gpu> Gpu<'gpu> {
    pub(crate) fn new(pdev: &'gpu pci::Device<...>, bar: Bar0<'gpu>)
        -> impl PinInit<Self, Error> + 'gpu
    {
        try_pin_init!(Self {
            spec: Spec::new(pdev.as_ref(), bar)?, // 1. Identify GPU
            _: { /* set DMA mask, wait GFW boot */ }, // 2. Wait for HW init
            sysmem_flush: SysmemFlush::register(...)?, // 3. Register flush page
            gsp_resources <- /* falcon init + GSP boot */, // 4. Boot GSP firmware
            gsp_static_info: /* query GPU info from GSP */, // 5. Get GPU metadata
        })
    }
}
The <- syntax indicates a pinned sub-initializer (the field is initialized in-place at its final address). If step 4 fails, sysmem_flush is dropped (unregistering the flush page). If step 5 fails, gsp_resources is dropped first (its PinnedDrop runs the GSP unload sequence), followed by sysmem_flush. The ownership model makes cleanup exhaustive and automatic.
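The reverse-order cleanup can be observed in ordinary Rust without any kernel machinery. The sketch below is an analogy rather than a model of try_pin_init!: it relies on the language rule that local variables are dropped in reverse declaration order when a function returns early, and all names (`Resource`, `Log`, `bring_up`) are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared event log so the test can observe acquire/release order.
pub type Log = Rc<RefCell<Vec<String>>>;

pub struct Resource {
    name: &'static str,
    log: Log,
}

impl Resource {
    /// Acquires a resource, or fails without logging if `fail` is set.
    pub fn acquire(name: &'static str, log: &Log, fail: bool) -> Result<Resource, String> {
        if fail {
            return Err(format!("{} failed", name));
        }
        log.borrow_mut().push(format!("acquire {}", name));
        Ok(Resource { name, log: Rc::clone(log) })
    }
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

pub struct Gpu {
    pub spec: Resource,
    pub sysmem_flush: Resource,
    pub gsp: Resource,
}

/// Brings up three resources in order. On an early return via `?`, the
/// already-acquired locals are dropped in reverse declaration order.
pub fn bring_up(log: &Log, fail_gsp: bool) -> Result<Gpu, String> {
    let spec = Resource::acquire("spec", log, false)?;
    let sysmem_flush = Resource::acquire("sysmem_flush", log, false)?;
    let gsp = Resource::acquire("gsp", log, fail_gsp)?;
    Ok(Gpu { spec, sysmem_flush, gsp })
}
```

When the third step fails, the log shows sysmem_flush released before spec, with no cleanup code written in `bring_up` itself.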
Summary
Part 1 has covered the foundational layers of Nova:
- Two-driver split: nova-core handles hardware, nova-drm handles user-space, connected via the auxiliary bus.
- Typed BAR0: MMIO access with compile-time bounds checking and lifetime enforcement.
- GPU identification: A two-stage algorithm reading boot registers and matching against a macro-generated chipset enum.
- Safe register access: Two complementary macros (register! and bitfield!) create a fully typed, compile-time-checked register layer.
- HAL pattern: Traits abstract generation-specific behavior with static or boxed dispatch.
- Pin-init initialization: The entire GPU bring-up is expressed as a pinned initializer with automatic cleanup on failure.
In Part 2, we’ll dive into the GPU’s internal microcontrollers (Falcons), firmware loading with type-state safety, the GSP boot sequence, and the command queue that enables communication between the driver and the GPU’s system processor.
Sources: Nova GPU driver source at drivers/gpu/nova-core/ and drivers/gpu/drm/nova/ in the Linux kernel tree. Rust-for-Linux Nova page. Nova kernel documentation. nova-gpu mailing list archive.