Nintendo 64

codingncaffeine edited this page May 26, 2026 · 2 revisions

Emutastic supports two libretro N64 cores: parallel_n64 (default) and mupen64plus-next. Both are based on mupen64plus but differ significantly in architecture and plugin support.

| Core | DLL | Status |
|---|---|---|
| parallel_n64 | parallel_n64_libretro.dll | Default: pairs cleanly with ParaLLEl-RDP (Vulkan); this is the core Emutastic launches an N64 game with unless overridden per-game |
| mupen64plus-next | mupen64plus_next_libretro.dll | Alternative: supports ParaLLEl-RDP (Vulkan) and GlideN64 (OpenGL); statically links GlideN64, which has teardown implications (see below) |

Visual Upgrades

Most N64 games render natively at 320x240. With ParaLLEl-RDP (the default renderer for both cores), upscaling is handled entirely on the GPU via Vulkan compute:

| Option | Native | Upscaled (suggested) |
|---|---|---|
| parallel-rdp-upscaling* | 1x | 4x (sharp, fast on modern GPUs) |
| parallel-rdp-vi-aa | enabled | enabled (preserves N64's VI anti-aliasing) |
| parallel-rdp-vi-bilinear | enabled | enabled |
| parallel-rdp-super-sampled-read-back | False | True (SSAA: heavier but cleaner) |

* Option key prefix is parallel-n64- for parallel_n64 or mupen64plus- for mupen64plus-next.
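
As a minimal sketch of the prefix rule in the footnote, the full option key can be built from the core name and the short option name. Python is used here only for illustration; the helper name `option_key` and the dictionary are hypothetical and not part of Emutastic.

```python
# Prefixes are taken from the footnote above; everything else is illustrative.
CORE_OPTION_PREFIX = {
    "parallel_n64": "parallel-n64-",
    "mupen64plus-next": "mupen64plus-",
}


def option_key(core: str, option: str) -> str:
    """Return the full core-option key for a short ParaLLEl-RDP option name."""
    try:
        return CORE_OPTION_PREFIX[core] + option
    except KeyError:
        raise ValueError(f"unknown N64 core: {core!r}") from None


print(option_key("parallel_n64", "parallel-rdp-upscaling"))
print(option_key("mupen64plus-next", "parallel-rdp-upscaling"))
```

The two printed keys correspond to the upscaling entries listed in the Core Options blocks for each core further down this page.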

Note: Changing the ParaLLEl-RDP upscaling level mid-game has no effect. Unlike options in most other cores, the new level is applied only when the game is relaunched.

If using GlideN64 (mupen64plus-next only), resolution is instead controlled by mupen64plus-43screensize / mupen64plus-169screensize, and additional options include MSAA (up to 16x), FXAA, and texture enhancement filters (xBRZ, HQ variants).

Graphics Plugins (mupen64plus-next)

mupen64plus-next selects its RDP plugin via the mupen64plus-rdp-plugin core option:

| Plugin | Option Value | Renderer | Status |
|---|---|---|---|
| ParaLLEl-RDP | parallel | Vulkan compute | Recommended: bit-accurate LLE, GPU-accelerated, 1x–8x upscaling |
| GlideN64 | gliden64 | OpenGL | Works out of the box: HLE, good compatibility, resolution scaling via screensize options |

Important: mupen64plus-next statically links GlideN64 even when using ParaLLEl-RDP. This means nvoglv64.dll (NVIDIA's unified GL+Vulkan driver) is always loaded and spawns background threads regardless of which RDP plugin is active. This has major implications for teardown — see below.

GlideN64 (OpenGL) — mupen64plus-rdp-plugin = gliden64

GlideN64 uses the standard libretro OpenGL HW rendering path (RETRO_HW_CONTEXT_OPENGL). The frontend creates an FBO, the core renders into it, and the frontend reads back the pixels. This is the same path used by PPSSPP, Dolphin, and other OpenGL cores.

GlideN64 appeared to work out of the box during initial testing — resolution scaling, save states, and basic gameplay all functioned without special handling. More thorough testing is pending.

Core Options (GlideN64)

mupen64plus-rdp-plugin              = gliden64
mupen64plus-43screensize            = 640x480     (or higher, e.g. 1280x960)
mupen64plus-cpucore                 = dynamic_recompiler
mupen64plus-pak1                    = memory

Resolution is controlled by mupen64plus-43screensize (4:3 mode) or mupen64plus-169screensize (16:9 mode).

Teardown (GlideN64 / OpenGL)

Uses the standard GL teardown path. Key rules specific to mupen64plus-next:

Emu thread:  retro_unload_game()
             skip context_destroy (EmuThread still running cleanup via co_switch)
             retro_deinit()            ← GL context must still be current
             wglMakeCurrent(NULL)
             thread exits

Background:  join(emu_thread)
             Sleep(1500ms)             ← drain NVIDIA driver callbacks
             wglDeleteContext()
             Sleep(500ms)
             FreeLibrary()             ← must be AFTER wglDeleteContext

  1. retro_deinit must run with a current GL context (OPENGL32.dll dispatch table)
  2. Skip context_destroy — EmuThread is still running cleanup
  3. wglDeleteContext after a delay (NVIDIA driver callbacks)
  4. FreeLibrary after wglDeleteContext (driver calls back into core DLL)
  5. All cleanup must be synchronous before relaunching (mupen64plus uses global state)

ParaLLEl-RDP (Vulkan) — mupen64plus-rdp-plugin = parallel

ParaLLEl-RDP is a bit-accurate N64 RDP renderer that runs entirely on GPU compute shaders via Vulkan. It trivially handles 4x and 8x upscaling on modern GPUs (tested on RTX 5080).

All of the Vulkan integration, swapchain presentation, teardown/relaunch fixes, and HUD overlay work documented below was done using mupen64plus-next with parallel (ParaLLEl-RDP).

Libretro Vulkan Integration

Unlike OpenGL cores where the frontend creates an FBO and reads back pixels, the Vulkan path uses a negotiation interface:

  1. Core sends SET_HW_RENDER with context_type = RETRO_HW_CONTEXT_VULKAN (6)
  2. Core sends SET_HW_RENDER_CONTEXT_NEGOTIATION_INTERFACE — a struct with get_application_info and create_device callbacks
  3. Frontend creates VkInstance (with VK_KHR_surface + VK_KHR_win32_surface extensions)
  4. Frontend picks a physical device (prefer discrete GPU)
  5. Core's create_device callback creates the VkDevice (the core knows its own required extensions/features)
  6. Frontend stores the device, queue, and queue family index
  7. After retro_load_game, frontend calls context_reset
  8. Core requests GET_HW_RENDER_INTERFACE — frontend provides retro_hw_render_interface_vulkan with callbacks for set_image, get_sync_index, lock_queue, etc.
  9. Each frame, the core calls set_image with a retro_vulkan_image containing the rendered VkImageView

Swapchain Presentation (Eliminating CPU Readback)

The naive approach — vkCmdCopyImageToBuffer into a staging buffer, memcpy to CPU, upload to WPF WriteableBitmap — works but is slow. At 4x+ upscaling the GPU->CPU->GPU round-trip becomes the bottleneck.

The correct approach is Vulkan swapchain presentation: create a VkSurfaceKHR on a native window, build a swapchain, and use vkCmdBlitImage to scale the core's rendered image directly to the swapchain image. The GPU handles the blit (with linear filtering) and presents — zero CPU involvement per frame.

WPF Airspace: Overlay Window

Emutastic's window uses AllowsTransparency="True" + WindowStyle="None" for custom chrome. This creates a Win32 layered window, which has a fundamental limitation: child windows (WS_CHILD) are invisible in layered windows. WPF's HwndHost creates child windows, so it can't be used.

The solution is a top-level popup window (WS_POPUP) owned by the emulator window:

// WS_POPUP = top-level, not affected by parent's AllowsTransparency
// WS_EX_NOACTIVATE = doesn't steal keyboard focus from WPF
hwnd = CreateWindowEx(
    WS_EX_NOACTIVATE, "Static", "",
    WS_POPUP | WS_VISIBLE | WS_CLIPSIBLINGS,
    x, y, width, height,
    ownerHwnd,  // owned by EmulatorWindow (not parented)
    ...);

The overlay is positioned over the game viewport using PointToScreen, and kept in sync via LocationChanged / SizeChanged / StateChanged events.

Window Resize (Swapchain Debounce)

RecreateSwapchain calls vkDeviceWaitIdle, then destroys and recreates the swapchain. Calling it on every SizeChanged event (one per pixel of a window drag) overwhelms the NVIDIA driver and crashes the app. The fix is to reposition the Win32 overlay immediately via SetWindowPos, which is cheap, and to debounce the actual swapchain recreation with a 150ms DispatcherTimer. One brief stutter when resizing stops is inherent to Vulkan swapchain recreation. This applies to both mupen64plus-next and parallel_n64.
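
The debounce logic can be sketched independently of WPF. The Python below is an illustration only: the class name `SwapchainDebouncer` and the explicit `now_ms` arguments are assumptions made so the behaviour is easy to follow, whereas the real frontend uses a DispatcherTimer.

```python
class SwapchainDebouncer:
    """Coalesce a burst of resize events into a single swapchain rebuild."""

    def __init__(self, delay_ms: int = 150):
        self.delay_ms = delay_ms
        self.pending_size = None  # latest size seen, not yet applied
        self.deadline_ms = None   # time at which the rebuild should fire

    def on_size_changed(self, now_ms: int, size: tuple) -> None:
        # Every event replaces the pending size and restarts the quiet period.
        self.pending_size = size
        self.deadline_ms = now_ms + self.delay_ms

    def poll(self, now_ms: int):
        """Return the size to rebuild with once resizing has been quiet, else None."""
        if self.deadline_ms is None or now_ms < self.deadline_ms:
            return None
        size = self.pending_size
        self.pending_size = None
        self.deadline_ms = None
        return size


debouncer = SwapchainDebouncer()
debouncer.on_size_changed(0, (800, 600))
debouncer.on_size_changed(100, (900, 650))  # restarts the 150 ms window
print(debouncer.poll(200))  # still inside the quiet period
print(debouncer.poll(250))  # quiet for 150 ms: rebuild with the latest size
```

Only the last size in a burst reaches the expensive rebuild; cheap work such as repositioning the overlay can still run on every event.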

Frame Pipeline

Core renders frame (Vulkan compute) -> set_image(VkImageView)
  |
Frontend: vkAcquireNextImageKHR (swapchain)
  |
Barrier: core image -> TRANSFER_SRC_OPTIMAL
Barrier: swap image -> TRANSFER_DST_OPTIMAL
  |
vkCmdBlitImage (e.g. 2560x960 -> 1278x960, linear filter)
  |
Barrier: core image -> original layout
Barrier: swap image -> PRESENT_SRC_KHR
  |
vkQueueSubmit (semaphore sync)
  |
vkQueuePresentKHR

Core Options (ParaLLEl-RDP)

mupen64plus-rdp-plugin                    = parallel
mupen64plus-parallel-rdp-upscaling        = 4x       (1x / 2x / 4x / 8x)
mupen64plus-parallel-rdp-synchronous      = True     (False causes race condition crashes)
mupen64plus-cpucore                       = dynamic_recompiler
mupen64plus-pak1                          = memory

Note: mupen64plus-43screensize only affects GlideN64/glide64, NOT ParaLLEl-RDP. Upscaling is controlled solely by mupen64plus-parallel-rdp-upscaling.

In-Game HUD Overlay

The Vulkan swapchain renders on a WS_POPUP window that sits on top of the WPF window, covering all WPF UI including the OverlayHud pill. The fix: a separate transparent WPF Window (_vulkanHudWindow) floats above the Vulkan overlay. The OverlayHud StackPanel is reparented from GameViewport to this window during ShowOverlay and back during HideOverlay/DestroyVulkanOverlay.

Teardown (ParaLLEl-RDP / Vulkan)

Emu thread:  context_destroy()        <- tells ParaLLEl-RDP to release Vulkan objects
             retro_unload_game()
             retro_deinit()
             VulkanContext.Dispose()   <- destroy swapchain/surface; DEFER device/instance (leak)
             DestroyVulkanOverlay()    <- destroy popup window on UI thread
             _vulkanTeardownComplete = true
             thread exits

Background:  join(emu_thread, 10s)
             _core.Dispose()           <- skips unload/deinit (already done), sets DeferredFreeHandle
             Stash DLL handle in _staleDllHandle (no FreeLibrary yet!)

Next launch: EmulatorWindow.FreeStaleDll()  <- FreeLibrary BEFORE new LoadLibrary!
             new LibretroCore(corePath)      <- LoadLibrary gets fresh DLL with clean globals

Critical ordering: FreeLibrary MUST happen BEFORE LoadLibrary. If reversed, LoadLibrary increments the refcount on the still-loaded DLL (1->2), then FreeLibrary only decrements it (2->1) — the DLL never unloads and globals stay stale, causing a crash in retro_run on the second session. FreeStaleDll() is called in GameDetailWindows before new LibretroCore() to ensure correct ordering.
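
The refcount argument can be checked with a toy model. This Python sketch does not call the Win32 API; `FakeLoader` and `relaunch` are hypothetical names that only mimic the reference counting described above.

```python
class FakeLoader:
    """Toy model of per-process DLL reference counting (not the Win32 API)."""

    def __init__(self):
        self.refcount = 0
        self.generation = 0  # bumps each time the DLL is mapped fresh

    def load_library(self) -> int:
        if self.refcount == 0:
            self.generation += 1  # fresh mapping, globals start clean
        self.refcount += 1
        return self.generation

    def free_library(self) -> None:
        self.refcount -= 1  # the DLL is unmapped only when this reaches 0


def relaunch(free_first: bool) -> bool:
    """Return True if the second session gets a freshly mapped DLL."""
    loader = FakeLoader()
    first = loader.load_library()       # session 1
    if free_first:
        loader.free_library()           # 1 -> 0, DLL unloads
        second = loader.load_library()  # fresh mapping
    else:
        second = loader.load_library()  # 1 -> 2, same mapping reused
        loader.free_library()           # 2 -> 1, never unloads
    return second != first


print(relaunch(free_first=True))
print(relaunch(free_first=False))
```

Freeing first yields a new generation, so the second session starts with clean globals. Loading first reuses the old mapping, which is the stale-globals case.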

Why leak VkDevice/VkInstance: Because mupen64plus-next statically links GlideN64, nvoglv64.dll (NVIDIA's unified GL+Vulkan driver) spawns background threads even when using ParaLLEl-RDP. If VkDevice is destroyed during teardown, those threads hit an access violation (AV) on freed memory. If VkDevice is kept alive, the threads access valid memory and exit naturally. Destroying the deferred VkDevice at the start of the next session also crashes: nvoglv64.dll writes a -1 sentinel to its internal tables during vkDestroyDevice, then the next vkCreateDevice reads it and AVs. The old VkDevice/VkInstance are therefore simply leaked (they are small and reclaimed on process exit).

VEH safety net: A vectored exception handler catches ALL post-teardown AVs (after _vulkanTeardownComplete is set) and redirects faulting threads to ExitThread(0). This handles residual driver threads that AV on the destroyed swapchain/surface.

Vulkan Implementation Gotchas

VkImage Tracking

The core's set_image callback provides a VkImageView, but readback (fallback path) needs the VkImage. In retro_vulkan_image, the VkImage is at offset 40 — it's the ci_image field from an embedded VkImageViewCreateInfo struct. Read it directly:

IntPtr vkImage = Marshal.ReadIntPtr(imagePtr, 40);
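
The offset can be sanity-checked by mirroring the struct layout with ctypes. This is a sketch that assumes a 64-bit build and the field order of retro_vulkan_image in libretro_vulkan.h and of VkImageViewCreateInfo in the Vulkan headers. Python is used only to compute the layout; Vulkan enums are modelled as C ints and non-dispatchable handles as 64-bit integers.

```python
import ctypes


class VkComponentMapping(ctypes.Structure):
    _fields_ = [("r", ctypes.c_int), ("g", ctypes.c_int),
                ("b", ctypes.c_int), ("a", ctypes.c_int)]


class VkImageSubresourceRange(ctypes.Structure):
    _fields_ = [("aspectMask", ctypes.c_uint32), ("baseMipLevel", ctypes.c_uint32),
                ("levelCount", ctypes.c_uint32), ("baseArrayLayer", ctypes.c_uint32),
                ("layerCount", ctypes.c_uint32)]


class VkImageViewCreateInfo(ctypes.Structure):
    _fields_ = [("sType", ctypes.c_int), ("pNext", ctypes.c_void_p),
                ("flags", ctypes.c_uint32), ("image", ctypes.c_uint64),
                ("viewType", ctypes.c_int), ("format", ctypes.c_int),
                ("components", VkComponentMapping),
                ("subresourceRange", VkImageSubresourceRange)]


class retro_vulkan_image(ctypes.Structure):
    _fields_ = [("image_view", ctypes.c_uint64),  # VkImageView
                ("image_layout", ctypes.c_int),   # VkImageLayout
                ("create_info", VkImageViewCreateInfo)]


# Offset of create_info.image measured from the start of retro_vulkan_image.
image_offset = (retro_vulkan_image.create_info.offset
                + VkImageViewCreateInfo.image.offset)
print(image_offset)
```

On a 64-bit build this should agree with the hard-coded 40 passed to Marshal.ReadIntPtr; a different result would mean the layout assumptions no longer hold.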

Shifted Video Fix (retro_hw_render_interface_vulkan Layout)

The retro_hw_render_interface_vulkan struct has two function pointer fields — get_instance_proc_addr and get_device_proc_addr — between the device and queue fields. If these are omitted from the struct definition, every field after device is offset by 16 bytes (two 8-byte pointers on x64). The core reads queue, queue_index, set_image, get_sync_index, etc. from the wrong memory locations. The visual symptom is that the game display appears off-center, shifted to the left.

Correct layout (x64):
  offset  0: interface_type (uint)
  offset  4: interface_version (uint)
  offset  8: handle (IntPtr)
  offset 16: instance (VkInstance)
  offset 24: gpu (VkPhysicalDevice)
  offset 32: device (VkDevice)
  offset 40: get_instance_proc_addr (PFN)  <- EASY TO MISS
  offset 48: get_device_proc_addr (PFN)    <- EASY TO MISS
  offset 56: queue (VkQueue)
  offset 64: queue_index (uint)
  offset 72: set_image (PFN)
  ...
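
The 16-byte shift can be reproduced with ctypes. This sketch assumes a 64-bit build and copies the field order from the layout above; it stops at set_image, as the listing does, and the relative order of the two proc-addr pointers is worth confirming against libretro_vulkan.h. The class names are hypothetical.

```python
import ctypes

PTR = ctypes.c_void_p  # handles and function pointers are pointer-sized here

HEAD = [("interface_type", ctypes.c_uint), ("interface_version", ctypes.c_uint),
        ("handle", PTR), ("instance", PTR), ("gpu", PTR), ("device", PTR)]
PROC_ADDRS = [("get_instance_proc_addr", PTR), ("get_device_proc_addr", PTR)]
TAIL = [("queue", PTR), ("queue_index", ctypes.c_uint), ("set_image", PTR)]


class Correct(ctypes.Structure):
    _fields_ = HEAD + PROC_ADDRS + TAIL


class MissingProcAddrs(ctypes.Structure):
    # Same struct with the two easy-to-miss pointers left out.
    _fields_ = HEAD + TAIL


for name, _ in TAIL:
    good = getattr(Correct, name).offset
    bad = getattr(MissingProcAddrs, name).offset
    print(f"{name}: correct={good} broken={bad} shift={good - bad}")
```

Every field after device moves by the same 16 bytes, which is why the core ends up reading queue, queue_index, and set_image from the wrong slots.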

Struct Alignment Gotcha

VkPhysicalDeviceMemoryProperties contains VkMemoryHeap entries which are 16 bytes each (not 12), due to 8-byte alignment padding on the VkDeviceSize field. Using 12-byte structs causes a buffer overflow that corrupts adjacent memory — manifests as random crashes in unrelated code.
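
A quick way to see the padding is to model the structs with ctypes, which applies the platform's C alignment rules. The sketch assumes a 64-bit build; field names and array sizes follow the Vulkan headers (VK_MAX_MEMORY_TYPES = 32, VK_MAX_MEMORY_HEAPS = 16).

```python
import ctypes

VK_MAX_MEMORY_TYPES = 32
VK_MAX_MEMORY_HEAPS = 16


class VkMemoryType(ctypes.Structure):
    _fields_ = [("propertyFlags", ctypes.c_uint32), ("heapIndex", ctypes.c_uint32)]


class VkMemoryHeap(ctypes.Structure):
    _fields_ = [("size", ctypes.c_uint64),   # VkDeviceSize, 8-byte aligned
                ("flags", ctypes.c_uint32)]  # followed by 4 bytes of tail padding


class VkPhysicalDeviceMemoryProperties(ctypes.Structure):
    _fields_ = [("memoryTypeCount", ctypes.c_uint32),
                ("memoryTypes", VkMemoryType * VK_MAX_MEMORY_TYPES),
                ("memoryHeapCount", ctypes.c_uint32),
                ("memoryHeaps", VkMemoryHeap * VK_MAX_MEMORY_HEAPS)]


# Sum of the raw field sizes, i.e. what a naive 12-byte definition assumes.
unpadded_heap = sum(ctypes.sizeof(ftype) for _, ftype in VkMemoryHeap._fields_)
shortfall = VK_MAX_MEMORY_HEAPS * (ctypes.sizeof(VkMemoryHeap) - unpadded_heap)

print(ctypes.sizeof(VkMemoryHeap), unpadded_heap)
print(ctypes.sizeof(VkPhysicalDeviceMemoryProperties), shortfall)
```

The shortfall is how many bytes the driver writes past the end of a buffer sized for 12-byte heaps, which is consistent with the corruption showing up in unrelated code.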

Controller Pak Swap (Memory ↔ Rumble)

The N64 controller has a single accessory slot — it can hold either a Memory Pak (saves) or a Rumble Pak (vibration), but not both at once. This is a hardware limitation of the original console, and games that use both (Forsaken 64, Banjo-Kazooie, Ocarina of Time, etc.) only see whichever pak is currently inserted.

When playing an N64 game, the cog menu in the in-game overlay shows a P1 Pak entry. Clicking it cycles Player 1's accessory between Memory Pak and Rumble Pak. The change is persisted to the core's saved options and the core picks it up immediately via check_variables() — no relaunch required for parallel_n64.
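
The cycling behaviour amounts to a two-state toggle on the pak1 option. In this minimal sketch, the option value "rumble" and the helper name `next_pak` are assumptions; only the key parallel-n64-pak1 and the value memory appear on this page.

```python
PAK_CYCLE = ("memory", "rumble")  # "rumble" is an assumed option value


def next_pak(current: str) -> str:
    """Return the accessory that follows `current` in the P1 Pak cycle."""
    if current not in PAK_CYCLE:
        return PAK_CYCLE[0]  # any other value falls back to the Memory Pak
    return PAK_CYCLE[(PAK_CYCLE.index(current) + 1) % len(PAK_CYCLE)]


options = {"parallel-n64-pak1": "memory"}
options["parallel-n64-pak1"] = next_pak(options["parallel-n64-pak1"])
print(options["parallel-n64-pak1"])
```

After the toggle, the frontend would persist the option and let the core pick it up on its next variable check, as described above.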

A typical workflow for a game that wants both:

  1. Boot the game with P1 Pak: Memory Pak so it can find existing saves.
  2. After the title screen / save load, swap to P1 Pak: Rumble Pak for gameplay feedback.
  3. Swap back to Memory Pak before saving in-game.

For non-Player-1 ports or for the Transfer Pak (titles that use it for save transfer with cartridge games), use the full Core Options panel to set parallel-n64-pak2/3/4 or pick transfer directly.

Emulation Speed / Timing

N64 emulation ran at a locked 48fps on a 144Hz monitor despite the core running at full speed. The root cause: an audio-driven timing loop waited for the audio buffer to drain before calling retro_run. On Windows, WaveOut drains in steps synchronized to the DWM compositor (144 / 3 = 48 drain steps/sec), capping emulation at 48fps.

The fix for the Vulkan path: use audio-drain timing with relaxed thresholds (prefillMs=250, lowWatermark=120, backpressureMs=500), since GPU presentation is self-throttling via swapchain vsync.

Graphics Plugins (parallel_n64 — legacy core)

parallel_n64 has a different set of plugins via parallel-n64-gfxplugin:

| Plugin | Option Value | Renderer | Status |
|---|---|---|---|
| parallel (ParaLLEl-RDP) | parallel | Vulkan compute | Best option if using this core |
| glide64 | glide64 | OpenGL | Freezes at resolutions above native |

Core Options (parallel_n64)

parallel-n64-gfxplugin                    = parallel
parallel-n64-parallel-rdp-upscaling       = 4x       (1x / 2x / 4x / 8x)
parallel-n64-parallel-rdp-synchronous     = enabled   (disabled causes race condition crashes)
parallel-n64-cpucore                      = dynamic_recompiler
parallel-n64-audio-buffer-size            = 2048      (only supports 2048 or 1024; 4096 is rejected)
parallel-n64-pak1                         = memory

Note: parallel-n64-screensize only affects glide64, NOT ParaLLEl-RDP.
