Nintendo 64
Emutastic supports two libretro N64 cores: parallel_n64 (default) and mupen64plus-next. Both are based on mupen64plus but differ significantly in architecture and plugin support.
| Core | DLL | Status |
|---|---|---|
| parallel_n64 | parallel_n64_libretro.dll | Default. Pairs cleanly with ParaLLEl-RDP (Vulkan); this is what Emutastic launches an N64 game with unless overridden per game |
| mupen64plus-next | mupen64plus_next_libretro.dll | Alternative. Supports ParaLLEl-RDP (Vulkan) and GlideN64 (OpenGL); statically links GlideN64, which has teardown implications (see below) |
The N64 renders natively at 320x240. With ParaLLEl-RDP (the default renderer for both cores), upscaling is handled entirely on the GPU via Vulkan compute:
| Option\* | Native | Upscaled (suggested) |
|---|---|---|
| parallel-rdp-upscaling | 1x | 4x (sharp, fast on modern GPUs) |
| parallel-rdp-vi-aa | enabled | enabled (preserves the N64's VI anti-aliasing) |
| parallel-rdp-vi-bilinear | enabled | enabled |
| parallel-rdp-super-sampled-read-back | False | True (SSAA; heavier but cleaner) |

\* Each option key takes a core-specific prefix: `parallel-n64-` for parallel_n64 or `mupen64plus-` for mupen64plus-next (e.g. `parallel-n64-parallel-rdp-upscaling` vs. `mupen64plus-parallel-rdp-upscaling`).
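The two cores spell the same setting with different prefixes. A frontend that keeps one shared table of these options therefore has to build the full key whenever it reads or writes a core option. Below is a minimal C sketch of that lookup. The `n64_core` enum and the `n64_option_key` helper are hypothetical names, not Emutastic or libretro APIs.

```c
#include <stdio.h>

/* Hypothetical identifiers for the two N64 cores. */
typedef enum { CORE_PARALLEL_N64, CORE_MUPEN64PLUS_NEXT } n64_core;

/* Builds the full core option key from a shared suffix, e.g.
   "parallel-rdp-upscaling" -> "parallel-n64-parallel-rdp-upscaling".
   Returns 0 on success, -1 if buf is too small (output is then truncated). */
int n64_option_key(n64_core core, const char *suffix, char *buf, size_t len) {
    const char *prefix = (core == CORE_PARALLEL_N64) ? "parallel-n64-" : "mupen64plus-";
    int n = snprintf(buf, len, "%s%s", prefix, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this approach, the rows of the table above can be stored once as suffixes. Only the prefix depends on the running core.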
Note: Unlike the resolution settings of most other cores, ParaLLEl-RDP upscaling is applied only when a game launches. Changing the level mid-game has no effect until the game is relaunched.
If using GlideN64 (mupen64plus-next only), resolution is instead controlled by mupen64plus-43screensize / mupen64plus-169screensize, and additional options include MSAA (up to 16x), FXAA, and texture enhancement filters (xBRZ, HQ variants).
mupen64plus-next selects its RDP plugin via the mupen64plus-rdp-plugin core option:
| Plugin | Option Value | Renderer | Status |
|---|---|---|---|
| ParaLLEl-RDP | `parallel` | Vulkan compute | Recommended: bit-accurate LLE, GPU-accelerated, 1x to 8x upscaling |
| GlideN64 | `gliden64` | OpenGL | Works out of the box: HLE, good compatibility, resolution scaling via screensize options |
Important: mupen64plus-next statically links GlideN64 even when using ParaLLEl-RDP. This means nvoglv64.dll (NVIDIA's unified GL+Vulkan driver) is always loaded and spawns background threads regardless of which RDP plugin is active. This has major implications for teardown — see below.
GlideN64 uses the standard libretro OpenGL HW rendering path (RETRO_HW_CONTEXT_OPENGL). The frontend creates an FBO, the core renders into it, and the frontend reads back the pixels. This is the same path used by PPSSPP, Dolphin, and other OpenGL cores.
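One detail of that readback: `glReadPixels` returns rows starting from the bottom of the framebuffer. When the core sets `bottom_left_origin` in its `retro_hw_render_callback`, the frontend therefore has to flip the image vertically before copying it into a top-down bitmap such as a WPF `WriteableBitmap`. Below is a minimal in-place sketch in C. It assumes tightly packed 32-bit pixels, and `flip_rows_in_place` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flips a tightly packed 32-bit-per-pixel image in place so that the row
   glReadPixels returned last (the top of the frame) becomes row 0.
   Returns 0 on success, -1 if the temporary row buffer cannot be allocated. */
int flip_rows_in_place(uint32_t *pixels, size_t width, size_t height) {
    if (width == 0 || height < 2) return 0;   /* nothing to swap */
    size_t row_bytes = width * sizeof(uint32_t);
    uint32_t *tmp = malloc(row_bytes);
    if (tmp == NULL) return -1;
    for (size_t top = 0, bottom = height - 1; top < bottom; top++, bottom--) {
        memcpy(tmp, pixels + top * width, row_bytes);               /* save top row */
        memcpy(pixels + top * width, pixels + bottom * width, row_bytes);
        memcpy(pixels + bottom * width, tmp, row_bytes);            /* old top -> bottom */
    }
    free(tmp);
    return 0;
}
```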
GlideN64 appeared to work out of the box during initial testing — resolution scaling, save states, and basic gameplay all functioned without special handling. More thorough testing is pending.
Suggested GlideN64 settings:

```
mupen64plus-rdp-plugin = gliden64
mupen64plus-43screensize = 640x480   (or higher, e.g. 1280x960)
mupen64plus-cpucore = dynamic_recompiler
mupen64plus-pak1 = memory
```
Resolution is controlled by mupen64plus-43screensize (4:3 mode) or mupen64plus-169screensize (16:9 mode).
Teardown follows the standard GL path:

```
Emu thread:  retro_unload_game()
             skip context_destroy    (EmuThread still running cleanup via co_switch)
             retro_deinit()          <- GL context must still be current
             wglMakeCurrent(NULL)
             thread exits
Background:  join(emu_thread)
             Sleep(1500ms)           <- drain NVIDIA driver callbacks
             wglDeleteContext()
             Sleep(500ms)
             FreeLibrary()           <- must be AFTER wglDeleteContext
```

Key rules specific to mupen64plus-next:

- `retro_deinit` must run with a current GL context, because its GL calls go through the OPENGL32.dll dispatch table
- Skip `context_destroy`, because the EmuThread is still running cleanup
- Call `wglDeleteContext` only after a delay, so pending NVIDIA driver callbacks can drain
- Call `FreeLibrary` after `wglDeleteContext`, because the driver calls back into the core DLL
- Finish all cleanup synchronously before relaunching, because mupen64plus uses global state
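The ordering constraints in that sequence can be made explicit with a small model. In the C sketch below, every real call (`retro_unload_game`, `retro_deinit`, `wglMakeCurrent`, `wglDeleteContext`, `FreeLibrary`) is replaced by a hypothetical log entry, and the emu thread is a plain pthread. The sketch only demonstrates the sequencing: join the emu thread, wait, delete the context, then free the DLL. It does not exercise the Win32 calls themselves.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins: each real call is recorded by name instead of executed. */
#define MAX_STEPS 16
static const char *teardown_log[MAX_STEPS];
static int teardown_len;

static void record(const char *step) {
    if (teardown_len < MAX_STEPS) teardown_log[teardown_len++] = step;
}

/* Emu thread side. context_destroy is deliberately never called. */
static void *emu_thread_teardown(void *unused) {
    (void)unused;
    record("retro_unload_game");
    record("retro_deinit");            /* GL context is still current here */
    record("wglMakeCurrent(NULL)");    /* release the context before the thread exits */
    return NULL;
}

/* Background side. sleep_ms stands in for Sleep(); Emutastic waits 1500 ms and
   then 500 ms. Passing NULL skips the waits (handy for tests). */
int gl_teardown(void (*sleep_ms)(long)) {
    pthread_t emu;
    teardown_len = 0;
    if (pthread_create(&emu, NULL, emu_thread_teardown, NULL) != 0) return -1;
    pthread_join(emu, NULL);           /* nothing below runs until the emu thread is gone */
    if (sleep_ms) sleep_ms(1500);      /* drain NVIDIA driver callbacks */
    record("wglDeleteContext");
    if (sleep_ms) sleep_ms(500);
    record("FreeLibrary");             /* strictly after wglDeleteContext */
    return 0;
}

/* Position of a step in the log, or -1 if it never ran. */
int step_index(const char *step) {
    for (int i = 0; i < teardown_len; i++)
        if (strcmp(teardown_log[i], step) == 0) return i;
    return -1;
}
```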
ParaLLEl-RDP is a bit-accurate N64 RDP renderer that runs entirely on GPU compute shaders via Vulkan. It trivially handles 4x and 8x upscaling on modern GPUs (tested on RTX 5080).
All of the Vulkan integration, swapchain presentation, teardown/relaunch fixes, and HUD overlay work documented below was done using mupen64plus-next with parallel (ParaLLEl-RDP).
Unlike OpenGL cores where the frontend creates an FBO and reads back pixels, the Vulkan path uses a negotiation interface:
- Core sends `SET_HW_RENDER` with `context_type = RETRO_HW_CONTEXT_VULKAN` (6)
- Core sends `SET_HW_RENDER_CONTEXT_NEGOTIATION_INTERFACE`, a struct with `get_application_info` and `create_device` callbacks
- Frontend creates a `VkInstance` (with the `VK_KHR_surface` and `VK_KHR_win32_surface` extensions)
- Frontend picks a physical device (preferring a discrete GPU)
- The core's `create_device` callback creates the `VkDevice` (the core knows its own required extensions and features)
- Frontend stores the device, queue, and queue family index
- After `retro_load_game`, the frontend calls `context_reset`
- Core requests `GET_HW_RENDER_INTERFACE`; the frontend provides a `retro_hw_render_interface_vulkan` with callbacks for `set_image`, `get_sync_index`, `lock_queue`, etc.
- Each frame, the core calls `set_image` with a `retro_vulkan_image` containing the rendered `VkImageView`
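The "prefer discrete GPU" step can be expressed as a ranking over each device's `VkPhysicalDeviceProperties::deviceType`. A real frontend gets these values from `vkEnumeratePhysicalDevices` followed by `vkGetPhysicalDeviceProperties`. To stay buildable without the Vulkan SDK, the sketch below mirrors the `VkPhysicalDeviceType` values locally, using the same numbers as `vulkan_core.h`. The ranking after "discrete first" and the helper names are assumptions, not Emutastic's code.

```c
#include <stddef.h>

/* Local mirror of VkPhysicalDeviceType (same numeric values as vulkan_core.h). */
typedef enum {
    DEV_OTHER = 0, DEV_INTEGRATED_GPU = 1, DEV_DISCRETE_GPU = 2,
    DEV_VIRTUAL_GPU = 3, DEV_CPU = 4
} dev_type;

/* Higher is better: discrete > integrated > virtual > CPU > other (assumed order). */
static int dev_score(dev_type t) {
    switch (t) {
    case DEV_DISCRETE_GPU:   return 4;
    case DEV_INTEGRATED_GPU: return 3;
    case DEV_VIRTUAL_GPU:    return 2;
    case DEV_CPU:            return 1;
    default:                 return 0;
    }
}

/* Returns the index of the preferred device (the first one on ties),
   or -1 if count is 0. */
int pick_physical_device(const dev_type *types, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++)
        if (best < 0 || dev_score(types[i]) > dev_score(types[best]))
            best = (int)i;
    return best;
}
```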
The naive approach — vkCmdCopyImageToBuffer into a staging buffer, memcpy to CPU, upload to WPF WriteableBitmap — works but is slow. At 4x+ upscaling the GPU->CPU->GPU round-trip becomes the bottleneck.
The correct approach is Vulkan swapchain presentation: create a VkSurfaceKHR on a native window, build a swapchain, and use vkCmdBlitImage to scale the core's rendered image directly to the swapchain image. The GPU handles the blit (with linear filtering) and presents — zero CPU involvement per frame.
Emutastic's window uses AllowsTransparency="True" + WindowStyle="None" for custom chrome. This creates a Win32 layered window, which has a fundamental limitation: child windows (WS_CHILD) are invisible in layered windows. WPF's HwndHost creates child windows, so it can't be used.
The solution is a top-level popup window (WS_POPUP) owned by the emulator window:
```c
// WS_POPUP = top-level, not affected by parent's AllowsTransparency
// WS_EX_NOACTIVATE = doesn't steal keyboard focus from WPF
hwnd = CreateWindowEx(
    WS_EX_NOACTIVATE, "Static", "",
    WS_POPUP | WS_VISIBLE | WS_CLIPSIBLINGS,
    x, y, width, height,
    ownerHwnd,   // owned by EmulatorWindow (not parented)
    ...);
```

The overlay is positioned over the game viewport using PointToScreen and kept in sync via the LocationChanged / SizeChanged / StateChanged events.
RecreateSwapchain calls vkDeviceWaitIdle, then destroys and recreates the swapchain. Calling it on every SizeChanged event (every pixel of a window drag) overwhelms the NVIDIA driver and crashes the app. The fix: reposition the Win32 overlay instantly via SetWindowPos (cheap), but debounce the actual swapchain recreation with a 150ms DispatcherTimer. One brief stutter when resizing stops is inherent to Vulkan swapchain recreation. This applies to both mupen64plus-next and parallel_n64.
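The debounce is a trailing-edge timer. Every SizeChanged restarts the 150 ms countdown, and the swapchain is rebuilt only once the countdown expires with no newer event. Emutastic does this with a WPF DispatcherTimer. The C sketch below models the same logic with explicit millisecond timestamps so it can be tested without a UI loop. The `resize_debounce` type and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t delay_ms;      /* 150 in Emutastic */
    uint64_t last_event_ms; /* time of the most recent SizeChanged */
    bool pending;           /* a swapchain recreation is owed */
} resize_debounce;

/* Called on every SizeChanged. Repositioning the overlay with SetWindowPos
   happens separately and immediately; here we only restart the countdown. */
void debounce_on_resize(resize_debounce *d, uint64_t now_ms) {
    d->last_event_ms = now_ms;
    d->pending = true;
}

/* Called from a periodic tick (timestamps must be monotonic). Returns true
   exactly once per burst of resizes, when delay_ms has elapsed since the last
   event: that is the moment to recreate the swapchain. */
bool debounce_should_recreate(resize_debounce *d, uint64_t now_ms) {
    if (d->pending && now_ms - d->last_event_ms >= d->delay_ms) {
        d->pending = false;
        return true;
    }
    return false;
}
```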
The per-frame presentation path:

```
Core renders frame (Vulkan compute) -> set_image(VkImageView)
    |
Frontend: vkAcquireNextImageKHR (swapchain)
    |
Barrier: core image -> TRANSFER_SRC_OPTIMAL
Barrier: swap image -> TRANSFER_DST_OPTIMAL
    |
vkCmdBlitImage (e.g. 2560x960 -> 1278x960, linear filter)
    |
Barrier: core image -> original layout
Barrier: swap image -> PRESENT_SRC_KHR
    |
vkQueueSubmit (semaphore sync)
    |
vkQueuePresentKHR
```
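When the window's aspect ratio differs from the game's, the blit in this path needs a destination rectangle that keeps the picture undistorted. A common way to get one is to letterbox or pillarbox: fit the largest rectangle of the display aspect inside the swapchain image and center it. The page does not say how Emutastic computes its rectangle. The C sketch below is one straightforward approach, assuming a known display aspect such as 4:3; `blit_rect` and `fit_centered` are hypothetical names.

```c
#include <stdint.h>

/* Destination rectangle for the blit, in swapchain pixels: (x0,y0) to (x1,y1). */
typedef struct { int32_t x0, y0, x1, y1; } blit_rect;

/* Largest aspect_w:aspect_h rectangle that fits in dst_w x dst_h, centered.
   64-bit intermediates keep the cross-multiplication from overflowing. */
blit_rect fit_centered(int32_t dst_w, int32_t dst_h, int32_t aspect_w, int32_t aspect_h) {
    int64_t w = dst_w;
    int64_t h = (int64_t)dst_w * aspect_h / aspect_w;   /* try full width first */
    if (h > dst_h) {                 /* too tall: use full height and pillarbox */
        h = dst_h;
        w = (int64_t)dst_h * aspect_w / aspect_h;
    }
    blit_rect r;
    r.x0 = (int32_t)((dst_w - w) / 2);
    r.y0 = (int32_t)((dst_h - h) / 2);
    r.x1 = r.x0 + (int32_t)w;
    r.y1 = r.y0 + (int32_t)h;
    return r;
}
```

The corners map to `dstOffsets[0] = {x0, y0, 0}` and `dstOffsets[1] = {x1, y1, 1}` of a `VkImageBlit`. The source offsets cover the whole core image, so the blit scales independently of the core's internal resolution.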
Suggested mupen64plus-next settings for ParaLLEl-RDP:

```
mupen64plus-rdp-plugin = parallel
mupen64plus-parallel-rdp-upscaling = 4x        (1x / 2x / 4x / 8x)
mupen64plus-parallel-rdp-synchronous = True    (False causes race condition crashes)
mupen64plus-cpucore = dynamic_recompiler
mupen64plus-pak1 = memory
```

Note: `mupen64plus-43screensize` only affects GlideN64/glide64, NOT ParaLLEl-RDP. Upscaling is controlled solely by `mupen64plus-parallel-rdp-upscaling`.
The Vulkan swapchain renders on a WS_POPUP window that sits on top of the WPF window, covering all WPF UI including the OverlayHud pill. The fix: a separate transparent WPF Window (_vulkanHudWindow) floats above the Vulkan overlay. The OverlayHud StackPanel is reparented from GameViewport to this window during ShowOverlay and back during HideOverlay/DestroyVulkanOverlay.
Teardown and relaunch sequence:

```
Emu thread:   context_destroy()        <- tells ParaLLEl-RDP to release Vulkan objects
              retro_unload_game()
              retro_deinit()
              VulkanContext.Dispose()  <- destroy swapchain/surface; DEFER device/instance (leak)
              DestroyVulkanOverlay()   <- destroy popup window on UI thread
              _vulkanTeardownComplete = true
              thread exits
Background:   join(emu_thread, 10s)
              _core.Dispose()          <- skips unload/deinit (already done), sets DeferredFreeHandle
              Stash DLL handle in _staleDllHandle (no FreeLibrary yet!)
Next launch:  EmulatorWindow.FreeStaleDll()  <- FreeLibrary BEFORE new LoadLibrary!
              new LibretroCore(corePath)     <- LoadLibrary gets fresh DLL with clean globals
```
Critical ordering: FreeLibrary MUST happen BEFORE LoadLibrary. If reversed, LoadLibrary increments the refcount on the still-loaded DLL (1->2), then FreeLibrary only decrements it (2->1) — the DLL never unloads and globals stay stale, causing a crash in retro_run on the second session. FreeStaleDll() is called in GameDetailWindows before new LibretroCore() to ensure correct ordering.
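The refcount trap can be reproduced with a toy model. The sketch below does not call Win32. `toy_load` and `toy_free` are hypothetical stand-ins that mimic the documented `LoadLibrary`/`FreeLibrary` behaviour: loading an already-loaded module only bumps its reference count, and the module (with its globals) goes away only when the count reaches zero.

```c
#include <stdbool.h>

/* Toy loader state for one DLL. globals_dirty stands in for the core's static
   globals, e.g. mupen64plus state left over from the previous session. */
typedef struct {
    int refcount;
    bool globals_dirty;
} toy_module;

/* Like LoadLibrary: maps a fresh image (clean globals) only if not already loaded. */
void toy_load(toy_module *m) {
    if (m->refcount == 0) m->globals_dirty = false;
    m->refcount++;
}

/* Like FreeLibrary: decrements; the image is discarded only at refcount 0. */
void toy_free(toy_module *m) {
    if (m->refcount > 0) m->refcount--;
}

/* Simulates one relaunch. free_first = true is the FreeStaleDll()-then-
   new LibretroCore() order; false is the reversed order. */
bool relaunch_gets_clean_globals(bool free_first) {
    toy_module m = { 0, false };
    toy_load(&m);                 /* first session loads the core */
    m.globals_dirty = true;       /* the session mutates globals */
    if (free_first) { toy_free(&m); toy_load(&m); }   /* 1 -> 0 -> fresh 1 */
    else            { toy_load(&m); toy_free(&m); }   /* 1 -> 2 -> 1, never unloaded */
    return !m.globals_dirty;
}
```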
Why leak VkDevice/VkInstance: Because mupen64plus-next statically links GlideN64, nvoglv64.dll (NVIDIA's unified GL+Vulkan driver) spawns background threads even when using ParaLLEl-RDP. If VkDevice is destroyed during teardown, those threads hit access violations (AVs) on freed memory. If VkDevice is kept alive, the threads access valid memory and exit naturally. Destroying the deferred VkDevice at the start of the next session also crashes: nvoglv64.dll writes a -1 sentinel to its internal tables during vkDestroyDevice, and the next vkCreateDevice reads it and AVs. So the old VkDevice/VkInstance simply leak (a small amount, reclaimed on process exit).
VEH safety net: A vectored exception handler catches ALL post-teardown AVs (after _vulkanTeardownComplete is set) and redirects faulting threads to ExitThread(0). This handles residual driver threads that AV on the destroyed swapchain/surface.
The core's set_image callback provides a VkImageView, but readback (the fallback path) needs the VkImage. In retro_vulkan_image, the VkImage is at offset 40: it's the ci_image field of the embedded VkImageViewCreateInfo struct. Read it directly:

```csharp
IntPtr vkImage = Marshal.ReadIntPtr(imagePtr, 40);
```

The retro_hw_render_interface_vulkan struct has two function pointer fields, get_instance_proc_addr and get_device_proc_addr, between the device and queue fields. If these are omitted from the struct definition, every field after device is offset by 16 bytes (two 8-byte pointers on x64). The core then reads queue, queue_index, set_image, get_sync_index, etc. from the wrong memory locations. The visual symptom is that the game display appears off-center, shifted to the left.
Correct layout (x64):
```
offset 0:  interface_type (uint)
offset 4:  interface_version (uint)
offset 8:  handle (IntPtr)
offset 16: instance (VkInstance)
offset 24: gpu (VkPhysicalDevice)
offset 32: device (VkDevice)
offset 40: get_instance_proc_addr (PFN)   <- EASY TO MISS
offset 48: get_device_proc_addr (PFN)     <- EASY TO MISS
offset 56: queue (VkQueue)
offset 64: queue_index (uint)
offset 72: set_image (PFN)
...
```
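Both offsets (40 for the VkImage and the shifted fields of the render interface) can be checked at compile time with a C mirror of the structs. The sketch below replaces Vulkan handles with `void *` (64-bit on x64) and enums with `int`, so it builds without the Vulkan headers. Field order follows the layout above and should be checked against `libretro_vulkan.h` and `vulkan_core.h` before relying on it. The `mirror_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of VkImageViewCreateInfo. */
typedef struct {
    int sType;
    const void *pNext;
    uint32_t flags;
    void *image;                   /* VkImage */
    int viewType, format;
    int components[4];             /* VkComponentMapping r, g, b, a */
    uint32_t subresourceRange[5];  /* VkImageSubresourceRange */
} mirror_image_view_create_info;

/* Mirror of retro_vulkan_image. */
typedef struct {
    void *image_view;              /* VkImageView */
    int image_layout;              /* VkImageLayout; 4 bytes of padding follow */
    mirror_image_view_create_info create_info;
} mirror_retro_vulkan_image;

/* Mirror of the start of retro_hw_render_interface_vulkan, in the order shown above. */
typedef struct {
    int interface_type;
    unsigned interface_version;
    void *handle, *instance, *gpu, *device;
    void *get_instance_proc_addr;  /* the two fields that are easy to miss */
    void *get_device_proc_addr;
    void *queue;
    unsigned queue_index;
    void *set_image;
} mirror_hw_render_interface_vulkan;

#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu   /* the expected offsets are for 64-bit targets */
_Static_assert(offsetof(mirror_retro_vulkan_image, create_info.image) == 40, "VkImage offset");
_Static_assert(offsetof(mirror_hw_render_interface_vulkan, queue) == 56, "queue offset");
_Static_assert(offsetof(mirror_hw_render_interface_vulkan, set_image) == 72, "set_image offset");
#endif
```

If the two proc-addr fields are dropped from the interface mirror, `queue` lands at offset 40 and the second assertion fails at build time. That is the same 16-byte shift described above, caught before the core ever reads it.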
VkPhysicalDeviceMemoryProperties contains VkMemoryHeap entries which are 16 bytes each (not 12), due to 8-byte alignment padding on the VkDeviceSize field. Using 12-byte structs causes a buffer overflow that corrupts adjacent memory — manifests as random crashes in unrelated code.
The N64 controller has a single accessory slot — it can hold either a Memory Pak (saves) or a Rumble Pak (vibration), but not both at once. This is a hardware limitation of the original console, and games that use both (Forsaken 64, Banjo-Kazooie, Ocarina of Time, etc.) only see whichever pak is currently inserted.
When playing an N64 game, the cog menu in the in-game overlay shows a P1 Pak entry. Clicking it cycles Player 1's accessory between Memory Pak and Rumble Pak. The change is persisted to the core's saved options and the core picks it up immediately via check_variables() — no relaunch required for parallel_n64.
A typical workflow for a game that wants both:
- Boot the game with P1 Pak: Memory Pak so it can find existing saves.
- After the title screen / save load, swap to P1 Pak: Rumble Pak for gameplay feedback.
- Swap back to Memory Pak before saving in-game.
For ports other than Player 1, or for the Transfer Pak (used by titles that transfer save data with cartridge games), use the full Core Options panel to set parallel-n64-pak2/3/4 or to select transfer directly.
N64 emulation ran at a locked 48fps on a 144Hz monitor even though the core itself could run at full speed. The root cause: an audio-driven timing loop waited for the audio buffer to drain before calling retro_run. On Windows, WaveOut drains in steps synchronized to the DWM compositor (144 / 3 = 48 drain steps/sec), capping emulation at 48fps.
The fix for the Vulkan path: use audio-drain timing with relaxed thresholds (prefillMs=250, lowWatermark=120, backpressureMs=500), since GPU presentation is self-throttling via swapchain vsync.
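The page lists the three thresholds but not how they are combined. The C sketch below is one plausible reading, not Emutastic's actual logic:

- Before playback starts, run frames back to back until 250 ms of audio is buffered.
- While playing, run frames immediately below 120 ms.
- Stop calling retro_run at 500 ms or more.
- In between, run a frame and let swapchain vsync do the throttling.

All type and function names are hypothetical.

```c
/* Hypothetical interpretation of the three thresholds (not from Emutastic's source). */
typedef struct {
    double prefill_ms;       /* 250: buffer this much before audio playback starts */
    double low_watermark_ms; /* 120: below this, run frames back-to-back to refill */
    double backpressure_ms;  /* 500: at or above this, hold off on retro_run */
} audio_pacing;

typedef enum {
    PACE_RUN_NOW,     /* call retro_run immediately, without waiting on anything */
    PACE_RUN_PACED,   /* call retro_run and let swapchain vsync throttle the loop */
    PACE_WAIT         /* skip this iteration and let the audio device drain */
} pace_decision;

pace_decision pace_next_frame(const audio_pacing *p, double buffered_ms, int playing) {
    if (!playing)                                   /* still prefilling */
        return buffered_ms < p->prefill_ms ? PACE_RUN_NOW : PACE_RUN_PACED;
    if (buffered_ms >= p->backpressure_ms)
        return PACE_WAIT;
    if (buffered_ms < p->low_watermark_ms)
        return PACE_RUN_NOW;
    return PACE_RUN_PACED;
}
```

In this reading, the loop never waits on WaveOut while the buffer sits between 120 ms and 500 ms, so the 48-step drain cadence no longer sets the frame rate. Vsync on the swapchain does.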
parallel_n64 has a different set of plugins via parallel-n64-gfxplugin:
| Plugin | Option Value | Renderer | Status |
|---|---|---|---|
| parallel (ParaLLEl-RDP) | `parallel` | Vulkan compute | Best option if using this core |
| glide64 | `glide64` | OpenGL | Freezes at resolutions above native |
Suggested parallel_n64 settings:

```
parallel-n64-gfxplugin = parallel
parallel-n64-parallel-rdp-upscaling = 4x          (1x / 2x / 4x / 8x)
parallel-n64-parallel-rdp-synchronous = enabled   (disabled causes race condition crashes)
parallel-n64-cpucore = dynamic_recompiler
parallel-n64-audio-buffer-size = 2048             (only 2048 or 1024 are supported; 4096 is rejected)
parallel-n64-pak1 = memory
```

Note: `parallel-n64-screensize` only affects glide64, NOT ParaLLEl-RDP.