Core changes
- NS: Rewrite implementation in preparation for applet support
- Frontend: Clean up error message on files not found
- OS: Allow non-FIRM processes to use SVCOpenProcess
The NS module needs this system call to open applet processes.
- NS: More applet work, and implemented System Font support
- Add GenericImageFormat for I4 textures
Required by Cubic Ninja’s level select once more than one level is cleared.
- Rearrange default key bindings to reflect 3DS button layout more closely
- Implement circle pad input
- GSP/DSP: Implement data cache flushing
- OS: Remove obsolete virtual address translation workaround
GSP now forwards the client process handle to OS rather than its own, so these addresses can now be looked up without any issues.
- Add Just-In-Time binary translator for CPU emulation (“Harmonic”)
- Harmonic: Support binary translation of the ARM instructions MUL, BLX, LDRSB, and MSR
- Harmonic: Fix mode switches between ARM and Thumb
- Harmonic: Support barrel shifter operations involving multiple registers (logical shift left by register, etc.)
- Harmonic: Support computing conditional execution flags for a wider range of functions
- Harmonic: Fix CMN incorrectly writing the barrel shifter output to a false destination register
- Harmonic: Add support for more Thumb instructions
- Harmonic: Implement LDR instructions that use PC as the target register
- Harmonic: Require interpreter fallback to not modify processor mode or program counter
This will enable us to use a much more efficient method of emitting this fallback.
- Harmonic: Inline calls to interpreter fallback into generated functions
- Harmonic: Add quicker variant of interpreter fallback that performs the instruction dispatch at JIT time rather than runtime
- Harmonic: Support addressing mode 1 instructions that write to the program counter
- Refactor CPU engine interfaces
There are now two main interfaces (Processor and ExecutionContext) implemented by each CPU engine (interpreter/JIT).
This decouples the CPU engine from most other parts of the emulator, hence reducing unnecessary rebuilding when editing CPU related files.
- Harmonic: Add analysis pass to enable compiling functions in groups
- Harmonic: Fully utilize the analysis prepass
- Static branches within a function are now translated to branches to the corresponding basic block rather than to tail function calls (better performance)
- Static branches to functions now never recursively call GetOrCompileFunction since the target function can be resolved directly (removes the risk of overflowing the stack due to deep recursion)
Overall, this change slightly reduces compilation overhead because functions are compiled in groups. In-game latency is significantly reduced because most compilation work is done upfront.
- Harmonic: Implement SXTAH/UXTAH
- Harmonic: Start implementing the VFP instruction set
- Harmonic: Optimize symbol lookup for JIT-external functions
Function names exposed to the JIT are now compressed down to two characters, based on which the function to call can quickly be looked up from a table.
- Harmonic: First prototype for background compilation
With this change, the emulator falls back to the interpreter while new code is being just-in-time compiled on a background thread. This massively reduces perceived latency.
- Harmonic: Move analysis to main CPU emulator thread and submit preanalyzed function sets for binary translation
This vastly reduces the pressure on the translation queue and minimizes redundancy.
- Harmonic: Put background compilation thread to sleep when no new functions are queued
Spinlocking on the background thread triggered thermal throttling quickly and cut performance in half. Putting the thread fully to sleep is better for power consumption and hence allows the main emulation thread to run at optimal performance.
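As an illustration of the difference between spinning and sleeping, here is a minimal Python sketch of a worker that blocks on a queue until a job arrives. It is not Harmonic's actual code: `compile_worker`, `run_demo`, the `None` shutdown sentinel, and the string-producing `translate` stand-in are all hypothetical names chosen for this example.

```python
import queue
import threading


def compile_worker(jobs, results, translate):
    """Translate queued jobs; sleep inside jobs.get() while the queue is empty."""
    while True:
        job = jobs.get()  # blocks without busy-waiting until a job arrives
        if job is None:   # hypothetical sentinel that shuts the worker down
            break
        results[job] = translate(job)


def run_demo():
    jobs = queue.Queue()
    results = {}

    # Hypothetical stand-in for binary translation of a function address.
    def translate(address):
        return f"compiled_{address:#x}"

    worker = threading.Thread(target=compile_worker, args=(jobs, results, translate))
    worker.start()
    for address in (0x100000, 0x100040):
        jobs.put(address)
    jobs.put(None)  # ask the worker to exit once the queue is drained
    worker.join()
    return results
```

Because `queue.Queue.get()` waits on a condition variable internally, the worker consumes no CPU time between jobs, which is the property the change above relies on.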
- Harmonic: Optimize handling CPSR conditional execution flags
Instead of decoding and re-encoding the flags in the main CPSR register after every emulated instruction, each flag is now stored in its own internal register and only encoded/decoded when converting to/from ARM::State.
This both simplifies the JIT implementation and reduces the size of the generated LLVM IR.
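The idea can be sketched in Python as follows. The bit positions of N, Z, C and V (31 down to 28) follow the ARM architecture; the `Flags` class and the `unpack_cpsr`/`pack_cpsr` helpers are hypothetical names for this example and do not mirror Harmonic's real data structures.

```python
from dataclasses import dataclass

N_BIT, Z_BIT, C_BIT, V_BIT = 31, 30, 29, 28
FLAG_MASK = 0xF << 28  # the four condition flags live in CPSR bits 31..28


@dataclass
class Flags:
    n: bool = False
    z: bool = False
    c: bool = False
    v: bool = False


def unpack_cpsr(cpsr):
    """Split a 32-bit CPSR value into separate flags and the remaining bits."""
    flags = Flags(
        n=bool((cpsr >> N_BIT) & 1),
        z=bool((cpsr >> Z_BIT) & 1),
        c=bool((cpsr >> C_BIT) & 1),
        v=bool((cpsr >> V_BIT) & 1),
    )
    return flags, cpsr & ~FLAG_MASK & 0xFFFFFFFF


def pack_cpsr(flags, rest):
    """Re-encode separately stored flags into a 32-bit CPSR value."""
    return (
        (rest & ~FLAG_MASK & 0xFFFFFFFF)
        | (flags.n << N_BIT)
        | (flags.z << Z_BIT)
        | (flags.c << C_BIT)
        | (flags.v << V_BIT)
    )
```

With this split, code that only sets or tests a single flag touches one boolean, and the bit packing happens only at the boundary where a full CPSR value is needed.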
- Harmonic: Optimize reads to data in read-only memory
Instead of jumping out of JIT code to call ReadMemory32, these reads can be replaced with the constant result.
- Harmonic: Disable LLVM optimization passes
This can vastly reduce compilation latency in games that compile thousands of blocks at once.
- Memory: Revamp subsystem for better performance
Physical memory address ranges can now safely be accessed through host pointers instead of requiring a full bus lookup each time.
Similarly, virtual memory address ranges can now be translated to pointer-size pairs to avoid redundant address space translation in subsequent accesses.
- Memory: Enable external subsystems to subscribe to modifications to emulated memory pages
This can e.g. be used to invalidate host GPU resources generated from emulated memory.
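A page-granular subscription mechanism of this kind could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the emulator's interface: the `PageWatchers` class, its method names, and the 4 KiB `PAGE_SIZE` are all chosen for this example.

```python
PAGE_SIZE = 0x1000  # assumed page size for this sketch


def _pages(address, size):
    """Return the indices of all pages touched by [address, address + size)."""
    return range(address // PAGE_SIZE, (address + size - 1) // PAGE_SIZE + 1)


class PageWatchers:
    def __init__(self):
        self._subscribers = {}  # page index -> list of callbacks

    def subscribe(self, address, size, callback):
        """Register callback for every page overlapping the given range."""
        for page in _pages(address, size):
            self._subscribers.setdefault(page, []).append(callback)

    def notify_write(self, address, size):
        """Call subscribers of each written page with that page's base address."""
        for page in _pages(address, size):
            for callback in self._subscribers.get(page, []):
                callback(page * PAGE_SIZE)
```

A GPU resource cache could, for instance, subscribe with a callback that marks the texture backed by that page as stale, so it is regenerated on next use.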
GPU core changes
- Apply proper memory bounds for display transfers
- Migrate to page-based memory accesses and replace manual image format handling with GenericImageFormat
- ResourceManager: Invalidate host resources upon modification by the guest
Obsoletes the need for the “Eager Texture Cache” option.
- Move texture management to ResourceManager
Managing textures and render targets in the same place will allow us to support render-to-texture in the future, and as a side benefit reduces code duplication a bit.
- Add support for I4 textures
Required by Cubic Ninja’s level select once more than one level is cleared.
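For readers unfamiliar with the format: I4 stores one 4-bit intensity value per pixel, two pixels per byte. The Python sketch below expands such data to 8-bit intensities. It is only an illustration: `decode_i4` is a hypothetical helper, the low-nibble-first pixel order is an assumption, and the tiled pixel layout used by the real hardware is ignored entirely.

```python
def decode_i4(data, pixel_count):
    """Expand packed 4-bit intensities to a list of 8-bit intensities."""
    pixels = []
    for index in range(pixel_count):
        byte = data[index // 2]
        # Assumption: even pixels use the low nibble, odd pixels the high one.
        nibble = (byte >> 4) if index % 2 else (byte & 0xF)
        # Multiplying by 17 maps 0..15 onto 0..255 exactly (0xF * 17 == 0xFF).
        pixels.append(nibble * 17)
    return pixels
```

For example, `decode_i4(bytes([0xF0]), 2)` yields one black and one white pixel under the nibble order assumed here.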
UI changes
- Add CirclePad overlay
- Remove “Disable D-pad” option (not needed anymore since there is now a circle pad overlay)
- Remove “Eager texture cache” option (not needed anymore thanks to GPU emulator core improvements)
- Add toggle for JIT-based CPU emulation using Harmonic