67 commits
f309ac2
Open disassembly file as unicode
suleram Feb 15, 2026
ca2c158
Propagate arguments passed by two-dimensional Scopes
suleram Feb 15, 2026
285ac21
Include/exclude defined functions from decompilation
suleram Feb 15, 2026
101e179
Removed quotes around global symbols
suleram Feb 15, 2026
b11b4bd
Fixed mapping Scope variables
suleram Feb 15, 2026
cdc2129
Added .gitignore
hasherezade Feb 15, 2026
0934fc7
[FEATURE] Changed commandline arguments
hasherezade Feb 15, 2026
6f60392
Fixed README
hasherezade Feb 15, 2026
707941a
[BUGFIX] Make the input argument required
hasherezade Feb 15, 2026
9b4b03a
[FEATURE] Allow to serialize/deserialize decompiled files
hasherezade Feb 15, 2026
7b3351f
[FEATURE] Allow to split output into separate files. Refactoring
hasherezade Feb 15, 2026
32be7c2
[BUGFIX] Substitute ConstPool values not only in the visible lines
hasherezade Feb 15, 2026
bf10b37
[FEATURE] Added verbosity argument
hasherezade Feb 15, 2026
ab581e8
[BUGFIX] Fixed missing verbosity argument
hasherezade Feb 15, 2026
0ec27a4
[FEATURE] Added visibility and metadata to CodeLine and to FunctionInfo
hasherezade Feb 15, 2026
a43de3f
[FEATURE] Added _global prefix to global variables
hasherezade Feb 15, 2026
118fa4d
[BUGFIX] In global_scope_replace: improved patterns. Keep replacing t…
hasherezade Feb 15, 2026
bb10c1d
[FEATURE] In view8_util: added new util functions. Included and exclu…
hasherezade Feb 15, 2026
a518690
[NOBIN] Set view8_util as non executable
hasherezade Feb 15, 2026
c8599d4
[FEATURE] Added module init
hasherezade Feb 15, 2026
42a26d9
[BUGFIX] Changed unicode to utf-8
hasherezade Feb 15, 2026
671df73
[REFACT] Cleanup and optimizations
hasherezade Feb 15, 2026
9fe07fa
[BUGFIX] In get_relative_offset: validate offsets, fixed off by one
hasherezade Feb 15, 2026
a891806
[BUGFIX] Fixed typos in argument descriptions
hasherezade Feb 18, 2026
f012470
[REFACT] Removed unused variable
hasherezade Feb 18, 2026
b98c175
[BUGFIX] Fixed typo in the variable name
hasherezade Feb 18, 2026
87b0823
[NOBIN] Fixed indentations
hasherezade Feb 18, 2026
fec4adb
[REFACT] Fixed comparison convention
hasherezade Feb 18, 2026
8fd266b
[BUGFIX] Fixed argument description
hasherezade Feb 18, 2026
84818a7
[BUGFIX] More robust deletion from a map
hasherezade Feb 18, 2026
6ed8314
[BUGFIX] In GlobalVars: has_value - protect against searching values …
hasherezade Feb 18, 2026
5153161
[REFACT] Fixed convention: comparison with None
hasherezade Feb 18, 2026
8605843
[BUGFIX] Use utf-8 encoding for exporting output to file
hasherezade Feb 18, 2026
789936d
[BUGFIX] Fixed invalid variable name
hasherezade Feb 18, 2026
d94e98c
[BUGFIX] More robust LHS/RHS splitting
hasherezade Feb 18, 2026
ed8e705
[BUGFIX] In is_root: protect against None argument
hasherezade Feb 18, 2026
83c905b
[BUGFIX] In rename_functions_in_code: don't skip line with index 0
hasherezade Feb 18, 2026
576d373
[BUGFIX] In split_trees: protect against current function not present…
hasherezade Feb 18, 2026
0ff0788
[REFACT] In init: insert View8 path at the last position
hasherezade Feb 18, 2026
c10a368
[REFACT] Renamed function: get_all_children to get_declared_children
hasherezade Feb 18, 2026
6e644dc
[BUGFIX] Removed g_GlobalVars singleton
hasherezade Feb 18, 2026
1b48dd6
[BUGFIX] In resolve_global_name: guard against empty name mappings
hasherezade Feb 18, 2026
10b7523
[REFACT] Refactored annotation for backward compat.
hasherezade Feb 18, 2026
0c5082c
[BUGFIX] In global replacements: if index not found in const_pool, le…
hasherezade Feb 18, 2026
db9de7f
[BUGFIX] In replace_global_scope: more robust fetching of keys
hasherezade Feb 18, 2026
c6f5bfa
[BUGFIX] In save_trees: use excluded_list to filter functions
hasherezade Feb 18, 2026
da083f6
[REFACT] Cleanup global_scope_replace, added a comment
hasherezade Feb 18, 2026
c3ef3e2
[NOBIN] Documented a function export_to_file
hasherezade Feb 18, 2026
9ac9b28
[BUGFIX] Fixed searching for the assignment op
hasherezade Feb 18, 2026
83dcc7a
[BUGFIX] In replace_const_pool: fixed index check
hasherezade Feb 18, 2026
c4f7182
[BUGFIX] Added error checks in tree splitting
hasherezade Feb 18, 2026
85dc767
[BUGFIX] Added error check in get_start_function
hasherezade Feb 18, 2026
72d3d85
[REFACT] More precise definition of pattern recognizing functions
hasherezade Feb 18, 2026
10a40ae
[NOBIN] Fixed grammar in function comment
hasherezade Feb 18, 2026
aef5ca1
[NOBIN] Improved a comment
hasherezade Feb 18, 2026
4494c7b
[NOBIN] In GlobalVars: annotate output types
hasherezade Feb 18, 2026
cedb704
[BUGFIX] Prevent from loading empty lines to the set of included/excl…
hasherezade Feb 18, 2026
ade829b
[BUGFIX] In fill_global_variables: protect against fetching undefined…
hasherezade Feb 18, 2026
3a58beb
[BUGFIX] When renaming functions: rename declarers too
hasherezade Feb 18, 2026
8ede281
[REFACT] Fixed input_format argument
hasherezade Feb 18, 2026
5ed6722
[NOBIN] Fixed indentation
hasherezade Feb 18, 2026
f7b0143
[NOBIN] Improved style
hasherezade Feb 18, 2026
b34c066
[NOBIN] Removed a redundant space
hasherezade Feb 18, 2026
096b5c4
[REFACT] In GlobalVars: optimization, cleaned up searching functions …
hasherezade Feb 18, 2026
1695ff7
[REFACT] Convention cleanups
hasherezade Feb 18, 2026
c9c1d84
[BUGFIX] Ensure utf-8 encoding (in `get_next_line(file)`)
hasherezade Feb 18, 2026
344646b
[BUGFIX] In get_start_function: don't use iterator to navigate the co…
hasherezade Feb 19, 2026
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
+*/__pycache__/*
+__pycache__/*
+data/
+*.bak

27 changes: 17 additions & 10 deletions Parser/parse_v8cache.py
@@ -30,17 +30,24 @@ def run_disassembler_binary(binary_path, file_name, out_file_name):
         )

     # Open the output file in write mode
-    with open(out_file_name, 'w') as outfile:
+    with open(out_file_name, 'w', encoding="utf-8", errors="replace") as outfile:
         # Call the binary with the file name as argument and pipe the output to the file
-        try:
-            result = subprocess.run([binary_path, file_name], stdout=outfile, stderr=subprocess.PIPE, text=True)
-
-            # Check the return status code
-            if result.stderr:
-                raise RuntimeError(
-                    f"Binary execution failed with status code {result.returncode}: {result.stderr.strip()}")
-        except subprocess.CalledProcessError as e:
-            raise RuntimeError(f"Error calling the binary: {e}")
+        result = subprocess.run(
+            [binary_path, file_name],
+            stdout=outfile,
+            stderr=subprocess.PIPE,
+            text=True,
+        )
+
+        # Treat only non-zero exit codes as failure. Some tools may emit warnings to stderr on success.
+        if result.returncode != 0:
+            err = (result.stderr or "").strip()
+            raise RuntimeError(
+                f"Binary execution failed with status code {result.returncode}." + (f" Stderr: {err}" if err else "")
+            )
+
+        if result.stderr:
+            print(f"[!] Disassembler stderr: {result.stderr.strip()}")


 def parse_v8cache_file(file_name, out_name, view8_dir, binary_path):
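Note on the hunk above: `subprocess.run` only raises `CalledProcessError` when called with `check=True`, so the removed `except` branch could never fire, and the old `if result.stderr:` check treated any warning as a failure. The following is a minimal sketch of the new behaviour, not the PR's code: `run_tool` is a hypothetical helper, the Python interpreter stands in for the disassembler binary, and stdout is captured in memory rather than piped to a file.

```python
import subprocess
import sys


def run_tool(argv):
    """Run a command; fail only on a non-zero exit code and return stderr as a warning."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        err = (result.stderr or "").strip()
        raise RuntimeError(f"exit code {result.returncode}" + (f": {err}" if err else ""))
    warnings = (result.stderr or "").strip()
    return result.stdout, warnings


# Stand-in "tool" that succeeds but still writes a warning to stderr.
noisy_ok = [sys.executable, "-c", "import sys; print('data'); sys.stderr.write('warn\\n')"]
stdout, warnings = run_tool(noisy_ok)

# Stand-in "tool" that fails with exit code 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    run_tool(failing)
    failed = False
except RuntimeError:
    failed = True
```

With this shape, a successful run that prints warnings still yields its output, and only a non-zero exit code is reported as an error.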
8 changes: 5 additions & 3 deletions Parser/sfi_file_parser.py
@@ -1,6 +1,7 @@
 from Parser.shared_function_info import SharedFunctionInfo, CodeLine
 from parse import parse
 import re
+import json

 all_functions = {}
 repeat_last_line = False
@@ -12,7 +13,7 @@ def set_repeat_line_flag(flag):


 def get_next_line(file):
-    with open(file) as f:
+    with open(file, encoding='utf-8', errors='ignore') as f:
         for line in f:
             line = line.strip()
             if not line:
@@ -75,8 +76,9 @@ def parse_const_line(lines, func_name):
     if not address:
         return var_idx, value
     if value.startswith("<String"):
-        value = value.split("#", 1)[-1].rstrip('> ').replace('"', '\\"')
-        return var_idx, f'"{value}"'
+        value = json.dumps(value.split("#", 1)[-1].rstrip('> ')) #.replace('"', '\\"')
+        #return var_idx, f'"{value}"'
Copilot AI Feb 18, 2026

Commented-out code should be removed. If the old string formatting method is no longer needed, this commented line should be deleted rather than left in place.

Suggested change
-        #return var_idx, f'"{value}"'
+        return var_idx, value
     if value.startswith("<SharedFunctionInfo"):
         value = value.split(" ", 1)[-1].rstrip('> ') if " " in value else ""
         return var_idx, parse_shared_function_info(lines, value, func_name)
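Note on the `parse_const_line` hunk above: the old code escaped only double quotes before wrapping the value in quotes, while `json.dumps` also escapes backslashes and control characters (and, by default, non-ASCII characters). A small sketch of the difference, using a made-up sample string rather than real disassembler output:

```python
import json

raw = 'line1\nsay "hi"'  # sample value containing a newline and double quotes

# Old approach from the diff: escape only double quotes, then wrap in quotes.
manual = '"' + raw.replace('"', '\\"') + '"'

# New approach from the diff: let json.dumps build the quoted literal.
dumped = json.dumps(raw)

# The json.dumps output round-trips back to the original value.
round_trip_ok = json.loads(dumped) == raw

# The manual version keeps a raw newline inside the quotes, which a strict
# JSON parser rejects.
try:
    json.loads(manual)
    manual_ok = True
except json.JSONDecodeError:
    manual_ok = False
```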
179 changes: 164 additions & 15 deletions Parser/shared_function_info.py
@@ -1,16 +1,83 @@
 from Translate.translate import translate_bytecode
+from Translate.jump_blocks import CodeLine
 from Simplify.simplify import simplify_translated_bytecode

+import re
+import pickle
+from typing import Dict, List, Optional, Union

-class CodeLine:
-    def __init__(self, opcode="", line="", inst="", translated="", decompiled=""):
-        self.v8_opcode = opcode
-        self.line_num = line
-        self.v8_instruction = inst
-        self.translated = translated
-        self.decompiled = decompiled
-        self.visible = True
+###

+class GlobalVars:
+    _STRING_RE = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')
+    _FUNC_RE = re.compile(r'\b(func_([A-Za-z0-9_$]+)_0x[0-9a-fA-F]+)\b')
+
+    def __init__(self):
+        self.strings_set = None
+        self.funcs_map = None
+
+    def parse(self, value) -> bool:
+        is_parsed = False
+
+        strings = set(self._STRING_RE.findall(value))
+        funcs = list(self._FUNC_RE.finditer(value))
+
+        if strings:
+            is_parsed = True
+            self.strings_set = (self.strings_set or set())
+            self.strings_set.update(strings)
+
+        if funcs:
+            is_parsed = True
+            self.funcs_map = (self.funcs_map or {})
+
+            for match in funcs:
+                full_name = match.group(1)
+                short_name = match.group(2)
+                self.funcs_map[short_name] = full_name
+
+        return is_parsed
+
+    def is_filled(self) -> bool:
+        if self.strings_set or self.funcs_map:
+            return True
+        return False
+
+    def has_value(self, value) -> bool:
+        if self.strings_set is not None:
+            val = value.strip('"')
+            if (value in self.strings_set or val in self.strings_set):
+                return True
+        if self.funcs_map is not None:
+            if value in self.funcs_map.keys():
Copilot AI Feb 18, 2026

Inefficient iteration: The expression 'value in self.funcs_map.keys()' is inefficient. The .keys() call is unnecessary in Python - you can use 'value in self.funcs_map' directly, which is more idiomatic and efficient.
+                return True
+        return False
+
+    def resolve_global_name(self, value) -> Optional[str]:
+
+        def _is_string(value):
+            if value.startswith('"') and value.endswith('"'):
+                return True
+            return False
+
+        if not self.is_filled():
+            return None
+
+        if not _is_string(value):
+            return None
+
+        val = value.strip('"')
+        if self.strings_set is not None:
+            if (value in self.strings_set or val in self.strings_set):
+                return "global_" + val
+
+        if self.funcs_map is not None:
+            if val in self.funcs_map:
+                return self.funcs_map[val]
+
+        return None
+
+###
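Note on `GlobalVars.parse` above: the two regular expressions collect quoted strings (escaped quotes allowed) and `func_<name>_0x<hex>` identifiers, mapping each short name to its full name. A rough illustration follows; the patterns are copied from the diff, but the `sample` literal is an assumed format for demonstration and may not match what a real `DeclareGlobals` constant looks like.

```python
import re

# Patterns copied from the GlobalVars class in the diff.
STRING_RE = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')
FUNC_RE = re.compile(r'\b(func_([A-Za-z0-9_$]+)_0x[0-9a-fA-F]+)\b')

# Hypothetical input: two quoted strings and one function reference.
sample = '["counter", "say \\"hi\\"", func_main_0x1a2b]'

# findall returns the contents of each quoted string, escapes preserved.
strings = set(STRING_RE.findall(sample))

# Map the short function name (group 2) to the full identifier (group 1).
funcs = {m.group(2): m.group(1) for m in FUNC_RE.finditer(sample)}
```

Here `strings` holds `counter` and `say \"hi\"`, and `funcs` maps `main` to `func_main_0x1a2b`.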

 class SharedFunctionInfo:
     def __init__(self):
@@ -22,6 +89,8 @@ def __init__(self):
         self.code = None
         self.const_pool = None
         self.exception_table = None
+        self.visible = True
+        self.metadata = None

     def is_fully_parsed(self):
         return all(
@@ -40,18 +109,62 @@ def translate_bytecode(self):
     def simplify_bytecode(self):
         simplify_translated_bytecode(self, self.code)

-    def replace_const_pool(self):
-        replacements = {f"ConstPool[{idx}]": var for idx, var in enumerate(self.const_pool)}
+    def fill_global_variables(self, global_vars: GlobalVars):
+        """
+        If the Global Vars were defined anywhere in this function, fill them in and store in the global structure.
+        """
+
+        patternDef = re.compile(r'ConstPoolLiteral\[(\d+)\]')
+
+        for obj in self.code:
+            line = obj.decompiled
+            if "DeclareGlobals(" not in line:
+                continue
+            match = re.search(patternDef, line.strip())
+            if not match:
+                continue
+            index = int(match.group(1))
+            # Ensure const_pool exists and index is within valid bounds; otherwise skip
+            if self.const_pool is None or not (0 <= index < len(self.const_pool)):
+                continue
+            if global_vars.parse(self.const_pool[index]):
+                return True
+        return False
+
+    def replace_const_pool(self, global_vars: GlobalVars):
+
+        def _replacement(match):
+            index = int(match.group(2))
+            # Ensure const_pool exists and index is within valid bounds; otherwise leave unchanged
+            if self.const_pool is None or not (0 <= index < len(self.const_pool)):
+                return match.group(0) # Leave unchanged
+
+            value = self.const_pool[index]
+            if match.group(1) == "ConstPool": # Not ConstPoolLiteral
+
+                global_symbol = global_vars.resolve_global_name(value)
+                if global_symbol:
+                    return global_symbol
+
+                return value.strip('"')
+            return value
+
+        # Regular expression to match patterns A[NUMBER] or B[NUMBER]
+        pattern = r'(ConstPoolLiteral|ConstPool)\[(\d+)\]'
+
+        #replacements = {f"ConstPool[{idx}]": var.strip('"') for idx, var in enumerate(self.const_pool)}
+        #replacements.update({f"ConstPoolLiteral[{idx}]": var for idx, var in enumerate(self.const_pool)})
+
         for line in self.code:
-            if not line.visible:
+            if "ConstPool" not in line.decompiled:
                 continue
-            for const_id, var in replacements.items():
-                line.decompiled = line.decompiled.replace(const_id, var)
+            line.decompiled = re.sub(pattern, _replacement, line.decompiled)

-    def decompile(self):
+    def decompile(self, global_vars: GlobalVars):
         self.translate_bytecode()
         self.simplify_bytecode()
-        self.replace_const_pool()
+        self.fill_global_variables(global_vars)
+        self.replace_const_pool(global_vars)

     def export(self, export_v8code=False, export_translated=False, export_decompiled=True):
         export_func = self.create_function_header() + '\n'
@@ -70,3 +183,39 @@ def export(self, export_v8code=False, export_translated=False, export_decompiled
             if export_line:
                 export_func += export_line + '\n'
         return export_func
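Note on the new `replace_const_pool` above: instead of building a dictionary of every possible replacement, the code now runs one `re.sub` with a callback that looks each index up and leaves out-of-range references untouched. The sketch below isolates that idea; `substitute` is a hypothetical free function, the pool contents are invented, and the global-symbol lookup from the PR is deliberately left out.

```python
import re

# Same pattern as in the diff: ConstPoolLiteral is tried before ConstPool.
PATTERN = re.compile(r'(ConstPoolLiteral|ConstPool)\[(\d+)\]')


def substitute(line, const_pool):
    """Replace ConstPool[...] / ConstPoolLiteral[...] references in one line."""
    def _replacement(match):
        index = int(match.group(2))
        if not (0 <= index < len(const_pool)):
            return match.group(0)  # out of range: leave the reference as-is
        value = const_pool[index]
        # ConstPool[...] drops surrounding quotes; ConstPoolLiteral[...] keeps them.
        return value.strip('"') if match.group(1) == "ConstPool" else value
    return PATTERN.sub(_replacement, line)


pool = ['"log"', '"hello"']  # invented constant pool
out = substitute('ConstPool[0](ConstPoolLiteral[1], ConstPool[7])', pool)
```

For this input `out` is `log("hello", ConstPool[7])`: the first reference loses its quotes, the literal keeps them, and the out-of-range index is preserved.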

+####
+
+FunctionsBlob = Union[Dict[str, "SharedFunctionInfo"], List["SharedFunctionInfo"]]
+
+# Helper function for serializing multiple functions
+def serialize_functions(functions: FunctionsBlob) -> bytes:
+    """Serialize decompiled output using pickle.
+
+    SECURITY NOTE:
+    Pickle is unsafe for untrusted input. Only load serialized files that you
+    generated yourself.
+    """
+    return pickle.dumps(functions, protocol=pickle.HIGHEST_PROTOCOL)
+
+
+def deserialize_functions(data: bytes) -> FunctionsBlob:
+    """Deserialize decompiled output using pickle.
+
+    SECURITY NOTE:
+    Unpickling can execute arbitrary code. Do not load files from untrusted
+    sources.
+    """
+    return pickle.loads(data)
+
+
+def save_functions_to_file(functions: FunctionsBlob, filename: str):
+    """Save decompiled output to a file (pickle)."""
+    with open(filename, 'wb') as f:
+        f.write(serialize_functions(functions))
+
+
+def load_functions_from_file(filename: str) -> FunctionsBlob:
+    """Load decompiled output from a file (pickle)."""
+    with open(filename, 'rb') as f:
+        return deserialize_functions(f.read())
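Note on the serialization helpers above: they amount to a pickle round trip through a binary file. A minimal sketch of that round trip, using plain dictionaries as a stand-in for `SharedFunctionInfo` objects (the keys and values here are invented) and a temporary directory so nothing is left on disk:

```python
import os
import pickle
import tempfile

# Stand-in for a map of decompiled functions; real entries would be
# SharedFunctionInfo objects.
functions = {"func_main": {"name": "func_main", "code": ["return 1"]}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "functions.pkl")

    # Save: serialize with the highest available pickle protocol.
    with open(path, "wb") as f:
        f.write(pickle.dumps(functions, protocol=pickle.HIGHEST_PROTOCOL))

    # Load: only ever do this with files you produced yourself.
    with open(path, "rb") as f:
        restored = pickle.loads(f.read())
```

The restored object is an equal but distinct copy, which is what makes the serialized form convenient for post-processing in a separate run.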
32 changes: 22 additions & 10 deletions README.md
@@ -18,31 +18,42 @@
 <h2>Usage</h2>
 <h3>Command-Line Arguments</h3>
 <ul>
-<li><code>input_file</code>: The input file name.</li>
-<li><code>output_file</code>: The output file name.</li>
-<li><code>--path</code>, <code>-p</code>: Path to disassembler binary (optional).</li>
-<li><code>--disassembled</code>, <code>-d</code>: Indicate if the input file is already disassembled (optional).</li>
-<li><code>--export_format</code>, <code>-e</code>: Specify the export format(s). Options are <code>v8_opcode</code>, <code>translated</code>, and <code>decompiled</code>. Multiple options can be combined (optional, default: <code>decompiled</code>).</li>
+<li><code>--inp</code>, <code>-i</code>: The input file name</li>
+<li><code>--out</code>, <code>-o</code>: Path to the output (depending on the type of the output, a single file or a directory tree may be generated)</li>
+<li><code>--input_format</code>, <code>-f</code>: Indicate format of the input. Options are: <code>raw</code>: the output is a raw JSC file; <code>disassembled</code>: the input file is already disassembled; <code>serialized</code>: the input is already decompiled, and stored in a serialized format (pickle; trusted input only)</li>
Copilot AI Feb 18, 2026

Documentation inconsistency: Line 23 states "the output is a raw JSC file" but it should say "the input is a raw JSC file" since this describes the input_format parameter, not output.

Suggested change
-<li><code>--input_format</code>, <code>-f</code>: Indicate format of the input. Options are: <code>raw</code>: the output is a raw JSC file; <code>disassembled</code>: the input file is already disassembled; <code>serialized</code>: the input is already decompiled, and stored in a serialized format (pickle; trusted input only)</li>
+<li><code>--input_format</code>, <code>-f</code>: Indicate format of the input. Options are: <code>raw</code>: the input is a raw JSC file; <code>disassembled</code>: the input file is already disassembled; <code>serialized</code>: the input is already decompiled, and stored in a serialized format (pickle; trusted input only)</li>
+<li><code>--export_format</code>, <code>-e</code>: Specify the export format(s). Options are <code>v8_opcode</code>, <code>translated</code>, <code>decompiled</code>, and <code>serialized</code>. Multiple options can be combined (optional, default: <code>decompiled</code>).</li>
+<li><code>--path</code>, <code>-p</code>: Path to disassembler binary. Required if the input is in the raw format.</li>
Copilot AI Feb 18, 2026

The docs say --path is required for raw input, but the current code path auto-detects the V8 version and selects a bundled disassembler from Bin/ when --path is omitted. Consider updating this line to reflect that --path is only needed to override the auto-selected binary or when the bundled binaries aren’t available.

Suggested change
-<li><code>--path</code>, <code>-p</code>: Path to disassembler binary. Required if the input is in the raw format.</li>
+<li><code>--path</code>, <code>-p</code>: Path to disassembler binary. By default, View8 auto-detects the V8 version and uses a bundled disassembler from <code>Bin/</code>; use this option to override the auto-selected binary or when no suitable bundled binary is available for raw input.</li>
+<li><code>--tree</code>, <code>-t</code>: Split output into a tree structure (rather than storing all functions in one file). Specify the function that will be used as a top node of the tree. To start from the default main function, use 'start' (optional).</li>
+<li><code>--mainlimit</code>, <code>-l</code>: In tree mode: a tree with depth above this limit will be treated as different module than main (optional).</li>
+<li><code>--include</code>, <code>-n</code>: Functions tree to Include in the output (optional).</li>
+<li><code>--exclude</code>, <code>-x</code>: Functions tree to Exclude from the output (optional).</li>
 </ul>

 <h3>Basic Usage</h3>
 <p>To decompile a V8 bytecode file and export the decompiled code:</p>
-<pre><code>python view8.py input_file output_file</code></pre>
+<pre><code>python view8.py -i input_file -o output_file</code></pre>
 <h3>Disassembler Path</h3>
 <p>By default, <code>view8</code> detects the V8 bytecode version of the input file (using <code>VersionDetector.exe</code>) and automatically searches for a compatible disassembler binary in the <code>Bin</code> folder. This can be changed by specifing a different disassembler binary, use the <code>--path</code> (or <code>-p</code>) option:</p>
Copilot AI Feb 18, 2026

Spelling error: 'specifing' should be 'specifying'.

Suggested change
-<p>By default, <code>view8</code> detects the V8 bytecode version of the input file (using <code>VersionDetector.exe</code>) and automatically searches for a compatible disassembler binary in the <code>Bin</code> folder. This can be changed by specifing a different disassembler binary, use the <code>--path</code> (or <code>-p</code>) option:</p>
+<p>By default, <code>view8</code> detects the V8 bytecode version of the input file (using <code>VersionDetector.exe</code>) and automatically searches for a compatible disassembler binary in the <code>Bin</code> folder. This can be changed by specifying a different disassembler binary, use the <code>--path</code> (or <code>-p</code>) option:</p>
-<pre><code>python view8.py input_file output_file --path /path/to/disassembler</code></pre>
+<pre><code>python view8.py -i input_file -o output_file --path /path/to/disassembler</code></pre>
 <h3>Processing Disassembled Files</h3>
-<p>To skip the disassembling process and provide an already disassembled file as the input, use the <code>--disassembled</code> (or <code>-d</code>) flag:</p>
-<pre><code>python view8.py input_file output_file --disassembled</code></pre>
+<p>To skip the disassembling process and provide an already disassembled file as the input, use the <code>--input_format disassembled</code> (or <code>-f disassembled</code>) option:</p>
+<pre><code>python view8.py -i input_file -o output_file -f disassembled</code></pre>
+<h3>Creating and Processing Serialized Files</h3>
+<p>Sometimes we may want to decompile the file into a serialized format (preserving all the objects and structures). This type of an output may be easier to post-process than a text format, and useful i.e. for further deobfuscation. To create a serialized output we use a specific export format: <code>--export_format serialized</code> (or <code>-e serialized</code>)</p>
+<p><strong>Security warning:</strong> the current serialized format is a Python <code>pickle</code> file (<code>.pkl</code>). Unpickling data from untrusted sources can execute arbitrary code. Only load serialized files that you generated yourself.</p>
+<pre><code>python view8.py -i input_file -o output_file -e serialized</code></pre>
+<p>If we ever want to load the serialized output back, and decompile it as a different type of an output, we can do it using <code>--input_format serialized</code> (or <code>-f serialized</code>) option:</p>
+<pre><code>python view8.py -i input_file -o output_file -f serialized</code></pre>
 <h3>Export Formats</h3>
 <p>Specify the export format(s) using the <code>--export_format</code> (or <code>-e</code>) option. You can combine multiple formats:</p>
 <ul>
 <li><code>v8_opcode</code></li>
 <li><code>translated</code></li>
 <li><code>decompiled</code></li>
+<li><code>serialized</code></li>
 </ul>
 <p>For example, to export both V8 opcodes and decompiled code side by side:</p>
-<pre><code>python view8.py input_file output_file -e v8_opcode decompiled</code></pre>
+<pre><code>python view8.py -i input_file -o output_file -e v8_opcode decompiled</code></pre>
 <p>By default, the format used is <code>decompiled</code>.</p>

 <h3>VersionDetector.exe</h3>
@@ -52,3 +63,4 @@
 <li><code>-d</code>: Retrieves a hash (little-endian) and returns its corresponding version using brute force.</li>
 <li><code>-f</code>: Retrieves a file and returns its version.</li>
 </ul>
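Note on the README changes above: the move from positional arguments to `--inp`/`--out` plus an `--input_format` choice could be expressed with `argparse` roughly as follows. This is an illustrative sketch only, not the parser in `view8.py`. The flag names and choices come from the README text; making `--out` required and defaulting `--input_format` to `raw` are assumptions, and the remaining options (`--path`, `--tree`, and so on) are omitted.

```python
import argparse


def build_parser():
    """Hypothetical parser covering four of the options listed in the README."""
    parser = argparse.ArgumentParser(prog="view8.py")
    parser.add_argument("--inp", "-i", required=True, help="The input file name")
    # Assumption: the README does not say whether --out is required.
    parser.add_argument("--out", "-o", required=True, help="Path to the output")
    # Assumption: "raw" as the default input format.
    parser.add_argument("--input_format", "-f", default="raw",
                        choices=["raw", "disassembled", "serialized"])
    # Several export formats may be combined, so accept one or more values.
    parser.add_argument("--export_format", "-e", nargs="+", default=["decompiled"],
                        choices=["v8_opcode", "translated", "decompiled", "serialized"])
    return parser


args = build_parser().parse_args(
    ["-i", "in.jsc", "-o", "out.js", "-e", "v8_opcode", "decompiled"]
)
```

Parsing the example command line gives `args.export_format == ["v8_opcode", "decompiled"]` while `args.input_format` falls back to the assumed default.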
