Unveiling the Hidden Payload in resnet18.pth: A Forensic Model Analysis
Challenge: Analyze a corrupted magical machine learning model (`resnet18.pth`) from the Library of Loria, tampered with by Malakar's followers. Uncover the hidden payload and extract the flag to dispel the dark magic. The flag format is `HTB{Flag_Goes_Here}`.
Workflow: We will investigate `resnet18.pth` using a forensic approach, starting with file format analysis, progressing to code extraction, and finishing with payload recovery using steganographic techniques.
Step-by-Step Analysis:
- Initial File Inspection and Format Identification:
  - Purpose: Determine the file type and basic characteristics of `resnet18.pth`.
  - Action: Use a Python script to read the first few bytes of the file and check for known magic numbers.

    ```python
    import binascii

    file_path = 'resnet18.pth'

    # Most file formats store their magic number in the first few bytes.
    with open(file_path, 'rb') as f:
        header = f.read(4)

    print(f"[+] First 4 bytes of file (hex): {binascii.hexlify(header)}")
    ```

  - Observation: The output reveals the header `504b0304`, which is the magic number for a ZIP archive. This is expected rather than suspicious in itself: since PyTorch 1.6, `torch.save` writes a ZIP archive by default, so `resnet18.pth` can be unpacked and examined with ordinary ZIP tooling instead of being loaded.
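The magic-number check generalizes to a small lookup table. The following is a minimal sketch of that idea; the `identify` helper and the `MAGIC_NUMBERS` table are illustrative names, not part of the challenge files, and the sample archive is built in memory so the snippet runs without `resnet18.pth`.

```python
import io
import zipfile

# Illustrative signature table; only a few well-known formats are listed.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b": "gzip stream",
    b"\x89PNG": "PNG image",
}


def identify(data: bytes) -> str:
    """Guess a file format from its leading bytes (hypothetical helper)."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


# Build a tiny stand-in ZIP in memory; the entry name and content are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("resnet18/version", "3\n")
sample = buf.getvalue()

print(sample[:4].hex())                        # leading bytes as hex
print(identify(sample))                        # lookup-table verdict
print(zipfile.is_zipfile(io.BytesIO(sample)))  # stdlib cross-check
```

`zipfile.is_zipfile` is a useful cross-check because it validates the archive structure rather than only the first four bytes.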
- Analyzing the File as a ZIP Archive:
  - Purpose: Explore the contents of the ZIP archive to understand its internal structure.
  - Action: Use Python's `zipfile` module to list the files within the archive and inspect their names and sizes.

    ```python
    import binascii
    import zipfile

    file_path = 'resnet18.pth'

    with zipfile.ZipFile(file_path, 'r') as zf:
        print("[+] Files in ZIP archive:")
        zf.printdir()  # Prints the name, modification time, and size of each entry

        for file_name in zf.namelist():
            print(f"[*] Reading {file_name}...")
            with zf.open(file_name, 'r') as f:
                header = f.read(16)  # Read the first 16 bytes as a sample header
                print(f"    First bytes (hex): {binascii.hexlify(header)}")
    ```

  - Observation: The ZIP archive contains a directory named `resnet18` and files within it, including `data.pkl`, several files named `data/XX` (where XX are numbers), and `version`. This structure resembles a saved PyTorch model, where weights and potentially code are serialized. The `data.pkl` file is a strong indicator of pickled Python objects.
- Extracting and Analyzing `data.pkl` for Code:
  - Purpose: Examine `data.pkl` for potentially malicious or interesting Python code, given the challenge description about embedded enchantments.
  - Action: Extract the content of `data.pkl` from the ZIP archive and analyze it as raw data and as a potential string. The file is only read, never unpickled.

    ```python
    import zipfile

    file_path = 'resnet18.pth'
    pickle_file_path = 'resnet18/data.pkl'

    # Read the pickle as raw bytes; loading it could execute embedded code.
    with zipfile.ZipFile(file_path, 'r') as zf:
        with zf.open(pickle_file_path, 'r') as pkl_file:
            pickle_content = pkl_file.read()

    print("[+] Analyzing pickle data in detail...")
    print(f"\n[+] Raw content (first 500 bytes):\n{pickle_content[:500]}")

    # A binary pickle begins with byte 0x80, which is not valid UTF-8, so a
    # strict decode fails. Decode leniently and let undecodable bytes appear
    # as replacement characters.
    content_str = pickle_content.decode('utf-8', errors='replace')
    print(f"\n[+] Content as string:\n{content_str[:500]}...")  # Print first 500 chars
    ```

  - Observation: By inspecting the output, we identify Python code embedded within the pickle data. Crucially, we find a function definition: `def stego_decode(tensor, n=3): ...`. This function, along with imports like `struct`, `hashlib`, and `numpy`, strongly suggests that steganography is employed to hide a payload within the model's data.
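Besides printing raw bytes, the pickle opcode stream can be walked without executing anything, using the standard library's `pickletools.genops`. The sketch below runs on a stand-in pickle built in memory: the `Payload` class and the embedded source string are made up for illustration, and the `scan` helper is a hypothetical name. For the real file, the bytes read from `resnet18/data.pkl` would be passed to `scan` instead.

```python
import pickle
import pickletools


class Payload:
    """Stand-in object whose pickle would call exec when loaded (illustrative)."""

    def __reduce__(self):
        return (exec, ("def stego_decode(tensor, n=3): pass",))


# Serialize only; the blob is never unpickled in this sketch.
# Protocol 2 is used so that imports appear as GLOBAL opcodes
# (protocol 4 and later emit STACK_GLOBAL instead).
blob = pickle.dumps(Payload(), protocol=2)


def scan(pickle_bytes):
    """Collect imported globals and longer string constants without executing."""
    imports, strings = [], []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name == "GLOBAL":
            imports.append(arg)   # "module name" as one space-separated string
        elif isinstance(arg, str) and len(arg) > 20:
            strings.append(arg)   # candidate embedded source code
    return imports, strings


imports, strings = scan(blob)
print("[+] Imported globals:", imports)
print("[+] Long string constants:", strings)
```

A global such as `exec` or `eval` next to a long string constant is a strong hint that the pickle carries source code meant to run at load time.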
- Implementing the `stego_decode` Function and Extracting Data:
  - Purpose: Replicate the `stego_decode` function in a script to extract the hidden message from the model's tensors.
  - Action: Create a Python script that implements the same logic as the `stego_decode` function defined in `data.pkl`. Then, iterate through the `data/XX` files within the ZIP archive, treating their contents as tensors, and apply the `stego_decode` function.

    ```python
    import hashlib
    import struct
    import zipfile

    import numpy


    def stego_decode(tensor_bytes, n=3):
        # Modified to accept bytes directly: build a float32 array from them.
        tensor = numpy.frombuffer(tensor_bytes, dtype=numpy.float32)
        bits = numpy.unpackbits(tensor.view(dtype=numpy.uint8))
        # Take the last n bits of the first byte of every element (the least
        # significant mantissa bits on little-endian machines) and repack them.
        payload = numpy.packbits(numpy.concatenate([numpy.vstack(tuple([bits[i::tensor.dtype.itemsize * 8] for i in range(8-n, 8)])).ravel("F")])).tobytes()
        if len(payload) < 68:
            return None  # Too small to hold the 68-byte header
        # Header layout: 4-byte message size, then a 64-byte checksum field.
        (size, checksum_bytes) = struct.unpack("i 64s", payload[:68])
        checksum = checksum_bytes.rstrip(b'\x00')  # Remove padding from checksum
        message = payload[68:68 + size]
        calculated_checksum = hashlib.sha256(message).hexdigest().encode()[:64]
        if calculated_checksum == checksum:
            return message.decode('utf-8', errors='ignore')
        return None  # Checksum mismatch


    file_path = 'resnet18.pth'
    flag = None

    with zipfile.ZipFile(file_path, 'r') as zf:
        # Tensor storages live under resnet18/data/; data.pkl sits beside that directory.
        data_files = [f for f in zf.namelist() if f.startswith('resnet18/data/')]
        for data_file in data_files:
            print(f"[*] Processing {data_file} (size: {zf.getinfo(data_file).file_size})")
            with zf.open(data_file, 'r') as tensor_file:
                tensor_data = tensor_file.read()
            extracted_message = stego_decode(tensor_data)
            if extracted_message:
                print(f"[!] Found flag in {data_file}: {extracted_message}")
                flag = extracted_message
                break  # Stop after finding the first flag

    if flag:
        print(f"\n[+] Extracted Flag: {flag}")
    else:
        print("\n[!] Flag not found in any data file.")
    ```

  - Important Modification: The original `stego_decode` was designed to work with PyTorch tensors. Since we are running this script independently, we modify it to accept raw bytes (`tensor_bytes`) and create a NumPy array from them using `numpy.frombuffer`. The script also verifies the checksum stored in the payload header before accepting a message, and skips files too small to hold that header. We iterate through the `data/XX` files and attempt to decode each one.
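To see why the low-order bits are a workable hiding place, here is a minimal round-trip sketch of least-significant-bit embedding in `float32` values. It is an assumption-laden illustration, not the encoder used in the challenge: `embed` and `extract` are hypothetical helpers, the weights are random, the message is made up, and the size-plus-checksum header from `stego_decode` is omitted. The bit order (low `n` bits of each element, most significant first) mirrors the decoder above.

```python
import numpy as np


def embed(weights: np.ndarray, data: bytes, n: int = 3) -> np.ndarray:
    """Overwrite the n lowest bits of each float32 with message bits (hypothetical)."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    bits = np.pad(bits, (0, -len(bits) % n))            # pad to a multiple of n
    # Turn each group of n bits into an integer, most significant bit first.
    values = bits.reshape(-1, n).dot(1 << np.arange(n - 1, -1, -1)).astype(np.uint32)
    raw = weights.astype("<f4").view("<u4")             # astype returns a copy
    if len(values) > raw.size:
        raise ValueError("message does not fit into the tensor")
    mask = np.uint32((1 << n) - 1)
    raw[:len(values)] = (raw[:len(values)] & ~mask) | values
    return raw.view("<f4")


def extract(weights: np.ndarray, length: int, n: int = 3) -> bytes:
    """Read back `length` bytes from the n lowest bits of each float32."""
    raw = weights.astype("<f4").view("<u4")
    low = raw & np.uint32((1 << n) - 1)
    # Expand each n-bit value into individual bits, most significant first.
    bits = ((low[:, None] >> np.arange(n - 1, -1, -1)) & 1).astype(np.uint8).ravel()
    return np.packbits(bits)[:length].tobytes()


weights = np.random.default_rng(0).standard_normal(64).astype(np.float32)
secret = b"hidden message"                              # made-up payload
stego = embed(weights, secret)

print(extract(stego, len(secret)))
print(float(np.max(np.abs(stego - weights))))           # size of the perturbation
```

With `n=3`, each weight changes by at most 7 units in the last place, which is why a tampered model can behave almost identically to the original while carrying a payload.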
- Locating and Extracting the Flag:
  - Purpose: Run the `stego_extract.py` script to process the model's data files and extract the hidden flag.
  - Action: Execute the `stego_extract.py` script, i.e. the extraction script from the previous step.
  - Observation: The script processes the `data/XX` files in order. Upon processing `resnet18/data/0`, the script outputs: `[!] Found flag in resnet18/data/0: HTB{n3v3r_tru5t_p1ckl3_m0d3ls}`. The flag is successfully extracted!
Conclusion:
By treating `resnet18.pth` as a ZIP archive, we uncovered a pickled Python payload containing a steganography decoding function. Implementing and applying this `stego_decode` function to the model's data tensors, specifically `resnet18/data/0`, revealed the flag: `HTB{n3v3r_tru5t_p1ckl3_m0d3ls}`. This challenge highlights the significant security risks associated with loading untrusted model files: the pickle can carry executable code, and the model weights themselves can hide data through techniques like steganography.
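One common mitigation for the pickle side of this risk is an allow-list unpickler that refuses to import anything unexpected. The sketch below follows the "restricting globals" pattern from the Python `pickle` documentation; the `ALLOWED` set, the `safe_loads` helper, and the `Evil` class are illustrative assumptions, and a real model would need a longer allow-list. PyTorch's `torch.load` offers a `weights_only=True` option in the same spirit.

```python
import io
import pickle
from collections import OrderedDict

# Illustrative allow-list; only this single global may be imported.
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that rejects every global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Hypothetical helper mirroring pickle.loads with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Evil:
    """Stand-in for a tampered object that would call eval on load."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


benign = pickle.dumps(OrderedDict(layer=1))
malicious = pickle.dumps(Evil())

print(safe_loads(benign))
try:
    safe_loads(malicious)
except pickle.UnpicklingError as err:
    print(f"[!] Refused to load: {err}")
```

An allow-list only addresses code execution. Data hidden in the weights, as in this challenge, passes through untouched and has to be detected by other means.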
Flag: HTB{n3v3r_tru5t_p1ckl3_m0d3ls}