Unveiling the Hidden Payload in resnet18.pth: A Forensic Model Analysis

Challenge: Analyze a corrupted magical machine learning model (resnet18.pth) from the Library of Loria, tampered with by Malakar's followers. Uncover the hidden payload and extract the flag to dispel the dark magic. The flag format is HTB{Flag_Goes_Here}.

Workflow: We investigate resnet18.pth with a forensic approach: file format analysis first, then code extraction, and finally payload recovery using steganographic techniques.

Step-by-Step Analysis:

  1. Initial File Inspection and Format Identification:

    • Purpose: Determine the file type and basic characteristics of resnet18.pth.

    • Action: Use a Python script to read the first few bytes of the file and check for known magic numbers.

      import binascii
      
      file_path = 'resnet18.pth'
      
      with open(file_path, 'rb') as f:
          header = f.read(4)
          print(f"[+] First 4 bytes of file (hex): {binascii.hexlify(header)}")
      
      
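    • Observation: The output reveals the header 504b0304, which is the magic number for a ZIP archive. This is expected rather than suspicious: since PyTorch 1.6, torch.save writes checkpoints as ZIP archives by default. The useful consequence is that resnet18.pth can be opened with ordinary ZIP tooling and inspected without ever loading it through PyTorch.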
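    • Cross-check (sketch): The manual header read can be confirmed with the standard library. The helper name identify_container and its return labels are hypothetical, chosen only for illustration; zipfile.is_zipfile is the real standard-library call. The "pickle" branch assumes a stream written with pickle protocol 2 or later, which starts with the PROTO opcode 0x80.

```python
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature of a ZIP archive


def identify_container(path):
    """Return a short label for the container format (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(4)
    # Require both the magic number and a readable ZIP central directory.
    if header == ZIP_MAGIC and zipfile.is_zipfile(path):
        return "zip"
    # Pickle protocol 2+ streams begin with the PROTO opcode (0x80).
    if header[:1] == b"\x80":
        return "pickle"
    return "unknown"
```

      Calling identify_container('resnet18.pth') should agree with the hex dump above if the file is a well-formed archive.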

  2. Analyzing the File as a ZIP Archive:

    • Purpose: Explore the contents of the ZIP archive to understand its internal structure.

    • Action: Use Python's zipfile module to list the files within the archive and inspect their names and sizes.

      import binascii
      import zipfile
      
      file_path = 'resnet18.pth'
      
      with zipfile.ZipFile(file_path, 'r') as zf:
          print("[+] Files in ZIP archive:")
          zf.printdir()
          file_list = zf.namelist()
          for file_name in file_list:
              print(f"[*] Reading {file_name}...")
              with zf.open(file_name, 'r') as f:
                  header = f.read(16) # Read first 16 bytes as header example
                  print(f"    First bytes (hex): {binascii.hexlify(header)}")
      
      
    • Observation: The archive contains a top-level directory named resnet18 holding data.pkl, several files named data/XX (where XX is a number), and version. This is the standard layout of a saved PyTorch model: data.pkl is the pickled object structure, each data/XX file holds the raw bytes of one tensor storage, and version records the format version. Because data.pkl is a pickle, it can carry code as well as data, which makes it the first place to look.
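    • Layout summary (sketch): The listing above can be condensed into a count of members by role. The function name summarize_members and the role labels are hypothetical; the classification rules simply restate the layout described in the observation.

```python
import zipfile
from collections import Counter


def summarize_members(zf):
    """Count archive members by their role in a PyTorch-style checkpoint."""
    roles = Counter()
    for info in zf.infolist():
        parts = info.filename.split("/")
        if info.is_dir():
            role = "directory"
        elif parts[-1] == "data.pkl":
            role = "pickle"            # pickled object structure
        elif len(parts) >= 2 and parts[-2] == "data":
            role = "tensor storage"    # raw bytes of one storage
        elif parts[-1] == "version":
            role = "version"
        else:
            role = "other"             # anything unexpected deserves a look
        roles[role] += 1
    return roles
```

      Usage: open the file with zipfile.ZipFile('resnet18.pth') and pass the handle to summarize_members; any member reported as "other" falls outside the expected layout.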

  3. Extracting and Analyzing data.pkl for Code:

    • Purpose: Examine data.pkl for potentially malicious or interesting Python code, given the challenge description about embedded enchantments.

    • Action: Extract the content of data.pkl from the ZIP archive and analyze it as raw data and as a potential string.

      import zipfile
      
      file_path = 'resnet18.pth'
      pickle_file_path = 'resnet18/data.pkl'
      
      with zipfile.ZipFile(file_path, 'r') as zf:
          with zf.open(pickle_file_path, 'r') as pkl_file:
              pickle_content = pkl_file.read()
      
      print("[+] Analyzing pickle data in detail...")
      print(f"\n[+] Raw content (first 500 bytes):\n{pickle_content[:500]}")
      
      # Pickle streams are binary, so a strict UTF-8 decode would fail.
      # Replace undecodable bytes so that embedded text stays readable.
      content_str = pickle_content.decode('utf-8', errors='replace')
      print(f"\n[+] Content as string:\n{content_str[:500]}...") # Print first 500 chars
      
      
    • Observation: By inspecting the decoded string output, we identify Python code within the pickle data. Crucially, we find a function definition: def stego_decode(tensor, n=3): .... This function, along with imports like struct, hashlib, and numpy, strongly suggests that steganography is employed to hide a payload within the model's data.
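    • Safer inspection (sketch): Reading the pickle as text works here, but the opcode stream can also be walked without executing anything. The sketch below uses pickletools.genops from the standard library to list the module/name pairs the pickle would import on load. The function name list_pickle_imports is hypothetical, and the STACK_GLOBAL handling is a heuristic that assumes the two most recent string opcodes are the module and the name; it can miss names fetched from the memo.

```python
import pickletools


def list_pickle_imports(pickle_bytes):
    """List (module, name) pairs referenced by the pickle, without unpickling."""
    imports = []
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is "module name" joined by a single space.
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name were pushed as the last two strings.
            if len(strings) >= 2:
                imports.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return imports
```

      Usage: pass the pickle_content bytes read above. Entries such as builtins functions that evaluate or execute code are the ones to examine first.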

  4. Implementing the stego_decode Function and Extracting Data:

    • Purpose: Replicate the stego_decode function in a script to extract the hidden message from the model's tensors.

    • Action: Create a Python script that implements the stego_decode function exactly as defined in data.pkl. Then, iterate through the data/XX files within the ZIP archive, treating their contents as tensors, and apply the stego_decode function.

      import zipfile
      import struct
      import hashlib
      import numpy
      
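      def stego_decode(tensor_bytes, n=3): # Modified to accept bytes directly
          # Keep a whole number of float32 values; frombuffer rejects other sizes
          usable = len(tensor_bytes) - len(tensor_bytes) % 4
          if usable == 0:
              return None # Nothing to decode (e.g. a directory entry)
          tensor = numpy.frombuffer(tensor_bytes[:usable], dtype=numpy.float32)
          bits = numpy.unpackbits(tensor.view(dtype=numpy.uint8))
          # Collect the n lowest bits of the first byte of every 32-bit element
          payload = numpy.packbits(numpy.concatenate([numpy.vstack(tuple([bits[i::tensor.dtype.itemsize * 8] for i in range(8-n, 8)])).ravel("F")])).tobytes()
          if len(payload) < 68:
              return None # Too small to hold the 68-byte header
          # Header: 4-byte length followed by a 64-character SHA-256 hex digest
          (size, checksum_bytes) = struct.unpack("i 64s", payload[:68])
          if size <= 0:
              return None # Implausible length, so no message here
          checksum = checksum_bytes.rstrip(b'\x00') # Remove padding from checksum
          message = payload[68:68+size]
          calculated_checksum = hashlib.sha256(message).hexdigest().encode()
      
          if calculated_checksum == checksum:
              return message.decode('utf-8', errors='ignore')
          else:
              return None # Checksum mismatch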
      
      file_path = 'resnet18.pth'
      flag = None
      
      with zipfile.ZipFile(file_path, 'r') as zf:
          # Keep only the tensor storage files and skip any directory entries
          data_files = [f for f in zf.namelist() if f.startswith('resnet18/data/') and not f.endswith('/')]
          for data_file in data_files:
              print(f"[*] Processing {data_file} (size: {zf.getinfo(data_file).file_size})")
              with zf.open(data_file, 'r') as tensor_file:
                  tensor_data = tensor_file.read()
                  extracted_message = stego_decode(tensor_data)
                  if extracted_message:
                      print(f"[!] Found flag in {data_file}: {extracted_message}")
                      flag = extracted_message
                      break # Stop after finding the first flag
      
      if flag:
          print(f"\n[+] Extracted Flag: {flag}")
      else:
          print("\n[!] Flag not found in any data file.")
      
    • Important Modification: The original stego_decode was designed to work with PyTorch tensors. Because this script runs without PyTorch, we modify it to accept raw bytes (tensor_bytes) and build a NumPy array from them with numpy.frombuffer. The script also performs the checksum verification as implemented in the stego_decode function, so a message is reported only when its SHA-256 hex digest matches the one stored in the header. We then iterate through the data/XX files and attempt to decode each one.
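    • How the hiding works (sketch): The decoder reads the n lowest bits of the first byte of each float32 value. A round trip on synthetic data makes that layout concrete. The encoder below is not taken from the challenge; it is an inverse inferred from the decoder, and the names lsb_embed and lsb_extract, the random carrier, and the placeholder message are all hypothetical. The sketch assumes a little-endian host, where the first byte of a float32 holds the least significant mantissa bits.

```python
import hashlib
import struct

import numpy as np

HEADER = struct.Struct("i 64s")  # 4-byte length + 64-char SHA-256 hex digest


def lsb_embed(weights, message, n=3):
    """Hide message in the n low bits of the first byte of each float32."""
    digest = hashlib.sha256(message).hexdigest().encode()
    payload = HEADER.pack(len(message), digest) + message
    payload_bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    # Pad with zeros so the bits split into whole groups of n.
    payload_bits = np.pad(payload_bits, (0, -len(payload_bits) % n))
    groups = payload_bits.reshape(-1, n)
    if len(groups) > weights.size:
        raise ValueError("carrier tensor is too small for this message")
    bits = np.unpackbits(weights.view(np.uint8)).reshape(-1, 32)
    bits[: len(groups), 8 - n : 8] = groups  # overwrite only the low bits
    return np.packbits(bits).view(np.float32)


def lsb_extract(weights, n=3):
    """Recover a message embedded by lsb_embed, or None if the check fails."""
    bits = np.unpackbits(weights.view(np.uint8)).reshape(-1, 32)
    payload = np.packbits(bits[:, 8 - n : 8].ravel()).tobytes()
    if len(payload) < HEADER.size:
        return None
    size, checksum = HEADER.unpack(payload[: HEADER.size])
    if size < 0:
        return None
    message = payload[HEADER.size : HEADER.size + size]
    if hashlib.sha256(message).hexdigest().encode() != checksum:
        return None
    return message


# Round trip on a synthetic carrier (placeholder message, not the real flag).
carrier = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
stego = lsb_embed(carrier, b"HTB{example_payload}")
recovered = lsb_extract(stego)
```

      Only the lowest mantissa bits change, so the modified weights stay numerically very close to the originals, which is why a tampered model can still behave normally.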

  5. Locating and Extracting the Flag:

    • Purpose: Run the script from step 4 (saved here as stego_extract.py) to process the model's data files and extract the hidden flag.
    • Action: Execute stego_extract.py with the Python interpreter, from the directory that contains resnet18.pth.
    • Observation: The script processes each data/XX file. Upon processing resnet18/data/0, the script outputs: [!] Found flag in resnet18/data/0: HTB{n3v3r_tru5t_p1ckl3_m0d3ls}. The flag is successfully extracted!

Conclusion:

By treating resnet18.pth as a ZIP archive, we uncovered a pickled Python payload containing a steganography decoding function. Implementing this stego_decode function and applying it to the model's data tensors, specifically resnet18/data/0, revealed the flag: HTB{n3v3r_tru5t_p1ckl3_m0d3ls}. This challenge highlights the security risks of loading untrusted model files: the pickle inside a checkpoint can carry executable code, and the model weights themselves can conceal data through techniques like steganography.
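
The pickle risk can be illustrated with a restricted unpickler, following the "restricting globals" pattern from the Python pickle documentation. This is a minimal sketch of the principle, not a way to load real PyTorch checkpoints (which need specific globals and persistent-ID handling); the class name NoGlobalsUnpickler and the helper safe_loads are hypothetical.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every import, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked import of {module}.{name}")


def safe_loads(data):
    """Load a pickle made only of plain containers and primitives."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Plain data such as dictionaries and lists still loads, while any pickle that references a function or class fails before it can run.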

Flag: HTB{n3v3r_tru5t_p1ckl3_m0d3ls}