Unveiling the Hidden Enchantment in malicious.h5: A Detailed Analysis

Challenge: Investigate a mysterious magical artifact (malicious.h5) exhibiting unusual behavior to uncover its secrets. The flag format is HTB{FlagGoesHere}.

Workflow: We systematically inspect the malicious.h5 file with h5py and the Netron model visualizer to understand its structure and locate any hidden or malicious code.

Step-by-Step Analysis:

  1. Initial Inspection with h5py:

    • Purpose: Start by understanding the basic structure of the malicious.h5 file. H5 files are hierarchical, and h5py allows us to navigate this structure programmatically. We want to see what groups and datasets are present.

    • Action: Use a Python script with h5py to print the names of all groups and datasets within the file.

      python -c "import h5py; f = h5py.File('malicious.h5', 'r'); f.visititems(lambda name, obj: print(name))"
      
    • Observation: Running this script reveals a typical structure for a Keras/TensorFlow model, with groups like model_weights and layers like conv2d_1, batch_normalization_1, etc. However, amidst these standard layers, we notice an unusual layer named hyperDense. This non-standard name immediately raises suspicion.

  2. Visualizing the Model with Netron:

    • Purpose: A visual representation of the model architecture often provides a quicker and more intuitive understanding than just text output. Netron is a web-based tool that excels at visualizing neural network models.

    • Action: Upload the malicious.h5 file to https://netron.app/.

    • Observation (Crucial Insight): Netron visually renders the model graph. Navigating through the layers, we locate the hyperDense layer. Inspecting its properties in Netron reveals that:

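      • It is a Lambda layer. This is significant because Keras stores a Lambda layer's function as serialized Python bytecode and rebuilds it when the model is loaded, so a model file can carry arbitrary code that executes whenever the layer is called, which may already happen while the model is being loaded.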
      • It has two associated Lambda functions: one for the main function and one for the output shape function.
      • Crucially, both Lambda function configurations contain base64 encoded strings under the "code" parameter. This is a major red flag, strongly suggesting hidden code within the model.

      [Figure: Netron with the hyperDense layer selected, showing the base64 encoded code in the Lambda function configuration.]

  3. Examining the hyperDense Layer Configuration Programmatically:

    • Purpose: While Netron visually identified the base64 encoded code, we need to extract this code programmatically for further analysis. We'll use h5py again to access the model_config attribute of the H5 file, which contains the model's JSON configuration, including the Lambda layer details.

    • Action: Use a Python script to read the model_config attribute and parse the JSON to extract the base64 encoded code from the hyperDense layer's Lambda function configuration.

      import h5py
      import json
      
      def extract_lambda_code(file_path):
          """Return the base64 code strings of the hyperDense Lambda layer, or (None, None)."""
          with h5py.File(file_path, 'r') as f:
              if 'model_config' not in f.attrs:
                  return None, None
              # h5py returns the attribute as bytes or str depending on how it was written
              model_config = f.attrs['model_config']
              if isinstance(model_config, bytes):
                  model_config_str = model_config.decode('utf-8', errors='ignore')
              else:
                  model_config_str = str(model_config)
      
          model_config_json = json.loads(model_config_str)
          for layer in model_config_json['config']['layers']:
              layer_config = layer['config']  # per-layer settings, including the layer name
              if layer_config['name'] == 'hyperDense':
                  # Each serialized lambda sits under its own nested 'config' key
                  lambda_code_b64 = layer_config['function']['config']['code']
                  output_shape_code_b64 = layer_config['output_shape']['config']['code']
      
                  print("hyperDense function code (base64):")
                  print(lambda_code_b64)
                  print("\nOutput shape function code (base64):")
                  print(output_shape_code_b64)
                  return lambda_code_b64, output_shape_code_b64
          return None, None
      
      lambda_code_b64, output_shape_code_b64 = extract_lambda_code('malicious.h5')
      
    • Observation: Running this script extracts the base64 encoded strings for both the hyperDense function and its output shape function, programmatically confirming what we saw in Netron.
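The script above looks for one known layer name. When the name is not known in advance, the same configuration can be scanned for every Lambda layer that carries embedded code. The following is a minimal sketch, not the challenge author's method: find_embedded_code is a hypothetical helper, and it assumes the serialized lambda is stored either as a nested dictionary with a "code" key (the layout used in this file) or as a list whose first item is the code string (a layout seen in older Keras versions). It takes the model_config JSON text, so it needs only the standard library.

```python
import json


def find_embedded_code(model_config_json):
    """Return (layer_name, field, base64_code) for each serialized lambda found.

    Assumed layouts (hypothetical helper, adjust to the file at hand):
      dict form: {"class_name": ..., "config": {"code": "<base64>"}}
      list form: ["<base64>", defaults, closure]
    """
    config = json.loads(model_config_json)
    inner = config.get("config", {})
    # Some older Sequential configs store the layer list directly under "config"
    layers = inner.get("layers", []) if isinstance(inner, dict) else (inner or [])

    hits = []
    for layer in layers:
        if layer.get("class_name") != "Lambda":
            continue
        layer_cfg = layer.get("config", {})
        for field in ("function", "output_shape"):
            entry = layer_cfg.get(field)
            code = None
            if isinstance(entry, dict):
                nested = entry.get("config")
                if isinstance(nested, dict):
                    code = nested.get("code")
            elif isinstance(entry, list) and entry and isinstance(entry[0], str):
                code = entry[0]
            if code:
                hits.append((layer_cfg.get("name"), field, code))
    return hits
```

Any hit is worth decoding and inspecting before the model is ever loaded with Keras itself.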

  4. Decoding the Base64 Encoded Lambda Function Code:

    • Purpose: Now that we have the base64 encoded code, the next step is to decode it and understand what it does. We expect a serialized (marshalled) Python code object, that is, bytecode together with its constants and names, because that is how Keras stores the function of a Lambda layer.

    • Action: Use Python's base64 module to decode the extracted string, then view the raw bytes as latin-1 text. latin-1 maps every byte to a character, so the decode cannot fail; UTF-8 is less suitable here because the payload is binary data rather than text.

      import base64
      
      lambda_code_b64 = "4wEAAAAAAAAAAAAAAAQAAAADAAAA8zYAAACXAGcAZAGiAXQBAAAAAAAAAAAAAGQCpgEAAKsBAAAA AAAAAAB8AGYDZAMZAAAAAAAAAAAAUwApBE4pGulIAAAA6VQAAADpQgAAAOl7AAAA6WsAAADpMwAA AOlyAAAA6TQAAADpUwAAAOlfAAAA6UwAAAByCQAAAOl5AAAAcgcAAAByCAAAAHILAAAA6TEAAADp bgAAAOlqAAAAcgcAAADpYwAAAOl0AAAAcg4AAADpMAAAAHIPAAAA6X0AAAD6JnByaW50KCdZb3Vy IG1vZGVsIGhhcyBiZWVuIGhpamFja2VkIScp6f////8pAdoEZXZhbCkB2gF4cwEAAAAg+h88aXB5 dGhvbi1pbnp1dC02OS0zMjhhYjc5ODJiNGY++gg8bGFtYmRhPnIaAAAADgAAAHM0AAAAgADwAgEJ SAHwAAEJSAHwAAEJSAHlCAzQDTXRCDbUCDbYCAnwCQUPBvAKAAcJ9AsFDwqAAPMAAAAA " # Replace with the actual base64 string extracted
      
      decoded_lambda_code_bytes = base64.b64decode(lambda_code_b64)
      decoded_lambda_code_ascii = decoded_lambda_code_bytes.decode('latin-1', errors='ignore') # or try 'utf-8'
      
      print("Decoded Lambda function (ASCII - latin-1):") # Or whichever encoding worked best
      print(decoded_lambda_code_ascii)
      
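    • Observation (Flag Discovery): The latin-1 output shows readable characters interspersed with binary data. The flag is stored as a tuple of integer character codes, so its letters appear one at a time, separated by marker bytes: H, T, B, {, k, 3, r, 4, S, _, L and so on. Read naively, the letters spell HTB{k3r4S_Lryrrr1njrctr0r}, but that is not the flag: each extra r is the marshal back-reference marker (the byte 0x72 followed by a four-byte index), which stands for a character code that already appeared earlier in the stream. Resolving those references (r 9 is 4, r 7 is 3, r 8 is r, r 11 is _, r 14 is 1, r 15 is n) gives HTB{k3r4S_L4y3r_1nj3ct10n}. The output also contains the string print('Your model has been hijacked!') and the name eval, which confirms the malicious nature of the hyperDense layer.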
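Resolving the back-references by hand is error-prone, and marshal.loads is only safe to rely on when the interpreter version matches the one that produced the payload. The sketch below is a deliberately small reader for the marshal stream that returns the constants of a code object without executing anything. It is a hedged illustration, not a full marshal implementation: _read and code_constants are hypothetical helper names, only the few type codes that occur before the constants are handled, and the field order of the code object header is assumed to be that of CPython 3.11, which matches the header bytes of this payload.

```python
import struct

FLAG_REF = 0x80  # high bit of a type byte: object is added to the reference table


def _read(buf, pos, refs):
    """Read one marshal object at buf[pos]; return (value, next_pos).

    Supports only None, int, bytes, short ASCII strings, small tuples
    and back-references; any other type code raises ValueError.
    """
    kind = chr(buf[pos] & 0x7F)
    flagged = bool(buf[pos] & FLAG_REF)
    pos += 1
    if kind == "N":                      # None
        return None, pos
    if kind == "r":                      # back-reference: index into refs
        (index,) = struct.unpack_from("<I", buf, pos)
        return refs[index], pos + 4
    if kind == "i":                      # 32-bit signed integer
        (value,) = struct.unpack_from("<i", buf, pos)
        pos += 4
    elif kind == "s":                    # bytes with a 4-byte length
        (size,) = struct.unpack_from("<I", buf, pos)
        value, pos = bytes(buf[pos + 4:pos + 4 + size]), pos + 4 + size
    elif kind in "zZ":                   # short ASCII string with a 1-byte length
        size = buf[pos]
        value, pos = buf[pos + 1:pos + 1 + size].decode("ascii"), pos + 1 + size
    elif kind == ")":                    # small tuple with a 1-byte item count
        count, pos = buf[pos], pos + 1
        slot = len(refs)
        if flagged:
            refs.append(None)            # slot is reserved before the items are read
        items = []
        for _ in range(count):
            item, pos = _read(buf, pos, refs)
            items.append(item)
        value = tuple(items)
        if flagged:
            refs[slot] = value
        return value, pos
    else:
        raise ValueError(f"unsupported marshal type {kind!r}")
    if flagged:
        refs.append(value)
    return value, pos


def code_constants(blob):
    """Return the constants tuple of a marshalled code object (3.11 layout assumed)."""
    if chr(blob[0] & 0x7F) != "c":
        raise ValueError("not a marshalled code object")
    refs = [None] if blob[0] & FLAG_REF else []   # the code object itself takes slot 0
    pos = 1 + 5 * 4   # skip argcount, posonlyargcount, kwonlyargcount, stacksize, flags
    _bytecode, pos = _read(blob, pos, refs)
    consts, _ = _read(blob, pos, refs)
    return consts
```

With the string from this step, code_constants(base64.b64decode(lambda_code_b64)) returns the constants tuple; the element that is a tuple of integers can then be turned into text with "".join(chr(c) for c in ...).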

  5. Flag Confirmation and Malicious Code Analysis:

    • Purpose: Verify the extracted flag and understand the malicious code's intent.
    • Action: Submit HTB{k3r4S_L4y3r_1nj3ct10n} to check that it is accepted as the flag for the challenge. Then analyze the decoded payload further, looking for recognizable strings and code-object structure, to understand the hijack mechanism. The constants include the string print('Your model has been hijacked!') and the names include eval, so the lambda evaluates that string when the layer runs. Here the payload only prints a message announcing the compromise, but the same mechanism could execute any Python code.
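Looking for recognizable strings in a partly binary payload can be automated in the spirit of the Unix strings utility. The following is a small sketch using only the standard library; printable_runs is a hypothetical helper name and the minimum length of 4 is an arbitrary default chosen for illustration.

```python
import re


def printable_runs(data, min_len=4):
    """Return every run of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Example on a tiny hand-made byte string (not the challenge payload)
print(printable_runs(b"\x00\x01eval\x02hi\x00print('x')\xff"))
```

Applied to the decoded Lambda bytes, this surfaces embedded strings and names quickly. Values stored as individual integers, such as the flag characters here, do not form runs and still need the constants to be read out of the code object.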

Conclusion:

Through a combination of structural inspection using h5py and visual analysis with Netron, we identified a suspicious hyperDense Lambda layer in the malicious.h5 file. Netron proved invaluable in quickly pinpointing the base64 encoded code within the Lambda layer's configuration. Decoding this base64 string and resolving the back-references in the serialized constants revealed the hidden flag, HTB{k3r4S_L4y3r_1nj3ct10n}, and confirmed the presence of code designed to hijack the model when it is loaded and run. This challenge highlights the security risks of loading untrusted machine learning models and the potential for embedding malicious payloads within model files.

Flag: HTB{k3r4S_L4y3r_1nj3ct10n}