The Challenge:

Words carry semantic information. Similar to how people can infer meaning based on a word's context, AI can derive representations for words based on their context too! However, the kinds of meaning that a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and starts with 'htb{some_text}'.


Ever wondered how AI understands metaphors and analogies? This Hack The Box challenge threw me into a linguistic maze filled with strange word pairs and metaphorical riddles. The twist? It had to be solved using GloVe Twitter embeddings.


Each line follows the analogy format:

A is to B, as C is to ?

These were weird combinations, mixing English, Unicode characters, emojis, and foreign scripts. We’re told that the embedding model in use is: glove-twitter-25
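Parsing one such line can be sketched with a regex. The sample line below is my own stand-in (the real file mixes emojis and non-Latin scripts), and the exact "Like A is to B, C is to?" layout is an assumption about the challenge file:

```python
import re

# Hypothetical sample line in the challenge's format; "fire"/"water"
# are stand-ins for the real mixed-script tokens.
line = "Like fire is to 🔥, water is to?"

# Lazy (.+?) groups keep words containing "is to" from swallowing the
# literal separators between the three slots.
m = re.search(r"Like (.+?) is to (.+?), (.+?) is to\?", line)
print(m.groups())  # ('fire', '🔥', 'water')
```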

Goal:

Infer the missing fourth term using word embeddings, then assemble the final flag, which must be fully ASCII and take the form htb{...}.
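The core trick is standard embedding analogy arithmetic: the vector B − A + C lands near the word that completes the analogy. A toy sketch with made-up 3-d vectors (not real GloVe values) illustrates the idea:

```python
import numpy as np

# Toy 3-d "embeddings" (invented, not real GloVe vectors) for the classic
# king - man + woman ≈ queen analogy.
vecs = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([0.0, 0.2, 0.1]),  # distractor
}

# B - A + C: the analogy vector the solver script computes per line.
target = vecs["king"] - vecs["man"] + vecs["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearest neighbour of the target (excluding the three input words).
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vecs[w], target))
print(best)  # queen
```

With real glove-twitter-25 vectors the nearest neighbour is found the same way, just over the full vocabulary (which is what gensim's `most_similar` does).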

Tools & Setup

  • Model: glove-twitter-25
  • Library: gensim
  • Input: challenge.txt (a list of analogies)
  • Output: flag.txt (the inferred flag characters)
import re
from gensim.models import KeyedVectors

def load_glove_model():
    model_path = "glove.twitter.27B/glove.twitter.27B.25d.txt"
    model = KeyedVectors.load_word2vec_format(model_path, binary=False, no_header=True)
    return model

def parse_challenge(file_path, model):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    results = []
    flag_characters = []

    for i, line in enumerate(lines):        
        # Lazy groups first; fall back to greedy groups in case a word
        # itself contains " is to " or ", "
        match = re.search(r"Like (.+?) is to (.+?), (.+?) is to\?", line.strip())
        if not match:
            match = re.search(r"Like (.+) is to (.+), (.+) is to\?", line.strip())
        if not match:
            continue

        key, value, query = match.groups()
        key = key.strip()
        value = value.strip()
        query = query.strip()
        print(f"Extracted: '{key}' -> '{value}', '{query}' -> ?")       
        try:
            # Skip lines containing tokens the model has no vector for
            missing_words = [w for w in (key, value, query) if w not in model]
            if missing_words:
                print(f"Skipping due to missing words: {missing_words}")
                continue
            # Analogy arithmetic: value - key + query (i.e. B - A + C)
            result_vector = model[value] - model[key] + model[query]
            closest_word = model.most_similar(positive=[result_vector], topn=1)[0][0]

            print(f"Closest match for '{query}' is '{closest_word}'")

            flag_characters.append((i, query, closest_word))
        except KeyError as e:
            print(f"Error: {e}")
            continue


    mapped_chars = [closest for (_, _, closest) in flag_characters]

    potential_flag = ''.join(mapped_chars)
    print(f"Potential flag sequence: {potential_flag}")

    normalized_flag = potential_flag
    # Fold fullwidth digits (U+FF10-U+FF19) back to their ASCII counterparts
    replacements = {
        '０': '0', '１': '1', '２': '2', '３': '3', '４': '4',
        '５': '5', '６': '6', '７': '7', '８': '8', '９': '9'
    }

    for non_ascii, ascii_char in replacements.items():
        normalized_flag = normalized_flag.replace(non_ascii, ascii_char)

    print(f"Normalized flag: {normalized_flag}")
    return normalized_flag

if __name__ == "__main__":
    challenge_file = "challenge.txt"
    model = load_glove_model()
    flag_sequence = parse_challenge(challenge_file, model)
    print("FINAL FLAG:")
    print(flag_sequence)

    # Create a clean output file with just the flag
    with open('flag.txt', 'w') as flag_file:
        flag_file.write(flag_sequence)
    print("Flag has been saved to flag.txt")
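If the nearest-neighbour tokens come back as fullwidth or other Unicode compatibility characters, the hand-written replacement table can also be swapped for NFKC normalization, which folds whole classes of lookalike characters to ASCII in one call. A minimal sketch (the fullwidth sample string is my own, not taken from the challenge):

```python
import unicodedata

# Fullwidth forms (e.g. the digits U+FF10-U+FF19) carry compatibility
# decompositions to their ASCII counterparts, so NFKC folds them in bulk.
print(unicodedata.normalize("NFKC", "ｈｔｂ｛１２３｝"))  # htb{123}
```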

Steps to Run:

1) Place the GloVe files in "glove.twitter.27B/" (the directory name the script expects).
2) Run "python main.py" to process "challenge.txt".
3) The resulting flag is written to "flag.txt".

This challenge was a fun mix of NLP, embeddings, and CTF logic. It’s not every day you have AIs “speaking in metaphors,” and it was fascinating to reverse-engineer that conversation!

Let me know if you faced the same challenge — I’d love to compare notes!

Github Link for solution: Github Link