The Challenge:
Words carry semantic information. Similar to how people can infer meaning from a word's context, AI can derive representations for words based on their context too! However, the kinds of meaning that a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and start with 'htb{some_text}'.
Ever wondered how AI understands metaphors and analogies? This Hack The Box challenge threw me into a linguistic maze filled with strange word pairs and metaphorical riddles. The twist? It had to be solved using GloVe Twitter embeddings.
Each line follows the analogy format:
A is to B, as C is to ?
These were weird combinations, mixing English, Unicode characters, emojis, and foreign scripts. We’re told that the embedding model in use is: glove-twitter-25
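Under the hood, analogies like these are solved with simple vector arithmetic: vector(B) - vector(A) + vector(C) should land near vector(D). Here's a minimal sketch using gensim's built-in downloader and the classic king/man/woman example; with only 25 dimensions the nearest neighbor can be noisy, so the top hit may differ from what larger models return:

import gensim.downloader as api

# Downloads the 25-dimensional Twitter GloVe vectors on first use.
model = api.load("glove-twitter-25")

# "man is to king as woman is to ?"  ->  king - man + woman
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # ideally something close to "queen"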
Goal:
Infer the missing fourth term for each analogy using word embeddings, then extract the final flag, which must be fully ASCII and follow the htb{...} format.
Tools & Setup
- Model: glove-twitter-25
- Library: gensim
- Input: challenge.txt (a list of analogies; format shown below)
- Output: flag.txt (the inferred flag characters)
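For reference, each line of challenge.txt follows the pattern the parser's regex expects: "Like A is to B, C is to ?". A hypothetical line (not taken from the actual challenge file) and the extraction it produces:

import re

# Hypothetical example line, purely to illustrate the expected format.
line = "Like hot is to cold, up is to ?"
match = re.search(r"Like (.+?) is to (.+?), (.+?) is to\s*\?", line)
print(match.groups())  # ('hot', 'cold', 'up')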
import re
from gensim.models import KeyedVectors

def load_glove_model():
    # Raw GloVe files lack the word2vec header line, hence no_header=True.
    model_path = "glove.twitter.27B/glove.twitter.27B.25d.txt"
    model = KeyedVectors.load_word2vec_format(model_path, binary=False, no_header=True)
    return model
def parse_challenge(file_path, model):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    flag_characters = []
    for i, line in enumerate(lines):
        # Lazy groups first; \s*\? tolerates an optional space before the '?'.
        match = re.search(r"Like (.+?) is to (.+?), (.+?) is to\s*\?", line.strip())
        if not match:
            # Greedy fallback for terms that themselves contain ", " or " is to ".
            match = re.search(r"Like (.+) is to (.+), (.+) is to\s*\?", line.strip())
        if not match:
            continue

        key, value, query = (part.strip() for part in match.groups())
        print(f"Extracted: '{key}' -> '{value}', '{query}' -> ?")

        try:
            # Skip analogies whose terms are missing from the vocabulary.
            missing_words = [word for word in (key, value, query) if word not in model]
            if missing_words:
                print(f"Skipping due to missing words: {missing_words}")
                continue

            # Analogy arithmetic: vector(B) - vector(A) + vector(C) ~ vector(D)
            result_vector = model[value] - model[key] + model[query]
            closest_word = model.most_similar(positive=[result_vector], topn=1)[0][0]
            print(f"Closest match for '{query}' is '{closest_word}'")
            flag_characters.append((i, query, closest_word))
        except KeyError as e:
            print(f"Error: {e}")
            continue
    # Join the inferred words/characters in line order to form the candidate flag.
    mapped_chars = [char[2] for char in flag_characters]
    potential_flag = ''.join(mapped_chars)
    print(f"Potential flag sequence: {potential_flag}")

    # Map fullwidth Unicode digits (U+FF10-U+FF19) to their ASCII counterparts
    # so the final flag is fully ASCII, as the challenge requires.
    normalized_flag = potential_flag
    replacements = {
        '０': '0', '１': '1', '２': '2', '３': '3', '４': '4',
        '５': '5', '６': '6', '７': '7', '８': '8', '９': '9'
    }
    for non_ascii, ascii_char in replacements.items():
        normalized_flag = normalized_flag.replace(non_ascii, ascii_char)

    print(f"Normalized flag: {normalized_flag}")
    return normalized_flag
if __name__ == "__main__":
    challenge_file = "challenge.txt"
    model = load_glove_model()
    flag_sequence = parse_challenge(challenge_file, model)

    print("FINAL FLAG:")
    print(flag_sequence)

    # Create a clean output file with just the flag
    with open('flag.txt', 'w') as flag_file:
        flag_file.write(flag_sequence)
    print("Flag has been saved to flag.txt")
Steps to Run:
1) Place the GloVe files in "glove.twitter.27B/".
2) Run "python main.py" to process "challenge.txt".
3) The resulting flag is written to "flag.txt".
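As a final sanity check, here is a small snippet to confirm the output meets the challenge constraints (fully ASCII, wrapped in htb{...}); it assumes flag.txt was produced by the script above:

import re

with open("flag.txt") as f:
    flag = f.read().strip()

# str.isascii() catches any leftover fullwidth or other Unicode characters.
assert flag.isascii(), "flag still contains non-ASCII characters"
assert re.fullmatch(r"htb\{.*\}", flag), "flag is not in the htb{...} format"
print("Flag format OK:", flag)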
This challenge was a fun mix of NLP, embeddings, and CTF logic. It’s not every day you have AIs “speaking in metaphors,” and it was fascinating to reverse-engineer that conversation!