This is Part 2 of a two-part series detailing how a major obstacle encountered during the OMEGA-T iOS automation research – an obscured WebView CAPTCHA – was diagnosed and ultimately overcome. This article focuses on the *"Orchestrated Visual Relay" bypass methodology*.
By Neverlow512
12 April 2025
Date of original case study: 03 April 2025
Purpose & Context: This article details the technique developed to bypass the specific Arkose Labs implementation encountered, undertaken for research, technical exploration, and methodology demonstration.
Responsible Disclosure: Findings are based on research conducted approximately six months prior to publication to mitigate immediate risks. This work is shared for educational purposes and defensive awareness; very specific details will not be disclosed for obvious reasons. Please use the information gathered from my article or study ethically and legally.
Complete case study on GitHub: Breaking the Unbreakable Research
Picking Up the Pieces: The Frida Revelation 🕵️‍♂️
If you read Part 1, you know the story so far. My attempt to automate account generation on Tinder using the OMEGA-T framework hit a major barrier: a tricky Arkose Labs CAPTCHA inside an obscured WKWebView. Appium couldn't see inside and couldn't interact (at least not through the usual element-recognition functions). A dead end for standard UI automation, or so I thought.
The Frida diagnostics phase, however, gave me the crucial clue – the solved CAPTCHA token used the internal window.webkit.messageHandlers bridge to report back to the native Swift/Objective-C code.
Knowing the path was one thing, but the path itself seemed hardened against direct tampering, even with Frida's capabilities. This ruled out simple interception/replay as a reliable automation strategy.
I was back to needing a way to make the legitimate onCompleted callback fire within the WebView's original context. So what now?
Automation is Dead. Long Live Automation!: The Visual Relay Idea 👀
It seemed like traditional, element-based automation was truly blocked here. When you can't interact with the underlying structure (the DOM), you have to adapt. This led to my shift in thinking: "What if I could simply solve the captcha like a normal user would?"
Appium might be blind to the DOM in this WebView, but it can still capture the screen and tap coordinates.
This sparked my concept for "Orchestrated Visual Relay":
I know it sounds fancy, but considering the pain I went through for it, I get to pick the name!
- Appium as Eyes & Hands: Capture screenshots of the CAPTCHA area; perform precise coordinate-based taps.
- OCR (Tesseract) as Instruction Reader: Extract text commands from the captured image.
- External CAPTCHA Solver: Outsource the visual puzzle-solving.
- Python as Orchestrator: The conductor managing the whole flow – capture, analyze, delegate solving, apply results, check state, repeat.
The core idea? Externalize the part Appium couldn't handle (solving the visual puzzle) and then "relay" the answer back using the only interaction method left – tapping screen coordinates, guided by OCR. It bypasses the need for DOM access entirely for the interaction itself.
Don't think it happened in a day: it took me a while to figure out I could automate the CAPTCHA-solving process through screen interaction (kinda), and many, many more days to implement it.
The Toolkit: Eyes, Hands, And External Help 🛠️
Making Visual Relay work required integrating several components orchestrated by my main Python script:
- Appium - Still the core UI driver, but used differently here. Its main jobs became:
  - Taking screenshots (`driver.get_screenshot_as_base64()`).
  - Performing coordinate taps (`driver.execute_script('mobile: tap', {'x': X, 'y': Y})`).
  - Detecting the initial presence of the CAPTCHA screen (using elements outside the WebView).
- Image Processing (OpenCV/Pillow) - used to:
  - Dynamically Locate the CAPTCHA: Before solving, I used image template matching (like OpenCV's `matchTemplate`) to find the exact coordinates of the CAPTCHA view within the full screenshot, ensuring clicks were accurate even if the UI shifted slightly. This involved taking a reference screenshot of the WebView element itself first.
  - Crop & Compress: Extract just the CAPTCHA area from the full screenshot and compress it to send to the solver API efficiently.
- OCR (Tesseract via `pytesseract`) - To read the instructions or status text ("Verify," "Try Again," "Verification Complete") directly from the cropped CAPTCHA image. This was crucial for state management.
- External CAPTCHA Solver API - A third-party service that accepts an image and returns the solution.
- Python Orchestrator - The script I wrote manages the state machine: calling Appium for captures/taps, processing images, calling OCR, making API requests to the solver, parsing results, and deciding the next action based on the OCR output (a minimal sketch of the capture-locate-read side follows this list). All of this also had to function properly within the OMEGA-T framework, so it was a mess initially.
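To make the "eyes" side concrete, here's a minimal sketch of the capture-locate-read pipeline. It assumes an active Appium session (`driver`); the helper names, the `captcha_template.png` filename, and the 0.8 match threshold are illustrative, not the exact values from the study:

```python
# Minimal sketch: capture the screen, locate the CAPTCHA via template
# matching, crop it, and OCR the status text. Names/thresholds are illustrative.
import base64

import cv2
import numpy as np
import pytesseract


def capture_screen(driver) -> np.ndarray:
    """Grab the current screen via Appium and decode it into an OpenCV image."""
    png_bytes = base64.b64decode(driver.get_screenshot_as_base64())
    return cv2.imdecode(np.frombuffer(png_bytes, np.uint8), cv2.IMREAD_COLOR)


def locate_captcha(screen: np.ndarray, template_path: str = "captcha_template.png"):
    """Template-match a reference shot of the WebView so taps stay accurate
    even if the UI shifts slightly. Returns (x, y, w, h) or None."""
    template = cv2.imread(template_path)
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    if score < 0.8:  # illustrative confidence threshold
        return None
    h, w = template.shape[:2]
    return top_left[0], top_left[1], w, h


def read_status(screen: np.ndarray, region) -> str:
    """Crop the CAPTCHA area and OCR the instruction/status text."""
    x, y, w, h = region
    gray = cv2.cvtColor(screen[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()
```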
BE AWARE: Testing this gave me a headache that lasted for quite a while. I am not joking!
The Core Loop: Capture, Decide, Act, Repeat 🔄
Arkose Labs challenges are often multi-step and require confirmation, especially if they suspect malicious activity. The real magic was in the state management loop I orchestrated with Python:
Side Note: While I am usually very happy to see security measures being used effectively, SOLVING 10 CAPTCHAS IN A ROW IS NOT FUN! Good job tho, Arkose, your systems are amazing.
1. Capture & Read - Take a screenshot of the CAPTCHA area. Run OCR on it to get the current text instruction or status.
2. Decide State - Analyze the OCR text:
* Is it "Verification Complete"? 👉 **SUCCESS!** Exit the loop.
* Is it "Try Again"? 👉 **RETRY!** Tell Appium to click the "Try Again" coordinates, wait, and loop back to Capture & Read the *new* puzzle.
* Is it just "Verify"? 👉 **CONFIRM!** Tell Appium to click the "Verify" coordinates, wait, and loop back to Capture & Read to see what happens next (hopefully "Complete," maybe "Try Again").
* Is it puzzle instructions (like "Select dice...")? 👉 **SOLVE!** Proceed to the next step.
* Is it something else or unreadable? 👉 Maybe retry OCR/Capture, or eventually fail.
3. Send to Solver - Package the current screenshot and the extracted instructions. Send the task to the external solving service API. Wait for the result.
4. Apply Solution - If the solver returns cell indices (e.g., `[1, 3, 5]`), translate these into the specific (X, Y) screen coordinates for each cell (calibrated beforehand). Tell Appium to tap those coordinates, adding small random delays between taps and millimetre-scale offsets in tap position to mimic human interaction slightly.
5. Go Back to Step 1 - After applying clicks (or clicking Verify/Try Again), the screen changes. The loop must restart by capturing a new screenshot and reading the new state to decide the next action (a condensed sketch of the whole loop appears below).
This cycle continued until "Verification Complete" appeared, a maximum attempt limit was hit, or the app sometimes even logged the account out (likely due to other detection mechanisms triggering on timing or behavior).
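To tie the steps together, here's a condensed, hypothetical sketch of the orchestration loop. It reuses the `capture_screen` / `locate_captcha` / `read_status` helpers from the earlier sketch; the calibrated coordinates and the solver stub are invented for illustration, not the actual values or API used in the study:

```python
# Hypothetical state-machine loop: capture -> read -> decide -> act, repeat.
import random
import time

# Invented, device-specific calibration points.
VERIFY_COORDS = (240, 700)
TRY_AGAIN_COORDS = (240, 760)
CELL_COORDS = {1: (120, 400), 2: (240, 400), 3: (360, 400),
               4: (120, 520), 5: (240, 520), 6: (360, 520)}


def tap_with_jitter(driver, x: int, y: int) -> None:
    """Tap slightly off-center with a human-ish pause, to soften robotic timing."""
    driver.execute_script('mobile: tap', {'x': x + random.randint(-3, 3),
                                          'y': y + random.randint(-3, 3)})
    time.sleep(random.uniform(0.4, 1.2))


def solve_with_external_service(screen, region, instructions) -> list[int]:
    """Placeholder for the third-party solver call; should return cell indices."""
    raise NotImplementedError("wire up a solver API of your choice here")


def run_captcha_loop(driver, max_attempts: int = 10) -> bool:
    for _ in range(max_attempts):
        screen = capture_screen(driver)
        region = locate_captcha(screen)
        if region is None:
            time.sleep(1)  # CAPTCHA view not on screen yet; capture again
            continue
        status = read_status(screen, region)

        if "Verification Complete" in status:
            return True  # SUCCESS: exit the loop
        elif "Try Again" in status:
            tap_with_jitter(driver, *TRY_AGAIN_COORDS)  # new puzzle incoming
        elif "Verify" in status:
            tap_with_jitter(driver, *VERIFY_COORDS)  # confirm and re-check
        else:
            # Anything else is treated as puzzle instructions: delegate it,
            # then tap the returned cells (e.g. [1, 3, 5]).
            for index in solve_with_external_service(screen, region, status):
                tap_with_jitter(driver, *CELL_COORDS[index])
    return False
```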
Reality Check: Did it Actually Work? 🤔
Making this work felt like having to build a key just to get into my own house. Dealing with coordinate calibration, occasional OCR flakiness, and the latency of external APIs wasn't much fun at times. That said, the joy I felt when it solved its first CAPTCHA made it all worth it.
So, how effective was it?
During my testing (again, ~6 months ago):
- Individual Puzzle Success - Very high (> 95%): The external services were generally good at solving the visual puzzles themselves when given a clear image and instructions.
- End-to-End Step Success - Around 80%: This means completing the entire multi-stage CAPTCHA process successfully from start ("Let's verify...") to "Verification Complete."
- Why the Drop?:
- Latency - Sending images, waiting for the puzzle to be solved (DAMN, I HATE THE DICE PUZZLES), receiving results – it all adds time. A human might solve a step in seconds; the relay adds significant overhead, which could trigger timing-based detections. (Proxy speed didn't help here either!)
- Complexity Variation - Some Arkose challenges took solvers longer. And, yes, I am talking about the dice puzzles again, these are always the worst!
- Detection - While bypassing the obscurity, overly consistent or robotic interaction timings likely still triggered secondary checks sometimes, leading to failures or extra challenges. I added randomization in delays and click coordinates, which helped, but wasn't a perfect solution.
- OCR Hiccups - Rarely, OCR would misread "Verify" or "Try Again," leading to a wrong action or an outright error/crash. This could have been fixed on my side pretty easily (see the sketch below), but the errors were never a big enough issue to make me bother.
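For what it's worth, that fix would have been a simple fuzzy-matching guard on the OCR output; here's a minimal sketch using only the standard library (the 0.75 cutoff is an illustrative guess):

```python
# Map noisy OCR output (e.g. "Verfy") onto a known status string, or None.
import difflib

KNOWN_STATUSES = ["Verify", "Try Again", "Verification Complete"]


def normalize_status(ocr_text: str) -> str | None:
    matches = difflib.get_close_matches(ocr_text.strip(), KNOWN_STATUSES,
                                        n=1, cutoff=0.75)
    return matches[0] if matches else None
```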
An 80% success rate wasn't perfect for production, but for my research goal – proving the viability of bypassing this specific implementation via visual relay – it was a clear success.
Key Takeaways & Security Implications 💡
This whole exercise hammered home a few points for me:
- Implementation Matters - Even a sophisticated CAPTCHA like Arkose Labs can be solved. Relying purely on visual presentation in an obscured WebView created this bypass vector.
That said, this implementation is the best I've encountered so far, and I'd encourage its further development; it's clearly very effective against malicious actors. Or simply add more dice puzzles, I guess.
- Obscurity Doesn't Always Mean Good Security - Hiding the DOM stopped basic Appium inspection but was irrelevant to a visual attack that captures screenshots.
- Client-Side Isn't Enough - Any fancy fingerprinting or analysis happening inside that WebView during the solve was largely bypassed because the actual solving happened externally.
- Defense Needs Layers - Effective defense requires more robust server-side behavioral analysis (looking at interaction timings around the CAPTCHA step; a toy sketch follows), stronger device attestation, maybe even methods to interfere with screenshotting/OCR (though accessibility is a concern), and unpredictable challenge triggering. Have it pop up right after someone makes an account, or even better, let them enjoy the moment for a bit; if they're trying to automate or mass-create accounts, the frustration will make them quit anyway.
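To make the timing point concrete, here's a purely hypothetical server-side heuristic: relay-style automation tends to produce per-step latencies that are both slower and more uniform than human solves. The thresholds below are invented for illustration, not taken from any vendor:

```python
# Toy heuristic: flag sessions whose per-step solve latencies look scripted.
import statistics


def looks_like_relay(step_latencies_s: list[float]) -> bool:
    if len(step_latencies_s) < 3:
        return False  # not enough steps to judge
    mean = statistics.mean(step_latencies_s)
    stdev = statistics.stdev(step_latencies_s)
    too_slow = mean > 8.0               # external round-trips add seconds per step
    too_uniform = stdev / mean < 0.15   # scripted delays cluster tightly
    return too_slow and too_uniform
```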
Conclusion: Breaking The Unbreakable ✨
The "Orchestrated Visual Relay" technique proved that even complex, visually interactive CAPTCHAs within obscured mobile WebViews can be automated. By combining Appium for screen interaction, OCR for understanding state, and externalizing the cognitive task, it was possible to consistently bypass the specific Arkose Labs implementation encountered in Tinder ~6 months ago.
This journey, from the OMEGA-T framework, through Frida diagnostics, to this Visual Relay solution, was my deep dive into the cat-and-mouse game of mobile automation and security. It highlights the constant need for defenders to think beyond traditional defenses and consider how attackers might interact with their systems visually.
Thanks for following along! Hopefully, this sheds some light on the practical challenges and possibilities in advanced mobile security research.
Find Me & Full Research:
- GitHub: github.com/Neverlow512 (Repos for OMEGA-T, Frida, Breaking studies)
- LinkedIn: https://www.linkedin.com/in/vlad-dumitru-24b62635a/
- Contact: [email protected]
Copyright © 2025 Neverlow512. All Rights Reserved.