I’ve been experimenting with GPT-4V, Claude, and Gemini and noticed something strange:

They can describe art. Solve riddles. Explain GPTs.
But ask: “How many pencils are on the table?”
Or “Which object is left of the cup?”
And they fall apart.

So I built a benchmark to test that specifically:

What is VisQuant?

  • 100 synthetic images
  • 40+ everyday object types
  • Labeled object counts and spatial layout
  • 2 reasoning Q&A pairs per image
  • Grounded annotations in JSON and CSV
  • Baseline tested on GPT-4V
  • Entirely open-source
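
If you just want to poke at the annotations, the dataset loads straight from the Hub. A minimal sketch, assuming a `train` split; the real split and column names live on the dataset card, so inspect a record before relying on them:

```python
# Minimal loading sketch. The split name ("train") is an assumption --
# check the dataset card for the actual splits and column names.
from datasets import load_dataset

ds = load_dataset("Anas-Mohiuddin-Syed/VisQuant", split="train")
print(ds)            # features and row count
print(ds[0].keys())  # inspect the actual columns before writing eval code
```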

What It Tests
VisQuant isolates the visual reasoning primitives that models tend to get wrong:

  • Counting
  • Spatial relationships
  • Left/right/stacked inference
  • Multi-hop VQA from structured scenes
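
Because the scenes are structured, you can score by question type instead of reporting one blended accuracy number. A rough sketch of that scoring; the field names (`question_type`, `answer`, `prediction`) are illustrative placeholders, not the released schema:

```python
from collections import defaultdict

def score_by_type(records):
    """Exact-match accuracy per question type (e.g. counting vs. spatial).

    Each record is assumed to look like:
        {"question_type": "counting", "answer": "3", "prediction": "4"}
    These field names are placeholders -- adapt them to the real schema.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        qtype = r["question_type"]
        total[qtype] += 1
        if r["prediction"].strip().lower() == r["answer"].strip().lower():
            correct[qtype] += 1
    return {t: correct[t] / total[t] for t in total}

# Example: one counting miss, one spatial hit
print(score_by_type([
    {"question_type": "counting", "answer": "3", "prediction": "4"},
    {"question_type": "spatial",  "answer": "left", "prediction": "left"},
]))
# {'counting': 0.0, 'spatial': 1.0}
```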

Why?
Because big benchmarks like VQAv2 and GQA are noisy and blend so many skills together that counting and spatial failures get averaged away.
VisQuant is small, clean, focused — and it exposes real gaps in model reasoning.

Get It:
🗃️ Dataset (HuggingFace): https://huggingface.co/datasets/Anas-Mohiuddin-Syed/VisQuant

📜 Paper: arXiv preprint incoming

📂 License: CC BY 4.0 — free for research + fine-tuning

Would love:

  • Feedback
  • Collabs
  • Benchmarks from others (Claude, Gemini, etc.) using the query sketch below
  • Ideas for v2
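
For anyone who wants to contribute Claude or Gemini numbers (or re-run GPT-4V), the query side is small. A hedged sketch using the OpenAI Python SDK; the model name is a placeholder, and Claude/Gemini need their own SDKs but take the same image-plus-question payload:

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_about_image(image_path: str, question: str, model: str = "gpt-4o") -> str:
    """Send one image + one benchmark question to a vision-capable model.

    The model name is an assumption -- swap in whichever vision model you
    have access to.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# Example (hypothetical file name):
# print(ask_about_image("scene_001.png", "How many pencils are on the table?"))
```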