Why We Didn’t “Just Add AI Images” to Metodika

The Illusion of Infinite AI
Statements like this rarely come with any reflection on the current profit margins of AI (or, more likely, their absence). Yes, Google can throw some free nano bananas at you. A Series-whatever-the-hell bs company that will not be around in five years can hand out ten times that and not expect you to pay for much of it, because there is plenty of VC money to burn.
A Small-Indie-Company™ bum like me, however, needs to be tight with his coins.
The Cost–Quality–Speed Triangle
There’s a brutal triangle in image generation: cheap, fast, high quality. Pick two.
Cheap & Fast Models
Cheap models are fast, and fast models are cheap - but both are garbage. Maybe good enough to bamboozle your grandma on Facebook, but not solid enough to put into educational material that people (at least as of 2026) expect to be remotely correct.
Gemini 2.5 Flash fumbling at the `produce-real-words` hurdle; this pleasure cost me 4 cents.
Seadance tackling the same bisector task - almost no typos (tripped up by a newline continuation 😔) but hopelessly lost in the wonderful world of spatial awareness. Cost is similar to Gemini 2.5.
Frontier Models
A good image will cost something like 10-15 cents PER piece. For the cost of five of those, you can generate the text of a decently-sized book. Each image will also take about a minute to generate. If you are waiting for it in ChatGPT while you scroll TikTok, that is kind of OK, but when it is one of many steps in generating documents that people would like to see sooner rather than later, this time adds up.
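To make the "this adds up" point concrete, here is a back-of-envelope sketch. The per-image numbers come from the ranges above; the number of images per book is my own hypothetical assumption, not a figure from our pipeline.

```python
# Illustrative arithmetic only; prices are the rough ranges quoted above,
# and IMAGES_PER_BOOK is a made-up figure for a decently-illustrated book.
IMAGES_PER_BOOK = 30
FRONTIER_COST_PER_IMAGE = 0.12   # ~10-15 cents per piece
SECONDS_PER_IMAGE = 60           # ~a minute each

image_cost = IMAGES_PER_BOOK * FRONTIER_COST_PER_IMAGE
wall_time_min = IMAGES_PER_BOOK * SECONDS_PER_IMAGE / 60

print(f"images: ${image_cost:.2f}, ~{wall_time_min:.0f} min if generated serially")
```

Even with generous parallelism, a few dollars and half an hour of compute per book is a very different proposition from a few cents of text generation.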
The Boring Route: Real Images
So we made a choice. We went the boring route. Instead of generating images, we decided to retrieve them - openly licensed, real-world, non-AI, age-appropriate, and verifiable.
We rely on sources aggregated via Openverse: historical artifacts, natural phenomena, real places, real people, real objects. No extra fingers. No distorted maps. No fictionalized “educational vibes.” Just reality.
But retrieval at scale is not trivial either. You can’t just fire off a static query like “frog image” and call it a day. Context matters: grade level, subject, surrounding paragraph, cultural relevance, language, and even what other images have already been selected.
The Agentic Image Retrieval Loop
Yes, I generated a nano banana to illustrate the agent.
Instead of issuing a single search request, we built a tool-loop agent. It receives the subject, grade, lesson topics, the paragraph preceding the image, and the target language. It generates an English search query, calls Openverse, evaluates metadata, and retries when necessary - broadening the query if results are empty or narrowing it if results are noisy.
It performs a “vibe check” against topic alignment, age appropriateness, sensitive content, and duplication with already selected images. It avoids logos, posters, diagrams, and AI-style noise. It settles for “good enough and correct” rather than “perfect but fragile.”
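The loop described above can be sketched roughly like this. The Openverse endpoint and its `q`/`page_size` parameters are real, but everything on the `llm` object (`make_query`, `broaden`, `narrow`, `vibe_check`) is a hypothetical interface standing in for our model calls, and the retry logic is a simplification of the actual agent.

```python
import json
import urllib.parse
import urllib.request

OPENVERSE_URL = "https://api.openverse.org/v1/images/"

def search_openverse(query: str, per_page: int = 10) -> list:
    """One Openverse image search; returns a list of result dicts."""
    url = OPENVERSE_URL + "?" + urllib.parse.urlencode(
        {"q": query, "page_size": per_page}
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("results", [])

def retrieve_image(context: dict, llm, search=search_openverse, max_rounds: int = 4):
    """Tool loop: generate query -> search -> vibe check -> retry.

    `llm` is a hypothetical helper exposing make_query / broaden /
    narrow / vibe_check; swap in real model calls as needed.
    """
    # English query built from subject, grade, topics, preceding paragraph
    query = llm.make_query(context)
    for _ in range(max_rounds):
        results = search(query)
        if not results:
            query = llm.broaden(query)       # empty results -> relax the query
            continue
        for candidate in results:
            # "Vibe check": topic alignment, age appropriateness,
            # sensitive content, duplication with already-selected images
            if llm.vibe_check(candidate, context):
                return candidate
        query = llm.narrow(query, results)   # noisy results -> tighten the query
    return None                              # give up rather than return garbage
```

Returning `None` after a few rounds is deliberate: a missing image is easier to handle downstream than a wrong one.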
Why This Matters in Education
In math, we might ground Fibonacci in a shell instead of generating a surreal spiral. In biology, instead of prompting “elasmobranch cartilaginous fishes,” we search for “shark in ocean,” because the goal is recognition, not academic showmanship. In literature, we prefer an author portrait or a historically relevant scene over a stylized AI fantasy.
In early grades, even letter-based visual associations require cultural awareness - searching in English while thinking in Bulgarian.
The Trade-Off
Is this as flashy as AI image generation? No. Does it make better demo screenshots? Probably not. But it scales, it controls cost, and it improves trust.
At scale, every design decision compounds. We chose the boring route. And in education, boring is often the responsible choice.