Preface: PDF Black Magic in the Gemini 3 Era
As a developer who loves wiring AI into side projects, my relationship with Google Gemini over the past six months has been a roller coaster: from a honeymoon phase, to a disappointing breakup (back to hand-rolling my own tooling), and finally to today, when I fell in love with it all over again.
Today I want to talk about a “silent but massive” change in Gemini 3’s PDF document processing. If you, like me, used to get headaches from the token explosion caused by Gemini’s File API rendering PDFs as images, and had to hunt for workarounds (PDF-to-Markdown conversion, OCR, separate vision models), then I have to say: we can all pack up and go home.

Background: The Million-Token Glory Days and the Nightmare of October 26
Let’s rewind a few months. When Gemini 2.5 Pro first came out, what amazed me (and every other developer) most was that incredibly generous 1 Million Token Context Window, plus the very comfortable Rate Limit for free users.
At that time, the logic for my Side Project was super simple and brute-force (see the sketch right after this list):
- Download a financial report or newspaper PDF with dozens of pages.
- Throw the whole thing directly into the Gemini API.
- Prompt: “Summarize this document for me.”
- Done.
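In code, the entire pipeline was roughly the few lines below. This is a minimal sketch using the older google.generativeai SDK; the API key and file name are placeholders, and the model alias is simply the 2.5 Pro id:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the whole PDF through the File API: no parsing, no chunking, no cleanup.
report = genai.upload_file("quarterly-report.pdf")

model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content([report, "Summarize this document for me."])
print(response.text)
```

No intermediate representation, nothing to maintain.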
Because the Context was large enough, I didn’t need any PDF-to-Markdown conversion, ETL, or chunking. I could just throw everything in, and it would eat it all. That was a beautiful time of “Brute-force Aesthetics”.
But good things don’t last forever. That honeymoon period came to an abrupt end on October 26, 2025.
Google adjusted the limits for the Free Tier, making a devastating reduction to the TPM (Tokens Per Minute) cap:
- Gemini-2.5-Pro: Dropped from 1 million to 125,000.
- Gemini-2.5-Flash: Dropped from 1 million to 250,000.
This was catastrophic for my PDF processing workflow. Why?
Because at that time, Gemini 2.5’s logic for processing PDFs was to treat every page of the PDF as a “High-Resolution Image”. You might think a text-only PDF doesn’t have many words, but in the eyes of an LLM, a 50-page issue is 50 high-definition screenshots.
For any given Wall Street Journal issue or financial report, the token count after the image conversion easily landed in the 110k to 200k+ range. That meant a single request would immediately slam into the 125k or 250k TPM wall, and the API would throw a 429 Too Many Requests error.
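Back then you could see the wall coming before you even sent the request. Here’s a minimal sketch of a pre-flight check with the older google.generativeai SDK, assuming count_tokens accepts an uploaded file handle the same way generate_content does (placeholder API key and file name):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

pdf = genai.upload_file("wsj-daily.pdf")
model = genai.GenerativeModel("gemini-2.5-pro")

# How many tokens would this single request cost against the 125k TPM cap?
count = model.count_tokens([pdf, "Summarize this document for me."])
print(count.total_tokens)  # at the time: easily 110k-200k+ for one newspaper issue
```

There was nothing to trim on the prompt side; the PDF itself was the whole bill.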
Dilemma: Forced Diligence and Hand-Coded Markdown Converters
To keep the project running, I had to give up the lazy approach of “directly uploading PDFs”.
I started researching Python’s pdfminer and PyMuPDF, and even OCR tools, to convert PDFs into Markdown format.
You can refer to the details of this process in my previous post: [Tutorial] Getting Hands-On with Cloudflare Auto RAG.
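For a sense of scale, the extraction core of that kind of converter is only a few lines. A minimal sketch with PyMuPDF (the real script had a pile of extra cleanup rules on top of this, and the heading-per-page format is just my own convention):

```python
import fitz  # PyMuPDF

def pdf_to_markdown(path: str) -> str:
    """Pull the text layer out page by page and join it into one Markdown-ish blob."""
    doc = fitz.open(path)
    pages = []
    for i, page in enumerate(doc, start=1):
        text = page.get_text("text")  # text layer only: charts and photos are simply dropped
        pages.append(f"## Page {i}\n\n{text.strip()}")
    doc.close()
    return "\n\n".join(pages)

print(pdf_to_markdown("wsj-daily.pdf")[:500])
```

Which leads straight to the two pain points below: anything without a text layer vanishes, and every quirky layout becomes your cleanup problem.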
Although this brought the Token count back down to the 20k~30k level and successfully dodged the TPM limit, I still felt frustrated:
- Charts were gone: Trend charts and candlestick charts in financial reports were filtered out by the script, effectively blinding one of the AI’s eyes.
- Maintenance costs went up: I had to maintain the PDF parsing code myself, especially the data cleaning after the Markdown conversion.
Turning Point: The Surprise of the Gemini 3 Era
Then today, while testing the latest Gemini 3 model on Google AI Studio with a “let’s see if I can break it” mindset, I ran generation directly on a complete PDF newspaper containing a large number of charts.
Result? Token usage showed: 39,000.
It really was 30-something thousand, not 130 thousand. And the content accurately included information from the charts.
Technical Details: What Actually Happened?
I dug into the latest Google AI Studio Documentation and discovered that Google made a fantastic optimization in the underlying logic.
Simply put, since the release of Gemini 3, the way PDFs are processed has changed:
- Previously (Vision-only): Whether the PDF was scanned or digital, the model treated it entirely as a “series of images” and forcibly converted it into one high-resolution image after another. One image = 1000-1500 Tokens (or even more). Assuming 1500 Tokens per image, 50 pages would start at 75,000 Tokens.
- Now (Native Text + Vision): The File API has gotten smarter. It prioritizes reading the Native Text encoding inside the PDF.
- Here’s the key: Google’s documentation implies that Native Text Tokens extracted from PDFs are not billed (or have extremely low weight)!
- The model only calculates visual tokens for parts that “really require visual understanding” (like charts, photos), and it defaults to using smarter compression techniques.
This is why for the same file, uploading a PDF used to be 110k Tokens (because it was all images), but today uploading a PDF becomes 30k Tokens (because it’s text + a few images).
(Personal test case: wsj-daily.pdf. On average, one issue used to take up 110k to 200k tokens. The large variance is because weekend editions tend to have more ad content.)
(Technical details haven’t been fully disclosed by Google; the above is inferred from official documentation and actual testing.)
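If you want to check the same numbers on your own files through the API rather than AI Studio, the accounting is exposed on the response object. A minimal sketch with the newer google-genai SDK; the model alias below is a placeholder for whichever Gemini 3 variant your key has access to:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Upload the same PDF through the File API and inspect what the prompt actually cost.
pdf = client.files.upload(file="wsj-daily.pdf")
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder: use the Gemini 3 alias available to you
    contents=[pdf, "Summarize this document for me."],
)

usage = response.usage_metadata
print("prompt tokens:", usage.prompt_token_count)       # the PDF lands here
print("output tokens:", usage.candidates_token_count)
print("total tokens:", usage.total_token_count)
```

prompt_token_count is the number that used to sit at 110k+ and now lands around 30k-40k for the same kind of file.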
Conclusion: A Victory for Lazy Developers
This change means a lot to me:
- Significant Cost Reduction: Whether it’s the free tier’s TPM limit or the paid tier’s bill, the pressure is instantly relieved.
- Workflow Returns to Simplicity: I can throw those self-written PDF-to-Markdown scripts into the trash. Just upload_file and leave the rest to Google. (Of course, if you plan to play with RAG, the Markdown conversion work is still unavoidable.)
- Rich Content with Text and Images: Because I no longer have to brute-force convert everything to text myself, the model can “see” those important trend charts and layout configurations again.
If you, like me, felt discouraged about Gemini because of the TPM restrictions at the end of October, I strongly recommend you come back and try it out. That thrill of “just throwing the document at the AI and it works” is really back!
The documentation doesn’t explain much (just a single line), but in practice the new behavior applies to the older Gemini 2.5 models as well. After all, the core change is in the File API’s processing logic, not in the capability of the model.
(Source: Official Documentation)
(P.S. Of course, if your PDF is a pure image scan without a text layer, the Tokens will still count as images! But for most modern documents, this is absolutely a Game Changer.)