[Tutorial] Automating WSJ AI Summaries and Emailing Them

Oct 16, 2025

Introduction: A Lazy Idea

Around March 2025, I was chatting with a colleague who mentioned he had recently subscribed to The Wall Street Journal.

He also receives the daily print edition of the WSJ in his email (as a PDF). For a working engineer, however, reading a full newspaper every day is a luxurious fantasy; it probably is for most people in this modern era.

Shorter key points and summaries would be helpful for daily information consumption; it would be even better if the system could also filter for useful information.

So, I decided to start this project on the spot.

The core concept is simple:

Download the daily PDF newspaper from The Wall Street Journal (WSJ), use Google Gemini AI for in-depth analysis and summarization, and finally, send a beautifully formatted HTML report to subscribers via email.

Implementation: The Devil is in the Details

During the planning phase, the links to the WSJ print edition were fixed in the early versions of the site. This made scraping very easy: no login session was needed, and anyone who knew the link could view the newspaper.

Unfortunately, around mid-April, the WSJ decided to restrict this feature. Now, it requires a session tied to a subscriber’s account.

So, we need to maintain a cookie to keep the session alive.
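A minimal sketch of the authenticated fetch, using only the standard library. The URL pattern and cookie name here are hypothetical; the real endpoint and session fields have to be observed from a logged-in browser session.

```python
# Sketch of the daily PDF fetch. The URL pattern and the cookie contents
# are placeholders -- the actual WSJ endpoint and session fields differ.
import datetime
import urllib.request

def build_pdf_request(session_cookie: str, day: datetime.date) -> urllib.request.Request:
    """Build an authenticated request for the print-edition PDF."""
    # Hypothetical URL pattern for illustration only.
    url = f"https://example-wsj-host/print-edition/{day:%Y%m%d}.pdf"
    req = urllib.request.Request(url)
    # The subscriber session rides in the Cookie header; keeping this
    # cookie fresh is the crux of the whole pipeline.
    req.add_header("Cookie", session_cookie)
    req.add_header("User-Agent", "Mozilla/5.0")
    return req

# Usage (network call commented out -- it requires a live subscriber session):
# req = build_pdf_request("wsj_session=...", datetime.date.today())
# with urllib.request.urlopen(req) as resp:
#     open("today.pdf", "wb").write(resp.read())
```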

Furthermore, I had to iterate on the prompts to ensure Gemini's responses were consistently in the expected format. This involved many rounds of prompt engineering, including the design of the system_instruction.
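As an illustration, the summarization call can be structured roughly as below, assuming the google-genai SDK (`pip install google-genai`). The instruction text and prompt wording here are illustrative stand-ins, not my production prompts.

```python
# A format-pinning system instruction: asking for strict JSON makes the
# downstream HTML rendering deterministic. Wording is illustrative only.
SYSTEM_INSTRUCTION = """\
You are a financial-news editor. Summarize the attached newspaper PDF.
Return strict JSON: a list of objects with keys "headline", "summary",
and "why_it_matters". Do not include any text outside the JSON."""

def build_summary_prompt(edition_date: str) -> str:
    """Compose the per-day user prompt that accompanies the PDF."""
    return (
        f"This is the WSJ print edition for {edition_date}. "
        "Pick the 10 most important stories and summarize each."
    )

# The actual call (requires an API key and the google-genai package):
# from google import genai
# from google.genai import types
# client = genai.Client(api_key="...")
# resp = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents=[uploaded_pdf, build_summary_prompt("2025-10-16")],
#     config=types.GenerateContentConfig(system_instruction=SYSTEM_INSTRUCTION),
# )
```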

Finally, the system needed a few APIs and a web interface for subscribers, which I decided to build simply on Cloudflare Workers.

This involved designing flows for subscription links, unsubscribe links, and re-subscription processes.

Detail: Preventing Malicious Unsubscribes

  • For the unsubscribe link, a token mechanism is needed (a hash of the email, an expiration time, and a server-side secret) so that a leaked link cannot be used to maliciously unsubscribe other people.
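One way to realize that token mechanism is an HMAC over the email plus an expiry timestamp, keyed with a server-side secret: a leaked link can then neither be forged for another address nor reused after it expires. The details below (delimiter, TTL, secret handling) are my sketch, not a prescribed scheme.

```python
# Unsubscribe-token sketch: HMAC-SHA256 over "email|expiry", keyed with a
# server-side secret. The secret shown here is a placeholder; in practice
# it lives outside the repo (env var or secret store).
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # placeholder -- never commit the real one

def make_unsubscribe_token(email: str, ttl_seconds: int = 7 * 86400) -> str:
    expires = int(time.time()) + ttl_seconds
    msg = f"{email}|{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{expires}.{sig}"

def verify_unsubscribe_token(email: str, token: str) -> bool:
    try:
        expires_str, sig = token.split(".", 1)
        expires = int(expires_str)
    except ValueError:
        return False
    if time.time() > expires:
        return False  # link has expired
    msg = f"{email}|{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(sig, expected)
```

The token is embedded in the unsubscribe URL as a query parameter; the Worker recomputes the HMAC from the claimed email and rejects anything that does not match or has expired.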

Detail: CI/CD Setup

  • Hosted on my own server and integrated with GitHub Actions, with a self-hosted action runner on a local machine (via Docker) to implement automated deployment.
  • Observed which fields of the WSJ SSO cookie are needed to keep the login credential valid.

Detail: Free SQL & Storage Service

  • Currently using Cloudflare's D1 (SQL) and R2 (object storage) services for the mailing list and the PDF files.

Detail: Sensitive Information Protection

  • The mailing-list handling is split into read and write operations. Writing is the newsletter-subscription function, so that Worker API is public-facing, protected only by a simple invite_code mechanism and a regional firewall block (Taiwan only). Reading the mailing list requires more protection: IP restrictions plus an API-key mechanism in the firewall rules.
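The read/write split can be sketched as a single authorization gate. This is plain Python for illustration only; the real checks live in the Cloudflare Worker and its firewall rules, and the names `INVITE_CODE`, `READ_API_KEY`, and the header names are hypothetical.

```python
# Illustrative authorization gate for the mailing-list API. All constants
# are placeholders; the real values are secrets, and the IP/region checks
# are actually enforced by Cloudflare firewall rules, not application code.
import hmac

INVITE_CODE = "friends-only-2025"          # weak gate for the public write path
READ_API_KEY = "long-random-api-key"       # strong gate for the read path
ALLOWED_READ_IPS = {"203.0.113.7"}         # documentation-range placeholder

def authorize(operation: str, headers: dict, client_ip: str) -> bool:
    if operation == "write":   # public subscription endpoint
        return headers.get("x-invite-code") == INVITE_CODE
    if operation == "read":    # privileged mailing-list export
        if client_ip not in ALLOWED_READ_IPS:
            return False
        key = headers.get("x-api-key", "")
        return hmac.compare_digest(key, READ_API_KEY)
    return False
```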

Detail: Hosting a Mail Server

  • I am currently using my own self-hosted mail server to send emails (intended only for friends and family), so it is important to watch the spam score and the various authentication mechanisms (SPF, DKIM, DMARC, BIMI).
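On the application side, the HTML report itself is just a multipart message; including a plain-text alternative also helps deliverability. A stdlib sketch, with placeholder addresses and URLs:

```python
# Building the HTML report email with the stdlib. Addresses, subject, and
# the unsubscribe URL are placeholders for illustration.
from email.message import EmailMessage

def build_report_email(to_addr: str, html_body: str, text_body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "WSJ Digest <digest@example.com>"
    msg["To"] = to_addr
    msg["Subject"] = "Your daily WSJ summary"
    # A List-Unsubscribe header is good etiquette and helps deliverability
    # with the large mail providers.
    msg["List-Unsubscribe"] = "<https://example.com/unsubscribe?token=...>"
    msg.set_content(text_body)                      # text/plain fallback
    msg.add_alternative(html_body, subtype="html")  # text/html part
    return msg

# Sending is then a plain SMTP handoff to the self-hosted server:
# import smtplib
# with smtplib.SMTP("localhost") as s:
#     s.send_message(build_report_email("reader@example.org", html, text))
```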

Conclusion

In short, those are the implementation aspects that need to be addressed for this system to be complete.

To avoid legal issues, however, I will not be opening subscriptions to the public. A recent case involving a law firm in Taiwan has made me too cautious to do so rashly.

This article is for technical sharing only. Remember not to break the law and become a content plagiarist.

In the future, I believe information content will be the most valuable asset; unfortunately, it is also the area that currently lacks a pricing mechanism to protect it properly.


Finally, all of the above can be implemented through “vibe coding”. The cost is basically just for the domain rental.

You also need a machine to run cron jobs, but this can be solved by setting up n8n on Claw Cloud’s container platform (https://run.claw.cloud/).