
tumf

Posted on • Originally published at blog.tumf.dev

Web Adapter Tool Agent: Turn Self-Learning Skills into "98% Average Token Reduction on Revisits," Measured

Originally published on 2026-03-09
Original article (Japanese): Web→Adapter→Tool→Agent: 自己学習型スキルで『再訪を実測で平均98%トークン削減』する

If you build web data extraction by having an LLM read raw HTML every time and "just figure it out," it usually ends up expensive, slow, and brittle.

It gets worse for use cases that revisit the same site repeatedly - news monitoring, documentation tracking, price change detection, and so on. You end up repeating the same failure modes over and over.

Problems like this are often better solved not with ever more heroic scraping tricks, but by accepting a simpler approach: once an extraction method works, freeze it as a reusable tool and keep using it from then on.

This article summarizes a design that turns scraping into a learned tool through a Web→Adapter→Tool→Agent transformation pipeline.

The original inspiration was web2cli (GitHub repository), which I introduced in an earlier article. If you take the idea of "Every website is a Unix command" and push it toward agent operations - revisits, token usage, and drift - it tends to converge in this direction.

More recently, along that line of thought, I added a self-learning skill called self-learning-web-adapter (skill: a package of procedures and tools given to an agent). The skill itself lives in skills/self-learning-web-adapter.

Why this hurts: passing raw HTML directly to an LLM increases cost

First, let’s align on the premise. By "LLM," I mean a Large Language Model that works not only on text, but also as an agent - a system where the LLM calls external tools to get work done.

When you hand raw HTML to an LLM and ask it to extract information, the following costs pile up:

  • token cost is high
  • latency (processing wait time) increases
  • it breaks when the DOM (Document Object Model: the idea of treating HTML as a tree structure) changes even slightly
  • retries increase when extraction fails, which makes it even more expensive

Personally, my real feeling is: "Fine for the first time, maybe, but I do not want to repeat the same exploration on the second run and beyond."

Direction of the solution: confine exploration to one pass, make execution lightweight

The idea is simple. Stop re-scraping the Web from scratch every time, and transform it like this:

website
  ↓ (exploration: one pass)
adapter
  ↓ (freeze it)
tool / CLI
  ↓ (reuse)
agent

In this model, what the LLM does each time is no longer "interpret raw HTML," but "call a tool."

When it works well, the LLM input can be compressed down to a few hundred tokens of JSON (JavaScript Object Notation: a structured data format).

As a reference point, if you compare "raw HTML" with "adapter output" for a specific site such as a blog or marketing page, you can sometimes see input token reductions in the 95% to 99% range.

It is better not to oversell this. The first learning pass has its own cost, and results vary by site. But the overall direction is very stable: if the workload revisits the same site often, the payoff is usually easy to recover.

Measurement: how many tokens do revisits actually save?

Since the obvious question is "Does it really shrink that much?", here is a simple measurement.

For token counting, I used tiktoken (a tokenizer: a mechanism that splits strings into tokens) and counted with the o200k_base encoding.

The comparison uses three patterns:

  • pass raw HTML directly to the LLM
  • pass JSON output from a trained adapter to the LLM
  • pass JSON output from a web2cli-style wrapper to the LLM

Training used three articles per site, and evaluation used one different article as a holdout set.

| Site | HTML tokens | Adapter tokens | web2cli tokens | Reduction vs Adapter | Reduction vs web2cli |
| --- | --- | --- | --- | --- | --- |
| blog.python.org | 15,057 | 265 | 351 | 98.24% | 97.67% |
| blog.rust-lang.org | 7,656 | 263 | 361 | 96.56% | 95.28% |
| vercel.com | 224,735 | 255 | 335 | 99.89% | 99.85% |

On average, the input token reduction looked like this:

  • average reduction rate (direct adapter output): 98.23%
  • average reduction rate (web2cli-style command output): 97.60%
  • average reduction amount (direct adapter output): 82,221.7 tokens / page
  • median reduction amount (direct adapter output): 14,792 tokens / page
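These aggregates can be reproduced from the per-site numbers in the table with plain arithmetic:

```python
# Reproduce the reported reduction rates from the per-site token counts.
from statistics import mean, median

sites = {
    "blog.python.org":    (15_057, 265),
    "blog.rust-lang.org": (7_656, 263),
    "vercel.com":         (224_735, 255),
}

rates = [1 - adapter / html for html, adapter in sites.values()]
saved = [html - adapter for html, adapter in sites.values()]

print(f"average reduction rate: {mean(rates):.2%}")   # 98.23%
print(f"average saved tokens:   {mean(saved):,.1f}")  # 82,221.7
print(f"median saved tokens:    {median(saved):,}")   # 14,792
```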

There are two key takeaways:

  1. Raw HTML alone can be tens of thousands to hundreds of thousands of tokens, depending on the site
  2. Once learned, the system can compress only the needed information into a few hundred tokens of JSON

For especially heavy pages like vercel.com, it reduced more than 220k tokens per page.

A few caveats are worth noting too:

  • this is still a small-scale measurement over only three sites
  • the extracted fields are mainly limited to title, author, and published
  • the first access includes learning cost, so the real benefit appears on revisits

A rough estimate can be made with this formula:

saved_cost = saved_tokens_per_page * pages_per_month / 1_000_000 * model_input_price

If your workload is mostly lightweight articles, it is safer to reason from the median value (14,792 tokens / page). If you deal with many SPAs (Single Page Application: a web app that navigates within a single page) or marketing pages, it may skew closer to the average value (82,221.7 tokens / page).
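Plugging in the median figure and a hypothetical input price of $2.50 per million tokens (model prices vary; substitute your own):

```python
# Back-of-the-envelope savings using the formula above.
# The price per million input tokens is a placeholder, not a quoted model price.
saved_tokens_per_page = 14_792        # median from the measurement
pages_per_month = 10_000              # hypothetical workload
model_input_price = 2.50              # USD per 1M input tokens (assumed)

saved_cost = saved_tokens_per_page * pages_per_month / 1_000_000 * model_input_price
print(f"${saved_cost:,.2f} / month")  # $369.80 / month
```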

What is an Adapter? A contract that encapsulates site-specific differences

An Adapter is a configuration plus a set of rules that captures "for this host (a domain like example.com), extract data this way."

The important point is that what persists is not LLM reasoning, but an extraction contract.

For an article page such as a blog post, these are the typical fields you want:

  • title
  • author
  • published (publication datetime)

You also make the extraction strategy explicit - which information source should be prioritized:

  • JSON-LD (JSON for Linking Data: structured metadata embeddable in HTML)
  • Open Graph protocol (OG: a meta tag specification for social sharing) and ordinary meta tags
  • CSS selectors (CSS selector: a notation for targeting HTML elements)

Another practical point is the ability to determine mechanically whether something has broken.

This is where a DOM fingerprint comes in (DOM fingerprint: a signature of DOM structure). During training, you save structural features of the DOM. At runtime, if the current page deviates from that signature, you treat it as drift (structural change) and send it to retraining.
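One hypothetical way to build such a signature: hash only structural features (tag names and classes), so text edits pass while layout changes trip the check. The actual skill's scheme may differ.

```python
# A hypothetical DOM fingerprint: hash the sorted multiset of (tag, class) pairs.
# Text content is deliberately excluded so that only structural changes register.
import hashlib
from bs4 import BeautifulSoup

def dom_fingerprint(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    features = sorted(
        f"{tag.name}.{'.'.join(sorted(tag.get('class', [])))}"
        for tag in soup.find_all(True)
    )
    return hashlib.sha256("\n".join(features).encode()).hexdigest()[:16]

trained = dom_fingerprint("<div class='post'><h1>A</h1></div>")
current = dom_fingerprint("<div class='post'><h1>B</h1></div>")    # text change only
drifted = dom_fingerprint("<section class='hero'><h2>B</h2></section>")  # structure change

print(trained == current)  # True: text changes do not trigger drift
print(trained == drifted)  # False: structural change -> retrain
```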

What is a Tool/CLI? A one-line "web interface"

You can leave an adapter as-is, but if you want agents to use it, it is easier to lower it all the way down into a CLI (Command Line Interface: a tool callable from the terminal).

The ideal is simply: "pass a URL, get back JSON."

# Example: extract a rust-lang blog article as structured data (conceptual)
site-article https://blog.rust-lang.org/2026/03/05/some-post.html

Once you have this form, prompt design on the agent side becomes much simpler.

You can write instructions like: "Call this command, then use only title and published from the returned JSON."
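As agent-side glue, that instruction might reduce to a small wrapper like this (the `site-article` command name is the hypothetical example from above; only the field filtering is generic):

```python
# Call a web2cli-style command and keep only the fields the agent needs.
import json
import subprocess

def keep_fields(data: dict, fields=("title", "published")) -> dict:
    """Drop everything that does not need to enter the prompt."""
    return {k: data.get(k) for k in fields}

def fetch_article(url: str, command: str = "site-article") -> dict:
    # `site-article` is the hypothetical exported command from this article.
    out = subprocess.run([command, url], capture_output=True, text=True, check=True)
    return keep_fields(json.loads(out.stdout))
```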

This is the same "Every website is a Unix command" idea from web2cli mentioned earlier, adapted for agent operations such as revisits, token usage, and drift.

Example: running the self-learning skill self-learning-web-adapter

From here on, this section gets more "skill-oriented."

This skill is designed for repeatedly reading the same host. It encapsulates site-specific differences into an adapter and reuses them.

Its goals are as follows:

  • Why: after the second run, I do not want to repeat scraping exploration
  • What: return URL -> structured JSON (title/author/published + health diagnostics)
  • Prereq: Python 3.10+, network reachability
  • Verify: python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py run <url> outputs JSON

1) Setup

If you only want to use the skill, you do not need to clone the repository.

Adding the skill can be done with npx (npm package runner: a mechanism for running a CLI temporarily), which comes with Node.js.

npx skills add tumf/self-learning-web-adapter

The dependencies are minimal. For HTML parsing, install Beautiful Soup (an HTML parser).

(Python dependencies are not resolved by npx, so this part must be installed separately.)

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pip
python3 -m pip install beautifulsoup4

Note: the commands below assume that skills/self-learning-web-adapter/ has been added directly under the current directory. If it was installed elsewhere, just adjust the path.

2) Prepare training samples (3 or more from the same host)

The rule for this skill is simple: training (learn) requires "3 or more URLs from the same host."

For a blog, it is usually safer to choose around three articles from the same author or category.

3) Learn -> run to freeze the behavior

If you pass representative pages from the same host, the adapter is saved to adapter_registry/<host>.json.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py learn <url1> <url2> <url3>

Once training succeeds, run it against a different URL (a holdout page not used in training).

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py run <holdout-url>

The output contains not only extraction results, but also diagnostic fields such as signature_known and extraction_health.score.
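A hedged sketch of what such output might look like (field values are invented; only `signature_known` and `extraction_health.score` are named in this article):

```json
{
  "url": "https://blog.rust-lang.org/2026/03/05/some-post.html",
  "title": "...",
  "author": "...",
  "published": "2026-03-05",
  "signature_known": true,
  "extraction_health": { "score": 1.0 }
}
```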

That is one reason this leans toward a "skill": it turns not only extraction, but also failure handling, into a reusable tool.

4) Drift checks and retraining

check returns JSON just like run, but it is intended to answer: "Does this look like it needs retraining?"

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py check <url>

If needs_retrain: true is set, send it through retraining.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py retrain <host>

5) Export to a web2cli-style command

This is where it starts to feel like a real skill.

You take a trained adapter and lower it into a single web2cli-style command.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py export-command <host>
python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py commands

The exported commands are placed in web2cli_commands/, and web2cli_commands/index.json becomes the registry (the command index).

From the agent’s point of view, this is the moment when "a site has become a tool."

Design intuition: how to bias toward skills that work well

The following patterns tend to work well in practice:

  1. Check JSON-LD first
  2. Then fall back to Open Graph and ordinary meta
  3. Treat CSS selectors as the last escape hatch
  4. Treat failures not as "exceptions," but as "health checks" (check)
  5. Do not aim for perfection immediately; narrow the fields you want first (start with something like title and date)

And the anti-patterns look like this:

  • trusting a CSS selector that happened to work on one page, without evidence
  • embedding extraction logic in the agent prompt and executing it every time
  • fixing breakage as a one-off patch and never preserving it as a learned artifact

Conclusion: turn "reading the Web" into a tooling problem

With the Web→Adapter→Tool→Agent model, scraping changes from "try hard to read the page" into "build a reusable tool."

This transformation is especially effective for workloads that revisit the same site repeatedly.

Here are a few concrete next steps:

  1. Pick one domain you read often, and narrow the required fields to three (title / published / url is a good start)
  2. Build a working extractor with the priority order JSON-LD -> meta -> CSS
  3. Add a DOM fingerprint and check, then move toward a design that automatically retrains when it breaks

