2026.06.09 (Tue)

✨ GPT-5.5’s Summary

A record of tracing a visitor counter spike, keeping local work out of the count, and adding a public aggregate analytics page built on a Cloudflare Worker and D1.

The visitor counter looked wrong.

The daily count suddenly jumped past 80.

At first I could have brushed it off as search traffic. But it felt off. I had been editing the blog heavily for several days, and during AI-assisted work I had opened both the local server and the public URL many times.

So I had to ask a blunt question.

Was this real traffic?

Or was my counter counting my own work?

The counter was too honest

I checked the structure again.

The counter described in /en/devlog/github-pages-blog/github-pages-blog-visitor-counter/ runs on a Cloudflare Worker and D1.

When a page loads, JavaScript sends a /track request to the Worker. The Worker then updates the counters, page_counters, and dedupe_views tables in D1. Google Analytics supplies only the historical baseline; every new increment lives in D1.
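To make the day buckets concrete, here is a minimal sketch of which counter keys a single counted hit could touch. Only the day:YYYY-MM-DD form comes from the D1 output shown later in this post; the Asia/Seoul time zone, the month: and total key names, and the keysForView helper are assumptions for illustration.

```javascript
// Hypothetical sketch of the counter keys one counted /track hit bumps.
// ASSUMPTIONS: days are bucketed in Asia/Seoul, and the "month:" and "total"
// key names are illustrative; only "day:YYYY-MM-DD" appears in the post.
function localDateParts(date, timeZone = "Asia/Seoul") {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const pick = (type) => parts.find((p) => p.type === type).value;
  return { year: pick("year"), month: pick("month"), day: pick("day") };
}

function keysForView(date, path) {
  const { year, month, day } = localDateParts(date);
  return {
    // rows in the counters table
    counters: [`day:${year}-${month}-${day}`, `month:${year}-${month}`, "total"],
    // row in the page_counters table, keyed by path
    pagePath: path,
  };
}
```

With a Seoul time zone, a visit at 15:30 UTC on June 8 (00:30 KST on June 9) lands in day:2026-06-09. Which zone the real Worker uses is not stated here, and it decides where late-night visits fall.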

The problem was that this setup was too honest.

If a browser opened a page, it counted.

The local development server still pointed at the production Worker endpoint. The Worker CORS allowlist also included localhost:4000 and 127.0.0.1:4000.

So local previews could increment the real production D1 counter.

The dedupe key was also visitorId + path. The same page in the same browser is deduped for 10 minutes, but opening many different posts still counts each one. During a blog inspection session, every post I opened became a page view.
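The dedupe rule can be modeled in a few lines, which also shows why an inspection session inflates the count. This is a minimal in-memory sketch: the real state lives in the D1 dedupe_views table, the Map is only a stand-in, and measuring the window from the last counted view is an assumption.

```javascript
// Minimal in-memory model of the visitorId + path dedupe rule.
// The Map stands in for the D1 dedupe_views table so the rule can run locally.
const DEDUPE_WINDOW_MS = 10 * 60 * 1000; // 10 minutes, per the post

function createDeduper(windowMs = DEDUPE_WINDOW_MS) {
  // key: `${visitorId}|${path}` -> time (ms) of the last counted view
  const lastCounted = new Map();
  return function shouldCount(visitorId, path, nowMs) {
    const key = `${visitorId}|${path}`;
    const prev = lastCounted.get(key);
    if (prev !== undefined && nowMs - prev < windowMs) {
      return false; // same browser, same page, still inside the window
    }
    lastCounted.set(key, nowMs); // ASSUMPTION: window restarts only on a counted view
    return true;
  };
}
```

Reopening one post within 10 minutes is ignored, but 30 different posts opened in a row are 30 counted views, because every path produces a new key.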

D1 made it obvious.

day:2026-06-06 = 14
day:2026-06-07 = 38
day:2026-06-08 = 81
day:2026-06-09 = 5

On 2026-06-08, the views were spread across old Daily Review posts and naver-* posts, one page after another. That looked more like a work session than real reader behavior.

Local work should render stats, not count

The first fix was simple.

In a local preview, the page can still show stats but must not send /track.

localhost
127.0.0.1
::1

The client script now skips tracking in those environments.
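A minimal sketch of that client-side guard. The isLocalPreview and sendTrack names are hypothetical. One detail worth handling: for an IPv6 loopback URL such as http://[::1]:4000/, location.hostname is "[::1]" with brackets, so matching only "::1" would miss it.

```javascript
// Sketch of the client-side guard. For http://[::1]:4000/, location.hostname
// is "[::1]" (with brackets), so both spellings are listed.
const LOCAL_HOSTNAMES = new Set(["localhost", "127.0.0.1", "::1", "[::1]"]);

function isLocalPreview(hostname) {
  return LOCAL_HOSTNAMES.has(String(hostname).toLowerCase());
}

// In the page script (sendTrack is a hypothetical name for the /track call):
// if (!isLocalPreview(window.location.hostname)) sendTrack();
```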

But the client alone is not enough. If the JavaScript changes or someone sends a manual request, the same bug can return. So the Worker itself counts /track requests only when they come from the production origin.

TRACK_ALLOWED_ORIGIN=https://hyuk.blog

Requests from local origins now return origin_not_tracked.
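On the Worker side the check can be as small as comparing the Origin header with the configured variable. This is a sketch under assumptions: the checkTrackOrigin name and its return shape are illustrative, while TRACK_ALLOWED_ORIGIN and the origin_not_tracked reason come from this post.

```javascript
// Hypothetical Worker-side gate: count /track only when the Origin header
// exactly matches env.TRACK_ALLOWED_ORIGIN. A missing Origin is also ignored.
function checkTrackOrigin(request, env) {
  const origin = request.headers.get("Origin"); // null when the header is absent
  if (origin !== env.TRACK_ALLOWED_ORIGIN) {
    return { status: "ignored", reason: "origin_not_tracked" };
  }
  return null; // null means the request may be counted
}

// Inside fetch(): const ignored = checkTrackOrigin(request, env);
// if (ignored) return Response.json(ignored);
```

Non-browser clients can set any Origin they like, so this stops accidental counting from local previews rather than deliberate abuse, which matches the goal here.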

I also added a production QA opt-out switch.

?hyuk_no_track=1
?visitor_tracking=off
?visitor_tracking=on

That is not a hidden admin system. It is just a way to inspect the public site without polluting the counter.
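The post lists the three switches but not how long each one lasts. The sketch below assumes ?visitor_tracking=off and ?visitor_tracking=on are remembered in storage while ?hyuk_no_track=1 affects only the current page load; the storage key and the trackingDisabled name are hypothetical.

```javascript
// Hypothetical reading of the QA switches. ASSUMPTIONS: visitor_tracking=off|on
// persists in storage, hyuk_no_track=1 applies to this page load only, and the
// storage key name is made up. `storage` is anything with the localStorage API.
const OPT_OUT_KEY = "visitor_tracking_opt_out";

function trackingDisabled(search, storage) {
  const params = new URLSearchParams(search);
  const mode = params.get("visitor_tracking");
  if (mode === "off") storage.setItem(OPT_OUT_KEY, "1");
  if (mode === "on") storage.removeItem(OPT_OUT_KEY);
  if (params.get("hyuk_no_track") === "1") return true; // one-off opt-out
  return storage.getItem(OPT_OUT_KEY) === "1";
}

// In the page: if (!trackingDisabled(location.search, localStorage)) { /* send /track */ }
```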

The analytics did not need to be private

At first I considered making a private analytics page.

Then I realized that was not necessary.

If I do not store raw IP addresses, raw User-Agents, full referrer URLs, or per-request event logs, the page can be public. The rule is to store only values that are safe to aggregate and show.

So the direction changed.

Not a private log page.

A public blog analytics page.

/analytics/

I added it to the top navigation and also linked it from the small sidebar visitor stats block.

D1 stores aggregates only

The new table is intentionally small.

analytics_daily_dimensions

date
dimension
value
count
updated_at

When a page view is actually counted, the Worker classifies public dimensions and increments daily aggregates.

The collected dimensions include:

page
content_group
locale
country
region
continent
colo
asn_org
device
viewport
browser
os
language
client_timezone
color_scheme
connection
traffic_source
referrer_domain
hour
weekday

The important part is what is not stored.

The referrer is stored as a domain, not the full URL. User-Agent is classified into browser, OS, and device. IP addresses are not stored.
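The reduction step might look like the sketch below: keep only the referrer hostname, derive a coarse traffic source, and turn the User-Agent into three labels before it is discarded. The search-domain list, the regexes, and folding same-site navigation into "direct" are all illustrative assumptions, not the Worker's actual rules.

```javascript
// Hypothetical classifiers; lists and regexes are deliberately incomplete.
const SEARCH_DOMAINS = ["google.com", "google.co.kr", "bing.com", "naver.com", "daum.net", "duckduckgo.com"];

// Keep only the referrer's hostname, never its path or query string.
function referrerDomain(referrer) {
  if (!referrer) return "";
  try {
    return new URL(referrer).hostname.replace(/^www\./, "");
  } catch {
    return ""; // malformed referrer: store nothing
  }
}

// ASSUMPTION: same-site navigation is folded into "direct".
function trafficSource(referrer, ownHost = "hyuk.blog") {
  const domain = referrerDomain(referrer);
  if (!domain || domain === ownHost) return "direct";
  const isSearch = SEARCH_DOMAINS.some((s) => domain === s || domain.endsWith(`.${s}`));
  return isSearch ? "search" : "referral";
}

// Reduce a raw User-Agent to coarse labels; the raw string is not kept.
function classifyUserAgent(ua = "") {
  const device = /iPad|Tablet/i.test(ua) ? "tablet" : /Mobi|Android|iPhone/i.test(ua) ? "mobile" : "desktop";
  // iOS before macOS (iOS UAs say "like Mac OS X"), Android before Linux.
  const os = /Windows/.test(ua) ? "Windows"
    : /iPhone|iPad|iPod/.test(ua) ? "iOS"
    : /Android/.test(ua) ? "Android"
    : /Mac OS X/.test(ua) ? "macOS"
    : /Linux/.test(ua) ? "Linux"
    : "other";
  // Edge UAs also contain "Chrome/", and Chrome UAs contain "Safari/", so order matters.
  const browser = /Edg\//.test(ua) ? "Edge"
    : /Firefox\//.test(ua) ? "Firefox"
    : /Chrome\//.test(ua) ? "Chrome"
    : /Safari\//.test(ua) ? "Safari"
    : "other";
  return { browser, os, device };
}
```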

This is not a strict analytics warehouse. It is a public flow board for a personal blog.

Today, month, total, and custom ranges

At first I only thought about today / month / total, like the existing counter.

But an analytics page needs date ranges.

The /analytics API supports:

/analytics?range=today
/analytics?range=month
/analytics?range=total
/analytics?range=custom&start=2026-06-09&end=2026-06-12

Here, total means everything counted since analytics_daily_dimensions started collecting on 2026-06-09. It is not the same as the public sidebar total, which still includes the GA baseline.
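The four range forms can be resolved to a start and end date before querying, for example with WHERE date BETWEEN ? AND ?. This is a sketch, not the Worker's code: resolveRange is a hypothetical name, today is passed in because the post does not say which time zone defines it, and the regex checks only the YYYY-MM-DD shape, not calendar validity.

```javascript
// Hypothetical /analytics range resolver. Dates are YYYY-MM-DD strings, so
// plain string comparison orders them correctly.
const COLLECTION_START = "2026-06-09"; // first day of analytics_daily_dimensions
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

function resolveRange(params, today) {
  const range = params.get("range") ?? "today";
  switch (range) {
    case "today":
      return { start: today, end: today };
    case "month":
      return { start: `${today.slice(0, 7)}-01`, end: today };
    case "total":
      return { start: COLLECTION_START, end: today }; // not the sidebar total
    case "custom": {
      const start = params.get("start") ?? "";
      const end = params.get("end") ?? "";
      if (!DATE_RE.test(start) || !DATE_RE.test(end) || start > end) {
        throw new RangeError("custom range needs start <= end as YYYY-MM-DD");
      }
      return { start, end };
    }
    default:
      throw new RangeError(`unknown range: ${range}`);
  }
}
```

Pinning total to the collection start date keeps it from being confused with the sidebar total, which adds the GA baseline on top of D1.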

I kept those two meanings separate and documented the difference.

Mobile-first enough to use

Analytics pages break easily on mobile.

Large tables would feel cramped immediately. So I used bar-list cards instead.

The top has only four numbers.

Selected range
Today
This month
Total

Below that, each dimension is a compact card.
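Each bar-list card needs only a little shaping of the aggregate rows: sum counts per value across the selected days, keep the top few, and size each bar relative to the largest. A minimal sketch, assuming rows arrive as { dimension, value, count } objects; toBarList and widthPct are hypothetical names, and the summing could equally be done in SQL with GROUP BY.

```javascript
// Hypothetical shaping step for one bar-list card.
function toBarList(rows, dimension, limit = 5) {
  const totals = new Map(); // value -> summed count across the range
  for (const row of rows) {
    if (row.dimension !== dimension) continue;
    totals.set(row.value, (totals.get(row.value) ?? 0) + row.count);
  }
  const top = [...totals.entries()]
    .map(([value, count]) => ({ value, count }))
    .sort((a, b) => b.count - a.count || a.value.localeCompare(b.value))
    .slice(0, limit);
  const max = top.length > 0 ? top[0].count : 0;
  // Bar width as a percentage of the largest value in this card.
  return top.map((item) => ({ ...item, widthPct: max ? Math.round((item.count / max) * 100) : 0 }));
}
```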

On mobile, range buttons wrap into two rows, KPIs become two columns, and all detail cards become one column. The custom range form stacks vertically.

On desktop, KPIs stay in one row and the detail cards spread into two columns.

I checked the page in the browser. There was no horizontal overflow and no console error.

What I verified

The checks were:

node --check cloudflare/ga-stats-worker.js
node --check assets/js/custom/visitor-stats.js
node --check assets/js/custom/analytics-dashboard.js
bundle exec jekyll build
npx wrangler d1 execute hyuk-blog-view-counter --remote --file cloudflare/schema.sql
npx wrangler deploy --keep-vars

After deployment, /analytics?range=today returned a valid response.

The new aggregate table was still empty. That is expected. Detailed analytics only start filling from real public visits after deployment.

I also tested local-origin tracking.

{"status":"ignored","reason":"origin_not_tracked"}

Now local blog checks do not increment the production counter.

From visitor counter to public analytics

At first it was just three small numbers.

Today, month, total.

But once I actually operated it, the important part was not the number itself. It was whether the number was clean. And for a public blog, showing safe aggregate flow felt better than hiding a private log page.

The counter is still not an accounting system.

But now it can answer more useful questions.

Which posts are read?

Which countries are visiting?

Is traffic mostly mobile or desktop?

Is it search, direct, or referral?

And most importantly, how much of my own work should be kept out of the count?

That is what /analytics/ is for.
