Understanding the CitySim Calibration Protocol for Model Tuning

A backtest tells us whether the city engine missed reality.

A calibration protocol tells us what we are allowed to change after that miss, how we change it, and how we avoid cheating while doing so.

That is the difference.

Without calibration rules, a model can always be made to look better after the fact. You just keep nudging weights, bending thresholds, softening definitions, and smoothing ugly errors until the output looks elegant again. But that is not calibration. That is performance theatre.

So after the Variable Registry, Observable Proxy Map, Data Adapter Spec, and Backtest Protocol, the next thing CitySim needs is the Calibration Protocol.

One-sentence answer

The CitySim.150Y.CF Calibration Protocol is the canonical rulebook for how the engine may be tuned after backtesting, including what parameters can change, what must stay fixed, how weights and transition rules are updated, how overfitting is prevented, and when the model is considered improved rather than merely cosmetically adjusted.

That is the page that stops CitySim from cheating its way into false accuracy.

Why this page has to exist

The backtest answers one question:

How wrong was the engine?

But that is not enough.

Once the engine misses reality, the next question is:

What exactly are we allowed to change, and how do we know we have improved the model rather than just forced it to mimic the past?

That is the calibration problem.

This matters because city models are very easy to “improve” dishonestly.

A simulation can be made to look better by:

weakening difficult variables
reweighting failure signals after seeing the miss
changing thresholds to fit one city
smoothing noise so errors disappear
quietly reinterpreting the meaning of a variable
fitting Tokyo beautifully while making the engine worse for every other city

That is why calibration must be governed by a protocol.

Not because the model is untrustworthy by nature.
But because all simulation engines become dangerous when tuning is unconstrained.

What the Calibration Protocol does

The Calibration Protocol does seven jobs.

1. It defines what may be calibrated

Not everything in a city engine should move freely.

Some things are structural and should stay fixed unless the theory itself changes.

Other things are empirical and should be tuned.

The protocol must say clearly:

what can be adjusted
what cannot be adjusted
what requires explicit version change
what requires a theory update rather than a numerical tweak

2. It separates theory from coefficient tuning

This is one of the most important distinctions in the whole CitySim stack.

For example:

the idea that demographic pressure increases maintenance and dependency load is a structural claim
the exact rate at which that load compounds inside one city is a calibration question

If theory and coefficient tuning are mixed together, the engine becomes unstable and hard to audit.

3. It prevents overfitting

A model that fits one city too perfectly may become worse as a general city engine.

That is why calibration must distinguish between:

fitting Tokyo specifically
improving the general engine
improving one city archetype pack
improving one city-local calibration file

These are not the same thing.

4. It forces honest versioning

Once the engine is recalibrated, the protocol must say:

what changed
why it changed
what evidence triggered the change
whether the change belongs to the core engine, an archetype pack, or one local city pack

Otherwise no one will know which version of CitySim they are actually reading.

5. It preserves comparability across cities

If Tokyo is calibrated one way and Seoul another, but the changes are hidden or arbitrary, then the engine stops being one engine.

Calibration must preserve shared grammar while allowing local tuning where justified.

6. It tells us when recalibration is enough

Some misses can be fixed with coefficient adjustment.

Some misses reveal that the proxies are weak.

Some misses reveal the variable definition itself is wrong.

Some misses reveal that the transition kernel is structurally incomplete.

Calibration should tell us which layer failed.

7. It decides when the engine is ready to rerun forward scenarios

A forward 150-year scenario should not be rerun immediately after a bad backtest unless the calibration layer says the engine has improved enough to justify another attempt.

That gate matters.

What calibration is not

Calibration is not:

making the model look nicer
deleting variables that performed badly
redefining the city so the numbers fit
letting one city rewrite the whole engine
tuning until the past is replicated perfectly
quietly changing the model after the output looks embarrassing

Calibration is not cosmetic repair.

Calibration is disciplined correction.

The three calibration layers

CitySim should calibrate in layers, not all at once.

Layer 1. Core engine calibration

These are changes that affect the whole engine.

Examples:

drift-rate sensitivity
repair-rate response shape
general lag structure
shock propagation logic
standard normalization behaviour

These changes are powerful and should be rare.

A core engine change must be justified carefully because it affects all cities.

Layer 2. City archetype calibration

These are changes that affect a city-type family, not every city.

Examples:

shrinking aging capitals
global service megacities
industrial export cities
tourism-dependent cities
frontier growth cities

Tokyo should not force its tuning onto Houston or Lagos.
But Tokyo may help tune the “shrinking aging high-stock megacity” archetype.

This layer is where much of the useful calibration should happen.

Layer 3. City-local calibration

These are adjustments justified only for one named city.

Examples:

Tokyo-specific migration behaviour
Singapore-specific education-state coupling
London-specific housing-finance coupling
Dubai-specific foreign-labour dependence

These changes should be allowed, but clearly labeled as city-local, not global truth.
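
One way to picture the three layers is as stacked parameter packs, where the most specific layer that defines a value wins and every value carries a label saying which layer supplied it. The following is a minimal sketch under that assumption. The parameter names, the numbers, and the `resolve_parameters` helper are all hypothetical and are not part of any published CitySim code.

```python
from collections import ChainMap

# All parameter names and values below are hypothetical placeholders.
CORE_ENGINE = {"drift_rate_sensitivity": 1.00, "repair_rate_response": 0.50}
ARCHETYPE_PACKS = {
    "shrinking_aging_high_stock_megacity": {"drift_rate_sensitivity": 1.15},
}
CITY_LOCAL_PACKS = {
    "Tokyo": {"migration_behaviour": 0.80},
}
LAYER_NAMES = ("city_local", "archetype", "core_engine")


def resolve_parameters(city, archetype):
    """Return (effective parameters, layer that supplied each parameter)."""
    # ChainMap looks keys up left to right, so the most specific layer wins.
    layers = ChainMap(
        CITY_LOCAL_PACKS.get(city, {}),
        ARCHETYPE_PACKS.get(archetype, {}),
        CORE_ENGINE,
    )
    provenance = {}
    for key in layers:
        for name, mapping in zip(LAYER_NAMES, layers.maps):
            if key in mapping:
                provenance[key] = name
                break
    return dict(layers), provenance


params, source = resolve_parameters("Tokyo", "shrinking_aging_high_stock_megacity")
```

Keeping the provenance next to the value is what makes a city-local patch visibly city-local rather than something that looks like global truth.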

What can be calibrated

The protocol should define a small list of permitted calibration targets.

A. Parameter weights

Examples:

weight of non-attendance inside school capture
weight of aging inside maintenance load
weight of retraining success inside mid-life retool score

These are often the safest things to tune first.

B. Transition coefficients

Examples:

how fast fertility decline affects youth inflow
how fast late-life isolation reduces civic continuity
how strongly repair rate counters drift rate

These matter a great deal.

C. Lag lengths

Examples:

how many years before education leakage appears in labour weakness
how many years before infrastructure neglect shows as city-body decline
how many years before chronic low fertility shifts school-system stress

This is one of the most underappreciated calibration areas.
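
Lag calibration can be sketched as a simple search: try each candidate lag and keep the one where the shifted driver series best matches the outcome series. The sketch below assumes both series are yearly and already normalized to the same scale. The `best_lag` function and the toy numbers are hypothetical illustrations, not CitySim data.

```python
def best_lag(driver, outcome, max_lag):
    """Return (lag in years with the lowest mean squared error, error per lag).

    For each lag, driver[t] is compared with outcome[t + lag].
    """
    errors = {}
    for lag in range(max_lag + 1):
        pairs = list(zip(driver, outcome[lag:]))
        if not pairs:
            continue  # lag is longer than the series, nothing to compare
        errors[lag] = sum((d - o) ** 2 for d, o in pairs) / len(pairs)
    return min(errors, key=errors.get), errors


# Toy series: the outcome repeats the driver three years later.
driver = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [0, 0, 0, 1, 2, 3, 4, 5]
lag, errors = best_lag(driver, outcome, max_lag=5)
```

Longer lags are scored on fewer overlapping years, so a real lag search would need a minimum-overlap rule before trusting the result.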

D. Threshold boundaries

Examples:

what counts as low, medium, or high housing stress
what counts as a legitimacy breach band
what counts as school capture failure

Thresholds may be calibrated, but very carefully, because they are easy to manipulate.

E. Composite proxy weights

Examples:

how much trust survey matters inside legitimacy
how much progression stability matters inside transfer integrity
how much vacancy burden matters inside teacher pipeline health

These often need adjustment once backtesting reveals weak proxy bundles.
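
A composite proxy can be read as a weighted average of its member proxies, so recalibrating the bundle means changing the weights while leaving the proxy values alone. This is a minimal sketch under that reading. Apart from the trust survey, the proxy names and all numbers are hypothetical.

```python
def composite_score(proxies, weights):
    """Weighted average of proxy values, with weights renormalized to sum to 1."""
    missing = set(weights) - set(proxies)
    if missing:
        raise KeyError(f"no proxy value for: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(proxies[name] * w for name, w in weights.items()) / total


# Hypothetical legitimacy bundle on a 0..1 scale.
legitimacy_proxies = {"trust_survey": 0.6, "turnout": 0.4, "complaint_rate": 0.5}
weights_before = {"trust_survey": 0.5, "turnout": 0.3, "complaint_rate": 0.2}
weights_after = {"trust_survey": 0.3, "turnout": 0.3, "complaint_rate": 0.4}

score_before = composite_score(legitimacy_proxies, weights_before)
score_after = composite_score(legitimacy_proxies, weights_after)
```

Because the weights are renormalized inside the function, a retune cannot inflate the composite simply by making every weight larger.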

What should not be casually calibrated

Some parts of the engine should stay much more stable.

1. Variable definitions

Do not redefine a variable just because it performed poorly.

If the definition must change, that is a theory update, not ordinary calibration.

2. Domain structure

Do not remove major city domains because one backtest was inconvenient.

For example, demography, education, economy, infrastructure, governance, and social continuity should remain part of the engine unless the framework itself changes.

3. Core pass/fail language

Do not rewrite success so the engine “passes” more often.

If success criteria change, that should be a visible version change.

4. Proxy quality labels

Do not upgrade a weak proxy to “strong” because it helps the score.

Evidence strength must remain evidence strength.

5. Version history

Never erase the fact that an older model missed badly.

A serious engine keeps its miss history visible.
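
The split between permitted targets and restricted elements can be expressed as a small guard that routes every proposed change before anyone touches a number. The sketch below is one possible reading of the two lists. The routing labels and the `review_change` function are hypothetical names chosen for illustration.

```python
PERMITTED_TARGETS = {
    "parameter_weights",
    "transition_coefficients",
    "lag_lengths",
    "threshold_boundaries",
    "composite_proxy_weights",
}

# How each restricted element is handled, as read from the section above.
RESTRICTED_ROUTING = {
    "variable_definitions": "theory_update",
    "domain_structure": "theory_update",
    "core_pass_fail_language": "visible_version_change",
    "proxy_quality_labels": "requires_new_evidence",
    "version_history": "never_erased",
}


def review_change(target_kind):
    """Say how a proposed change to `target_kind` must be handled."""
    if target_kind in PERMITTED_TARGETS:
        return "ordinary_calibration"
    if target_kind in RESTRICTED_ROUTING:
        return RESTRICTED_ROUTING[target_kind]
    # Anything unlisted is rejected rather than silently allowed.
    raise ValueError(f"unknown calibration target: {target_kind!r}")
```

Rejecting unlisted targets by default keeps the permitted list small, which is the point of having one.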

The calibration sequence

Every CitySim recalibration should follow the same order.

Step 1. Run the backtest

Do not calibrate blindly.

First see where the model actually missed.

Step 2. Diagnose the source of the miss

Every miss should be classified.

Was it caused by:

weak proxy
bad data adapter
wrong coefficient
wrong lag
wrong threshold
missing variable
wrong structural theory

Do not jump straight into coefficient tuning before diagnosing the layer.

Step 3. Choose the smallest justified change

Calibration should prefer the smallest change that fixes the largest real error.

That keeps the engine stable.

Step 4. Retest on the same backtest window

Check whether the change improved fit.

Step 5. Validate on a second window or second city

This is crucial.

A model that improves on one window but gets worse elsewhere may just be overfitting.

Step 6. Classify the change

Decide whether the adjustment belongs to:

core engine
archetype pack
city-local pack

Step 7. Version and publish

Record:

what changed
what evidence triggered it
what improved
what worsened
what uncertainty remains

That is the calibration cycle.
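
The fixed order of the cycle can be enforced mechanically, so that nobody retunes before diagnosing or publishes before validating. Here is a minimal sketch of such an order check. The `CalibrationCycle` class is a hypothetical illustration and the step names are shorthand for the seven steps above.

```python
SEQUENCE = [
    "run_backtest",
    "diagnose_failure_layer",
    "choose_smallest_justified_change",
    "retest_on_training_window",
    "validate_on_second_window_or_second_city",
    "classify_change_scope",
    "version_and_publish",
]


class CalibrationCycle:
    """Tracks progress through the cycle and refuses out-of-order steps."""

    def __init__(self):
        self.done = []

    def complete(self, step):
        if len(self.done) == len(SEQUENCE):
            raise RuntimeError("cycle already finished")
        expected = SEQUENCE[len(self.done)]
        if step != expected:
            raise RuntimeError(f"expected {expected!r}, got {step!r}")
        self.done.append(step)

    @property
    def finished(self):
        return self.done == SEQUENCE


cycle = CalibrationCycle()
cycle.complete("run_backtest")
cycle.complete("diagnose_failure_layer")
```

Calling `cycle.complete("version_and_publish")` at this point would raise, because five steps are still outstanding.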

The four main calibration failure types

When the engine misses reality, the failure usually falls into one of four buckets.

Type 1. Measurement failure

The variable is conceptually fine, but the proxy map is weak.

Example:

legitimacy inferred from weak indicators
parent capability support measured too indirectly

Fix:

strengthen proxy bundle
reduce weight
improve data adapter

Type 2. Parameter failure

The structure is fine, but the coefficient is wrong.

Example:

fertility decline was too weakly linked to youth inflow decline
school stress was assumed to worsen too slowly

Fix:

recalibrate parameter or lag

Type 3. Threshold failure

The engine sees the right movement, but the boundary bands are misplaced.

Example:

the city enters stress earlier than the model recognizes
warning bands are too forgiving

Fix:

adjust thresholds carefully

Type 4. Structural failure

The engine is missing a real mechanism.

Example:

migration dynamics under global city conditions
housing-finance feedback loop
aging interacting with social isolation more strongly than expected

Fix:

theory revision
variable expansion
transition kernel update

This is the most serious class of failure.
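
The diagnosed causes from Step 2 can be mapped onto the four failure types, and each type onto its listed fixes. The mapping below is one plausible reading of the text rather than a rule the protocol states outright. In particular, placing a bad data adapter under measurement failure and a missing variable under structural failure are assumptions.

```python
# Diagnosed cause -> failure type (assumed mapping, see note above).
CAUSE_TO_TYPE = {
    "weak_proxy": "F1_measurement",
    "bad_data_adapter": "F1_measurement",
    "wrong_coefficient": "F2_parameter",
    "wrong_lag": "F2_parameter",
    "wrong_threshold": "F3_threshold",
    "missing_variable": "F4_structural",
    "wrong_structural_theory": "F4_structural",
}

# Failure type -> fixes listed for that type.
TYPE_TO_FIXES = {
    "F1_measurement": ["strengthen proxy bundle", "reduce weight", "improve data adapter"],
    "F2_parameter": ["recalibrate parameter or lag"],
    "F3_threshold": ["adjust thresholds carefully"],
    "F4_structural": ["theory revision", "variable expansion", "transition kernel update"],
}


def triage(cause):
    """Return (failure type, candidate fixes) for a diagnosed cause."""
    failure_type = CAUSE_TO_TYPE[cause]
    return failure_type, TYPE_TO_FIXES[failure_type]


failure_type, fixes = triage("wrong_lag")
```

A lookup like this makes it harder to answer a structural failure with a coefficient tweak, because the fix list for F4 contains no coefficient tweak.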

The anti-overfitting rules

Calibration only has value if it avoids overfitting.

So the protocol needs hard rules.

Rule 1. Train / validation separation

Use one window for calibration and another window for checking whether the improvement generalizes.

For example:

calibrate on 2010–2020
validate on 2021–2025

Or:

calibrate on Tokyo
validate on Seoul or Osaka within the same archetype band
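
The separation rule can be checked numerically: a change counts only if it lowers error on the calibration case and does not raise it on the held-out case. The sketch below assumes mean absolute error as the score and treats "not worse, within a tolerance" as the bar for the validation case. Both choices, the function names, and the numbers are assumptions.

```python
def mean_abs_error(predicted, observed):
    """Average absolute gap between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def improvement_generalizes(errors_before, errors_after, tolerance=0.0):
    """errors_* are dicts with 'calibration' and 'validation' error scores."""
    improved = errors_after["calibration"] < errors_before["calibration"]
    held = errors_after["validation"] <= errors_before["validation"] + tolerance
    return improved and held


# Hypothetical scores for two candidate retunes against the same baseline.
baseline = {"calibration": 0.30, "validation": 0.28}
honest_retune = {"calibration": 0.22, "validation": 0.25}
overfit_retune = {"calibration": 0.05, "validation": 0.41}
```

Here `overfit_retune` fits the calibration case far better than `honest_retune` but fails the check, which is the behaviour the rule is meant to produce.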

Rule 2. No perfect-fit obsession

A city engine should not be forced toward exact past replication if doing so damages generality.

The aim is not to mimic every wrinkle of the past.

The aim is to become more truth-bearing and stable.

Rule 3. Weak proxies cannot dominate tuning

Do not let the noisiest variables control the biggest recalibration moves.

Rule 4. Smaller changes beat larger changes

If two calibration choices improve the backtest similarly, keep the simpler one.
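
Rule 4 can be sketched as a two-stage choice: first keep every candidate whose improvement is close to the best one, then pick the smallest change among those. The margin for "similar", the way change size is counted, and the candidate names below are all hypothetical.

```python
def pick_change(candidates, similarity=0.01):
    """Pick the smallest change among candidates with near-best improvement.

    candidates: list of (name, error_reduction, change_size) tuples, where
    change_size is a rough count of how many values the change touches.
    """
    best_reduction = max(reduction for _, reduction, _ in candidates)
    near_best = [c for c in candidates if best_reduction - c[1] <= similarity]
    return min(near_best, key=lambda c: c[2])


candidates = [
    ("reweight_one_proxy", 0.100, 1),
    ("retune_five_coefficients", 0.105, 5),
    ("tiny_lag_shift", 0.020, 1),
]
chosen = pick_change(candidates)
```

The five-coefficient retune scores slightly better, but not by enough to justify touching five values instead of one.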

Rule 5. One city cannot rewrite the whole world

Tokyo can improve the Tokyo pack and perhaps the archetype pack. It should not automatically redefine the universal engine.

Calibration status classes

After recalibration, the engine should declare its new status.

Class R1 — minor retune

Small parameter or lag changes, core theory intact.

Class R2 — moderate recalibration

Several weights or thresholds changed, but engine architecture intact.

Class R3 — archetype recalibration

The city archetype pack changed meaningfully.

Class R4 — structural revision

The engine needed new mechanisms or important theory revision.

Class R5 — unstable

Calibration attempts did not generalize well enough; engine still too loose.

This is useful because not all recalibration events are equal.
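
The status classes can be assigned by checking the most serious condition first. The sketch below is an assumed decision order, and the cut-off of three changed values for "several" is a placeholder rather than a number the protocol defines.

```python
def status_class(generalized, structural_revision, archetype_pack_changed,
                 values_changed, several=3):
    """Return the status class R1..R5 for one recalibration event."""
    if not generalized:
        return "R5"  # unstable: attempts did not hold outside the training case
    if structural_revision:
        return "R4"  # new mechanisms or theory revision
    if archetype_pack_changed:
        return "R3"  # archetype pack changed meaningfully
    if values_changed >= several:
        return "R2"  # several weights or thresholds changed
    return "R1"      # small parameter or lag change


label = status_class(
    generalized=True,
    structural_revision=False,
    archetype_pack_changed=False,
    values_changed=1,
)
```

Checking generalization first means an event that failed validation is reported as R5 no matter how small the change was.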

What a calibration report must publish

Every calibration pass should publish an audit trail.

1. Previous model version

What was active before recalibration?

2. Backtest miss summary

Where did it fail?

3. Suspected failure layer

Was the problem measurement, parameter, threshold, or structure?

4. Changes made

Exactly what changed?

5. Why the change was justified

What evidence supported the update?

6. Improvement score

Did the model actually get better?

7. Cross-check result

Did the improvement hold outside the training case?

8. New version label

What is the new engine version?

Without this, “recalibration” is just a vague claim.
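
The eight report items can be held in a small record that refuses to count as complete while any field is blank. This is a minimal sketch. The `CalibrationReport` class, the `missing_fields` helper, and every example value are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CalibrationReport:
    previous_model_version: str
    backtest_miss_summary: str
    suspected_failure_layer: str
    changes_made: str
    justification: str
    improvement_score: str
    cross_check_result: str
    new_version_label: str


def missing_fields(report):
    """Names of report fields that are empty or whitespace only."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


draft = CalibrationReport(
    previous_model_version="example-v1",
    backtest_miss_summary="youth inflow overestimated",
    suspected_failure_layer="parameter",
    changes_made="fertility-to-youth-inflow coefficient raised",
    justification="backtest miss on the calibration window",
    improvement_score="error reduced on calibration window",
    cross_check_result="held on second window",
    new_version_label="",
)
gaps = missing_fields(draft)
```

Here `gaps` lists `new_version_label`, so the draft is not yet publishable.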

The most important calibration law

Here is the deepest rule:

CitySim must prefer stable truth over elegant fit.

That means:

a rough but honest engine is better than a beautiful overfit engine
a city-local patch should not pretend to be universal law
uncertainty should remain visible
not every miss should be forced away by tuning

That is how the engine stays usable over time.

Why this matters after Tokyo

Tokyo did not merely show that the first engine missed.

Tokyo showed where the discipline gap was.

The initial run was too easy to read as if it were forward-accurate. But once the backtest pushed the model against recent history, it became clear that the engine needed a more formal way to:

identify the miss,
locate the layer that failed,
retune responsibly,
and prove that the retune was genuine.

That is exactly what this protocol provides.

Without it, every future city article risks becoming:

write,
miss,
quietly tweak,
rewrite,
move on.

That is not a civilisation-grade engine.

Final definition

The CitySim.150Y.CF Calibration Protocol is the canonical tuning discipline that governs how the model may be improved after backtesting, including what can change, what must remain stable, how errors are diagnosed, how overfitting is prevented, and how recalibration is versioned, validated, and published.

Without it, CitySim can still learn.

But no one can tell whether it learned honestly.

Almost-Code

```text
CITYSIM_150Y_CF_CALIBRATION_PROTOCOL_V1

PURPOSE:
  Improve CitySim after backtesting without allowing hidden retuning,
  overfitting, or structural drift disguised as accuracy.

CORE_LAW:
  Calibration must prefer stable truth over elegant fit.

CALIBRATION_LAYERS:
  L1 = core_engine
  L2 = city_archetype_pack
  L3 = city_local_pack

PERMITTED_CALIBRATION_TARGETS:
  - parameter_weights
  - transition_coefficients
  - lag_lengths
  - threshold_boundaries
  - composite_proxy_weights

RESTRICTED_ELEMENTS:
  - variable_definitions
  - domain_structure
  - core_pass_fail_language
  - proxy_quality_labels
  - version_history

CALIBRATION_SEQUENCE:
  1. run_backtest
  2. diagnose_failure_layer
  3. choose_smallest_justified_change
  4. retest_on_training_window
  5. validate_on_second_window_or_second_city
  6. classify_change_scope
  7. version_and_publish

FAILURE_TYPES:
  F1 = measurement_failure
  F2 = parameter_failure
  F3 = threshold_failure
  F4 = structural_failure

ANTI_OVERFITTING_RULES:
  - train_validation_separation
  - no_perfect_fit_obsession
  - weak_proxies_cannot_dominate_tuning
  - smaller_changes_preferred
  - one_city_cannot_rewrite_whole_engine

CALIBRATION_STATUS_CLASSES:
  R1 = minor_retune
  R2 = moderate_recalibration
  R3 = archetype_recalibration
  R4 = structural_revision
  R5 = unstable

REQUIRED_CALIBRATION_REPORT:
  - previous_model_version
  - backtest_miss_summary
  - suspected_failure_layer
  - changes_made
  - justification
  - improvement_score
  - cross_check_result
  - new_version_label

PASS_CONDITION:
  A calibration pass is valid only if it improves backtest performance,
  preserves declared variable meaning,
  does not hide proxy weakness,
  and holds outside the immediate training case.

FAIL_CONDITIONS:
  - hidden_retuning
  - post_hoc_threshold_shifting
  - city_local_patch_presented_as_universal_law
  - weak_proxy_dominating_recalibration
  - no_validation_outside_training_case
  - version_change_not_published

OUTPUT:
  calibration_validity = TRUE or FALSE
  model_update_scope = core / archetype / city_local
  forward_use_status = improved / still_under_calibrated / unstable
```
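
The PASS_CONDITION and FAIL_CONDITIONS above can be turned into a single validity check: all four pass requirements must hold and no fail condition may be flagged. The sketch below covers only `calibration_validity`. The function name, its arguments, and the snake_case spelling of the fail conditions are choices made for this illustration.

```python
FAIL_CONDITIONS = {
    "hidden_retuning",
    "post_hoc_threshold_shifting",
    "city_local_patch_presented_as_universal_law",
    "weak_proxy_dominating_recalibration",
    "no_validation_outside_training_case",
    "version_change_not_published",
}


def calibration_validity(improves_backtest, preserves_variable_meaning,
                         proxy_weakness_visible, holds_outside_training,
                         fail_flags=()):
    """True only if every pass requirement holds and no fail condition is flagged."""
    unknown = set(fail_flags) - FAIL_CONDITIONS
    if unknown:
        raise ValueError(f"unknown fail condition(s): {sorted(unknown)}")
    requirements = (
        improves_backtest,
        preserves_variable_meaning,
        proxy_weakness_visible,
        holds_outside_training,
    )
    return all(requirements) and not fail_flags


clean_pass = calibration_validity(True, True, True, True)
unpublished = calibration_validity(
    True, True, True, True, fail_flags=["version_change_not_published"]
)
```

A pass that improves every score but skips publication still comes back invalid, matching the protocol's insistence on a visible version trail.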

eduKateSG Learning System | Control Tower, Runtime, and Next Routes

This article is one node inside the wider eduKateSG Learning System.

At eduKateSG, we do not treat education as random tips, isolated tuition notes, or one-off exam hacks. We treat learning as a living runtime:

state -> diagnosis -> method -> practice -> correction -> repair -> transfer -> long-term growth

That is why each article is written to do more than answer one question. It should help the reader move into the next correct corridor inside the wider eduKateSG system: understand -> diagnose -> repair -> optimize -> transfer.

How to Use eduKateSG

If you want the big picture -> start with Education OS and Civilisation OS
If you want subject mastery -> enter Mathematics, English, Vocabulary, or Additional Mathematics
If you want diagnosis and repair -> move into the CivOS Runtime and subject runtime pages
If you want real-life context -> connect learning back to Family OS, Bukit Timah OS, Punggol OS, and Singapore City OS

Why eduKateSG writes articles this way

eduKateSG is not only publishing content.
eduKateSG is building a connected control tower for human learning.

That means each article can function as:

a standalone answer,
a bridge into a wider system,
a diagnostic node,
a repair route,
and a next-step guide for students, parents, tutors, and AI readers.
