A city engine does not run on raw data alone.
Even when the data is real, it still arrives in messy forms:
- one source is yearly
- another is monthly
- one is city-level
- another is prefecture-level
- one starts in 1970
- another starts in 2008
- one is a clean count
- another is a survey estimate
- one city calls something “dropout”
- another city calls something “early leaver”
So after the Variable Registry and the Observable Proxy Map, the next thing CitySim needs is the Data Adapter Spec.
One-sentence answer
The CitySim.150Y.CF Data Adapter Spec is the canonical conversion layer that tells the engine how to ingest raw external data, standardize it, resolve mismatched formats and time scales, apply fallback rules, and turn real-world datasets into simulation-ready values without silently changing meaning.
That is the layer that stops the model from becoming a patchwork of incompatible spreadsheets.
Why this page has to exist
The Variable Registry defines what the engine is allowed to use.
The Observable Proxy Map defines how variables connect to reality.
But neither page yet explains the most dangerous part of the process:
how the raw data is actually transformed before entering the simulation.
That transformation layer is where many models become unreliable.
Not because the data is fake.
But because the adapter logic is hidden, inconsistent, or improvised.
For example:
- a monthly labour series gets compared directly against an annual education series
- city data is missing, so national data is dropped in without warning
- a 0–1 ratio in one city is treated like a 0–100 index in another
- a survey question changes wording, but the time series is still treated as smooth
- missing years are filled in casually
- old boundaries and new boundaries are mixed together
That is why CitySim needs a declared adapter layer.
Without it, the engine may still sound precise, but it is not running on one stable input grammar.
What the Data Adapter Spec does
The Data Adapter Spec does seven jobs.
1. It standardizes raw inputs
Different datasets use different:
- units
- time frequencies
- reporting conventions
- geographic boundaries
- naming systems
The adapter makes them compatible before they touch the simulation.
2. It preserves meaning through conversion
A good adapter does not merely reformat numbers.
It makes sure that when a variable enters CitySim, it still means the same thing it meant in the source world.
That sounds obvious, but it is where many models quietly fail.
3. It declares fallback logic in advance
If city-level data is missing, what happens?
Does the engine use:
- metro data?
- prefecture data?
- national data?
- a proxy estimate?
- no value at all?
That must be declared before the run, not after the result.
4. It separates observation from interpolation
Some years are measured.
Some are estimated.
Some are carried forward.
Some are reconstructed.
The adapter must label those states properly.
5. It handles scale mismatches
CitySim will often combine:
- raw counts
- rates
- ratios
- percentages
- normalized indices
- composite scores
The adapter is what converts those into compatible engine values.
6. It keeps the same engine across cities
Tokyo, London, Singapore, or São Paulo may use different raw data sources, but the engine still needs the same internal variable grammar.
The adapter makes that possible.
7. It creates an audit trail
A serious model should be able to show:
- raw value
- source
- transformation rule
- adjusted value
- reason for adjustment
- confidence note
That is what makes the engine checkable.
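A minimal sketch of one audit-trail row, assuming the six items above map onto simple fields. The class name `AuditRecord`, the variable id `UNEMPLOYMENT_RATE`, the source label, and the numbers are all hypothetical; they only illustrate the shape of the record.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    """One audit-trail row for a single adapted value (hypothetical layout)."""
    variable_id: str
    raw_value: float
    source: str
    transformation_rule: str
    adjusted_value: float
    reason: str
    confidence_note: str


def percent_to_ratio_with_audit(variable_id: str, raw_percent: float, source: str) -> AuditRecord:
    """Convert a percentage to a 0-1 ratio and record exactly how it was done."""
    return AuditRecord(
        variable_id=variable_id,
        raw_value=raw_percent,
        source=source,
        transformation_rule="ratio = percent / 100",
        adjusted_value=raw_percent / 100.0,
        reason="engine assumed to store rates as 0-1 ratios",
        confidence_note="direct observation, unit conversion only",
    )


# Illustrative values only; not taken from any real dataset.
record = percent_to_ratio_with_audit("UNEMPLOYMENT_RATE", 4.0, "illustrative labour survey")
print(asdict(record))
```

Because the raw value and the rule travel with the adjusted value, a reviewer can recompute the adjustment without opening the original spreadsheet.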
Where the adapter sits in CitySim
The order should be:
Raw Dataset → Proxy Map → Data Adapter → Simulation Variable → Transition Kernel
That means:
- the Proxy Map says what data counts
- the Data Adapter says how it is transformed
- the Simulation Variable is the clean internal value
- the Transition Kernel then moves that value through time
This order matters.
If the adapter and the kernel get mixed together, then it becomes unclear whether a number changed because:
- the real world changed,
- the source format changed,
- or the model’s own transition equation changed.
That is not acceptable in a long-horizon engine.
What every adapter rule must contain
Every variable that enters CitySim should have a declared adapter rule with at least the following fields.
| Field | Meaning |
|---|---|
| variable_id | canonical simulation variable name |
| source_name | dataset or source label |
| source_level | city / metro / prefecture / national / international |
| raw_unit | original source unit |
| target_unit | simulation unit |
| time_frequency_raw | monthly / quarterly / annual / irregular |
| time_frequency_target | annual / 5-year / 10-year |
| geo_boundary_raw | original geography used |
| geo_boundary_target | simulation geography |
| transform_type | carry-over / aggregate / interpolate / normalize / composite / reconcile |
| transform_rule | exact rule applied |
| missing_data_rule | what happens when values are absent |
| fallback_rule | what source is used next if main source fails |
| revision_rule | how later source revisions are handled |
| quality_flag | high / medium / low |
| audit_note | declared caveat |
That is the minimum.
If a model cannot show this for its inputs, then the input layer is still too soft.
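The field list above can be sketched as a small record type with a completeness check. This is an illustration under assumptions: the names `AdapterRule` and `undeclared_fields` are invented here, and every field is treated as free text. The sample values are borrowed from the POP_TOTAL example entry further down this page, which does not list an `audit_note`, so the check reports that field.

```python
from dataclasses import dataclass, fields


@dataclass
class AdapterRule:
    """Declared adapter rule for one simulation variable (hypothetical layout)."""
    variable_id: str = ""
    source_name: str = ""
    source_level: str = ""
    raw_unit: str = ""
    target_unit: str = ""
    time_frequency_raw: str = ""
    time_frequency_target: str = ""
    geo_boundary_raw: str = ""
    geo_boundary_target: str = ""
    transform_type: str = ""
    transform_rule: str = ""
    missing_data_rule: str = ""
    fallback_rule: str = ""
    revision_rule: str = ""
    quality_flag: str = ""
    audit_note: str = ""


def undeclared_fields(rule: AdapterRule) -> list[str]:
    """Return the names of fields that were left empty."""
    return [f.name for f in fields(rule) if not getattr(rule, f.name).strip()]


pop_total = AdapterRule(
    variable_id="POP_TOTAL",
    source_name="resident population statistics",
    source_level="city",
    raw_unit="persons",
    target_unit="persons",
    time_frequency_raw="annual",
    time_frequency_target="annual",
    geo_boundary_raw="city/metropolitan authority",
    geo_boundary_target="city",
    transform_type="direct carry-over",
    transform_rule="use reported annual resident count",
    missing_data_rule="interpolate only if 1-year gap",
    fallback_rule="metro then prefecture",
    revision_rule="latest official revision replaces prior release",
    quality_flag="high",
)

print(undeclared_fields(pop_total))  # ['audit_note']
```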
The five main adapter problems
These are the most common input problems CitySim must solve.
1. Time-frequency mismatch
Some datasets are:
- monthly
- quarterly
- annual
- once every census cycle
- irregular
But CitySim may run on:
- yearly slices
- 5-year slices
- decade slices
So the adapter must define whether it:
- averages,
- sums,
- snapshots,
- smooths,
- or holds values constant until the next observation.
For example:
- monthly unemployment could be averaged into yearly unemployment
- annual births could be summed directly into the yearly run
- a census count every 5 years may be interpolated for intermediate years
This cannot be left vague.
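As a rough sketch of what "not vague" could look like, the function below collapses monthly values into annual ones using a method that must be named up front. The function name `to_annual`, the two method labels, the choice to skip incomplete years, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean


def to_annual(monthly: dict[tuple[int, int], float], method: str) -> dict[int, float]:
    """Collapse {(year, month): value} into {year: value} with a declared method."""
    if method not in ("average", "sum"):
        raise ValueError(f"undeclared aggregation method: {method}")
    by_year: dict[int, list[float]] = {}
    for (year, _month), value in sorted(monthly.items()):
        by_year.setdefault(year, []).append(value)
    annual: dict[int, float] = {}
    for year, values in by_year.items():
        if len(values) != 12:
            continue  # incomplete year: leave it missing rather than guess
        annual[year] = mean(values) if method == "average" else sum(values)
    return annual


# Invented monthly unemployment rates (%): a rate is averaged, not summed.
monthly_unemployment = {(2020, m): (4.0 if m <= 6 else 5.0) for m in range(1, 13)}
monthly_unemployment[(2021, 1)] = 4.2  # only one month reported for 2021

# Invented monthly migration counts: a flow is summed, not averaged.
monthly_migration = {(2020, m): 100.0 for m in range(1, 13)}

annual_unemployment = to_annual(monthly_unemployment, "average")
annual_migration = to_annual(monthly_migration, "sum")
print(annual_unemployment)  # {2020: 4.5}
print(annual_migration)     # {2020: 1200.0}
```

The incomplete 2021 series is dropped here instead of being averaged over one month; whether to do that is itself a rule the adapter entry would need to declare.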
2. Geographic mismatch
A simulation may be about:
- city proper
- metropolitan area
- prefecture/state
- functional urban region
But the data may come from a different boundary.
So the adapter must state:
- what the target geography is
- whether the source matches it
- what correction or caution applies if it does not
A Tokyo city run should not quietly mix:
- Tokyo Metropolis,
- Greater Tokyo,
- Japan national figures,
without clearly labeling which is which.
3. Unit mismatch
Different sources report values in different ways.
Examples:
- count
- per 1,000 persons
- %
- ratio
- index
- currency
- inflation-adjusted currency
- nominal currency
The adapter must say exactly how raw units become target units.
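One simple way to make unit handling explicit is a lookup of declared conversions that refuses anything not on the list. The unit labels (`percent`, `per_1000`, `ratio`) and the helper name `convert_unit` are assumptions for this sketch, not names defined by CitySim.

```python
# Declared conversions only: (raw_unit, target_unit) -> conversion function.
UNIT_CONVERSIONS = {
    ("percent", "ratio"): lambda v: v / 100.0,
    ("per_1000", "ratio"): lambda v: v / 1000.0,
    ("ratio", "percent"): lambda v: v * 100.0,
    ("ratio", "ratio"): lambda v: v,
}


def convert_unit(value: float, raw_unit: str, target_unit: str) -> float:
    """Apply a declared conversion, or fail loudly if none is declared."""
    try:
        rule = UNIT_CONVERSIONS[(raw_unit, target_unit)]
    except KeyError:
        raise ValueError(f"no declared conversion: {raw_unit} -> {target_unit}") from None
    return rule(value)


print(convert_unit(35.0, "percent", "ratio"))
print(convert_unit(12.0, "per_1000", "ratio"))
```

Failing on an undeclared pair is the point of the design: a 0–1 ratio can never be silently read as a 0–100 index, because no path exists for it unless someone writes one down.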
4. Missing data
This is unavoidable.
The adapter must specify:
- when to leave missing as missing
- when to interpolate
- when to use carry-forward
- when to use fallback geography
- when to mark the variable unusable
This should not be decided case by case after seeing the result.
5. Definition drift
Sometimes a variable looks stable but the meaning has changed.
Examples:
- survey wording changed
- administrative category changed
- school absence rules changed
- city boundary changed
- credential definitions changed
The adapter must flag these cases so the model does not treat them as seamless continuity.
The main adapter transformation types
1. Direct carry-over
Use the raw value as the simulation value.
Example:
- fertility rate
- population count
- life expectancy
This is the cleanest case.
2. Annual aggregation
Combine smaller time units into an annual value.
Example:
- monthly labour data → annual average
- monthly migration → annual total
- quarterly GDP proxy → annual aggregate
3. Interpolation
Estimate missing years between known observations.
Example:
- census every 5 years → interpolate population for intermediate years
This is allowed, but it must be labeled.
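A minimal sketch of labeled interpolation, assuming plain linear interpolation between census years. The function name `interpolate_labeled`, the two labels, and the population figures are invented for illustration; the source does not prescribe a specific interpolation method.

```python
def interpolate_labeled(observed: dict[int, float]) -> dict[int, tuple[float, str]]:
    """Fill years between observations linearly and label every value's origin."""
    years = sorted(observed)
    out: dict[int, tuple[float, str]] = {}
    for start, end in zip(years, years[1:]):
        out[start] = (observed[start], "observed")
        step = observed[end] - observed[start]
        span = end - start
        for year in range(start + 1, end):
            value = observed[start] + step * (year - start) / span
            out[year] = (value, "interpolated")
    if years:
        out[years[-1]] = (observed[years[-1]], "observed")
    return out


# Invented census counts five years apart.
census = {2000: 1000.0, 2005: 1100.0}
series = interpolate_labeled(census)
for year in sorted(series):
    print(year, series[year])
```

Keeping the label next to the value means downstream code can always tell a measured year from an estimated one, which is the separation of observation from interpolation described earlier.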
4. Normalization
Convert raw values into a common scale.
Example:
- convert a housing burden number into a 0–100 housing stress index
Useful for cross-city comparison.
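As a sketch, a 0–100 index can be produced by min-max scaling against bounds that are declared in advance. The bounds used here (0.0 and 0.5 of income spent on housing), the clipping behaviour, and the function name `normalize_0_100` are assumptions for illustration, not CitySim's actual housing stress rule.

```python
def normalize_0_100(value: float, low: float, high: float) -> float:
    """Min-max scale a raw value onto 0-100 using declared bounds, clipping outliers."""
    if high <= low:
        raise ValueError("declared bounds must satisfy low < high")
    clipped = min(max(value, low), high)
    return 100.0 * (clipped - low) / (high - low)


# Invented example: share of household income spent on housing.
LOW, HIGH = 0.0, 0.5  # declared bounds (assumed)
print(normalize_0_100(0.25, LOW, HIGH))  # 50.0
print(normalize_0_100(0.80, LOW, HIGH))  # 100.0 (clipped at the upper bound)
```

The same declared bounds would have to be used for every city, otherwise the index stops being comparable across runs.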
5. Composite assembly
Build one simulation variable from several source inputs.
Example:
- TEACHER_PIPELINE_HEALTH
- BASE_STOCK
- LEGITIMACY
This must publish the weighting rule.
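A published weighting rule could be as simple as a weight table that is checked before use. In this sketch the component names follow the SCHOOL_CAPTURE bundle listed in the example entries on this page, but the weights and scores are invented, and the function name `assemble_composite` is hypothetical. It also assumes each component has already been normalized to 0–100.

```python
def assemble_composite(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of already-normalized components, using published weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("published weights must sum to 1")
    missing = sorted(set(weights) - set(components))
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sum(weights[name] * components[name] for name in weights)


# Invented weights and scores, for illustration only.
SCHOOL_CAPTURE_WEIGHTS = {
    "attendance": 0.4,
    "progression": 0.3,
    "retention": 0.2,
    "re_entry": 0.1,
}
scores = {"attendance": 80.0, "progression": 70.0, "retention": 60.0, "re_entry": 50.0}

school_capture = assemble_composite(scores, SCHOOL_CAPTURE_WEIGHTS)
print(round(school_capture, 6))
```

This sketch stops when a component is missing. An adapter entry may instead declare that a missing component lowers the quality grade, in which case that rule would replace the error.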
6. Reconciliation
Resolve multiple competing sources or different source levels into one simulation value.
Example:
- city source and national source disagree
- yearly city survey is missing, but prefecture administrative count exists
This is one of the most sensitive adapter operations and must be declared openly.
Fallback hierarchy
CitySim should use a stable fallback logic.
For most city runs, the default order should be:
city → metro → prefecture/state → national → international → unavailable
That means:
- use city data first
- if missing, go to metropolitan data if it still matches the target city logic
- then prefecture/state
- then national
- then international harmonized dataset
- if nothing valid exists, mark the value as unavailable or as a weak estimate
The important thing is not that this hierarchy is perfect.
The important thing is that it is declared before the run.
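The default order above can be sketched as a fixed tuple that is walked from the top. The level labels and the function name `resolve_with_fallback` are assumptions, and the sketch does not model the extra condition that metro data must still match the target city logic.

```python
from typing import Optional

# Declared before the run; never reordered after seeing results.
FALLBACK_ORDER = ("city", "metro", "prefecture_or_state", "national", "international")


def resolve_with_fallback(values_by_level: dict[str, Optional[float]]) -> tuple[Optional[float], str]:
    """Return the first available value and the level it came from."""
    for level in FALLBACK_ORDER:
        value = values_by_level.get(level)
        if value is not None:
            return value, level
    return None, "unavailable"


# Invented values: city and metro are missing, prefecture is present.
value, level_used = resolve_with_fallback({"city": None, "prefecture_or_state": 0.31, "national": 0.28})
print(value, level_used)  # 0.31 prefecture_or_state
print(resolve_with_fallback({}))  # (None, 'unavailable')
```

Returning the level alongside the value lets the caller mark the quality down whenever the answer did not come from city data.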
Missing-data rules
The adapter should allow only a few permitted missing-data treatments.
Rule A. Exact observation available
Use it directly.
Rule B. Short gap with stable definition
Interpolate if the gap is small and the series is stable.
Rule C. Missing city data but valid higher-level proxy exists
Use fallback geography and mark quality down.
Rule D. Variable too weakly observed
Retain the variable with a low-confidence label, or exclude it from dominant verdict logic.
Rule E. No valid path
Mark as unavailable.
That is much better than pretending every missing series can be filled neatly.
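Read as a decision order, Rules A to E might look like the sketch below. The function name `choose_missing_data_rule`, the boolean inputs, and the default threshold of one year for a "short gap" are assumptions; the page only says the gap must be small and the series stable.

```python
def choose_missing_data_rule(
    observed: bool,
    gap_years: int,
    stable_definition: bool,
    fallback_available: bool,
    weak_proxy_available: bool,
    max_gap_years: int = 1,  # assumed threshold for a "short" gap
) -> str:
    """Pick the first permitted treatment, checked in the order A, B, C, D, E."""
    if observed:
        return "A"  # exact observation available
    if stable_definition and 0 < gap_years <= max_gap_years:
        return "B"  # short gap with stable definition: interpolate
    if fallback_available:
        return "C"  # fallback geography, quality marked down
    if weak_proxy_available:
        return "D"  # retain with low confidence
    return "E"      # mark as unavailable


print(choose_missing_data_rule(True, 0, True, False, False))    # A
print(choose_missing_data_rule(False, 1, True, True, False))    # B
print(choose_missing_data_rule(False, 4, True, True, False))    # C
print(choose_missing_data_rule(False, 4, False, False, False))  # E
```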
Definition-drift flags
Every adapter needs drift flags.
Examples:
- NO_DRIFT
- MINOR_METHOD_CHANGE
- BOUNDARY_CHANGE
- CATEGORY_REDEFINITION
- SURVEY_WORDING_SHIFT
- SERIES_BREAK
These flags matter because a long-horizon city engine will often cross decades where institutions change how they count the world.
If the engine ignores that, it may confuse an administrative change for a civilisation change.
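One possible use of the flags is to cut a series into segments so that nothing is smoothed across a break. The flag names below come from this page; which flags count as continuity-breaking, and the helper name `split_at_breaks`, are assumptions made for the sketch.

```python
from enum import Enum


class DriftFlag(Enum):
    NO_DRIFT = "NO_DRIFT"
    MINOR_METHOD_CHANGE = "MINOR_METHOD_CHANGE"
    BOUNDARY_CHANGE = "BOUNDARY_CHANGE"
    CATEGORY_REDEFINITION = "CATEGORY_REDEFINITION"
    SURVEY_WORDING_SHIFT = "SURVEY_WORDING_SHIFT"
    SERIES_BREAK = "SERIES_BREAK"


# Assumption: everything except NO_DRIFT and MINOR_METHOD_CHANGE breaks continuity.
BREAKING_FLAGS = {
    DriftFlag.BOUNDARY_CHANGE,
    DriftFlag.CATEGORY_REDEFINITION,
    DriftFlag.SURVEY_WORDING_SHIFT,
    DriftFlag.SERIES_BREAK,
}


def split_at_breaks(years: list[int], flags: dict[int, DriftFlag]) -> list[list[int]]:
    """Split years into segments; a breaking flag on a year starts a new segment."""
    segments: list[list[int]] = []
    current: list[int] = []
    for year in sorted(years):
        if flags.get(year, DriftFlag.NO_DRIFT) in BREAKING_FLAGS and current:
            segments.append(current)
            current = []
        current.append(year)
    if current:
        segments.append(current)
    return segments


# Invented flags: a minor method change in 2001, a boundary change in 2003.
flags = {2001: DriftFlag.MINOR_METHOD_CHANGE, 2003: DriftFlag.BOUNDARY_CHANGE}
segments = split_at_breaks(list(range(2000, 2006)), flags)
print(segments)  # [[2000, 2001, 2002], [2003, 2004, 2005]]
```

Interpolation or trend fitting would then run inside each segment only, so an administrative change is not read as a real-world change.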
Quality tiers for adapted values
After conversion, each value should carry a quality tier.
Tier 1 — clean observed input
Direct source, matching geography, matching time step, no serious transformation.
Tier 2 — adapted but strong
Minor aggregation or normalization, still robust.
Tier 3 — usable with caution
Interpolation or fallback geography used, moderate comparability issues.
Tier 4 — weak adapted value
Proxy bundle or unstable series, only suitable for directional reasoning.
Tier 5 — experimental
Very weak support, should not drive hard verdicts.
This is one of the easiest ways to keep the engine honest.
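The tier descriptions can be approximated by a worst-case-wins rule: the weakest property of a value decides its tier. The function name `assign_tier` and the five boolean inputs are a simplification assumed for this sketch; a real build would need a fuller rule.

```python
def assign_tier(
    aggregated_or_normalized: bool = False,
    interpolated: bool = False,
    fallback_geography: bool = False,
    proxy_bundle_or_unstable: bool = False,
    experimental: bool = False,
) -> int:
    """Return a quality tier 1-5; the weakest applicable condition wins."""
    if experimental:
        return 5  # very weak support
    if proxy_bundle_or_unstable:
        return 4  # weak adapted value, directional reasoning only
    if interpolated or fallback_geography:
        return 3  # usable with caution
    if aggregated_or_normalized:
        return 2  # adapted but strong
    return 1      # clean observed input


print(assign_tier())                                                  # 1
print(assign_tier(aggregated_or_normalized=True))                     # 2
print(assign_tier(aggregated_or_normalized=True, interpolated=True))  # 3
```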
Minimum adapter requirements for the first hardening build
Before CitySim runs another serious 150-year city comparison, each active variable should have:
- declared source level
- declared target geography
- declared time frequency
- declared unit conversion
- declared missing-data rule
- declared fallback rule
- declared drift flag logic
- declared quality tier
If those are missing, the model is still too loose.
Example adapter entries
Example 1 — POP_TOTAL
| Field | Value |
|---|---|
| variable_id | POP_TOTAL |
| source_name | resident population statistics |
| source_level | city |
| raw_unit | persons |
| target_unit | persons |
| time_frequency_raw | annual |
| time_frequency_target | annual |
| geo_boundary_raw | city/metropolitan authority |
| geo_boundary_target | city |
| transform_type | direct carry-over |
| transform_rule | use reported annual resident count |
| missing_data_rule | interpolate only if 1-year gap |
| fallback_rule | metro then prefecture |
| revision_rule | latest official revision replaces prior release |
| quality_flag | high |
Example 2 — SCHOOL_CAPTURE
| Field | Value |
|---|---|
| variable_id | SCHOOL_CAPTURE |
| source_name | attendance + progression + retention + re-entry bundle |
| source_level | city + prefecture + national mix if needed |
| raw_unit | mixed |
| target_unit | index 0–100 |
| time_frequency_raw | annual |
| time_frequency_target | annual |
| geo_boundary_raw | mixed; city preferred |
| geo_boundary_target | city |
| transform_type | composite assembly |
| transform_rule | weighted bundle declared in proxy map |
| missing_data_rule | missing component lowers quality grade |
| fallback_rule | prefecture-level education series if city missing |
| revision_rule | recompute full index if underlying component revised |
| quality_flag | medium |
Example 3 — LEGITIMACY
| Field | Value |
|---|---|
| variable_id | LEGITIMACY |
| source_name | trust + satisfaction + compliance + continuity bundle |
| source_level | city/national mixed |
| raw_unit | mixed |
| target_unit | index 0–100 |
| time_frequency_raw | annual / irregular |
| time_frequency_target | annual |
| geo_boundary_raw | mixed |
| geo_boundary_target | city |
| transform_type | reconciliation + normalization |
| transform_rule | multi-proxy weighted composite with uncertainty penalty |
| missing_data_rule | if fewer than minimum proxies exist, value becomes low-confidence |
| fallback_rule | national institutional trust proxy only if city proxy absent |
| revision_rule | rerun whole bundle when source series changes |
| quality_flag | low-to-medium |
That is the kind of clarity the adapter layer needs.
What the Data Adapter Spec prevents
This page exists to stop six common failures:
- silent geography swapping
- silent unit conversion
- hidden interpolation
- post-hoc fallback choices
- series-break blindness
- mixing scenario assumptions with raw observed data
Those six problems alone can distort a city model badly even when the raw data itself is real.
What this page does not yet do
This page still does not define:
- the exact formula weights for composite variables
- the full backtest scoring system
- the calibration loop
- the transition equations that move variables through time
Those are separate pages.
This page only locks the input conversion grammar.
That is exactly what CitySim needs right now.
Why this matters after Tokyo
Tokyo did not fail because there was no intelligence in the model.
Tokyo failed because the model was still too loose in the bridge between:
- dataset,
- variable,
- and forecast path.
That is not a small detail.
That is the difference between:
- a compelling city story,
- and a city engine that can be recalibrated honestly.
The Data Adapter Spec is one of the pages that hardens that bridge.
Final definition
The CitySim.150Y.CF Data Adapter Spec is the canonical input-conversion layer that standardizes raw city data, preserves variable meaning across sources and time scales, applies declared fallback and missing-data rules, and transforms external measurements into stable simulation-ready values for the CitySim engine.
Without it, the engine may still be interesting.
But it is not yet disciplined.
Almost-Code
```text
CITYSIM_150Y_CF_DATA_ADAPTER_SPEC_V1

PURPOSE:
  Transform raw external datasets into stable simulation-ready values
  without silently changing meaning.

CORE_LAW:
  No raw dataset may enter the simulation directly unless its
  unit, time step, geography, and transformation rule are declared.

PIPELINE_ORDER:
  Raw_Dataset
  -> Observable_Proxy_Map
  -> Data_Adapter
  -> Simulation_Variable
  -> Transition_Kernel

ADAPTER_ENTRY_SCHEMA:
  {
    variable_id,
    source_name,
    source_level,
    raw_unit,
    target_unit,
    time_frequency_raw,
    time_frequency_target,
    geo_boundary_raw,
    geo_boundary_target,
    transform_type,
    transform_rule,
    missing_data_rule,
    fallback_rule,
    revision_rule,
    quality_flag,
    audit_note
  }

TRANSFORM_TYPES:
  - direct_carry_over
  - annual_aggregation
  - interpolation
  - normalization
  - composite_assembly
  - reconciliation

FALLBACK_HIERARCHY:
  city
  -> metro
  -> prefecture_or_state
  -> national
  -> international
  -> unavailable

MISSING_DATA_RULES:
  A = use exact observation
  B = interpolate only for short stable gaps
  C = use fallback geography with quality downgrade
  D = retain weak value with caution label
  E = mark unavailable

DEFINITION_DRIFT_FLAGS:
  - NO_DRIFT
  - MINOR_METHOD_CHANGE
  - BOUNDARY_CHANGE
  - CATEGORY_REDEFINITION
  - SURVEY_WORDING_SHIFT
  - SERIES_BREAK

QUALITY_TIERS:
  T1 = clean observed input
  T2 = adapted but strong
  T3 = usable with caution
  T4 = weak adapted value
  T5 = experimental

FAIL_CONDITIONS:
  - source geography hidden
  - unit conversion hidden
  - interpolation hidden
  - fallback chosen after result is known
  - series break ignored
  - observed and scenario values mixed without label

PASS_CONDITION:
  An input is adapter-valid only if its
  source, unit, time-frequency, geography, transformation,
  fallback logic, missing-data rule, and quality tier are declared.

OUTPUT:
  adapter_validity = TRUE or FALSE
```
eduKateSG Learning System | Control Tower, Runtime, and Next Routes
This article is one node inside the wider eduKateSG Learning System.
At eduKateSG, we do not treat education as random tips, isolated tuition notes, or one-off exam hacks. We treat learning as a living runtime:
state -> diagnosis -> method -> practice -> correction -> repair -> transfer -> long-term growth
That is why each article is written to do more than answer one question. It should help the reader move into the next correct corridor inside the wider eduKateSG system: understand -> diagnose -> repair -> optimize -> transfer.
Start Here
- Education OS | How Education Works
- Tuition OS | eduKateOS & CivOS
- Civilisation OS
- How Civilization Works
- CivOS Runtime Control Tower
Learning Systems
- The eduKate Mathematics Learning System
- Learning English System | FENCE by eduKateSG
- eduKate Vocabulary Learning System
- Additional Mathematics 101
Runtime and Deep Structure
- Human Regenerative Lattice | 3D Geometry of Civilisation
- Civilisation Lattice
- Advantages of Using CivOS | Start Here Stack Z0-Z3 for Humans & AI
Real-World Connectors
Subject Runtime Lane
- Math Worksheets
- How Mathematics Works PDF
- MathOS Runtime Control Tower v0.1
- MathOS Failure Atlas v0.1
- MathOS Recovery Corridors P0 to P3
How to Use eduKateSG
If you want the big picture -> start with Education OS and Civilisation OS
If you want subject mastery -> enter Mathematics, English, Vocabulary, or Additional Mathematics
If you want diagnosis and repair -> move into the CivOS Runtime and subject runtime pages
If you want real-life context -> connect learning back to Family OS, Bukit Timah OS, Punggol OS, and Singapore City OS
Why eduKateSG writes articles this way
eduKateSG is not only publishing content.
eduKateSG is building a connected control tower for human learning.
That means each article can function as:
- a standalone answer,
- a bridge into a wider system,
- a diagnostic node,
- a repair route,
- and a next-step guide for students, parents, tutors, and AI readers.

