What Is the CitySim.150Y.CF Data Adapter Spec?

A city engine does not run on raw data alone.

Even when the data is real, it still arrives in messy forms:

  • one source is yearly
  • another is monthly
  • one is city-level
  • another is prefecture-level
  • one starts in 1970
  • another starts in 2008
  • one is a clean count
  • another is a survey estimate
  • one city calls something “dropout”
  • another city calls something “early leaver”

So after the Variable Registry and the Observable Proxy Map, the next thing CitySim needs is the Data Adapter Spec.

One-sentence answer

The CitySim.150Y.CF Data Adapter Spec is the canonical conversion layer that tells the engine how to ingest raw external data, standardize it, resolve mismatched formats and time scales, apply fallback rules, and turn real-world datasets into simulation-ready values without silently changing meaning.

That is the layer that stops the model from becoming a patchwork of incompatible spreadsheets.


Why this page has to exist

The Variable Registry defines what the engine is allowed to use.

The Observable Proxy Map defines how variables connect to reality.

But neither page yet explains the most dangerous part of the process:

how the raw data is actually transformed before entering the simulation.

That transformation layer is where many models become unreliable.

Not because the data is fake.
But because the adapter logic is hidden, inconsistent, or improvised.

For example:

  • a monthly labour series gets compared directly against an annual education series
  • city data is missing, so national data is dropped in without warning
  • a 0–1 ratio in one city is treated like a 0–100 index in another
  • a survey question changes wording, but the time series is still treated as smooth
  • missing years are filled in casually
  • old boundaries and new boundaries are mixed together

That is why CitySim needs a declared adapter layer.

Without it, the engine may still sound precise, but it is not running on one stable input grammar.


What the Data Adapter Spec does

The Data Adapter Spec does seven jobs.

1. It standardizes raw inputs

Different datasets use different:

  • units
  • time frequencies
  • reporting conventions
  • geographic boundaries
  • naming systems

The adapter makes them compatible before they touch the simulation.

2. It preserves meaning through conversion

A good adapter does not merely reformat numbers.

It makes sure that when a variable enters CitySim, it still means the same thing it meant in the source world.

That sounds obvious, but it is where many models quietly fail.

3. It declares fallback logic in advance

If city-level data is missing, what happens?

Does the engine use:

  • metro data?
  • prefecture data?
  • national data?
  • a proxy estimate?
  • no value at all?

That must be declared before the run, not after the result.

4. It separates observation from interpolation

Some years are measured.
Some are estimated.
Some are carried forward.
Some are reconstructed.

The adapter must label those states properly.

5. It handles scale mismatches

CitySim will often combine:

  • raw counts
  • rates
  • ratios
  • percentages
  • normalized indices
  • composite scores

The adapter is what converts those into compatible engine values.

6. It keeps the same engine across cities

Tokyo, London, Singapore, or São Paulo may use different raw data sources, but the engine still needs the same internal variable grammar.

The adapter makes that possible.

7. It creates an audit trail

A serious model should be able to show:

  • raw value
  • source
  • transformation rule
  • adjusted value
  • reason for adjustment
  • confidence note

That is what makes the engine checkable.
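
As a minimal sketch, the six audit items above could travel together with every adapted value. The class name, the function name, and the percent-to-ratio example are illustrative assumptions, not part of the CitySim spec.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    """One adapted value together with the six audit items listed above."""
    raw_value: float
    source: str
    transformation_rule: str
    adjusted_value: float
    reason: str
    confidence_note: str


def percent_to_ratio(raw_value: float, source: str) -> AuditRecord:
    # Hypothetical transformation: a 0-100 percentage becomes a 0-1 ratio.
    return AuditRecord(
        raw_value=raw_value,
        source=source,
        transformation_rule="divide by 100",
        adjusted_value=raw_value / 100.0,
        reason="engine stores shares as 0-1 ratios",
        confidence_note="unit change only; no estimation involved",
    )


record = percent_to_ratio(4.2, "hypothetical labour force survey")
print(asdict(record))
```

Because the record is frozen, a later step cannot quietly overwrite the raw value or the stated reason.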


Where the adapter sits in CitySim

The order should be:

Raw Dataset → Proxy Map → Data Adapter → Simulation Variable → Transition Kernel

That means:

  • the Proxy Map says what data counts
  • the Data Adapter says how it is transformed
  • the Simulation Variable is the clean internal value
  • the Transition Kernel then moves that value through time

This order matters.

If the adapter and the kernel get mixed together, then it becomes unclear whether a number changed because:

  • the real world changed,
  • the source format changed,
  • or the model’s own transition equation changed.

That is not acceptable in a long-horizon engine.


What every adapter rule must contain

Every variable that enters CitySim should have a declared adapter rule with at least the following fields.

| Field | Meaning |
| --- | --- |
| variable_id | canonical simulation variable name |
| source_name | dataset or source label |
| source_level | city / metro / prefecture / national / international |
| raw_unit | original source unit |
| target_unit | simulation unit |
| time_frequency_raw | monthly / quarterly / annual / irregular |
| time_frequency_target | annual / 5-year / 10-year |
| geo_boundary_raw | original geography used |
| geo_boundary_target | simulation geography |
| transform_type | carry-over / normalize / interpolate / aggregate / reconcile |
| transform_rule | exact rule applied |
| missing_data_rule | what happens when values are absent |
| fallback_rule | what source is used next if main source fails |
| revision_rule | how later source revisions are handled |
| quality_flag | high / medium / low |
| audit_note | declared caveat |

That is the minimum.

If a model cannot show this for its inputs, then the input layer is still too soft.
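
One way to enforce that minimum is a completeness check that runs before any value is adapted. The sketch below assumes adapter rules are stored as plain dictionaries; the function name and the draft rule are hypothetical.

```python
REQUIRED_FIELDS = (
    "variable_id", "source_name", "source_level", "raw_unit", "target_unit",
    "time_frequency_raw", "time_frequency_target", "geo_boundary_raw",
    "geo_boundary_target", "transform_type", "transform_rule",
    "missing_data_rule", "fallback_rule", "revision_rule", "quality_flag",
    "audit_note",
)


def missing_fields(rule: dict) -> list:
    """Return the required fields that are absent or blank in an adapter rule."""
    gaps = []
    for field in REQUIRED_FIELDS:
        value = rule.get(field)
        if value is None or not str(value).strip():
            gaps.append(field)
    return gaps


# A deliberately incomplete draft rule (illustrative only).
draft_rule = {
    "variable_id": "POP_TOTAL",
    "source_name": "resident population statistics",
    "source_level": "city",
    "raw_unit": "persons",
    "target_unit": "persons",
}

gaps = missing_fields(draft_rule)
print(f"{len(gaps)} fields still undeclared:", gaps)
```

A rule with any undeclared field would be rejected before the run rather than patched afterwards.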


The five main adapter problems

These are the most common input problems CitySim must solve.

1. Time-frequency mismatch

Some datasets are:

  • monthly
  • quarterly
  • annual
  • once every census cycle
  • irregular

But CitySim may run on:

  • yearly slices
  • 5-year slices
  • decade slices

So the adapter must define whether it:

  • averages,
  • sums,
  • snapshots,
  • smooths,
  • or holds values constant until the next observation.

For example:

  • monthly unemployment could be averaged into yearly unemployment
  • annual births could be summed directly into the yearly run
  • a census count every 5 years may be interpolated for intermediate years

This cannot be left vague.
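
A small sketch of what "not vague" could look like: the aggregation method is an explicit argument, and an undeclared method is an error. The function name, method labels, and all numbers are made up for illustration.

```python
from statistics import mean


def to_annual(monthly_values, method):
    """Collapse twelve monthly values into one annual value.

    method: "average" for rates and levels, "sum" for flows,
            "snapshot" for the final reading of the year.
    """
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly observations")
    if method == "average":
        return mean(monthly_values)
    if method == "sum":
        return sum(monthly_values)
    if method == "snapshot":
        return monthly_values[-1]
    raise ValueError(f"undeclared aggregation method: {method}")


# Made-up illustrative numbers, not real data.
unemployment_rate = [3.0, 3.2, 3.1, 2.9, 3.0, 3.1, 3.3, 3.2, 3.0, 2.8, 2.9, 3.1]
monthly_births = [100, 90, 110, 105, 95, 100, 120, 115, 100, 95, 90, 80]

annual_rate = to_annual(unemployment_rate, "average")
annual_births = to_annual(monthly_births, "sum")
print(annual_rate, annual_births)
```

Averaging the births or summing the unemployment rates would both run without error in a looser design, which is why the method belongs in the adapter rule rather than in the analyst's head.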

2. Geographic mismatch

A simulation may be about:

  • city proper
  • metropolitan area
  • prefecture/state
  • functional urban region

But the data may come from a different boundary.

So the adapter must state:

  • what the target geography is
  • whether the source matches it
  • what correction or caution applies if it does not

A Tokyo city run should not quietly mix the following without clearly labeling which is which:

  • Tokyo Metropolis
  • Greater Tokyo
  • Japan national figures

3. Unit mismatch

Different sources report values in different ways.

Examples:

  • count
  • per 1,000 persons
  • %
  • ratio
  • index
  • currency
  • inflation-adjusted currency
  • nominal currency

The adapter must say exactly how raw units become target units.
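
For share-like values, a declared conversion table is often enough. This sketch assumes the engine's target unit is a 0-1 ratio; the unit labels and function name are hypothetical.

```python
# Divisors that bring each raw unit onto a 0-1 ratio.
# The unit labels are hypothetical, chosen for this example.
DIVISOR_TO_RATIO = {
    "ratio": 1.0,
    "percent": 100.0,
    "per_1000": 1000.0,
}


def to_ratio(value, raw_unit):
    """Convert a share-like value into a 0-1 ratio, or fail loudly."""
    if raw_unit not in DIVISOR_TO_RATIO:
        raise ValueError(f"no declared conversion for unit: {raw_unit}")
    return value / DIVISOR_TO_RATIO[raw_unit]


# The same underlying share reported three different ways.
print(to_ratio(0.25, "ratio"))
print(to_ratio(25.0, "percent"))
print(to_ratio(250.0, "per_1000"))
```

An unknown unit raises an error instead of passing through unchanged, which is the case where a 0–1 ratio gets mistaken for a 0–100 index.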

4. Missing data

This is unavoidable.

The adapter must specify:

  • when to leave missing as missing
  • when to interpolate
  • when to use carry-forward
  • when to use fallback geography
  • when to mark the variable unusable

This should not be decided case by case after seeing the result.

5. Definition drift

Sometimes a variable looks stable but the meaning has changed.

Examples:

  • survey wording changed
  • administrative category changed
  • school absence rules changed
  • city boundary changed
  • credential definitions changed

The adapter must flag these cases so the model does not treat them as seamless continuity.


The main adapter transformation types

1. Direct carry-over

Use the raw value as the simulation value.

Example:

  • fertility rate
  • population count
  • life expectancy

This is the cleanest case.

2. Annual aggregation

Combine smaller time units into an annual value.

Example:

  • monthly labour data → annual average
  • monthly migration → annual total
  • quarterly GDP proxy → annual aggregate

3. Interpolation

Estimate missing years between known observations.

Example:

  • census every 5 years → interpolate population for intermediate years

This is allowed, but it must be labeled.
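
A sketch of labeled interpolation, assuming simple linear interpolation between census years. The function name, the state labels, and the census figures are illustrative.

```python
def interpolate_years(observed):
    """Fill years between observations linearly and label every value.

    observed: dict mapping year -> measured value.
    Returns a dict mapping year -> (value, state), where state is
    "observed" or "interpolated".
    """
    years = sorted(observed)
    series = {}
    for start, end in zip(years, years[1:]):
        v0, v1 = observed[start], observed[end]
        for year in range(start, end):
            if year == start:
                series[year] = (v0, "observed")
            else:
                share = (year - start) / (end - start)
                series[year] = (v0 + share * (v1 - v0), "interpolated")
    if years:
        # The final observation is not covered by the loop above.
        series[years[-1]] = (observed[years[-1]], "observed")
    return series


# Made-up census counts five years apart.
census = {2000: 1_000_000, 2005: 1_050_000}
filled = interpolate_years(census)
for year, (value, state) in filled.items():
    print(year, round(value), state)
```

The label travels with the value, so a later step can always tell a measured year from a reconstructed one.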

4. Normalization

Convert raw values into a common scale.

Example:

  • convert a housing burden number into a 0–100 housing stress index

Useful for cross-city comparison.
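
A minimal sketch of one possible normalization, assuming min-max scaling against declared bounds. The bounds and the housing-burden figure are hypothetical; the spec does not prescribe this formula.

```python
def to_index(value, floor, ceiling):
    """Map a raw value onto 0-100 using declared bounds, clipping outliers."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = (value - floor) / (ceiling - floor) * 100.0
    return max(0.0, min(100.0, scaled))


# Hypothetical bounds: housing cost as a share of income, from 0.10 to 0.60.
stress = to_index(0.35, floor=0.10, ceiling=0.60)
print(round(stress, 1))
```

For the index to be comparable across cities, the same floor and ceiling would have to be used for every city, not refitted per city.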

5. Composite assembly

Build one simulation variable from several source inputs.

Example:

  • TEACHER_PIPELINE_HEALTH
  • BASE_STOCK
  • LEGITIMACY

Every composite variable must publish its weighting rule.
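
As a sketch of a published weighting rule, the weights can sit in plain view next to the assembly function. The component names and weights below are hypothetical, and renormalizing over the components that are present is an assumption of this example, not a CitySim rule.

```python
# Hypothetical weights for illustration; the real weights belong to a separate page.
WEIGHTS = {"attendance": 0.4, "progression": 0.3, "retention": 0.2, "re_entry": 0.1}


def assemble_composite(components, weights):
    """Weighted average of 0-100 component scores.

    components: dict name -> score, with None or absent meaning missing.
    weights: dict name -> declared weight.
    Returns (score, names_of_missing_components).
    """
    present = {
        name: components[name]
        for name in weights
        if components.get(name) is not None
    }
    missing = sorted(name for name in weights if name not in present)
    total_weight = sum(weights[name] for name in present)
    if total_weight == 0:
        return None, missing
    score = sum(weights[name] * present[name] for name in present) / total_weight
    return score, missing


# Made-up component scores with one component missing.
scores = {"attendance": 90.0, "progression": 80.0, "retention": 70.0, "re_entry": None}
value, missing = assemble_composite(scores, WEIGHTS)
print(round(value, 1), missing)
```

Returning the list of missing components lets the caller lower the quality grade instead of hiding the gap inside the number.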

6. Reconciliation

Resolve multiple competing sources or different source levels into one simulation value.

Example:

  • city source and national source disagree
  • yearly city survey is missing, but prefecture administrative count exists

This is one of the most sensitive adapter operations and must be declared openly.


Fallback hierarchy

CitySim should use a stable fallback logic.

For most city runs, the default order should be:

city → metro → prefecture/state → national → international → unavailable

That means:

  • use city data first
  • if missing, go to metropolitan data if it still matches the target city logic
  • then prefecture/state
  • then national
  • then international harmonized dataset
  • if nothing valid exists, mark as unavailable or weak estimate

The important thing is not that this hierarchy is perfect.

The important thing is that it is declared before the run.
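
Declaring the hierarchy before the run can be as simple as fixing it in one constant. This is a sketch under that assumption; the function name and the sample value are illustrative.

```python
# The default order from the text above, fixed before any run.
FALLBACK_ORDER = ("city", "metro", "prefecture_or_state", "national", "international")


def resolve_with_fallback(values_by_level):
    """Return (value, level_used) following the declared order.

    values_by_level: dict level -> value, with None or absent meaning missing.
    """
    for level in FALLBACK_ORDER:
        value = values_by_level.get(level)
        if value is not None:
            return value, level
    return None, "unavailable"


# City and metro values are missing; a prefecture value exists (made-up number).
value, level = resolve_with_fallback({"city": None, "prefecture_or_state": 0.12, "national": 0.15})
print(value, level)
```

The level that was actually used is returned with the value, so a geography swap is always visible downstream.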


Missing-data rules

The adapter should allow only a few permitted missing-data treatments.

Rule A. Exact observation available

Use it directly.

Rule B. Short gap with stable definition

Interpolate if the gap is small and the series is stable.

Rule C. Missing city data but valid higher-level proxy exists

Use fallback geography and mark quality down.

Rule D. Variable too weakly observed

Retain variable with low confidence or exclude from dominant verdict logic.

Rule E. No valid path

Mark as unavailable.

That is much better than pretending every missing series can be filled neatly.
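
The five rules can be read as an ordered decision. The sketch below assumes they are checked from A to E and that a "short gap" means at most one year, borrowing the threshold from the POP_TOTAL example; both choices are assumptions of this example.

```python
MAX_INTERPOLATION_GAP_YEARS = 1  # hypothetical threshold


def choose_missing_data_rule(has_observation, gap_years, definition_stable,
                             has_fallback_geography, has_weak_proxy):
    """Return the letter of the first permitted rule that applies."""
    if has_observation:
        return "A"  # use the exact observation
    if definition_stable and gap_years <= MAX_INTERPOLATION_GAP_YEARS:
        return "B"  # interpolate a short, stable gap
    if has_fallback_geography:
        return "C"  # use fallback geography and mark quality down
    if has_weak_proxy:
        return "D"  # retain with low confidence
    return "E"      # mark as unavailable


# A three-year gap with a prefecture-level series available.
print(choose_missing_data_rule(False, 3, True, True, False))
```

Because the order is fixed in code, the treatment cannot be chosen after seeing which one gives the more convenient result.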


Definition-drift flags

Every adapter needs drift flags.

Examples:

  • NO_DRIFT
  • MINOR_METHOD_CHANGE
  • BOUNDARY_CHANGE
  • CATEGORY_REDEFINITION
  • SURVEY_WORDING_SHIFT
  • SERIES_BREAK

These flags matter because a long-horizon city engine will often cross decades where institutions change how they count the world.

If the engine ignores that, it may confuse an administrative change for a civilisation change.
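
One way to act on the flags is to cut a series into segments wherever a flag signals a real break. Which flags count as breaking is an assumption made for this sketch, as are the function name and the sample years.

```python
from enum import Enum


class DriftFlag(Enum):
    NO_DRIFT = "NO_DRIFT"
    MINOR_METHOD_CHANGE = "MINOR_METHOD_CHANGE"
    BOUNDARY_CHANGE = "BOUNDARY_CHANGE"
    CATEGORY_REDEFINITION = "CATEGORY_REDEFINITION"
    SURVEY_WORDING_SHIFT = "SURVEY_WORDING_SHIFT"
    SERIES_BREAK = "SERIES_BREAK"


# Assumption for this sketch: these flags end comparability with earlier years.
BREAKING_FLAGS = {
    DriftFlag.BOUNDARY_CHANGE,
    DriftFlag.CATEGORY_REDEFINITION,
    DriftFlag.SERIES_BREAK,
}


def split_into_segments(flagged_years):
    """Group (year, flag) pairs into runs that can be treated as continuous."""
    segments = []
    for year, flag in flagged_years:
        if not segments or flag in BREAKING_FLAGS:
            segments.append([])  # start a new segment at every break
        segments[-1].append(year)
    return segments


history = [
    (2000, DriftFlag.NO_DRIFT),
    (2001, DriftFlag.NO_DRIFT),
    (2002, DriftFlag.BOUNDARY_CHANGE),
    (2003, DriftFlag.MINOR_METHOD_CHANGE),
]
segments = split_into_segments(history)
print(segments)
```

Trends would then be estimated within a segment, never across the boundary between two segments.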


Quality tiers for adapted values

After conversion, each value should carry a quality tier.

Tier 1 — clean observed input

Direct source, matching geography, matching time step, no serious transformation.

Tier 2 — adapted but strong

Minor aggregation or normalization, still robust.

Tier 3 — usable with caution

Interpolation or fallback geography used, moderate comparability issues.

Tier 4 — weak adapted value

Proxy bundle or unstable series, only suitable for directional reasoning.

Tier 5 — experimental

Very weak support, should not drive hard verdicts.

This is one of the easiest ways to keep the engine honest.
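
A rough sketch of tier assignment, assuming the worst applicable condition decides the tier. The boolean inputs and the function name are hypothetical simplifications of the tier descriptions above.

```python
def assign_tier(transformed=False, interpolated=False, fallback_geography=False,
                proxy_bundle=False, very_weak_support=False):
    """Return a quality tier from 1 (clean) to 5 (experimental).

    The worst applicable condition decides the tier.
    """
    if very_weak_support:
        return 5
    if proxy_bundle:
        return 4
    if interpolated or fallback_geography:
        return 3
    if transformed:
        return 2
    return 1


print(assign_tier())                                   # direct observed input
print(assign_tier(transformed=True))                   # minor aggregation
print(assign_tier(transformed=True, interpolated=True))
```

Taking the worst condition means that a clean transformation cannot mask an interpolated or borrowed input.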


Minimum adapter requirements for the first hardening build

Before CitySim runs another serious 150-year city comparison, each active variable should have:

  • declared source level
  • declared target geography
  • declared time frequency
  • declared unit conversion
  • declared missing-data rule
  • declared fallback rule
  • declared drift flag logic
  • declared quality tier

If those are missing, the model is still too loose.


Example adapter entries

Example 1 — POP_TOTAL

| Field | Value |
| --- | --- |
| variable_id | POP_TOTAL |
| source_name | resident population statistics |
| source_level | city |
| raw_unit | persons |
| target_unit | persons |
| time_frequency_raw | annual |
| time_frequency_target | annual |
| geo_boundary_raw | city/metropolitan authority |
| geo_boundary_target | city |
| transform_type | direct carry-over |
| transform_rule | use reported annual resident count |
| missing_data_rule | interpolate only if 1-year gap |
| fallback_rule | metro then prefecture |
| revision_rule | latest official revision replaces prior release |
| quality_flag | high |

Example 2 — SCHOOL_CAPTURE

| Field | Value |
| --- | --- |
| variable_id | SCHOOL_CAPTURE |
| source_name | attendance + progression + retention + re-entry bundle |
| source_level | city + prefecture + national mix if needed |
| raw_unit | mixed |
| target_unit | index 0–100 |
| time_frequency_raw | annual |
| time_frequency_target | annual |
| geo_boundary_raw | mixed; city preferred |
| geo_boundary_target | city |
| transform_type | composite assembly |
| transform_rule | weighted bundle declared in proxy map |
| missing_data_rule | missing component lowers quality grade |
| fallback_rule | prefecture-level education series if city missing |
| revision_rule | recompute full index if underlying component revised |
| quality_flag | medium |

Example 3 — LEGITIMACY

| Field | Value |
| --- | --- |
| variable_id | LEGITIMACY |
| source_name | trust + satisfaction + compliance + continuity bundle |
| source_level | city/national mixed |
| raw_unit | mixed |
| target_unit | index 0–100 |
| time_frequency_raw | annual / irregular |
| time_frequency_target | annual |
| geo_boundary_raw | mixed |
| geo_boundary_target | city |
| transform_type | reconciliation + normalization |
| transform_rule | multi-proxy weighted composite with uncertainty penalty |
| missing_data_rule | if fewer than minimum proxies exist, value becomes low-confidence |
| fallback_rule | national institutional trust proxy only if city proxy absent |
| revision_rule | rerun whole bundle when source series changes |
| quality_flag | low-to-medium |

That is the kind of clarity the adapter layer needs.


What the Data Adapter Spec prevents

This page exists to stop six common failures:

  1. silent geography swapping
  2. silent unit conversion
  3. hidden interpolation
  4. post-hoc fallback choices
  5. series-break blindness
  6. mixing scenario assumptions with raw observed data

Those six problems alone can distort a city model badly even when the raw data itself is real.


What this page does not yet do

This page still does not define:

  • the exact formula weights for composite variables
  • the full backtest scoring system
  • the calibration loop
  • the transition equations that move variables through time

Those are separate pages.

This page only locks the input conversion grammar.

That is exactly what CitySim needs right now.


Why this matters after Tokyo

Tokyo did not fail because there was no intelligence in the model.

Tokyo failed because the model was still too loose in the bridge between:

  • dataset,
  • variable,
  • and forecast path.

That is not a small detail.
That is the difference between:

  • a compelling city story,
  • and a city engine that can be recalibrated honestly.

The Data Adapter Spec is one of the pages that hardens that bridge.


Final definition

The CitySim.150Y.CF Data Adapter Spec is the canonical input-conversion layer that standardizes raw city data, preserves variable meaning across sources and time scales, applies declared fallback and missing-data rules, and transforms external measurements into stable simulation-ready values for the CitySim engine.

Without it, the engine may still be interesting.

But it is not yet disciplined.


Almost-Code

```text
CITYSIM_150Y_CF_DATA_ADAPTER_SPEC_V1

PURPOSE:
Transform raw external datasets into stable simulation-ready values without silently changing meaning.

CORE_LAW:
No raw dataset may enter the simulation directly unless its unit,
time step,
geography,
and transformation rule are declared.

PIPELINE_ORDER:
Raw_Dataset
-> Observable_Proxy_Map
-> Data_Adapter
-> Simulation_Variable
-> Transition_Kernel

ADAPTER_ENTRY_SCHEMA:
{
variable_id,
source_name,
source_level,
raw_unit,
target_unit,
time_frequency_raw,
time_frequency_target,
geo_boundary_raw,
geo_boundary_target,
transform_type,
transform_rule,
missing_data_rule,
fallback_rule,
revision_rule,
quality_flag,
audit_note
}

TRANSFORM_TYPES:
  • direct_carry_over
  • annual_aggregation
  • interpolation
  • normalization
  • composite_assembly
  • reconciliation

FALLBACK_HIERARCHY:
city
-> metro
-> prefecture_or_state
-> national
-> international
-> unavailable

MISSING_DATA_RULES:
A = use exact observation
B = interpolate only for short stable gaps
C = use fallback geography with quality downgrade
D = retain weak value with caution label
E = mark unavailable

DEFINITION_DRIFT_FLAGS:
  • NO_DRIFT
  • MINOR_METHOD_CHANGE
  • BOUNDARY_CHANGE
  • CATEGORY_REDEFINITION
  • SURVEY_WORDING_SHIFT
  • SERIES_BREAK

QUALITY_TIERS:
T1 = clean observed input
T2 = adapted but strong
T3 = usable with caution
T4 = weak adapted value
T5 = experimental

FAIL_CONDITIONS:
  • source geography hidden
  • unit conversion hidden
  • interpolation hidden
  • fallback chosen after result is known
  • series break ignored
  • observed and scenario values mixed without label

PASS_CONDITION:
An input is adapter-valid only if its
source,
unit,
time-frequency,
geography,
transformation,
fallback logic,
missing-data rule,
and quality tier are declared.

OUTPUT:
adapter_validity = TRUE or FALSE
```

eduKateSG Learning System | Control Tower, Runtime, and Next Routes

This article is one node inside the wider eduKateSG Learning System.

At eduKateSG, we do not treat education as random tips, isolated tuition notes, or one-off exam hacks. We treat learning as a living runtime:

state -> diagnosis -> method -> practice -> correction -> repair -> transfer -> long-term growth

That is why each article is written to do more than answer one question. It should help the reader move into the next correct corridor inside the wider eduKateSG system: understand -> diagnose -> repair -> optimize -> transfer.

How to Use eduKateSG

If you want the big picture -> start with Education OS and Civilisation OS
If you want subject mastery -> enter Mathematics, English, Vocabulary, or Additional Mathematics
If you want diagnosis and repair -> move into the CivOS Runtime and subject runtime pages
If you want real-life context -> connect learning back to Family OS, Bukit Timah OS, Punggol OS, and Singapore City OS

Why eduKateSG writes articles this way

eduKateSG is not only publishing content.
eduKateSG is building a connected control tower for human learning.

That means each article can function as:

  • a standalone answer,
  • a bridge into a wider system,
  • a diagnostic node,
  • a repair route,
  • and a next-step guide for students, parents, tutors, and AI readers.