Worldboard – Technical write-up

1. Data Architecture

2. Data Pipeline

3. SQL Database Schema

4. NationChart Example

5. Frontend Architecture

6. Links

1. Data Architecture

Each data point contains 5 key attributes that link CSV files to database records:

Data Point Attributes

Metric: GDP, Population, Life Expectancy

Source: World Bank, UN, WHO

S3 Object: File path in cloud storage

Country: USA, Canada, China

Year: 1920, 1950, 2020
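As a rough illustration of how these attributes could be carried through the pipeline, the sketch below models one data point as an immutable record. The class name `DataPoint`, the `value` field, and the `key()` helper are assumptions made for this example, not names taken from the Worldboard code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    """One observation, described by the five attributes listed above."""
    metric: str      # e.g. "GDP"
    source: str      # e.g. "World Bank"
    s3_object: str   # path of the originating CSV in cloud storage
    country: str     # e.g. "USA"
    year: int        # e.g. 1920
    value: Optional[float] = None  # the measurement itself (assumed extra field)

    def key(self) -> tuple:
        """Identity used to match a CSV row to a database record (assumption)."""
        return (self.country, self.metric, self.source, self.year)


point = DataPoint(
    metric="GDP",
    source="World Bank",
    s3_object="s3://worldboard-data/worldbank/gdp-1900-1950.csv",
    country="USA",
    year=1920,
    value=977000000000,
)
print(point.key())
```

The `key()` tuple mirrors the uniqueness rule used later in the schema, where one source may contribute at most one value per country, metric, and year.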

Data Flow Visualization

CSV Files (raw data sources) → S3 Storage (cloud file storage) → ETL Process (Extract, Transform, Load) → SQL Database (structured data)

2. Data Pipeline

Multiple raw CSV files are processed through the database and transformed into clean, client-ready CSV files for each country.

Transformation Process

Raw CSV Ingestion

Multiple CSV files from different sources (World Bank, UN, WHO, etc.)

Input Files:

• worldbank-gdp-1900-1950.csv

• worldbank-gdp-1951-2025.csv

• canada-stats-gdp-1800-1899.csv

• un-population-1950-2025.csv

ETL Pipeline

This process loads all data into a unified database with source tracking.

2.1 Extract

Read raw CSV files from S3 storage

Input: “USA, 1920, 977000000000”

Source: worldbank-gdp-1900-1950.csv

S3 Path: s3://worldboard-data/worldbank/gdp-1900-1950.csv
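A minimal sketch of the extract step, assuming the raw files are headerless rows in the form country, year, value as in the input shown above. The download from S3 is deliberately left out: the text of the object is passed in as a string, and the function name `extract_rows` is hypothetical.

```python
import csv
import io

# Stand-in for the body of the S3 object; a real pipeline would download it first.
RAW_TEXT = "USA, 1920, 977000000000\n"
S3_PATH = "s3://worldboard-data/worldbank/gdp-1900-1950.csv"


def extract_rows(text: str, s3_path: str) -> list:
    """Parse headerless 'country, year, value' rows and remember where they came from."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for line_number, record in enumerate(reader, start=1):
        if not record:  # csv.reader yields [] for blank lines
            continue
        country, year, value = (field.strip() for field in record)
        rows.append({
            "country": country,
            "year": year,    # kept as text; typing happens in the transform step
            "value": value,
            "s3_path": s3_path,
            "line": line_number,
        })
    return rows


rows = extract_rows(RAW_TEXT, S3_PATH)
print(rows[0])
```

Keeping the S3 path and line number next to each raw row is one way to make later source attribution possible without re-reading the file.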

2.2 Transform

Convert raw CSV rows into structured data points with source attribution

Transform: Raw row → Structured data point

Add: source, file_path, created_at

Output: {"country": "USA", "metric": "GDP", "year": 1920, "value": 977000000000, "source": "World Bank"}
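The transform step could look roughly like the following. How the metric and source are derived from a file is not stated in this write-up, so the `FILE_METADATA` lookup table and the `transform` function are assumptions introduced for illustration.

```python
from datetime import datetime, timezone

# Hypothetical mapping from input file name to (metric, source label).
FILE_METADATA = {
    "worldbank-gdp-1900-1950.csv": ("GDP", "World Bank"),
}


def transform(raw_row: str, file_name: str, file_path: str) -> dict:
    """Turn one raw 'country, year, value' row into a structured data point."""
    country, year, value = (part.strip() for part in raw_row.split(","))
    metric, source = FILE_METADATA[file_name]
    return {
        "country": country,
        "metric": metric,
        "year": int(year),
        # Keep whole numbers as int so large values are not shown in float notation.
        "value": float(value) if "." in value else int(value),
        "source": source,
        "file_path": file_path,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


point = transform(
    "USA, 1920, 977000000000",
    "worldbank-gdp-1900-1950.csv",
    "s3://worldboard-data/worldbank/gdp-1900-1950.csv",
)
print({k: point[k] for k in ("country", "metric", "year", "value", "source")})
```

The printed subset corresponds to the output record shown above; `file_path` and `created_at` are the added attribution fields.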

2.3 Load

Insert structured data into SQL database with conflict resolution

Load: Structured data → SQL database

Conflict Resolution: Latest timestamp wins

Result: Unified data_points table with source tracking
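One way to express "latest timestamp wins" is an upsert that only overwrites a row when the incoming record is newer. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the production database, so the column types differ from the real schema, and the values 1.0, 2.0 and 3.0 are placeholders rather than real GDP figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute("""
    CREATE TABLE data_points (
        id INTEGER PRIMARY KEY,
        country TEXT NOT NULL,
        metric TEXT NOT NULL,
        year INTEGER NOT NULL,
        value REAL,
        source TEXT NOT NULL,
        file_path TEXT,
        created_at TEXT NOT NULL,
        UNIQUE (country, metric, source, year)
    )
""")

# Insert, or update the existing row only if the incoming timestamp is newer.
UPSERT = """
    INSERT INTO data_points (country, metric, year, value, source, file_path, created_at)
    VALUES (:country, :metric, :year, :value, :source, :file_path, :created_at)
    ON CONFLICT (country, metric, source, year) DO UPDATE SET
        value = excluded.value,
        file_path = excluded.file_path,
        created_at = excluded.created_at
    WHERE excluded.created_at > data_points.created_at
"""


def load(connection, point: dict) -> None:
    connection.execute(UPSERT, point)
    connection.commit()


base = {
    "country": "USA", "metric": "GDP", "year": 1920, "source": "World Bank",
    "file_path": "s3://worldboard-data/worldbank/gdp-1900-1950.csv",
}
load(conn, {**base, "value": 1.0, "created_at": "2024-01-01T00:00:00Z"})
load(conn, {**base, "value": 2.0, "created_at": "2024-06-01T00:00:00Z"})  # newer: replaces
load(conn, {**base, "value": 3.0, "created_at": "2023-01-01T00:00:00Z"})  # older: ignored

stored = conn.execute("SELECT value, created_at FROM data_points").fetchall()
print(stored)
```

Comparing timestamps as text works here only because every value uses the same ISO 8601 format; a database with a native timestamp type would compare them directly.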

WorldDB SQL Database

All transformed data points are stored in a unified database with source tracking.

Database Records:

• USA, GDP, 2020, 20953000000000, World Bank, s3://worldboard-data/worldbank/gdp-2020.csv

• USA, Population, 2020, 331002651, UN, s3://worldboard-data/un/population-2020.csv

• USA, Life_Expectancy, 2020, 77.28, WHO, s3://worldboard-data/who/life-expectancy-2020.csv

• USA, GDP, 2020, 21000000000000, OECD, s3://worldboard-data/oecd/gdp-2020.csv

• CAN, GDP, 2020, 1640000000000, World Bank, s3://worldboard-data/worldbank/gdp-2020.csv

Country Aggregation

Query the database to collect all metrics for each country across all years.

USA Data:

• GDP: 1900-2025 (World Bank)

• Population: 1950-2025 (UN)

• Life Expectancy: 1960-2020 (WHO)
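The aggregation could be sketched as a single query per country followed by a small summary in application code. The example again uses `sqlite3` as a stand-in database and a reduced table with only the columns it needs; the function name `coverage` is hypothetical, and the rows are the sample database records listed earlier (year and source only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE data_points (country TEXT, metric TEXT, year INTEGER, source TEXT)"
)
conn.executemany(
    "INSERT INTO data_points VALUES (?, ?, ?, ?)",
    [
        ("USA", "GDP", 2020, "World Bank"),
        ("USA", "Population", 2020, "UN"),
        ("USA", "Life_Expectancy", 2020, "WHO"),
        ("USA", "GDP", 2020, "OECD"),
        ("CAN", "GDP", 2020, "World Bank"),
    ],
)


def coverage(connection, country: str) -> dict:
    """Summarise first year, last year, and sources for each metric of one country."""
    rows = connection.execute(
        "SELECT metric, year, source FROM data_points "
        "WHERE country = ? ORDER BY metric, year",
        (country,),
    ).fetchall()
    summary = {}
    for metric, year, source in rows:
        entry = summary.setdefault(
            metric, {"first": year, "last": year, "sources": set()}
        )
        entry["first"] = min(entry["first"], year)
        entry["last"] = max(entry["last"], year)
        entry["sources"].add(source)
    return summary


usa = coverage(conn, "USA")
for metric, info in usa.items():
    print(metric, info["first"], info["last"], sorted(info["sources"]))
```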

Client CSV Generation

Export a clean, structured CSV file for each country with all metrics.

Output: usa.csv

year,gdp,population,life_expectancy

1920,977000000000,,

1950,1400000000000,152271000,

2020,20953000000000,331002651,77.28
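The export amounts to pivoting long-format data points (one row per metric and year) into one row per year. A minimal sketch using only the standard library follows; it assumes the caller has already selected a single source per metric, and the function name `to_country_csv` is hypothetical.

```python
import csv
import io

COLUMNS = ["gdp", "population", "life_expectancy"]


def to_country_csv(points) -> str:
    """Pivot (metric, year, value) tuples for ONE country into a wide CSV string."""
    by_year = {}
    for metric, year, value in points:
        by_year.setdefault(year, {})[metric.lower()] = value
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["year"] + COLUMNS, lineterminator="\n"
    )
    writer.writeheader()
    for year in sorted(by_year):
        # Metrics without a value for this year are written as empty fields.
        writer.writerow({"year": year, **by_year[year]})
    return buffer.getvalue()


usa_points = [
    ("GDP", 1950, 1400000000000),
    ("Population", 1950, 152271000),
    ("GDP", 2020, 20953000000000),
    ("Population", 2020, 331002651),
    ("Life_Expectancy", 2020, 77.28),
]
output = to_country_csv(usa_points)
print(output)
```

Missing metrics become empty cells rather than zeros, which keeps "no data" distinguishable from a real value of 0 in the client file.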

3. SQL Database Schema

After the ETL transformation process, data is stored in a simple, unified table structure that directly maps to the data point attributes.

data_points Table

CREATE TABLE data_points (
    id          SERIAL PRIMARY KEY,
    country     VARCHAR(3) NOT NULL,     -- ISO3 code (USA, CAN, CHN)
    metric      VARCHAR(100) NOT NULL,   -- GDP, Population, Life_Expectancy
    year        INTEGER NOT NULL,        -- 1900, 1950, 2020
    value       DECIMAL(20,6),           -- The actual data value
    source      VARCHAR(255) NOT NULL,   -- World Bank, UN, WHO
    file_path   VARCHAR(500),            -- S3 object path
    created_at  TIMESTAMP DEFAULT NOW(),
    UNIQUE(country, metric, source, year)
);

Example Data

-- Sample data_points records (id and created_at are filled in by their defaults)
INSERT INTO data_points (country, metric, year, value, source, file_path) VALUES
    ('USA', 'GDP', 2020, 20953000000000, 'World Bank', 's3://worldboard-data/worldbank/gdp-2020.csv'),
    ('USA', 'Population', 2020, 331002651, 'UN', 's3://worldboard-data/un/population-2020.csv'),
    ('USA', 'Life_Expectancy', 2020, 77.28, 'WHO', 's3://worldboard-data/who/life-expectancy-2020.csv'),
    ('USA', 'GDP', 2020, 21000000000000, 'OECD', 's3://worldboard-data/oecd/gdp-2020.csv');

-- Multiple sources for the same metric: USA GDP for 2020 is recorded from both World Bank and OECD
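The effect of the UNIQUE(country, metric, source, year) constraint can be checked with a short experiment. The sketch uses `sqlite3` as a stand-in for the production database and a reduced column set: two sources for the same country, metric, and year are accepted, while a repeat from the same source is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("""
    CREATE TABLE data_points (
        id INTEGER PRIMARY KEY,
        country TEXT NOT NULL,
        metric TEXT NOT NULL,
        year INTEGER NOT NULL,
        value REAL,
        source TEXT NOT NULL,
        UNIQUE (country, metric, source, year)
    )
""")

INSERT = (
    "INSERT INTO data_points (country, metric, year, value, source) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.execute(INSERT, ("USA", "GDP", 2020, 20953000000000, "World Bank"))
conn.execute(INSERT, ("USA", "GDP", 2020, 21000000000000, "OECD"))  # different source: allowed

try:
    # Same country, metric, source and year as the first row: violates UNIQUE.
    conn.execute(INSERT, ("USA", "GDP", 2020, 20953000000000, "World Bank"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

sources = [
    row[0]
    for row in conn.execute(
        "SELECT source FROM data_points "
        "WHERE country = 'USA' AND metric = 'GDP' AND year = 2020 "
        "ORDER BY source"
    )
]
print(duplicate_rejected, sources)
```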

4. NationChart Example

The NationChart provides a visual overview of data availability for each country. Each row represents a metric, and each column represents a year. Green indicates data is available, gray indicates missing data.

USA Data Availability (1900-2025)

Metric	1900	1910	1920	1930	1940	1950	1960	1970	1980	1990	2000	2010	2020	2025	Source
GDP															World Bank, OECD
Population															UN, US Census
Life Expectancy															WHO
Inflation Rate															IMF
Unemployment Rate															BLS

Note: This chart samples the 125-year range 1900-2025 at 10-year intervals, plus 2025. The full NationChart would show a data availability indicator for every year.

Multiple Sources: The same metric can have multiple data sources (e.g., GDP from World Bank vs OECD, Population from UN vs US Census). Each source may have different coverage periods and methodologies.
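As an illustration of how availability cells might be derived, the sketch below marks each chart column as available or missing from a per-metric coverage range. The ranges are the USA coverage periods listed in the Data Pipeline section, and treating each range as gap-free is a simplifying assumption; a real implementation would check the set of years actually present in the database. The function names and the "#" and "." markers are hypothetical.

```python
# Chart columns: every 10 years from 1900 to 2020, plus 2025.
YEARS = list(range(1900, 2021, 10)) + [2025]

# (first year, last year) per metric, assumed to have no gaps in between.
COVERAGE = {
    "GDP": (1900, 2025),
    "Population": (1950, 2025),
    "Life Expectancy": (1960, 2020),
}


def availability_row(first: int, last: int, years=YEARS) -> list:
    """Return one boolean per chart column: True where data is available."""
    return [first <= year <= last for year in years]


def render(coverage: dict) -> str:
    """Draw the chart as text: '#' for available, '.' for missing."""
    lines = []
    for metric, (first, last) in coverage.items():
        cells = "".join("#" if ok else "." for ok in availability_row(first, last))
        lines.append(f"{metric:<16}{cells}")
    return "\n".join(lines)


print(render(COVERAGE))
```

In the frontend, the same boolean rows would drive the green and gray cell colors instead of text markers.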

Legend: green = Data Available, gray = Data Missing

5. Frontend Architecture

The frontend is built with Next.js and React and provides interactive data visualization.

Components:

• Country Selector

• NationChart Visualization

• Time Series Charts

• Data Comparison Tools

Technologies:

• Next.js

• React

• react-globe.gl

• GeoJSON
