Conor Doran

Worldboard – Technical write-up

1. Data Architecture

Each data point carries five key attributes that link a row in a source CSV file to its database record:

Data Point Attributes

1. Metric: GDP, Population, Life Expectancy
2. Source: World Bank, UN, WHO
3. S3 Object: File path in cloud storage
4. Country: USA, Canada, China
5. Year: 1920, 1950, 2020
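
As a rough illustration, the five attributes can be modelled as a small immutable record. This is only a sketch: the class name DataPointKey and its field names are hypothetical, chosen to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPointKey:
    """The five attributes that tie a CSV row to a database record."""
    metric: str     # e.g. "GDP"
    source: str     # e.g. "World Bank"
    s3_object: str  # file path in cloud storage
    country: str    # e.g. "USA"
    year: int       # e.g. 1920


key = DataPointKey(
    metric="GDP",
    source="World Bank",
    s3_object="s3://worldboard-data/worldbank/gdp-1900-1950.csv",
    country="USA",
    year=1920,
)
```

Freezing the dataclass makes instances hashable, so they can serve as dictionary keys when matching rows against records.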

Data Flow Visualization

CSV Files (raw data sources) → S3 Storage (cloud file storage) → ETL Process (Extract, Transform, Load) → SQL Database (structured data)

2. Data Pipeline

Multiple raw CSV files are loaded into the database and transformed into one clean, client-ready CSV file per country.

Transformation Process

1. Raw CSV Ingestion
Multiple CSV files from different sources (World Bank, UN, WHO, etc.)
Input Files:
• worldbank-gdp-1900-1950.csv
• worldbank-gdp-1951-2025.csv
• canada-stats-gdp-1800-1899.csv
• un-population-1950-2025.csv
2. ETL Pipeline
This process loads all data into a unified database with source tracking.
2.1 Extract
Read raw CSV files from S3 storage
Input: “USA, 1920, 977000000000”
Source: worldbank-gdp-1900-1950.csv
S3 Path: s3://worldboard-data/worldbank/gdp-1900-1950.csv
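
The extract step could look roughly like the sketch below. It assumes headerless three-column rows (country, year, value), as in the sample input. The function name extract_rows is hypothetical, and downloading the object from S3 is left out, so the CSV text is passed in directly.

```python
import csv
import io
from decimal import Decimal


def extract_rows(csv_text, s3_path):
    """Yield (country, year, value, s3_path) tuples from headerless CSV text."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if not row:  # skip blank lines
            continue
        country, year, value = row
        # Decimal avoids float rounding for large monetary values.
        yield country, int(year), Decimal(value), s3_path


rows = list(extract_rows(
    "USA, 1920, 977000000000\n",
    "s3://worldboard-data/worldbank/gdp-1900-1950.csv",
))
```

Carrying the S3 path along with every row is what lets later steps attribute each value to its source file.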
2.2 Transform
Convert raw CSV rows into structured data points with source attribution
Transform: Raw row → Structured data point
Add: source, file_path, created_at
Output: {"country": "USA", "metric": "GDP", "year": 1920, "value": 977000000000, "source": "World Bank"}
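
A minimal sketch of the transform step, under the assumption that the metric and source are known per input file rather than read from each row. The function name transform_row and the optional now parameter (used to make timestamps reproducible) are hypothetical.

```python
from datetime import datetime, timezone


def transform_row(country, year, value, *, metric, source, file_path, now=None):
    """Turn one raw CSV row into a structured data point with attribution."""
    timestamp = now or datetime.now(timezone.utc)
    return {
        "country": country,
        "metric": metric,
        "year": int(year),
        "value": value,
        # Attribution fields added by the transform step:
        "source": source,
        "file_path": file_path,
        "created_at": timestamp.isoformat(),
    }


point = transform_row(
    "USA", 1920, 977000000000,
    metric="GDP",
    source="World Bank",
    file_path="s3://worldboard-data/worldbank/gdp-1900-1950.csv",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```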
2.3 Load
Insert structured data into SQL database with conflict resolution
Load: Structured data → SQL database
Conflict Resolution: Latest timestamp wins
Result: Unified data_points table with source tracking
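
One way to express "latest timestamp wins" is an upsert keyed on the table's unique columns. The sketch below uses Python's built-in sqlite3 as a stand-in for the production database, which is an assumption made only to keep the example self-contained. The names SCHEMA, UPSERT, and load are hypothetical, and the column types are simplified.

```python
import sqlite3

SCHEMA = """
CREATE TABLE data_points (
    id INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    metric TEXT NOT NULL,
    year INTEGER NOT NULL,
    value REAL,
    source TEXT NOT NULL,
    file_path TEXT,
    created_at TEXT NOT NULL,
    UNIQUE (country, metric, source, year)
)
"""

# Insert a row; on a key collision, overwrite only if the incoming row is newer.
UPSERT = """
INSERT INTO data_points (country, metric, year, value, source, file_path, created_at)
VALUES (:country, :metric, :year, :value, :source, :file_path, :created_at)
ON CONFLICT (country, metric, source, year) DO UPDATE SET
    value = excluded.value,
    file_path = excluded.file_path,
    created_at = excluded.created_at
WHERE excluded.created_at > data_points.created_at
"""


def load(conn, points):
    """Upsert a batch of structured data points in one transaction."""
    with conn:
        conn.executemany(UPSERT, points)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

base = {"country": "USA", "metric": "GDP", "year": 2020, "source": "World Bank",
        "file_path": "s3://worldboard-data/worldbank/gdp-2020.csv"}
newer = {**base, "value": 2.0, "created_at": "2024-06-01T00:00:00+00:00"}
older = {**base, "value": 1.0, "created_at": "2024-01-01T00:00:00+00:00"}

load(conn, [newer])
load(conn, [older])  # ignored: its timestamp is older than the stored row
stored = conn.execute("SELECT value, created_at FROM data_points").fetchall()
```

Comparing timestamps as text works here because both are ISO 8601 strings in the same timezone format. A database with a native timestamp type would compare them directly.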
3. WorldDB SQL Database
All transformed data points are stored in a unified database with source tracking.
Database Records:
• USA, GDP, 2020, 20953000000000, World Bank, s3://worldboard-data/worldbank/gdp-2020.csv
• USA, Population, 2020, 331002651, UN, s3://worldboard-data/un/population-2020.csv
• USA, Life_Expectancy, 2020, 77.28, WHO, s3://worldboard-data/who/life-expectancy-2020.csv
• USA, GDP, 2020, 21000000000000, OECD, s3://worldboard-data/oecd/gdp-2020.csv
• CAN, GDP, 2020, 1640000000000, World Bank, s3://worldboard-data/worldbank/gdp-2020.csv
4. Country Aggregation
Query the database to collect all metrics for each country across all years.
USA Data:
• GDP: 1900-2025 (World Bank)
• Population: 1950-2025 (UN)
• Life Expectancy: 1960-2020 (WHO)
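
A coverage summary like the one above can be produced with a grouped query. The sketch below again uses an in-memory SQLite table as a stand-in, with a reduced column set and a handful of rows taken from the examples in this write-up. The names COVERAGE_SQL and coverage are hypothetical.

```python
import sqlite3

COVERAGE_SQL = """
SELECT metric, source, MIN(year) AS first_year, MAX(year) AS last_year
FROM data_points
WHERE country = ?
GROUP BY metric, source
ORDER BY metric, source
"""


def coverage(conn, country):
    """Return (metric, source, first_year, last_year) rows for one country."""
    return conn.execute(COVERAGE_SQL, (country,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_points "
    "(country TEXT, metric TEXT, year INTEGER, value REAL, source TEXT)"
)
conn.executemany(
    "INSERT INTO data_points VALUES (?, ?, ?, ?, ?)",
    [
        ("USA", "GDP", 1900, 977000000000, "World Bank"),
        ("USA", "GDP", 2020, 20953000000000, "World Bank"),
        ("USA", "Population", 1950, 152271000, "UN"),
        ("USA", "Population", 2020, 331002651, "UN"),
        ("CAN", "GDP", 2020, 1640000000000, "World Bank"),
    ],
)
usa_coverage = coverage(conn, "USA")
```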
5. Client CSV Generation
Export a clean, structured CSV file for each country with all metrics.
Output: usa.csv
year,gdp,population,life_expectancy
1900,977000000000,,
1950,1400000000000,152271000,
2020,20953000000000,331002651,77.28
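
The export is essentially a pivot from one row per data point to one row per year. A minimal sketch follows, assuming the input has already been reduced to a single value per year and metric (how to choose between competing sources is not covered here). The function name country_csv is hypothetical.

```python
import csv
import io


def country_csv(points, metrics):
    """Pivot (year, metric, value) tuples into wide CSV text, one row per year."""
    by_year = {}
    for year, metric, value in points:
        by_year.setdefault(year, {})[metric] = value

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["year"] + [m.lower() for m in metrics])
    for year in sorted(by_year):
        # Missing metrics become empty cells.
        writer.writerow([year] + [by_year[year].get(m, "") for m in metrics])
    return out.getvalue()


usa_points = [
    (1900, "GDP", 977000000000),
    (1950, "GDP", 1400000000000),
    (1950, "Population", 152271000),
    (2020, "GDP", 20953000000000),
    (2020, "Population", 331002651),
    (2020, "Life_Expectancy", 77.28),
]
usa_csv = country_csv(usa_points, ["GDP", "Population", "Life_Expectancy"])
```

With these inputs, usa_csv reproduces the usa.csv sample shown above.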

3. SQL Database Schema

After the ETL transformation process, data is stored in a simple, unified table structure that directly maps to the data point attributes.

data_points Table

CREATE TABLE data_points (
    id          SERIAL PRIMARY KEY,
    country     VARCHAR(3) NOT NULL,     -- ISO3 code (USA, CAN, CHN)
    metric      VARCHAR(100) NOT NULL,   -- GDP, Population, Life_Expectancy
    year        INTEGER NOT NULL,        -- 1900, 1950, 2020
    value       DECIMAL(20,6),           -- The actual data value
    source      VARCHAR(255) NOT NULL,   -- World Bank, UN, WHO
    file_path   VARCHAR(500),            -- S3 object path
    created_at  TIMESTAMP DEFAULT NOW(),
    UNIQUE (country, metric, source, year)
);

Example Data

-- Sample data_points records (id and created_at are filled in by their defaults).
-- The same metric can come from multiple sources (GDP from both World Bank and OECD).
INSERT INTO data_points (country, metric, year, value, source, file_path) VALUES
    ('USA', 'GDP', 2020, 20953000000000, 'World Bank', 's3://worldboard-data/worldbank/gdp-2020.csv'),
    ('USA', 'Population', 2020, 331002651, 'UN', 's3://worldboard-data/un/population-2020.csv'),
    ('USA', 'Life_Expectancy', 2020, 77.28, 'WHO', 's3://worldboard-data/who/life-expectancy-2020.csv'),
    ('USA', 'GDP', 2020, 21000000000000, 'OECD', 's3://worldboard-data/oecd/gdp-2020.csv');
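
To illustrate what the UNIQUE constraint permits, the sketch below shows two sources coexisting for the same country, metric, and year, while a second row from the same source is rejected. It uses an in-memory SQLite table with simplified column types as a stand-in for the schema above. The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE data_points (
    id INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    metric TEXT NOT NULL,
    year INTEGER NOT NULL,
    value REAL,
    source TEXT NOT NULL,
    UNIQUE (country, metric, source, year)
)
""")

INSERT = (
    "INSERT INTO data_points (country, metric, year, value, source) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.execute(INSERT, ("USA", "GDP", 2020, 20953000000000, "World Bank"))
conn.execute(INSERT, ("USA", "GDP", 2020, 21000000000000, "OECD"))  # other source: accepted

try:
    # Same country, metric, source, and year as an existing row.
    conn.execute(INSERT, ("USA", "GDP", 2020, 1, "OECD"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

sources = [row[0] for row in conn.execute(
    "SELECT source FROM data_points ORDER BY source")]
```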

4. NationChart Example

The NationChart provides a visual overview of data availability for each country. Each row represents a metric, and each column represents a year. Green indicates data is available, gray indicates missing data.

USA Data Availability (1900-2025)

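Columns: Metric, then one availability cell per year (1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2025), then Source.

Metric | Source
GDP | World Bank, OECD
Population | UN, US Census
Life Expectancy | WHO
Inflation Rate | IMF
Unemployment Rate | BLS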

Note: This chart samples the 125-year range (1900-2025) at 10-year intervals, plus 2025. In practice, the full NationChart would show a data availability indicator for every year.

Multiple Sources: The same metric can have multiple data sources (e.g., GDP from World Bank vs OECD, Population from UN vs US Census). Each source may have different coverage periods and methodologies.

Legend: green = Data Available, gray = Data Missing
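
The availability grid behind a chart like this can be derived from the set of (metric, year) pairs present in the database. The sketch below is one possible approach. The names availability and YEARS are hypothetical, and the sample points are illustrative rather than actual coverage.

```python
def availability(points, years):
    """Map each metric to a list of booleans, one per chart column (year)."""
    seen = {}
    for metric, year in points:
        seen.setdefault(metric, set()).add(year)
    return {
        metric: [year in metric_years for year in years]
        for metric, metric_years in seen.items()
    }


# Chart columns: every tenth year from 1900 to 2020, plus 2025.
YEARS = list(range(1900, 2021, 10)) + [2025]

chart = availability(
    [("GDP", 1900), ("GDP", 2020), ("Population", 1950)],
    YEARS,
)

# Plain-text rendering: "#" for available, "." for missing.
text_rows = {m: "".join("#" if ok else "." for ok in cells)
             for m, cells in chart.items()}
```

A frontend would map each boolean to a green or gray cell instead of a character.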

5. Frontend Architecture

The frontend is built with Next.js and React to provide fast, interactive data visualization.

Components:
• Country Selector
• NationChart Visualization
• Time Series Charts
• Data Comparison Tools

Technologies:
• Next.js
• React
• react-globe.gl
• GeoJSON