Overview
If you’re new to data modeling and want to learn how to create fact and dimension tables, it’s best to start with sample databases designed for practice. Here are some good options:
1. AdventureWorks (Microsoft SQL Server Sample Database)
- What It Is: AdventureWorks is a sample database provided by Microsoft. It contains data for a fictional company that sells bicycles and related products.
- Where to Get It: Download AdventureWorks
2. Northwind Traders (Microsoft Access Database)
- What It Is: Northwind is another sample database that simulates a trading company. It includes data about customers, products, orders, and suppliers.
- Where to Get It: Download Northwind Database
3. TPC-H Benchmark Dataset
- What It Is: TPC-H is a decision-support benchmark from the Transaction Processing Performance Council (TPC), and its dataset is used for testing database performance. It models a wholesale supplier, with data about orders, customers, parts, and suppliers.
- Where to Get It: TPC-H Benchmark Dataset
4. IMDb Movie Dataset
- What It Is: The IMDb dataset has information about movies, actors, directors, and genres.
- Where to Get It: IMDb Datasets
5. Retail Transaction Data (from Kaggle)
- What It Is: Kaggle hosts several retail transaction datasets, such as the “Online Retail Dataset” and “Instacart Market Basket Analysis.”
- Where to Get It:
- Online Retail Dataset: UCI link, Kaggle link
- Instacart Market Basket Analysis
6. Airbnb Listings Dataset
- What It Is: This dataset contains details about Airbnb properties, like prices, locations, host details, and reviews.
- Where to Get It: Airbnb Listings Dataset
7. Chinook Database
- What It Is: Chinook is a sample database that simulates a digital media store. It includes data about artists, albums, media tracks, customers, and sales.
- Where to Get It: Download Chinook Database
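To make the goal concrete, here is a minimal sketch of what a fact table and its dimension tables can look like, loosely inspired by the kind of data Chinook holds (customers, tracks, sales). The table names, column names, and rows are hypothetical and are not Chinook's actual schema; the example uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: two dimension tables and one fact table.

    All table names, columns, and rows are made up for illustration.
    """
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            country      TEXT NOT NULL
        );
        CREATE TABLE dim_track (
            track_key INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            artist    TEXT NOT NULL
        );
        -- The fact table stores measures (quantity, price) plus keys
        -- that point at the descriptive rows in the dimensions.
        CREATE TABLE fact_sales (
            sale_id          INTEGER PRIMARY KEY,
            customer_key     INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            track_key        INTEGER NOT NULL REFERENCES dim_track(track_key),
            quantity         INTEGER NOT NULL,
            unit_price_cents INTEGER NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Ana", "Brazil"), (2, "Ben", "Canada")],
    )
    conn.executemany(
        "INSERT INTO dim_track VALUES (?, ?, ?)",
        [(1, "Song A", "Artist X"), (2, "Song B", "Artist Y")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 2, 99), (2, 1, 2, 1, 129), (3, 2, 1, 3, 99)],
    )
    conn.commit()


def revenue_by_country(conn):
    """Aggregate the fact table's measures, grouped by a dimension attribute."""
    rows = conn.execute("""
        SELECT c.country, SUM(f.quantity * f.unit_price_cents) AS revenue_cents
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(revenue_by_country(connection))
```

The pattern to notice is that numeric measures live in the fact table, while descriptive attributes such as country or artist live in the dimensions and are reached through a join.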
Summary
Start with AdventureWorks or Northwind if you want the easiest path: both are ready to use and well suited to beginners. If you’re looking for more real-world examples, Kaggle’s retail datasets or the Airbnb dataset are also good options.
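Some retail datasets arrive as a single flat table, so a common first exercise is splitting that table into a dimension and a fact table. The sketch below shows one way to do it in plain Python. The column names (invoice_no, stock_code, description, quantity) are assumptions chosen for illustration and will likely need adjusting to match whichever dataset you download.

```python
def split_flat_rows(rows):
    """Split flat transaction rows into a product dimension and a sales fact table.

    Each distinct stock code gets a surrogate key the first time it is seen.
    Column names are hypothetical, not taken from any specific dataset.
    """
    product_keys = {}  # natural key (stock code) -> surrogate key
    dim_product = []
    fact_sales = []

    for row in rows:
        code = row["stock_code"]
        if code not in product_keys:
            product_keys[code] = len(product_keys) + 1
            dim_product.append({
                "product_key": product_keys[code],
                "stock_code": code,
                "description": row["description"],
            })
        # The fact row keeps only keys and measures, not descriptive text.
        fact_sales.append({
            "invoice_no": row["invoice_no"],
            "product_key": product_keys[code],
            "quantity": row["quantity"],
        })

    return dim_product, fact_sales


if __name__ == "__main__":
    sample = [
        {"invoice_no": "A1", "stock_code": "P10", "description": "Mug", "quantity": 2},
        {"invoice_no": "A1", "stock_code": "P20", "description": "Lamp", "quantity": 1},
        {"invoice_no": "A2", "stock_code": "P10", "description": "Mug", "quantity": 5},
    ]
    dim, fact = split_flat_rows(sample)
    print(dim)
    print(fact)
```

Repeated product descriptions collapse into one dimension row each, and the fact table shrinks to keys and quantities, which is the same shape the ready-made sample databases let you practice on.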