1. Normalization and Denormalization

  • Normalization is the process of organizing data to reduce redundancy and improve integrity. It involves dividing large tables into smaller, manageable ones and using foreign keys to link them.
    • First Normal Form (1NF): Each column contains atomic (indivisible) values, and each record is unique.
    • Second Normal Form (2NF): Achieved by removing partial dependency, i.e., all non-key attributes must be fully dependent on the primary key.
    • Third Normal Form (3NF): No transitive dependency, meaning non-key attributes cannot depend on other non-key attributes.
  • Denormalization: This involves combining tables to reduce the need for complex joins, improving query performance at the expense of redundancy.

2. Joins

  • Inner Join: Returns only rows where there is a match in both tables.
  • Left Join (or Left Outer Join): Returns all rows from the left table, and matched rows from the right table; if no match, NULL values are returned.
  • Right Join (or Right Outer Join): Similar to Left Join but returns all rows from the right table.
  • Full Outer Join: Returns rows when there is a match in one of the tables.
  • Cross Join: Returns the Cartesian product of the two tables, i.e., all combinations of rows.
  • Self Join: A table is joined with itself, often using aliases.

3. Subqueries

  • Scalar Subquery: Returns a single value, and is often used in SELECT, WHERE, or HAVING clauses.
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees);
    
  • Correlated Subquery: A subquery that refers to columns from the outer query.
    SELECT e.name
    FROM employees e
    WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);
    
  • IN vs. EXISTS:
    • IN is used for checking if a value is in a set.
    • EXISTS is used when the query checks for the presence of rows.

4. Indexes

  • Clustered Index: This type of index determines the physical order of data rows in the table. A table can have only one clustered index.
  • Non-Clustered Index: This does not change the physical order of data. It creates a separate structure to speed up retrieval.
  • Composite Index: Involves multiple columns. It can be used when queries commonly filter by more than one column.

5. Transactions

  • ACID Properties: Ensures the database transactions are processed reliably.
    • Atomicity: A transaction is either fully completed or not executed at all.
    • Consistency: The database remains in a valid state before and after the transaction.
    • Isolation: Transactions are isolated from each other.
    • Durability: Once a transaction is committed, it is permanent.
  • Transaction Control:
    • BEGIN TRANSACTION: Starts a transaction.
    • COMMIT: Saves the changes.
    • ROLLBACK: Reverts the changes if there is an error.

6. Window Functions

  • ROW_NUMBER(): Assigns a unique row number to each row in a result set.
    SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
    FROM employees;
    
  • RANK() and DENSE_RANK(): Similar to ROW_NUMBER(), but handles ties differently.
  • PARTITION BY: Divides the result set into partitions and performs operations on each partition.

7. Triggers and Stored Procedures

  • Triggers: Automatically executed or fired when certain events occur on a table or view (e.g., INSERT, UPDATE, DELETE).
    CREATE TRIGGER update_salary
    AFTER UPDATE ON employees
    FOR EACH ROW
    BEGIN
       -- trigger logic here
    END;
    
  • Stored Procedures: A set of SQL statements stored in the database that can be executed as a program.
    CREATE PROCEDURE GetEmployeeSalary(IN emp_id INT)
    BEGIN
       SELECT salary FROM employees WHERE employee_id = emp_id;
    END;
    
  • Functions: Similar to stored procedures, but typically return a value.

8. Partitioning

  • Range Partitioning: Dividing a table into partitions based on a range of values (e.g., dates).
  • List Partitioning: Dividing a table into partitions based on a list of values.
  • Hash Partitioning: Dividing the data into partitions based on a hash function.

9. Advanced SQL Clauses

  • WITH Clause (Common Table Expressions or CTEs): Makes a subquery easier to reference.
    WITH DepartmentAvgSalary AS (
       SELECT department_id, AVG(salary) AS avg_salary
       FROM employees
       GROUP BY department_id
    )
    SELECT e.name, e.salary
    FROM employees e
    JOIN DepartmentAvgSalary das ON e.department_id = das.department_id
    WHERE e.salary > das.avg_salary;
    
  • GROUP BY ROLLUP and CUBE: Useful for generating subtotals and grand totals.
    SELECT department, SUM(salary)
    FROM employees
    GROUP BY ROLLUP (department);
    

10. Data Warehousing Concepts

  • Star Schema: A central fact table surrounded by dimension tables.
  • Snowflake Schema: A more normalized form of the star schema, where dimension tables are further split into related tables.
  • Fact Tables and Dimension Tables: Fact tables hold quantitative data, while dimension tables contain descriptive data (e.g., products, dates).
  • ETL Process: Extract, Transform, Load; used for moving data from operational databases to a data warehouse.

11. NoSQL Databases

  • For databases like MongoDB, Cassandra, or Redis, understanding data models (document, key-value, column-family) and how they differ from relational databases is important.
  • Aggregation Pipeline (MongoDB): A framework for performing data transformations within MongoDB.
  • Cassandra Queries: Using CQL (Cassandra Query Language) for column-family based databases.

12. Optimizing Queries

  • EXPLAIN Plan: Analyzing the execution plan of a query can help identify bottlenecks.
  • Query Hints: Directing the query optimizer to use specific indexes or join strategies.
  • Materialized Views: Precomputed results that can be used for complex queries to improve performance.
  • Partition Pruning: Ensuring that only relevant partitions are scanned to improve performance.