SQL Advanced Topics

1. Normalization and Denormalization

Normalization is the process of organizing data to reduce redundancy and improve integrity. It involves dividing large tables into smaller, manageable ones and using foreign keys to link them.
- First Normal Form (1NF): Each column contains atomic (indivisible) values, and each record is unique.
- Second Normal Form (2NF): Achieved by removing partial dependency, i.e., all non-key attributes must be fully dependent on the primary key.
- Third Normal Form (3NF): No transitive dependency, meaning non-key attributes cannot depend on other non-key attributes.
Denormalization: This involves combining tables to reduce the need for complex joins, improving query performance at the expense of redundancy.

2. Joins

Inner Join: Returns only rows where there is a match in both tables.
Left Join (or Left Outer Join): Returns all rows from the left table, and matched rows from the right table; if no match, NULL values are returned.
Right Join (or Right Outer Join): Similar to Left Join but returns all rows from the right table.
Full Outer Join: Returns rows when there is a match in one of the tables.
Cross Join: Returns the Cartesian product of the two tables, i.e., all combinations of rows.
Self Join: A table is joined with itself, often using aliases.

3. Subqueries

Scalar Subquery: Returns a single value, and is often used in SELECT, WHERE, or HAVING clauses.
```
SELECT name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
```

Correlated Subquery: A subquery that refers to columns from the outer query.

SELECT e.name
FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

IN vs. EXISTS:
- IN is used for checking if a value is in a set.
- EXISTS is used when the query checks for the presence of rows.

4. Indexes

Clustered Index: This type of index determines the physical order of data rows in the table. A table can have only one clustered index.
Non-Clustered Index: This does not change the physical order of data. It creates a separate structure to speed up retrieval.
Composite Index: Involves multiple columns. It can be used when queries commonly filter by more than one column.

5. Transactions

ACID Properties: Ensures the database transactions are processed reliably.
- Atomicity: A transaction is either fully completed or not executed at all.
- Consistency: The database remains in a valid state before and after the transaction.
- Isolation: Transactions are isolated from each other.
- Durability: Once a transaction is committed, it is permanent.
Transaction Control:
- BEGIN TRANSACTION: Starts a transaction.
- COMMIT: Saves the changes.
- ROLLBACK: Reverts the changes if there is an error.

6. Window Functions

ROW_NUMBER(): Assigns a unique row number to each row in a result set.

SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
FROM employees;

RANK() and DENSE_RANK(): Similar to ROW_NUMBER(), but handles ties differently.
PARTITION BY: Divides the result set into partitions and performs operations on each partition.

7. Triggers and Stored Procedures

Triggers: Automatically executed or fired when certain events occur on a table or view (e.g., INSERT, UPDATE, DELETE).

CREATE TRIGGER update_salary
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
   -- trigger logic here
END;

Stored Procedures: A set of SQL statements stored in the database that can be executed as a program.

CREATE PROCEDURE GetEmployeeSalary(IN emp_id INT)
BEGIN
   SELECT salary FROM employees WHERE employee_id = emp_id;
END;

Functions: Similar to stored procedures, but typically return a value.

8. Partitioning

Range Partitioning: Dividing a table into partitions based on a range of values (e.g., dates).
List Partitioning: Dividing a table into partitions based on a list of values.
Hash Partitioning: Dividing the data into partitions based on a hash function.

9. Advanced SQL Clauses

WITH Clause (Common Table Expressions or CTEs): Makes a subquery easier to reference.

WITH DepartmentAvgSalary AS (
   SELECT department_id, AVG(salary) AS avg_salary
   FROM employees
   GROUP BY department_id
)
SELECT e.name, e.salary
FROM employees e
JOIN DepartmentAvgSalary das ON e.department_id = das.department_id
WHERE e.salary > das.avg_salary;

GROUP BY ROLLUP and CUBE: Useful for generating subtotals and grand totals.

SELECT department, SUM(salary)
FROM employees
GROUP BY ROLLUP (department);

10. Data Warehousing Concepts

Star Schema: A central fact table surrounded by dimension tables.
Snowflake Schema: A more normalized form of the star schema, where dimension tables are further split into related tables.
Fact Tables and Dimension Tables: Fact tables hold quantitative data, while dimension tables contain descriptive data (e.g., products, dates).
ETL Process: Extract, Transform, Load; used for moving data from operational databases to a data warehouse.

11. NoSQL Databases

For databases like MongoDB, Cassandra, or Redis, understanding data models (document, key-value, column-family) and how they differ from relational databases is important.
Aggregation Pipeline (MongoDB): A framework for performing data transformations within MongoDB.
Cassandra Queries: Using CQL (Cassandra Query Language) for column-family based databases.

12. Optimizing Queries

EXPLAIN Plan: Analyzing the execution plan of a query can help identify bottlenecks.
Query Hints: Directing the query optimizer to use specific indexes or join strategies.
Materialized Views: Precomputed results that can be used for complex queries to improve performance.
Partition Pruning: Ensuring that only relevant partitions are scanned to improve performance.