Chapter 1 — Introduction to Databases ▶

Why DBMS? Problems with plain file systems

Problem Data redundancy & inconsistency — duplicate data in many formats

Problem Difficulty accessing data — new program needed per new task

Problem Data isolation — multiple files/formats hard to reconcile

Problem Atomicity — partial updates leave DB inconsistent (e.g., fund transfer)

Problem Concurrent access — uncontrolled concurrency causes inconsistency

Problem Security — hard to give selective access

Key A DBMS solves all of the above

Three levels of data abstraction

Physical level — how records are physically stored on disk

Logical level — what data and relationships exist (the schema)

View level — application-specific "slices" of the data; hides sensitive fields

Key Physical data independence: changing physical storage does not break the logical schema

Schema vs. instance

Schema — the type/structure of the DB (like a class definition); rarely changes

Instance — the actual data at a given moment (like a variable's value)

DDL vs. DML

DDL — defines schema, constraints, authorization. Generates a data dictionary (metadata).

DML — queries & modifies data. Procedural (say how) vs. declarative (say what). SQL is declarative.

Database engine components

Storage manager — auth, transaction, file, buffer managers; stores data files, data dictionary, indexes

Query processor — DDL interpreter → DML compiler (optimization) → query evaluation engine

Transaction manager — ensures consistency despite failures; concurrency control

Architectures

Centralized, Client-server, Parallel (shared memory / shared disk / shared nothing), Distributed

Application tiers: 2-tier (client talks to DB directly) vs. 3-tier (client → app server → DB)

Chapter 2 — Relational Model & Algebra ▶

Core terminology

Relation = table; Tuple = row; Attribute = column

Domain = allowed values per attribute; values must be atomic (indivisible)

Relation schema R = (A₁, A₂, …, Aₙ) — structure

Relation instance — the actual set of tuples; order of tuples is irrelevant

Keys — know these cold

Superkey — any set of attributes that uniquely identifies a tuple (e.g., {ID} or {ID, name})

Candidate key — a minimal superkey (no redundant attributes)

Primary key — chosen candidate key; cannot be null

Foreign key — attribute in relation R referencing the primary key of relation S; enforces referential integrity

The 6 basic relational algebra operations

σ_p(r) Select — filter rows satisfying predicate p. Connectives: ∧ ∨ ¬

π_{A1,…}(r) Project — keep only listed columns; duplicates are removed

r ∪ s Union — combine; relations must have same arity & compatible domains

r − s Set difference — tuples in r but not s (same arity/compatibility)

r × s Cartesian product — all pairs of tuples from r and s

ρ_x(E) Rename — gives expression E the name x

Derived operations

Join = σ_{condition}(r × s) — Cartesian product filtered by a join condition

Set intersection r ∩ s — tuples in both (same arity)

Assignment ← — assign subexpression to a temp variable for readability

Chapter 3 — Introduction to SQL ▶

Basic query structure

SELECT A1, A2 FROM r1, r2 WHERE P — project → Cartesian product → filter

SELECT DISTINCT removes duplicates; SELECT ALL (default) keeps them

SELECT * returns all attributes; arithmetic expressions allowed: salary/12 AS monthly_salary

WHERE uses: < <= > >= = <>, AND OR NOT, BETWEEN x AND y

String, ordering, set operations

LIKE patterns: % = any substring, _ = any single character. Case sensitive.

ORDER BY attr DESC/ASC; can sort on multiple attributes

UNION, INTERSECT, EXCEPT — eliminate duplicates automatically. Use ALL suffix to keep.

Null values — tricky!

Any arithmetic with null = null. Use IS NULL / IS NOT NULL to test.

Comparisons with null = unknown (not true/false). WHERE treats unknown as false.

Exam trap null = null evaluates to unknown, not true

Logic: true AND unknown = unknown; false AND unknown = false; true OR unknown = true

Aggregate functions & grouping

Functions: avg, min, max, sum, count

GROUP BY attr — groups rows; every non-aggregate in SELECT must be in GROUP BY

HAVING condition — filters groups (applied after grouping; WHERE filters before)

Exam trap Putting a non-grouped, non-aggregated attribute in SELECT is an error

Subqueries

IN / NOT IN (subquery) — set membership test

> SOME (subquery) — true if greater than at least one value (= SOME ≡ IN)

> ALL (subquery) — true if greater than every value (≠ ALL ≡ NOT IN)

EXISTS (subquery) — true if subquery is non-empty; used for correlated subqueries

NOT EXISTS — "find students who took ALL biology courses" pattern: double-except trick

WITH clause — defines temp named relation for the query scope; great for multi-step aggregation

Scalar subquery — subquery returning single value, usable in SELECT/WHERE; runtime error if >1 result

Modification statements

INSERT INTO r VALUES (...) or INSERT INTO r SELECT ...

DELETE FROM r WHERE P — avg/subquery computed first, then all matching rows deleted at once

UPDATE r SET attr = expr WHERE P — use CASE WHEN ... THEN ... ELSE ... END for conditional updates

Chapter 4 — Intermediate SQL ▶

Join types

Natural join — auto-equates all common attribute names, keeps one copy. Dangerous if unrelated attributes share a name.

JOIN ... USING (col) — equates only the specified column(s); safer than natural join

JOIN ... ON (condition) — general predicate; equivalent to Cartesian product + WHERE

Left outer join ⟕ — all rows from left table; nulls for non-matching right rows

Right outer join ⟖ — all rows from right; nulls for non-matching left

Full outer join ⟗ — all rows from both; nulls where no match on either side

Exam trap student natural join takes natural join course silently drops students in depts different from their course dept

Views

CREATE VIEW v AS <query> — saves the query expression, not a copy of data

Views can depend on other views; must not be recursive

Materialized view — physically stored copy; must be refreshed when base relations change

Updates through views are restricted: only simple views (single table, no aggregates/group by/distinct) are generally updateable

Transactions

A transaction = sequence of operations as a single unit of work

COMMIT WORK — makes changes permanent; ROLLBACK WORK — undoes all changes

Property: atomic (all or nothing) and isolated from concurrent transactions

Integrity constraints

NOT NULL, PRIMARY KEY, UNIQUE (candidate key; can be null), CHECK (predicate)

Foreign key with cascading: ON DELETE CASCADE / ON UPDATE CASCADE; alternatives: SET NULL, SET DEFAULT

Assertions — DB-wide constraints; checked on every relevant modification (expensive)

Data types, indexes & authorization

Built-in: date, time, timestamp, interval, blob (binary), clob (character large object)

User-defined: CREATE TYPE Dollars AS numeric(12,2); CREATE DOMAIN (can have constraints)

CREATE INDEX idx ON table(attr) — speeds up lookups; not part of SQL standard but universally supported

Privileges: SELECT, INSERT, UPDATE, DELETE — granted with GRANT priv ON rel TO user, revoked with REVOKE

Roles: CREATE ROLE r; GRANT r TO user; — roles inherit privileges and can be granted to other roles

WITH GRANT OPTION — lets the grantee re-grant the privilege; CASCADE vs RESTRICT on revoke

Chapter 5 — Advanced SQL ▶

Accessing SQL from programs

JDBC (Java): DriverManager.getConnection(...) → Statement → executeQuery/executeUpdate → ResultSet

Prepared statements: use ? placeholders + setString/setInt. Always use for user input — prevents SQL injection.

Security Never build queries by string concatenation with user input — SQL injection risk

SQL injection: attacker input like X' OR 'Y'='Y makes the WHERE always true

ODBC — language-agnostic API (originally Microsoft/Windows); same open/query/fetch model as JDBC

Embedded SQL: EXEC SQL <stmt>; in host language (C, Java, etc.); host vars prefixed with :

Embedded SQL cursor pattern: DECLARE c CURSOR FOR ... ; OPEN c; FETCH c INTO :var; CLOSE c;

Functions & procedures

CREATE FUNCTION f(arg type) RETURNS type BEGIN ... RETURN val; END

Table functions: return a table; called with SELECT * FROM TABLE(f('arg'))

CREATE PROCEDURE p(IN x type, OUT y type) BEGIN ... END; called with CALL p(val, var)

Control flow: WHILE ... DO ... END WHILE; REPEAT ... UNTIL ... END REPEAT; FOR r AS SELECT ... DO ... END FOR; IF ... THEN ... ELSEIF ... ELSE ... END IF

Benefits: compiled once, reusable, security (restrict table access, expose only procedure)

Triggers

A trigger fires automatically on INSERT, UPDATE, or DELETE

Syntax: CREATE TRIGGER name BEFORE|AFTER INSERT|UPDATE|DELETE ON table FOR EACH ROW BEGIN ... END

REFERENCING OLD ROW AS orow NEW ROW AS nrow — access values before/after change

FOR EACH STATEMENT — one execution per SQL statement (more efficient for bulk updates)

Avoid triggers when loading backups, replicating; disable before such operations

Risks: cascading triggers, unexpected side effects, transaction failures

Recursive queries

WITH RECURSIVE rec(cols) AS (base_query UNION recursive_query) SELECT * FROM rec;

Computes transitive closure — iterates until a fixed point (no new tuples added)

Views must be monotonic: adding tuples to input can only add tuples to the view, never remove

Why needed Cannot write transitive closure in plain SQL without recursion or iteration

Window functions & ranking

rank() OVER (ORDER BY GPA DESC) — ranks with gaps (ties both get 1, next is 3)

dense_rank() — no gaps (ties both get 1, next is 2)

PARTITION BY dept OVER (...) — rank within each group independently

ntile(n) — divide rows into n equal buckets; percent_rank; cume_dist; row_number

Windowing: sum(val) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) — sliding window aggregates (e.g., moving average)

OLAP

OLAP — interactive multi-dimensional data analysis. Dimensions = what you slice by; Measures = what you aggregate.

Cross-tab / pivot table — rows & columns are dimension values; cells are aggregated measures

GROUP BY CUBE(a,b,c) — all 2ⁿ subsets of attributes (every combination of groupings)

GROUP BY ROLLUP(a,b,c) — hierarchical prefixes only: (a,b,c), (a,b), (a), ()

OLAP ops: drill down (coarse→fine), roll up (fine→coarse), slicing (fix one dimension), pivoting (swap dimensions)

MOLAP = multidimensional arrays in memory; ROLAP = relational only; HOLAP = hybrid