1
Chapter 1 — Introduction to Databases
Why DBMS? Problems with plain file systems
Problem Data redundancy & inconsistency — duplicate data in many formats
Problem Difficulty accessing data — new program needed per new task
Problem Data isolation — multiple files/formats hard to reconcile
Problem Atomicity — partial updates leave DB inconsistent (e.g., fund transfer)
Problem Concurrent access — uncontrolled concurrency causes inconsistency
Problem Security — hard to give selective access
Key A DBMS solves all of the above
Three levels of data abstraction
Physical level — how records are physically stored on disk
Logical level — what data and relationships exist (the schema)
View level — application-specific "slices" of the data; hides sensitive fields
Key Physical data independence: changing physical storage does not break the logical schema
Schema vs. instance
Schema — the type/structure of the DB (like a class definition); rarely changes
Instance — the actual data at a given moment (like a variable's value)
DDL vs. DML
DDL — defines schema, constraints, authorization. Generates a data dictionary (metadata).
DML — queries & modifies data. Procedural (say how) vs. declarative (say what). SQL is declarative.
Database engine components
Storage manager — auth, transaction, file, buffer managers; stores data files, data dictionary, indexes
Query processor — DDL interpreter → DML compiler (optimization) → query evaluation engine
Transaction manager — ensures consistency despite failures; concurrency control
Architectures
Centralized, Client-server, Parallel (shared memory / shared disk / shared nothing), Distributed
Application tiers: 2-tier (client talks to DB directly) vs. 3-tier (client → app server → DB)
2
Chapter 2 — Relational Model & Algebra
Core terminology
Relation = table; Tuple = row; Attribute = column
Domain = allowed values per attribute; values must be atomic (indivisible)
Relation schema R = (A₁, A₂, …, Aₙ) — structure
Relation instance — the actual set of tuples; order of tuples is irrelevant
Keys — know these cold
Superkey — any set of attributes that uniquely identifies a tuple (e.g., {ID} or {ID, name})
Candidate key — a minimal superkey (no redundant attributes)
Primary key — chosen candidate key; cannot be null
Foreign key — attribute in relation R referencing the primary key of relation S; enforces referential integrity
The 6 basic relational algebra operations
σ_p(r) Select — filter rows satisfying predicate p. Connectives: ∧ ∨ ¬
π_{A1,…}(r) Project — keep only listed columns; duplicates are removed
r ∪ s Union — combine; relations must have same arity & compatible domains
r − s Set difference — tuples in r but not s (same arity/compatibility)
r × s Cartesian product — all pairs of tuples from r and s
ρ_x(E) Rename — gives expression E the name x
Derived operations
Join = σ_{condition}(r × s) — Cartesian product filtered by a join condition
Set intersection r ∩ s — tuples in both (same arity)
Assignment ← — assign subexpression to a temp variable for readability
3
Chapter 3 — Introduction to SQL
Basic query structure
SELECT A1, A2 FROM r1, r2 WHERE P — project → Cartesian product → filter
SELECT DISTINCT removes duplicates; SELECT ALL (default) keeps them
SELECT * returns all attributes; arithmetic expressions allowed: salary/12 AS monthly_salary
WHERE uses: < <= > >= = <>, AND OR NOT, BETWEEN x AND y
String, ordering, set operations
LIKE patterns: % = any substring, _ = any single character. Case sensitive.
ORDER BY attr DESC/ASC; can sort on multiple attributes
UNION, INTERSECT, EXCEPT — eliminate duplicates automatically. Use ALL suffix to keep.
Null values — tricky!
Any arithmetic with null = null. Use IS NULL / IS NOT NULL to test.
Comparisons with null = unknown (not true/false). WHERE treats unknown as false.
Exam trap null = null evaluates to unknown, not true
Logic: true AND unknown = unknown; false AND unknown = false; true OR unknown = true
Aggregate functions & grouping
Functions: avg, min, max, sum, count
GROUP BY attr — groups rows; every non-aggregate in SELECT must be in GROUP BY
HAVING condition — filters groups (applied after grouping; WHERE filters before)
Exam trap Putting a non-grouped, non-aggregated attribute in SELECT is an error
Subqueries
IN / NOT IN (subquery) — set membership test
> SOME (subquery) — true if greater than at least one value (= SOMEIN)
> ALL (subquery) — true if greater than every value (≠ ALLNOT IN)
EXISTS (subquery) — true if subquery is non-empty; used for correlated subqueries
NOT EXISTS — "find students who took ALL biology courses" pattern: double-except trick
WITH clause — defines temp named relation for the query scope; great for multi-step aggregation
Scalar subquery — subquery returning single value, usable in SELECT/WHERE; runtime error if >1 result
Modification statements
INSERT INTO r VALUES (...) or INSERT INTO r SELECT ...
DELETE FROM r WHERE P — avg/subquery computed first, then all matching rows deleted at once
UPDATE r SET attr = expr WHERE P — use CASE WHEN ... THEN ... ELSE ... END for conditional updates
4
Chapter 4 — Intermediate SQL
Join types
Natural join — auto-equates all common attribute names, keeps one copy. Dangerous if unrelated attributes share a name.
JOIN ... USING (col) — equates only the specified column(s); safer than natural join
JOIN ... ON (condition) — general predicate; equivalent to Cartesian product + WHERE
Left outer join ⟕ — all rows from left table; nulls for non-matching right rows
Right outer join ⟖ — all rows from right; nulls for non-matching left
Full outer join ⟗ — all rows from both; nulls where no match on either side
Exam trap student natural join takes natural join course silently drops students in depts different from their course dept
Views
CREATE VIEW v AS <query> — saves the query expression, not a copy of data
Views can depend on other views; must not be recursive
Materialized view — physically stored copy; must be refreshed when base relations change
Updates through views are restricted: only simple views (single table, no aggregates/group by/distinct) are generally updateable
Transactions
A transaction = sequence of operations as a single unit of work
COMMIT WORK — makes changes permanent; ROLLBACK WORK — undoes all changes
Property: atomic (all or nothing) and isolated from concurrent transactions
Integrity constraints
NOT NULL, PRIMARY KEY, UNIQUE (candidate key; can be null), CHECK (predicate)
Foreign key with cascading: ON DELETE CASCADE / ON UPDATE CASCADE; alternatives: SET NULL, SET DEFAULT
Assertions — DB-wide constraints; checked on every relevant modification (expensive)
Data types, indexes & authorization
Built-in: date, time, timestamp, interval, blob (binary), clob (character large object)
User-defined: CREATE TYPE Dollars AS numeric(12,2); CREATE DOMAIN (can have constraints)
CREATE INDEX idx ON table(attr) — speeds up lookups; not part of SQL standard but universally supported
Privileges: SELECT, INSERT, UPDATE, DELETE — granted with GRANT priv ON rel TO user, revoked with REVOKE
Roles: CREATE ROLE r; GRANT r TO user; — roles inherit privileges and can be granted to other roles
WITH GRANT OPTION — lets the grantee re-grant the privilege; CASCADE vs RESTRICT on revoke
5
Chapter 5 — Advanced SQL
Accessing SQL from programs
JDBC (Java): DriverManager.getConnection(...)StatementexecuteQuery/executeUpdateResultSet
Prepared statements: use ? placeholders + setString/setInt. Always use for user input — prevents SQL injection.
Security Never build queries by string concatenation with user input — SQL injection risk
SQL injection: attacker input like X' OR 'Y'='Y makes the WHERE always true
ODBC — language-agnostic API (originally Microsoft/Windows); same open/query/fetch model as JDBC
Embedded SQL: EXEC SQL <stmt>; in host language (C, Java, etc.); host vars prefixed with :
Embedded SQL cursor pattern: DECLARE c CURSOR FOR ... ; OPEN c; FETCH c INTO :var; CLOSE c;
Functions & procedures
CREATE FUNCTION f(arg type) RETURNS type BEGIN ... RETURN val; END
Table functions: return a table; called with SELECT * FROM TABLE(f('arg'))
CREATE PROCEDURE p(IN x type, OUT y type) BEGIN ... END; called with CALL p(val, var)
Control flow: WHILE ... DO ... END WHILE; REPEAT ... UNTIL ... END REPEAT; FOR r AS SELECT ... DO ... END FOR; IF ... THEN ... ELSEIF ... ELSE ... END IF
Benefits: compiled once, reusable, security (restrict table access, expose only procedure)
Triggers
A trigger fires automatically on INSERT, UPDATE, or DELETE
Syntax: CREATE TRIGGER name BEFORE|AFTER INSERT|UPDATE|DELETE ON table FOR EACH ROW BEGIN ... END
REFERENCING OLD ROW AS orow NEW ROW AS nrow — access values before/after change
FOR EACH STATEMENT — one execution per SQL statement (more efficient for bulk updates)
Avoid triggers when loading backups, replicating; disable before such operations
Risks: cascading triggers, unexpected side effects, transaction failures
Recursive queries
WITH RECURSIVE rec(cols) AS (base_query UNION recursive_query) SELECT * FROM rec;
Computes transitive closure — iterates until a fixed point (no new tuples added)
Views must be monotonic: adding tuples to input can only add tuples to the view, never remove
Why needed Cannot write transitive closure in plain SQL without recursion or iteration
Window functions & ranking
rank() OVER (ORDER BY GPA DESC) — ranks with gaps (ties both get 1, next is 3)
dense_rank() — no gaps (ties both get 1, next is 2)
PARTITION BY dept OVER (...) — rank within each group independently
ntile(n) — divide rows into n equal buckets; percent_rank; cume_dist; row_number
Windowing: sum(val) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) — sliding window aggregates (e.g., moving average)
OLAP
OLAP — interactive multi-dimensional data analysis. Dimensions = what you slice by; Measures = what you aggregate.
Cross-tab / pivot table — rows & columns are dimension values; cells are aggregated measures
GROUP BY CUBE(a,b,c) — all 2ⁿ subsets of attributes (every combination of groupings)
GROUP BY ROLLUP(a,b,c) — hierarchical prefixes only: (a,b,c), (a,b), (a), ()
OLAP ops: drill down (coarse→fine), roll up (fine→coarse), slicing (fix one dimension), pivoting (swap dimensions)
MOLAP = multidimensional arrays in memory; ROLAP = relational only; HOLAP = hybrid