SQL Interview Questions

Question 1

What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?

Answer

INNER JOIN returns only rows that have matching values in both tables. LEFT JOIN returns all rows from the left table and matching rows from the right (nulls where no match exists). RIGHT JOIN is the reverse, returning all rows from the right table. FULL OUTER JOIN returns all rows from both tables, with nulls where there is no match on either side.

Question 2

What is database normalization and what are the normal forms?

Answer

Normalization is the process of organizing database tables to reduce redundancy and dependency. First Normal Form (1NF) eliminates repeating groups, Second Normal Form (2NF) removes partial dependencies on composite keys, and Third Normal Form (3NF) removes transitive dependencies. Higher forms like BCNF and 4NF address more subtle anomalies but are less commonly applied in practice.

Question 3

How do indexes work and when should you create them?

Answer

Indexes are data structures (typically B-trees) that maintain a sorted reference to table rows, enabling fast lookups without scanning the entire table. Create indexes on columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY. However, indexes add overhead to INSERT, UPDATE, and DELETE operations and consume additional storage, so avoid over-indexing tables with heavy write workloads.

Question 4

What is the difference between WHERE and HAVING?

Answer

WHERE filters rows before any grouping or aggregation takes place, while HAVING filters groups after GROUP BY and aggregation. WHERE cannot be used with aggregate functions like COUNT, SUM, or AVG, but HAVING can. A query can use both: WHERE to filter individual rows first, then HAVING to filter the aggregated groups.

Question 5

Explain ACID properties in database transactions.

Answer

Atomicity ensures all operations in a transaction succeed or none do. Consistency guarantees the database moves from one valid state to another. Isolation ensures concurrent transactions do not interfere with each other. Durability guarantees that committed transactions persist even after system failures. These properties together ensure reliable and predictable database behavior.

Question 6

What is a Common Table Expression (CTE) and when would you use one?

Answer

A CTE is a temporary named result set defined with the WITH keyword that exists only for the duration of a single query. CTEs improve readability by breaking complex queries into logical steps and can reference themselves for recursive queries (like traversing hierarchies). They are functionally similar to subqueries but are easier to read, maintain, and debug.

Question 7

What is the difference between a clustered and non-clustered index?

Answer

A clustered index determines the physical storage order of data in a table, so a table can have only one clustered index (typically the primary key). A non-clustered index creates a separate structure with pointers back to the actual data rows, and a table can have multiple non-clustered indexes. Clustered indexes are faster for range queries, while non-clustered indexes are better for selective lookups on non-primary columns.

Question 8

How do window functions work?

Answer

Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row like GROUP BY does. They use the OVER clause to define the window (partition and ordering), enabling calculations like running totals, rankings, and moving averages. Common window functions include ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and SUM/AVG with OVER.

Question 9

What is SQL injection and how do you prevent it?

Answer

SQL injection is an attack where malicious SQL code is inserted into application queries through user input, potentially allowing unauthorized data access or modification. Prevent it by using parameterized queries (prepared statements) instead of string concatenation, validating and sanitizing all input, applying the principle of least privilege for database accounts, and using an ORM that handles parameterization automatically.

Question 10

Explain the different transaction isolation levels.

Answer

Read Uncommitted allows dirty reads (seeing uncommitted changes). Read Committed prevents dirty reads but allows non-repeatable reads. Repeatable Read prevents both but allows phantom reads (new rows appearing). Serializable provides full isolation but with the most locking overhead. Higher isolation levels provide more consistency but reduce concurrency and throughput.

Question 11

What is the difference between DELETE, TRUNCATE, and DROP?

Answer

DELETE removes specific rows based on a WHERE condition, is logged row-by-row, and can be rolled back within a transaction. TRUNCATE removes all rows from a table quickly by deallocating data pages, resets identity counters, and is minimally logged. DROP removes the entire table structure, its data, indexes, and permissions from the database permanently.

Question 12

How would you optimize a slow SQL query?

Answer

Start by examining the execution plan (EXPLAIN/EXPLAIN ANALYZE) to identify full table scans, missing indexes, and costly operations. Add appropriate indexes on filtered and joined columns, rewrite subqueries as joins where possible, avoid SELECT *, and ensure statistics are up to date. Consider denormalization, materialized views, or query restructuring for complex analytical queries that cannot be sufficiently optimized with indexes alone.

Question 13

What is a self-join and when is it useful?

Answer

A self-join is when a table is joined to itself using aliases to treat it as two separate tables. It is useful for querying hierarchical data (employees and managers in the same table), comparing rows within the same table (finding duplicates), or establishing relationships between rows of the same entity. The join condition typically relates a foreign key column to the primary key of the same table.

Question 14

What are stored procedures and what are their advantages?

Answer

Stored procedures are precompiled SQL code blocks stored on the database server that can accept parameters, contain logic, and return results. They reduce network traffic by executing multiple statements in a single call, improve security by restricting direct table access, and can be optimized by the database engine since execution plans are cached. However, they can be harder to version control and test compared to application-level code.

Question 15

What is a deadlock in a database and how can you prevent it?

Answer

A deadlock occurs when two or more transactions are waiting for locks held by each other, creating a circular dependency that prevents any of them from proceeding. Databases detect deadlocks by building wait-for graphs and resolve them by rolling back one transaction. Prevent deadlocks by accessing tables in a consistent order, keeping transactions short, using appropriate isolation levels, and avoiding user interaction during open transactions.