Database questions & answers

A beautiful scene

Here is a list of essential Database questions and answers to refresh your knowledge in this field.

What are Database and DBMS?

A Database is a collection of organized data to serve some purpose.

A DBMS (Database Management System) comprises a set of tools to create, update, restore, query databases efficiently. Examples of DBMSs are Postgresql, MySQL, SQLite, MongoDB.

What is a Relational DBMS?

RDBMSs are systems that store databases as tables and there are relationships between the tables. Each table can be thought of as an Entity. For example, we can have table Person, table Football Match, table Country.

SQL (Structured Query Language) is a programming language specifically designed for managing relational DBMS.

What are the types of relationships in RDBMS?

There are 3 types of relationships:

  • One-to-one: a row in one table is related to none or only one row in another table. For example, the relationship between table Person and table PassportId is one-to-one.
  • One-to-many: a row in the first table can be related to 0, 1, or any number of rows in the second table, but a row in the second table can only be related to at most 1 row in the first table. Example: a person can possess many computers, but a computer can belong to only 1 person.
  • Many-to-many: a row of a table may be related to 0, 1, or any number of of rows in another table. Example: an user can use upto many software and a software can serve upto many users.

Define Data Integrity

Data Integrity is the assurance of accuracy and consistency during the whole life of the databases.

What is Data redundancy?

In Databases, data redundancy means some data is stored in more than 1 place, which consumes more storage than needed and makes it harder to maintain data integrity.

Describe Database Normalization and De-normalization

Database Normalization is the process of splitting tables into sub-tables in a well-defined manner to reduce data redundancy and increase data integrity.

On the reverse, Database De-normalization introduces data redundancy into the database to improve querying speed.

Name and briefly describe Normalization levels

  • First Normal Form (1NF) has cell, row, and column conditions: every cell in each table contains an atomic value (but not a list of values, for example). There is no duplication of rows in any table. Each column has a specific domain (e.g. a column cannot contains integers in some cells and strings in some other cells).
  • Second Normal Form (2NF): 1NF is satisfied and there is no partial dependency in the database. Partial dependency means there is some non-prime attributes that only depend on a proper subset of a candidate key (but not the whole candidate key). Note that a non-prime attribute is an attribute that does not appear in any candidate key.
  • Third Normal Form (3NF): 2NF is satisfied and there is no transitive-dependancy between non-prime attributes. In other words, any non-prime attribute only depends on the candidate keys but not on any other non-prime attribute.

What are the different types of SQL statements?

There are 3 main types of SQL statements, each with a specific function: Define, Manipulate, and Control.

  • Data Definition commands usually affect tables and columns. Examples are: CREATE, ALTER, DROP, RENAME. As a special case, the TRUNCATE command affects rows, it deletes all rows from a table but keeping columns, indexes, constraints and any other things.
  • Data Manipulation commands usually affect rows. Examples are: INSERT, DELETE, UPDATE, SELECT. These commands need to be committed before taking effects.
  • Data Control commands. Example are: COMMIT, ROLLBACK.

What is a View?

A View is a virtual table not present in the database but conceptually formed using data from one or more tables.

What is Functional Dependency?

We say an attribute Y is functionally dependent on attribute X if by knowing X we can infer Y. This is denoted by: X -> Y. For example, if a table has a single primary key then any other attributes are functionally dependent on this key.

Define the Cursor

The Cursor is a temporary work area that stores the data and the results achieved through Data Manipulation commands like SELECT.

What is a Transaction?

A database transaction is a sequence of commands that operate on the database. The transactions are to be executed atomically, i.e. either the whole transaction is completed or nothing is completed.

Compare Databases and Data warehouses

DatabaseData Warehouse
The main purpose is to manipulate (e.g. insert, update, delete) operational data.The main purpose is to analyze data.
Tables are normalized to maintain data integrity.Tables are de-normalized to optimize querying speed.
Use OLTP (Online Transactional Processing).Use OLAP (Online Analytical Processing).
Store detailed data.Store summarized data.
The differences between Databases and Data Warehouses.

Compare Clustered and Non-clustered index

With a Clustered Index, rows in the table are physically stored on disks according to the index. With a Non-clustered Index, there is a second list that contains pointers to point to the rows according to the index, while the physical locations of the rows are unchanged.

For a table, there is only 1 clustered index, while there can be many non-clustered indices.

What are JOIN types?

  • Inner joins: only matched rows between 2 tables are kept in the result.
  • Natural joins: are inner joins on common columns of the 2 tables.
  • Left [outer] joins: all rows in the left table are kept in the result. Similar idea goes to Right [outer] joins.
  • Outer joins: all rows of both tables are kept in the result.
  • Cross joins: a cartesian product of 2 tables.

Define Primary Key and Composite Key

The Primary Key of a table is a column that uniquely identifies all rows. A table can have only 1 primary key. The Primary Key is UNIQUE and NOT NULL. Note that, unlike a Primary Key, a regular UNIQUE attribute can accept one value to be NULL.

A Composite Key is a set of attributes that act the same as a Primary Key (i.e. uniquely define all rows).

Define Triggers

A Trigger is a set of commands to be executed when a specified event happens, for example, before INSERT, after INSERT, on UPDATE.

Explain B-Trees

The B-Tree and its variants are the most popular data structure for indexing in Databases. The most important difference between a B-tree and a Binary tree is that while a Binary tree node has at most 2 children, a B-tree node has more children, this is to adapt to the fact that accessing time is expensive.

B-trees are self-balancing and support all searches, sequential accesses, insertions, and deletions in logarithmic time.

Explain DELETE, TRUNCATE, and DROP

DELETE is a Data Manipulation command. We can use DELETE to get rid of some or all rows in the table. DELETE can be rolled back (by ROLLBACK).

TRUNCATE is a Data Definition command that gets rid of all rows in the table by removing all pages that the rows are stored in. TRUNCATE can or cannot be rolled back, depending on the DBMS.

DROP is to remove the whole table (not just rows but also columns, indices, etc.) or key (e.g. the primary key, foreign keys).

What is a Constraint?

A constraint is a rule that is enforced on the tables of the database. This rule needs to be followed during the entire life of the tables, as long as the constraint is still present.

For example, we can enforce a column to not have duplicated values by using the UNIQUE constraint.

Common constraints are:

  • PRIMARY KEY, FOREIGN KEY,
  • UNIQUE, NOT NULL,
  • DEFAULT, CHECK, INDEX.

Explain Self-joins

A Self-join joins one table with itself. Unlike normal joins, self-joins require aliasing names for the 2 appearances of the joining table.

What is Collation?

Collation refers to a set of rules on how to order data. For example, Cafe and Café, which one is lexicographically smaller is dependent on Collation.

Describe SELECT INTO

The SELECT [column names] INTO [a new table name] ... creates a new table based on the result from the SELECT statement. Unlike a regular SELECT, this command doesn’t return anything but rather uses the extracted data to make a new (either temporary or permanent) table.

What are the wildcard characters in SQL?

When pattern matching with SQL, the wildcard characters are used with the LIKE operator to allow matchings of a wider range of strings.

  • % represents anything. For example, Ab% matches all strings that start with Ab.
  • _ represents any single character.
  • [] is to match any one of the characters in the brackets. For example, B[aiu]g matches Bag, Big, and Bug.
  • [^ ] is to match any one of the characters NOT in the brackets. For example, C[bc]t matches Cat, Cdt but NOT Cbt and Cct.
  • - is to match any characters in a range. For example D[a-c]t matches only Dat, Dbt, and Dct.

What are ACID properties?

  • Atomicity: Either the whole transaction is executed or nothing is executed.
  • Consistency: all modification of the database must not violate the defined rules.
  • Isolation: multiple transactions that are executed in parallel must result in the same as they are executed sequentially.
  • Durability: committed transactions are stored permanently in the database.

Briefly explain NoSQL

NoSQL means Not Only SQL. It refers to the databases that, instead of storing data in tables, various other mechanisms can be used. For example, data can be stored as documents, graphs, or key-value pairs.

NoSQL is more suitable for unstructured data.

What is vertical and horizontal scaling in DBs?

Vertical scaling means we add more storage, RAM, and upgrade CPUs to our computer, while Horizontal scaling means adding more machines to the cluster of resources.

While SQL can only be vertically scaled, NoSQL databases even have the ability to be scaled horizontally. This is a big advantage as horizontal scaling is more cost-effective and has virtually no limit, we can enhance our system by adding dozens or even hundreds of servers.

What are the advantages of SQL and NoSQL?

NoSQL has the following advantages:

  • It can store unstructured and semi-structured data. Furthermore, each data instance can has its own structure.
  • Database can be horizontally scaled. Thus, big data is feasible and we can make use of cheaper machines.
  • This is a distributed system, which can take advantages of distributed (or parallel) computing, resulting in very high throughput.

On the other hand, there are also areas where SQL is superior:

  • A strict structure infers easier data understanding and manipulation.
  • Highly complex queries can be executed with JOIN and GROUP BY.
  • ACID can be maintained. (Most NoSQL trade-off Consistency for performance and Availability.)

References:

Leave a Reply