Concept of Normalization
- Purpose of Normalisation
- Redundancy and Data Anomalies
- Functional Dependency
- Transitive Dependency
- Stages of Normalisation
Purpose of Normalisation
1. To avoid redundancy by storing each ‘fact’ within the database only once.
2. To put data into a form that conforms to relational principles (e.g., single valued attributes, each relation represents one entity) - no repeating groups.
3.To put the data into a form that is more able to accurately accommodate change.
4.To avoid certain updating ‘anomalies’.
5.To facilitate the enforcement of data constraints.
Redundancy and Data Anomalies
Redundant data is where we have stored the same ‘information’ more than once. i.e., the redundant data could be removed without the loss of information.
Example: We have the following relation that contains staff and department details:
staffNo job dept dname city
SL10 Salesman 10 Sales Stratford
SA51 Manager 20 Accounts Barking
DS40 Clerk 20 Accounts Barking
OS45 Clerk 30 Operations Barking
the redundency in table could lead to following anamolies:
Insert Anomaly: We can’t insert a dept without inserting a member of staff that works in that department
Update Anomaly: We could change the name of the dept that SA51 works in without simultaneously changing the dept that DS40 works in.
Deletion Anomaly: By removing employee SL10 we have removed all information pertaining to the Sales dept.
Functional Dependency
Formal Definition: Attribute B is functionally dependant upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.
Formal Notation: A ® B This should be read as ‘A determines B’ or ‘B is functionally dependant on A’. A is called the determinant and B is called the object of the determinant.
Example:
staffNo job dept dname Functional Dependencies
SL10 Salesman 10 Sales staffNo ® job
SA51 Manager 20 Accounts staffNo ® dept
DS40 Clerk 20 Accounts staffNo ® dname
OS45 Clerk 30 Operations dept ® dname
Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.
Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.
Example:
order# line# qty price Full Functional Dependencies
A001 001 10 200 (Order#, line#) ® qty
A002 001 20 400 (Order#, line#) ® price
A002 002 20 800
A004 001 15 300
Example:
student# unit# room grade Full Functional Dependencies
9900100 A01 TH224 2 (student#, unit#) ® grade
9900010 A01 TH224 14
9901011 A02 JS075 3 Partial Functional Dependencies
9900001 A01 TH224 16 unit# ® room
Repetition of data!
Transitive Dependency:A transitive dependency exists when there is an intermediate functional dependency.
Formal Notation: If A ® B and B ® C, then it can be stated that the following transitive dependency exists: A ® B ® C
Example:
staffNo job dept dname Transitive Dependencies
SL10 Salesman 10 Sales staffNo ® dept
SA51 Manager 20 Accounts dept ® dname
DS40 Clerk 20 Accounts
OS45 Clerk 30 Operations staffNo ® dept ® dname
here also Repetition of data!
Normalization Achieves This!
remove repeating groups and avoid redundancy and data anomalies by remoting partial and transitive functional dependencies.
Definition:
A relation is unnormalised
when it has not had any normalisation
rules applied to it, and it suffers from various anomalies.
First
Normal Form (1NF)
Definition:
A relation is in 1NF if, and only if, all its underlying attributes contain
atomic values only.
Second
Normal Form (2NF)
Definition:
A relation is in 2NF if, and only if, it is in 1NF and every non-key attribute
is fully dependent on the primary key.
Remove partial functional dependencies
into a new relation
consider the relation
student( Rollno, Name, CourseNo, CourseName, Deptid, DeptName)
here {Rollno,,courseNo,Deptid} is candidate key
if there exist the partial functional dependency like Rollno-->Name or Deptid-->DeptName it is not in 2NF
Decomposition
student(Rollno,CourseNo,CourseName,Deptid)
X1(Rollno,Name)
X2(Deptid,DeptName)
Third normal form(3NF)
A relation R is in 3rd normal form if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
Remove transitive functional dependencies
into a new relation
consider the relation
Shipdetails(Ship,Capacity,Date,Cargo,Value) with following functional dependency
Ship,Date-->Cargo,Capacity
Cargo-->Value relation is not in 3NF
Decomposition
Shipdetails(Ship,Capacity,Date,Cargo)
X1(Cargo,Value)
A relation in 3NF does not have any anamolies but it still have redundency.
Boyce-Codd Normal Form
(BCNF)
A
more restricted version of 3NF (known as Boyce-Codd
Normal Form)
requires that the determinant of every functional dependency in a relation be a
superkey
-
for every FD: X => Y, X is a superkey
Consider the following relation:
STU-MAJ-ADV (Student-Id, Major, Advisor)
Advisor --> Major , but Advisor is not a superkey
Decomposition
STU-ADV (Student-Id,
Advisor)
ADV-MAJ (Advisor,
Major)
4th Normal Form
A Boyce Codd normal form relation is in fourth normal form if
(a)there
is no multi value dependency in the relation or
(b)there
are multi value dependency but the attributes, which are multi valued dependent
on a specific attribute, are independent each other.
consider the following relation
Employee (Eid, Languages, Skills)
Eid
--->> Languages Eid
--->> Skills
Languages and Skills are independent.
Languages and Skills are independent.
Decomposition
Emplyee_Language(Eid,
Languages)
Emplyee_Language(Eid,
Skills)
What is a Transaction?
n A logical unit of work on a database
n An entire program
n A portion of a program
n A single command
n The entire series of steps necessary to accomplish a
logical unit of work
n Successful transactions change the database from one
CONSISTENT STATE to another
(One where all
data integrity constraints are satisfied)
Example
of a Transaction
n Updating a Record
n Locate the Record on Disk
n Bring record into Buffer
n Update Data in the Buffer
n Writing Data Back to Disk
ACID Properties of a Transaction
n Atomic – All or Nothing
All parts of
the transaction must be completed and committed or it must be aborted and
rolled back
n Consistency
The consistency property ensures that any transaction will bring
the database from one valid state to another. Any data written to the database
must be valid according to all defined rules, including but not limited to
constraints, cascades, triggers, and any combination thereof.
n Isolation
The final effects of multiple simultaneous
transactions must be the same as if they were executed one right after the
other .
n Durability
If a transaction has been committed, the DBMS
must ensure that its effects are permanently recorded in the database (even if
the system crashes)
Transaction
Management with SQL
n SQL Statements à Commit / Rollback
n When a transaction sequence is initiated it must
continue through all succeeding SQL statements until:
!A Commit
Statement is Reached
!A Rollback
Statement is Reached
!The End of the
Program is Reached (Commit)
!The
Program is Abnormally Terminated
(Rollback)
Transaction
Log
n Keeps track of all transactions that update the
database
n Record for the beginning of the transaction
n Type of operation (insert / update / delete)
n Names of objects/tables affected by the transaction
n Before and After Values for Updated Fields
n Pointers to Previous and Next Transaction Log Entries
for the same transaction
n The Ending of the Transaction (Commit)
n Used for recovery in case of a Rollback
Concurrency
Control
n Coordination
of simultaneous transaction execution in a multiprocessing database system
n Ensure
transaction serializability in a multi-user database
n Lack of
Concurrency Control can create data integrity and consistency problems:
n Lost
Updates
n Uncommitted
Data
n inconsistent
Retrievals
Serial
Execution of Transactions
n Serial Execution of transaction means that the
transactions are performed one after another.
n No interaction between transactions - No Concurrency Control Problems
n Serial Execution will never leave the database in an
inconsistent state à Every Serial Execution is considered correct (Even if
a different order would cause different results)
Serializability
n If 2 Transactions are only reading data items – They
do not conflict à Order is unimportant
n If 2 Transactions operate (Read/Write) on Separate
Data Items
– They do not
conflict à Order is unimportant
n If 1 Transaction Writes to a Data Item and Another
Reads or Writes to the Same Data Item à The Order of Execution IS Important
The
Scheduler
n Special DBMS Program to establish the order of
operations in which concurrent transactions are executes
n Interleaves the execution of database operations to
ensure:
Serializability
Isolation of
Transactions
n Bases its actions on Concurrency Control Algorithms
(Locking / Time Stamping)
n Ensures the CPU is used efficiently (Scheduling
Methods)
n Facilitates Data Isolation à Ensure that 2 transactions do not update the same
data at the same time
Concurrency
Control Algorithms
n Locking
A Transaction “locks” a database object to prevent
another object from modifying the object
n Time-Stamping
Assign a global unique time stamp to each
transaction
n Optimistic
Assumption that most database operations do
not conflict
Locking
n Lock guarantees exclusive use of data item to current
transaction
n Prevents reading Inconsistent Data
n Lock Manager is responsible for assigning and policing
the locks used by the transaction
Locking
Granularity
Indicates
the level of lock use
n Database
Level – Entire Database is Locked
n Table
Level – Entire Table is Locked
n Page
Level – Locks an Entire Diskpage
(Most Frequently Used)
n Row
Level – Locks Single Row of Table
n Field
Level – Locks a Single Attribute of a Single Row (Rarely Done)
Types
of Locks:
n Binary Locks – Lock with 2 States
n Locked – No other transaction can use that object
n Unlocked – Any transaction can lock and use object
All
Transactions require a Lock and Unlock Operation for Each
Object
Accessed (Handled by DBMS)
n Eliminates Lost Updates
n Too Restrictive to Yield Optimal Concurrency
Conditions
n Shared
Lock – Concurrent Transactions are granted READ access on the basis of a common
lock
n Exclusive Lock – Access is reserved for the transaction that locked the object
n Exclusive Lock – Access is reserved for the transaction that locked the object
n 3
States: Unlocked, Shared (Read),
Exclusive (Write)
n More
Efficient Data Access Solution
n More
Overhead for Lock Manager
n Type of
lock needed must be known
n 3
Operations:
nRead_Lock –
Check to see the type of lock
nWrite_Lock –
Issue a Lock
nUnlock
– Release a Lock
n Allow
Upgrading / Downgrading of Locks
Problems
with Locking
n Transaction Schedule May Not be Serializable
n Can be solved with 2-Phase Locking
n May Cause Deadlocks
n A deadlock is caused when 2 transactions wait for each
other to unlock data
Two
Phase Locking
n Defines how transactions Acquire and Relinquish Locks
!Growing
Phase – The transaction acquires all locks (doesn’t unlock any data)
!Shrinking
Phase – The transaction releases locks (doesn’t lock any additional data)
n Transactions acquire all locks it needs until it
reaches locked point
n When locked, data is modified and locks are released
Deadlocks
n Occur
when 2 transactions exist in the following mode:
T1 = access data item X and Y
T2 = Access data items Y and X
If T1
does not unlock Y, T2 cannot begin
If T2
does not unlock X, T1 cannot continue
T1
& T2 wait indefinitely for each other to unlock data
n Deadlocks
are only possible if a transactions wants an Exclusive Lock
(No Deadlocks on Shared Locks)
(No Deadlocks on Shared Locks)
Controlling
Deadlocks
n Prevention
– A transaction requesting a new lock is aborted if there is the
possibility of a deadlock – Transaction is rolled back, Locks are released,
Transaction is rescheduled
possibility of a deadlock – Transaction is rolled back, Locks are released,
Transaction is rescheduled
n Detection
– Periodically test the database for deadlocks.
If a deadlock is found,
abort / rollback one of the transactions
abort / rollback one of the transactions
n Avoidance
– Requires a transaction to obtain all locks needed before it can
execute – requires locks to be obtained in succession
execute – requires locks to be obtained in succession
Time Stamping
n Creates
a specific order in which the transactions are processed by the DBMS
n 2 Main
Properties
!Uniqueness
– Assumes that no equal time stamp value can exist (ensures serializability of the
transactions)
!Monotonicity
– Ensures that time stamp values always increases
n All
operations within the same transaction have the same time stamp
n f
Transactions conflict, one is rolled back and rescheduled
n Each
value in Database requires 2 Additional Fields:
Last Time Read / Last Time Updated
n Increases
Memory Need and Processing Overhead
Time
Stamping Schemes
n Wait
/ Die Scheme
The older transaction will wait
The older transaction will wait
The younger transaction will be rolled back
n Wound /
Wait Scheme
The older transaction will preempt (wound)
the younger transaction and roll it back
The younger transaction waits for the older
transaction to release the locks
n Without
time-out values, Deadlocks may be created
Database Recovery
n Restore
a database from a given state to a previous consistent state
n Atomic
Transaction Property (All or None)
n Backup
Levels:
n Full
Backup
n Differential
Backup
n Transaction
Log Backup
n Database
/ System Failures:
n Software
(O.S., DBMS, Application Programs, Viruses)
n Hardware
(Memory Chips, Disk Crashes, Bad Sectors)
n Programming
Exemption (Application Program rollbacks)
n Transaction
(Aborting transactions due to deadlock detection)
n External
(Fire, Flood, etc)
Transaction Recovery
n Recover
Database by using data in the Transaction Log
n Write-Ahead-Log
– Transaction logs need to be written before any database data is updated
n Redundant
Transaction Logs – Several copies of log on different devices
n Database
Buffers – Buffers are used to increase processing time on updates instead of
accessing data on disk
n Database
Checkpoints – Process of writing all updated buffers to disk à While this is taking place, all other requests are
not executes
n Scheduled
several times per hour
n Checkpoints
are registered in the transaction log