Friday, 18 May 2012

DATABASE MANAGEMENT SYSTEM



Concept of Normalization

  •     Purpose of Normalisation
  •     Redundancy and Data Anomalies
  •     Functional Dependency
  •     Transitive Dependency
  •     Stages of Normalisation

Purpose of Normalisation

          1. To avoid redundancy by storing each ‘fact’ within the database only once.

         2. To put data into a form that conforms to relational principles (e.g., single valued attributes, each           relation represents one entity) - no repeating groups.

        3.To put the data into a form that is more able to accurately accommodate change.

        4.To avoid certain updating ‘anomalies’.

        5.To facilitate the enforcement of data constraints.


Redundancy and Data Anomalies

Redundant data is where we have stored the same ‘information’ more than once. i.e., the redundant data could be removed without the loss of information.
Example: We have the following relation that contains staff and department details:

staffNo     job           dept   dname        city
SL10        Salesman   10      Sales           Stratford  
SA51       Manager    20     Accounts      Barking
DS40       Clerk         20     Accounts      Barking
OS45       Clerk         30    Operations    Barking

the redundency in table could lead to following anamolies:

Insert Anomaly: We can’t insert a dept without inserting a member of staff that works in that department

Update Anomaly: We could change the name of the dept that SA51 works in without simultaneously changing the dept that DS40 works in.

Deletion Anomaly: By removing employee SL10 we have removed all information pertaining to the Sales dept.

Functional Dependency

Formal Definition:  Attribute B is functionally dependant upon attribute A (or a collection of attributes) if           a value of A determines a single value of attribute B at any one time.

Formal Notation:  ® B  This should be read as A determines B or ‘B is functionally dependant on A’A is called the determinant and B is called the object of the determinant.
Example:                                     
staffNo  job           dept   dname                                Functional Dependencies           
SL10     Salesman  10      Sales                                       staffNo ® job
SA51    Manager    20     Accounts                                 staffNo ® dept
DS40    Clerk         20      Accounts                                staffNo ® dname
OS45    Clerk         30      Operations                             dept ® dname
  

Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.

                                                                                
Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.                                                                                 
Example:                                                                              
order#   line#    qty     price                          Full Functional Dependencies
A001     001        10      200                           (Order#, line#) ® qty
A002     001        20      400                          (Order#, line#) ® price
A002     002        20     800
A004     001        15     300 
                                                                      
Partial Functional Dependency: This is the situation that exists if it is necessary to only use a subset of the attributes of the composite determinant to identify its object uniquely                                                                 

Example:

student#   unit#   room    grade                    Full Functional Dependencies
9900100     A01   TH224      2                     (student#, unit#) ® grade
9900010     A01   TH224    14                    
9901011     A02   JS075       3                       Partial Functional Dependencies    
9900001     A01   TH224    16                      unit# ® room
  Repetition of data!


          

Transitive Dependency:A transitive dependency exists when there is an intermediate functional dependency.

Formal Notation:  If ® B and B ® C, then it can be stated that the following transitive dependency exists: ® B ® C
Example:

staffNo    job            dept    dname                        Transitive Dependencies
     
SL10    Salesman   10      Sales                              staffNo ® dept
SA51    Manager     20     Accounts                         dept ® dname
DS40     Clerk          20      Accounts
OS45     Clerk          30      Operations                    staffNo ® dept ® dname



here also Repetition of data!

Normalization Achieves This!

remove repeating groups and  avoid redundancy and data anomalies by remoting partial and transitive functional dependencies.

Unnormalised Normal Form (UNF)

Definition: A relation is unnormalised when it has not had any normalisation rules applied to it, and it suffers from various anomalies.

First Normal Form (1NF)
Definition: A relation is in 1NF if, and only if, all its underlying attributes contain atomic values only.


Second Normal Form (2NF)
Definition: A relation is in 2NF if, and only if, it is in 1NF and every non-key attribute is fully dependent on the primary key.
  Remove partial functional dependencies into a new relation
consider the relation 
student( Rollno, Name, CourseNo, CourseName, Deptid, DeptName)
                     here {Rollno,,courseNo,Deptid} is candidate key
if there exist the partial functional dependency like Rollno-->Name  or  Deptid-->DeptName it is not in 2NF
Decomposition
                           student(Rollno,CourseNo,CourseName,Deptid)
                           X1(Rollno,Name)
                           X2(Deptid,DeptName)

Third normal form(3NF)

A relation R is in 3rd normal form if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
      Remove transitive functional dependencies into a new relation
consider the relation
                             Shipdetails(Ship,Capacity,Date,Cargo,Value)    with following functional dependency
 Ship,Date-->Cargo,Capacity
 Cargo-->Value  relation is not in 3NF 
Decomposition
                             Shipdetails(Ship,Capacity,Date,Cargo)
                             X1(Cargo,Value)

A relation in 3NF does not have any anamolies but it still have redundency.

Boyce-Codd Normal Form (BCNF) 
         A more restricted version of 3NF (known as Boyce-Codd Normal Form) requires that the determinant of every functional dependency in a relation be a superkey - for every FD: X => Y, X is a superkey
Consider the following relation:
STU-MAJ-ADV (Student-Id, Major, Advisor)

Advisor --> Major                  , but Advisor is not a superkey
Decomposition
STU-ADV (Student-Id, Advisor)
ADV-MAJ (Advisor, Major)

4th Normal Form

A Boyce Codd normal form relation is in fourth normal form if

(a)there is no multi value dependency in the relation or

(b)there are multi value dependency but the attributes, which are multi valued dependent on a specific attribute, are independent each other.
consider the following relation 
Employee (Eid, Languages, Skills) 
Eid --->> Languages  Eid --->> Skills
Languages and Skills are independent.
Decomposition
Emplyee_Language(Eid, Languages)
Emplyee_Language(Eid, Skills)






What is a Transaction?


A logical unit of work on a database
An entire program
A portion of a program
A single command
The entire series of steps necessary to accomplish a logical unit of work
Successful transactions change the database from one CONSISTENT STATE to another
  (One where all data integrity constraints are satisfied)
Example of a Transaction
Updating a Record
Locate the Record on Disk
Bring record into Buffer
Update Data in the Buffer
Writing Data Back to Disk

ACID Properties of a Transaction

n Atomic – All or Nothing
  All parts of the transaction must be completed and committed or it must be aborted and rolled back
n Consistency
  The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof.
Isolation
  The final effects of multiple simultaneous transactions must be the same as if they were executed one right after the other .
Durability
  If a transaction has been committed, the DBMS must ensure that its effects are permanently recorded in the database (even if the system crashes)
Transaction Management with SQL
SQL Statements à Commit / Rollback
When a transaction sequence is initiated it must continue through all succeeding SQL statements until:
!A Commit Statement is Reached
!A Rollback Statement is Reached
!The End of the Program is Reached (Commit)
!The Program  is Abnormally Terminated (Rollback)
Transaction Log
Keeps track of all transactions that update the database
Record for the beginning of the transaction
Type of operation (insert / update / delete)
Names of objects/tables affected by the transaction
Before and After Values for Updated Fields
Pointers to Previous and Next Transaction Log Entries for the same transaction
The Ending of the Transaction (Commit)
Used for recovery in case of a Rollback
Concurrency Control
Coordination of simultaneous transaction execution in a multiprocessing database system
Ensure transaction serializability in a multi-user database
Lack of Concurrency Control can create data integrity and consistency problems:
Lost Updates
Uncommitted Data
inconsistent Retrievals

Serial Execution of Transactions
Serial Execution of transaction means that the transactions are performed one after another.
No interaction between transactions  - No Concurrency Control Problems
Serial Execution will never leave the database in an inconsistent state à Every Serial Execution is considered correct (Even if a different order would cause different results)

Serializability

If 2 Transactions are only reading data items – They do not conflict à Order is unimportant
If 2 Transactions operate (Read/Write) on Separate Data Items
  – They do not conflict à Order is unimportant
If 1 Transaction Writes to a Data Item and Another Reads or Writes to the Same Data Item à The Order of Execution IS Important

The Scheduler
Special DBMS Program to establish the order of operations in which concurrent transactions are executes
Interleaves the execution of database operations to ensure:
             Serializability
             Isolation of Transactions
Bases its actions on Concurrency Control Algorithms (Locking / Time Stamping)
Ensures the CPU is used efficiently (Scheduling Methods)
Facilitates Data Isolation à Ensure that 2 transactions do not update the same data at the same time
Concurrency Control Algorithms
Locking
     A Transaction “locks” a database object to prevent another object from modifying the object
Time-Stamping
                Assign a global unique time stamp to each transaction
Optimistic
                Assumption that most database operations do not conflict

Locking

Lock guarantees exclusive use of data item to current transaction
Prevents reading Inconsistent Data
Lock Manager is responsible for assigning and policing the locks used by the transaction
Locking Granularity
Indicates the level of lock use
Database Level – Entire Database is Locked
Table Level – Entire Table is Locked
Page Level – Locks an Entire Diskpage
       (Most Frequently Used)
n Row Level – Locks Single Row of Table
Field Level – Locks a Single Attribute of a Single Row (Rarely Done)
Types of Locks:  

Binary Locks – Lock with 2 States
Locked – No other transaction can use that object
Unlocked – Any transaction can lock and use object
         All Transactions require a Lock and Unlock Operation for Each
          Object Accessed (Handled by DBMS)
Eliminates Lost Updates
Too Restrictive to Yield Optimal Concurrency Conditions

Shared / Exclusive Locks:Indicates the Nature of the Lock

n Shared Lock – Concurrent Transactions are granted READ access on the basis of a     common lock


n    Exclusive Lock – Access is reserved for the transaction that locked the object
3 States:  Unlocked, Shared (Read), Exclusive (Write)
More Efficient Data Access Solution
More Overhead for Lock Manager

Type of lock needed must be known
3 Operations: 
nRead_Lock – Check to see the type of lock
nWrite_Lock – Issue a Lock
nUnlock – Release a Lock
Allow Upgrading / Downgrading of Locks

Problems with Locking

Transaction Schedule May Not be Serializable
Can be solved with 2-Phase Locking
May Cause Deadlocks
A deadlock is caused when 2 transactions wait for each other to unlock data
Two Phase Locking
Defines how transactions Acquire and Relinquish Locks
!Growing Phase – The transaction acquires all locks (doesn’t unlock any data)
!Shrinking Phase – The transaction releases locks (doesn’t lock any additional data)
Transactions acquire all locks it needs until it reaches locked point
When locked, data is modified and locks are released
Deadlocks
Occur when 2 transactions exist in the following mode:
      T1 = access data item X and Y
      T2 = Access data items Y and X
          If T1 does not unlock Y, T2 cannot begin
          If T2 does not unlock X, T1 cannot continue
     T1 & T2 wait indefinitely for each other to unlock data
Deadlocks are only possible if a transactions wants an Exclusive Lock 


(No Deadlocks on Shared Locks)
Controlling Deadlocks
Prevention – A transaction requesting a new lock is aborted if there is the 


            possibility of a deadlock – Transaction is rolled back, Locks are released, 


            Transaction is rescheduled
Detection – Periodically test the database for deadlocks.  If a deadlock is found, 


             abort / rollback one of the transactions
Avoidance – Requires a transaction to obtain all locks needed before it can 


             execute – requires locks to be obtained in succession

Time Stamping

Creates a specific order in which the transactions are processed by the DBMS
2 Main Properties

!Uniqueness – Assumes that no equal time stamp value can exist (ensures serializability of the transactions)
!Monotonicity – Ensures that time stamp values always increases
All operations within the same transaction have the same time stamp
f Transactions conflict, one is rolled back and rescheduled
Each value in Database requires 2 Additional Fields:  Last Time Read / Last Time Updated
Increases Memory Need and Processing Overhead



Time Stamping Schemes

Wait /  Die Scheme
  The older transaction will wait 
  The younger transaction will be rolled back
Wound / Wait Scheme
  The older transaction will preempt (wound)
  the younger transaction and roll it back
  The younger transaction waits for the older
  transaction to release the locks
Without time-out values, Deadlocks may be created

Database Recovery

Restore a database from a given state to a previous consistent state
Atomic Transaction Property (All or None)
Backup Levels:
Full Backup
Differential Backup
Transaction Log Backup
Database / System Failures:
Software (O.S., DBMS, Application Programs, Viruses)
Hardware (Memory Chips, Disk Crashes, Bad Sectors)
Programming Exemption (Application Program rollbacks)
Transaction (Aborting transactions due to deadlock detection)
External (Fire, Flood, etc)

Transaction Recovery

Recover Database by using data in the Transaction Log
Write-Ahead-Log – Transaction logs need to be written before any database data is updated
Redundant Transaction Logs – Several copies of log on different devices
Database Buffers – Buffers are used to increase processing time on updates instead of accessing data on disk
Database Checkpoints – Process of writing all updated buffers to disk à While this is taking place, all other requests are not executes
Scheduled several times per hour
Checkpoints are registered in the transaction log


1 comment: