Configure CDC
Firstly we need to configure the
database and required table for CDC. I’ve created a test database for this
(called CDCTest).
To enable CDC for a database run:
To avoid 15404 sql server errors
ensure that the SQL Server service is running as a domain user when a domain
account is the CDCTest DB owner.
To check if cdc is enabled for the
database run the query:
The sp_cdc_enable_db procedure will
create system tables in the cdc schema. They are:
- cdc.captured_columns – defines which columns will be
monitoring for changes
- cdc.change_table – defines the change table which will
be populated by the CDC process
- cdc.index_columns – defines the index columns for each
change table
- cdc.ddl_history – records DDL changes for source tables
enabled for CDC
- cdc.lsn_time_mapping – captures the sequence number and
time for each transaction on the source table
We don’t have to worry about these
tables from here on in for this example. They are used by stored procs and the
jobs to manage the CDC process and maintenance.
We enable CDC on this table by
running:
Check that CDC is enabled for the
table with this query:
The sp_cdc_enable_table procedure
will create the following:
- cdc.dbo_tbl1_CT – this is the change table (a system
table), it includes the same columns as the source table and a number of
control columns
- Two table-valued functions –
(cdc.fn_cdc_get_all_changes_dbo_tbl1 and
cdc.fn_cdc_get_net_changes_dbo_tbl1) these are created and are used to
return the required result set from the change table
- Two SQL Server Agent Jobs (CDCTest_capture and
CDCTest_cleanup) – Capture runs to monitor the transaction log and
populates the change table. Cleanup runs to clean up the change table to
avoid them getting too big.
Insert, update and delete records
Now we’ll produce some insert,
updates and delete statements to populate our source table with data. The
comment is the time when the transaction was run:
The source table now looks like:
We inserted id 1 row, inserted id 2
row, updated id 1 row and deleted id 2 row. This leaves the one row as above.
Taking a look at the change table we
see a different story:
Here we can see the change table has
a number of control columns and also shows the history of the changes to the
table. The _$operation column defines what operation has taken place (1:
delete, 2:insert, 3: update (prior to change), 4: update (after change)). The
_$update_mark defines the hexadecimal value. This relates to a binary value.
The binary value shows which column has been updated. For example, 0x07 = 0b111
reading from right to left, means all 3 columns change; 0x04 = 0b100 means the
third column has changed (col2).
From this change table we can see
that id 1 row was inserted first, then id 2 inserted, id 1 row was updated, and
finally id 2 row was deleted.
The __$start_lsn column is the log
sequence number (LSN) associated with the committed transaction. This is used
by the table-valued functions to select the data when queried. These functions
take LSN (or associated time) to determine what to return.
__$end_lsn is not used in SQL Server
2008 and the __$seqval column can be used to sort the row changes within a
transaction.
Querying the Change table using the
table-valued functions.
The table-valued functions are used
to extract the changes from the change table based on a start and end LSN. This
defines the start and end positions that you want changes returned for. There
are a few ways to get your start and end LSN, there are functions available to
help you do this. Examples are:
- Get min LSN for a capture instance (associated with a
source table) - sys.fn_cdc_get_min_lsn ('dbo_tbl1')
- Get max LSN available - sys.fn_cdc_get_max_lsn ()
- Convert a time to the nearest LSN (‘smallest LSN,
greater than a time’ or ‘greatest LSN, less than or equal to a time’, etc)
- sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', '2009-09-03
16:22:20.000')
- Get next incremental LSN, used when you may save
previous upper bound limit and now want to use the next LSN for the lower
bound limit - sys.fn_cdc_increment_lsn (<previous upper bound
limit>)
- etc
Here’s an example of how to use the
table-valued functions:
Results are:
With the query and results above we
are setting the start and end LSNs to cover the full range of our insert,
update and delete statements. Remember we had for row with id 1 = insert,
update, and for row with id 2 = insert and delete. For the all_changes
table-valued functions we see it shows each operation we have performed during
this period. An insert (operation = 2), a second insert (2), an update (4) ,
and a delete (1). The net_changes functions shows just the net change an insert
(op = 2) with col2 column showing ‘update’.
The net_changes function just gives
the net changes with respect to the primary key. Note to use the net changes
function requires either a primary key or unique index to be set up on the
source table first (as we did above).
The next example uses time to set
the start LSN we chose the time 16:22:50. Only a delete occurred after this
point:
Here the all and net changes
functions returns the same value, a delete operation (1) on row with id 2.
Clean
up change table and set lowest watermark
Over time the change table will grow
bigger and bigger and may eventually affect CDC performance. To stop this there
is a clean up job to remove redundant changes. The clean up job
(CDCTest_Cleanup) runs on a configured schedule (daily basis by default) or
manually by stored proc. Based on the retention value configured for the job
(see table msdb.dbo.cdc_jobs) a new lowest watermark LSN for capture instance
is defined. This will be used to remove records from the associated change
table which are below this watermark.
Change Schema of Table
alter schema HR transfer
dbo.payroll