Monday, May 21, 2012

Configuring CDC (Copied)


Configure CDC

Firstly we need to configure the database and required table for CDC. I’ve created a test database for this (called CDCTest).
To enable CDC for a database run:
To avoid 15404 sql server errors ensure that the SQL Server service is running as a domain user when a domain account is the CDCTest DB owner.
To check if cdc is enabled for the database run the query:
The sp_cdc_enable_db procedure will create system tables in the cdc schema. They are:
  • cdc.captured_columns – defines which columns will be monitoring for changes
  • cdc.change_table – defines the change table which will be populated by the CDC process
  • cdc.index_columns – defines the index columns for each change table
  • cdc.ddl_history – records DDL changes for source tables enabled for CDC
  • cdc.lsn_time_mapping – captures the sequence number and time for each transaction on the source table
We don’t have to worry about these tables from here on in for this example. They are used by stored procs and the jobs to manage the CDC process and maintenance.
We enable CDC on this table by running:
Check that CDC is enabled for the table with this query:
The sp_cdc_enable_table procedure will create the following:
  • cdc.dbo_tbl1_CT – this is the change table (a system table), it includes the same columns as the source table and a number of control columns
  • Two table-valued functions – (cdc.fn_cdc_get_all_changes_dbo_tbl1 and cdc.fn_cdc_get_net_changes_dbo_tbl1) these are created and are used to return the required result set from the change table
  • Two SQL Server Agent Jobs (CDCTest_capture and CDCTest_cleanup) – Capture runs to monitor the transaction log and populates the change table. Cleanup runs to clean up the change table to avoid them getting too big.
Insert, update and delete records
Now we’ll produce some insert, updates and delete statements to populate our source table with data. The comment is the time when the transaction was run:
The source table now looks like:
We inserted id 1 row, inserted id 2 row, updated id 1 row and deleted id 2 row. This leaves the one row as above.
Taking a look at the change table we see a different story:
Here we can see the change table has a number of control columns and also shows the history of the changes to the table. The _$operation column defines what operation has taken place (1: delete, 2:insert, 3: update (prior to change), 4: update (after change)). The _$update_mark defines the hexadecimal value. This relates to a binary value. The binary value shows which column has been updated. For example, 0x07 = 0b111 reading from right to left, means all 3 columns change; 0x04 = 0b100 means the third column has changed (col2).
From this change table we can see that id 1 row was inserted first, then id 2 inserted, id 1 row was updated, and finally id 2 row was deleted.
The __$start_lsn column is the log sequence number (LSN) associated with the committed transaction. This is used by the table-valued functions to select the data when queried. These functions take LSN (or associated time) to determine what to return.
__$end_lsn is not used in SQL Server 2008 and the __$seqval column can be used to sort the row changes within a transaction.
Querying the Change table using the table-valued functions.
The table-valued functions are used to extract the changes from the change table based on a start and end LSN. This defines the start and end positions that you want changes returned for. There are a few ways to get your start and end LSN, there are functions available to help you do this. Examples are:
  • Get min LSN for a capture instance (associated with a source table) - sys.fn_cdc_get_min_lsn ('dbo_tbl1')
  • Get max LSN available - sys.fn_cdc_get_max_lsn ()
  • Convert a time to the nearest LSN (‘smallest LSN, greater than a time’ or ‘greatest LSN, less than or equal to a time’, etc) - sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', '2009-09-03 16:22:20.000')
  • Get next incremental LSN, used when you may save previous upper bound limit and now want to use the next LSN for the lower bound limit - sys.fn_cdc_increment_lsn (<previous upper bound limit>)
  • etc
Here’s an example of how to use the table-valued functions:
Results are:
With the query and results above we are setting the start and end LSNs to cover the full range of our insert, update and delete statements. Remember we had for row with id 1 = insert, update, and for row with id 2 = insert and delete. For the all_changes table-valued functions we see it shows each operation we have performed during this period. An insert (operation = 2), a second insert (2), an update (4) , and a delete (1). The net_changes functions shows just the net change an insert (op = 2) with col2 column showing ‘update’.
The net_changes function just gives the net changes with respect to the primary key. Note to use the net changes function requires either a primary key or unique index to be set up on the source table first (as we did above).
The next example uses time to set the start LSN we chose the time 16:22:50. Only a delete occurred after this point:
Here the all and net changes functions returns the same value, a delete operation (1) on row with id 2.
There are more options available with both all and net functions such as : “all update old”, “all with mask”, etc. Take a look at Books Online for more details: http://msdn.microsoft.com/en-us/library/bb510744.aspx
Clean up change table and set lowest watermark
Over time the change table will grow bigger and bigger and may eventually affect CDC performance. To stop this there is a clean up job to remove redundant changes. The clean up job (CDCTest_Cleanup) runs on a configured schedule (daily basis by default) or manually by stored proc. Based on the retention value configured for the job (see table msdb.dbo.cdc_jobs) a new lowest watermark LSN for capture instance is defined. This will be used to remove records from the associated change table which are below this watermark.

Change Schema of Table
alter schema HR transfer dbo.payroll

SQL Server 2008 R2 SSIS SQL Server Destination usage under memory pressure

When trying to use SSIS SQL Server Destination to improve data insertion performance sometimes buffer swapping occurs when RAM is strained out. This results in SSIS sometimes giving the below error:
Information: The buffer manager failed a memory allocation call for 8912512 bytes, but was unable to swap out any buffers to relieve memory pressure. 104744 buffers were considered and 104744 were locked. Either not enough memory is available to the pipeline because not enough are installed, other processes were using it, or too many buffers are locked.
[SS_DST_Fact[1412]] Error: The attempt to send a row to SQL Server failed with error code 0x80004005.
[OLE_SRC_[1]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

This is due to the fact that by default the SQL Server Destination property MaxInsertCommitSize is set to 0 which leads to SSIS trying to commit all rows in one bulk insert, this can be very straining to SSIS if there are huge amounts of rows. to solve this issue set this number to X.

In my case setting it to 100 000 Rows did the trick.