SQL Server and ReFS: Part 1 - DBCC and In Memory OLTP

As I was sitting in Bob Ward’s Inside SQL Server I/O presentation, something interesting caught my eye on a particular slide.

ward

It looks like ReFS is now supported for SQL Server 2014. I’d run into problems with 2012 so I’d just given up but this looks promising. I am neither a filesystem aficionado, nor a dilettante but I know that there are some interesting features of ReFS at which Windows server admins are looking to see if it’s viable.

There are a few known gotchas with ReFS that I’m going to (hopefully) test, along with performance characteristics. Performance will be addressed in a separate post because I’ve already got the numbers and the post will be a bit deeper and less along the lines of “work/no-work.” Back in July, 2014, Qumio gave In-Memory OLTP a try with the ReFS filesystem without success. It’s been a few months, SQL Server 2014 is out of CTP, and I trust Bob Ward (and his caveat to disable integrity checks as they can cause unintended data file corruption). Additionally, the ReFS documentation notes that alternate data stream support has been added to ReFS, so let’s test.

Quick-format a volume (j:, in this case) with ReFS, disabling Integrity Steam checks. fmt

For fun, I wanted to see would happen with different file layout configurations, with at least one being on an ReFS-formatted volume (j:\ and p:\ are my ReFS volumes, the others are NTFS):

[code language=“sql”] – note, the SQL Server Service account has the Perform Volume Maintenance Tasks privilege on the OS create database refs_test on ( name = refs_test_dat, filename = ‘j:\data\refs_test.mdf’, size = 25GB, maxsize = 45GB, filegrowth = 1GB ) log on ( name = refs_test_log, filename = ‘l:\logs\refs_test.ldf’, size = 1GB, maxsize = 10GB, filegrowth = 1GB );

– SQL Server Execution Times: – CPU time = 15 ms, elapsed time = 5675 ms.

create database refs_test on ( name = refs_test_dat, filename = ‘j:\data\refs_test.mdf’, size = 25GB, maxsize = 45GB, filegrowth = 1GB ) log on ( name = refs_test_log, filename = ‘j:\data\refs_test.ldf’, size = 1GB, maxsize = 10GB, filegrowth = 1GB );

– SQL Server Execution Times: – CPU time = 30 ms, elapsed time = 10522 ms.

create database refs_test on ( name = refs_test_dat, filename = ’m:\data\refs_test.mdf’, size = 25GB, maxsize = 45GB, filegrowth = 1GB ) log on ( name = refs_test_log, filename = ‘j:\data\refs_test.ldf’, size = 1GB, maxsize = 10GB, filegrowth = 1GB );

– SQL Server Execution Times: – CPU time = 31 ms, elapsed time = 2566 ms

create database refs_test 
on (
    name = refs_test_dat, 
    filename = 'm:\data\refs_test.mdf',
    size = 25GB,
    maxsize = 45GB,
    filegrowth = 1GB
)
log on (
    name = refs_test_log,
    filename = 'l:\logs\refs_test.ldf',
    size = 1GB,
    maxsize = 10GB,
    filegrowth = 1GB
);

– SQL Server Execution Times: – CPU time = 47 ms, elapsed time = 2737 ms.

create database refs_test on ( name = refs_test_dat, filename = ‘j:\data\refs_test.mdf’, size = 25GB, maxsize = 45GB, filegrowth = 1GB ) log on ( name = refs_test_log, filename = ‘p:\logs\refs_test.ldf’, size = 1GB, maxsize = 10GB, filegrowth = 1GB ); – SQL Server Execution Times: – CPU time = 16 ms, elapsed time = 6773 ms.

create database refs_test on ( name = refs_test_dat, filename = ’m:\data\refs_test.mdf’, size = 25GB, maxsize = 45GB, filegrowth = 1GB ) log on ( name = refs_test_log, filename = ’m:\data\refs_test.ldf’, size = 1GB, maxsize = 10GB, filegrowth = 1GB ); – SQL Server Execution Times: – CPU time = 47 ms, elapsed time = 20573 ms. [/code]

(Each DDL was executed five times and the best run of each was taken. No averages, variances, or standard deviations were harmed in this exercise.)

So the fastest database creation occurred with the log on ReFS and the data file on NTFS. The slowest was with all files on the same volume with ReFS edging NTFS in that regard. This looks like an area for exploration with Bob’s PASS Summit 2014 WinDbg scripts.

Now let’s see if we can do some In-Memory OLTP.

[code language=“sql”] alter database refs_test add filegroup refs_test_memopt contains memory_optimized_data;

alter database refs_test add file ( name = refs_test_memopt_file, filename = ‘j:\data\refs_test_memopt_file’ ) to filegroup refs_test_memopt;

alter database refs_test set memory_optimized_elevate_to_snapshot=on;

[/code] addfg No errors.

Now let’s create a memory-optimized table and populate it with some data: [code language=“sql”] use refs_test; go

CREATE TABLE dbo.lemma( lemma_id int identity(1,1) not null primary key nonclustered hash with (bucket_count = 400000), lemma nvarchar(255) null ) with (memory_optimized=on);

– can’t do a cross-database transaction with a memory-optimized table select * into dbo.l2 from greek.dbo.lemma;

insert lemma (lemma) select lemma from l2

drop table l2;

select * into dbo.f_word – create a non-memory optimized table from greek.dbo.f_word; [/code] popdata

That works as well. What about creating a database snapshot?

[code language=“sql”] use master; go

create database greek_snap on ( name = greek_Data, filename=‘j:\data\greek_snap.ss’ ) as snapshot of greek;

select d.name, mf.name, mf.physical_name, type_desc, data_space_id, is_sparse, is_memory_optimized_elevate_to_snapshot_on from sys.master_files mf join sys.databases d on mf.database_id = d.database_id where d.database_id > 4; [/code] snap Well that worked well and we can see a sparse file sitting on

Still no problems. What about a DBCC CHECKDB?

[code language=“sql”] use master; dbcc checkdb(refs_test); [/code] We can see that the Memory-Optimized table is skipped, but the other table is not skipped and is checked, but it is done without any errors. dbcc

Recalling a previous blog post (and cannibalizing its code), let’s check to see how CHECKDB is performed. Does it create a snapshot or is it taking out TABLOCKX locks while running?

[code language=“sql”] – create the event session create event session [DBCC_Check] on server add event sqlserver.check_phase_tracing( action(sqlserver.database_id, sqlserver.database_name)), add event sqlserver.check_thread_message_statistics( action(sqlserver.database_id, sqlserver.database_name)), add event sqlserver.check_thread_page_io_statistics( action(sqlserver.database_id, sqlserver.database_name)), add event sqlserver.check_thread_page_latch_statistics( action(sqlserver.database_id, sqlserver.database_name)) add target package0.ring_buffer WITH (track_causality = on);

– start the event session alter event session [DBCC_Check] on server state = start;

– run the dbcc dbcc checkdb(refs_test);

– parse the data in the ring buffer select *
from ( select event_data.value(‘(event/@name)[1]‘, ‘varchar(50)’) AS event_name, DATEADD(hh,DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_data.value(‘(event/@timestamp)[1]‘, ‘datetime2’)) AS [timestamp], event_data.value(‘(event/data[@name=“database_id”]/value)[1]‘, ‘sysname’) AS event_database_id, event_data.value(‘(event/action[@name=“database_id”]/value)[1]‘, ‘sysname’) AS action_database_id, event_data.value(‘(event/data[@name=“opcode”]/text)[1]‘, ‘NVARCHAR(25)’) AS opcode_text, event_data.value(‘(event/data[@name=“call_duration”]/value)[1]‘, ‘bigint’) AS call_duration, event_data.value(‘(event/data[@name=“is_remote”]/value)[1]‘, ‘bit’) AS is_remote, event_data.value(‘(event/data[@name=“command_phase”]/text)[1]‘, ‘nvarchar(100)’) AS command_phase_text, event_data.value(‘(event/data[@name=“logical_reads”]/value)[1]‘, ‘bigint’) AS logical_reads, event_data.value(‘(event/data[@name=“physical_reads”]/value)[1]‘, ‘bigint’) AS physical_reads, event_data.value(‘(event/data[@name=“run_ahead_reads”]/value)[1]‘, ‘bigint’) AS run_ahead_reads, event_data.value(‘(event/data[@name=“total_page_io_latch_waits”]/value)[1]‘, ‘bigint’) AS total_page_io_latch_waits, event_data.value(‘(event/data[@name=“page_io_latch_wait_time_in_ms”]/value)[1]‘, ‘bigint’) AS page_io_latch_wait_time_in_ms, event_data.value(‘(event/data[@name=“total_page_latch_waits”]/value)[1]‘, ‘bigint’) AS total_page_latch_waits, event_data.value(‘(event/data[@name=“page_latch_wait_time_in_ms”]/value)[1]‘, ‘bigint’) AS page_latch_wait_time_in_ms, event_data.value(‘(event/data[@name=“messages_sent”]/value)[1]‘, ‘int’) AS messages_sent, event_data.value(‘(event/data[@name=“messages_received”]/value)[1]‘, ‘int’) AS messages_received, event_data.value(‘(event/action[@name=“database_name”]/value)[1]‘, ‘sysname’) AS database_name, CAST(SUBSTRING(event_data.value(‘(event/action[@name=“attach_activity_id”]/value)[1]‘, ‘varchar(50)’), 1, 36) AS uniqueidentifier) as activity_id, CAST(SUBSTRING(event_data.value(‘(event/action[@name=“attach_activity_id”]/value)[1]‘, ‘varchar(50)’), 38, 10) AS int) as event_sequence, CAST(SUBSTRING(event_data.value(‘(event/action[@name=“attach_activity_id_xfer”]/value)[1]‘, ‘varchar(50)’), 1, 36) AS uniqueidentifier) as activity_id_xfer from ( select XEvent.query(‘.’) AS event_data from ( – Cast the target_data to XML select CAST(target_data AS XML) AS TargetData from sys.dm_xe_session_targets st JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address where name = ‘DBCC_Check’ AND target_name = ‘ring_buffer’ ) AS Data – Split out the Event Nodes CROSS APPLY TargetData.nodes (‘RingBufferTarget/event’) AS XEventData (XEvent) ) AS tab (event_data) ) xedata order by timestamp

– stop the event session alter event session [DBCC_Check] on server state = stop;

– drop the event session drop event session [DBCC_Check] on server; [/code]

dbcc2

The row outlined in green shows the start of the step a snapshot is being created (so it actually creates a snapshot). The “action” database_id, outlined in red, is the database where the DBCC check is taking place - the snapshot.

Based on this evidence, it looks like ReFS is supported by SQL Server 2014 for at least basic utilization. There may be things that are yet undiscovered in what amounts to a v1.0 release of a new filesystem, so I wouldn’t necessarily run out and put all of my production SQL Servers on 2014 + ReFS. However, it’s looking like support is certainly advancing.

Next time I’ll talk IOPS and throughput using fio.