Top Function – Performance Gain

Recently, I presented at Singapore SQL Pass Chapter on the topic T-SQL Fun, and the most of the attendees were a developer. During the session, I discussed TOP function, and I throw a question to the audience – “Not using Top function especially when you can use it – Can it cause performance problem or not?”
I got a spontaneous response that it wouldn’t create any performance problem. To conclude that I did a demo to show the audience. Here, I am going to blog the same things for the wider audience.

Top (N) Function

The Top (N) limit the result to a specific number of rows. You can use it when you don’t need all rows

To simulate the performance problem let’s run the below script, and the script is going to perform following steps;

  • Use the database AdventureworksDW2016CTP3
  • Run the DBCC DROPCLEANBUFFERS to clean buffers to drop the actual data pages from SQL Server buffer cache (Note: Don’t run the DBCC command in Production environment)
  • Count the number of data page that read from the disk into memory using the DMV called DMV sys.dm_os_buffer_descriptors
  • Select the columns ProductKey and OrderDateKey data from the table FactResellerSalesXL_CCI. The table got 11.6 million rows. so I’m narrowing down 11.6 million rows.
  • Count again the number of data page into the memory using the DMV sys.dm_os_buffer_descriptors to see how many pages got loaded from the disk into memory.
USE AdventureworksDW2016CTP3
GO
-- DON'T RUN "DROPCLEANBUFFERS" IN PRODUCTION ENVIORNMENT 
DBCC DROPCLEANBUFFERS;
GO

-- Actual data pages in SQL buffer cache 
SELECT  COUNT(*) AS [buffer_count], ((COUNT(*)*8)/1024)  [Used Buffer Size in MB]
FROM    [sys].[dm_os_buffer_descriptors]
WHERE   [dm_os_buffer_descriptors].[database_id] = DB_ID();

SELECT [ProductKey]
      ,[OrderDateKey]
FROM dbo.FactResellerSalesXL_CCI


SELECT  COUNT(*) AS [buffer_count], ((COUNT(*)*8)/1024)  [Used Buffer Size in MB]
FROM    [sys].[dm_os_buffer_descriptors]
WHERE   [dm_os_buffer_descriptors].[database_id] = DB_ID();

Here is the output of the query

Aaahhhh! it is going to kill your system performance, if you are reading a few hundred millions rows. 

Let’s run same code with TOP function and see how many pages got loaded from the disk into memory.

USE AdventureworksDW2016CTP3
GO
-- DON'T RUN "DROPCLEANBUFFERS" IN PRODUCTION ENVIORNMENT 
DBCC DROPCLEANBUFFERS;
GO

-- Actual data pages in SQL buffer cache 
SELECT  COUNT(*) AS [buffer_count], ((COUNT(*)*8)/1024)  [Used Buffer Size in MB]
FROM    [sys].[dm_os_buffer_descriptors]
WHERE   [dm_os_buffer_descriptors].[database_id] = DB_ID();

SELECT TOP 10 [ProductKey]
      ,[OrderDateKey]
FROM dbo.FactResellerSalesXL_CCI


SELECT  COUNT(*) AS [buffer_count], ((COUNT(*)*8)/1024)  [Used Buffer Size in MB]
FROM    [sys].[dm_os_buffer_descriptors]
WHERE   [dm_os_buffer_descriptors].[database_id] = DB_ID();

Here is the output of the query

When we used the TOP function, we didn’t drive all data pages from the disc into memory. Basically, it will drastically reduce the amount of IO and the amount of memory that needs to be taken up by pages.

Summary:

The demo confirms that “Not using Top function especially when you can use it” – It can cause performance problem to the server. The big take away from the demo is – you need to check with application team that needs all the rows or only a few rows as output.

Leave a comment

Your email address will not be published. Required fields are marked *