Out of memory error because of the very low virtual address space – Error 701 – Trace Flag 8075

If you are dealing with a SQL Server which got a lot of memory, you may encounter the out of memory (OOM) error because of low “Available Virtual Memory”.

Recently, I assigned more than 820GB memory to one of the SQL Server instances. The instance was dealing with a significant reporting workload. After a couple of days, the SQL Server instance got hung with error “Insufficient System Memory”.

Here is the physical machine details – IBM (Lenovo) x3850 X6, 64 cores, 1TB RAM.

Problem

After pumping a lot of the memory to a SQL Server, you may end up by seeing “Error 701” in the error log of SQL Server. The error can be generated by many reasons. For example;

  • The SQL Server instance really needs more physical memory because it got very aggressive and heavy workload.
  • If you do ten of thousands of inserts in a single batch (Applies to SQL Server 2005 & 2008, but you need to check with Microsoft for other versions)
  • SQL Server virtual address space got exhausted

In my scenario, I was not sure what caused SQL Server to throw the error that’s where the finding started.

Findings

To figure out the cause of the error 701, I started analyzing the SQL Server error log file.

  • Here is the complete error which I found in the error log
2016-11-30 04:27:59.42 spid30284 Failed allocate pages : FAIL_PAGE_ALLOCATION 207
2016-11-30 04:28:19.42 spid30284 Error: 701, Severity: 17, State: 123.
2016-11-30 04:29:00.12 spid30284 There is insufficient system memory in resource pool ‘default’ to run this query.
  • When I further looked up at the error log, the above error was logged multiple times with the output of DBCC MEMORYSTATUS that is automatically logged to the error log on 701 error messages
Process/System Counts                         Value
---------------------------------------- ----------
Available Physical Memory                135581184896 ---->126GB
Available Virtual Memory                 17179169280------->15GB
Available Paging File                    337114435584
Working Set                              44043255808
Percent of Committed Memory in WS               100
Page Faults                                26218401
System physical memory high                       0
System physical memory low                        0
Process physical memory low                       0
Process virtual memory low                        0
2016-02-16 04:28:00.10 spid2284    
Memory Manager                                   KB
  • It also generated a SQL Server memory dump for “Non-yielding” process.
2016-11-30 04:32:42.12 Server **Dump thread – spid = 0, EC = 0x0000000000000000
2016-11-30 04:32:42.12 Server ***Stack Dump being sent to XXXXXX\MSSQL\XXXX\SQLDump0001.txt
2016-11-30 04:32:42.12 Server * *******************************************************************************
2016-11-30 04:32:42.12 Server *
2016-11-30 04:32:42.12 Server * BEGIN STACK DUMP:
2016-11-30 04:32:42.12 Server * 2016-11-30 04:32:42 spid 31092
2016-11-30 04:32:42.12 Server *
2016-11-30 04:32:42.12 Server * Non-yielding IOCP Listener
  • The wait type SOS_MEMORY_TOPLEVELBLOCKALLOCATOR was part of the top three wait types on the SQL Server.

Observation

  • The output of DBCC MEMORYSTATUS was logged 8 times in the error log and each time the “Available Physical Memory” was showing greater than 100GB but the “Available Virtual Memory” was almost equal to 15GB. Kindly refer the below details;

Note:
The counter “Available Virtual Memory” tells how much virtual memory is available at OS Level. By default, the X64 OS supports 8TB of the virtual memory. The Windows OS runs every process, including the SQL Server process, in its own dedicated area of virtual memory, known as the Virtual Address Space (VAS).

  • To get insight from the generated SQL memory dump, I further analyzed the non-yielding scheduler dump using Public symbol;
.cxr sqlmin!g_copiedStackInfo+0X20
*** WARNING: Unable to verify timestamp for sqlmin.dll
rax=0000000000000000 rbx=00000000f105eb48 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000077661190 rdi=0000000000000002
rip=00000000774ebd6a rsp=00000000f105ea40 rbp=00000000f105eac8
r8=00000000f105f4c8 r9=00000000ffffffff r10=0000000000554a90
r11=00000000f105f2e8 r12=00000000f105eb48 r13=0000000000145000
r14=00000000f105eaa8 r15=0000000000000020
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246
ntdll!ZwQueryVirtualMemory+0xa:
00000000774ebd6a c3 ret
0:045> k100
# Child-SP RetAddr Call Site
00 00000000f105ea40 00000000`772bfcb1 ntdll!ZwQueryVirtualMemory+0xa

Note:
The dump output also indicates that it was calling virtual memory allocator “ZwAllocateVirtualMemory”. To know, the “ZwQueryVirtualMemory” exists for what kindly refer the link
https://msdn.microsoft.com/en-us/library/windows/hardware/ff566416(v=vs.85).aspx

Conclusion

The Windows Server was continuously experiencing low “Available Virtual Memory” (AVM was 15GB instead of 8TB). As the available virtual address space was very low, queries had to wait longer to perform memory allocation. Sometimes SQL Server could not perform memory allocation at all and ended up by throwing the error 701. 

The repeated allocation pattern under such condition might have caused fragmentation of the memory blocks and consumption of virtual address space. As the SQL Server reported the error many times, it means the processes got repeated many times which exhausted the virtual address space of the server. Therefore, even we had enough physical memory available on the server but could not login into the server because of the virtual memory crunch.

It confirms that the issue happened because of the “Available Virtual Memory”

Solution

  • As per KB3074434, Microsoft recommends to apply Cumulative Update for SQL Server;

– Cumulative Update 4 for SQL Server 2014 SP1
– Cumulative Update 10 for SQL Server 2014
– Cumulative Update 8 for SQL Server 2012 SP2

  • Once you apply the Cumulative Update, you need to enable the trace flag T8075.

Additional information

What is the purpose of the trace flag 8075?
It helps eliminate VAS fragmentation issue by caching large memory blocks to minimize the VAS fragmentation.

Is the trace flag only available after SQL Server 2012 SP2 CU8 or Can I apply on the server which is running only SQL Server 2012 RTM or an older version?
The trace flag wasn’t available before versions specified in the KB3074434 article. For example, it was not available in the SQL Server 2012 SP1 CU7.

Do I still need to enable the trace flag 8075 even we have applied SQL Server 2012 SP3 CU8?
Yes, it applies to 2012/2014 versions specified in the KB3074434 article

Do I need to apply on all the server irrespective we see high SOS_MEMORY_TOPLEVELBLOCKALLOCATOR waits or not?
Yeah, it would make sense to apply in particular the bigger machines.

Do I need to apply this trace flag in SQL Server 2016 also?
It is enabled by default in SQL 2016 so you won’t require it in SQL Server 2016. 

Is there performance impact of the trace flag?
There is no negative impact of trace flag 8075

Reference Link

Special thanks to Microsoft Program Managers – Denzil Ribeiro and Parikshit Savjani for guiding on this!

Thanks for reading!!

6 thoughts on “Out of memory error because of the very low virtual address space – Error 701 – Trace Flag 8075

  • Raj

    Interesting read. But wondering what caused AVM to go low. Increased workload? Specific queries? Did system health xevents reveal more info?

    Reply
    • Dharmendra

      Thanks Raj for your feedback! As the system was hung, it didn’t allow me to access the SQL Server; even as DAC admin also. Therefore, I was unable to find what caused AVM to go down. The workload was not increased.

      No, the system health xevent didn’t reveal anything. If we follow the “Non-yielding” process SPID, it says 31092. I assume that there were a lot of session on the server which might be waiting for the virtual address space.

      Reply
  • Kevin

    Hello Dharmendra,
    Good day! This was actually a very nice article. Thanks for the sharing!
    May I ask one question, when you apply the MS KB3074434 to your SQL servers, the OOM(SQL Error 701) issue has disappeared? Moreover, what is the Windows operating system version you’re using for SQL servers? Thanks a lot!

    Kevin

    Reply
    • Dharmendra

      Greeting Kevin, I am glad that you find the article useful. Since I applied the KB3074434 , I didn’t encounter this issue. The Windows OS was 2012 R2.

      Reply
  • Binu

    Is the trace flag required if I apply SQL Server 2014 SP1/SP2 or SQL 2012 SP3

    Reply
    • Dharmendra

      Binu, you can only enable the trace flag after applying the below package
      – Cumulative Update 4 for SQL Server 2014 SP1
      – Cumulative Update 10 for SQL Server 2014
      Yes, you have to enable the trace flag after the applying the required packages.

      Reply

Leave a comment

Your email address will not be published. Required fields are marked *