
NTLM Connection Timeout due to Domain Controller

Malcolm Stewart edited this page Jul 30, 2021 · 8 revisions


The Players

    IP Address  Computer Role
    ----------  ------------------------------
    10.10.10.1  DC01
    10.10.10.2  DC02
    10.10.10.3  Client
    10.10.10.4  SQL Server virtual IP address
    10.10.10.5  SQL Server physical IP address

Symptom

Intermittently, the client application would get a login timeout error:

[Microsoft][SQL Server Native Client 11.0]Login timeout expired

Data Collection

We captured a network trace and ran it through the SQL Network Analyzer program.
A number of login timeout errors occurred while the network trace was being collected.

SQLNA Report Analysis

Trace was probably taken on this IP address: 10.10.10.4, MAC Addr 001DD8A7211B, (80%)
Trace was probably taken on this IP address: 10.10.10.5, MAC Addr 001DD8A7211B, (20%)

The network trace was taken on a machine with two IP addresses that share the same MAC address. The first address matches the SQL Server IP address:

    IP Address   HostName       Port  ServerPipe  Version      Files  Clients  Conversations  Kerb Conv  NTLM Conv  MARS Conv  non-TLS 1.2 Conv  Redirected Conv  Frames       Bytes  Resets  Retransmits  IsClustered
    -----------  -------------  ----  ----------  -----------  -----  -------  -------------  ---------  ---------  ---------  ----------------  ---------------  ------  ----------  ------  -----------  -----------
    10.10.10.4   SQLPROD01\v01  1433              13.0.17.122      0        6             77          0         37          0                 0                0  114366  95,275,362       6          354             

The server is a named instance on port 1433; most likely SQL Server is clustered and 10.10.10.4 is the cluster virtual IP address. Many of the conversations are using NTLM to authenticate the user.
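
To put a rough number on "many", the NTLM share can be worked out from the Conversations and NTLM Conv columns of the report row above. This is only a minimal sketch; the variable names are illustrative and are not part of SQLNA:

```python
# Values copied from the SQLNA report row for 10.10.10.4 above.
conversations = 77
ntlm_conversations = 37
kerberos_conversations = 0

# Fraction of SQL Server conversations that authenticated with NTLM.
ntlm_share = ntlm_conversations / conversations

print(f"NTLM: {ntlm_share:.0%} of conversations, Kerberos: {kerberos_conversations}")
# prints "NTLM: 48% of conversations, Kerberos: 0"
```

Roughly half of the conversations used NTLM and none used Kerberos.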

There are two domain controllers visible in the network trace:

    IP Address  Files  Clients  Conversations  Kerb Conv  DNS Conv  LDAP Conv  MSRPC Conv  MSRPC Port  Frames    Bytes
    ----------  -----  -------  -------------  ---------  --------  ---------  ----------  ----------  ------  -------
    10.10.10.1      0        2             80          0        61          0           5       49673     448  104,027
    10.10.10.2      0        1             17          4         0          5           7       49673     292  111,353

There were a number of SQL Server conversations that resulted in a network reset:

The following conversations with SQL Server 10.10.10.4 on port 1433 were reset:

    NETMON Filter (Client conv.)                  Files  Reset Frame  Start Offset  End Offset         End Time  Frames   Duration  Who Reset  Flags  Keep-Alives  KA Timeout  Retransmits  Max RT
    --------------------------------------------  -----  -----------  ------------  ----------  ---------------  ------  ---------  ---------  -----  -----------  ----------  -----------  ------
    IPV4.Address==10.10.10.3 AND tcp.port==57714      0        12659     11.537988   32.585142  10:35:47.380 AM      18  21.047154  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57719      0        12872     12.639515   33.676661  10:35:48.472 AM      18  21.037146  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57726      0        13683     26.267277   35.293956  10:35:50.089 AM      18   9.026679  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57727      0        13830     27.376348   36.402662  10:35:51.198 AM      18   9.026314  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57722      0        13879     15.843832   36.871136  10:35:51.666 AM      19  21.027304  Client     A.R..            0           0            1       1
    IPV4.Address==10.10.10.3 AND tcp.port==57723      0        15859     23.251253   44.278744  10:35:59.074 AM      18  21.027491  Client     A.R..            0           0            0       0

    Distribution of RESET connections.

    81+|                                                                                                                                                      
    27+|                                                                                                                                                      
     9+|                                                                                                                                                      
     3+|                                                                                                                                                      
     1+|                   XXXX   X                                                                                                                           
       |---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|

The conversations all cluster together, indicating there was probably a systematic issue that resulted in multiple failures, and that it cleared up on its own.
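
The clustering can also be checked numerically from the End Offset column of the reset table. The sketch below assumes the six values are copied by hand from the report; it is not something SQLNA produces itself:

```python
# End offsets (seconds from the start of the trace) of the six reset
# conversations, copied from the End Offset column of the table above.
reset_end_offsets = [32.585142, 33.676661, 35.293956, 36.402662, 36.871136, 44.278744]

# Width of the time window that contains every reset.
window = max(reset_end_offsets) - min(reset_end_offsets)

print(f"{len(reset_end_offsets)} resets within {window:.1f} seconds")
# prints "6 resets within 11.7 seconds"
```

All six resets fall inside a window of about 12 seconds, which is consistent with a single short-lived problem rather than independent failures.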

There were a number of login failures:

The following conversations with SQL Server 10.10.10.4 on port 1433 timed out or were closed prior to completing the login process or had a login error:

    NETMON Filter (Client conv.)                  Files  Last Frame  Start Offset  End Offset         End Time  Frames   Duration  Login Progress                      Keep-Alives  Retransmits  DHE  NullCreds  LoginAck  Error
    --------------------------------------------  -----  ----------  ------------  ----------  ---------------  ------  ---------  ----------------------------------  -----------  -----------  ---  ---------  --------  -----
    IPV4.Address==10.10.10.3 AND tcp.port==57714      0       12659     11.537988   32.585142  10:35:47.380 AM      18  21.047154  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57719      0       12872     12.639515   33.676661  10:35:48.472 AM      18  21.037146  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57726      0       13683     26.267277   35.293956  10:35:50.089 AM      18   9.026679  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57727      0       13830     27.376348   36.402662  10:35:51.198 AM      18   9.026314  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57722      0       13879     15.843832   36.871136  10:35:51.666 AM      19  21.027304  S PL PR CH SH    CE AD NC NR                  0            1                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57723      0       15859     23.251253   44.278744  10:35:59.074 AM      18  21.027491  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
  • These are on the same connections that got reset.
  • The NC and NR entries in the Login Progress column mean the connection was using NTLM.
  • The LoginAck column shows "Late" for all entries, meaning that the SQL Server successfully logged the user in, but that it took a while and the client timed out the connection.
  • The Duration for many of the connection attempts is 21 seconds, which is longer than the default connection timeout of 15 seconds.
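
The Duration column is simply End Offset minus Start Offset, so the comparison against the 15-second default can be reproduced from the table. The following is a minimal sketch with the rows copied by hand; `DEFAULT_LOGIN_TIMEOUT`, `login_attempts`, and `exceeds_timeout` are hypothetical names used only for illustration:

```python
DEFAULT_LOGIN_TIMEOUT = 15.0  # seconds; the default connection timeout mentioned above

# (client port, start offset, end offset) copied from the login-failure table.
login_attempts = [
    (57714, 11.537988, 32.585142),
    (57719, 12.639515, 33.676661),
    (57726, 26.267277, 35.293956),
    (57727, 27.376348, 36.402662),
    (57722, 15.843832, 36.871136),
    (57723, 23.251253, 44.278744),
]


def exceeds_timeout(start, end, timeout=DEFAULT_LOGIN_TIMEOUT):
    """Return True when the conversation lasted longer than the timeout."""
    return (end - start) > timeout


for port, start, end in login_attempts:
    duration = end - start
    flag = "over" if exceeds_timeout(start, end) else "under"
    print(f"port {port}: {duration:.6f} s ({flag} {DEFAULT_LOGIN_TIMEOUT:.0f} s)")

# Client ports whose conversations outlasted the default timeout.
late_ports = [port for port, start, end in login_attempts if exceeds_timeout(start, end)]
print(late_ports)  # prints [57714, 57719, 57722, 57723]
```

Four of the six conversations ran for about 21 seconds; the other two lasted about 9 seconds.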

Network Trace Exploration

Conclusion
