Cleveland DBA

SQL Server administration in Cleveland, OH

A Fairy Tale – Epilogue

When the cheers subsided and the parade was over (oh yes, there was a parade;  complete with floats and bagpipes), the DBA found herself dissatisfied.  There was a minor detail that was nagging at her.  What about those linked servers that used an alias?  With her original query, the remote server would be the alias name, and that may or may not be helpful.  Surely she could do better.

So our heroine searched the Windows registry until she found where all of the alias information was stored, and she wrote a function that used the xp_regread extended stored procedure to retrieve the target server for each alias.

IF OBJECT_ID('fn_getaliastarget', 'FN') IS NOT NULL
DROP FUNCTION fn_getaliastarget
GO
create function dbo.fn_getaliastarget (@alias varchar(40))
returns nvarchar(250)
AS
BEGIN
declare
@regbase nvarchar(30)
,@regkey nvarchar(100)
,@dirbasepath nvarchar(250)

SET @regbase = 'HKEY_LOCAL_MACHINE'
SET @regkey = 'Software\Microsoft\MSSQLServer\Client\ConnectTo\'

EXECUTE master..xp_regread @regbase
, @regkey
, @alias
, @dirbasepath OUTPUT

RETURN substring(@dirbasepath,CHARINDEX(',',@dirbasepath)+1,LEN(@dirbasepath))
END
GO

Then she revised her query to use the new function.

SELECT @@servername as [SQLInstance]
, s.name as [LinkedServer]
, ISNULL(dbo.fn_getaliastarget(s.data_source), s.data_source) as [RemoteServer]
, CASE l.uses_self_credential
WHEN 1 THEN 'UNMAPPED LOGINS'
ELSE ISNULL(p.name, 'UNMAPPED LOGINS')
END AS [LocalLogin]
, CASE l.uses_self_credential
WHEN 1 THEN 'USE SELF'
ELSE l.remote_name
END AS [RemoteLogin]
FROM sys.linked_logins l
JOIN sys.servers s ON s.server_id = l.server_id AND is_data_access_enabled = 1
LEFT OUTER JOIN sys.server_principals p ON p.principal_id = l.local_principal_id
ORDER BY [LinkedServer], [LocalLogin]

And finally, the DBA wrote a version of the query for SQL Server 2000 instances, too, just for good measure.

SELECT @@servername as [SQLInstance]
, s.srvname as [LinkedServer]
, ISNULL(dbo.fn_getaliastarget(s.datasource), s.datasource) as [RemoteServer]
,  ISNULL(p.name, 'UNMAPPED LOGINS') AS [LocalLogin]
,  ISNULL(l.rmtloginame, 'USE SELF')     AS [RemoteLogin]
FROM master..sysoledbusers l
JOIN master..sysservers s ON s.srvid = l.rmtsrvid AND dataaccess = 1 AND isremote = 1
LEFT OUTER JOIN master..syslogins p ON p.sid = l.loginsid
ORDER BY [LinkedServer], [LocalLogin]

And she was happy. And awesome.

March 31, 2011 Posted by | General, T-SQL | , , | Leave a comment

A Fairy Tale

Once upon a time there was a developer who was concerned about the security of his application’s data.  So he decided to change the password for his application’s SQL login.  So he logged into the database and used sp_password to change his password and he was happy.

On the other end of the hall lived another developer.  This developer was not happy.  His application had suddenly stopped working and he was getting angry phone calls from users.  “But I didn’t change anything!” he cried.  And that was true.  But his application was using a linked server that mapped to the login whose password was changed by the first developer (who lived in the house that Jack built).

The second developer was very angry with the first developer.

“You can’t just go around changing passwords willy-nilly!”

The first developer got defensive.

“It’s my application and my login, how was I supposed to know you were mapped to it?”

Suddenly they stopped, and the hall got very quiet.  They both turned and looked at the DBA.

“Surely there must be some way of knowing what linked servers are mapped to what logins on what servers,” they implored.

The DBA replied, equivocally, “Let me see what I can do.”

Now this DBA was a very smart DBA (and beautiful.  like a princess, really).  She already had a program she had written that inventoried each SQL Server instance to track things like jobs, database growth, backup locations, etc.  She had even written MSRS reports against this data that listed applications on each instance, job statuses, license counts and more.  She knew it wouldn’t be difficult to gather this bit of data, too.  Once she knew where it was stored, that is.

So she started looking in the system views in the master database.  That was where sys.servers and sys.database_principals were located, after all.  It would make sense to store the linked server login mapping there, too, right?  Only, it wasn’t there!  Hmm.  Not to be discouraged, our (smart, beautiful, princess-like) heroine looked in the msdb database.  And voila!  There, in the system views:  sys.linked_logins.

And, after a bit of muttering about why the hell did they put it in msdb, she wrote the following code and included it in her program.

SELECT @@servername as [SQL Instance]
, s.name as [Linked Server]
, s.data_source as [Remote Server]
, CASE l.uses_self_credential
WHEN 1 THEN 'PASS-THRU'
ELSE ISNULL(p.name, 'UNMAPPED LOGINS')
END AS [Local Login]
, CASE l.uses_self_credential
WHEN 1 THEN 'PASS-THRU'
ELSE l.remote_name
END AS [Remote Login]
FROM sys.linked_logins l
JOIN sys.servers s ON s.server_id = l.server_id AND is_data_access_enabled = 1
LEFT OUTER JOIN sys.server_principals p ON p.principal_id = l.local_principal_id
ORDER BY [Linked Server], [Local Login]

Now developers could know what linked servers might be impacted by a password change.  Peace and harmony reigned throughout the hall.  Hurrah!

And they lived happily ever after.

The End.

March 28, 2011 Posted by | General, T-SQL | , | Leave a comment

Call me Ishmael

This week I’ve been chasing my own white whale.  Another developer came to me with this little problem.  A query he was trying to run over a linked server to an Oracle database was returning unexpected results.  Here’s his original query:

select count(order_num) from openquery(PROD, 'select t2.order_num
from myschema.mytable t2
where open_date < to_date(''01-JAN-2011'', ''DD-MON-YYYY'')
and nature = ''057''
and dept_code = ''7240''
and status_code like ''O%''')a

The resulting count was 200. However, running that same query directly on Oracle returns 658.  So I moved the count() function inside the remote query (which is a more efficient query anyway, but that’s beside the point).

select * from openquery(PROD, 'select count(order_num)
from myschema.mytable t2
where open_date < to_date(''01-JAN-2011'', ''DD-MON-YYYY'')
and nature = ''057''
and dept_code = ''7240''
and status_code like ''O%''')a 

The result?  658.  And running a simple select *, rather that count(*) also returns 658 records.  A little poking around on Oracle’s Metalink site netted a few bugs with the Oracle OLEDB provider, including one specifically related to count(*).  The bug description said it was 100% reproducible with no workaround.  And normally I might let it go at that and move on.  But the thing is, it’s not 100% reproducible.  If I run that same original query against the same table in our QA environment, I get the correct count.  In fact, if I run it against a different column in that same table in the Prod environment, I get the correct count.  Running test queries against many columns in several tables in each environment yielded mixed results, with seemingly no rhyme or reason behind which columns/tables returned a correct count, and which returned a count of 200.  Now, I realize that this is called a “bug” for a reason, but computers just don’t produce random and arbitrary results.  So I determined to figure out why.

I ran an 10046 event trace in Oracle along with a Profiler trace for both the “good” and “bad” scenarios.  In

the “good” test, SQL Server requests rows (OLEDB Call Event) in batches of 100:

<inputs>
    <hReserved>0x0000000000000000</hReserved>
    <lRowsOffset>0</lRowsOffset>
    <cRows>100</cRows>

</inputs>

And you see an “OLEDB DataRead Event” for each row fetched:

<inputs>
    <hRow>0x0000000000000259</hRow>
    <hAccessor>0x000000001127D3E0</hAccessor>

</inputs>

For the last batch, it requests 100 records again, but only gets 4 back:

<hresult>265926</hresult>

<outputs>
    <pcRowsObtained>4</pcRowsObtained>
    <prghRows>
        <HROW>0x000000000000025A</HROW>
        <HROW>0x000000000000025B</HROW>
        <HROW>0x000000000000025C</HROW>
        <HROW>0x000000000000025D</HROW>
    </prghRows>

</outputs>

That hresult of 265926 means “Reached start or end of rowset or chapter.”  And that all corresponds with what we see

in the Oracle trace file.  Each fetch is recorded with a line like this:

FETCH #3:c=0,e=2772,p=0,cr=73,cu=0,mis=0,r=100,dep=0,og=4,tim=18884103172270

where r=100 is the number of rows returned.  The final fetch shows r=4.  This is what we would expect to see. 

Now let’s look at the “bad” test.  Here we see no DataRead events in the Profiler trace.  The very last OLEDB

Call Event

<inputs>
    <hReserved>0x0000000000000000</hReserved>
    <lRowsOffset>0</lRowsOffset>
    <cRows>100</cRows>

</inputs>

shows 99 records returned.

<hresult>265926</hresult>

<outputs>
    <pcRowsObtained>99</pcRowsObtained>
    <prghRows>
        <HROW>0x0000000000000066</HROW>
        […]
        <HROW>0x00000000000000C8</HROW>
    </prghRows>

</outputs>

But what’s different here is that the Oracle trace shows 100 records being returned in that last fetch.  So why did

SQL Server only receive 99 records?  I thought perhaps an OLEDB trace would shed some light, but that’s like reading Sanskrit. 

Further examination of the Profiler traces yielded one difference that seemed relevant.  From the “good” test:

<inputs>
    <cPropertyIDSets>1</cPropertyIDSets>
    <rgPropertyIDSets>
        <DBPROPIDSET>
            <cPropertyIDs>1</cPropertyIDs>
            <rgPropertyIDs>
                <DBPROPID>DBPROP_ISequentialStream</DBPROPID>
            </rgPropertyIDs>
            <guidPropertySet>DBPROPSET_ROWSET</guidPropertySet>
        </DBPROPIDSET>
    </rgPropertyIDSets>
</inputs>
next line:
<hresult>0</hresult>
<outputs>
    <pcPropertySets>1</pcPropertySets>
    <prgPropertySets>
        <DBPROPSET>
            <cProperties>1</cProperties>
            <guidPropertySet>DBPROPSET_ROWSET</guidPropertySet>
            <rgProperties>
                <DBPROP>
                    <dwPropertyID>DBPROP_ISequentialStream</dwPropertyID>
                    <dwOptions>0</dwOptions>
                    <dwStatus>0</dwStatus>
                    <colid>DB_NULLID</colid>
                    <vValue>
                        <VARIANT>
                            <vt>VT_BOOL</vt>
                            <boolVal>-1</boolVal>
                        </VARIANT>
                    </vValue>
                </DBPROP>
            </rgProperties>
        </DBPROPSET>
    </prgPropertySets>
</outputs>
next line:
<inputs>
    <riid>IID_IRowsetInfo</riid>
</inputs>
next line:
<hresult>0</hresult>

That DBPROP_ISequentialStream.  I’m thinking that’s the critical factor that makes the OLEDB provider handle the data differently in the “good” tests, and allows it to process the data correctly.  The only problem is:  I can’t prove it.  Yet.  So far the Great Gazoogle has provided me with little helpful information on this property and its impact.  But the search continues, if in a slightly less obsessed manner.  And hopefully things will work out better for me than they did for Ahab.

Incidentally, there does seem to be a workaround.  If, when you create the linked server, you include a FetchSize in the provider string, @provstr=N’FetchSize=500′, with the fetchsize being 200 or more, this should get you the correct result.  At least, it did for me. 

February 25, 2011 Posted by | Troubleshooting | , , , , , | Leave a comment