๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Database/Brent Ozar Unlimited

#4. ์ฒซ ๋ฒˆ์งธ ์ˆ˜์—… - How to Think Like the Engine 2

<How to Think Like the Engine>

(* ์ง์ ‘ ์ฝ๊ณ  ๋“ฃ๊ณ  ํ•ด์„ํ•œ ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์˜ค๋ฅ˜ ๋งŽ์„ ์ˆ˜ ์žˆ์Œ ์ฃผ์˜)

 

โ–  ๋„ค ๋ฒˆ์งธ ์ฟผ๋ฆฌ

 

SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

 

์ด๋ฒˆ์—๋Š” id๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ชจ๋“  ์ปฌ๋Ÿผ์„ ๋‹ค ๊ฐ€์ง€๊ณ  ์™€๋ณด์ž. ์ด ์ฟผ๋ฆฌ๋Š” ์„ธ ๋ฒˆ์งธ ์ฟผ๋ฆฌ(id๋งŒ SELECTํ•˜๋Š” ์ฟผ๋ฆฌ)๋ณด๋‹ค ๊ต‰์žฅํžˆ ์„ฑ๋Šฅ์ด ๋ณ„๋กœ์ธ ์ฟผ๋ฆฌ๊ฐ€ ๋  ๊ฒƒ์ด๋‹ค. ์™œ ๋ณ„๋กœ์ผ๊นŒ?

 

Do we work harder to read the data?

YES. SELECT id ์ฟผ๋ฆฌ๋ž‘ ๊ฐ™์€ row๋ฅผ ์ฝ๊ฒŒ ๋˜๊ธด ํ• ๊ฑด๋ฐ, ์ด๋ฒˆ์—๋Š” About me ์ปฌ๋Ÿผ ์ฒ˜๋Ÿผ varchar(max)๋‚˜ nvarchar(max) ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ์ปฌ๋Ÿผ๋“ค๋„ ๊ฐ™์ด ๋”ธ๋ ค์˜ฌ ๊ฒƒ์ด๋‹ค. ์ด๋ฒˆ์—๋Š” ๊ทธ๋Ÿฐ ์ปฌ๋Ÿผ๋“ค์„ ๊ฐ–๊ณ  ์žˆ๋Š” ํŽ˜์ด์ง€๋„ ๊ฐ™์ด ์ฝ์–ด์•ผ ํ•œ๋‹ค. ๊ทธ๋ž˜์„œ ์ด๋Ÿฐ ํฐ ๋ฐ์ดํ„ฐ ํƒ€์ž… ์ปฌ๋Ÿผ๋“ค์„ ๊ฐ–๊ณ  ์žˆ๋Š” ๊ฒฝ์šฐ๋ผ๋ฉด ์ฝ๋Š” ๋ฐ ๋” ๋น„์šฉ์ด ๋งŽ์ด ๋“œ๋Š” ๊ฑด ๋งž๋‹ค.

 

Do we work harder to write the data?

YES. SELECT id ์ฟผ๋ฆฌ์—์„œ ์‚ดํŽด๋ดค๋“ฏ์ด ORDER BY๋ฅผ ํ•˜๊ธฐ ์œ„ํ•ด์„œ ํฌ์ŠคํŠธ์ž‡์—๋‹ค๊ฐ€ ๊ฐ–๊ณ  ์˜จ id๋ž‘ LastAccessDate๋ฅผ ์ ์—ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์ด๋ฒˆ์—๋Š” SELECT * ์ด๋‹ˆ๊นŒ id๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ํ•ด๋‹น row์˜ ์ปฌ๋Ÿผ ์ „์ฒด๋ฅผ ํฌ์ŠคํŠธ์ž‡์—๋‹ค ์ ๊ณ  ์ •๋ ฌ์„ ํ•ด์•ผํ•  ๊ฒƒ์ด๋‹ค.

 

Do we work harder to sort the data?

YES. SELECT id ์ฟผ๋ฆฌ์™€ ๊ฐ™์€ ์ˆ˜์˜ row๋ฅผ ์ •๋ ฌํ•˜๊ธฐ๋Š” ํ•˜์ง€๋งŒ, ์ด๋ฒˆ ์ฟผ๋ฆฌ์˜ ๊ฒฝ์šฐ ์šฐ๋ฆฌ๊ฐ€ ์ •๋ ฌํ•˜๋ ค๋Š” ํŽ˜์ด์ง€ ์Šคํƒ์˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ํ›จ์”ฌ ํฌ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์šฐ๋ฆฌ๋Š” ๊ทธ ๋ฐ์ดํ„ฐ๋“ค์„ CPU ์บ์‹œ์—๋‹ค ๋†“๊ณ  ์„ž์„ ๊ฑด๋ฐ ์ด๊ฒƒ๋„ ์ถ”๊ฐ€์ ์ธ ๋น„์šฉ์ด ๋“œ๋Š” ๊ฒƒ์ด๋‹ค.

 

Do we work harder to output the data?

YES. ํ•„์š” ์—†๋Š” ์ปฌ๋Ÿผ์€ ์ตœ๋Œ€ํ•œ ์•ˆ ๊ฐ€์ ธ์˜ค๋ฉด ์ข‹๊ธด ํ•œ๋ฐ ์‚ฌ์‹ค์ƒ ์ด ์ปฌ๋Ÿผ๋“ค์„ output ํ•˜๋Š” ๋น„์šฉ์€ ์ƒ๊ฐ๋ณด๋‹ค ๊ทธ๋ ‡๊ฒŒ ํฌ์ง€ ์•Š๋‹ค๊ณ  ํ•œ๋‹ค. ์‚ฌ์‹ค ์ด ์ฟผ๋ฆฌ ์„ฑ๋Šฅ์„ ์•ˆ ์ข‹๊ฒŒ ๋งŒ๋“œ๋Š” ๊ฑด sorting์ด ๋Œ€๋ถ€๋ถ„์„ ์ฐจ์ง€ํ•œ๋‹ค.

 

 

sorting์— 95%์˜ ๋น„์šฉ์„ ์“ฐ๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. SELECT id ์ฟผ๋ฆฌ์™€ SELECT * ์ฟผ๋ฆฌ์˜ Estimated Subtree Cost ๋น„๊ตํ–ˆ์„ ๋•Œ๋„ ์ฐจ์ด๊ฐ€ ๋งŽ์ด ๋‚˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

SELECT id ์ฟผ๋ฆฌ์˜ SELECT ๋น„์šฉ

 

SELECT * ์ฟผ๋ฆฌ์˜ SELECT ๋น„์šฉ

 

โ–ถ Lesson

Sorting data is expensive, and more fields make it worse.

 

์ด ์ฟผ๋ฆฌ๊ฐ€ ์—ฌ๋Ÿฌ ๋ฒˆ ๋Œ์ˆ˜๋ก ๋” ๋ณ„๋กœ์ผ ๊ฒƒ์ด๋‹ค. ์ด ์ฟผ๋ฆฌ๋ฅผ 100๋ฒˆ ๋Œ๋ฆฐ๋‹ค๊ณ  ํ•ด๋ณด์ž.

 

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
GO 100

 

ํŒ€์žฅ๋‹˜๊ป˜์„œ ๋‚˜ํ•œํ…Œ ์‚ฌ์›๋“ค ์ธ์‚ฌ๊ธฐ๋ก์ง€๋ฅผ ๋‹ค ๊ฐ€์ ธ์˜ค๋ผ๊ณ  (LastAccessDate > '2014/07/01'์ธ ๊ฒƒ๋งŒ) ํ•˜์‹œ๋Š”๋ฐ ์ด๊ฑธ 100๋ฒˆ ํ•ด์•ผ ํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‚ฌ์ „์— ํŒ€์žฅ๋‹˜๊ป˜์„œ ๋„Œ ์ด ์ง“์„ 100๋ฒˆ ํ•  ๊ฒƒ์ด๋ผ๊ณ  ๋ฏธ๋ฆฌ ๊ท€๋”ํ•ด์ฃผ์…จ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜์ž. (์šฐ๋ฆฌ๊ฐ€ ์ฟผ๋ฆฌ์— 100๋ฒˆ ํ• ๊ฑฐ๋ผ๊ณ  ์ผ๋“ฏ์ด) 

๋งŒ์•ฝ ๋‚ด๊ฐ€ ์ด ์ฟผ๋ฆฌ ๋‚ด์šฉ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค๊ณ  ํ•˜๋ฉด, ์•ž์œผ๋กœ ๊ฐ™์€ ๊ฑธ 100๋ฒˆ ํ•  ๊ฑธ ์•„๋‹ˆ๊นŒ ํ•„ํ„ฐ๋งํ•˜๊ณ  ์ •๋ ฌํ•ด๋†“์€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทผ์ฒ˜ ๊ฐ€๊นŒ์šด๋ฐ๋‹ค๊ฐ€ ์Ÿ์—ฌ๋†“๊ณ  ๋‹ค์Œ ๋ฒˆ์— ์žฌ์‚ฌ์šฉํ•˜์ง€ ์•Š์„๊นŒ?

โž” ์˜ค๋ผํด์€ ์ด๊ฒŒ ๊ฐ€๋Šฅํ•˜๋‹ค. (core ๋‹น 47,000๋‹ฌ๋Ÿฌ ์งœ๋ฆฐ๋ฐ ๊ฐ€๋Šฅํ•ด์•ผ๊ฒ ์ง€)

 

์ด ๋ถ„ ์˜์ƒ์„ ๋ณด๋‹ค๋ณด๋ฉด ์˜ค๋ผํด์ด ์„ธ๊ณ„์—์„œ ์ œ์ผ ๋น„์‹ผ DBMS๋ผ๋Š” ๋ง์ด ์ข…์ข… ๋‚˜์˜จ๋‹ค. ์˜ค๋ผํด์ด ๊ทธ ๋ˆ ๋ฐ›์•„์„œ ๋ญํ•ด๋จน๊ณ  ์‚ฌ๋Š”์ง€๋„ ์•„๋ž˜ ํ™”๋ฉด๊ณผ ํ•จ๊ป˜ ์–ธ๊ธ‰์ด ๋˜๊ณค ํ•œ๋‹ค.

 

 

์˜ค๋ผํด ๋ˆ ์ง„์งœ ์ž˜ ๋ฒŒ๊ธด ํ•˜๋‚˜๋ณด๋‹ค. 

 

SQL Server๋Š” query output์ด ์•„๋‹Œ ๋ฐ์ดํ„ฐ ํŽ˜์ด์ง€๋ฅผ ์ €์žฅํ•œ๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— SQL Server๋Š” ์ € ์ฟผ๋ฆฌ๊ฐ€ 100๋ฒˆ ๋Œ๋ฉด 100๋ฒˆ์˜ sorting์„ ํ•œ๋‹ค. 

๊ทธ๋Ÿฌ๋ฉด ์ด์ œ ์–ด๋–ป๊ฒŒ ์ด ์ฟผ๋ฆฌ๋ฅผ ๋นจ๋ผ์ง€๊ฒŒ ํ•  ์ˆ˜ ์žˆ์„๊นŒ?

โž” Nonclustered index๋ฅผ ์ด์šฉํ•˜๋ฉด ๋œ๋‹ค.

 

Nonclusterd indexes: copies.

- Stored in order we want, include the fields we want

 

์ด๋ฒˆ์—๋Š” ๊ฒ€์€์ƒ‰ ๋ฐ์ดํ„ฐ ํŽ˜์ด์ง€๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

 

 

CREATE INDEX IX_LastAccessDate_Id 
ON dbo.Users (LastAccessDate, Id)

 

์œ„์˜ ๊ฒ€์€์ƒ‰ ํŽ˜์ด์ง€๋Š” IX_LastAccessDate_Id ์ธ๋ฑ์Šค ๋ฐ์ดํ„ฐ์˜ ๋ชจ์Šต์ด๋‹ค. ์ด์ œ ์šฐ๋ฆฌ๋Š” Users ํ…Œ์ด๋ธ”์˜ ๋‘ ๊ฐœ์˜ ๋ฌผ๋ฆฌ์  ์นดํ”ผ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ์…ˆ์ด๋‹ค.

  • PK_Users_Id : id๋กœ ์ •๋ ฌ๋œ clustered index. ๋ชจ๋“  ์ปฌ๋Ÿผ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Œ.
  • IX_LastAccessDate_Id : LastAccessDate, Id๋กœ ์ •๋ ฌ๋œ nonclustered index. ์ธ๋ฑ์Šค ์ปฌ๋Ÿผ(LastAccessDate, Id)์˜ ๋ฐ์ดํ„ฐ๋งŒ ๊ฐ€์ง€๊ณ  ์žˆ์Œ.

 

 

๊ทธ๋Ÿฌ๋ฉด ์„ธ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋ถ€ํ„ฐ ๋‹ค์‹œ ํ•ด ์‹คํ–‰ํ•ด๋ณด์ž.

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

 

 

LastAccessDate๋กœ ์ •๋ ฌํ•ด๋†“์€ ๋ฐ์ดํ„ฐ ์นดํ”ผ(IX_LastAccessDate_Id ์ธ๋ฑ์Šค)๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ, LastAccessDate > '2014/07/01' ์กฐ๊ฑด์— ๋งž๋Š” ์‹œ์ž‘์ง€์ ์„ ์•Œ๊ณ  ๋ฐ”๋กœ ์ง„์ž…ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๊ทธ ์นดํ”ผ์— ํ•จ๊ป˜ ์žˆ๋Š” Id๋ฅผ ์ˆœ์„œ๋Œ€๋กœ ์ฝ์–ด์˜ฌ ๊ฒƒ์ด๋‹ค. ๋ฐ์ดํ„ฐ ์ž์ฒด๋Š” ์ด๋ฏธ LastAccessDate๋กœ ์ •๋ ฌ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ณ„๋„์˜ sorting ๊ณผ์ •์€ ํ•„์š” ์—†๋‹ค.

 

 

Estimated Subtree Cost๊ฐ€ ์—„์ฒญ ์ค„์–ด๋“  ๊ฑธ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. โž” It's cheap.

 

 

Why cheaper?

For starters, it does less logical reads...

 

์ธ๋ฑ์Šค๊ฐ€ ์žˆ๊ณ  ์—†์„ ๋•Œ์˜ CPU time์„ ๋น„๊ตํ•ด๋ณด์ž.

 

 

time์ด๋ผ๋Š” ๊ฑด ์‚ฌ์‹ค ๊ทธ๋ ‡๊ฒŒ ๋ฏฟ์„ ๋งŒํ•œ ์š”์ธ์ด ๋ชป ๋œ๋‹ค. (๋‹ค๋ฅธ ์ฟผ๋ฆฌ ๋•Œ๋ฌธ์— ๋Šฆ์–ด์งˆ ์ˆ˜๋„ ์žˆ๊ณ , ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ํ™˜๊ฒฝ์  ์š”์ธ ๋“ฑ์œผ๋กœ ์ธํ•ด time์€ ๋Š˜ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค.) ์ง€๊ธˆ์€ index scan๊ณผ index seek์˜ ๋‹จ์ˆœ ๋น„๊ต๋ฅผ ์œ„ํ•ด ์‚ฌ์šฉํ•ด๋ณธ ๊ฒƒ์ด๋‹ค.

 

Covering index

์„ธ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋ฅผ ๋ณด๋ฉด Users ํ…Œ์ด๋ธ”์—์„œ LastAccessDate ์ปฌ๋Ÿผ๊ณผ Id ์ปฌ๋Ÿผ๋งŒ์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๋‹ค. ๊ณต๊ต๋กญ๊ฒŒ๋„ ๋ชจ๋‘ IX_LastAccessDate_Id ์ธ๋ฑ์Šค์— ์žˆ๋Š” ์ปฌ๋Ÿผ๋“ค์ด๋‹ค.

โž” ์ฟผ๋ฆฌ์— ํ•„์š”ํ•œ ์ปฌ๋Ÿผ์ด ๋ชจ๋‘ ํ•œ ์ธ๋ฑ์Šค ์•ˆ์— ์žˆ์„ ๋•Œ, ์ด ์ธ๋ฑ์Šค๋ฅผ ์ปค๋ฒ„๋ง ์ธ๋ฑ์Šค๋ผ๊ณ  ํ•œ๋‹ค. 

The index covers the fields needed by the query, so we call it a covering index.

*But covering isn't really a special kind of index - it's only covering when we're talking about a query.

 

So nonclustered index seeks are great, right?

์ด๋ ‡๊ฒŒ๋Š” ์„ฃ๋ถˆ๋ฆฌ ๊ฒฐ๋ก ์ง“์ง€๋Š” ์•Š์•˜์œผ๋ฉด ์ข‹๊ฒ ๋‹ค. 

์ด๋ ‡๊ฒŒ๋“ค ์˜คํ•ดํ•˜๋Š” ์ด์œ ๊ฐ€ "seek"๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ์ฃผ๋Š” ๋А๋‚Œ ๋•Œ๋ฌธ์ธ ๊ฒƒ ๊ฐ™๋‹ค. "seek"๋ผ๊ณ  ํ•˜๋ฉด ๋ญ”๊ฐ€ ์ž‘์„ ๊ฒƒ ๊ฐ™์€ ๋А๋‚Œ์„ ์ค€๋‹ค. ๊ทธ์น˜๋งŒ ๊ฒฐ๊ณผ ๋ฐ์ดํ„ฐ๋ฅผ ๋ดค๋“ฏ์ด(14๋งŒ ๊ฑด) ๋ฐ์ดํ„ฐ๋Ÿ‰์€ ์ธ๋ฑ์Šค๋ฅผ ์“ฐ๋“  ์•ˆ ์“ฐ๋“  ๋™์ผํ•˜๊ฒŒ ๋งŽ๋‹ค.

 

You probably think "seek" means, "I'm going to jump to a row and read that one row."

You probably think "scan" means, "I'm going to read the whole thing."

โž” ์ด๊ฑฐ์Šจ ใ„ดใ„ด

"seek"๋ผ๋Š” ๋ง์˜ ์˜๋ฏธ๋Š” "๋‚˜๋Š” ๋‚ด๊ฐ€ ์–ด๋””์„œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์•ผํ•˜๋Š”์ง€๋ฅผ(starting point) ์•Œ๊ณ  ์žˆ๋‹ค"๋ผ๋Š” ์˜๋ฏธ์ด๋‹ค.

 

Seek means,

"I'm going to jump to a row and start reading."

Scan means,

"I'm going to start at either end of the object(might be either the start, or the end) and start reading."

Neither terms defines how many rows will be read.

 

์ ์€ ์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ์–ด์˜ฌ ๋•Œ์˜ scan์ด seek ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ์ข‹์„ ์ˆ˜ ์žˆ๋‹ค.

 

 

Seeks vs Scans

A seek can start at the first row, and read the entire table.

A scan can start at one end of the table, and only read a few pages.

 

We can't just aty, "All index seeks! We're done."

 

โ–  ๋‹ค์„ฏ ๋ฒˆ์งธ ์ฟผ๋ฆฌ

SELECT Id ์ฟผ๋ฆฌ์— ๋ช‡ ๊ฐ€์ง€ ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด๋ณด์ž.

 

SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

 

์ด ์ฟผ๋ฆฌ์—๋Š” IX_LastAccessDate_Id ์ธ๋ฑ์Šค์— ์—†๋Š” ์ปฌ๋Ÿผ๋“ค์ด ํฌํ•จ๋˜์–ด ์žˆ๋‹ค. 

โž” ์ด ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” IX_LastAccessDate_Id ์ธ๋ฑ์Šค๋งŒ์œผ๋กœ๋Š” ์ถฉ๋ถ„ํ•˜์ง€ ์•Š๋‹ค๋Š” ์˜๋ฏธ

์ด ๊ฒฝ์šฐ, ์•„๋ž˜์™€ ๊ฐ™์€ ์ˆœ์„œ๋กœ ์‹คํ–‰์„ ํ•  ๊ฒƒ์ด๋‹ค.

 

1. IX_LastAccessDate_Id ์ธ๋ฑ์Šค์—์„œ LastAccessDate๊ฐ€ 2014/07/01์ธ ๊ณณ(์‹œ์ž‘์ )์„ ์ฐพ๋Š”๋‹ค.

2. ์กฐ๊ฑด์— ๋งž๋Š” ๋ฐ์ดํ„ฐ๋“ค์˜ Id์™€ LastAccessDate๋ฅผ ๋”ฐ๋กœ ์ ์–ด๋‘”๋‹ค.

3. PK_Users_Id๋กœ ๊ฐ€์„œ ๋”ฐ๋กœ ์ ์–ด๋‘” Id์— ํ•ด๋‹นํ•˜๋Š” row๋ฅผ ์ฐพ์•„์„œ DisplayName๊ณผ Age๋ฅผ ๊ฐ€์ ธ์˜จ๋‹ค.

โž” key lookup์ด๋ผ๊ณ  ํ•œ๋‹ค.

 

 

SQL Server always includes your clustering keys wheter you ask for'em or not because it has to join indexes together.

nonclustered index๋ฅผ ๋งŒ๋“ค ๋•Œ Id๋ฅผ ์•ˆ ํฌํ•จ์‹œ์ผœ๋„ key๋ผ๋ฉด ์ž๋™์œผ๋กœ ํฌํ•จ๋œ๋‹ค.

 

Classic index tuning sign

key lookup์€ ์ธ๋ฑ์Šค์— ์ฟผ๋ฆฌ์— ํ•„์š”ํ•œ ๋ชจ๋“  ์ปฌ๋Ÿผ์ด ํฌํ•จ๋˜์–ด ์žˆ์ง€ ์•Š์„ ๋•Œ ๋ฐœ์ƒํ•œ๋‹ค. key lookup ์‹คํ–‰๊ณ„ํš์— ๋งˆ์šฐ์Šค๋ฅผ ์˜ฌ๋ ค์„œ Output List๋ฅผ ํ™•์ธํ•œ๋‹ค.

์ปฌ๋Ÿผ์ด ์ž‘๊ฑฐ๋‚˜ ์‚ฌ์šฉ ๋นˆ๋„๊ฐ€ ๋†’์œผ๋ฉด ์ธ๋ฑ์Šค์— ํฌํ•จ ใ„ฑใ„ฑ

 

 

SQL Server์˜ ํŠน์ง• ์ค‘ ํ•˜๋‚˜๋Š” ์‹คํ–‰๊ณ„ํš์„ ๋งŒ๋“ค ๋•Œ ์›๋ณธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋“ค์–ด์žˆ๋Š” ํ…Œ์ด๋ธ”์„ ๋ณด๊ณ  ๋งŒ๋“ค์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์ฟผ๋ฆฌ๊ฐ€ ์‹คํ–‰๋˜๊ธฐ ์ „์—(์ฟผ๋ฆฌ๊ฐ€ ์‹คํ–‰๋  ๋•Œ๊ฐ€ ์•„๋‹ˆ๋ผ) ๋ฏธ๋ฆฌ ๊ณ„ํš์„ ์งœ๋†“์•„์•ผ ํ•œ๋‹ค.

์ด๋ ‡๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ…Œ์ด๋ธ” ์•ˆ์— ๋ญ๊ฐ€ ๋“ค์–ด์žˆ๋Š”์ง€์— ๋Œ€ํ•œ rough idea๊ฐ€ ํ•„์š”ํ•˜๋‹ค. 

โž” ํ†ต๊ณ„์น˜(statistics)๊ฐ€ ํ•„์š”ํ•œ ์ด์œ 

 

์šฐ๋ฆฌ๊ฐ€ ์ธ๋ฑ์Šค๋ฅผ ์ƒ์„ฑํ•  ๋•Œ๋งˆ๋‹ค SQL Server๋Š” ์ธ๋ฑ์Šค์™€ ๋™์ผํ•œ ์ด๋ฆ„์˜ statistics๋ฅผ ํ•จ๊ป˜ ๋งŒ๋“ ๋‹ค.

 

 

statistics๋Š” ์ธ๋ฑ์Šค ๋‚ด์šฉ์— ๊ด€ํ•œ 1 ํŽ˜์ด์ง€(8KB)์งœ๋ฆฌ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ์ด๋‹ค. SQL Server๊ฐ€ ์ธ๋ฑ์Šค ๋‚ด์šฉ์„ ๋ณด๊ณ  ์ด ์ธ๋ฑ์Šค๊ฐ€ ๋ญํ•˜๋Š” ์•ค์ง€ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋‚จ๊ฒจ๋‘” ๋…ธํŠธ ์ •๋„๋กœ ์ƒ๊ฐํ•˜๋ฉด ๋  ๊ฒƒ ๊ฐ™๋‹ค.

 

Statistics help SQL Server:

  • Decide which index to use
  • What order to process tables/indexs in 
  • Whether to do seeks or scans
  • Guess how many rows will match you query
  • How much memory to allocate for the query

 

statistics๋Š” ์–ด๋–ป๊ฒŒ ์ƒ๊ฒผ์„๊นŒ?

 

 

์ฒซ ๋ฒˆ์งธ result : ํ…Œ์ด๋ธ” ๋˜๋Š” ์ธ๋ฑ์Šค์˜ ๊ธฐ๋ณธ ์ •๋ณด (at the time that the stat was updated)

๋‘ ๋ฒˆ์งธ result : ์ธ๋ฑ์Šค ์ปฌ๋Ÿผ ์ •๋ณด

์„ธ ๋ฒˆ์งธ result : ํžˆ์Šคํ† ๊ทธ๋žจ ์ •๋ณด

 

LastAccessDate > '2014/07/01' ์˜ ์˜ˆ์ƒ ํ–‰์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•  ๋•Œ, SQL Server๋Š” ํžˆ์Šคํ† ๊ทธ๋žจ์—์„œ DISTINCT_RANGE_ROWS๋ฅผ ๋ณธ๋‹ค. ๊ฑฐ๊ธฐ์„œ ์กฐ๊ฑด์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ ๊ฑด์ˆ˜์˜ ํ•ฉ์„ ๋‹ค ํ•ฉ์นœ๋‹ค.

 

 

๊ธฐ์–ตํ•ด์•ผํ•  ๊ฒƒ์€ statistics๋Š” 1 ํŽ˜์ด์ง€(8KB)์งœ๋ฆฌ์ด๊ณ , ์ด๊ฒƒ์€ ์ตœ๋Œ€ 201๊ฐœ์˜ bucket(๋˜๋Š” step)๊นŒ์ง€๋งŒ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. 

 

 

์ด stats์—์„œ ์‚ฌ์šฉํ•œ bucket(step)์€ 134๊ฐœ์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ž˜์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋‹ค์–‘ํ•ด์ง€๊ฑฐ๋‚˜ ํžˆ์Šคํ† ๋ฆฌ๊ฐ€ ๋งŽ์ด ์Œ“์ผ์ˆ˜๋ก ๋ฒ„ํ‚ท 200๊ฐœ๋กœ๋Š” ๋ฐ์ดํ„ฐ ์ „์ฒด๋ฅผ ์ •ํ™•ํ•˜๊ฒŒ ํ‘œํ˜„ํ•˜๊ธฐ์— ๋ถ€์กฑํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 

 

Two ways you can help

1. Keep your stats updated at least weekly. Automatic stats updates aren't enough. Consider Ola Hallengren's free scripts: Ola. Hallengren.com

  • SQL Server๋Š” ํ…Œ์ด๋ธ” ๋‚ด ๋ฐ์ดํ„ฐ์˜ 20%๊ฐ€ ๋ณ€๊ฒฝ๋˜๋ฉด stats์„ ์—…๋ฐ์ดํŠธ ํ•œ๋‹ค. (SQL Server ๋ฒ„์ „ ๋ณ„๋กœ ์ด ์ˆ˜์น˜๋Š” ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Œ) ๊ทธ๋ ‡์ง€๋งŒ ์„ฑ๋Šฅ ๋ฌธ์ œ๋ฅผ ๋นจ๋ฆฌ ํ•ด๊ฒฐํ•ด์•ผํ•˜๋Š”๋ฐ SQL Server๊ฐ€ ์ž๋™์œผ๋กœ ํ†ต๊ณ„๋ฅผ ์—…๋ฐ์ดํŠธํ•ด์ค„ ๋•Œ๊นŒ์ง€(๋ฐ์ดํ„ฐ์˜ 20%๊ฐ€ ๋ณ€๊ฒฝ๋  ๋•Œ๊นŒ์ง€) ๊ธฐ๋‹ค๋ฆฌ๋Š” ๊ฑด ์ข€ ์•„๋‹ˆ๋‹ค. ๊ทธ๋ณด๋‹ค ์ž์ฃผ ์‚ฌ์šฉ์ž๊ฐ€ ์—…๋ฐ์ดํŠธํ•  ์ˆ˜ ์žˆ๋‹ค. 

2. Learn which T-SQL elements will cause cardinality estimation problems, ignoring statistics

  • ๋น„์Šทํ•œ ๋™์ž‘์„ ํ•˜๋Š” ์ฟผ๋ฆฌ๋ผ๋„ ์‹คํ–‰๊ณ„ํš์ด ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๋‹ค.

<One idea, written differently>

 

์œ„์˜ ์ฟผ๋ฆฌ๋ฅผ ๋ณด๋ฉด ๋‘ ์ฟผ๋ฆฌ ๋ชจ๋‘ ๋น„์Šทํ•œ ๋™์ž‘์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์ด๊ณ  ๊ฒฐ๊ณผ ๊ฑด์ˆ˜๋„ ๋™์ผํ•˜๋‹ค. ํ•˜์ง€๋งŒ ์‹คํ–‰๊ณ„ํš์€ ๋‹ค๋ฅด๋‹ค. ์ฒซ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋Š” ์˜ˆ์ƒ ๊ฑด์ˆ˜๋ฅผ 2262๋กœ ์žก์•˜๊ณ , ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์ง€ ์•Š๋‹ค๊ณ  ํŒ๋‹จํ•ด index scan์„ ํ•œ ๊ฒƒ ๊ฐ™๋‹ค. ๋‘ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋Š” ์˜ˆ์ƒ ๊ฑด์ˆ˜๋ฅผ 1๊ฑด์œผ๋กœ ์žก์•˜๊ณ , ๋ฐ์ดํ„ฐ๊ฐ€ 1๊ฑด ๋ฐ–์— ์—†์œผ๋‹ˆ index seek + ๋ฃฉ์—…์„ ํ•˜์ž๊ณ  ํŒ๋‹จํ–ˆ๋˜ ๊ฒƒ ๊ฐ™๋‹ค.

Both produce the same 2,443 rows, but they use 2 different ways to retrieve thosde rows due to their different estimates.

 

SQL Server has to decide between:

  • Scanning the entire table, which is great for big data, or
  • An index seek + key lookup, which is better for small data

โž” Cardinality estimation

We can avoid this problem by widening our nonclustered index.

 

CREATE INDEX IX_LastAccessDate_Id_DisplayName_Age
ON dbo.Users (LastAccessDate, Id, DisplayName, Age)

Or:

CREATE INDEX IX_LastAccessDate_Id_Includes
ON dbo.Users (LastAccessDate, Id)
INCLUDE (DisplayName, Age)

 

์ฒซ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋Š” ์ด ์ธ๋ฑ์Šค๋ฅผ LastAccessDate, Id, DisplayName, Age ์ปฌ๋Ÿผ ์ˆœ์œผ๋กœ ์ •๋ ฌํ•˜์—ฌ ์ƒ์„ฑํ•˜๊ฒ ๋‹ค๋Š” ์ฟผ๋ฆฌ์ด๋‹ค. ๋‘ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋Š” ์ด ์ธ๋ฑ์Šค๋ฅผ LastAccessDate, Id ์ปฌ๋Ÿผ ์ˆœ์œผ๋กœ๋งŒ ์ •๋ ฌํ• ๊ฑด๋ฐ ๊ทธ ์˜†์—๋‹ค๊ฐ€ DisplayName๊ณผ Age๋ฅผ ํฌํ•จ์‹œํ‚ค๊ฒ ๋‹ค๋Š” ์ฟผ๋ฆฌ์ด๋‹ค.

 

 

INCLUDE๋กœ ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜๋ฉด ๋ฐ์ดํ„ฐ์˜ ์ •๋ ฌ ์ˆœ์„œ์™€๋Š” ๋ฌด๊ด€ํ•˜๋ฏ€๋กœ, ๋ฐ์ดํ„ฐ๊ฐ€ ๋ณ€๊ฒฝ๋ ๋•Œ๋งˆ๋‹ค ๋ฐ์ดํ„ฐ๋ฅผ ์žฌ์ •๋ ฌํ•˜์ง€ ์•Š์•„๋„ ๋˜๋ฏ€๋กœ ํšจ๊ณผ์ ์ด๋ผ๊ณ  ์ฐฉ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ ‡์ง€๋Š” ์•Š๋‹ค.

์œ„ ํšŒ์ƒ‰ ๋ฐ์ดํ„ฐ๋Š” IX_LastAccessDate_Id_DisplayName_Age ์ธ๋ฑ์Šค์˜ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์—ฌ๊ธฐ์„œ Id๊ฐ€ 643์ธ Sarcastic์˜ Age๋ฅผ 30์—์„œ 31๋กœ ๋ณ€๊ฒฝํ•ด์•ผ ํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž. (์šฐ์ธก ๋‘ ๋ฒˆ์งธ ์ค„)

 

 

์ •๋ ฌ์ˆœ์„œ์— Age ์ปฌ๋Ÿผ์ด ํฌํ•จ๋˜์–ด ์žˆ๋Š” ์ฒซ ๋ฒˆ์งธ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ ๋ณ€๊ฒฝ๋  ๋•Œ๋งˆ๋‹ค ๋ฐ์ดํ„ฐ๊ฐ€ ์žฌ์ •๋ ฌ๋  ๊ฒƒ์ด๊ณ , INCLUDE๋กœ Age๋ฅผ ํฌํ•จ์‹œํ‚จ ๋‘ ๋ฒˆ์งธ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์žฌ์ •๋ ฌํ•˜์ง€ ์•Š์„ ๊ฒƒ์ด๋ผ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ์œ„์˜ Sarcastic์˜ Age๊ฐ€ 30์—์„œ 31๋กœ ๋ณ€๊ฒฝ๋œ๋‹ค๊ณ  ํ–ˆ์„ ๋•Œ ๋ฐ์ดํ„ฐ์˜ ์ด๋™์€ ์ผ์–ด๋‚˜์ง€ ์•Š๋Š”๋‹ค.

โž” ์™œ๋ƒํ•˜๋ฉด ์ธ๋ฑ์Šค์˜ ์ฒซ ๋ฒˆ์งธ ์ปฌ๋Ÿผ(LastAccessDate)์ด selectiveํ•˜๊ธฐ ๋•Œ๋ฌธ์—(id๊ฐ€ 643์ธ ์‚ฌ๋žŒ๊ณผ LastAccessDate๊ฐ€ ๊ฐ™์€ ์‚ฌ๋žŒ์ด ์—†์Œ), ์ด ์‚ฌ๋žŒ์˜ DisplayName, Id, Age๋ฅผ ๋ณ€๊ฒฝํ•œ๋‹ค๊ณ  ํ•ด๋„ ๋ฐ์ดํ„ฐ๋Š” ์ด๋™ํ•˜์ง€ ์•Š๋Š”๋‹ค.

โž” ์ด๋ž˜์„œ ์ธ๋ฑ์Šค์˜ ์ฒซ ๋ฒˆ์งธ ์ปฌ๋Ÿผ์„ selectiveํ•œ ์ปฌ๋Ÿผ(uniqueํ•œ ์ปฌ๋Ÿผ)์œผ๋กœ ์žก์œผ๋ผ๊ณ  ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. (๋งŒ์•ฝ ์„ฑ๋ณ„ ์ปฌ๋Ÿผ์ด ์ธ๋ฑ์Šค์˜ ์ฒซ ๋ฒˆ์งธ ์ปฌ๋Ÿผ์ด๋ผ๋ฉด ๋ฌธ์ œ๊ฐ€ ๋งŽ์„ ๊ฒƒ์ด๋‹ค.)

 

 

์ด์ œ ์ด 3๊ฐœ์˜ ์ธ๋ฑ์Šค(=3๊ฐœ์˜ ๋ฐ์ดํ„ฐ ์นดํ”ผ๋ณธ)๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋˜์—ˆ๋‹ค. ํšŒ์ƒ‰ ์ธ๋ฑ์Šค(IX_LastAccessDate_Id_DisplayName_Age)๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ ์ด์ œ ๊ฒ€์€์ƒ‰ ์ธ๋ฑ์Šค(IX_LastAccessDate_Id)๋Š” drop ํ•˜๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

 

Recap

  • SET STATISTICS IO ON: shows # of 8KB pages read
  • SET STATISTICS TIME ON: shows CPU work done
  • WHERE without a supporting index: table scan
  • ORDER BY  without a supporting index: CPU work
  • Indexes reduce page read and sorts
  • Seek =! awesome, and scan!= terribad
  • Index seek + key lookup = we may need wider indexes
  • Statistics help SQL Server pick indexes, methods
  • Cardinality estimation isn't perfect (especially with real-world T-SQL and joins to multiple tables)
  • You can help by understanding SQL's limitations and crafting your T-SQL to avoid them

 

 

๊ทธ๋ƒฅ ๋“ฃ๊ธฐ๋งŒ ํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ํ•œ ์„ธ ๋ฐฐ๋Š” ๋” ๊ฑธ๋ฆฐ ๊ฒƒ ๊ฐ™์€ ๋А๋‚Œ์ด๋‹ค. ์ด์ œ ์ฒซ ๋ฒˆ์งธ ์ˆ˜์—…์„ ๋๋ƒˆ๋‹ค๋Š” ๊ฒƒ์ด ๋ฏฟ๊ธฐ์ง€ ์•Š๋Š”๋‹ค. ์•ž์œผ๋กœ ๊ฐˆ ๊ธธ์ด ๋ฉ€๋‹ค. ์—ด์‹ฌํžˆ ํ•˜์ž.