您當前位置：首頁 > 數據庫 > 數據庫應用 > 關于動態抽樣（Dynamic Sampling）

關于動態抽樣（Dynamic Sampling）

來源：程序員人生發布時間：2015-01-14 09:04:58 閱讀次數：4288次

關于動態抽樣（Dynamic Sampling）

原文：http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

本文將回答：甚么是動態抽樣？動態抽樣有啥作用？和不同級別的動態抽樣的意思？

1、甚么是動態采樣？
動態抽樣從 oracle 9i第2版引入。它使得優化器（CBO）在硬解析期間有能力抽樣1個未分析的表
（any table that has been created and loaded but not yet analyzed）的統計（決定表默許統計），并且可以驗證優化器的”料想“。
因其只在查詢硬解析期間為優化器動態生成更好的統計，得名動態采樣。

動態采樣提供11個設置級別。注意：9i中其默許值為1 到了10g默許值為2

2、動態采樣如何工作？
有兩種使用方式：
△ 設置OPTIMIZER_DYNAMIC_SAMPLING參數，可以再實例和會話級別設置動態采樣。
△ 使用DYNAMIC_SAMPLING hint

來看1下不使用動態采樣的日子怎樣過的
create table t
as
select owner, object_type
from all_objects
/

select count(*) from t;

COUNT(*)
------------------------
68076

code1: 禁用動態采樣視察默許基數

set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16010 | 437K| 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 16010 | 437K| 55 (0)| 00:00:01 |
--------------------------------------------------------------------------

--注意0級別即為禁用動態采樣，環境默許是開啟動態采樣的

履行計劃顯示基數：16010遠低于上面查詢的68076，明顯不靠譜。

code2: 更加接近顯示的基數
select * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 77871 | 2129K| 56 (2)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 77871 | 2129K| 56 (2)| 00:00:01 |
--------------------------------------------------------------------------

Note
------------------------------------------
- dynamic sampling used for this statement

code3: 被高估的基數
SQL> delete from t;
68076 rows deleted.

SQL> commit;
Commit complete.

SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16010 | 437K| 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 16010 | 437K| 55 (0)| 00:00:01 |
--------------------------------------------------------------------------

SQL> select * from t;

Execution Plan
-----------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 1 | 28 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
---------------------------------------
- dynamic sampling used for this statement

3、動態采樣什么時候幫助優化器驗證其猜想？
我們知道當使用DBMS_STATS搜集了表信息后，優化器會得到以下統計：
1）表，行數，平均行寬等；
2）單獨列，高低值，唯1值數量，直方圖（可能）等；
3）單獨索引，聚集因素，葉子塊數量，索引高度等。

但注意這里面缺少了某些關鍵統計信息，例如表中不同列數據之間的關聯！
假定你你有1個全球人口普查表！
1個屬性是：誕生月份MONTH_BORN_IN，另外一個屬性是：所屬星座ZODIAC_SIGN。搜集信息后，你問優化器誕生在11月份的人數？
假定12個月人數正常散布，那末優化器很快給出答案是全量數據的1/12！再問1個：星座是雙魚座的人數呢？答案也是1/12！
迄今為止優化器對答如流！！！nice work!
但是第3個問題來了：誕生在11月份并且星座是雙魚座的人數是多少呢？
明眼人轉下頭腦就知道答案是0（雙魚座2月19日－3月20日）！但是我們看優化器的答案：1/12/12！！！多么想入非非的答案，思惟定式！這樣就會誕生差的履行計劃，
也正是在此時我們的動態采樣開始干預：

code4: 創建摹擬數據
SQL > create table t
as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
from all_objects a
/
Table created.

SQL > create index t_idx on t(flag1,flag2);
Index created.

SQL > begin
dbms_stats.gather_table_stats
( user, 'T',
method_opt=>'for all indexed columns size 254' );
end;
/
PL/SQL procedure successfully completed.

SQL> select num_rows, num_rows/2,
num_rows/2/2 from user_tables
where table_name = 'T';

NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
-------- ---------- ------------
68076 34038 17019

code5：驗證1下上面的說法：
SQL> set autotrace traceonly explain
SQL> select * from t where flag1='N';

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 33479 | 3432K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 33479 | 3432K| 292 (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAG1"='N')

SQL> select * from t where flag2='N';

Execution Plan
----------------------------
Plan hash value: 1601196873

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 34597 | 3547K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 34597 | 3547K| 292 (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("FLAG2"='N')

--至此1切正常！so far, so good！

code5: here comes the problem
SQL> select * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
----------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17014 | 1744K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 17014 | 1744K| 292 (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

1 - filter("FLAG1" = 'N' AND "FLAG2" = 'N')

--驗證了我們前面說的優化器此時想入非非了

code7: 動態采樣聽令，開始參與
SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
-----------------------------
Plan hash value: 470836197

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 630 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 6 | 630 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 6 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

2 - access("FLAG1"='N' AND "FLAG2"='N')

code8: 我們打開SQL_TRACE會看到以下語句：
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"),
NVL(SUM(C2),:"SYS_B_01"), NVL(SUM(C3),:"SYS_B_02")
FROM
(SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
NO_PARALLEL_INDEX("T") */ :"SYS_B_03" AS C1, CASE WHEN "T"."FLAG1"=
:"SYS_B_04" AND "T"."FLAG2"=:"SYS_B_05" THEN :"SYS_B_06" ELSE :"SYS_B_07"
END AS C2, CASE WHEN "T"."FLAG2"=:"SYS_B_08" AND "T"."FLAG1"=:"SYS_B_09"
THEN :"SYS_B_10" ELSE :"SYS_B_11" END AS C3 FROM "T" SAMPLE BLOCK
(:"SYS_B_12" , :"SYS_B_13") SEED (:"SYS_B_14") "T") SAMPLESUB

可以看出來優化器在驗證其料想。。。

4、動態采樣級別：
現在列出11個級別，詳細請參考：http://docs.oracle.com/cd/B19306_01/server.102/b14211/stats.htm#i43032
1）Level 0: Do not use dynamic sampling.
0級：不使用動態采樣。

2）Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
1級：滿足以下條件則采樣所有沒被分析的表：
（1）查詢中最少有1個未分析表；
（2）這個未分析表被關聯另外1個表或出現在子查詢或非merge視圖中；
（3）這個未分析表有索引；
（4）這個未分析表有過剩動態采樣默許的數據塊數（默許是32塊）。

3）Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
2級：對所有未分析表進行動態采樣。采樣數據塊數量是默許數量的2倍。

4）Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
3級：在2級基礎上加上那些使用了料想選擇消除表，采樣數據塊數量等于默許數量。對未分析表，采樣數量2倍于默許數量。

5）Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
4級：在3級基礎上加上那些有單表謂詞關聯2個或多個列，采樣數據塊數量等于默許數量。對未分析表，采樣數量2倍于默許數量。

6）Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
5,6,7,8,9級在4級基礎上分別使用2,4,8,32,128倍于默許動態采樣數據塊數量。

7）Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
10級：在9級基礎上對表中所有數據塊進行采樣。

5、甚么時候合適采取動態采樣？

這是1個狡猾的問題，沒有1定使用經驗，還真不好意思說。
通常：
1）我們使用3和4級進行動態采樣。
2）如果我們SQL的解析時間很快但是履行時間巨慢，可以斟酌使用動態采樣。典型的就是數據倉庫系統。
3）OLTP系統中都是1個SQL重復履行，解析和履行都在瞬息之間，所以不建議使用高級別的動態采樣。這會給SQL帶來硬解析消耗。
這個時候可以斟酌SQL Profile，你可以理解為“靜態采樣”。

關于SQL Profiles參考：http://docs.oracle.com/cd/B28359_01/server.111/b28274/sql_tune.htm#PFGRF02605

-------------------------------------

Dylan Presents.

生活不易，碼農辛苦
如果您覺得本網站對您的學習有所幫助,可以手機掃描二維碼進行捐贈
程序員人生

------分隔線----------------------------

上一篇 MySQL慢查詢

下一篇 Android Logcat之過濾多個標簽

分享到:

------分隔線----------------------------

為碼而活

積分：4237

15粉絲

7關注

欄目熱點

日本搞逼视频_黄色一级片免费在线观看_色99久久_性明星video另类hd_欧美77_综合在线视频

關于動態抽樣（Dynamic Sampling）