- 論壇徽章:
- 0
|
[color="#02368d"]linux deadline I/O調(diào)度算法分析筆記
deadline算法的核心就是在傳統(tǒng)的電梯算法中加入了請(qǐng)求超時(shí)的機(jī)制,該機(jī)制主要體現(xiàn)在兩點(diǎn):(1)請(qǐng)求
超時(shí)時(shí),對(duì)超時(shí)請(qǐng)求的選擇。(2)沒有請(qǐng)求超時(shí)時(shí),當(dāng)掃描完電梯最后一個(gè)request后,準(zhǔn)備返回時(shí),對(duì)第一個(gè)request的選擇;谝陨蟽牲c(diǎn),平
衡了系統(tǒng)i/o吞吐量和響應(yīng)時(shí)間。此外,該算法開考慮到了讀操作對(duì)寫操作造成的饑餓。
算法核心數(shù)據(jù)結(jié)構(gòu):
struct deadline_data {
struct rb_root sort_list[2];
struct list_head fifo_list[2];
/*
* next in sort order. read, write or both are NULL
*/
struct request *next_rq[2];
unsigned int batching; /* number of sequential requests made */
sector_t last_sector; /* head position */
unsigned int starved; /* times reads have starved writes */
/*
* settings that change how the i/o scheduler behaves
*/
int fifo_expire[2];
int fifo_batch;
int writes_starved;
int front_merges;
};
sort_list:按照request中的sector號(hào)大小,把每個(gè)request組織在以sort_list為根的紅黑樹中。這樣方便快速查找?偣灿凶x寫兩棵樹。
fifo_list:按照超時(shí)先后順尋,把request鏈入filo_list,同樣也是分為讀寫兩個(gè)隊(duì)列。
因此,任何一個(gè)request,在未提交給設(shè)備的請(qǐng)求隊(duì)列之前,都會(huì)同時(shí)存在于以上兩個(gè)結(jié)構(gòu)中。
next_rq:指向sort_list中的下一個(gè)請(qǐng)求。
batching:調(diào)度算法可能連續(xù)提交多個(gè)請(qǐng)求,batching用于記錄當(dāng)前連續(xù)提交的request數(shù)目V灰猙atching
deadline I/O調(diào)度器中最重要的兩個(gè)方法是:deadline_add_request和deadline_dispatch_requests。它們分別向調(diào)度器隊(duì)列(非請(qǐng)求隊(duì)列)添加request和把調(diào)度器隊(duì)列中的請(qǐng)求分發(fā)到塊設(shè)備的請(qǐng)求隊(duì)列。
static void
deadline_add_request(struct request_queue *q, struct request *rq)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int data_dir = rq_data_dir(rq);
deadline_add_rq_rb(dd, rq);
/*
* set expire time (only used for reads) and add to fifo list
*/
rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
}
deadline_add_request把請(qǐng)求分別加入紅黑樹和fifo。并記錄請(qǐng)求的超時(shí)時(shí)間。這兒借用了request的donelist的next字段。因?yàn)樵赿eadline調(diào)度隊(duì)列的請(qǐng)求,絕不可能被調(diào)入請(qǐng)求完成隊(duì)列:
#define rq_set_fifo_time(rq,exp) ((rq)->donelist.next = (void *) (exp))
static int deadline_dispatch_requests(struct request_queue *q, int force)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int reads = !list_empty(&dd->fifo_list[READ]);
const int writes = !list_empty(&dd->fifo_list[WRITE]);
struct request *rq;
int data_dir;
/*
* batches are currently reads XOR writes
*/
if (dd->next_rq[WRITE])
rq = dd->next_rq[WRITE];
else
rq = dd->next_rq[READ];
if (rq) {
/* we have a "next request" */
if (dd->last_sector != rq->sector)
/* end the batch on a non sequential request */
dd->batching += dd->fifo_batch;
if (dd->batching fifo_batch)
/* we are still entitled to batch */
goto dispatch_request;
}
/*
* at this point we are not running a batch. select the appropriate
* data direction (read / write)
*/
if (reads) {
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;
data_dir = READ;
goto dispatch_find_request;
}
/*
* there are either no reads or writes have been starved
*/
if (writes) {
dispatch_writes:
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
dd->starved = 0;
data_dir = WRITE;
goto dispatch_find_request;
}
return 0;
dispatch_find_request:
/*
* we are not running a batch, find best request for selected data_dir
*/
if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
/*
* A deadline has expired, the last request was in the other
* direction, or we have run out of higher-sectored requests.
* Start again from the request with the earliest expiry time.
*/
rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
} else {
/*
* The last req was the same dir and we have a next request in
* sort order. No expired requests so continue on from here.
*/
rq = dd->next_rq[data_dir];
}
dd->batching = 0;
dispatch_request:
/*
* rq is the selected appropriate request.
*/
dd->batching++;
deadline_move_request(dd, rq);
return 1;
}
deadline_dispatch_requests是deadline算法的核心。主要分為三個(gè)部分:(1)確定是要分發(fā)讀還是寫
request。影響處理分發(fā)類型因素主要包括是否處于batching階段以及是否發(fā)生寫?zhàn)囸I。(2)確定方向以后,根據(jù)讀寫方向找到該方向上下一個(gè)請(qǐng)
求進(jìn)行分發(fā)。影響定位下一個(gè)請(qǐng)求因素主要包括請(qǐng)求是否超時(shí)。(3)找到合適的request后,調(diào)用deadline_move_request分發(fā)給塊
設(shè)備的請(qǐng)求隊(duì)列。
1.確定讀寫方向.
a.首先,根據(jù)是否處于batching來確定當(dāng)前處理讀寫的方向。因?yàn)槿绻幵赽atching過
程中,就意味著調(diào)度程序需要連續(xù)處理同一方向的請(qǐng)求。這樣可以有效增加系統(tǒng)吞吐量。因此,根據(jù)batching的方向,可以確定當(dāng)前處理請(qǐng)求的方向。而讀
寫batching是互斥的:
/*
* batches are currently reads XOR writes
*/
if (dd->next_rq[WRITE])
rq = dd->next_rq[WRITE];
else
rq = dd->next_rq[READ];
通過這個(gè)判斷,保證了batching只會(huì)在某一方向進(jìn)行,而不會(huì)交錯(cuò)。因?yàn)樵赿eadline_move_request中有:
dd->next_rq[READ] = NULL;
dd->next_rq[WRITE] = NULL;
dd->next_rq[data_dir] = deadline_latter_request(rq);
也就是在batching方向的next_rq才可能指向下一個(gè)request。
if (rq) {
/* we have a "next request" */
if (dd->last_sector != rq->sector)
/* end the batch on a non sequential request */
dd->batching += dd->fifo_batch;
if (dd->batching fifo_batch)
/* we are still entitled to batch */
goto dispatch_request;
}
如
果存在下一個(gè)request,也就是沒有掃描到電梯的末尾。則判斷該request是否和上一個(gè)request相連。如果相連,并且batching的
request數(shù)沒有超過fifo_batch,則當(dāng)前這個(gè)request就是我們要分發(fā)的request。因此直接跳到最后,把request分發(fā)到設(shè)
備請(qǐng)求隊(duì)列。此時(shí)將忽略寫?zhàn)囸I和超時(shí)的處理。如果不連續(xù),則要結(jié)束batching。
b.如果此時(shí)并不處于batching過程中,則根據(jù)是否造成寫?zhàn)囸I“超標(biāo)”來確定讀寫方向。
if (reads) {
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;
data_dir = READ;
goto dispatch_find_request;
}
/*
* there are either no reads or writes have been starved
*/
if (writes) {
dispatch_writes:
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
dd->starved = 0;
data_dir = WRITE;
goto dispatch_find_request;
}
調(diào)度器先處理read,也就是read請(qǐng)求優(yōu)先。但在處理過
程中考慮到了寫?zhàn)囸I。如果此時(shí)還有寫請(qǐng)求,則寫?zhàn)囸I計(jì)數(shù)+1,如果寫?zhàn)囸I次數(shù)大于了writes_starved,則寫?zhàn)囸I已經(jīng)“超標(biāo)”了,因此直接跳到
dispath_writes去處理寫請(qǐng)求。如果寫?zhàn)囸I沒有“超標(biāo)”,則繼續(xù)處理讀請(qǐng)求。
2.根據(jù)讀寫方向,找到當(dāng)前要處理的請(qǐng)求:
dispatch_find_request:
/*
* we are not running a batch, find best request for selected data_dir
*/
if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
/*
* A deadline has expired, the last request was in the other
* direction, or we have run out of higher-sectored requests.
* Start again from the request with the earliest expiry time.
*/
rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
} else {
/*
* The last req was the same dir and we have a next request in
* sort order. No expired requests so continue on from here.
*/
rq = dd->next_rq[data_dir];
}
dd->batching = 0;
在尋找指定方向上的請(qǐng)求時(shí),考慮了請(qǐng)求的超時(shí)時(shí)間。這就是deadline的算法核心所
在。調(diào)度器首先調(diào)用deadline_check_fifo來檢查隊(duì)列中隊(duì)首,也就是最老的一個(gè)請(qǐng)求是否超時(shí)。如果超時(shí)則指定當(dāng)前處理的請(qǐng)求為該超時(shí)的請(qǐng)
求。但如果沒有超時(shí),但已經(jīng)掃描了電梯的末尾:!dd->next_rq[data_dir]。此時(shí)需要返回到電梯首部。但與傳統(tǒng)的電梯算法不
同,deadline調(diào)度器不是返回到sector最小的request開始繼續(xù)掃描。而是返回到等待時(shí)間最久的那個(gè)requst,從那個(gè)request
開始,沿sector遞增方向繼續(xù)掃描。也就是說,如果超時(shí),或者掃描到電梯尾,都會(huì)返回來處理等待最久的request,并從這個(gè)request開始繼
續(xù)進(jìn)行電梯掃描。當(dāng)然,如果既沒有發(fā)生超時(shí),也沒有掃描到電梯末尾,則沿sector遞增方向上的下一個(gè)request就是當(dāng)前要處理的request。
3.找到要處理的request后,把它分發(fā)到塊設(shè)備的請(qǐng)求隊(duì)列。
整個(gè)deadline調(diào)度器比較簡(jiǎn)潔,總共只有400多行。它充分考慮了batching,寫?zhàn)囸I,請(qǐng)求超時(shí)這三大因素。在保證吞吐量的基礎(chǔ)上,有考慮到了響應(yīng)延時(shí)。
![]()
本文來自ChinaUnix博客,如果查看原文請(qǐng)點(diǎn):http://blog.chinaunix.net/u2/86301/showart_2179183.html |
|