
memcached Source Code Analysis: slab automove and slab rebalance

Source: 程序員人生 | Published: 2015-01-26 08:49:09 | Views: 4272

When reposting, please credit the source: http://blog.csdn.net/luotuo44/article/details/43015129

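The problem:

Consider this scenario. At first, for business reasons, a large amount of 1 KB data is stored in memcached, so the memcached server process holds many 1 KB items. Later the business changes: it now needs to store a large amount of 10 KB data and rarely touches the 1 KB data. As the data keeps growing, memory runs short. The 10 KB items are accessed frequently, and because memory is insufficient, LRU has to evict some of them.

In this situation the large number of 1 KB items looks like pure waste. Because they are rarely accessed, they keep occupying the hash table and the LRU queues even after they have expired. The LRU queues are less of a concern, since items of different sizes use different LRU queues. For the hash table, however, a large number of zombie items raises the probability of hash collisions and wastes time when the hash table is migrated. Is there a way to get rid of these items? The LRU crawler together with the lru_crawler command can forcibly remove them. But once they are removed, the memory they occupied goes back to the 1 KB slab allocator, and the 1 KB slab allocator does not allocate memory for 10 KB items. So the effort still falls short.

Is there another way? Yes. memcached provides slab automove and slab rebalance for exactly this purpose. The feature is off by default, so memcached must be started with -o slab_reassign to use it. After that, a client can send the command slabs reassign <source class> <dest class> to manually hand a memory page of the source class to the dest class. The rest of this article calls this operation page reassignment. The slabs automove command instead lets memcached detect by itself whether page reassignment is needed and perform it automatically, with no manual intervention.

If memcached is started with -o slab_reassign, settings.slab_reassign is set to true (its default is false). Recall the page size discussed in 《slab內存分配器》 (the slab memory allocator article)? In do_slabs_newslab, the size of a memory page depends on whether settings.slab_reassign is true.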

    static int do_slabs_newslab(const unsigned int id) {
        slabclass_t *p = &slabclass[id];
        // settings.slab_reassign defaults to false
        int len = settings.slab_reassign ? settings.item_size_max
            : p->size * p->perslab;
        // len is the size of one memory page
        ...
    }

When settings.slab_reassign is true, that is, when rebalance is enabled, every slabclass_t in the slabclass array uses the same page size, settings.item_size_max (1 MB by default). This makes it easy to take a page from one slabclass_t and give it to another. Otherwise, a page that slabclass[i] takes from slabclass[j] might hold n items, a page taken from slabclass[k] might hold m items, and one of its own pages holds s items, which would be quite messy. With a uniform page size, a page splits into the same number of items no matter where it came from.
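
To make the page-size rule concrete, here is a minimal sketch in C. The struct page_cfg and the functions page_len and items_per_page are hypothetical names introduced for illustration; only the ternary rule itself comes from do_slabs_newslab above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two settings fields the article mentions. */
struct page_cfg {
    int slab_reassign;      /* mirrors settings.slab_reassign */
    size_t item_size_max;   /* mirrors settings.item_size_max */
};

/* Page length for a class whose items are `size` bytes, `perslab` per page.
 * Same rule as the `len` computation in do_slabs_newslab. */
size_t page_len(const struct page_cfg *cfg, size_t size, size_t perslab) {
    return cfg->slab_reassign ? cfg->item_size_max : size * perslab;
}

/* Number of items of `item_size` bytes that fit in a page of `page_bytes`. */
size_t items_per_page(size_t page_bytes, size_t item_size) {
    return item_size == 0 ? 0 : page_bytes / item_size;
}
```

With a uniform 1 MB page, items_per_page(1048576, 1024) is 1024 and items_per_page(1048576, 10240) is 102, regardless of which class the page was taken from.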

Starting and stopping rebalance:

main calls start_slab_maintenance_thread to start the rebalance thread and the automove thread. main makes this call only when settings.slab_reassign is true.

    // slabs.c
    static pthread_cond_t maintenance_cond = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t slab_rebalance_cond = PTHREAD_COND_INITIALIZER;
    static volatile int do_run_slab_thread = 1;
    static volatile int do_run_slab_rebalance_thread = 1;

    #define DEFAULT_SLAB_BULK_CHECK 1
    int slab_bulk_check = DEFAULT_SLAB_BULK_CHECK;

    static pthread_mutex_t slabs_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t slabs_rebalance_lock = PTHREAD_MUTEX_INITIALIZER;

    static pthread_t maintenance_tid;
    static pthread_t rebalance_tid;

    // Called by main. Not called when settings.slab_reassign is false
    // (which is the default).
    int start_slab_maintenance_thread(void) {
        int ret;
        slab_rebalance_signal = 0;
        slab_rebal.slab_start = NULL;
        char *env = getenv("MEMCACHED_SLAB_BULK_CHECK");
        if (env != NULL) {
            slab_bulk_check = atoi(env);
            if (slab_bulk_check == 0) {
                slab_bulk_check = DEFAULT_SLAB_BULK_CHECK;
            }
        }

        if (pthread_cond_init(&slab_rebalance_cond, NULL) != 0) {
            fprintf(stderr, "Can't intiialize rebalance condition\n");
            return -1;
        }
        pthread_mutex_init(&slabs_rebalance_lock, NULL);

        if ((ret = pthread_create(&maintenance_tid, NULL,
                                  slab_maintenance_thread, NULL)) != 0) {
            fprintf(stderr, "Can't create slab maint thread: %s\n", strerror(ret));
            return -1;
        }
        if ((ret = pthread_create(&rebalance_tid, NULL,
                                  slab_rebalance_thread, NULL)) != 0) {
            fprintf(stderr, "Can't create rebal thread: %s\n", strerror(ret));
            return -1;
        }
        return 0;
    }

    void stop_slab_maintenance_thread(void) {
        mutex_lock(&cache_lock);
        do_run_slab_thread = 0;
        do_run_slab_rebalance_thread = 0;
        pthread_cond_signal(&maintenance_cond);
        pthread_mutex_unlock(&cache_lock);

        /* Wait for the maintenance thread to stop */
        pthread_join(maintenance_tid, NULL);
        pthread_join(rebalance_tid, NULL);
    }

Note that start_slab_maintenance_thread starts two threads: the rebalance thread and the automove thread. The automove thread detects whether page reassignment is needed. When it decides that it is, it asks the rebalance thread to carry out the reassignment.

Automatic detection is off by default, even when memcached is started with -o slab_reassign. It is controlled by the global variable settings.slab_automove (default 0, meaning off). To turn it on, add the slab_automove option at startup and set its value to 1. For example, memcached -o slab_reassign,slab_automove=1 enables automatic detection. It can also be turned on after memcached has started, with the client command slabs automove <0|1>, where 0 disables automove and 1 enables it. This client command simply sets settings.slab_automove and does nothing else.

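The automove thread:

The item statistics recorder:

Right after it starts, the rebalance thread goes to sleep on a condition variable and waits for someone to hand it a page reassignment task. So let us look at the automove thread first.

To detect anything, the automove thread needs live data to analyze before it can conclude which slabclass_t needs more memory and which does not. It gets that data from the global variable itemstats, which collects various per-item statistics. Here is the variable and its type definition.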

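    // items.c
    typedef struct {
        uint64_t evicted;            // number of items evicted by LRU
        // an item can be evicted even if its exptime is 0
        uint64_t evicted_nonzero;    // evicted items whose exptime was not 0
        // for the most recent eviction, time since that item was last accessed:
        // itemstats[id].evicted_time = current_time - search->time;
        rel_time_t evicted_time;
        uint64_t reclaimed;          // expired items found and reused during allocation
        uint64_t outofmemory;        // failed attempts to allocate memory for an item
        uint64_t tailrepairs;        // items that needed repair (normally 0 unless
                                     // a worker thread misbehaves)
        // items that expired without ever being fetched
        uint64_t expired_unfetched;
        // items that were evicted by LRU without ever being fetched
        uint64_t evicted_unfetched;
        uint64_t crawler_reclaimed;  // expired items found by the LRU crawler
        // items found referenced by another worker thread while searching
        // the LRU queue during allocation
        uint64_t lrutail_reflocked;
    } itemstats_t;

    #define POWER_LARGEST 200
    #define LARGEST_ID POWER_LARGEST
    static itemstats_t itemstats[LARGEST_ID];

Note that the code above lives in items.c and that the global variable itemstats is static. itemstats is an array whose elements correspond one to one with the slabclass array: each element collects statistics for the matching slab class. itemstats_t has many members and can collect a lot of information, but the automove thread uses only the first one, evicted. The automove thread needs to know how many items of each size are being evicted, and from that it judges which class is short of memory and which has memory to spare.

itemstats is updated in many functions throughout items.c (mainly so it can collect all these statistics), so the individual update sites are not listed here. evicted matters most and is updated in only one function, so here is that code.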

    item *do_item_alloc(char *key, const size_t nkey, const int flags,
                        const rel_time_t exptime, const int nbytes,
                        const uint32_t cur_hv) {
        item *it = NULL;
        int tries = 5;
        item *search;
        item *next_it;
        rel_time_t oldest_live = settings.oldest_live;

        search = tails[id];
        for (; tries > 0 && search != NULL; tries--, search=next_it) {
            /* we might relink search mid-loop, so search->prev isn't reliable */
            next_it = search->prev;
            ...
            if ((search->exptime != 0 && search->exptime < current_time)
                || (search->time <= oldest_live && oldest_live <= current_time)) {
                ...
            } else if ((it = slabs_alloc(ntotal, id)) == NULL) { // allocation failed
                // No expired item was found and allocation failed as well.
                // The only option left is to evict an item through LRU
                // (even though that item has not expired).
                if (settings.evict_to_free == 0) { // LRU eviction is disabled
                    // all we can do is return an error to the client
                    itemstats[id].outofmemory++;
                } else {
                    itemstats[id].evicted++; // count the evicted item
                    itemstats[id].evicted_time = current_time - search->time;
                    // an item whose exptime is 0 (never expires) is still evicted
                    if (search->exptime != 0)
                        itemstats[id].evicted_nonzero++;
                    if ((search->it_flags & ITEM_FETCHED) == 0) {
                        itemstats[id].evicted_unfetched++;
                    }
                    it = search;

                    // Start a page reassignment as soon as any item is evicted.
                    // This is too aggressive and not recommended.
                    if (settings.slab_automove == 2)
                        slabs_reassign(-1, id);
                }
            }
            break;
        }
        ...
        return it;
    }

As the code shows, every LRU eviction is recorded. At the end you can also see that when settings.slab_automove equals 2, slabs_reassign is called as soon as an item is evicted. slabs_reassign is the function that handles page reassignment. Reassigning a page on every single eviction is clearly too frequent, so this setting is not recommended.

Identifying the poor and the rich slab classes:

Now back to slab_maintenance_thread, the thread function of the automove thread.

    static void *slab_maintenance_thread(void *arg) {
        int src, dest;

        while (do_run_slab_thread) {
            if (settings.slab_automove == 1) { // automove is enabled
                if (slab_automove_decision(&src, &dest) == 1) {
                    /* Blind to the return codes. It will retry on its own */
                    slabs_reassign(src, dest);
                }
                sleep(1);
            } else { // wait for the user to enable automove
                /* Don't wake as often if we're not enabled.
                 * This is lazier than setting up a condition right now. */
                sleep(5);
            }
        }
        return NULL;
    }

When settings.slab_automove equals 1, the thread calls slab_automove_decision to decide whether a page should be reassigned. A return value of 1 means a reassignment is needed, and slabs_reassign is then called to handle it. Let us see how the automove thread makes that decision.

    // items.c
    void item_stats_evictions(uint64_t *evicted) {
        int i;
        mutex_lock(&cache_lock);
        for (i = 0; i < LARGEST_ID; i++) {
            evicted[i] = itemstats[i].evicted;
        }
        mutex_unlock(&cache_lock);
    }

    // slabs.c
    // Picks the class that evicts the most (the receiver) and a class with
    // no evictions (the donor). Returns 1 only when both are found, otherwise 0.
    // *src receives the donor's id, *dst receives the receiver's id.
    static int slab_automove_decision(int *src, int *dst) {
        static uint64_t evicted_old[POWER_LARGEST];
        static unsigned int slab_zeroes[POWER_LARGEST];
        static unsigned int slab_winner = 0;
        static unsigned int slab_wins = 0;
        uint64_t evicted_new[POWER_LARGEST];
        uint64_t evicted_diff = 0;
        uint64_t evicted_max = 0;
        unsigned int highest_slab = 0;
        unsigned int total_pages[POWER_LARGEST];
        int i;
        int source = 0;
        int dest = 0;
        static rel_time_t next_run;

        /* Run less frequently than the slabmove tester. */
        // Do not run too often: at most once every 10 seconds.
        if (current_time >= next_run) {
            next_run = current_time + 10;
        } else {
            return 0;
        }

        // fetch the eviction count of every slab class
        item_stats_evictions(evicted_new);
        pthread_mutex_lock(&cache_lock);
        for (i = POWER_SMALLEST; i < power_largest; i++) {
            total_pages[i] = slabclass[i].slabs;
        }
        pthread_mutex_unlock(&cache_lock);

        // This function is called repeatedly, so counting rounds is meaningful.
        /* Find a candidate source; something with zero evicts 3+ times */
        // evicted_old holds each class's eviction count from the previous round,
        // evicted_new holds the current counts, and evicted_diff shows how
        // heavily a given LRU queue is being evicted.
        for (i = POWER_SMALLEST; i < power_largest; i++) {
            evicted_diff = evicted_new[i] - evicted_old[i];
            if (evicted_diff == 0 && total_pages[i] > 2) {
                // evicted_diff == 0 means this class evicted nothing,
                // and it owns more than two pages.
                slab_zeroes[i]++; // bump the counter
                // three rounds without any eviction: this class has
                // memory to spare, so it becomes the donor
                if (source == 0 && slab_zeroes[i] >= 3)
                    source = i;
            } else {
                slab_zeroes[i] = 0; // reset the counter
                if (evicted_diff > evicted_max) {
                    evicted_max = evicted_diff;
                    highest_slab = i;
                }
            }
            evicted_old[i] = evicted_new[i];
        }

        /* Pick a valid destination */
        // the receiver must be the class with the most evictions
        // three times in a row
        if (slab_winner != 0 && slab_winner == highest_slab) {
            slab_wins++;
            if (slab_wins >= 3) // top evictor three times in a row
                dest = slab_winner;
        } else {
            slab_wins = 1;              // reset the counter (to 1, not 0)
            slab_winner = highest_slab; // top evictor of this round
        }

        if (source && dest) {
            *src = source;
            *dst = dest;
            return 1;
        }
        return 0;
    }

As the code shows, the decision rule is fairly simple. Two classes are picked from the slabclass array: one that has had no evictions for three consecutive rounds (and owns more than two pages), and one that has had the most evictions for three consecutive rounds. If both are found, the function returns 1 and the automove thread calls slabs_reassign.
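
The rule can be exercised in isolation. The following is a simplified sketch, not memcached's code: struct decider, decide, and NCLASSES are hypothetical names, the static locals become an explicit state struct, and the 10-second throttle and locking are left out so that three calls stand for three rounds.

```c
#include <stdint.h>

#define NCLASSES 8   /* hypothetical small class count, for illustration only */

/* State that the original keeps in static local variables. */
struct decider {
    uint64_t evicted_old[NCLASSES];
    unsigned zeroes[NCLASSES];
    unsigned winner;
    unsigned wins;
};

/* Returns 1 and fills *src / *dst when both a donor and a receiver exist.
 * Class 0 means "none", as in the original. */
int decide(struct decider *d, const uint64_t *evicted_new,
           const unsigned *pages, int *src, int *dst) {
    uint64_t max_diff = 0;
    unsigned highest = 0;
    int source = 0, dest = 0;

    for (int i = 1; i < NCLASSES; i++) {
        uint64_t diff = evicted_new[i] - d->evicted_old[i];
        if (diff == 0 && pages[i] > 2) {
            d->zeroes[i]++;                 /* another quiet round */
            if (source == 0 && d->zeroes[i] >= 3)
                source = i;
        } else {
            d->zeroes[i] = 0;
            if (diff > max_diff) {
                max_diff = diff;
                highest = (unsigned)i;
            }
        }
        d->evicted_old[i] = evicted_new[i];
    }

    if (d->winner != 0 && d->winner == highest) {
        if (++d->wins >= 3)
            dest = (int)d->winner;
    } else {
        d->wins = 1;
        d->winner = highest;
    }

    if (source && dest) {
        *src = source;
        *dst = dest;
        return 1;
    }
    return 0;
}
```

With class 1 holding five pages and never evicting, and class 2 evicting in every round, the first two calls return 0 and the third returns 1 with src = 1 and dst = 2.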

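Issuing a rebalance task:

Before showing slabs_reassign, recall the slabs reassign command. Everything so far has been about automatically detecting whether a page should be reassigned, and it is easy to forget that there is also a command for requesting it manually. When a client sends slabs reassign, the worker thread that receives the command calls slabs_reassign with the command's arguments. At this point the automatic and the manual paths converge.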

    enum reassign_result_type {
        REASSIGN_OK=0, REASSIGN_RUNNING, REASSIGN_BADCLASS, REASSIGN_NOSPARE,
        REASSIGN_SRC_DST_SAME
    };

    enum reassign_result_type slabs_reassign(int src, int dst) {
        enum reassign_result_type ret;
        if (pthread_mutex_trylock(&slabs_rebalance_lock) != 0) {
            return REASSIGN_RUNNING;
        }
        ret = do_slabs_reassign(src, dst);
        pthread_mutex_unlock(&slabs_rebalance_lock);
        return ret;
    }

    static enum reassign_result_type do_slabs_reassign(int src, int dst) {
        if (slab_rebalance_signal != 0)
            return REASSIGN_RUNNING;

        if (src == dst) // source and destination must differ
            return REASSIGN_SRC_DST_SAME;

        /* Special indicator to choose ourselves. */
        if (src == -1) { // the caller asks us to pick a source slab class
            // pick a slab class with more than one page that is not dst;
            // returns -1 if no such class exists
            src = slabs_reassign_pick_any(dst);
            /* TODO: If we end up back at -1, return a new error type */
        }

        if (src < POWER_SMALLEST || src > power_largest ||
            dst < POWER_SMALLEST || dst > power_largest)
            return REASSIGN_BADCLASS;

        // a source class with zero or one page has nothing to give away
        if (slabclass[src].slabs < 2)
            return REASSIGN_NOSPARE;

        // global variable slab_rebal
        slab_rebal.s_clsid = src; // remember the source slab class
        slab_rebal.d_clsid = dst; // remember the destination slab class

        slab_rebalance_signal = 1;
        // Wake the thread running slab_rebalance_thread.
        // slabs_reassign already holds slabs_rebalance_lock here.
        pthread_cond_signal(&slab_rebalance_cond);

        return REASSIGN_OK;
    }

    // Pick a slab class with more than one page that is not dst.
    // Returns -1 if no such class exists.
    static int slabs_reassign_pick_any(int dst) {
        static int cur = POWER_SMALLEST - 1;
        int tries = power_largest - POWER_SMALLEST + 1;
        for (; tries > 0; tries--) {
            cur++;
            if (cur > power_largest)
                cur = POWER_SMALLEST;
            if (cur == dst)
                continue;
            if (slabclass[cur].slabs > 1) {
                return cur;
            }
        }
        return -1;
    }

do_slabs_reassign stores the source and destination slab classes in the global variable slab_rebal, and at the end it calls pthread_cond_signal to wake the rebalance thread.
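
The round-robin choice made by slabs_reassign_pick_any can be tried on its own. This is a small sketch under stated assumptions: pick_any and FIRST_CLASS are hypothetical names, the page counts are passed in as a plain array, and the static cursor becomes an explicit parameter so that the behaviour across calls is visible.

```c
#define FIRST_CLASS 1   /* stands in for POWER_SMALLEST */

/* Round-robin over classes FIRST_CLASS..last, continuing from where the
 * previous call stopped (*cursor). Returns a class with more than one page
 * that is not dst, or -1 when every candidate has been tried. */
int pick_any(const unsigned *pages, int last, int dst, int *cursor) {
    int tries = last - FIRST_CLASS + 1;
    for (; tries > 0; tries--) {
        (*cursor)++;
        if (*cursor > last)
            *cursor = FIRST_CLASS;   /* wrap around */
        if (*cursor == dst)
            continue;                /* never donate to ourselves */
        if (pages[*cursor] > 1)
            return *cursor;
    }
    return -1;
}
```

Because the cursor persists between calls, successive requests spread the cost over different donor classes instead of always draining the first eligible one.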

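The rebalance thread:

The automove thread now leaves the stage, and the rebalance thread wakes up and takes its place. Here is slab_rebalance_thread, the thread function of the rebalance thread. Note that slab_rebalance_signal starts out as 0 and is set to 1 when a page reassignment is requested.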

    static void *slab_rebalance_thread(void *arg) {
        int was_busy = 0;
        /* So we first pass into cond_wait with the mutex held */
        mutex_lock(&slabs_rebalance_lock);

        while (do_run_slab_rebalance_thread) {
            if (slab_rebalance_signal == 1) {
                // Mark the page to be moved and set slab_rebalance_signal to 2.
                // slab_rebal.done is set to 0, meaning "not finished".
                if (slab_rebalance_start() < 0) { // failed
                    /* Handle errors with more specifity as required. */
                    slab_rebalance_signal = 0;
                }
                was_busy = 0;
            } else if (slab_rebalance_signal && slab_rebal.slab_start != NULL) {
                was_busy = slab_rebalance_move(); // clear items off the page
            }

            if (slab_rebal.done) { // every item on the page has been handled
                slab_rebalance_finish();
            } else if (was_busy) { // a worker thread is using an item on the page
                /* Stuck waiting for some items to unlock, so slow down a bit
                 * to give them a chance to free up */
                usleep(50); // sleep briefly, let the worker release the item, retry
            }

            if (slab_rebalance_signal == 0) { // the thread initially sleeps here
                /* always hold this lock while we're running */
                pthread_cond_wait(&slab_rebalance_cond, &slabs_rebalance_lock);
            }
        }
        return NULL;
    }
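
The three values of slab_rebalance_signal behave like a small state machine. The sketch below is one way to read the loop above. The enum names and rebal_step are hypothetical, and the two flags stand in for the outcome of slab_rebalance_start and for slab_rebal.done.

```c
/* Hypothetical names for the values 0, 1 and 2 of slab_rebalance_signal. */
enum rebal_state { REBAL_IDLE = 0, REBAL_REQUESTED = 1, REBAL_MOVING = 2 };

/* One loop iteration.
 * start_ok:  did marking the page (slab_rebalance_start) succeed?
 * page_done: has every item on the page been cleared (slab_rebal.done)? */
enum rebal_state rebal_step(enum rebal_state s, int start_ok, int page_done) {
    switch (s) {
    case REBAL_REQUESTED:
        return start_ok ? REBAL_MOVING : REBAL_IDLE;
    case REBAL_MOVING:
        return page_done ? REBAL_IDLE : REBAL_MOVING;
    default:
        return REBAL_IDLE;   /* idle: the real thread blocks in cond_wait */
    }
}
```

A request therefore either fails immediately and returns to idle, or stays in the moving state across as many iterations as it takes for the page to be emptied.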

Locking the memory page:

slab_rebalance_start marks the source slab class so that a worker thread accessing it can tell that a page reassignment is in progress.

    // memcached.h
    struct slab_rebalance {
        // Describes the page being moved. slab_start points to the start of
        // the page, slab_end to its end, and slab_pos to the position (item)
        // currently being processed.
        void *slab_start;
        void *slab_end;
        void *slab_pos;
        int s_clsid;      // index of the source slab class
        int d_clsid;      // index of the destination slab class
        int busy_items;   // number of items still referenced by worker threads
        uint8_t done;     // whether the page has been fully processed
    };

    // memcached.c
    struct slab_rebalance slab_rebal;

    // slabs.c
    static int slab_rebalance_start(void) {
        slabclass_t *s_cls;
        int no_go = 0;

        pthread_mutex_lock(&cache_lock);
        pthread_mutex_lock(&slabs_lock);

        if (slab_rebal.s_clsid < POWER_SMALLEST ||
            slab_rebal.s_clsid > power_largest ||
            slab_rebal.d_clsid < POWER_SMALLEST ||
            slab_rebal.d_clsid > power_largest ||
            slab_rebal.s_clsid == slab_rebal.d_clsid) // invalid class index
            no_go = -2;

        s_cls = &slabclass[slab_rebal.s_clsid];

        // If the destination class cannot even grow its page list by one
        // entry, there is no way to give it another page.
        if (!grow_slab_list(slab_rebal.d_clsid)) {
            no_go = -1;
        }

        if (s_cls->slabs < 2) // the source class has too few pages to give one away
            no_go = -3;

        if (no_go != 0) {
            pthread_mutex_unlock(&slabs_lock);
            pthread_mutex_unlock(&cache_lock);
            return no_go; /* Should use a wrapper function... */
        }

        // Records which page of the source class goes to the destination.
        // By default it is the first page.
        s_cls->killing = 1;

        // Describe the page being moved: slab_start is its start, slab_end
        // its end, slab_pos the position (item) currently being processed.
        slab_rebal.slab_start = s_cls->slab_list[s_cls->killing - 1];
        slab_rebal.slab_end = (char *)slab_rebal.slab_start +
            (s_cls->size * s_cls->perslab);
        slab_rebal.slab_pos = slab_rebal.slab_start;
        slab_rebal.done = 0;

        /* Also tells do_item_get to search for items in this slab */
        slab_rebalance_signal = 2; // the rebalance thread moves the page next

        pthread_mutex_unlock(&slabs_lock);
        pthread_mutex_unlock(&cache_lock);

        return 0;
    }

slab_rebalance_start marks one page of a slab class as being moved. From then on worker threads must not use the items on that page. Let us see what happens when a worker thread tries to access an item on that page.

    item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
        item *it = assoc_find(key, nkey, hv); // assoc_find takes no lock itself
        if (it != NULL) { // found; the item's refcount is at least 1 here
            refcount_incr(&it->refcount); // thread-safe increment
            /* Optimization for slab reassignment. prevents popular items from
             * jamming in busy wait. Can only do this here to satisfy lock order
             * of item_lock, cache_lock, slabs_lock. */
            if (slab_rebalance_signal &&
                ((void *)it >= slab_rebal.slab_start &&
                 (void *)it < slab_rebal.slab_end)) {
                // The item lies in the page being moved, so it must not be
                // returned. The worker thread removes it from the hash table
                // and the LRU queue so that no other worker thread can reach
                // this unusable item later.
                do_item_unlink_nolock(it, hv);
                do_item_remove(it);
                it = NULL;
            }
        }
        ...
        return it;
    }
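
The test that do_item_get applies is a half-open address range check against [slab_start, slab_end). A minimal sketch follows. in_moving_page is a hypothetical helper, and it compares through uintptr_t rather than comparing void pointers directly, which is a choice made for this sketch, not something the original does.

```c
#include <stddef.h>
#include <stdint.h>

/* True when p lies inside the half-open range [start, end).
 * A NULL start means no page is currently being moved. */
int in_moving_page(const void *p, const void *start, const void *end) {
    uintptr_t a = (uintptr_t)p;
    return start != NULL && a >= (uintptr_t)start && a < (uintptr_t)end;
}
```

The end address itself is excluded, which matches the `(void *)it < slab_rebal.slab_end` comparison: an item that starts exactly at slab_end belongs to whatever follows the page.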

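Moving (returning) items:

Back to the rebalance thread. As described above, one page of the source slab class has been marked. After marking it, the rebalance thread calls slab_rebalance_move to do the actual work on the page. The page still holds items, so what happens to them during the move? memcached handles this bluntly: it deletes them. If a worker thread is still using an item, the rebalance thread waits for it. If no worker thread references the item, it is deleted even if it has not expired.

A page can hold many items, so memcached processes it in installments, handling only a few items per call (one by default). slab_rebalance_move is therefore called many times from slab_rebalance_thread, until every item on the page has been handled.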

    /* refcount == 0 is safe since nobody can incr while cache_lock is held.
     * refcount != 0 is impossible since flags/etc can be modified in other
     * threads. instead, note we found a busy one and bail. logic in do_item_get
     * will prevent busy items from continuing to be busy */
    static int slab_rebalance_move(void) {
        slabclass_t *s_cls;
        int x;
        int was_busy = 0;
        int refcount = 0;
        enum move_status status = MOVE_PASS;

        pthread_mutex_lock(&cache_lock);
        pthread_mutex_lock(&slabs_lock);

        s_cls = &slabclass[slab_rebal.s_clsid];

        // slab_bulk_check is read from an environment variable in
        // start_slab_maintenance_thread and defaults to 1. The items on a
        // page are processed in installments here as well.
        for (x = 0; x < slab_bulk_check; x++) {
            item *it = slab_rebal.slab_pos;
            status = MOVE_PASS;
            if (it->slabs_clsid != 255) {
                void *hold_lock = NULL;
                uint32_t hv = hash(ITEM_key(it), it->nkey);
                if ((hold_lock = item_trylock(hv)) == NULL) {
                    status = MOVE_LOCKED;
                } else {
                    refcount = refcount_incr(&it->refcount);
                    if (refcount == 1) { /* item is unlinked, unused */
                        // If it_flags & ITEM_SLABBED is set, the item was
                        // never handed out. If it is clear, the item was
                        // handed out and is on its way back. See the check in
                        // do_item_get that tests slab_rebalance_signal.
                        if (it->it_flags & ITEM_SLABBED) { // never handed out
                            /* remove from slab freelist */
                            if (s_cls->slots == it) {
                                s_cls->slots = it->next;
                            }
                            if (it->next) it->next->prev = it->prev;
                            if (it->prev) it->prev->next = it->next;
                            s_cls->sl_curr--;
                            status = MOVE_DONE; // this item is handled
                        } else { // another worker thread is returning this item
                            status = MOVE_BUSY;
                        }
                    } else if (refcount == 2) { /* item is linked but not busy */
                        // no worker thread references this item
                        if ((it->it_flags & ITEM_LINKED) != 0) {
                            // remove it from the hash table and the LRU queue
                            do_item_unlink_nolock(it, hv);
                            status = MOVE_DONE;
                        } else {
                            /* refcount == 1 + !ITEM_LINKED means the item is being
                             * uploaded to, or was just unlinked but hasn't been freed
                             * yet. Let it bleed off on its own and try again later */
                            status = MOVE_BUSY;
                        }
                    } else { // a worker thread is referencing this item right now
                        status = MOVE_BUSY;
                    }
                    item_trylock_unlock(hold_lock);
                }
            }

            switch (status) {
                case MOVE_DONE:
                    it->refcount = 0;    // reset the reference count
                    it->it_flags = 0;    // clear all flags
                    it->slabs_clsid = 255;
                    break;
                case MOVE_BUSY:
                    refcount_decr(&it->refcount);
                    // note: deliberate fall-through, no break here
                case MOVE_LOCKED:
                    slab_rebal.busy_items++;
                    was_busy++; // remember that an item could not be handled yet
                    break;
                case MOVE_PASS:
                    break;
            }

            // advance to the next item on the page
            slab_rebal.slab_pos = (char *)slab_rebal.slab_pos + s_cls->size;
            if (slab_rebal.slab_pos >= slab_rebal.slab_end) // end of the page
                break;
        }

        // every item on the page has been visited
        if (slab_rebal.slab_pos >= slab_rebal.slab_end) {
            /* Some items were busy, start again from the top */
            // some items were skipped because worker threads referenced them
            if (slab_rebal.busy_items) { // scan the page again from the start
                slab_rebal.slab_pos = slab_rebal.slab_start;
                slab_rebal.busy_items = 0;
            } else {
                slab_rebal.done++; // all items on the page have been handled
            }
        }

        pthread_mutex_unlock(&slabs_lock);
        pthread_mutex_unlock(&cache_lock);

        return was_busy; // number of items that could not be handled in this call
    }
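
The installment scheme (a few slots per call, a full rescan if anything was busy) can be modelled without any of memcached's data structures. The following is a simplified sketch under stated assumptions: page_walk, slot_state, and walk_step are hypothetical names, a slot is either busy or clearable, and locking and reference counting are omitted.

```c
#include <stddef.h>

enum slot_state { SLOT_FREE, SLOT_BUSY, SLOT_CLEARED };

struct page_walk {
    enum slot_state *slots;  /* one entry per item slot on the page */
    size_t nslots;
    size_t pos;              /* next slot to examine (like slab_pos) */
    size_t busy;             /* busy slots seen in this pass (like busy_items) */
    int done;                /* set once a full pass found nothing busy */
};

/* Examine up to `bulk` slots; returns how many busy slots were skipped. */
size_t walk_step(struct page_walk *w, size_t bulk) {
    size_t was_busy = 0;
    for (size_t x = 0; x < bulk && w->pos < w->nslots; x++) {
        if (w->slots[w->pos] == SLOT_BUSY) {
            w->busy++;
            was_busy++;
        } else {
            w->slots[w->pos] = SLOT_CLEARED;
        }
        w->pos++;
    }
    if (w->pos >= w->nslots) {
        if (w->busy) {       /* something was skipped: rescan from the top */
            w->pos = 0;
            w->busy = 0;
        } else {
            w->done = 1;
        }
    }
    return was_busy;
}
```

A caller would keep invoking walk_step(&w, 1) and pause briefly whenever it returns a non-zero value, which mirrors the usleep(50) in the thread loop.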

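Robbing the rich to help the poor:

In the code above, was_busy indicates whether a worker thread is still referencing an item on the page. The name slab_rebalance_move is somewhat misleading: the function does not move anything, it removes the page's items from the hash table and the LRU queue. Once every item on the page has been handled, slab_rebal.done is incremented to mark completion. In slab_rebalance_thread, when slab_rebal.done is true, slab_rebalance_finish is called to do the real transfer, moving the page from one slab class to another.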

    static void slab_rebalance_finish(void) {
        slabclass_t *s_cls;
        slabclass_t *d_cls;

        pthread_mutex_lock(&cache_lock);
        pthread_mutex_lock(&slabs_lock);

        s_cls = &slabclass[slab_rebal.s_clsid];
        d_cls = &slabclass[slab_rebal.d_clsid];

        /* At this point the stolen slab is completely clear */
        // Drop the page from the source list by overwriting its slot
        // with the last page in the list.
        s_cls->slab_list[s_cls->killing - 1] =
            s_cls->slab_list[s_cls->slabs - 1];
        s_cls->slabs--; // the source slab class has one page fewer
        s_cls->killing = 0;

        // zero every byte of the page; this step matters
        memset(slab_rebal.slab_start, 0, (size_t)settings.item_size_max);

        // Hand the page that slab_rebal.slab_start points to (taken from the
        // source slab class) to the destination slab class.
        d_cls->slab_list[d_cls->slabs++] = slab_rebal.slab_start;
        // Split the page by the destination class's item size and add the
        // pieces to the destination class's free item list.
        split_slab_page_into_freelist(slab_rebal.slab_start,
            slab_rebal.d_clsid);

        // reset
        slab_rebal.done = 0;
        slab_rebal.s_clsid = 0;
        slab_rebal.d_clsid = 0;
        slab_rebal.slab_start = NULL;
        slab_rebal.slab_end = NULL;
        slab_rebal.slab_pos = NULL;

        slab_rebalance_signal = 0; // work done; the rebalance thread sleeps again

        pthread_mutex_unlock(&slabs_lock);
        pthread_mutex_unlock(&cache_lock);
    }
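
The list manipulation in slab_rebalance_finish is the common swap-with-last removal followed by an append. Here is a minimal sketch with plain pointer arrays. take_page and give_page are hypothetical helpers, and the destination array is assumed to already have room, which the original guarantees by calling grow_slab_list earlier.

```c
#include <stddef.h>

/* Remove list[idx] by overwriting it with the last entry (order is not
 * preserved). Returns the removed page, or NULL if idx is out of range. */
void *take_page(void **list, size_t *count, size_t idx) {
    if (idx >= *count)
        return NULL;
    void *page = list[idx];
    list[idx] = list[*count - 1];
    (*count)--;
    return page;
}

/* Append a page to a destination list that is assumed to have spare room. */
void give_page(void **list, size_t *count, void *page) {
    list[(*count)++] = page;
}
```

Removal is constant time because the page list is unordered. The cost is that the former last page now sits in the vacated slot.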
