Last edited by ydfgic on 2011-08-01 13:58
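To summarize:
I made a final revision to the test code so that it is easy to set the number of threads, choose which lock to test, and set the delay parameter.
Testing shows that the timings of my implementation are very stable: the difference between 2, 10, 20 and even 40 threads is small, so the result is almost independent of the thread count. The mutex is likewise largely insensitive to the thread count, but pthread_spinlock_t is sensitive to it, and its efficiency drops sharply when there are many threads.
The lock granularity (how much work is done while the lock is held) also affects the results. With a very small critical section my implementation is 4-5 times faster than the mutex. With a large one, for example a delay loop count of 1000, it is still a bit more than twice as fast as the mutex, but it keeps the CPU busier. The spinlock's speed is not worth discussing; it is very slow.
Conclusions:
For coarse-grained locking the mutex is clearly the first choice. Its performance is middling, but waiting threads are put to sleep instead of occupying the CPU, so the impact on the rest of the system is small.
For fine-grained locking the lock I implemented is an option: it is almost independent of the thread count and very efficient. This may be because the implementation is so simple: it hands the CPU over as directly as possible so that one thread keeps making progress, and it cuts out any extra steps.
pthread_spinlock_t is too limited: with many threads it loses a great deal of performance, and it is only suitable for fine-grained locking in the first place.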
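A possible middle ground between the two recommendations is glibc's adaptive mutex, which spins briefly before putting the waiter to sleep. The following is only a sketch and was not part of the measurements in this post: PTHREAD_MUTEX_ADAPTIVE_NP is a non-portable glibc extension, and the helper name init_adaptive_mutex is made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc-specific feature macro; assumption: building against glibc */
#endif
#include <pthread.h>

/* Hypothetical helper (not from the original post): initialise *m as a
 * glibc adaptive mutex. Returns 0 on success or a pthread error number. */
int init_adaptive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Non-portable mutex type: spin for a short while, then sleep. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex initialised this way is used with the ordinary pthread_mutex_lock / pthread_mutex_unlock calls, so it could be dropped into the mutex test function without other changes. Whether it would beat the plain mutex on this machine is untested here.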
Test data:
1) 20 threads, 0 delay
Mine: time ./myspinlock_O3.out 20 0 0
real 0m0.323s
user 0m0.364s
sys 0m0.276s
mutex: time ./myspinlock_O3.out 20 1 0
real 0m1.634s
user 0m1.972s
sys 0m1.264s
spinlock: time ./myspinlock_O3.out 20 2 0
real 0m6.259s
user 0m12.477s
sys 0m0.004s
2) 20 threads, 100 delay
Mine: time ./myspinlock_O3.out 20 0 100
real 0m2.965s
user 0m3.268s
sys 0m2.636s
mutex: time ./myspinlock_O3.out 20 1 100
real 0m6.493s
user 0m6.344s
sys 0m6.604s
spinlock: time ./myspinlock_O3.out 20 2 100
real 0m15.760s
user 0m31.378s
sys 0m0.004s
3) 10 threads, 0 delay
Mine: time ./myspinlock_O3.out 10 0 0
real 0m0.318s
user 0m0.372s
sys 0m0.248s
mutex: time ./myspinlock_O3.out 10 1 0
real 0m1.511s
user 0m1.808s
sys 0m1.200s
spinlock: time ./myspinlock_O3.out 10 2 0
real 0m3.625s
user 0m7.224s
sys 0m0.004s
4) 2 threads, 0 delay
Mine: time ./myspinlock_O3.out 2 0 0
real 0m0.323s
user 0m0.376s
sys 0m0.184s
mutex: time ./myspinlock_O3.out 2 1 0
real 0m1.453s
user 0m1.688s
sys 0m1.136s
spinlock: time ./myspinlock_O3.out 2 2 0
real 0m0.819s
user 0m1.624s
sys 0m0.004s
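The real/user/sys split is what makes these runs readable: in the 20-thread, 0-delay spinlock run, user time (12.477s) is about twice the wall time (6.259s), i.e. both cores were busy spinning, while the mutex and my lock show noticeable sys time. A rough way to get the same split from inside the test program is getrusage. This is only a sketch; cpu_seconds is a hypothetical helper that does not appear in the test code.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper (not from the original post): CPU time consumed so far
 * by the whole process, all threads included, split into user and system
 * seconds. Returns 0 on success, -1 if getrusage() fails. */
int cpu_seconds(double *user, double *sys)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *user = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    *sys  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    return 0;
}
```

Calling it once before the threads are created and once after they are joined, then subtracting, would give the user/sys cost of the locking loop alone, without process start-up.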
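Final version of the implementation (myspinlock.h):
- #include <stdint.h>
- #include <sched.h>   /* sched_yield() */
- typedef volatile uint32_t my_spinlock_t;   /* the name the test code uses */
- #define MY_SPINLOCK_INITIALIZER 0
- /* Atomically flip the lock word 0 -> 1; on failure give up the time slice and retry. */
- #define spinlock_lock(lock) do{ \
- while(!__sync_bool_compare_and_swap(lock, 0, 1)) \
- sched_yield(); \
- }while(0)
- /* Store 0 with release semantics so writes made inside the critical section become visible first. */
- #define spinlock_unlock(lock) do{ \
- __sync_lock_release(lock); \
- }while(0)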
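For comparison, the same "try once, otherwise yield" idea can be written with C11 <stdatomic.h> instead of the GCC __sync builtins. This is a sketch under assumptions, not the code that was benchmarked: the names yield_lock_t, yield_lock_acquire, yield_lock_release and yield_lock_demo are invented for this example, and the 64-thread cap is arbitrary.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical C11 counterpart of the macros above. */
typedef struct { atomic_flag held; } yield_lock_t;

static inline void yield_lock_acquire(yield_lock_t *l)
{
    /* test_and_set returns the previous value: true means another thread
     * holds the lock, so give up the time slice and try again. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

static inline void yield_lock_release(yield_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static yield_lock_t demo_lock = { ATOMIC_FLAG_INIT };
static long demo_count;   /* protected by demo_lock */
static long demo_iters;   /* written before the workers start */

static void *demo_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < demo_iters; i++) {
        yield_lock_acquire(&demo_lock);
        demo_count++;
        yield_lock_release(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (1..64), each doing iters locked increments.
 * Returns the final counter, or -1 if the arguments or thread creation fail. */
long yield_lock_demo(int nthreads, long iters)
{
    pthread_t tid[64];
    int started = 0;

    if (nthreads < 1 || nthreads > 64)
        return -1;
    demo_count = 0;
    demo_iters = iters;

    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, demo_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == nthreads ? demo_count : -1;
}
```

If the lock is correct, yield_lock_demo(n, k) returns n * k, which is the same check the test program makes with its final "cnt = ..., should be ..." line.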
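Final version of the test code:
- // gcc -Wall -g -O3 -o myspinlock.out myspinlock.c -lpthread
- #include "myspinlock.h"
- #include <pthread.h>
- #include <stdio.h>
- #include <stdlib.h>
- ///////////////////////// test
- my_spinlock_t lock = MY_SPINLOCK_INITIALIZER;
- volatile int cnt = 0;
- /* scratch value updated by the delay loops; the final header does not define it */
- static uint32_t bar = 13;
- static uint32_t *foo = &bar;
- #define TOTAL (1000000 * 20)
- int NR;
- int DELAY_CNT = 100;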
- void * fun1(void * arg)
- {
- int i = 0, id = *(int*)arg;
- printf("thread:%d\n",id);
- for(; i < NR; i++)
- {
- spinlock_lock(&lock);
- cnt++;
- int j = 0;
- for (; j < DELAY_CNT; j++) {
- *foo = (*foo * 33) + 17;
- }
- spinlock_unlock(&lock);
- }
- printf("thread:%d over, lock:%d\n",id, lock);
- return 0;
- }
- pthread_mutex_t mlock = PTHREAD_MUTEX_INITIALIZER;
- void * fun2(void * arg)
- {
- int i = 0, id = *(int*)arg;
- printf("thread:%d\n",id);
- for(; i < NR; i++)
- {
- pthread_mutex_lock(&mlock);
- cnt++;
- int j = 0;
- for (; j < DELAY_CNT; j++) {
- *foo = (*foo * 33) + 17;
- }
- pthread_mutex_unlock(&mlock);
- }
- printf("thread:%d over, lock:%d\n",id, lock);
- return 0;
- }
- pthread_spinlock_t splock;
- void * fun3(void * arg)
- {
- int i = 0, id = *(int*)arg;
- printf("thread:%d\n",id);
- for(; i < NR; i++)
- {
- pthread_spin_lock(&splock);
- cnt++;
- int j = 0;
- for (; j < DELAY_CNT; j++) {
- *foo = (*foo * 33) + 17;
- }
- pthread_spin_unlock(&splock);
- }
- printf("thread:%d over, lock:%d\n",id, lock);
- return 0;
- }
- int N = 20;
- int main(int c, char * s[])
- {
- int which = 0;
- if(c > 1)
- {
- // number of threads
- N = atoi(s[1]);
- if(N > 20 || N <= 1) N = 10;
- }
- if(c > 2)
- {
- //which func?
- which = atoi(s[2]);
- if(which > 2 || which < 0) which = 0;
- }
- if(c > 3)
- {
- //delay param
- DELAY_CNT = atoi(s[3]);
- if(DELAY_CNT > 10000 || DELAY_CNT < 0) DELAY_CNT= 100;
- }
- pthread_t id[N];
- int args[N];
- int i = 0;
- void * (*fun[])(void*) = { fun1,fun2,fun3};
- pthread_spin_init(&splock, PTHREAD_PROCESS_PRIVATE);
- NR = TOTAL / N;
- for(;i<N;++i){
- args[i] = i;
- pthread_create(&id[i],NULL,fun[which],&args[i]);
- }
- for(i=0;i<N;++i){
- printf("join thread:%d\n", i);
- pthread_join(id[i],NULL);
- printf("join thread:%d done\n", i);
- }
- printf("cnt = %d, should be %d\n",cnt, N * NR);
- return 0;
- }
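===============================================
The earlier updates below are kept for reference only.
Update
I reworked my implementation so that it sometimes gives up its time slice. In testing it is roughly 2-3 times as efficient as the mutex: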
real 0m0.431s
user 0m0.604s
sys 0m0.240s
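I think this is probably my ideal final version. At the very least it can replace the pthread library's mutex for some simple locking tasks.
Code:
- #ifndef MY_SPINLOCK_H
- #define MY_SPINLOCK_H
- #include <stdint.h>
- #include <sched.h>   /* sched_yield() */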
- typedef volatile uint32_t my_spinlock_t;
- #define MY_SPINLOCK_INITIALIZER 0
- #define DELAY_NR 10000
- static uint32_t bar = 13;
- static uint32_t *foo = &bar;
- #define do_hash(a) do{ \
- (a) = ((a)+0x7ed55d16) + ((a)<<12); \
- (a) = ((a)^0xc761c23c) ^ ((a)>>19); \
- (a) = ((a)+0x165667b1) + ((a)<<5); \
- (a) = ((a)+0xd3a2646c) ^ ((a)<<9); \
- (a) = ((a)+0xfd7046c5) + ((a)<<3); \
- (a) = ((a)^0xb55a4f09) ^ ((a)>>16); \
- }while(0)
- #define my_spinlock_lock(lock) do{ \
- while(!__sync_bool_compare_and_swap(lock, 0, 1)) \
- { \
- while(*lock) \
- { \
- do_hash(*foo); \
- if((*foo % 11) == 1) \
- sched_yield(); \
- } \
- } \
- }while(0)
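- /* Store 0 with release semantics so writes made inside the critical section become visible first. */
- #define my_spinlock_unlock(lock) do{ \
- __sync_lock_release(lock); \
- }while(0)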
- #endif
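The structure of this version is "test-and-test-and-set": after a failed compare-and-swap the waiter only reads the lock word, and yields now and then. The sketch below shows the same shape with a plain spin counter in place of the hash-based pseudo-random yield. It is an illustration, not the author's code: SPIN_LIMIT and the ttas_* names are assumptions made for this example, and the value 64 is not tuned.

```c
#include <sched.h>
#include <stdint.h>

#define SPIN_LIMIT 64   /* hypothetical value, not taken from the post */

typedef volatile uint32_t ttas_lock_t;

/* Returns 1 if the lock was taken, 0 if it was already held. */
static inline int ttas_trylock(ttas_lock_t *lock)
{
    return __sync_bool_compare_and_swap(lock, 0, 1);
}

static inline void ttas_lock(ttas_lock_t *lock)
{
    while (!ttas_trylock(lock)) {
        unsigned spins = 0;

        /* Read-only wait: no atomic write is attempted until the lock
         * word is seen to be 0 again. */
        while (*lock) {
            if (++spins >= SPIN_LIMIT) {
                sched_yield();   /* let another thread, ideally the holder, run */
                spins = 0;
            }
        }
    }
}

static inline void ttas_unlock(ttas_lock_t *lock)
{
    __sync_lock_release(lock);   /* writes 0 with release semantics */
}
```

A fixed counter avoids the shared *foo variable, which every waiting thread writes to in the version above. Whether that changes the timings on this machine has not been measured.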
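=======================================
I have been studying atomic operations recently and implemented a spinlock based on some material I found online.
I benchmarked it against the POSIX mutex and spinlock, and the results surprised me.
The mutex did very well, my own implementation was slightly behind, and the POSIX pthread_spinlock_t did rather poorly.
I really did not expect the mutex to be this efficient. When I saw the result I could hardly believe my eyes.
It confirms once again: do not rely on intuition, measured data is what counts.
Can anyone explain this? Thanks~
Environment: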
uname -a
Linux bsd02 2.6.35.9 #1 SMP Tue Jan 11 02:09:50 EST 2011 x86_64 GNU/Linux
Dual-core Pentium(R) Dual-Core CPU E5400 @ 2.70GHz
Test with 20 concurrent threads, results:
My implementation:
real 0m1.659s
user 0m3.276s
sys 0m0.000s
mutex:
real 0m1.481s
user 0m1.164s
sys 0m1.764s
pthread spinlock:
real 0m6.171s
user 0m12.301s
sys 0m0.004s
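One possible reading of these numbers, offered as a hypothesis rather than a measured fact: a pure spinlock only pays off while the waiting threads do not outnumber the cores, because a waiter that spins while the lock holder is descheduled just burns its time slice. The 2-thread run on this dual-core machine (spinlock 0.819s vs mutex 1.453s) is consistent with that. A minimal check along those lines, with hypothetical helper names, and assuming sysconf(_SC_NPROCESSORS_ONLN) is available as it is on Linux:

```c
#include <unistd.h>

/* Hypothetical helper (not from the original post): number of CPUs currently
 * online, or 1 if the value cannot be determined. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Rule of thumb only: treat pure spinning as reasonable when every
 * contending thread can have a core to itself. */
int pure_spin_is_reasonable(long nthreads)
{
    return nthreads <= online_cpus();
}
```

With 20 threads on two cores this check would say no, which matches the poor pthread_spinlock_t result above.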