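[Introduction] The GMP model is the scheduling model that makes Go lightweight, fast, and efficient. This article walks through the model directly from its source code.
This article looks at the source-level structure of Go's scheduling model, the GMP model.
Go version: go1.13.9
The M struct
The M struct is an abstraction of an OS thread; its main job is to pair with a P and run G's. It has roughly 60 fields, so we only look at what the main ones mean.
/src/runtime/runtime2.go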
type m struct {
// A g managed by the runtime and used while executing scheduler code. To schedule, the runtime
// switches from the user goroutine's stack to this g's stack (the thread's system stack).
g0 *g // goroutine with scheduling stack
morebuf gobuf // gobuf arg to morestack
divmod uint32 // div/mod denominator for arm - known to liblink
// Fields not known to debuggers.
procid uint64 // for debuggers, but offset not hard-coded
// the g that handles signals
gsignal *g // signal-handling g
goSigStack gsignalStack // Go-allocated signal handling stack
sigmask sigset // storage for saved signal mask
// thread-local storage (TLS); this is the key to how an OS thread is tied to its M
tls [6]uintptr // thread-local storage (for x86 extern register)
// function this M runs when it starts
mstartfn func()
// g struct of the user goroutine that is currently running
curg *g // current running goroutine
caughtsig guintptr // goroutine running during fatal signal
// P bound to the current worker thread, nil if there is none
p puintptr // attached p for executing go code (nil if not executing go code)
// P that is about to be associated with this M, held here temporarily
nextp puintptr
// P this M was attached to before entering a syscall
oldp puintptr // the p that was attached before executing a syscall
id int64
mallocing int32
throwing int32
// non-empty when preemption is disabled on this M
preemptoff string // if != "", keep curg running on this m
locks int32
dying int32
profilehz int32
// spinning state of the M: true while the M has run out of work and is actively trying to steal G's elsewhere
spinning bool // m is out of work and is actively looking for work
blocked bool // m is blocked on a note
newSigstack bool // minit on C thread called sigaltstack
printlock int8
incgo bool // m is executing a cgo call
freeWait uint32 // if == 0, safe to free g0 and delete m (atomic)
fastrand [2]uint32
needextram bool
traceback uint8
ncgocall uint64 // number of cgo calls in total
ncgo int32 // number of cgo calls currently in progress
cgoCallersUse uint32 // if non-zero, cgoCallers in use temporarily
cgoCallers *cgoCallers // cgo traceback if crashing in cgo call
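// when there is no goroutine to run, the worker thread sleeps on this note;
// other threads use the same note to wake it up
park note // sleep lock
// linked list of all worker threads
alllink *m // on allm
schedlink muintptr
// per-thread local cache for memory allocation
mcache *mcache
// the G locked to this M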
lockedg guintptr
createstack [32]uintptr // stack that created this thread.
lockedExt uint32 // tracking for external LockOSThread
lockedInt uint32 // tracking for internal lockOSThread
nextwaitm muintptr // next m waiting for lock
waitunlockf func(*g, unsafe.Pointer) bool
waitlock unsafe.Pointer
waittraceev byte
waittraceskip int
startingtrace bool
syscalltick uint32
// handle of the OS thread
thread uintptr // thread handle
freelink *m // on sched.freem
// these are here because they are too large to be on the stack
// of low-level NOSPLIT functions.
libcall libcall
libcallpc uintptr // for cpu profiler
libcallsp uintptr
libcallg guintptr
syscall libcall // stores syscall parameters on windows
vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
vdsoPC uintptr // PC for traceback while in VDSO call
dlogPerM
mOS
}
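A few of the more important fields:
g0: the g used to run scheduler code
gsignal: the g used for signal handling
tls: the thread-local storage
p: the P bound to this M, which carries the local resources needed to run goroutines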
The P struct
An M must be bound to a P before it can run goroutines; when the M blocks, its P is handed to another M.
/src/runtime/runtime2.go
type p struct {
// index into allp
id int32
// status of the p
status uint32 // one of pidle/prunning/...
link puintptr
schedtick uint32 // incremented on every scheduler call
syscalltick uint32 // incremented on every system call
sysmontick sysmontick // last tick observed by sysmon
// points to the bound m; nil if the p is idle
m muintptr // back-link to associated m (nil if idle)
mcache *mcache
raceprocctx uintptr
// pools of available defer structs of different sizes
deferpool [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
deferpoolbuf [5][32]*_defer
// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
goidcache uint64
goidcacheend uint64
// Local run queue of runnable goroutines. Accessed without lock.
runqhead uint32 // head of the queue
runqtail uint32 // tail of the queue
// circular queue backed by an array
runq [256]guintptr
// runnext, if non-nil, is a runnable G that was ready'd by
// the current G and should be run next instead of what's in
// runq if there's time remaining in the running G's time
// slice. It will inherit the time left in the current time
// slice. If a set of goroutines is locked in a
// communicate-and-wait pattern, this schedules that set as a
// unit and eliminates the (potentially large) scheduling
// latency that otherwise arises from adding the ready'd
// goroutines to the end of the run queue.
// In short: a non-nil runnext is a runnable G that the current G made ready.
// It has higher priority than the G's in runq: if the current G still has
// time left in its slice, this G runs next and inherits the remaining time.
runnext guintptr
// Available G's (status == Gdead), i.e. free g's
gFree struct {
gList
n int32
}
sudogcache []*sudog
sudogbuf [128]*sudog
tracebuf traceBufPtr
// traceSweep indicates the sweep events should be traced.
// This is used to defer the sweep start event until a span
// has actually been swept.
traceSweep bool
// traceSwept and traceReclaimed track the number of bytes
// swept and reclaimed by sweeping in the current sweep loop.
traceSwept, traceReclaimed uintptr
palloc persistentAlloc // per-P to avoid mutex
_ uint32 // Alignment for atomic fields below
// Per-P GC state
gcAssistTime int64 // Nanoseconds in assistAlloc
gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker (atomic)
gcBgMarkWorker guintptr // (atomic)
gcMarkWorkerMode gcMarkWorkerMode
// gcMarkWorkerStartTime is the nanotime() at which this mark
// worker started.
gcMarkWorkerStartTime int64
// gcw is this P's GC work buffer cache. The work buffer is
// filled by write barriers, drained by mutator assists, and
// disposed on certain GC state transitions.
gcw gcWork
// wbBuf is this P's GC write barrier buffer.
//
// TODO: Consider caching this in the running G.
wbBuf wbBuf
runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point
pad cpu.CacheLinePad
}
The remaining fields hold GC, trace, and debug information.
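To make the runqhead/runqtail/runq trio more concrete, here is a minimal sketch of an array-backed circular queue in the same shape. It is only an illustration: the type localRunQueue and its methods put and get are hypothetical names, strings stand in for guintptr, and the atomic operations the real runtime relies on for lock-free access are left out.

```go
package main

import "fmt"

// localRunQueue is a simplified, single-threaded model of p.runq.
// The real runtime accesses the head and tail indices atomically;
// this sketch omits that on purpose.
type localRunQueue struct {
	head uint32      // index of the next item to take (like runqhead)
	tail uint32      // index of the next free slot (like runqtail)
	buf  [256]string // stands in for runq [256]guintptr
}

// put appends an item and reports false when the queue is full.
func (q *localRunQueue) put(g string) bool {
	if q.tail-q.head >= uint32(len(q.buf)) {
		return false
	}
	q.buf[q.tail%uint32(len(q.buf))] = g
	q.tail++
	return true
}

// get removes the oldest item and reports false when the queue is empty.
func (q *localRunQueue) get() (string, bool) {
	if q.head == q.tail {
		return "", false
	}
	g := q.buf[q.head%uint32(len(q.buf))]
	q.head++
	return g, true
}

func main() {
	var q localRunQueue
	q.put("g1")
	q.put("g2")
	g, ok := q.get()
	fmt.Println(g, ok)
}
```

The indices only ever grow and are reduced modulo the array length on access, so tail minus head is always the number of queued items, even after the counters wrap around.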
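The G struct
A G is a goroutine. The struct holds all of the goroutine's information, including its stack, plus a gobuf struct that saves the CPU register state so that, when it is this goroutine's turn to run again, the runtime knows where to resume.
/src/runtime/runtime2.go
type stack struct {
lo uintptr // top of the stack, the low memory address
hi uintptr // bottom of the stack, the high memory address
}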
type g struct {
// Stack parameters.
// stack describes the actual stack memory: [stack.lo, stack.hi).
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
// stackguard1 is the stack pointer compared in the C stack growth prologue.
// It is stack.lo+StackGuard on g0 and gsignal stacks.
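// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
// the stack used by this goroutine
stack stack // offset known to runtime/cgo
// the next two fields are used for stack overflow checks, which implement automatic stack growth and shrinking; preemption also uses stackguard0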
stackguard0 uintptr // offset known to liblink
stackguard1 uintptr // offset known to liblink
_panic *_panic // innermost panic - offset known to liblink
_defer *_defer // innermost defer
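// the worker thread currently running this goroutine
m *m // current m; offset known to arm liblink
// used for scheduling switches: saves the G's context when it is switched out; see the gobuf struct below for what is saved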
sched gobuf
syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
stktopsp uintptr // expected sp at top of stack, to check in traceback
param unsafe.Pointer // passed parameter on wakeup
// status: Gidle, Grunnable, Grunning, Gsyscall, Gwaiting, Gdead
atomicstatus uint32
stackLock uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
goid int64
// schedlink points to the next g in the global run queue;
// all g's in the global run queue form a linked list
schedlink guintptr
waitsince int64 // approx time when the g become blocked
waitreason waitReason // if status==Gwaiting, the reason the g is blocked
// preemption signal: set preempt to true when this g should be preempted
preempt bool // preemption signal, duplicates stackguard0 = stackpreempt
paniconfault bool // panic (instead of crash) on unexpected fault address
preemptscan bool // preempted g does scan for gc
gcscandone bool // g has scanned stack; protected by _Gscan bit in status
gcscanvalid bool // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
throwsplit bool // must not split stack
raceignore int8 // ignore race detection events
sysblocktraced bool // StartTrace has emitted EvGoInSyscall about this goroutine
sysexitticks int64 // cputicks when syscall has returned (for tracing)
traceseq uint64 // trace event sequencer
tracelastp puintptr // last P emitted an event for this goroutine
// if LockOSThread has been called, this g is bound to a specific m
lockedm muintptr
sig uint32
writebuf []byte
sigcode0 uintptr
sigcode1 uintptr
sigpc uintptr
gopc uintptr // pc of go statement that created this goroutine
ancestors *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
startpc uintptr // pc of goroutine function
racectx uintptr
waiting *sudog // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
cgoCtxt []uintptr // cgo traceback context
labels unsafe.Pointer // profiler labels
timer *timer // cached timer for time.Sleep
selectDone uint32 // are we participating in a select and did someone win the race?
// Per-G GC state
// gcAssistBytes is this G's GC assist credit in terms of
// bytes allocated. If this is positive, then the G has credit
// to allocate gcAssistBytes bytes without assisting. If this
// is negative, then the G must correct this by performing
// scan work. We track this in bytes to make it fast to update
// and check for debt in the malloc hot path. The assist ratio
// determines how this corresponds to scan work debt.
gcAssistBytes int64
}
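The comments on stackguard0 describe two uses of one field: the stack-growth check in the function prologue and the preemption request. The sketch below models that idea with toy types. Everything in it is an assumption made for illustration: toyG, prologueTrips, and requestPreempt are hypothetical names, and the values of stackGuard and stackPreempt are placeholders, not the runtime's real constants.

```go
package main

import "fmt"

// stackGuard is a placeholder guard size, chosen only for this sketch.
const stackGuard = 880

// stackPreempt is a placeholder sentinel that is larger than any
// plausible stack pointer, so the prologue check always trips.
const stackPreempt = ^uintptr(0)

// toyG models only the stack-related fields of g.
type toyG struct {
	lo, hi      uintptr // stack bounds: [lo, hi)
	stackguard0 uintptr // compared against SP in the prologue
}

func newToyG(lo, hi uintptr) *toyG {
	return &toyG{lo: lo, hi: hi, stackguard0: lo + stackGuard}
}

// prologueTrips models the prologue check. The stack grows downward,
// so an SP at or below the guard means the runtime must step in,
// either to grow the stack or to handle a preemption request.
func (g *toyG) prologueTrips(sp uintptr) bool {
	return sp <= g.stackguard0
}

// requestPreempt reuses the same check to force the next prologue
// to call into the runtime.
func (g *toyG) requestPreempt() {
	g.stackguard0 = stackPreempt
}

func main() {
	g := newToyG(0x1000, 0x3000)
	fmt.Println(g.prologueTrips(0x2000)) // plenty of stack left
	g.requestPreempt()
	fmt.Println(g.prologueTrips(0x2000)) // now trips regardless of SP
}
```

Reusing the guard comparison means a preemption request costs nothing extra on the normal call path, which is presumably why the preempt flag is described as duplicating stackguard0 = stackpreempt.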
gobuf
The gobuf struct saves a goroutine's scheduling state, mainly the values of a few CPU registers.
/src/runtime/runtime2.go
type gobuf struct {
// The offsets of sp, pc, and g are known to (hard-coded in) libmach.
//
// ctxt is unusual with respect to GC: it may be a
// heap-allocated funcval, so GC needs to track it, but it
// needs to be set and cleared from assembly, where it's
// difficult to have write barriers. However, ctxt is really a
// saved, live register, and we only ever exchange it between
// the real register and the gobuf. Hence, we treat it as a
// root during stack scanning, which means assembly that saves
// and restores it doesn't need write barriers. It's still
// typed as a pointer so that any other writes from Go get
// write barriers.
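sp uintptr // saved value of the CPU's rsp register
pc uintptr // saved value of the CPU's rip register
g guintptr // the goroutine this gobuf belongs to
ctxt unsafe.Pointer
// Saves the return value of a system call: if the p has been taken by another worker thread
// when the syscall returns, the goroutine is put on the global run queue and scheduled by
// another thread, which then needs to know the syscall's return value.
ret sys.Uintreg // return value of the system call
lr uintptr
// saved value of the CPU's rbp register
bp uintptr // for GOEXPERIMENT=framepointer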
}
The scheduler struct: sched
Every goroutine is scheduled by the scheduler, which holds the global resources.
sched
/src/runtime/runtime2.go
type schedt struct {
// accessed atomically. keep at top to ensure alignment on 32-bit systems.
goidgen uint64
lastpoll uint64
lock mutex
// When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
// sure to call checkdead().
// linked list of idle worker threads
midle muintptr // idle m's waiting for work
// number of idle worker threads
nmidle int32 // number of idle m's waiting for work
// number of idle m's that are locked
nmidlelocked int32 // number of locked m's waiting for work
// number of m's created so far, which is also the next m ID
mnext int64 // number of m's that have been created and next M ID
// maximum number of m threads that may be created
maxmcount int32 // maximum number of m's allowed (or die)
nmsys int32 // number of system m's not counted for deadlock
// cumulative number of m's that have been freed
nmfreed int64 // cumulative number of freed m's
// number of system goroutines, updated atomically
ngsys uint32 // number of system goroutines; updated atomically
// linked list of idle p structs
pidle puintptr // idle p's
// number of idle p structs
npidle uint32
nmspinning uint32 // See "Worker thread parking/unparking" comment in proc.go.
// Global runnable queue of G's.
runq gQueue // this struct is defined in proc.go
// number of elements in runq
runqsize int32
// disable controls selective disabling of the scheduler.
//
// Use schedEnableUser to control this.
//
// disable is protected by sched.lock.
disable struct {
// user disables scheduling of user goroutines.
user bool
runnable gQueue // pending runnable Gs
n int32 // length of runnable
}
// Global cache of dead G's.
gFree struct {
lock mutex
stack gList // Gs with stacks
noStack gList // Gs without stacks
n int32
}
// Central cache of sudog structs.
sudoglock mutex
sudogcache *sudog
// Central pool of available defer structs of different sizes.
deferlock mutex
deferpool [5]*_defer
// freem is the list of m's waiting to be freed when their
// m.exited is set. Linked through m.freelink.
freem *m
gcwaiting uint32 // gc is waiting to run
stopwait int32
stopnote note
sysmonwait uint32
sysmonnote note
// safepointFn should be called on each P at the next GC
// safepoint if p.runSafePointFn is set.
safePointFn func(*p)
safePointWait int32
safePointNote note
profilehz int32 // cpu profiling rate
procresizetime int64 // nanotime() of last change to gomaxprocs
totaltime int64 // ∫gomaxprocs dt up to procresizetime
}
gQueue
/src/runtime/proc.go
type gQueue struct {
head guintptr // head of the queue
tail guintptr // tail of the queue
}
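gQueue stores only a head and a tail; the links between elements live in each g's schedlink field. A minimal sketch of that intrusive linked queue follows. The names node, toyQueue, pushBack, and pop are assumptions for illustration, and plain pointers replace guintptr.

```go
package main

import "fmt"

// node stands in for g; schedlink points to the next node in the queue.
type node struct {
	id        int
	schedlink *node
}

// toyQueue mirrors the head/tail shape of gQueue.
type toyQueue struct {
	head *node
	tail *node
}

// pushBack appends n to the tail of the queue.
func (q *toyQueue) pushBack(n *node) {
	n.schedlink = nil
	if q.tail != nil {
		q.tail.schedlink = n
	} else {
		q.head = n
	}
	q.tail = n
}

// pop removes and returns the head of the queue, or nil if it is empty.
func (q *toyQueue) pop() *node {
	n := q.head
	if n != nil {
		q.head = n.schedlink
		if q.head == nil {
			q.tail = nil
		}
	}
	return n
}

func main() {
	var q toyQueue
	q.pushBack(&node{id: 1})
	q.pushBack(&node{id: 2})
	fmt.Println(q.pop().id, q.pop().id, q.pop() == nil)
}
```

Because the link is stored inside the element itself, queueing needs no extra allocation, at the cost that an element can sit in only one such queue at a time.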
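Some important global variables
/src/runtime/proc.go
m0 m // represents the main thread
g0 g // the g0 bound to m0, i.e. m0.g0 = &g0
allgs []*g // holds all g's
/src/runtime/runtime2.go
allm *m // linked list of all m's, including m0 above
allp []*p // holds all p's, len(allp) == gomaxprocs
sched schedt // the scheduler struct, which holds the scheduler's state
ncpu int32 // number of CPU cores, initialized by the runtime at program start
gomaxprocs int32 // maximum number of p's; defaults to ncpu and can be changed via GOMAXPROCS
When the program starts, all of these variables hold their zero values: pointers are nil, slices are nil slices, ints are 0, and every field of a struct holds the zero value of its own type.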
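ncpu and gomaxprocs are internal to the runtime, but their values can be observed from user code through the public runtime package. The short sketch below does that; procInfo is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// procInfo returns the number of logical CPUs and the current
// GOMAXPROCS setting. Passing 0 to runtime.GOMAXPROCS queries
// the value without changing it.
func procInfo() (ncpu, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	ncpu, procs := procInfo()
	fmt.Println("logical CPUs:", ncpu)
	fmt.Println("GOMAXPROCS:", procs)
}
```

With the GOMAXPROCS environment variable unset, the two numbers are normally equal on this Go version, which matches the default described above.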
Scheduler initialization
Scheduler initialization is driven by one main function, schedinit(), in /src/runtime/proc.go. The comment above the function also lists the bootstrap order:
// The bootstrap sequence is:
//
// call osinit
// call schedinit
// make & queue new G
// call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
// raceinit must be the first call to race detector.
// In particular, it must be done before mallocinit below calls racemapshadow.
_g_ := getg() // getg() is declared in src/runtime/stubs.go; the real code is generated by the compiler
if raceenabled {
_g_.racectx, raceprocctx0 = raceinit()
}
// set the maximum number of M's
sched.maxmcount = 10000
tracebackinit()
moduledataverify()
// initialize the free lists used to manage stack space
stackinit()
mallocinit()
// initialize the current m
mcommoninit(_g_.m)
cpuinit() // must run before alginit
alginit() // maps must not be used before this call
modulesinit() // provides activeModules
typelinksinit() // uses maps, activeModules
itabsinit() // uses activeModules
msigsave(_g_.m)
initSigmask = _g_.m.sigmask
goargs()
goenvs()
parsedebugvars()
gcinit()
sched.lastpoll = uint64(nanotime())
// adjust the number of P's from 1 to the default, the number of CPU cores
procs := ncpu
if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
procs = n
}
// resize the set of P's
// all P's here are newly created, so procresize must not return a P with local work
if procresize(procs) != nil {
throw("unknown runnable goroutine during bootstrap")
}
// For cgocheck > 1, we turn on the write barrier at all times
// and check all pointer writes. We can't do this until after
// procresize because the write barrier needs a P.
if debug.cgocheck > 1 {
writeBarrier.cgo = true
writeBarrier.enabled = true
for _, p := range allp {
p.wbBuf.reset()
}
}
if buildVersion == "" {
// Condition should never trigger. This code just serves
// to ensure runtime·buildVersion is kept in the resulting binary.
buildVersion = "unknown"
}
if len(modinfo) == 1 {
// Condition should never trigger. This code just serves
// to ensure runtime·modinfo is kept in the resulting binary.
modinfo = ""
}
}
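The part of schedinit that picks the number of P's is small enough to restate in user-level Go. The sketch below mirrors that logic under stated assumptions: chooseProcs is a hypothetical name, and strconv.ParseInt stands in for the runtime's internal atoi32, so edge cases in parsing may differ from the real function.

```go
package main

import (
	"fmt"
	"strconv"
)

// chooseProcs mirrors the selection in schedinit: start from the CPU
// count and override it only when the GOMAXPROCS value parses as a
// positive 32-bit integer.
func chooseProcs(ncpu int, env string) int {
	procs := ncpu
	if n, err := strconv.ParseInt(env, 10, 32); err == nil && n > 0 {
		procs = int(n)
	}
	return procs
}

func main() {
	fmt.Println(chooseProcs(8, ""))    // unset: falls back to ncpu
	fmt.Println(chooseProcs(8, "4"))   // valid override
	fmt.Println(chooseProcs(8, "0"))   // non-positive: ignored
	fmt.Println(chooseProcs(8, "abc")) // not a number: ignored
}
```

An invalid or non-positive value is silently ignored rather than treated as an error, so a typo in the environment variable simply leaves the default in place.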
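The getg() call at the top jumps to func getg() *g, a declaration with no body. What does that mean? schedinit first calls getg() to obtain the g that is currently running; getg() is declared in src/runtime/stubs.go, and the actual code is generated by the compiler.
// getg returns the pointer to the current g.
// The compiler rewrites calls to this function into instructions
// that fetch the g directly (from TLS or from the dedicated register).
func getg() *g
As the comment says, getg returns a pointer to the currently running goroutine: it reads tls[0] from thread-local storage, which holds the address of the current goroutine. The compiler inserts code similar to the following:
get_tls(CX)
MOVQ g(CX), BX // BX now holds the address of the current g struct
So that is what it means.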
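Rough flow of scheduler initialization:
M init        -->  P init       -->  G init
mcommoninit        procresize        newproc
-------------------------------------------------------
allm pool          allp pool         g.sched (execution context)
                                     p.runq (run queue)
M/P/G initialization: mcommoninit, procresize, and newproc are responsible for the M pool (allm), the P pool (allp), the G's execution context (g.sched), and the run queue (p.runq).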
The scheduling loop
Once all the initialization is done, the scheduler has to be started, which means calling mstart. This is also visible in the assembly:
/src/runtime/asm_amd64.s (on Linux)
TEXT runtime·rt0_go(SB),NOSPLIT,$0
... ... ...
MOVL 16(SP), AX // copy argc
MOVL AX, 0(SP)
MOVQ 24(SP), AX // copy argv
MOVQ AX, 8(SP)
CALL runtime·args(SB)
CALL runtime·osinit(SB) // OS initialization
CALL runtime·schedinit(SB) // scheduler initialization
// create a new goroutine to start program
MOVQ $runtime·mainPC(SB), AX // entry
PUSHQ AX
PUSHQ $0 // arg size
CALL runtime·newproc(SB) // G initialization
POPQ AX
// start this M
CALL runtime·mstart(SB)
CALL runtime·abort(SB) // mstart should never return
RET
Reposted from: 九卷
cnblogs.com/jiujuan/p/12977832.html
Editor: jq
Original title: Understanding the Go Scheduler in Depth: GPM Source Code Analysis
Source: WeChat ID LinuxHub, WeChat official account Linux愛好者. Please credit the source when reposting.