
rwthlm Analysis (5): The LSTM Structure


This fifth post is still about the hidden layer, and it actually covers the main thing I originally set out to study: LSTM. LSTM works better than a plain RNN. One problem with an RNN is that the error gradient gradually shrinks and vanishes as it is propagated further back in time, which limits the effective depth of the RNN's BPTT learning algorithm. LSTM solves this problem. The LSTM structure itself has also been extended in several stages, but this post will not walk through LSTM in detail again; for a more thorough introduction, see another blog post of mine. As before, my knowledge and understanding are limited, so if anything here is wrong, I'd appreciate readers pointing it out. Thanks again!
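To make the vanishing-gradient point concrete (this short derivation is mine, not from the original post): unrolling a simple RNN with state update $h_t = f(W h_{t-1} + U x_t)$, the gradient reaching a state $\tau$ steps back contains a product of Jacobians,

$$\frac{\partial L_t}{\partial h_{t-\tau}} = \frac{\partial L_t}{\partial h_t} \prod_{k=0}^{\tau-1} \frac{\partial h_{t-k}}{\partial h_{t-k-1}},$$

and when the norms of these Jacobians are below 1 the product shrinks exponentially in $\tau$. The LSTM's CEC sidesteps this because its self-connection has a fixed weight of 1 and a linear activation, so the error along that path is carried back unattenuated.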

The LSTM implementation is in lstm.cc. Within the rwthlm toolkit this is the most central piece and also the largest, at over 1000 lines of code. First, here is the constructor from lstm.cc; the allocations it performs are already enough to let us draw the LSTM network structure. The code follows:


LSTM::LSTM(const int input_dimension, const int output_dimension,
           const int max_batch_size, const int max_sequence_length,
           const bool use_bias)
    : Function(input_dimension, output_dimension, max_batch_size,
               max_sequence_length),
      sigmoid_(), tanh_() {
  // these flat 1-D arrays use the same layout as in the earlier layers
  int size = output_dimension * max_batch_size * max_sequence_length;
  // output of the LSTM cells
  b_ = FastMalloc(size);
  // input/output of the CEC
  cec_b_ = FastMalloc(size);
  // input of the cell
  cec_input_b_ = FastMalloc(size);
  // input/output of the input gate
  input_gate_b_ = FastMalloc(size);
  // input/output of the forget gate
  forget_gate_b_ = FastMalloc(size);
  // input/output of the output gate
  output_gate_b_ = FastMalloc(size);
  // pointers named with _t_ are mutable; they track the current time step
  b_t_ = b_;
  cec_input_b_t_ = cec_input_b_;
  cec_b_t_ = cec_b_;
  input_gate_b_t_ = input_gate_b_;
  forget_gate_b_t_ = forget_gate_b_;
  output_gate_b_t_ = output_gate_b_;
  // not clear why this is reassigned: size was already initialized to this
  // value above
  size = output_dimension * max_batch_size * max_sequence_length;
  // error signal arriving at the CEC state
  cec_epsilon_ = FastMalloc(size);
  delta_ = FastMalloc(size);
  // error of the input gate
  input_gate_delta_ = FastMalloc(size);
  // error of the forget gate
  forget_gate_delta_ = FastMalloc(size);
  // error of the output gate
  output_gate_delta_ = FastMalloc(size);
  // same pattern as above
  cec_epsilon_t_ = cec_epsilon_;
  delta_t_ = delta_;
  input_gate_delta_t_ = input_gate_delta_;
  forget_gate_delta_t_ = forget_gate_delta_;
  output_gate_delta_t_ = output_gate_delta_;
  //std::cout << "input_dimension: " << input_dimension
  //          << " output_dimension: " << output_dimension << std::endl;
  // e.g. for the command "myExample -i 10 -M 12",
  // input_dimension is 10 and output_dimension is 12
  size = input_dimension * output_dimension;
  // these weights connect only the input layer to this LSTM layer
  weights_ = FastMalloc(size);
  // the gate weights below connect only the input layer to the gates
  input_gate_weights_ = FastMalloc(size);
  forget_gate_weights_ = FastMalloc(size);
  output_gate_weights_ = FastMalloc(size);
  momentum_weights_ = FastMalloc(size);
  momentum_input_gate_weights_ = FastMalloc(size);
  momentum_forget_gate_weights_ = FastMalloc(size);
  momentum_output_gate_weights_ = FastMalloc(size);
  // these weights are the recurrent part: connections from the LSTM layer
  // at time t-1 to the LSTM layer at time t
  size = output_dimension * output_dimension;
  recurrent_weights_ = FastMalloc(size);
  input_gate_recurrent_weights_ = FastMalloc(size);
  forget_gate_recurrent_weights_ = FastMalloc(size);
  output_gate_recurrent_weights_ = FastMalloc(size);
  momentum_recurrent_weights_ = FastMalloc(size);
  momentum_input_gate_recurrent_weights_ = FastMalloc(size);
  momentum_forget_gate_recurrent_weights_ = FastMalloc(size);
  momentum_output_gate_recurrent_weights_ = FastMalloc(size);
  // From the allocations above it is easy to see that a gate's input comes
  // from three sources: 1. the output of the input layer, 2. this layer's
  // output at the previous time step, 3. the CEC state at the previous step.
  // The LSTM cell's input comes from two sources: 1. the output of the input
  // layer, 2. this layer's output at the previous time step.
  // peephole connections: from the CEC to the gates
  input_gate_peephole_weights_ = FastMalloc(output_dimension);
  forget_gate_peephole_weights_ = FastMalloc(output_dimension);
  output_gate_peephole_weights_ = FastMalloc(output_dimension);
  momentum_input_gate_peephole_weights_ = FastMalloc(output_dimension);
  momentum_forget_gate_peephole_weights_ = FastMalloc(output_dimension);
  momentum_output_gate_peephole_weights_ = FastMalloc(output_dimension);
  // These allocations reveal the internal structure of the LSTM layer:
  // output_dimension is the number of blocks, each block contains one cell,
  // and each cell contains one CEC. So output_dimension equals the number of
  // CECs, and each CEC is connected to three gates.
  // bias setup
  bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  input_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  forget_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  output_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  momentum_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  momentum_input_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  momentum_forget_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
  momentum_output_gate_bias_ = use_bias ? FastMalloc(output_dimension) : nullptr;
}
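To see how the _t_ pointers walk over those flat buffers, here is a minimal standalone sketch (my own illustration with made-up values, not rwthlm code; rwthlm's GetOffset() plays the role of offset here):

#include <cstdio>
#include <vector>

int main() {
  const int output_dimension = 3, max_batch_size = 2, max_sequence_length = 4;
  // one flat buffer holds all time steps back to back: [t=0 | t=1 | t=2 | t=3],
  // each chunk being output_dimension * max_batch_size values
  std::vector<float> b(output_dimension * max_batch_size * max_sequence_length, 0.f);
  const int offset = output_dimension * max_batch_size;  // size of one time step
  float *b_t = b.data();  // like b_t_ = b_; initially points at time step 0
  for (int t = 0; t < max_sequence_length; ++t) {
    b_t[0] = static_cast<float>(t);  // write something for this time step
    b_t += offset;                   // advance to the next step, as Evaluate()
                                     // does with GetOffset()
  }
  // the backward pass (ComputeDelta) walks the other way: b_t -= offset per step
  printf("first value of last step: %f\n", b[(max_sequence_length - 1) * offset]);
  return 0;
}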

From this code, the structure of the LSTM network can be drawn as in the figure below:

[Figure: a single LSTM block, showing the cell, the CEC, and the input, forget, and output gates with their connections]

全部LSTM的結(jié)構(gòu)是每一個(gè)block(圖中方框)包括1個(gè)cell(從tanh到tanh部份), 每一個(gè)cell包括1個(gè)cec(圖中紅色圓圈)  (PS: 這個(gè)圖沒畫全,畫了第1個(gè)block就已覺的很復(fù)雜了,加上第2個(gè)估計(jì)會(huì)被連接線繞昏頭了,而且比較費(fèi)時(shí),如果覺的不錯(cuò),就點(diǎn)個(gè)贊吧,哈哈,開玩笑:D)

From how the code executes, it is easy to see that each block's input comes from two sources:

  1. the output of the input layer
  2. this layer's block output at the previous time step

A gate's input comes from three sources:

  1. the output of the input layer
  2. this layer's block output at the previous time step
  3. the CEC state at the previous time step (for the input and forget gates; for the output gate, it is the CEC output at the current time step)
The CEC's input comes from two sources:
  1. the block input, multiplied by the input gate's output
  2. the CEC's output at the previous time step, multiplied by the forget gate's output
另外LSTM結(jié)構(gòu)的前向計(jì)算的順序很重要,必須依照下面的來:
  1. input gate, forget gate的輸入輸出
  2. cell的輸入
  3. output gate的輸入輸出
  4. cell的輸出(這里也是block的輸出)
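For reference, these four steps correspond to the standard peephole-LSTM forward pass (written in common Graves-style notation; the symbols $W$, $R$, $p$ and the biases $\beta$ are mine, not identifiers from rwthlm):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + R_i b_{t-1} + p_i \odot c_{t-1} + \beta_i) \\
f_t &= \sigma(W_f x_t + R_f b_{t-1} + p_f \odot c_{t-1} + \beta_f) \\
z_t &= \tanh(W_z x_t + R_z b_{t-1} + \beta_z) \\
c_t &= i_t \odot z_t + f_t \odot c_{t-1} \\
o_t &= \sigma(W_o x_t + R_o b_{t-1} + p_o \odot c_t + \beta_o) \\
b_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Note how the input and forget gates peek at $c_{t-1}$ while the output gate peeks at the freshly computed $c_t$; that is exactly why the computation order above is fixed.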
For the error flow, simply reverse the arrows. One more remark: when Felix Gers proposed the peephole connections, his paper states that no error flows from the CEC back to the gates through the peephole connections. rwthlm is slightly different: here the error does flow through, and the peephole weights are also updated with the BPTT learning algorithm. Finally, the learning algorithm here is full BPTT, whereas the LSTM experiments by Gers and Hochreiter used truncated BPTT plus RTRL for the updates. I find full BPTT simpler, or at least its formulas are easier to follow; reading the older derivation left my mind blank, with page after page of equations blurring before my eyes. With all of the above in mind, the core implementation of lstm.cc below should be fairly easy to follow. Here it is with comments, and that concludes this post.
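As a reading aid for ComputeDelta below, these are the full-BPTT deltas for this structure, including the error flowing through the peepholes (again in my own notation, with $a_i, a_f, a_z, a_o$ the pre-activations and $\epsilon_t = \partial L / \partial c_t$; this is my reconstruction, not rwthlm documentation):

$$
\begin{aligned}
\delta b_t &= \frac{\partial L}{\partial b_t} + R_z^\top \delta z_{t+1} + R_i^\top \delta i_{t+1} + R_f^\top \delta f_{t+1} + R_o^\top \delta o_{t+1} \\
\delta o_t &= \sigma'(a_o) \odot \tanh(c_t) \odot \delta b_t \\
\epsilon_t &= \delta b_t \odot o_t \odot \tanh'(c_t) + p_o \odot \delta o_t + f_{t+1} \odot \epsilon_{t+1} + p_i \odot \delta i_{t+1} + p_f \odot \delta f_{t+1} \\
\delta z_t &= \epsilon_t \odot i_t \odot \tanh'(a_z) \\
\delta f_t &= \sigma'(a_f) \odot \epsilon_t \odot c_{t-1} \\
\delta i_t &= \sigma'(a_i) \odot \epsilon_t \odot z_t
\end{aligned}
$$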

const Real *LSTM::Evaluate(const Slice &slice, const Real x[]) {
  // the argument x is, as before, the output of the previous layer
  // start is true at the initial time step
  const bool start = b_t_ == b_;
  // OpenMP parallelism: the two sections below run concurrently
  #pragma omp parallel sections
  {
    // With peephole connections, the forward computation must follow this order:
    // 1. compute the input gate and forget gate outputs
    // 2. compute the cell input and the CEC state
    // 3. compute the output gate output
    // 4. compute the cell output
    #pragma omp section
    // Note the role of start: normally a gate's input includes the previous
    // CEC output (via the peepholes) and this layer's previous output, but at
    // the initial time step those previous outputs are effectively zero, so
    // they are skipped; only for t > 0 do they contribute.
    // compute the input gate output
    EvaluateSubUnit(slice.size(), input_gate_weights_, input_gate_bias_,
                    start ? nullptr : input_gate_recurrent_weights_,
                    start ? nullptr : input_gate_peephole_weights_,
                    x, b_t_ - GetOffset(), cec_b_t_ - GetOffset(),
                    input_gate_b_t_, &sigmoid_);
    #pragma omp section
    // compute the forget gate output
    EvaluateSubUnit(slice.size(), forget_gate_weights_, forget_gate_bias_,
                    start ? nullptr : forget_gate_recurrent_weights_,
                    start ? nullptr : forget_gate_peephole_weights_,
                    x, b_t_ - GetOffset(), cec_b_t_ - GetOffset(),
                    forget_gate_b_t_, &sigmoid_);
  }
  // compute the cell input; it comes from two sources: the input layer and
  // this layer's output at the previous time step
  EvaluateSubUnit(slice.size(), weights_, bias_,
                  start ? nullptr : recurrent_weights_, nullptr,
                  x, b_t_ - GetOffset(), nullptr, cec_input_b_t_, &tanh_);
  const int size = slice.size() * output_dimension();
  // cec_b_t_ <= cec_input_b_t_ * input_gate_b_t_
  // compute the CEC input
  FastMultiply(input_gate_b_t_, size, cec_input_b_t_, cec_b_t_);
  // Executed only at non-initial time steps. The reason: one part of the CEC
  // input is the gated cell input, the other part is the CEC's output at the
  // previous time step, which does not exist at the initial step. Also note
  // that the CEC is linear: to guarantee the constant error flow its
  // activation is f(x) = x, so once the CEC input is computed it is also its
  // output.
  if (!start) {
    // cec_b_t_ <= cec_b_t_ + forget_gate_b_t_ * cec_b_(t-1)_
    FastMultiplyAdd(forget_gate_b_t_, size, cec_b_t_ - GetOffset(), cec_b_t_);
  }
  // compute the output gate output
  EvaluateSubUnit(slice.size(), output_gate_weights_, output_gate_bias_,
                  start ? nullptr : output_gate_recurrent_weights_,
                  output_gate_peephole_weights_,
                  x, b_t_ - GetOffset(), cec_b_t_, output_gate_b_t_, &sigmoid_);
  // copy the CEC output into b_t_
  FastCopy(cec_b_t_, size, b_t_);
  // squash the CEC output through tanh
  tanh_.Evaluate(output_dimension(), slice.size(), b_t_);
  // now b_t_ holds the full cell output
  FastMultiply(b_t_, size, output_gate_b_t_, b_t_);
  const Real *result = b_t_;
  b_t_ += GetOffset();
  cec_input_b_t_ += GetOffset();
  cec_b_t_ += GetOffset();
  input_gate_b_t_ += GetOffset();
  forget_gate_b_t_ += GetOffset();
  output_gate_b_t_ += GetOffset();
  return result;
}

// compute one sub-unit (cell input or a gate) of the LSTM layer
void LSTM::EvaluateSubUnit(const int batch_size, const Real weights[],
                           const Real bias[], const Real recurrent_weights[],
                           const Real peephole_weights[], const Real x[],
                           const Real recurrent_b_t[], const Real cec_b_t[],
                           Real b_t[], ActivationFunction *activation_function) {
  // if a bias exists, copy it in first; the additions below then effectively
  // add the bias on top
  if (bias) {
    for (int i = 0; i < batch_size; ++i)
      FastCopy(bias, output_dimension(), b_t + i * output_dimension());
  }
  // b_t <= b_t + weights * x
  // the unit's input contribution from the input layer
  FastMatrixMatrixMultiply(1.0, weights, false, output_dimension(),
                           input_dimension(), x, false, batch_size, b_t);
  // non-initial time steps only:
  // b_t <= b_t + recurrent_weights * recurrent_b_t
  // the contribution from this layer's output at the previous time step
  if (recurrent_weights) {
    FastMatrixMatrixMultiply(1.0, recurrent_weights, false, output_dimension(),
                             output_dimension(), recurrent_b_t, false,
                             batch_size, b_t);
  }
  // non-initial time steps only (gates):
  if (peephole_weights) {
    #pragma omp parallel for
    for (int i = 0; i < batch_size; ++i) {
      // b_t <= b_t + peephole_weights * cec_b_t
      // the gate's input contribution from the CEC
      FastMultiplyAdd(peephole_weights, output_dimension(),
                      cec_b_t + i * output_dimension(),
                      b_t + i * output_dimension());
    }
  }
  // so far b_t holds the unit's input; after this step it has passed through
  // the activation function and holds the output
  activation_function->Evaluate(output_dimension(), batch_size, b_t);
}

void LSTM::ComputeDelta(const Slice &slice, FunctionPointer f) {
  // walk backwards from time t toward 0
  b_t_ -= GetOffset();
  cec_input_b_t_ -= GetOffset();
  cec_b_t_ -= GetOffset();
  input_gate_b_t_ -= GetOffset();
  forget_gate_b_t_ -= GetOffset();
  output_gate_b_t_ -= GetOffset();
  // cell outputs
  // accumulate into delta_t_ the error propagated from the output layer to
  // this LSTM layer
  f->AddDelta(slice, delta_t_);
  // if this is not the end of the sentence: with current time t, the
  // contributions from time t+1 must be added as well
  if (delta_t_ != delta_) {
    // delta_t_ <= delta_t_ + recurrent_weights_^T * delta_(t+1)_
    // error of the LSTM layer at t+1 propagated back to this layer at t
    FastMatrixMatrixMultiply(1.0, recurrent_weights_, true, output_dimension(),
                             output_dimension(), delta_t_ - GetOffset(), false,
                             slice.size(), delta_t_);
    // delta_t_ <= delta_t_ + input_gate_recurrent_weights_^T * input_gate_delta_(t+1)_
    // input gate error at t+1 propagated back to this layer at t
    FastMatrixMatrixMultiply(1.0, input_gate_recurrent_weights_, true,
                             output_dimension(), output_dimension(),
                             input_gate_delta_t_ - GetOffset(), false,
                             slice.size(), delta_t_);
    // delta_t_ <= delta_t_ + forget_gate_recurrent_weights_^T * forget_gate_delta_(t+1)_
    // forget gate error at t+1 propagated back to this layer at t
    FastMatrixMatrixMultiply(1.0, forget_gate_recurrent_weights_, true,
                             output_dimension(), output_dimension(),
                             forget_gate_delta_t_ - GetOffset(), false,
                             slice.size(), delta_t_);
    // delta_t_ <= delta_t_ + output_gate_recurrent_weights_^T * output_gate_delta_(t+1)_
    // output gate error at t+1 propagated back to this layer at t
    FastMatrixMatrixMultiply(1.0, output_gate_recurrent_weights_, true,
                             output_dimension(), output_dimension(),
                             output_gate_delta_t_ - GetOffset(), false,
                             slice.size(), delta_t_);
  }
  // At this point delta_t_ is the error arriving at the LSTM layer: with L
  // the objective function and b the cell output, delta_t_ now holds dL/db.
  // output gates, part I
  const int size = slice.size() * output_dimension();
  // copy the CEC output into output_gate_delta_t_
  FastCopy(cec_b_t_, size, output_gate_delta_t_);
  // pass the CEC output through tanh, still stored in output_gate_delta_t_
  tanh_.Evaluate(output_dimension(), slice.size(), output_gate_delta_t_);
  // states, part I
  // cec_epsilon_t_ <= output_gate_b_t_ * delta_t_
  // this computes the error at the output-gate multiplication point, before
  // the tanh derivative is applied
  FastMultiply(output_gate_b_t_, size, delta_t_, cec_epsilon_t_);
  // this computes the error arriving at the CEC, stored in cec_epsilon_t_;
  // it is only one part of the error flowing into the CEC
  tanh_.MultiplyDerivative(output_dimension(), slice.size(),
                           output_gate_delta_t_, cec_epsilon_t_);
  // output gates, part II
  // output_gate_delta_t_ <= output_gate_delta_t_ * delta_t_
  // this computes the error arriving at the output gate
  FastMultiply(output_gate_delta_t_, size, delta_t_, output_gate_delta_t_);
  // this computes the output gate's error signal, stored in output_gate_delta_t_
  sigmoid_.MultiplyDerivative(output_dimension(), slice.size(),
                              output_gate_b_t_, output_gate_delta_t_);
  // states, part II
  #pragma omp parallel for
  for (int i = 0; i < (int) slice.size(); ++i) {
    // cec_epsilon_t_ <= cec_epsilon_t_ + output_gate_peephole_weights_ * output_gate_delta_t_
    // the part flowing back from the output gate's error signal
    FastMultiplyAdd(output_gate_peephole_weights_, output_dimension(),
                    output_gate_delta_t_ + i * output_dimension(),
                    cec_epsilon_t_ + i * output_dimension());
  }
  // i.e. not the last time step
  if (delta_t_ != delta_) {
    // cec_epsilon_t_ <= cec_epsilon_t_ + forget_gate_b_(t+1)_ * cec_epsilon_(t+1)_
    // the error flowing back from the CEC at time t+1
    FastMultiplyAdd(forget_gate_b_t_ + GetOffset(), size,
                    cec_epsilon_t_ - GetOffset(), cec_epsilon_t_);
    #pragma omp parallel for
    for (int i = 0; i < (int) slice.size(); ++i) {
      // cec_epsilon_t_ <= cec_epsilon_t_ + input_gate_peephole_weights_ * input_gate_delta_(t+1)_
      // the error flowing back from the input gate
      FastMultiplyAdd(input_gate_peephole_weights_, output_dimension(),
                      input_gate_delta_t_ - GetOffset() + i * output_dimension(),
                      cec_epsilon_t_ + i * output_dimension());
      // the error flowing back from the forget gate
      FastMultiplyAdd(forget_gate_peephole_weights_, output_dimension(),
                      forget_gate_delta_t_ - GetOffset() + i * output_dimension(),
                      cec_epsilon_t_ + i * output_dimension());
    }
  }
  // cells
  // delta_t_ <= input_gate_b_t_ * cec_epsilon_t_
  // the next two calls compute the error signal at the cell input
  FastMultiply(input_gate_b_t_, size, cec_epsilon_t_, delta_t_);
  tanh_.MultiplyDerivative(output_dimension(), slice.size(), cec_input_b_t_,
                           delta_t_);
  // delta_t_ now holds the error signal at the cell input
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      // forget gates
      if (b_t_ != b_) {
        // forget_gate_delta_t_ <= cec_epsilon_t_ * cec_b_(t-1)_
        // the error flowing to the forget gate
        FastMultiply(cec_b_t_ - GetOffset(), size, cec_epsilon_t_,
                     forget_gate_delta_t_);
        // compute the forget gate's error signal
        sigmoid_.MultiplyDerivative(output_dimension(), slice.size(),
                                    forget_gate_b_t_, forget_gate_delta_t_);
      }
    }
    #pragma omp section
    {
      // input gates
      // input_gate_delta_t_ <= cec_epsilon_t_ * cec_input_b_t_
      // the error flowing to the input gate
      FastMultiply(cec_epsilon_t_, size, cec_input_b_t_, input_gate_delta_t_);
      // compute the input gate's error signal
      sigmoid_.MultiplyDerivative(output_dimension(), slice.size(),
                                  input_gate_b_t_, input_gate_delta_t_);
    }
  }
}

// propagate the error to the input layer
void LSTM::AddDelta(const Slice &slice, Real delta_t[]) {
  // delta_t <= delta_t + weights_^T * delta_t_
  // the cell-input error signal flowing to the input layer
  FastMatrixMatrixMultiply(1.0, weights_, true, input_dimension(),
                           output_dimension(), delta_t_, false, slice.size(),
                           delta_t);
  // delta_t <= delta_t + input_gate_weights_^T * input_gate_delta_t_
  // the part of the input gate's error signal flowing to the input layer
  FastMatrixMatrixMultiply(1.0, input_gate_weights_, true, input_dimension(),
                           output_dimension(), input_gate_delta_t_, false,
                           slice.size(), delta_t);
  // delta_t <= delta_t + forget_gate_weights_^T * forget_gate_delta_t_
  // the part of the forget gate's error signal flowing to the input layer
  FastMatrixMatrixMultiply(1.0, forget_gate_weights_, true, input_dimension(),
                           output_dimension(), forget_gate_delta_t_, false,
                           slice.size(), delta_t);
  // delta_t <= delta_t + output_gate_weights_^T * output_gate_delta_t_
  // the part of the output gate's error signal flowing to the input layer
  FastMatrixMatrixMultiply(1.0, output_gate_weights_, true, input_dimension(),
                           output_dimension(), output_gate_delta_t_, false,
                           slice.size(), delta_t);
  // from time t+1 to time t
  cec_epsilon_t_ += GetOffset();
  delta_t_ += GetOffset();
  input_gate_delta_t_ += GetOffset();
  forget_gate_delta_t_ += GetOffset();
  output_gate_delta_t_ += GetOffset();
}

const Real *LSTM::UpdateWeights(const Slice &slice, const Real learning_rate,
                                const Real x[]) {
  const int size = slice.size() * output_dimension();
  // from time 0 to the last time step
  cec_epsilon_t_ -= GetOffset();
  delta_t_ -= GetOffset();
  input_gate_delta_t_ -= GetOffset();
  forget_gate_delta_t_ -= GetOffset();
  output_gate_delta_t_ -= GetOffset();
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      if (bias_) {
        for (size_t i = 0; i < slice.size(); ++i) {
          // momentum_bias_ <= -learning_rate * delta_t_ + momentum_bias_
          // accumulate the change for the cell bias
          FastMultiplyByConstantAdd(-learning_rate,
                                    delta_t_ + i * output_dimension(),
                                    output_dimension(), momentum_bias_);
        }
      }
    }
    #pragma omp section
    {
      if (input_gate_bias_) {
        // momentum_input_gate_bias_ <= -learning_rate * input_gate_delta_t_ + momentum_input_gate_bias_
        // accumulate the change for the input gate bias
        for (size_t i = 0; i < slice.size(); ++i) {
          FastMultiplyByConstantAdd(-learning_rate,
                                    input_gate_delta_t_ + i * output_dimension(),
                                    output_dimension(),
                                    momentum_input_gate_bias_);
        }
      }
    }
    #pragma omp section
    {
      // momentum_forget_gate_bias_ <= -learning_rate * forget_gate_delta_t_ + momentum_forget_gate_bias_
      // accumulate the change for the forget gate bias
      if (forget_gate_bias_) {
        for (size_t i = 0; i < slice.size(); ++i) {
          FastMultiplyByConstantAdd(-learning_rate,
                                    forget_gate_delta_t_ + i * output_dimension(),
                                    output_dimension(),
                                    momentum_forget_gate_bias_);
        }
      }
    }
    #pragma omp section
    {
      // momentum_output_gate_bias_ <= -learning_rate * output_gate_delta_t_ + momentum_output_gate_bias_
      // accumulate the change for the output gate bias
      if (output_gate_bias_) {
        for (size_t i = 0; i < slice.size(); ++i) {
          FastMultiplyByConstantAdd(-learning_rate,
                                    output_gate_delta_t_ + i * output_dimension(),
                                    output_dimension(),
                                    momentum_output_gate_bias_);
        }
      }
    }
    // the sections above accumulate the bias changes, but do not actually
    // modify the biases yet
    #pragma omp section
    {
      // momentum_weights_ <= -learning_rate * delta_t_ * x^T + momentum_weights_
      // accumulate the change for the input-layer-to-LSTM weights
      FastMatrixMatrixMultiply(-learning_rate, delta_t_, false,
                               output_dimension(), slice.size(), x, true,
                               input_dimension(), momentum_weights_);
    }
    #pragma omp section
    {
      // momentum_input_gate_weights_ <= -learning_rate * input_gate_delta_t_ * x^T + momentum_input_gate_weights_
      // accumulate the change for the input-layer-to-input-gate weights
      FastMatrixMatrixMultiply(-learning_rate, input_gate_delta_t_, false,
                               output_dimension(), slice.size(), x, true,
                               input_dimension(), momentum_input_gate_weights_);
    }
    #pragma omp section
    {
      // momentum_forget_gate_weights_ <= -learning_rate * forget_gate_delta_t_ * x^T + momentum_forget_gate_weights_
      // accumulate the change for the input-layer-to-forget-gate weights
      FastMatrixMatrixMultiply(-learning_rate, forget_gate_delta_t_, false,
                               output_dimension(), slice.size(), x, true,
                               input_dimension(), momentum_forget_gate_weights_);
    }
    #pragma omp section
    {
      // momentum_output_gate_weights_ <= -learning_rate * output_gate_delta_t_ * x^T + momentum_output_gate_weights_
      // accumulate the change for the input-layer-to-output-gate weights
      FastMatrixMatrixMultiply(-learning_rate, output_gate_delta_t_, false,
                               output_dimension(), slice.size(), x, true,
                               input_dimension(), momentum_output_gate_weights_);
    }
    #pragma omp section
    {
      // momentum_recurrent_weights_ <= -learning_rate * delta_t_ * b_(t-1)_^T + momentum_recurrent_weights_
      // accumulate the change for the recurrent weights from the LSTM layer
      // at t-1 to itself at t
      if (b_t_ != b_) {
        FastMatrixMatrixMultiply(-learning_rate, delta_t_, false,
                                 output_dimension(), slice.size(),
                                 b_t_ - GetOffset(), true, output_dimension(),
                                 momentum_recurrent_weights_);
      }
    }
    #pragma omp section
    {
      // momentum_input_gate_recurrent_weights_ <= -learning_rate * input_gate_delta_t_ * b_(t-1)_^T + momentum_input_gate_recurrent_weights_
      // accumulate the change for the recurrent weights from the LSTM layer
      // at t-1 to the input gate at t
      if (b_t_ != b_) {
        FastMatrixMatrixMultiply(-learning_rate, input_gate_delta_t_, false,
                                 output_dimension(), slice.size(),
                                 b_t_ - GetOffset(), true, output_dimension(),
                                 momentum_input_gate_recurrent_weights_);
      }
    }
    #pragma omp section
    {
      // momentum_forget_gate_recurrent_weights_ <= -learning_rate * forget_gate_delta_t_ * b_(t-1)_^T + momentum_forget_gate_recurrent_weights_
      // accumulate the change for the recurrent weights from the LSTM layer
      // at t-1 to the forget gate at t
      if (b_t_ != b_) {
        FastMatrixMatrixMultiply(-learning_rate, forget_gate_delta_t_, false,
                                 output_dimension(), slice.size(),
                                 b_t_ - GetOffset(), true, output_dimension(),
                                 momentum_forget_gate_recurrent_weights_);
      }
    }
    #pragma omp section
    {
      // momentum_output_gate_recurrent_weights_ <= -learning_rate * output_gate_delta_t_ * b_(t-1)_^T + momentum_output_gate_recurrent_weights_
      // accumulate the change for the recurrent weights from the LSTM layer
      // at t-1 to the output gate at t
      if (b_t_ != b_) {
        FastMatrixMatrixMultiply(-learning_rate, output_gate_delta_t_, false,
                                 output_dimension(), slice.size(),
                                 b_t_ - GetOffset(), true, output_dimension(),
                                 momentum_output_gate_recurrent_weights_);
      }
    }
    // Note the updates above fall into three groups: 1. bias changes,
    // 2. input-layer-to-cell weight changes, 3. recurrent weight changes
    // from the cell at t-1 to the cell at t.
  }
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      if (b_t_ != b_) {
        // destroys ..._gate_delta_t_, but this will not be used later anyway
        // input_gate_delta_t_ <= -learning_rate * input_gate_delta_t_
        FastMultiplyByConstant(input_gate_delta_t_, size, -learning_rate,
                               input_gate_delta_t_);
        for (size_t i = 0; i < slice.size(); ++i) {
          // momentum_input_gate_peephole_weights_ <= momentum_input_gate_peephole_weights_ + input_gate_delta_t_ * cec_b_(t-1)_
          // accumulate the peephole weight change from the CEC to the input gate
          FastMultiplyAdd(input_gate_delta_t_ + i * output_dimension(),
                          output_dimension(),
                          cec_b_t_ - GetOffset() + i * output_dimension(),
                          momentum_input_gate_peephole_weights_);
        }
      }
    }
    #pragma omp section
    {
      if (b_t_ != b_) {
        // forget_gate_delta_t_ <= -learning_rate * forget_gate_delta_t_
        FastMultiplyByConstant(forget_gate_delta_t_, size, -learning_rate,
                               forget_gate_delta_t_);
        // momentum_forget_gate_peephole_weights_ <= momentum_forget_gate_peephole_weights_ + forget_gate_delta_t_ * cec_b_(t-1)_
        // accumulate the peephole weight change from the CEC to the forget gate
        for (size_t i = 0; i < slice.size(); ++i) {
          FastMultiplyAdd(forget_gate_delta_t_ + i * output_dimension(),
                          output_dimension(),
                          cec_b_t_ - GetOffset() + i * output_dimension(),
                          momentum_forget_gate_peephole_weights_);
        }
      }
    }
    #pragma omp section
    {
      // output_gate_delta_t_ <= -learning_rate * output_gate_delta_t_
      FastMultiplyByConstant(output_gate_delta_t_, size, -learning_rate,
                             output_gate_delta_t_);
      // momentum_output_gate_peephole_weights_ <= momentum_output_gate_peephole_weights_ + output_gate_delta_t_ * cec_b_t_
      // accumulate the peephole weight change from the CEC to the output
      // gate; note this one uses the CEC state at the current time step
      for (size_t i = 0; i < slice.size(); ++i) {
        FastMultiplyAdd(output_gate_delta_t_ + i * output_dimension(),
                        output_dimension(),
                        cec_b_t_ + i * output_dimension(),
                        momentum_output_gate_peephole_weights_);
      }
    }
  }
  const Real *result = b_t_;
  // let b_t_ point to the next time step
  b_t_ += GetOffset();
  cec_input_b_t_ += GetOffset();
  cec_b_t_ += GetOffset();
  input_gate_b_t_ += GetOffset();
  forget_gate_b_t_ += GetOffset();
  output_gate_b_t_ += GetOffset();
  return result;
}
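One last observation: UpdateWeights only accumulates the scaled gradients into the momentum_* buffers; as the comments note, the weights and biases themselves are not modified here, so the actual application presumably happens elsewhere in the toolkit. A generic momentum-SGD application step would look roughly like this (an illustrative sketch of mine, not rwthlm's actual code; ApplyMomentum and the momentum factor are hypothetical):

// apply an accumulated update and decay it for the next batch (sketch)
void ApplyMomentum(float *weights, float *momentum_weights, int size,
                   float momentum /* e.g. 0.9, a hypothetical decay factor */) {
  for (int i = 0; i < size; ++i) {
    weights[i] += momentum_weights[i];  // buffer already holds
                                        // -learning_rate * gradient (+ history)
    momentum_weights[i] *= momentum;    // keep a decayed fraction as momentum
  }
}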


