LSTM Basics
Published: 2019-06-14



This post covers the problems the LSTM solves, the benefits it brings, and its inner workings and calculations.

1. The Problem to Be Solved

[Figure: RNN]

RNN's Problems

maintaining the state for a large number of units is computationally expensive;
RNNs are very sensitive to changes in their parameters;
they suffer from the exploding-gradient and vanishing-gradient problems.

2. Long Short-Term Memory

[Figure: LSTM unit]

An LSTM unit consists of a linear unit, which is the information cell itself, surrounded by three logistic gates responsible for maintaining the data: the "Input" or "Write" Gate, which handles writing data into the information cell; the "Output" or "Read" Gate, which handles sending data back to the Recurrent Network; and the "Keep" or "Forget" Gate, which handles maintaining and modifying the data stored in the information cell.

3. RNN with LSTM

[Figure: RNN with LSTM]

4. A Typical Flow of Operations in the LSTM Unit

[Figure: A closer look at the RNN/LSTM unit]

First off, the Keep Gate has to decide whether to keep or forget the data currently stored in memory. It receives both the input and the state of the Recurrent Network, and passes them through its Sigmoid activation. A value of 1 means that the LSTM unit should keep the data stored perfectly, and a value of 0 means that it should forget it entirely.
Consider S_{t-1} as the incoming (previous) state, x_t as the incoming input, and W_k, B_k as the weight and bias for the Keep Gate. Consider Old_{t-1} as the data previously in memory.

K_t = σ(W_k × [S_{t-1}, x_t] + B_k)
Old_t = K_t × Old_{t-1}

[Figure: Keep Gate]
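As a rough sketch of the Keep Gate equations above (the toy dimensions and variable names are my own, not from the course), in NumPy:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes: 3 memory units, 2 inputs (illustrative only).
rng = np.random.default_rng(0)
state_prev = rng.standard_normal(3)   # S_{t-1}, previous network state
x_t = rng.standard_normal(2)          # x_t, incoming input
W_k = rng.standard_normal((3, 5))     # weight over the concatenated [S_{t-1}, x_t]
B_k = np.zeros(3)                     # bias for the Keep Gate
old_prev = rng.standard_normal(3)     # Old_{t-1}, data previously in memory

# K_t = sigma(W_k x [S_{t-1}, x_t] + B_k)
K_t = sigmoid(W_k @ np.concatenate([state_prev, x_t]) + B_k)

# Old_t = K_t x Old_{t-1}: each memory component is kept (gate near 1)
# or forgotten (gate near 0)
old_t = K_t * old_prev
```

Because K_t comes from a Sigmoid, every component lies strictly between 0 and 1, so the gate scales each stored value rather than switching it off abruptly.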
Then, the input and state are passed to the Input Gate, which applies another Sigmoid activation. Concurrently, the input is processed as normal by whatever processing unit is implemented in the network, and the result is multiplied by the Sigmoid activation's output, much like in the Keep Gate. Consider W_i and B_i as the weight and bias for the Input Gate, and C_t as the result of the processing of the inputs by the Recurrent Network.

I_t = σ(W_i × [S_{t-1}, x_t] + B_i)
New_t = I_t × C_t
New_t is the new data to be input into the memory cell. It is then added to whatever value is still stored in memory:

Cell_t = Old_t + New_t

Cell_t is the candidate data to be kept in the memory cell. Consider what would happen if the Keep Gate were set to 0 and the Input Gate were set to 1:

Old_t = 0 × Old_{t-1} = 0
New_t = 1 × C_t = C_t
Cell_t = C_t

The old data would be totally forgotten and the new data would overwrite it completely.
[Figure: Write Gate]
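The Input Gate and cell update can be sketched the same way (again with my own toy sizes; the tanh for the processed input C_t is an assumption, since the course leaves the processing unit unspecified):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_state, n_in = 3, 2
concat = rng.standard_normal(n_state + n_in)      # [S_{t-1}, x_t]
W_i = rng.standard_normal((n_state, n_state + n_in))
B_i = np.zeros(n_state)
C_t = np.tanh(rng.standard_normal(n_state))       # processed input (tanh assumed)
old_t = rng.standard_normal(n_state)              # Old_t from the Keep Gate step

# I_t = sigma(W_i x [S_{t-1}, x_t] + B_i)
I_t = sigmoid(W_i @ concat + B_i)

# New_t = I_t x C_t: the gate decides how much of the new data is written
new_t = I_t * C_t

# Cell_t = Old_t + New_t: candidate memory contents
cell_t = old_t + new_t
```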
For the Output Gate, to decide what to output, we take the input data and state and pass them through a Sigmoid function as usual. The contents of the memory cell, however, are pushed through a Tanh function to bind them between -1 and 1. Consider W_o and B_o as the weight and bias for the Output Gate.

O_t = σ(W_o × [S_{t-1}, x_t] + B_o)
Output_t = O_t × tanh(Cell_t)
[Figure: Output Gate]
Output_t is what is sent back into the Recurrent Network.
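Putting the three gates together, one full LSTM step under the notation above might look like this. This is a minimal sketch: the parameter layout, sizes, and the tanh processing unit are my own choices, not the course's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, state_prev, cell_prev, params):
    """One LSTM step following the Keep/Input/Output gate equations above."""
    v = np.concatenate([state_prev, x_t])   # [S_{t-1}, x_t]
    W_k, B_k, W_i, B_i, W_c, B_c, W_o, B_o = params

    K_t = sigmoid(W_k @ v + B_k)            # Keep (Forget) Gate
    old_t = K_t * cell_prev                 # Old_t = K_t x Old_{t-1}

    I_t = sigmoid(W_i @ v + B_i)            # Input (Write) Gate
    C_t = np.tanh(W_c @ v + B_c)            # processed input (tanh assumed)
    cell_t = old_t + I_t * C_t              # Cell_t = Old_t + New_t

    O_t = sigmoid(W_o @ v + B_o)            # Output (Read) Gate
    output_t = O_t * np.tanh(cell_t)        # Output_t = O_t x tanh(Cell_t)
    return output_t, cell_t

# Toy usage with random weights and zero biases
rng = np.random.default_rng(42)
n_state, n_in = 4, 3
shape = (n_state, n_state + n_in)
params = [rng.standard_normal(shape) if i % 2 == 0 else np.zeros(n_state)
          for i in range(8)]
out, cell = lstm_step(rng.standard_normal(n_in),
                      np.zeros(n_state), np.zeros(n_state), params)
```

Note that the output is bounded: O_t lies in (0, 1) and tanh(Cell_t) lies in (-1, 1), so every component of Output_t has magnitude below 1.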

5. Why Are All Three Gates Logistic?

[Figure: logistic function]

(1) The logistic function is very easy to backpropagate through.

(2) It mitigates the gradient problems by manipulating values through the gates themselves: by passing the inputs and outputs through the gates, we obtain an easily differentiable function modifying our inputs.
(3) Regarding the problem of storing many states over a long period of time, the LSTM handles this by keeping only whatever information is necessary and forgetting it when it is no longer needed.
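Point (1) can be checked directly: the logistic function's derivative has the closed form σ'(z) = σ(z)(1 − σ(z)), computable from the forward value alone, which is what makes backpropagation through the gates cheap. A quick numerical check (my own illustration, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 101)
# Closed-form derivative, reusing the forward pass
analytic = sigmoid(z) * (1.0 - sigmoid(z))
# Central finite difference as an independent check
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
max_err = np.abs(analytic - numeric).max()
```

The derivative peaks at 0.25 (at z = 0) and the two computations agree to within floating-point noise.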

Source: Deep Learning with TensorFlow, IBM Cognitive Class (ML0120EN)

Reposted from: https://www.cnblogs.com/siucaan/p/9623108.html
