These are study notes on the book 《深度学习框架PyTorch：入门与实践》 (Deep Learning Framework PyTorch: Getting Started and Practice):
Related material:
1.1 The birth of PyTorch
1.2 An overview of common deep learning frameworks
1.3 The future belongs to dynamic graphs
Calculus on Computational Graphs: Backpropagation
Backpropagation is the key algorithm that makes training deep models computationally tractable. For modern neural networks, it can make training with gradient descent as much as ten million times faster relative to a naive implementation. That's the difference between a model that takes a week to train and one that would take 200,000 years.
Beyond its use in deep learning, backpropagation is a powerful computational tool in many other areas, from weather forecasting to analyzing numerical stability; it just goes by different names. In fact, the algorithm has been reinvented at least dozens of times in different fields. The general, application-independent name is "reverse-mode differentiation." Fundamentally, it is a technique for computing derivatives quickly, and an essential trick to have in your bag, not only in deep learning but in a wide variety of numerical computing situations.
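As a tiny, self-contained illustration of reverse-mode differentiation (my own sketch using PyTorch's autograd, not an example from the book):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x    # forward pass records the graph
y.backward()          # reverse mode: one backward pass yields dy/dx
print(x.grad)         # tensor(14.), since dy/dx = 3x^2 + 2 = 14 at x = 2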
Computational graphs come in two kinds, static and dynamic; PyTorch uses dynamic graphs.
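What "dynamic" means in practice is that the graph is rebuilt on every forward pass, so ordinary Python control flow can change its structure from run to run. A minimal sketch (my own example, assuming nothing beyond core torch):

import torch

def f(x):
    # an ordinary Python branch decides the graph structure on each call
    if x.sum() > 0:
        return (x * 2).sum()
    return (x * 3).sum()

x = torch.randn(3, requires_grad=True)
f(x).backward()
print(x.grad)  # all 2s or all 3s, depending on which branch this run took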
1.4 Why choose PyTorch
1.5 A single spark can start a prairie fire
1.6 fast.ai drops Keras+TensorFlow in favor of PyTorch
Input:
import torch
x = torch.rand(5, 3)
print(x)
Output:
tensor([[0.9732, 0.1357, 0.9145],
        [0.5854, 0.5963, 0.1376],
        [0.2889, 0.1247, 0.4303],
        [0.9021, 0.8625, 0.6613],
        [0.4179, 0.7646, 0.3817]])
This shows the installation succeeded.
Pitfall:
In C:\Windows\System32\drivers\etc\hosts
add the following:
#github
140.82.112.4 github.com
199.232.68.133 raw.githubusercontent.com
199.232.69.194 github.global.ssl.fastly.net
185.199.108.153 assets-cdn.github.com
185.199.110.153 assets-cdn.github.com
185.199.111.153 assets-cdn.github.com
import torch as t
x = t.Tensor(5, 3)  # build a 5x3 matrix: space is allocated but left uninitialized
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
x = t.rand(5, 3)  # build a randomly initialized matrix, uniform on [0, 1)
tensor([[0.5630, 0.5831, 0.1832],
        [0.3755, 0.1908, 0.2778],
        [0.6050, 0.8965, 0.4945],
        [0.7089, 0.0270, 0.5071],
        [0.6807, 0.5496, 0.0910]])
x = t.zeros(5, 3, dtype=t.long)  # dtype is long
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
x = t.tensor([5.5, 3])  # build a tensor directly from data
tensor([5.5000, 3.0000])
x = x.new_ones(5, 3, dtype=t.double)
print(x)
x = t.randn_like(x, dtype=t.float)  # create a tensor based on an existing tensor
print(x)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.2489,  0.7820,  1.7745],
        [ 0.3888, -0.0552,  0.3124],
        [ 1.0615,  0.1501,  0.8262],
        [-0.5375,  1.3126, -0.2629],
        [ 0.8085, -0.3260,  1.7282]])
print(x[:,1])
tensor([ 0.7820, -0.0552, 0.1501, 1.3126, -0.3260])
print(x.size())
torch.Size([5, 3])
x.size(0)  # torch.Size is a tuple
5
print(x.size()[0])
5
y = t.rand(5,3)
tensor([[0.6101, 0.8795, 0.0847],
        [0.2647, 0.0896, 0.2071],
        [0.5567, 0.5395, 0.7964],
        [0.3699, 0.4707, 0.9524],
        [0.4630, 0.4474, 0.4270]])
print(x + y)             # addition: method 1
print(t.add(x, y))       # addition: method 2
result = t.empty(5, 3)
t.add(x, y, out=result)  # addition: supply an output tensor as an argument
print(result)
y.add_(x)                # in-place addition: adds x to y
print(y)
tensor([[ 0.8590,  1.6615,  1.8591],
        [ 0.6535,  0.0344,  0.5195],
        [ 1.6182,  0.6895,  1.6226],
        [-0.1676,  1.7833,  0.6895],
        [ 1.2715,  0.1214,  2.1552]])
x = t.randn(4, 4)
y = x.view(16)     # change the size/shape of a tensor
z = x.view(-1, 8)  # -1 means this dimension is inferred from the others
print(x.size(), y.size(), z.size())
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
x = t.randn(1)
print(x)
print(x.item())  # use .item() to get the Python number
tensor([0.1640])
0.16402670741081238
For operations a Tensor does not support, convert it to a NumPy array first, process it there, then convert back to a Tensor.
a = t.ones(5)
b = a.numpy()  # Tensor → NumPy
b
array([1., 1., 1., 1., 1.], dtype=float32)
import numpy as np
a = np.ones(5)
b = t.from_numpy(a)  # NumPy → Tensor
b.add_(1)
print(b)
print(a)
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
Tensor and NumPy objects share their underlying memory, so converting between them is fast and consumes almost no resources.
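A quick check of the shared-memory claim in the other direction (my own sketch, continuing from the code above): an in-place change to the Tensor is visible in the NumPy array obtained from it.

a = t.ones(5)
b = a.numpy()  # b is a view over the same memory as a
a.add_(1)      # an in-place change to the Tensor...
print(b)       # ...shows up in b: [2. 2. 2. 2. 2.]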
import torch
x = torch.ones(2, 2, requires_grad=True)  # create a tensor and set requires_grad=True to track computation on it
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
y = x + 2  # y was created as the result of an operation, so it has a grad_fn
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
print(y.grad_fn)
z = y * y * 3
out = z.mean()
print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
False
True
out.backward()
print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
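A quick check of that 4.5 (my own arithmetic, not from the original): out = (1/4) Σᵢ zᵢ with zᵢ = 3(xᵢ + 2)², so ∂out/∂xᵢ = (3/2)(xᵢ + 2), which is 4.5 at xᵢ = 1.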
x = torch.randn(3, requires_grad=True)
x
tensor([-0.3211, -0.7231, 0.9793], requires_grad=True)
y = x * 2
while y.data.norm() < 1000:  # L2 norm of y: square each element, sum them, take the square root
    y = y * 2
print(y)
tensor([-328.8195, -740.4708, 1002.8046], grad_fn=<MulBackward0>)
y.backward()
Error:
RuntimeError: grad can be implicitly created only for scalar outputs
Change it to:
y = y.sum()
y.backward()
x.grad
tensor([512., 512., 512.])
Or:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)  # compute the vector-Jacobian product
print(x.grad)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
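A sanity check of those numbers (my own arithmetic): the loop left y = 1024 · x (x was doubled ten times, and -0.3211 · 1024 ≈ -328.82 matches the earlier printout), so the Jacobian is 1024 · I and x.grad = 1024 · v = [102.4, 1024.0, 0.1024], exactly the values shown.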
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False
(Note: since PyTorch 0.4, Variable has been merged into Tensor; the wrapper below still works, but a plain tensor with requires_grad=True behaves the same.)
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad=True)
x
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
y = x.sum()
y
tensor(4., grad_fn=<SumBackward0>)
y.grad_fn
y.backward()  # backpropagate and compute the gradients
x.grad
tensor([[1., 1.],
        [1., 1.]])
y.backward()
x.grad
tensor([[2., 2.],
        [2., 2.]])
y.backward()  # each backward pass accumulates on top of the existing gradients, so zero the gradients before backpropagating
x.grad
tensor([[3., 3.],
        [3., 3.]])
x.grad.data.zero_()
tensor([[0., 0.],
        [0., 0.]])
y.backward()
x.grad
tensor([[1., 1.],
        [1., 1.]])
x = Variable(torch.ones(4, 5))  # Variable and Tensor expose nearly identical interfaces and can be swapped seamlessly in practice
y = torch.cos(x)
x_tensor_cos = torch.cos(x.data)
print(y)
print(x_tensor_cos)
tensor([[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403]])
tensor([[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
        [0.5403, 0.5403, 0.5403, 0.5403, 0.5403]])
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # once forward is defined in an nn.Module subclass, backward is implemented automatically (via autograd)
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
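Why fc1 takes 16 * 5 * 5 inputs (my own size bookkeeping, not spelled out in the original): a 1x32x32 input becomes 6x28x28 after the first 5x5 convolution, 6x14x14 after 2x2 pooling, 16x10x10 after the second convolution, and 16x5x5 after the second pooling; flattened, that is 400 features, matching in_features=400 above.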
torch.randn(1,1,32,32)
tensor([[[[ 0.9383,  0.2258, -1.6244,  ...,  0.1584,  1.0838, -0.3517],
          [ 0.0886,  0.1057, -2.0140,  ...,  0.9298,  0.3106, -0.7022],
          [-1.2740, -0.1253, -0.2867,  ...,  0.6535, -0.6852,  0.5318],
          ...,
          [-0.9887,  0.5887,  1.6174,  ..., -0.0556, -0.7984, -0.4886],
          [-1.3098,  0.0088,  0.1746,  ...,  0.9128,  0.2602,  0.4438],
          [ 0.6318, -0.7189, -0.5646,  ...,  0.5879,  0.7369, -0.3482]]]])
params = list(net.parameters())
print(len(params))       # 10: a weight and a bias tensor for each of the five layers
print(params[0].size())  # conv1's .weight
for name, parameters in net.named_parameters():
    print(name, ':', parameters.size())
10
torch.Size([6, 1, 5, 5])
conv1.weight : torch.Size([6, 1, 5, 5])
conv1.bias : torch.Size([6])
conv2.weight : torch.Size([16, 6, 5, 5])
conv2.bias : torch.Size([16])
fc1.weight : torch.Size([120, 400])
fc1.bias : torch.Size([120])
fc2.weight : torch.Size([84, 120])
fc2.bias : torch.Size([84])
fc3.weight : torch.Size([10, 84])
fc3.bias : torch.Size([10])
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[ 0.0509,  0.0137,  0.0444, -0.1061, -0.0869,  0.0608, -0.1220,  0.0101,
         -0.0129,  0.0442]], grad_fn=<AddmmBackward>)
net.zero_grad()
out.backward(torch.randn(1, 10))  # out is not a scalar, so backward needs a gradient argument of the same shape
output = net(input)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Computational graph: input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
tensor(1.0540, grad_fn=<MseLossBackward>)
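The loss's history can be walked backwards through the grad_fn chain; a short sketch (the exact node names printed depend on the PyTorch version):

print(loss.grad_fn)                                            # the MSELoss node
print(loss.grad_fn.next_functions[0][0])                       # the Linear (Addmm) node feeding it
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # the ReLU node before that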
net.zero_grad()  # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0016, -0.0151,  0.0127,  0.0059,  0.0009, -0.0097])
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()       # does the update
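For orientation, optimizer.step() here performs the plain SGD update p ← p − lr · p.grad on every parameter. A hand-rolled equivalent, as a sketch of what the optimizer does (not how torch.optim implements it), assuming loss.backward() has just populated the gradients:

learning_rate = 0.01
with torch.no_grad():                # update weights without recording the update in the graph
    for p in net.parameters():
        p -= learning_rate * p.grad  # the same update optim.SGD(lr=0.01) applies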
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='D:/University_Study/2021_Graduation_project/Code/data',
                                        train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='D:/University_Study/2021_Graduation_project/Code/data',
                                       train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to D:/University_Study/2021_Graduation_project/Code/data\cifar-10-python.tar.gz
100%|█████████▉| 170385408/170498071 [00:29<00:00, 2917300.60it/s]
Extracting D:/University_Study/2021_Graduation_project/Code/data\cifar-10-python.tar.gz to D:/University_Study/2021_Graduation_project/Code/data
Files already downloaded and verified
170500096it [00:33, 5065648.50it/s]
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
Error:
The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
Fix: wrap the entry-point code in if __name__ == '__main__':, as in the sketch below.
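A minimal sketch of that fix (names taken from the snippet above; on Windows the DataLoader worker processes re-import the main module, so the executable part must be guarded):

if __name__ == '__main__':
    dataiter = iter(trainloader)
    images, labels = dataiter.next()  # on newer PyTorch, use next(dataiter)
    imshow(torchvision.utils.make_grid(images))
    print(' '.join('%5s' % classes[labels[j]] for j in range(4)))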
Error:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Fix: move E:\ProgramData\Anaconda3\Lib\site-packages\torch\lib\libiomp5md.dll somewhere else, so that only one copy of the OpenMP runtime is loaded.
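Alternatively, the error message itself names an unsafe but quicker workaround: set KMP_DUPLICATE_LIB_OK before importing torch (Intel explicitly warns this may cause crashes or silently wrong results):

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'  # unsafe workaround quoted in the OMP hint above
import torch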
Result:
cat frog plane deer
import torch  # needed below for torch.no_grad / torch.max
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

if __name__ == '__main__':
    net = Net()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(2):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
    print('Finished Training')

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: %d %%' % (
        100 * correct / total))

    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels).squeeze()
            for i in range(4):
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1

    for i in range(10):
        print('Accuracy of %5s : %2d %%' % (
            classes[i], 100 * class_correct[i] / class_total[i]))
Result:
[1,  2000] loss: 2.228
[1,  4000] loss: 1.849
[1,  6000] loss: 1.649
[1,  8000] loss: 1.553
[1, 10000] loss: 1.504
[1, 12000] loss: 1.436
[2,  2000] loss: 1.388
[2,  4000] loss: 1.346
[2,  6000] loss: 1.305
[2,  8000] loss: 1.306
[2, 10000] loss: 1.290
[2, 12000] loss: 1.247
Finished Training
Accuracy of the network on the 10000 test images: 54 %
Accuracy of plane : 68 %
Accuracy of   car : 58 %
Accuracy of  bird : 39 %
Accuracy of   cat : 14 %
Accuracy of  deer : 47 %
Accuracy of   dog : 75 %
Accuracy of  frog : 58 %
Accuracy of horse : 56 %
Accuracy of  ship : 59 %
Accuracy of truck : 68 %
Having gotten a rough grasp of Tensors, autograd, neural networks, and image classification in PyTorch, the next step is to pick an image dataset and put this into practice with my own code.
Reposted from: http://ggvrn.baihongyu.com/