
Advanced Machine Learning Notes, Part 4 | Understanding GoogLeNet in Depth


Introduction

 

  TensorFlow is Google's second-generation machine intelligence system, developed on the basis of DistBelief, and is widely used in deep learning applications such as speech recognition and image recognition. The name describes how the system works: a Tensor is an N-dimensional array, and Flow refers to computation on a dataflow graph. TensorFlow models tensors flowing from one end of a computation graph to the other, feeding complex data structures into artificial neural networks for analysis and processing.

 

  TensorFlow is fully open source and available to anyone. It runs on a wide range of devices, from a single smartphone up to thousands of servers in a data center.

 

  The "Advanced Machine Learning Notes" series digs into the practical use of the TensorFlow system, starting from scratch and moving step by step from the basics to more advanced topics.

 

  GoogLeNet won ILSVRC 2014. It builds on the classic LeNet-5 design and was developed mainly by a team at Google; see the paper Going Deeper with Convolutions. Related work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improved on the traditional CNN architecture and comfortably beat AlexNet with far fewer parameters: the final Network-in-Network Caffe model is only about 29 MB. GoogLeNet borrows from the Network-in-Network idea, which is described in detail below.

 

Network-in-Network

[Figure: a conventional linear convolution layer (left) vs. an MLPconv layer (right)]

  On the left is a standard linear convolution layer in a CNN. Linear convolutions are generally good at extracting linearly separable features, but when the features are highly nonlinear we need many more filters to capture all the potential variations. That creates a problem: with too many filters the network has too many parameters, becomes overly complex, and puts too much pressure on computation.

The paper improves on this in two ways:

  • An improved convolution layer, MLPconv: within each local patch it performs a more complex computation than a traditional convolution (figure above, right), raising each layer's ability to recognize complex features. A rough analogy: a traditional convolution layer is like a worker who can only do one task, so you must add a huge number of filters to cover every type of feature, while each MLPconv layer is more capable, can handle several different kinds of task, and therefore needs far fewer filters.
  • Global average pooling in place of the final fully connected layers: in a traditional CNN the fully connected layers hold an outsized share of the parameters and also hurt the network's ability to generalize; AlexNet, for example, relies on dropout to improve generalization.
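To make the global average pooling point concrete, here is a minimal NumPy sketch (the 6x6x1000 shape is an illustrative assumption, not the paper's exact dimensions): pooling each channel's map down to a single value adds no trainable weights, while flattening into a fully connected layer of the same output size would need tens of millions.

```python
import numpy as np

# Global average pooling: average each channel's feature map down to one value.
# Layout assumed here: batch x height x width x channels.
def global_avg_pool(x):
    return x.mean(axis=(1, 2))

x = np.ones((2, 6, 6, 1000))   # hypothetical final 6x6 feature maps, 1000 channels
y = global_avg_pool(x)
print(y.shape)                  # (2, 1000): one value per channel, zero weights

# A fully connected layer over the flattened maps would instead need
# 6 * 6 * 1000 weights for each of the 1000 outputs:
fc_weights = 6 * 6 * 1000 * 1000
print(fc_weights)               # 36000000
```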

[Figure: the overall Network-in-Network structure]

  Finally, the authors designed a four-layer Network-in-Network with a global average pooling layer to tackle the ImageNet classification problem.

class NiN(Network):
    def setup(self):
        (self.feed('data')
             .conv(11, 11, 96, 4, 4, padding='VALID', name='conv1')
             .conv(1, 1, 96, 1, 1, name='cccp1')
             .conv(1, 1, 96, 1, 1, name='cccp2')
             .max_pool(3, 3, 2, 2, name='pool1')
             .conv(5, 5, 256, 1, 1, name='conv2')
             .conv(1, 1, 256, 1, 1, name='cccp3')
             .conv(1, 1, 256, 1, 1, name='cccp4')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool2')
             .conv(3, 3, 384, 1, 1, name='conv3')
             .conv(1, 1, 384, 1, 1, name='cccp5')
             .conv(1, 1, 384, 1, 1, name='cccp6')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool3')
             .conv(3, 3, 1024, 1, 1, name='conv4-1024')
             .conv(1, 1, 1024, 1, 1, name='cccp7-1024')
             .conv(1, 1, 1000, 1, 1, name='cccp8-1024')
             .avg_pool(6, 6, 1, 1, padding='VALID', name='pool4')
             .softmax(name='prob'))

   The basic network structure is as above; the code is from GitHub – ethereon/caffe-tensorflow: Caffe models in TensorFlow.

   Because of a recent job change I don't have a machine to run this on at the moment, and I can't draw the basic network diagram yet; I'll add it later. One thing worth pointing out here: the cccp1 and cccp2 (cross channel pooling) layers in the middle are equivalent to convolution layers with 1x1 kernels. The NIN implementation in Caffe looks like this:

 

name: "nin_imagenet"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 224
    mirror: true
    mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb"
    backend: LMDB
    batch_size: 89
  }
  transform_param {
    crop_size: 224
    mirror: false
    mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu0"
  type: RELU
}
layers {
  bottom: "conv1"
  top: "cccp1"
  name: "cccp1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp1"
  top: "cccp1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "cccp1"
  top: "cccp2"
  name: "cccp2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp2"
  top: "cccp2"
  name: "relu2"
  type: RELU
}
layers {
  bottom: "cccp2"
  top: "pool0"
  name: "pool0"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool0"
  top: "conv2"
  name: "conv2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2"
  top: "conv2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv2"
  top: "cccp3"
  name: "cccp3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp3"
  top: "cccp3"
  name: "relu5"
  type: RELU
}
layers {
  bottom: "cccp3"
  top: "cccp4"
  name: "cccp4"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp4"
  top: "cccp4"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "cccp4"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3"
  name: "conv3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3"
  top: "conv3"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "conv3"
  top: "cccp5"
  name: "cccp5"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp5"
  top: "cccp5"
  name: "relu8"
  type: RELU
}
layers {
  bottom: "cccp5"
  top: "cccp6"
  name: "cccp6"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp6"
  top: "cccp6"
  name: "relu9"
  type: RELU
}
layers {
  bottom: "cccp6"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "pool3"
  name: "drop"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "pool3"
  top: "conv4"
  name: "conv4-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4"
  top: "conv4"
  name: "relu10"
  type: RELU
}
layers {
  bottom: "conv4"
  top: "cccp7"
  name: "cccp7-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp7"
  top: "cccp7"
  name: "relu11"
  type: RELU
}
layers {
  bottom: "cccp7"
  top: "cccp8"
  name: "cccp8-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1000
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp8"
  top: "cccp8"
  name: "relu12"
  type: RELU
}
layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 6
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool4"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "pool4"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}
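As a quick sanity check of the claim that the cccp layers are just 1x1 convolutions: the NumPy sketch below (with arbitrary illustrative shapes, not tied to the model above) applies a 1x1 convolution and verifies that it matches a fully connected layer applied to each spatial position's channel vector.

```python
import numpy as np

def conv_1x1(x, w):
    """A 1x1 convolution: x is H x W x C_in, w maps C_in -> C_out."""
    return np.einsum('hwc,co->hwo', x, w)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 96))   # a small 4x4 feature map with 96 channels
w = rng.standard_normal((96, 96))     # 96 -> 96 channels, like cccp1 above

out = conv_1x1(x, w)
# The same weights applied as a fully connected layer at every spatial position:
ref = (x.reshape(-1, 96) @ w).reshape(4, 4, 96)
print(np.allclose(out, ref))          # True
```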

   In a sense, NIN also deepens the network: by increasing depth (boosting the representational power of each NIN block) and replacing the original fully connected layers with an average pooling layer, it greatly reduces the number of filters needed and hence the number of model parameters. The paper's experiments show that it matches AlexNet's performance with a final model size of only 29 MB.

 

  Once you understand NIN, GoogLeNet will no longer seem baffling.

GoogLeNet

 

Pain points

  • Larger CNNs have more parameters and need more compute, and an overly complex model is prone to overfitting;
  • In a CNN, every additional layer brings a corresponding increase in the computing resources required;
  • A sparse network would be acceptable in principle, but sparse data structures are usually very inefficient to compute with.

 

Inception module

[Figure: the naive Inception module]

  The Inception module starts from the observation that convolution kernels of several different sizes can capture information from differently sized clusters in the image. For ease of computation, the paper uses 1x1, 3x3, and 5x5 kernels in parallel, together with a 3x3 max pooling branch.

 

  This naive design hides a serious computational problem, however: the number of output filters of an Inception module is the sum of the filter counts of all its branches, so after several stacked modules the channel count becomes enormous, and the naive Inception module ends up extremely demanding on compute resources.
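The growth is easy to see with a toy example: concatenating the branch outputs along the channel axis gives a depth equal to the sum of the branch depths. The sketch below uses the inception_3a branch widths (64, 128, 32, 32) as illustrative numbers.

```python
import numpy as np

# Four branch outputs with the inception_3a widths: 64, 128, 32, and 32 channels.
branches = [np.zeros((1, 28, 28, c)) for c in (64, 128, 32, 32)]

# Concatenating along the channel axis sums the depths: 64 + 128 + 32 + 32 = 256.
out = np.concatenate(branches, axis=3)
print(out.shape)   # (1, 28, 28, 256)
```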

 

  We saw above with Network-in-Network that 1x1 convolutions can perform dimensionality reduction effectively (expressing as much information as possible with fewer channels). The paper therefore proposes the "Inception module with dimension reduction", which cuts the number of filters, and with it the model's complexity, while preserving the model's representational power:

[Figure: the Inception module with dimension reduction]
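The savings from the 1x1 reduction are easy to quantify with a back-of-the-envelope weight count (biases ignored). Taking the 5x5 branch of inception_3a as an example: 192 channels enter the module, a 1x1 convolution reduces them to 16, and the 5x5 convolution then produces 32.

```python
# Direct 5x5 convolution from 192 input channels to 32 output channels:
naive = 5 * 5 * 192 * 32
# 1x1 reduction from 192 to 16 channels, then a 5x5 convolution to 32 channels:
reduced = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32

print(naive)    # 153600
print(reduced)  # 15872, roughly a 10x cut in weights for this branch
```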

Overall architecture of GoogLeNet

[Figure: the overall GoogLeNet network]

  The basic code to build GoogLeNet in TensorFlow:

 

from kaffe.tensorflow import Network

class GoogleNet(Network):
    def setup(self):
        (self.feed('data')
             .conv(7, 7, 64, 2, 2, name='conv1_7x7_s2')
             .max_pool(3, 3, 2, 2, name='pool1_3x3_s2')
             .lrn(2, 2e-05, 0.75, name='pool1_norm1')
             .conv(1, 1, 64, 1, 1, name='conv2_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='conv2_3x3')
             .lrn(2, 2e-05, 0.75, name='conv2_norm2')
             .max_pool(3, 3, 2, 2, name='pool2_3x3_s2')
             .conv(1, 1, 64, 1, 1, name='inception_3a_1x1'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_3a_3x3_reduce')
             .conv(3, 3, 128, 1, 1, name='inception_3a_3x3'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_3a_5x5_reduce')
             .conv(5, 5, 32, 1, 1, name='inception_3a_5x5'))

        (self.feed('pool2_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_3a_pool')
             .conv(1, 1, 32, 1, 1, name='inception_3a_pool_proj'))

        (self.feed('inception_3a_1x1',
                   'inception_3a_3x3',
                   'inception_3a_5x5',
                   'inception_3a_pool_proj')
             .concat(3, name='inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_1x1'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='inception_3b_3x3'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 32, 1, 1, name='inception_3b_5x5_reduce')
             .conv(5, 5, 96, 1, 1, name='inception_3b_5x5'))

        (self.feed('inception_3a_output')
             .max_pool(3, 3, 1, 1, name='inception_3b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_3b_pool_proj'))

        (self.feed('inception_3b_1x1',
                   'inception_3b_3x3',
                   'inception_3b_5x5',
                   'inception_3b_pool_proj')
             .concat(3, name='inception_3b_output')
             .max_pool(3, 3, 2, 2, name='pool3_3x3_s2')
             .conv(1, 1, 192, 1, 1, name='inception_4a_1x1'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_4a_3x3_reduce')
             .conv(3, 3, 208, 1, 1, name='inception_4a_3x3'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_4a_5x5_reduce')
             .conv(5, 5, 48, 1, 1, name='inception_4a_5x5'))

        (self.feed('pool3_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_4a_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4a_pool_proj'))

        (self.feed('inception_4a_1x1',
                   'inception_4a_3x3',
                   'inception_4a_5x5',
                   'inception_4a_pool_proj')
             .concat(3, name='inception_4a_output')
             .conv(1, 1, 160, 1, 1, name='inception_4b_1x1'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 112, 1, 1, name='inception_4b_3x3_reduce')
             .conv(3, 3, 224, 1, 1, name='inception_4b_3x3'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 24, 1, 1, name='inception_4b_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4b_5x5'))

        (self.feed('inception_4a_output')
             .max_pool(3, 3, 1, 1, name='inception_4b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4b_pool_proj'))

        (self.feed('inception_4b_1x1',
                   'inception_4b_3x3',
                   'inception_4b_5x5',
                   'inception_4b_pool_proj')
             .concat(3, name='inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_1x1'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_3x3_reduce')
             .conv(3, 3, 256, 1, 1, name='inception_4c_3x3'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 24, 1, 1, name='inception_4c_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4c_5x5'))

        (self.feed('inception_4b_output')
             .max_pool(3, 3, 1, 1, name='inception_4c_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4c_pool_proj'))

        (self.feed('inception_4c_1x1',
                   'inception_4c_3x3',
                   'inception_4c_5x5',
                   'inception_4c_pool_proj')
             .concat(3, name='inception_4c_output')
             .conv(1, 1, 112, 1, 1, name='inception_4d_1x1'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 144, 1, 1, name='inception_4d_3x3_reduce')
             .conv(3, 3, 288, 1, 1, name='inception_4d_3x3'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 32, 1, 1, name='inception_4d_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4d_5x5'))

        (self.feed('inception_4c_output')
             .max_pool(3, 3, 1, 1, name='inception_4d_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4d_pool_proj'))

        (self.feed('inception_4d_1x1',
                   'inception_4d_3x3',
                   'inception_4d_5x5',
                   'inception_4d_pool_proj')
             .concat(3, name='inception_4d_output')
             .conv(1, 1, 256, 1, 1, name='inception_4e_1x1'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 160, 1, 1, name='inception_4e_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_4e_3x3'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 32, 1, 1, name='inception_4e_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_4e_5x5'))

        (self.feed('inception_4d_output')
             .max_pool(3, 3, 1, 1, name='inception_4e_pool')
             .conv(1, 1, 128, 1, 1, name='inception_4e_pool_proj'))

        (self.feed('inception_4e_1x1',
                   'inception_4e_3x3',
                   'inception_4e_5x5',
                   'inception_4e_pool_proj')
             .concat(3, name='inception_4e_output')
             .max_pool(3, 3, 2, 2, name='pool4_3x3_s2')
             .conv(1, 1, 256, 1, 1, name='inception_5a_1x1'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 160, 1, 1, name='inception_5a_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_5a_3x3'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 32, 1, 1, name='inception_5a_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5a_5x5'))

        (self.feed('pool4_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_5a_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5a_pool_proj'))

        (self.feed('inception_5a_1x1',
                   'inception_5a_3x3',
                   'inception_5a_5x5',
                   'inception_5a_pool_proj')
             .concat(3, name='inception_5a_output')
             .conv(1, 1, 384, 1, 1, name='inception_5b_1x1'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 192, 1, 1, name='inception_5b_3x3_reduce')
             .conv(3, 3, 384, 1, 1, name='inception_5b_3x3'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 48, 1, 1, name='inception_5b_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5b_5x5'))

        (self.feed('inception_5a_output')
             .max_pool(3, 3, 1, 1, name='inception_5b_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5b_pool_proj'))

        (self.feed('inception_5b_1x1',
                   'inception_5b_3x3',
                   'inception_5b_5x5',
                   'inception_5b_pool_proj')
             .concat(3, name='inception_5b_output')
             .avg_pool(7, 7, 1, 1, padding='VALID', name='pool5_7x7_s1')
             .fc(1000, relu=False, name='loss3_classifier')
             .softmax(name='prob'))

   The code lives in GitHub – ethereon/caffe-tensorflow: Caffe models in TensorFlow. The author has wrapped the basic operations, so once you understand the network structure, constructing GoogLeNet is straightforward. After I start at my new company, I'll try writing the GoogLeNet network code on top of tflearn.

GoogLeNet on Tensorflow

 

  For convenience I rewrote GoogLeNet with tflearn. The only place the code differs from the Caffe model is the padding: keeping the pads consistent so that the Inception branches can be concatenated is fiddly to change, and I don't know how to translate the exact pad values from the Caffe prototxt, so I set all padding uniformly to 'same'. The code is as follows:

# -*- coding: utf-8 -*-
""" GoogLeNet.
Applying 'GoogLeNet' to Oxford's 17 Category Flower Dataset classification task.
References:
    - Szegedy, Christian, et al.
    Going deeper with convolutions.
    - 17 Category Flower Dataset. Maria-Elena Nilsback and Andrew Zisserman.
Links:
    - [GoogLeNet Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
    - [Flower Dataset (17)](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)
"""

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression

import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

network = input_data(shape=[None, 227, 227, 3])
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name='conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation='relu', name='conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')

inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation='relu', name='inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation='relu', name='inception_3a_5_5_reduce')
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name='inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')

# merge the inception_3a_* branches
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)

inception_3b_1_1 = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_1_1')
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_3_3_reduce')
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation='relu', name='inception_3b_3_3')
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation='relu', name='inception_3b_5_5_reduce')
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation='relu', name='inception_3b_5_5')
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name='inception_3b_pool')
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1, activation='relu', name='inception_3b_pool_1_1')

# merge the inception_3b_* branches
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode='concat', axis=3, name='inception_3b_output')

pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name='pool3_3_3')
inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation='relu', name='inception_4a_1_1')
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation='relu', name='inception_4a_3_3_reduce')
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation='relu', name='inception_4a_3_3')
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation='relu', name='inception_4a_5_5_reduce')
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation='relu', name='inception_4a_5_5')
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name='inception_4a_pool')
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation='relu', name='inception_4a_pool_1_1')

inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode='concat', axis=3, name='inception_4a_output')

inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation='relu', name='inception_4b_1_1')
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation='relu', name='inception_4b_3_3_reduce')
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation='relu', name='inception_4b_3_3')
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation='relu', name='inception_4b_5_5_reduce')
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4b_5_5')
inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name='inception_4b_pool')
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation='relu', name='inception_4b_pool_1_1')

inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode='concat', axis=3, name='inception_4b_output')

inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_1_1')
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_3_3_reduce')
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation='relu', name='inception_4c_3_3')
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation='relu', name='inception_4c_5_5_reduce')
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4c_5_5')
inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation='relu', name='inception_4c_pool_1_1')

inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode='concat', axis=3, name='inception_4c_output')

inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation='relu', name='inception_4d_1_1')
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation='relu', name='inception_4d_3_3_reduce')
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation='relu', name='inception_4d_3_3')
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation='relu', name='inception_4d_5_5_reduce')
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4d_5_5')
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name='inception_4d_pool')
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation='relu', name='inception_4d_pool_1_1')

inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode='concat', axis=3, name='inception_4d_output')

inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation='relu', name='inception_4e_1_1')
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation='relu', name='inception_4e_3_3_reduce')
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_4e_3_3')
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation='relu', name='inception_4e_5_5_reduce')
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_4e_5_5')
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name='inception_4e_pool')
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation='relu', name='inception_4e_pool_1_1')

inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5, inception_4e_pool_1_1], axis=3, mode='concat')

pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name='pool4_3_3')

inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation='relu', name='inception_5a_1_1')
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation='relu', name='inception_5a_3_3_reduce')
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_5a_3_3')
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation='relu', name='inception_5a_5_5_reduce')
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5a_5_5')
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name='inception_5a_pool')
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1, activation='relu', name='inception_5a_pool_1_1')

inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3, mode='concat')

inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1, activation='relu', name='inception_5b_1_1')
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1, activation='relu', name='inception_5b_3_3_reduce')
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3, activation='relu', name='inception_5b_3_3')
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation='relu', name='inception_5b_5_5_reduce')
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5b_5_5')
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name='inception_5b_pool')
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation='relu', name='inception_5b_pool_1_1')
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode='concat')

pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17, activation='softmax')
network = regression(loss, optimizer='momentum',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path='model_googlenet',
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id='googlenet_oxflowers17')

    If you're interested, take a look at the Caffe model prototxt for this part and help check it for problems. I've submitted the code to the official tflearn repository (add GoogLeNet(Inception) in Example); if you have TensorFlow, install tflearn and see whether you can spot any issues. Since I don't have a GPU machine, training is slow. The TensorBoard graph is below; it's not as clear as the earlier AlexNet one, mainly because I haven't run as many epochs. (While writing checkpoints the host ran out of disk space, so I restarted training from a restore, and the TensorBoard graph also seems a bit off; it looks slightly different on each load. But judging from the basic logs, the model is steadily converging, so I'm attaching the figures anyway.)

[Figure: TensorBoard graph of the GoogLeNet run]

    The network structure can't be downloaded directly from TensorBoard either, so I captured it step by step in screenshots (bear with me, it's a bit clumsy):

[Figure: screenshots of the network structure from TensorBoard]

    For convenience, here are some run logs I saved; the convergence is clearly visible:

[Figure: excerpts of the training logs showing the loss converging]

Published by aihot. Please credit the source when reposting: http://www.aiwuyun.net/archives/3938.html
