The authors propose an auto-augmentation framework that learns to produce sample-specific augmentations for 3D point clouds.
3D point clouds are a relatively new object of study that has become increasingly common in recent years. Cheaper lidars and improvements in stereo and monocular depth estimation have encouraged research in the field. But point cloud data is hard to obtain and difficult to label: ModelNet40, a standard benchmark for classification tasks, contains only 12,311 models across 40 categories, whereas ImageNet offers more than 14 million images in about 20,000 categories. Data augmentation is therefore especially handy, even vital, for point clouds.
You can refer to the original paper here.
Implementation (not published yet): GitHub
Conventional augmentation strategies randomly perturb, scale, and rotate the input within a small range. Such approaches work adequately for 2D data, but they are not sufficient in 3D. We are interested in shape transformations and point displacements, because these are what define an object, and conventional strategies do not change them. For instance, if we rotate the points of a sphere, it is still an identical sphere. The paper tackles this problem with an adversarial learning strategy.
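For reference, a conventional augmentation pipeline looks roughly like the following sketch: a random rotation about the up axis, a random uniform scale, and small clipped jitter. The parameter ranges are typical defaults, not values from the paper.

```python
import numpy as np

def conventional_augment(points, scale_range=(0.8, 1.25), sigma=0.01, clip=0.05):
    """Conventional point-cloud augmentation on an (N, 3) array:
    random rotation about the vertical axis, random isotropic scaling,
    and small clipped per-point jitter. Ranges are assumed defaults."""
    # Random rotation around the vertical (y) axis
    theta = np.random.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    # Random isotropic scaling
    scale = np.random.uniform(*scale_range)
    # Small Gaussian jitter, clipped so points stay near the surface
    jitter = np.clip(sigma * np.random.randn(*points.shape), -clip, clip)
    return points @ rot.T * scale + jitter

cloud = np.random.randn(1024, 3).astype(np.float32)
augmented = conventional_augment(cloud)
print(augmented.shape)  # (1024, 3)
```

Note that none of these operations changes the underlying shape: the rotated, scaled, jittered sphere is still a sphere, which is exactly the limitation the paper addresses.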
The framework trains the augmentor adversarially against the classifier to generate samples. The input sample goes to both the augmentor and the classifier, and the augmentor receives feedback on how well-fitting the generated sample was.
1. The augmentor receives the point cloud of a sample;
2. It computes per-point features;
3. It applies sample-specific augmentation regression, using:
   - shape-wise regression to produce a transformation: a 3×3 linear matrix that gives shear/scale/rotation;
   - point-wise regression to produce a displacement for each point;
   Gaussian noise is added to both procedures to produce more diverse transformations;
4. The sample is multiplied by the linear matrix, and the displacements are added.
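The regression and application steps above can be sketched as follows. The feature dimensions, the noise concatenation, and the weight matrices `W_shape` and `W_point` are stand-ins for the learned regression layers, not the paper's exact architecture.

```python
import numpy as np

def augment(points, shape_feat, point_feats, W_shape, W_point, noise_std=0.01):
    """Sketch of the augmentor's regression step (assumed shapes).
    points: (N, 3) cloud; shape_feat: (D,) global shape feature;
    point_feats: (N, D) per-point features;
    W_shape: (2D, 9) and W_point: (2D, 3) stand in for learned weights."""
    N, D = point_feats.shape
    # Step 3a: shape-wise regression -> one 3x3 linear transform per sample;
    # Gaussian noise is mixed in to diversify the transformation.
    noise = np.random.randn(D) * noise_std
    M = (np.concatenate([shape_feat, noise]) @ W_shape).reshape(3, 3)
    # Step 3b: point-wise regression -> one 3-vector displacement per point.
    noise_p = np.random.randn(N, D) * noise_std
    disp = np.concatenate([point_feats, noise_p], axis=1) @ W_point  # (N, 3)
    # Step 4: apply the linear transform, then add the displacements.
    return points @ M.T + disp

N, D = 1024, 64
pts = np.random.randn(N, 3)
out = augment(pts, np.random.randn(D), np.random.randn(N, D),
              np.random.randn(2 * D, 9) * 0.01, np.random.randn(2 * D, 3) * 0.01)
print(out.shape)  # (1024, 3)
```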
To maximize learning, the augmented sample should be more challenging for the classifier than the original, yet should not lose the shape distinctiveness of the initial sample. To accomplish this, the authors maximize the difference between the losses of the augmented and non-augmented samples.
They introduced a parameter to restrict how different the augmented sample may be from the original one. This parameter should be greater than 1, and because the classifier may be fragile at first, it is increased dynamically depending on the classifier's prediction probability: the more confident the classifier, the harder the samples it receives from the augmentor.
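A minimal sketch of this objective is below. The exponential form of the loss gap and the way the bound grows with classifier confidence are assumptions about the exact formulation; the idea is only that the augmentor targets a loss roughly `rho` times the original, with `rho >= 1` rising as the classifier gets more confident.

```python
import math

def augmentor_loss(loss_aug, loss_orig, prob_true):
    """Sketch of the augmentor objective (assumed form).
    loss_aug / loss_orig: classifier losses on the augmented and original
    sample; prob_true: classifier probability of the ground-truth class.
    rho >= 1 grows with confidence, so a confident classifier demands
    harder augmented samples."""
    rho = max(1.0, math.exp(prob_true))  # dynamic bound, always >= 1
    # Augmentor wants loss_aug ~= rho * loss_orig: harder, but bounded.
    return abs(1.0 - math.exp(loss_aug - rho * loss_orig))

val = augmentor_loss(loss_aug=1.2, loss_orig=1.0, prob_true=0.9)
print(0.0 <= val)  # True
```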
And to ensure that the object is still classified correctly, the authors added the classification loss, with an additional hyperparameter to control its relative importance. A bigger value leads to stronger augmentations; the authors set it equal to 1.
Additionally, the authors added a regularization term that penalizes differences between the features extracted from the augmented and the original samples, so that both versions of an object keep close representations in feature space. A hyperparameter balances the importance of this term.
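The classifier's combined objective can then be sketched as the two classification losses plus the feature regularizer. The squared-L2 form of the penalty and the name `gamma` for the balancing hyperparameter are assumptions for illustration.

```python
import numpy as np

def classifier_loss(loss_orig, loss_aug, feat_orig, feat_aug, gamma=1.0):
    """Sketch of the classifier objective (assumed form): classify both the
    original and augmented sample correctly, while keeping their extracted
    features close. gamma balances the feature-regularization term."""
    feat_reg = np.sum((feat_orig - feat_aug) ** 2)  # squared L2 feature gap
    return loss_orig + loss_aug + gamma * feat_reg

f = np.ones(8)
total = classifier_loss(0.5, 0.7, f, f)  # identical features -> no penalty
print(total)  # 1.2
```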
All the classifiers benefit substantially in accuracy with the new augmentation procedure, compared to classifiers trained with conventional augmentation, and they are also more accurate on the imbalanced SR16 dataset.
And in the shape retrieval task on the MN40 dataset, mAP improves significantly for various methods, by up to 6.4%. The authors plan to extend their work and explore tasks such as part segmentation, semantic segmentation, and object detection using their framework.
Check their GitHub for updates.