Where Are Pixels? – A Deep Learning Perspective

  1. Formation of Discrete Image
  2. Choices of Sampling Grid
  3. 2x Resize Operation
  4. Libraries
  5. Literature
  6. Choices of Origin
  7. Improvements in Detectron & Detectron2
  8. Box Regression Transform
  9. Flip Augmentation
  10. Anchor Generation
  11. RoIAlign
  12. Paste Mask
  13. Point-based Algorithms
  14. Summary

Technically, an image is a function that maps a continuous domain, e.g. a box [0, X] × [0, Y], to intensities such as (R, G, B). To store it in computer memory, an image is discretized into an array array[H][W], where each element array[i][j] is a pixel.
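As a minimal sketch of that discretization step (not from the article itself), the snippet below samples a continuous image at pixel centers using the "half-integer centers" convention, one of the grid choices the article goes on to compare; `discretize` and `continuous_image` are hypothetical names introduced here for illustration:

```python
import numpy as np

def discretize(continuous_image, H, W, X, Y):
    """Sample a continuous image f: [0, X] x [0, Y] -> intensity
    into an H x W array, one sample per pixel."""
    img = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # Under the half-integer convention, pixel (i, j) samples the
            # continuous domain at its center, (j + 0.5, i + 0.5) scaled
            # from the W x H grid to the continuous box [0, X] x [0, Y].
            x = (j + 0.5) * X / W
            y = (i + 0.5) * Y / H
            img[i, j] = continuous_image(x, y)
    return img

# Example: a smooth "continuous image" defined analytically.
f = lambda x, y: np.sin(x) * np.cos(y)
array = discretize(f, H=4, W=6, X=np.pi, Y=np.pi)
```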

How does discretization work? How does a discrete pixel relate to the abstract notion of the underlying continuous image? These basic questions play an important role in computer graphics & computer vision algorithms.

This article discusses these low-level details and how they affect our CNN models and deep learning libraries. If you've ever wondered which resize function to use, or whether you should add or subtract 0.5 or 1 from some pixel coordinates, you may find answers here. Interestingly, these details have contributed to many accuracy improvements in Detectron and Detectron2.
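To make the "add/subtract 0.5" question concrete, here is a small PyTorch sketch (an illustration under the excerpt's framing, not code from the article) contrasting the two coordinate conventions behind common resize functions:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Two common conventions for a 2x bilinear resize; they differ in how a
# destination pixel maps back to a source coordinate:
#   align_corners=False: x_src = (x_dst + 0.5) / scale - 0.5   (half-integer centers)
#   align_corners=True:  x_src = x_dst * (W_src - 1) / (W_dst - 1)  (corner-aligned)
a = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
b = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)
print(torch.allclose(a, b))  # False: the two conventions sample different points
```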

Read in full here:

This thread was posted by one of our members via one of our news source trackers.
